The following was written in response to this post on the FoRK mailing list.
Those of you in the large-scale technology operations space will be familiar with Puppet, a long-time favorite for infrastructure automation, and Chef, a more recent entrant. Both of them are open source and both have a component responsible for system discovery: Puppet includes a tool called facter and Chef includes one called ohai[4].
System discovery is the task of collecting various facts (hence the name facter) about the system on which you are running so the rest of the automation can run from a consistent view. System discovery is an abstraction, and that brings a host of questions around implementation and presentation. facter takes a minimalist approach: it returns a compact set of information and relies on a number of native C extensions (which, one might argue, pushes you towards only returning a compact set of information). ohai (now) takes a rather maximalist approach: it avoids any native C extensions, and the data it returns can be quite large, for example when run on OS X with the plist gem installed.
I cannot comment on the history or philosophy of facter, but I can do so for ohai. I wrote quite a bit of the ohai code (though I did not write the initial version, am not its maintainer, and have not contributed code for some time), and am primarily responsible for the volume of information it collects compared to similar tools (the philosophy being ‘collect it all, let the users decide what matters’). ohai began life as approximately a pure Ruby version of facter to support Chef. The data returned was similar (and similarly unstructured), the main difference being its avoidance of C extensions. The motivation for remaining pure Ruby was some combination of simplicity and a desire for consistency with the rest of Chef. Where facter uses native interfaces to collect system data, ohai relies on a lot of popen4() and regex matching. This has made ohai incredibly easy to port to new platforms: it went from 1 (Linux) to 4 (Linux of various descriptions, Solaris, FreeBSD, and OS X) in a couple of weeks, ported by folks more familiar with Ruby and a command line than with the system-level C interfaces.
In so doing, I learned quite a bit about how command-line output varies between platforms and the issues of semantic mismatch between programming languages and the operating systems on which they run. My lessons do not lead me to conclude we are so far from an “80%” solution or that there is cause for despair.
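To make the popen-and-regex style of collection concrete, here is a minimal sketch. It is written in Python rather than Ruby and is not ohai's actual code; the sample text, the regular expressions, and the function names are all assumptions made for illustration, loosely shaped like BSD-style ifconfig output.

```python
import re
import subprocess

# Illustrative sample only, loosely shaped like BSD / OS X ifconfig output.
SAMPLE_IFCONFIG = """\
en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
\tether 00:23:6c:90:47:10
\tinet6 fe80::223:6cff:fe90:4710%en1 prefixlen 64 scopeid 0x6
\tinet 172.16.100.202 netmask 0xffffff00 broadcast 172.16.100.255
\tstatus: active
"""

# One pattern for the line that starts an interface block...
IFACE_RE = re.compile(
    r"^(?P<name>\w+): flags=[0-9a-fA-F]+<(?P<flags>[^>]*)> mtu (?P<mtu>\d+)"
)
# ...and one for IPv4 address lines inside that block.
INET_RE = re.compile(
    r"^\s+inet (?P<addr>\d+\.\d+\.\d+\.\d+) netmask (?P<mask>\S+)"
    r"(?: broadcast (?P<bcast>\S+))?"
)


def parse_ifconfig(text):
    """Turn ifconfig-style text into a nested dict keyed by interface name."""
    interfaces = {}
    current = None
    for line in text.splitlines():
        match = IFACE_RE.match(line)
        if match:
            current = interfaces.setdefault(match.group("name"), {
                "flags": match.group("flags").split(","),
                "mtu": match.group("mtu"),
                "addresses": {},
            })
            continue
        match = INET_RE.match(line)
        if match and current is not None:
            current["addresses"][match.group("addr")] = {
                "family": "inet",
                "netmask": match.group("mask"),
                "broadcast": match.group("bcast"),
            }
    return interfaces


def collect(command=("ifconfig", "-a")):
    """Run the command and parse its output (requires ifconfig on the host)."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_ifconfig(result.stdout)
```

Porting a collector like this to a new platform mostly means adjusting the patterns, which is why it is approachable for someone who knows the command line better than the C interfaces. It is also why it is fragile: every variation in the command's output needs its own handling.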
The first contribution I made to ohai was to suggest a slight restructuring to support multiple platforms. This introduced hierarchy both in the code layout (with OS-specific plugins) and in the data output. The use of JSON for the output is the least bad option, given the common alternatives and the use of CouchDB on the server, but is not entirely satisfactory. The most bothersome issue is its lack of references. For example, I might have several IP addresses on a host, but I want one that I can refer to as its canonical address in the rest of the automation. Automatically deciding which IP to use is easy (take the primary IP address on the interface used for the default route), but indicating which address has been chosen creates a new problem: I have a top-level notion of the IP address, but no way to indicate, in the data structure, where it came from.
As an example, the top-level entry looks like this:
"ipaddress": "172.16.100.202"
And the actual network interface definition (which is in the network->interfaces sub-hash) looks like this:
"en1": {
  "status": "active",
  "flags": [
    "UP",
    "BROADCAST",
    "SMART",
    "RUNNING",
    "SIMPLEX",
    "MULTICAST"
  ],
  "number": "1",
  "addresses": {
    "00:23:6c:90:47:10": {
      "family": "lladdr"
    },
    "fe80::223:6cff:fe90:4710": {
      "scope": "Link",
      "prefixlen": "64",
      "family": "inet6"
    },
    "172.16.100.202": {
      "broadcast": "172.16.100.255",
      "netmask": "255.255.255.0",
      "family": "inet"
    }
  },
  "mtu": "1500",
  "media": {
    "supported": [
      {
        "autoselect": {
          "options": []
        }
      }
    ],
    "selected": [
      {
        "autoselect": {
          "options": []
        }
      }
    ]
  },
  "type": "en",
  "arp": {
    "172.16.100.1": "0:1b:c:f:90:23",
    "172.16.100.201": "0:23:12:a8:2d:84",
    "172.16.100.246": "0:16:cb:a9:70:4b"
  },
  "encapsulation": "Ethernet"
}
If I want to know where the default address came from, I have to iterate over the interfaces to find it. If I added a tag to the default interface, I would then have to update it in two places should there be a change. Storing a reference to the default interface would be a cleaner solution, but is not supported in JSON. Creating a JSON-based format that supports references does not seem like a hard problem; it just hasn’t been done, to my knowledge (and please don’t suggest XML, it is too bloated and complex for consideration). This is minor compared to the other, bigger challenges, though.
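The two options can be sketched side by side. This is a Python illustration on a trimmed copy of the data above, not anything ohai does: the lo0 entry, the "$ref" key, the "default_address" key, and both helper functions are hypothetical, invented only to show what a reference convention layered on plain JSON might look like.

```python
# Trimmed copy of the structure shown above; lo0 is added for illustration.
data = {
    "ipaddress": "172.16.100.202",
    "network": {
        "interfaces": {
            "lo0": {"addresses": {"127.0.0.1": {"family": "inet"}}},
            "en1": {
                "addresses": {
                    "00:23:6c:90:47:10": {"family": "lladdr"},
                    "172.16.100.202": {
                        "family": "inet",
                        "netmask": "255.255.255.0",
                    },
                }
            },
        }
    },
}


def find_interface(data, address):
    """Without references: scan every interface for the given address."""
    for name, iface in data["network"]["interfaces"].items():
        if address in iface.get("addresses", {}):
            return name
    return None


def resolve(data, ref):
    """With a hypothetical {"$ref": [key, ...]} convention: walk from the root."""
    node = data
    for key in ref["$ref"]:
        node = node[key]
    return node


# The reference records where the canonical address lives, so the interface
# name and the address details are stored exactly once.
data["default_address"] = {
    "$ref": ["network", "interfaces", "en1", "addresses", "172.16.100.202"]
}

print(find_interface(data, data["ipaddress"]))
print(resolve(data, data["default_address"]))
```

Storing the reference as a list of keys avoids any escaping rules for keys that contain separators, at the cost of every consumer needing to know the convention.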
The second problem, and one most clearly an issue for all languages interacting with the OS for systems work, is process management. While abstractions like threads and event callbacks are (reasonably) well understood, Unix-style process management remains just this side of a black art; look at the daemonization code in any C server for an example. Scripting languages like Ruby and Python tend to just punt and directly expose the C process management interface, hence the use of popen4() all over the place in ohai. Mocking for testing and dealing with grandchildren and orphan processes present many, often obscure, problems. They can be dealt with, but nobody has bothered to write reasonable libraries to do this in Ruby (parts of it are now in Chef), and I am not familiar enough with Python to know what folks do there. Again, there is a semantic gap between what the OS is exposing and how the languages consume it. This gap does not really exist for the lightweight concurrency mechanisms, particularly event-based concurrency, where the language support is quite good (see EventMachine in Ruby and Twisted in Python, both of which are libraries, not language features; process management should yield to similar effort).
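As one small example of the grandchild problem, here is a sketch in Python of running a command with a timeout and cleaning up everything it spawned. It assumes a POSIX system; the function name and return shape are made up for illustration, and it is not taken from ohai or Chef.

```python
import os
import signal
import subprocess


def run_with_timeout(argv, timeout):
    """Run argv in its own session; on timeout, kill its whole process group.

    Returns (returncode, stdout, stderr); returncode is None on timeout.
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,  # child becomes leader of a new process group
    )
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child leads its group, so its pid doubles as the group id.
        # Killing the group also reaches grandchildren that stayed in it.
        os.killpg(proc.pid, signal.SIGKILL)
        out, err = proc.communicate()  # reap the child, drain the pipes
        return None, out, err
    return proc.returncode, out, err
```

Killing only the direct child would leave any grandchildren running and holding the output pipes open. Even this version misses a grandchild that deliberately starts its own session, which gives a flavor of why this area is harder than it looks.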
The third big problem I encountered was the wild variation in command output. At the amusing end of the spectrum, I received a bug report from someone running Linux with German localization and the output of ifconfig was entirely translated into German, something you are unlikely to see in a C API. Generally, the challenge in working across platforms might be summarized in this way: the more optimized a system is for direct consumption by a human operator, the harder it is to write automation that doesn’t use ‘native’ APIs. Windows is the obvious extreme example of this, but the unexpected offender here is Solaris.
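A common workaround for the localization case is to force the C locale on the child process before parsing its output. The sketch below is in Python, with a made-up helper name; it illustrates the general technique and makes no claim about how ohai handled that bug report.

```python
import os
import subprocess


def run_in_c_locale(argv):
    """Run a command with LC_ALL=C so its messages are not translated."""
    # Copy the parent environment and override only the locale variables.
    env = dict(os.environ, LC_ALL="C", LANG="C")
    result = subprocess.run(
        argv, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


# Usage (requires ifconfig on the host):
#   text = run_in_c_locale(["ifconfig", "-a"])
```

This only normalizes the language of the output. Differences in layout between platforms and versions still have to be handled by the parser.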
Solaris is, in my estimation, the best OS core (kernel, filesystems, etc.) on the market. It is also the long-time favorite of old-school sysadmins who pride themselves on knowing every last inch of their systems and only using automation to take care of certain, recurring tasks, rather than the full-auto, lights-out style encouraged by Puppet and Chef. The output from things like ifconfig is optimized for them, being particularly verbose and human-readable, but the extensive variation in that output makes it very involved to parse (see the ifconfig man page for a taste). At another point in the space, there are things like the OS X system_profiler command that will happily generate XML output exactly for ease of consumption by code rather than people. All of which is really to say operating systems can, should, and sometimes do, expose interfaces above the level of the native C APIs, but intended for consumption by scripting tools. Things like system_profiler show one way of doing that, though the XML-ified plist output is not a winner. An OS that had ‘automation modes’ on all its system management tools would be a massive win for system language users and would, I think, not be hard (just a small matter of programming).
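For contrast with regex scraping, here is roughly what consuming a machine-oriented format looks like, sketched in Python with the standard plistlib module. The sample document is a heavily simplified stand-in, not real system_profiler output, and its key names and the function name are assumptions for illustration.

```python
import plistlib

# Simplified, illustrative plist; real output is far larger and its exact
# keys should be checked on the target system.
SAMPLE_PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
  <dict>
    <key>_dataType</key>
    <string>SPNetworkDataType</string>
    <key>_items</key>
    <array>
      <dict>
        <key>interface</key>
        <string>en1</string>
        <key>ip_address</key>
        <array><string>172.16.100.202</string></array>
      </dict>
    </array>
  </dict>
</array>
</plist>
"""


def parse_profile(xml_bytes):
    """Index the top-level sections of the plist by their data type."""
    sections = plistlib.loads(xml_bytes)
    return {section["_dataType"]: section.get("_items", [])
            for section in sections}


profile = parse_profile(SAMPLE_PLIST)
print(profile["SPNetworkDataType"][0]["interface"])
```

There are no patterns to maintain and nothing to break when a label is translated or a column moves, which is the appeal of an ‘automation mode’ even when the format itself is clumsy.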
To sum up, I see three areas that need attention to better integrate systems languages and the operating systems on which they run:
- A simple data format that supports internal references and more data types, whether a new format or JSON convention.
- Proper process management as robust and simple as event-based concurrency.
- Operating system interfaces appropriate for scripting/automation tools.
I didn’t intend this post to be quite so long, so my apologies and thanks to those of you who made it this far. It represents my experience in one, possibly representative, corner of dealing with the challenges at the interface between systems languages and systems. It is my pious hope, to quote Roger Penrose, that none of the challenges I describe above are fundamental and all could be solved with only a modicum of effort from some motivated folk. Whether they are the same sort of problems that raised Jeff Bone’s ire I can’t say, but I remain quite optimistic there isn’t cause for despair or anger in this.

