A program calling command-line tools is the moral equivalent of web scraping.

I gave this talk at LPC 2012. It argues that layering programs on top of human-centric interfaces is a bad idea.

Slides (PDF): 50cef-all-plumbing-needs-an-api.pdf

The timing of this post with the announcement of the most recent bash vulnerability is not entirely coincidental.

tgtd -> LIO kernel target in RHEL 7

RHEL 7 will be using the LIO kernel target subsystem for iSCSI and other protocols, instead of the tgtd daemon, aka scsi-target-utils. This is a change from RHEL 6, where we use tgtd for iSCSI target support, and LIO only for FCoE targets, via the fcoe-target-utils package.

Users of tgtd can prepare for RHEL 7 by trying Fedora 17 or 18, which have current LIO/targetcli code. LIO has many features tgtd doesn’t, but at first it also won’t cover 100% of tgtd’s features. File bugs for these regressions and we’ll work to address them.

Other software that currently uses tgtd may wish to look at the rtslib Python library and targetcli’s JSON config format. LIO has a nice API, so if you find yourself wanting to parse targetcli output, please hold off and email me instead, ok? 🙂
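As an illustration of reading structured config instead of scraping targetcli’s screen output, here is a minimal Python sketch that loads a saved LIO configuration with the standard json module. The sample is a trimmed, hand-written assumption about the file’s shape (top-level storage_objects and targets lists with name, plugin, wwn and fabric keys); check a real saved config before relying on any key name.

```python
import json

# Hand-written, trimmed sample. These key names are an ASSUMPTION about
# targetcli's saved JSON config, not a complete or authoritative schema.
SAMPLE_CONFIG = """
{
  "storage_objects": [
    {"name": "disk1", "plugin": "block", "dev": "/dev/vg0/disk1"}
  ],
  "targets": [
    {"wwn": "iqn.2003-01.org.linux-iscsi.example:sn.1234", "fabric": "iscsi",
     "tpgs": [{"tag": 1,
               "luns": [{"index": 0, "storage_object": "/backstores/block/disk1"}]}]}
  ]
}
"""


def summarize_config(text: str) -> dict:
    """Return backstore names and (fabric, WWN) pairs from a JSON config string."""
    config = json.loads(text)
    return {
        "backstores": [so["name"] for so in config.get("storage_objects", [])],
        "targets": [(t["fabric"], t["wwn"]) for t in config.get("targets", [])],
    }


print(summarize_config(SAMPLE_CONFIG))
```

A malformed file fails loudly in json.loads rather than silently mis-parsing, which is the whole point of using a structured format.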

scsi-target-utils will still be available via EPEL for RHEL 7, and supported in RHEL 6 for its lifetime.

Screencast: targetd and lsmcli

I’ve whipped up a short (7-minute) screencast on targetd and lsmcli, two new additions to Fedora 18. targetd glues together LVM and LIO to expose a remote API for configuring a system in a storage array role. lsmcli is part of libstoragemgmt, which provides a common way to manage storage arrays from multiple vendors.

python-kmod will be in Fedora 18

python-kmod is a basic Python wrapper around the kmod library. It allows you to load, unload, and view Linux kernel modules without resorting to the subprocess module.

If you have Python code that works with kernel modules, please consider using this library in the future. If you have C code that works with modules, you should use libkmod directly! As I’ll be talking about at this year’s LPC, proper libraries are preferable to calling command-line programs for low-level stuff, and now there’s one less reason to call them.

With python-kmod, python-rtslib (and thus targetcli and targetd) now works without using subprocess at all, although rtslib will fall back to running modprobe via subprocess if python-kmod is not present.
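The library-first, subprocess-fallback pattern can be sketched roughly as follows. This is not rtslib’s actual code: it assumes python-kmod exposes a Kmod class with a modprobe(name) method, and the kmod_mod and run parameters are hypothetical hooks that exist only so the function can be exercised without root or a real kernel.

```python
import subprocess

try:
    import kmod  # python-kmod, if installed
except ImportError:
    kmod = None


def load_module(name, kmod_mod=kmod, run=subprocess.run):
    """Load a kernel module, preferring the library over the modprobe command."""
    if kmod_mod is not None:
        # Library path (ASSUMED API: Kmod().modprobe(name)); failures surface
        # as Python exceptions, with no output parsing involved.
        kmod_mod.Kmod().modprobe(name)
    else:
        # Fallback: run modprobe as an argv list (no shell); check=True turns
        # a non-zero exit status into subprocess.CalledProcessError.
        run(["modprobe", name], check=True)


# Usage (needs root on a real system):
# load_module("target_core_mod")
```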

Thanks to Jiri Popelka for reviewing the python-kmod package, sorry it took me so long to fix it up 🙂

targetd update: 0.2.2

targetd is up to version 0.2.2, just five weeks after 0.1.

Added a manpage
Added an API specification
API support for multiple storage pools
Deferred completion for long-running operations
Volume copy
Config file format changed to YAML
Saves configuration across reboots
Packaged, submitted for inclusion, and accepted for Fedora 18
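Deferred completion means a long-running call such as a volume copy can return right away and be checked on later. Here is a minimal, hypothetical sketch of that idea using only the standard library; the class and method names are illustrative and are not targetd’s actual API.

```python
import concurrent.futures
import itertools
import threading


class DeferredJobs:
    """Hypothetical job table: start work in the background, poll it by id."""

    def __init__(self, max_workers: int = 2):
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}
        self._ids = itertools.count(1)
        self._lock = threading.Lock()

    def submit(self, func, *args) -> int:
        """Start func(*args) and immediately return an integer job id."""
        with self._lock:
            job_id = next(self._ids)
            self._futures[job_id] = self._pool.submit(func, *args)
        return job_id

    def wait(self, job_id: int, timeout=None) -> None:
        """Block until the job finishes or the timeout expires."""
        concurrent.futures.wait([self._futures[job_id]], timeout=timeout)

    def status(self, job_id: int) -> dict:
        """Report 'running', 'done' with a result, or 'error' with a message."""
        future = self._futures[job_id]
        if not future.done():
            return {"status": "running"}
        exc = future.exception()
        if exc is not None:
            return {"status": "error", "message": str(exc)}
        return {"status": "done", "result": future.result()}
```

A remote client would call something like submit() through the API, get the id back at once, and poll status() until the copy finishes, instead of holding an HTTP request open for minutes.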

If you are interested in contributing to targetd, there are two things you could help with. First, now that we have an API specification, please review it and give feedback or submit a bug.

Second, I’m having trouble implementing SSL support. I searched around and found some pages that talked about easily adding SSL support to Python’s HTTPServer, but after spending a day on it, it still didn’t work. I’m sure this is easy for someone, but that someone is not me 🙂 Anyone care to take a look?
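For what it’s worth, one approach that works with a modern Python 3 standard library is to wrap HTTPServer’s listening socket with an ssl.SSLContext. A minimal sketch; the port and the certificate and key paths in the usage comment are placeholders you would replace with your own.

```python
import ssl
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_https_server(address, handler_cls, certfile, keyfile):
    """Return an HTTPServer whose listening socket speaks TLS."""
    # Load the certificate first, so a bad path fails before a port is bound.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    httpd = HTTPServer(address, handler_cls)
    # Connections accepted from the wrapped socket are TLS sockets; the
    # handshake runs inside accept().
    httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
    return httpd


# Usage (placeholder port and paths):
# httpd = make_https_server(("", 8443), BaseHTTPRequestHandler,
#                           "/path/to/cert.pem", "/path/to/key.pem")
# httpd.serve_forever()
```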

targetd: remote administration of a Linux storage appliance

targetd is a new service that will make it easier to configure Linux machines to export block-based volumes over iSCSI or other protocols.

The Why

Virtual machines need disk images to run from. While you can put the disk image on the local storage of the machine executing the VM, there are many benefits to centralizing disk images: it lets you seamlessly migrate the VM’s execution between machines, it is easier to manage, and it can be more space-efficient when many of the disk images are almost the same, which is pretty common.

However, it hasn’t been nearly as easy as it could be to provision a new storage volume on the central storage box. High-end storage appliances have tools for this, but the usual way with only open-source tools has been to ssh into the machine, create the new volume and export it using command-line tools, and then go back to whatever you’re using to create the new VM and tell it about the new volume.

targetd is a step towards making this process a little easier.

The What

The remote API is based on JSON-RPC 2.0 over HTTP. The Python standard library does much of the work of implementing our API server; what’s left is the actual functionality and the JSON-RPC error handling.
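To make the JSON-RPC 2.0 part concrete, here is a minimal sketch of the dispatch and error-handling step, independent of any HTTP server. The error codes come from the JSON-RPC 2.0 specification; the method names are hypothetical stand-ins, not targetd’s real method list, and notifications and batch requests are left out for brevity.

```python
import json

# Error codes defined by the JSON-RPC 2.0 specification.
PARSE_ERROR = -32700
INVALID_REQUEST = -32600
METHOD_NOT_FOUND = -32601
INTERNAL_ERROR = -32603


def _error(code, message, req_id=None):
    return {"jsonrpc": "2.0", "error": {"code": code, "message": message}, "id": req_id}


def handle_jsonrpc(body: str, methods: dict) -> dict:
    """Decode one JSON-RPC 2.0 request, call the named method, build the reply."""
    try:
        request = json.loads(body)
    except json.JSONDecodeError:
        return _error(PARSE_ERROR, "Parse error")
    if not isinstance(request, dict):
        return _error(INVALID_REQUEST, "Invalid Request")
    req_id = request.get("id")
    params = request.get("params", {})
    if (request.get("jsonrpc") != "2.0"
            or not isinstance(request.get("method"), str)
            or not isinstance(params, (dict, list))):
        return _error(INVALID_REQUEST, "Invalid Request", req_id)
    func = methods.get(request["method"])
    if func is None:
        return _error(METHOD_NOT_FOUND, "Method not found", req_id)
    try:
        # Named params map to keyword arguments, positional params to *args.
        result = func(**params) if isinstance(params, dict) else func(*params)
    except Exception as exc:  # report failures to the caller instead of crashing
        return _error(INTERNAL_ERROR, str(exc), req_id)
    return {"jsonrpc": "2.0", "result": result, "id": req_id}


# handle_jsonrpc('{"jsonrpc": "2.0", "method": "pool_list", "id": 1}',
#                {"pool_list": lambda: []})
# returns {"jsonrpc": "2.0", "result": [], "id": 1}
```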

This API will let targetd eventually tie into existing storage management tools. One such tool under development is libstoragemgmt, a framework by my colleague Tony Asleson, which will give virtualization tools like oVirt or OpenStack a common API for managing the many proprietary storage appliances, as well as open-source appliances like targetd.

Linux has had reliable LVM-based volume management for a long time, and now has an excellent kernel-based storage target subsystem called LIO. targetd uses both of these heavily. To configure a machine for a storage appliance role, give targetd a volume group to allocate volumes from, set a user and password for access, and you’re just about done.
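On the client side, a call is just an HTTP POST of a JSON-RPC body carrying the configured user and password. This sketch assumes HTTP Basic authentication and uses a placeholder URL plus a hypothetical method name and parameter; consult targetd’s API specification for the real ones.

```python
import base64
import json
import urllib.request


def build_rpc_request(url, user, password, method, params=None, req_id=1):
    """Build (but do not send) a JSON-RPC 2.0 POST with Basic auth (ASSUMED scheme)."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params or {}, "id": req_id}
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def call(request, timeout=30):
    """Send the request; return the result or raise on a JSON-RPC error reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    if "error" in reply:
        raise RuntimeError(f"{reply['error']['code']}: {reply['error']['message']}")
    return reply["result"]


# Usage (placeholder URL, method and params):
# req = build_rpc_request("https://storage.example/targetrpc", "admin", "secret",
#                         "vol_list", {"pool": "vg-targetd"})
# volumes = call(req)
```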

targetd completely avoids one coding pet peeve of mine that I’ve ranted about before. It uses libraries to interface with LVM and LIO instead of the all-too-common alternative of passing commands to command-line tools and parsing their output. Much of the time spent on targetd went into improving these libraries. I believe proper error propagation, less text parsing, and better library APIs make this bottom-up approach a long-term win.

Current state and future plans

Today’s announcement is a pre-alpha 0.1 release. I’m really hoping to get to 1.0 for Fedora 18. There is a manpage to write and SSL support to add, but much more important than the source code of targetd is the layout of the remote API itself. I’d love to get some more review of that, as well as the code, before 1.0. I have been working from what I believe are some common use-cases, but more feedback on how admins configure and use storage appliances would be most welcome too!

Please see the project on github for more info.

A hundred other languages want to call your code

The users of a hundred programming languages would like to call your low-level code, but they can’t.

Things have changed in the last 20 years. More people are using languages such as Python, Ruby, and a hundred others that are further from the bare metal. People are building service stacks that tie together many lower-level functions.

Libraries and APIs that make low-level features available to convenient high-level languages (HLLs) are a good thing. As an HLL coder, it’s pretty handy to install python-foo, type “import foo”, and then have access to that functionality.

What if python-foo isn’t there? HLL users are out of luck, unless they are so determined that they write their own python-foo that calls system() and then parses the output using their language’s fancy text-parsing features.

But system() is the devil. We hate system(), folks. If your code calls system(), that’s bad, for four reasons:

Overhead. It creates a new process and subshell.
Security. If your code has elevated privileges and includes text input from an untrusted user, watch out. Remember little Bobby Tables: a semicolon is a dangerous thing.
Ease. Parsing command-line programs’ output can be a pain, even if your language helps lessen it. Parsing errors is even harder, and easy to overlook.
Portability. A different platform may (or may not) have the program you’re relying on, or its output may be different, and you won’t know.
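If you truly must run an external program, you can at least close the security hole. This minimal Python sketch passes arguments as a list so no shell ever sees them; it does nothing about the overhead, parsing, or portability problems. The Python interpreter itself stands in for the real tool, and the volume name is a made-up hostile input.

```python
import shlex
import subprocess
import sys


def run_with_arg(untrusted: str) -> str:
    """Run a child program with one untrusted argument, without any shell."""
    # sys.executable stands in for the real tool. The list form passes the
    # string as a single argv element, so ';' or '$(...)' is never interpreted.
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout.rstrip("\n")


volume_name = "vol1; reboot"        # hostile input, Bobby Tables style
print(run_with_arg(volume_name))    # the child sees the whole string as data
print(shlex.quote(volume_name))     # quoting, if a shell truly can't be avoided
```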

Early on, when I was learning Python, I tried to write a GUI for OProfile by parsing its output. OProfile did nice (for the user) things like adding headers to its output and changing the format of its output depending on what it found. Great for users, but it doomed my project: I couldn’t parse the output reliably.

You want to make it easy for the people who are language gurus for each of the hundred languages out there to wrap your functionality without having to become an expert in your code, or even change it. Then the hordes using all the hundred languages can use your library without being an expert in your code or being enough of a guru in their language to write a wrapper. They can just happily use it.

Here’s a positive development: kmod. kmod is a new implementation of the utilities in module-init-tools: modprobe, lsmod, etc. Not only does kmod include a libkmod C library, but the command-line programs use it, so we know it works. Yeah! This made it super easy for someone (me) to come along and write a language wrapper (python-kmod) without having to know about module internals. python-kmod makes it easy for Python users to manipulate modules using the friendly language features they’re used to, like exceptions for errors, and lists. If I had been forced to use system(), it probably would have mostly worked, but it would have failed whenever output parsing broke on some edge case.
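To show how little a binding layer needs when the C side is a plain library, here is a sketch that wraps one libc function, access(2), with Python’s built-in ctypes and turns the C convention of “return -1 and set errno” into a Python exception, the same style of error handling python-kmod offers. libc is only a convenient stand-in for a library like libkmod, and the sketch assumes a POSIX system.

```python
import ctypes
import ctypes.util
import os

# Load the C library with errno tracking. If find_library() returns None,
# CDLL(None) falls back to the symbols already loaded into this process (POSIX).
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.access.argtypes = [ctypes.c_char_p, ctypes.c_int]
_libc.access.restype = ctypes.c_int


def check_access(path: str, mode: int = os.F_OK) -> None:
    """Wrap C access(2): return quietly on success, raise OSError on failure."""
    if _libc.access(os.fsencode(path), mode) != 0:
        err = ctypes.get_errno()
        # OSError picks the matching subclass, e.g. FileNotFoundError for ENOENT.
        raise OSError(err, os.strerror(err), path)
```

Every Python-specific nicety (the bytes conversion, the exception type) lives in the wrapper; the C function didn’t have to change.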

I encourage all low-level program writers, my fellow Linux Plumbers, to consider how to make native language bindings possible for your code. You don’t have to write them, just make them possible and you will find all sorts of people calling your code, safely, who couldn’t before.