We recently had a curious bug report from a Samba user, showing a strange hang in our provision script. Andrew Bartlett worked with the user to get a gdb backtrace, which showed that an internal Heimdal library was calling out to a net_read() function in /opt/lib/libmediaclient.so. That seemed strange: net_read() is an internal Heimdal function (it is part of the roken library), so why would Heimdal be calling into a “media” library?
The answer turned out to be quite bizarre! The authors of the Sundtek driver install their driver using an install script, and that script is easily the worst abuse of loader preload mechanisms that I have ever seen. I’m a big fan of LD_PRELOAD as a driver development aid and a debugging tool, but this “driver” goes much further.
ld.so.preload is evil
The script adds the libmediaclient.so library to /etc/ld.so.preload. It implements a “driver” for the Sundtek USB device by intercepting 48 functions (including open()/close()/read()/poll()/mmap() etc.) in _all_ installed programs. For those of you who don’t know, the /etc/ld.so.preload mechanism is one of those oh-so-tempting things that no developer should ever use. The way it works is that any library listed in that file is “preloaded” into every dynamically linked binary on the system, and its functions override any functions of the same name in the libraries those binaries use.
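To make the mechanism concrete, here is a minimal sketch of this kind of interposer, assuming glibc on Linux. It wraps just one of the intercepted calls, close(), counts it, and forwards it to the next definition found with dlsym(RTLD_NEXT, ...). The file name, library name, counter and the shim_get_close_calls() helper are all hypothetical, and the build line in the comment is an assumption (older glibc versions need -ldl for dlsym). Loaded with LD_PRELOAD it affects a single command; listing the same library in /etc/ld.so.preload would put it into every dynamically linked program on the system.

```c
/* Hypothetical file: close_shim.c
 * Assumed build as a preload library:
 *   gcc -shared -fPIC -o libcloseshim.so close_shim.c -ldl
 * Try it on one command only, not system-wide:
 *   LD_PRELOAD=./libcloseshim.so ls
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
#include <unistd.h>

/* Number of close() calls this shim has intercepted (illustrative only). */
static unsigned long shim_close_calls = 0;

/* Next definition of close() in symbol lookup order, normally libc's.
 * Resolved lazily on first use; this lazy init is not thread-safe, a real
 * shim would use pthread_once() or a constructor. */
static int (*real_close)(int) = NULL;

/* Because this object is searched before libc, calls to close() from the
 * program bind to this function instead of libc's. */
int close(int fd)
{
    if (real_close == NULL) {
        /* RTLD_NEXT: look up "close" in the objects loaded after this one. */
        real_close = (int (*)(int))dlsym(RTLD_NEXT, "close");
        if (real_close == NULL)
            abort(); /* cannot forward the call, so fail loudly */
    }
    shim_close_calls++;
    return real_close(fd); /* forward to the real implementation */
}

/* Hypothetical accessor so callers can see how many calls were intercepted. */
unsigned long shim_get_close_calls(void)
{
    return shim_close_calls;
}
```

Even this trivial shim changes the behaviour of every program it is loaded into: any bug in it, or any clash with a name a program or library defines for itself, becomes that program’s bug.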
In other words, the author of this driver decided to avoid writing a real driver by intercepting library calls from every program on the system and faking the return values. Nor did he stop at standard interfaces: he also intercepted a bunch of non-standard functions. The interception of net_read() is what broke Heimdal, which in turn broke Samba.
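The net_read() clash is a symbol lookup problem: the dynamic linker binds a global name to the first loaded object that defines it, and preloaded libraries are searched right after the executable itself, ahead of libraries such as Heimdal’s. A minimal sketch, assuming glibc (RTLD_DEFAULT and dladdr() are GNU extensions) and using a hypothetical symbol_provider() helper, asks the linker which loaded object a name currently resolves to.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: return the path of the loaded object that global
 * symbol lookup resolves `name` to, or NULL if nothing defines it. */
const char *symbol_provider(const char *name)
{
    Dl_info info;
    void *addr = dlsym(RTLD_DEFAULT, name); /* same search order as normal binding */

    if (addr == NULL)
        return NULL; /* no loaded object defines this name */
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
        return NULL; /* address does not map back to a known object */
    return info.dli_fname; /* e.g. a preloaded library rather than the expected one */
}
```

Run inside a process with the Sundtek library preloaded, symbol_provider("net_read") would presumably have pointed at /opt/lib/libmediaclient.so, the same thing the gdb backtrace revealed.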
I thought it would be useful to email the author of the driver to ask them to stop using ld.so.preload. I was pleasantly surprised when a few minutes later Markus Rechberger popped up on #samba-technical to discuss the problem. I wish all driver authors were this responsive.
Unfortunately, Markus had turned up to tell me that there was no alternative to using ld.so.preload. He insisted that problems were rare (how would he know? it isn’t him that gets the bug reports!) and that it would be far too difficult for his users to have to run a script that uses LD_PRELOAD instead. He thinks CUSE is too unstable, and that it’s far too difficult to write a driver for lots of different kernel versions.
The conversation was quite surreal. I asked towards the end if he would mind if I posted a log, but he asked me not to. That is a pity, as I think it would make a great case study for what has gone wrong with some Linux device driver development.