worst abuse of preload I’ve ever seen

We recently had a curious bug report from a Samba user. The bug report showed a strange hang in our provision script. Andrew Bartlett worked with the user to get a gdb backtrace, which showed that an internal Heimdal library was calling out to a net_read() function in /opt/lib/libmediaclient.so. That seemed strange, as net_read() is an internal Heimdal function (it is part of the roken library), so why would it be calling a “media” library?

The answer turned out to be quite bizarre! The authors of the Sundtek driver install their driver using this script. That script is easily the worst abuse of loader preload mechanisms that I have ever seen. I’m a big fan of LD_PRELOAD as a driver development aid, and a debug tool, but this “driver” goes much further.

ld.so.preload is evil

The script adds the libmediaclient.so library to /etc/ld.so.preload. It implements a “driver” for the Sundtek USB device by intercepting 48 functions (including open()/close()/read()/poll()/mmap() etc) in _all_ installed programs. For those of you who don’t know, the /etc/ld.so.preload mechanism is one of those oh-so-tempting things that no developer should ever use. The way it works is that any library listed in that file is “preloaded” into every dynamically linked binary on the system, and any function it defines overrides the function of the same name in the libraries those binaries use.

In this case the author of this driver decided that he would avoid writing a real driver by instead intercepting library calls from all programs on the system and faking the return values. He didn’t just intercept standard interfaces, but also a bunch of non-standard functions. The interception of net_read() is what broke Heimdal, which in turn broke Samba.
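
To make the lookup rule concrete, here is a minimal sketch in Python that models the order in which a symbol name is resolved. It is a toy model, not the real dynamic linker: it ignores symbol versioning, hidden visibility and similar details, and the function `resolve`, the object name `libroken.so` and the export sets are all assumptions made for illustration.

```python
def resolve(symbol, executable, preloaded, needed):
    """Return the name of the first object, in lookup order, that exports symbol.

    Toy model of global symbol lookup: the executable is searched first,
    then preloaded libraries, then the libraries the program links against.
    """
    for obj in [executable, *preloaded, *needed]:
        if symbol in obj["exports"]:
            return obj["name"]
    return None


# Hypothetical objects; the export sets are illustrative, not real symbol tables.
program = {"name": "program", "exports": set()}
libroken = {"name": "libroken.so", "exports": {"net_read", "net_write"}}
libc = {"name": "libc.so.6", "exports": {"open", "close", "read", "poll", "mmap"}}
libmediaclient = {
    "name": "libmediaclient.so",
    "exports": {"open", "close", "read", "poll", "mmap", "net_read"},
}

needed = [libroken, libc]

# Without the preload entry, the call binds to the library's own helper.
without_preload = resolve("net_read", program, [], needed)

# With libmediaclient.so preloaded, the same name binds to the "driver" instead.
with_preload = resolve("net_read", program, [libmediaclient], needed)

print(without_preload)  # libroken.so
print(with_preload)     # libmediaclient.so
```

The point of the model is that resolution is purely by name: the preloaded library does not need to know anything about Heimdal to capture one of its functions, it only needs to export a symbol that happens to collide.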

Interesting conversation

I thought it would be useful to email the author of the driver to ask them to stop using ld.so.preload. I was pleasantly surprised when a few minutes later Markus Rechberger popped up on #samba-technical to discuss the problem. I wish all driver authors were this responsive.

Unfortunately Markus had turned up to tell me that there was no other way than using ld.so.preload. He insisted that problems were rare (how would he know? it isn’t him that gets the bug reports!) and that it would be far too difficult for his users to have to run a script that uses LD_PRELOAD instead. He thinks CUSE is too unstable and that it’s far too difficult to write a driver for lots of different kernel versions.

The conversation was quite surreal. I asked towards the end if he would mind if I posted a log, but he asked me not to. That is a pity, as I think it would make a great case study for what has gone wrong with some Linux device driver development.

12 Responses to “worst abuse of preload I’ve ever seen”

  1. Markus Rechberger says:

    For people like you (or anyone else wanting to avoid this) we have it documented that you can remove it and use LD_PRELOAD

    http://support.sundtek.com/index.php/topic,364.msg1917.html#msg1917

    it was always like that and will remain like that until a stable and working alternative (CUSE – last time it was unstable) is available.

    The use of ld.so.preload is the comfortable way (I use it like that and many other people do too). Anyone who cannot live with it should remove it. That you cannot live with it won’t remove it from my systems.

    Technically, the best way would be to have some direct hooks in libc for registering virtual files.

  2. Markus Rechberger says:

    CUSE unstable? Thinks? Knows!

    Go fix it if you know everything better than everyone else!

    [28778.753168] ------------[ cut here ]------------
    [28778.753178] WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xc5/0x130()
    [28778.753181] Hardware name: UL80VT
    [28778.753183] sysfs: cannot create duplicate filename
    '/devices/virtual/cuse/foo'
    [28778.753185] Modules linked in: cuse uinput snd_hda_codec_realtek
    nfsd exportfs snd_hda_intel nfs snd_hda_codec lockd snd_pcm_oss
    snd_mixer_oss snd_hwdep nfs_acl fbcon snd_seq_dummy snd_pcm tileblit
    auth_rpcgss font bitblit softcursor snd_seq_oss snd_seq_midi
    snd_rawmidi snd_seq_midi_event i915 sunrpc arc4 snd_seq drm_kms_helper
    iwlagn snd_timer iwlcore drm uvcvideo snd_seq_device i2c_algo_bit
    videodev asus_laptop psmouse mac80211 video v4l1_compat snd output
    intel_agp atl1c led_class v4l2_compat_ioctl32 serio_raw soundcore
    btusb cfg80211 snd_page_alloc
    [28778.753239] Pid: 19677, comm: lt-cusexmp Not tainted 2.6.33 #4
    [28778.753241] Call Trace:
    [28778.753248] [] warn_slowpath_common+0x78/0xb0
    [28778.753252] [] warn_slowpath_fmt+0x3c/0x40
    [28778.753256] [] sysfs_add_one+0xc5/0x130
    [28778.753260] [] create_dir+0x63/0xb0
    [28778.753264] [] sysfs_create_dir+0x34/0x50
    [28778.753270] [] ? kobject_get+0x1a/0x30
    [28778.753273] [] kobject_add_internal+0xb7/0x200
    [28778.753277] [] kobject_add_varg+0x38/0x60
    [28778.753281] [] kobject_add+0x44/0x70
    [28778.753287] [] ? get_device_parent+0xc5/0x1b0
    [28778.753291] [] device_add+0xbb/0x5d0
    [28778.753294] [] ? dev_set_name+0x3c/0x40
    [28778.753300] [] cuse_process_init_reply+0x2a4/0x450 [cuse]
    [28778.753305] [] ? cuse_process_init_reply+0x0/0x450 [cuse]
    [28778.753311] [] request_end+0x16b/0x200
    [28778.753315] [] fuse_dev_write+0x3a2/0x4f0
    [28778.753321] [] ? __vma_link_rb+0x2b/0x30
    [28778.753325] [] ? fuse_dev_write+0x0/0x4f0
    [28778.753330] [] do_sync_readv_writev+0xcb/0x110
    [28778.753336] [] ? _raw_spin_lock_irq+0x10/0x20
    [28778.753340] [] ? __do_fault+0x409/0x4d0
    [28778.753344] [] ? security_file_permission+0x11/0x20
    [28778.753348] [] do_readv_writev+0xca/0x1f0
    [28778.753354] [] ? default_spin_lock_flags+0x9/0x10
    [28778.753358] [] ? __up_read+0xa6/0xd0
    [28778.753361] [] vfs_writev+0x3e/0x60
    [28778.753364] [] sys_writev+0x4c/0xb0
    [28778.753369] [] system_call_fastpath+0x16/0x1b
    [28778.753372] ---[ end trace fba55b2689d0febd ]---
    [28778.753376] kobject_add_internal failed for foo with -EEXIST, don't
    try to register things with the same name in the same directory.
    [28778.753381] Pid: 19677, comm: lt-cusexmp Tainted: G W 2.6.33 #4
    [28778.753383] Call Trace:
    [28778.753387] [] kobject_add_internal+0x15d/0x200
    [28778.753391] [] kobject_add_varg+0x38/0x60
    [28778.753395] [] kobject_add+0x44/0x70
    [28778.753399] [] ? get_device_parent+0xc5/0x1b0
    [28778.753402] [] device_add+0xbb/0x5d0
    [28778.753406] [] ? dev_set_name+0x3c/0x40
    [28778.753410] [] cuse_process_init_reply+0x2a4/0x450 [cuse]
    [28778.753414] [] ? cuse_process_init_reply+0x0/0x450 [cuse]
    [28778.753419] [] request_end+0x16b/0x200
    [28778.753423] [] fuse_dev_write+0x3a2/0x4f0
    [28778.753427] [] ? __vma_link_rb+0x2b/0x30
    [28778.753431] [] ? fuse_dev_write+0x0/0x4f0
    [28778.753435] [] do_sync_readv_writev+0xcb/0x110
    [28778.753439] [] ? _raw_spin_lock_irq+0x10/0x20
    [28778.753443] [] ? __do_fault+0x409/0x4d0
    [28778.753446] [] ? security_file_permission+0x11/0x20
    [28778.753450] [] do_readv_writev+0xca/0x1f0
    [28778.753454] [] ? default_spin_lock_flags+0x9/0x10
    [28778.753458] [] ? __up_read+0xa6/0xd0
    [28778.753461] [] vfs_writev+0x3e/0x60
    [28778.753465] [] sys_writev+0x4c/0xb0
    [28778.753469] [] system_call_fastpath+0x16/0x1b

  3. Domenico says:

    Yes, as surreal as seeing me buy any Sundtek USB device.

  4. tridge says:

    Markus,

    I have used CUSE, and I didn’t have any instability problems. The API for general USB devices is quite horrible, as you need to jump through a lot of hoops to transfer the various pieces of the URB structure over the ioctl interface, but it is possible.

    I can quite believe there are bugs in CUSE on the systems you are testing on, in which case I’d suggest you try to fix those rather than inserting your driver into every binary on the system. I’m not interested in fixing CUSE for you, as for what I needed it for it worked fine.

    Distributing a “driver” that uses /etc/ld.so.preload just causes pain for other Linux developers. If you can’t develop a driver using a more reasonable means then please just let other people develop a driver for you. See for example Greg’s offer at http://www.kroah.com/log/linux/free_drivers.html

  5. [...] This post was mentioned on Twitter by TK, flare2004. flare2004 said: http://blog.tridgell.net/?p=141 – I'm not surprised he is complaining [...]

  6. Joel says:

    > That is a pity, as I think it would make a great case study for what has gone wrong with some Linux device driver development.

    This isn’t Linux’s fault. It’s Sundtek’s, for hiring such an incompetent developer.

    The only thing that can protect users is this developer being fired. He’s hopelessly unqualified for his job and he’s going to permanently damage whatever reputation his company might have.

  7. Thorsten says:

    As a friend of Markus, and one of the first customers of the device in question, I have to say that what you are writing is what amateurs would write.
    Markus showed me this page a day ago and what you are writing is more than disgusting. None of you seems to have any experience with that device (my experience is that it works better than any other device I had in the past, with every system I used: Fedora and a Western Digital router).

    All this does nothing but cast a bad light on everything (on you and my friend), even without any serious issue. Once again, everything works for me, and there seem to be many other people in the Sundtek forum who are happy with it.

    Just be fair instead of lying,
    Thorsten

  8. tridge says:

    Thorsten,

    I do believe that the device works well for you. That is not the problem.

    The problem is that the approach taken, while it works for individual users, is very dangerous and inevitably leads to bugs. What is particularly bad is that those bugs don’t turn up in any code or program that Markus wrote – the bugs turn up in other programs and the developers of those programs get the blame. Markus is inserting his incorrect code into every binary on the system.

    I don’t know if you are a programmer, but if you are then perhaps this simple example will help you understand. If you use nm on libmediaclient.so then you’ll see that it intercepts dup() but not dup2(). That means there _must_ be a bug in the code.

    To understand this I’ll need to explain how preload based drivers work. A preload based driver catches open() and open64() on a specific filename in /dev, such as /dev/dsp. When it sees an open of that file it remembers the file descriptor number that the open gave, usually in a static local variable.

    Then when the program later does a fd based operation, such as read(), write(), mmap() etc, it checks the fd against the list of currently intercepted fds, and overrides the result, thus providing access to the virtual device.

    Given the above, now look at what dup2() does. dup2() is a system call that duplicates a fd onto another fd number, and programs commonly use it to move a fd from one number to another. So if a program first opens /dev/dsp and later uses dup2() to move it to another fd, then the Sundtek driver will not properly intercept later operations on that fd. Even worse, the next open that re-uses the original fd number may be intercepted by the Sundtek driver, and the application could well end up getting audio data in a document file.
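
    The bookkeeping described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not the Sundtek code or a real preload library: the class `PreloadModel`, the file names `fake-dsp` and `document.txt`, and the two scenarios are hypothetical, and the model tracks fds in open() and close() only, leaving dup2() unhooked.

```python
import os
import tempfile


class PreloadModel:
    """Toy model of the fd bookkeeping inside a preload "driver"."""

    def __init__(self, device_path):
        self.device_path = device_path
        self.tracked = set()  # fd numbers the shim believes are the device

    def open(self, path):
        fd = os.open(path, os.O_RDONLY)
        if path == self.device_path:
            self.tracked.add(fd)
        return fd

    def close(self, fd):
        self.tracked.discard(fd)
        os.close(fd)

    def is_intercepted(self, fd):
        return fd in self.tracked


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        device = os.path.join(tmp, "fake-dsp")        # stands in for /dev/dsp
        document = os.path.join(tmp, "document.txt")  # an unrelated file
        for path in (device, document):
            with open(path, "w") as f:
                f.write("x")

        shim = PreloadModel(device)

        # Case 1: the program moves the device fd with dup2(), which the
        # shim does not hook, then closes the original through the shim.
        dev_fd = shim.open(device)
        target = os.dup(dev_fd)  # borrow a free fd number...
        os.close(target)         # ...and release it again
        os.dup2(dev_fd, target)
        shim.close(dev_fd)
        is_device = os.path.samestat(os.fstat(target), os.stat(device))
        moved_fd_missed = is_device and not shim.is_intercepted(target)

        # Case 2: the program dup2()s a document onto a tracked fd number.
        # The number now refers to the document, but the shim still owns it.
        dev_fd = shim.open(device)
        doc_fd = shim.open(document)
        os.dup2(doc_fd, dev_fd)
        is_document = os.path.sameopenfile(dev_fd, doc_fd)
        stale_fd_hijacked = is_document and shim.is_intercepted(dev_fd)

        for fd in (target, dev_fd, doc_fd):
            os.close(fd)

        return {
            "moved_fd_missed": moved_fd_missed,
            "stale_fd_hijacked": stale_fd_hijacked,
        }


print(demo())
```

    In the first case the device fd escapes interception entirely; in the second the shim would apply device semantics to an ordinary file. Hooking dup2() in the model would fix both cases, but only until the program reaches for the next unhooked call.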

    Markus could “fix” this bug, but it is but one example of many possible bugs that Markus would need to catch. If he fixed the dup2() problem then he’d next be bitten by dup3(). Then he’d be bitten by the flags to clone() that control sharing of the fs and fd space. Then he’d be bitten by the subtleties of locking and mmap. After that he’ll be bitten by the subtleties of the changes in futex() semantics between glibc versions.

    After doing all that, you _still_ can’t get it completely right. The weak symbol aliasing in glibc prevents you from correctly intercepting some library calls that you should be intercepting for this type of “preload” driver. That was a deliberate change by the glibc maintainer (Ulrich Drepper) to make this type of “preload driver” impossible because he thought it is such a bad idea (I know, because I was the one that triggered that change, when I made the mistake of doing a SMB based preload driver for Samba).

    It is a never-ending task. You end up having to copy more and more of glibc into that “driver”, and then which version do you choose? Then if bugs are fixed in glibc, you end up with the bugs still present because of the Sundtek driver being loaded.

    It is just not possible to write a completely correct preload driver like this. Doing it just causes pain for other developers in very subtle ways.

  9. Markus Rechberger says:

    tridge, we tested the driver multiple times with many applications.
    Your considerations are all valid in theory, but not in practice.

    dup2 is not intercepted because we do not need it. The number of TV applications out there is limited to a few, definitely below 50.
    You can run some tests yourself: buy several different TV cards, and none of them will work with all available TV applications, due to different API implementations in the background. The clone operation is indeed very tricky to handle, but we took care of this and, guess what, it works. It was very tough to get those things to work.

    I saw your USB preloadable library, if this would be set globally it would break various applications, just because you intercepted a few syscalls the wrong way (it would deadlock some applications to be more specific). To me that shows that you know the concept of preloading but that you never went into it as deep as we are.

    It does not cause any pain to other developers. If you see that our application is running, ignore it and tell them to remove it and/or to ask us about support for it. Linux kernel developers can do that: they just set the tainted flag and tell the user to remove the module which causes it before helping them further.
    Additionally, we give away tuners to multimedia developers for testing and integrating better support for it, and the feedback we get is very good, from beginner users to experienced developers. If there are any problems, especially with external applications, we’ll check them (so far there are none).
    The last driver update was just a day ago to improve DVB-S/S2 support. Should we tell every customer to recompile the drivers now? That is very unrealistic; they now just pull the installer for their Ubuntu, Mandrake, Suse, Redhat, Gentoo, etc. system and they are done.

    This tuner works nearly everywhere, including on different architectures. You misinterpreted another fact, the fact of writing a driver for different kernel versions.
    I’m sure you and some other people are very well aware of how to compile kernel modules, but many users are not. So how do we distribute this?
    You might remember the eeePC or Acer Aspire One netbooks: they came without a multimedia stack and without kernel headers. Our driver adds support for TV within seconds without even having to take care of the platform. If you compile drivers the manual way you’ll end up with an ABI issue with the audio drivers (for the audio part of analog TV), since Asus never seemed to distribute or document which audio stack they used. We know all the kernelspace issues with regular drivers.
    And this particular driver is not only a driver, it’s an application which provides a streaming interface to the TV tuner; the driver interface is an addon.

    The discussion should go in the direction of how this can be advanced without the need for preloading. Obviously a driver installable within seconds on all systems with different kernel versions is something that increases the usability of Linux a lot.

    We are certainly on the better side when saying that we are only satisfied if everything works, if you checked the installer there’s even a flag to disable preloading – because it was discussed in our support forums, we don’t have to hide what we do.
    There’s nothing else to discuss rather than the last sentence here, I just found time to answer this because I’m updating my Mac :-)

  10. Benjamin Herrenschmidt says:

    You could just as well distribute a source code and a script that recompiles the driver for the various distributions you care about, it’s actually not very complicated.

    The -fact- is that your approach, from an engineering perspective, is the riskiest for overall system stability you could have chosen. Fact is, any piece of code -will- have bugs; imagine what happens if I now have a dozen random “drivers” starting to override various libc functions like you do. And you -do- have bugs, see tridge’s crash, even in totally unrelated applications.

    What about exploits ? What about security holes your library might introduce into every other program in the system ?

    You seem to be a Mac person. Well, I used to be as well. Remember how “stable” the system was when we had something like 2 rows of INITs patching every single A-Trap to oblivion, including Apple’s own stuff, as they couldn’t figure out themselves how to get their own stuff working without binary patching at runtime?

    Your approach is wrong and risk prone, and justifying it by “doing the right thing is too hard” isn’t going to fly.

  11. Well, I guess I’ll avoid hardware produced by a company supporting such practices of dumping the debugging/fixup effort onto others instead of implementing a proper driver and proposing it for mainline.

    But defending being *that* cheap is ugly.

  12. Can someone tell me what to look for in identifying a Sundtek device? I want to make absolutely sure that I, and the people around me, successfully avoid these horrors.

    I’m sorry, but calling a LD_PRELOAD hack a driver is false advertising. Blatantly overriding core system calls in every application, whether it needs to access this particular device or not, is just plain wrong.

    Do you replace core Microsoft Windows DLLs with “hacked” versions of your own? It’s not MS-DOS, it’s Linux. While hooking interrupts (effectively system calls) was the way to get drivers written in DOS, it is not the way it’s done in Linux, or in any other sane OS.
