Comment 69 for bug 428435

Julien Plissonneau Duquene (julien-plissonneau-duquene) wrote:

Nice to see that some sanity finally got in. I am a bit late here, but I have another suggestion; skip to the end to avoid the rant.

Context: FYI, yesterday I had to patch (live) the boot block of a Debian "squeeze" system that stopped booting a few updates ago. Diagnosing that one was a pain, because it had been installed "cleanly" from scratch (or so I thought), and blkid on the live system reported both partitions: /dev/hda1 as ext3 root and /dev/hda5 as swap. In the initramfs, only the swap partition was detected.

Side note: on a PC I usually like my MBR to be a regular PC MBR (or Debian's enhanced MBR, which lets you choose the partition to boot), and GRUB to live in the boot block of the "boot" partition (if any) or of the "root" partition. Telling users that GRUB should go in the MBR is, IMHO, asking for trouble. But for the system above I decided to go by the book and just follow what the install CD told me to do: next, next, next.

Context again: finding what was wrong was painful. First, I had to figure out that blkid uses a cache, which explained why the root partition showed up on the live system but not at boot time. With "-c /dev/null" the same problem was visible on the live system. I deleted the cache, hacked grub.conf, and lazily waited a few weeks for the fix to come (or not), because my last experience of sending blkid bug reports left me with the impression that the only sane thing to do would be to fork the package and take it out of the hands of its current maintainers.

Context continued: somehow the ext3 partition managed to get back into blkid's cache over these weeks. I thought it was solved, tried "-c /dev/null": nope; deleted the cache: nope. Then I finally decided to spend a few hours solving the "perfectly-legit-installed-by-the-book-no-longer-detected-ext3-partition" mystery. I played around with blkid options but could not get anything useful out of it, except for the "probably more filesystems" message (option -p), which did not say which other signatures it had found. I downloaded the source, activated all debug flags in libblkid and tried again. I could finally see that libblkid detected both a vfat and an ext3 signature. vfat?!? Then FINALLY I checked hda1's boot block, and found out that nothing had erased the old vfat boot block when I installed the Debian system on it following the regular procedure. That old box used to have a W2K system on it... Of course, there are failures on multiple levels here, just like in this bug report.

Now the suggestion for blkid:

Implement an option (e.g. -a for "all") that calls blkid_do_probe in a loop instead of calling blkid_do_safeprobe once, and reports every detected signature.

I guess that should make everybody happy, because scripts could then try the "safe" version first (aka "your system is so safe it won't boot anymore"), then -a, and then handle special cases at script level or issue warnings or error messages as needed. But the system will boot.

Alternative: do a second pass with blkid_do_probe when lowprobe_device reports its error, so the poor user can get an idea of which signatures are confusing blkid.