User cancel of fsck gives: "fsck.ext4: Inode bitmap not loaded while setting block group checksum"

Bug #582035 reported by Martin Erik Werner
This bug affects 2 people
Affects             Status        Importance  Assigned to  Milestone
e2fsprogs           Fix Released  Undecided   Unassigned
e2fsprogs (Ubuntu)  Fix Released  Undecided   Unassigned
Lucid               Fix Released  Undecided   Unassigned

Bug Description

Binary package hint: e2fsprogs

"Parents": Bug #571707 and Bug #577331

PROBLEM

Whenever fsck is cancelled by user, this error message is given:
"fsck.ext4: Inode bitmap not loaded while setting block group checksum info"
And the exit status of fsck is 8

(This leads to plymouth/mountall reporting "Serious errors" on the filesystem in question when fsck is cancelled during boot, and acting accordingly.)
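
For reference, the fsck exit status is a bitmask, as documented in fsck(8): 8 means "operational error" and 32 means "checking canceled by user request". A minimal sketch for decoding a status follows; the helper name decode_fsck_status is only illustrative and is not part of any fsck tooling.

```python
from typing import List

# Exit-status bits as listed in the fsck(8) manual page.
# The dictionary and function names are illustrative, not part of fsck.
FSCK_STATUS_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status: int) -> List[str]:
    """Return the meaning of every bit set in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_STATUS_BITS.items() if status & bit]


if __name__ == "__main__":
    # 8 is what this bug produces; 32 is what a plain user cancel should give.
    for status in (8, 32):
        print(status, decode_fsck_status(status))
```

A cancelled check should therefore exit with 32 rather than 8, which is why the caller treats the current behaviour as a serious error.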

TEST CASE
1. Force a fsck at boot time
  $ sudo touch /forcefsck
2. Reboot your system
3. When the 'Checking filesystem' prompt is displayed press 'C' to cancel
4. Finish the boot sequence
5. Check the content of /var/log/boot.log

VERIFICATION-DONE
The boot.log contains
-----
fsck from util-linux-ng 2.17.2
User cancelled filesystem checks
/dev/sda1: e2fsck canceled.
mountall: fsck / [235] terminated with status 32
-----

VERIFICATION-FAILED
The boot.log contains
-----
fsck from util-linux-ng 2.17.2
User cancelled filesystem checks
/dev/sda1: e2fsck canceled.
fsck.ext4: Inode bitmap not loaded while setting block group checksum info
mountall: fsck / [234] terminated with status 8
mountall: Unrecoverable fsck error: /
-----
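
The check in step 5 can be scripted. The sketch below assumes the boot.log line formats shown in the two excerpts above; the function name check_boot_log and the sample text are illustrative only.

```python
import re

# Matches lines such as "mountall: fsck / [235] terminated with status 32".
# The pattern is derived only from the log excerpts in this report.
STATUS_LINE = re.compile(
    r"^mountall: fsck (?P<mount>\S+) \[(?P<pid>\d+)\] "
    r"terminated with status (?P<status>\d+)$"
)
ERROR_TEXT = "Inode bitmap not loaded while setting block group checksum info"


def check_boot_log(text):
    """Return (statuses, bug_seen) for the given boot.log contents.

    statuses maps each mount point to the fsck exit status mountall logged;
    bug_seen is True if the error message from this report appears.
    """
    statuses = {}
    bug_seen = False
    for line in text.splitlines():
        line = line.strip()
        match = STATUS_LINE.match(line)
        if match:
            statuses[match.group("mount")] = int(match.group("status"))
        if ERROR_TEXT in line:
            bug_seen = True
    return statuses, bug_seen


if __name__ == "__main__":
    # Sample copied from the VERIFICATION-FAILED excerpt.
    sample = (
        "/dev/sda1: e2fsck canceled.\n"
        "fsck.ext4: Inode bitmap not loaded while setting block group checksum info\n"
        "mountall: fsck / [234] terminated with status 8\n"
    )
    print(check_boot_log(sample))
```

To check a real system, pass the contents of /var/log/boot.log to check_boot_log instead of the sample string.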

OBSERVATIONS

Seen on both lucid and maverick in virtualbox, and confirmed by Benjamin Kay on lucid non-virtual ( https://bugs.edge.launchpad.net/ubuntu/lucid/+source/mountall/+bug/577331/comments/21 )

Same error is seen when running from maintenance shell on boot, or liveCD.

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: e2fsprogs 1.41.11-1ubuntu2
ProcVersionSignature: Ubuntu 2.6.34-2.9-generic 2.6.34-rc7
Uname: Linux 2.6.34-2-generic i686
Architecture: i386
Date: Tue May 18 02:24:53 2010
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release i386 (20100429)
ProcEnviron:
 LANG=en_GB.utf8
 SHELL=/bin/bash
SourcePackage: e2fsprogs

Revision history for this message
Martin Erik Werner (arand) wrote :
Changed in e2fsprogs (Ubuntu):
status: New → Confirmed
description: updated
Revision history for this message
Martin Erik Werner (arand) wrote :

I talked with psusi (Phillip Susi) in #ubuntu-devel and he seemed to have some interesting ideas. Unfortunately, I had absolutely no idea what he was talking about (me not knowing the code, or any theory of filesystems at all), so I thought I would post it here for reference by others.

Main points (edited):

Sounds like an assertion failure triggered by the sigint... search the code for that string.
Sounds like the canceling tries to do some cleanup in the SIGINT handler, and part of that cleanup is trying to write out the inode allocation bitmap, which an assertion check makes sure is actually loaded first and it isn't.
It may not be loaded either because it has not been loaded yet at the time of the SIGINT, or because that block group's inode allocation bitmap is uninitialized.

First thing I'd do is search the source code for the string "Inode bitmap not loaded while"
My guess is you will find an assert() with that string in it and that assertion is failing because of a race condition in which the inode allocation bitmap is not loaded yet, but should be eventually... and that could be a little tricky to fix.

But I have a feeling that probably it is not loaded because that block group's inode table is uninitialized in the first place.. meaning there are no inodes used in that block group so the allocation table is not actually valid and a flag says it is unused, assume it's all empty.

Fixing that should be as simple as putting in a check for the uninitialized flag in the code trying to flush the allocation bitmap and bailing out if it is set.
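
To make the idea above concrete, here is a toy model of the proposed guard. It is not taken from the e2fsprogs source and makes no claim about how the bug was actually fixed. Only the flag name and value (EXT2_BG_INODE_UNINIT, 0x0001) correspond to the definition in ext2_fs.h; the BlockGroup class and flush_inode_bitmap function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Flag name and value as defined in e2fsprogs' ext2_fs.h.
# Everything else in this block is a hypothetical model for illustration.
EXT2_BG_INODE_UNINIT = 0x0001


@dataclass
class BlockGroup:
    flags: int
    inode_bitmap: Optional[bytes] = None  # None models "bitmap not loaded"


def flush_inode_bitmap(group):
    """Hypothetical cleanup step run on cancel; returns what it did."""
    if group.flags & EXT2_BG_INODE_UNINIT:
        # The group is flagged as having no initialized inodes, so there is
        # no meaningful bitmap to write: bail out instead of failing.
        return "skipped"
    if group.inode_bitmap is None:
        # Without the guard above, an uninitialized group would end up here.
        raise RuntimeError("Inode bitmap not loaded")
    return "written"


if __name__ == "__main__":
    print(flush_inode_bitmap(BlockGroup(flags=EXT2_BG_INODE_UNINIT)))
    print(flush_inode_bitmap(BlockGroup(flags=0, inode_bitmap=b"\x0f")))
```

The other case mentioned above (bitmap simply not loaded yet at the time of the SIGINT) is the RuntimeError branch, which this guard does not address.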

/lastlog attached for completeness

tags: added: lucid
Revision history for this message
Theodore Ts'o (tytso) wrote :

Thanks for the very clear bug report. I wish all Ubuntu bug reports were this clear! :-)

The fix for this bug has been committed into the 'maint' branch of e2fsprogs, and is commit 2e6436d. This fix is in the (just-released) e2fsprogs 1.41.12.

Revision history for this message
Martin Erik Werner (arand) wrote :

And thanks for the fast solution!

I can confirm that the new version of e2fsprogs (1.41.12) solves this bug on my system.
___

As far as packaging goes, it seems that if one only applies the old Ubuntu .diff.gz and builds straight off, the resulting e2fsprogs will depend on the old version of e2fslibs (shlibdeps hiccups in some way? Force-installing worked just fine as far as testing goes).
Just thought I'd put that down here for reference just in case, although since it's already in Debian, if we pull from there it might be a non-issue.

Changed in e2fsprogs:
status: New → Fix Released
Revision history for this message
Martin Erik Werner (arand) wrote :

Since this is a fairly common, and pretty ugly problem in lucid, I'll try getting an SRU in order.

Revision history for this message
Theodore Ts'o (tytso) wrote : Re: [Bug 582035] Re: User cancel of fsck gives: "fsck.ext4: Inode bitmap not loaded while setting block group checksum"

On Wed, May 26, 2010 at 02:53:09AM -0000, arand wrote:
> Since this is a fairly common, and pretty ugly problem in lucid, I'll
> try getting an SRU in order.

Are you planning on taking all of 1.41.12, or just the relevant patch
from the e2fsprogs git tree?

      - Ted

Revision history for this message
Martin Erik Werner (arand) wrote :

For the released 10.04 I was planning on just grabbing the commit in question, since I'm not sure if a new version with more changes would go well with the updating policy.

For in-devel 10.10 I hope we'll simply merge with the new version from Debian (I was thinking I might try my luck at that, but I'll have to check if the maintainer proper has plans for it).

Revision history for this message
Theodore Ts'o (tytso) wrote :

On Wed, May 26, 2010 at 02:03:55PM -0000, arand wrote:
> For the released 10.04 I was planning on just grabbing the commit in
> question , since I'm not sure if a new version with more changes would
> go well with the updating policy.

OK, great. 1.41.12 does fix a number of high priority bugs beyond the
commit which you've grabbed, but it does introduce one, which will be
fixed in 1.41.13 (to be released in a few days).

The unfixed bugs in Ubuntu's e2fsprogs are probably only going to
matter for server workloads, and that's OK, because the 2.6.32
kernel's ext4 is missing so many bug fixes it's really not going to be
ready for server-class workloads anyway. (Hint: take a look at all of
the ext4 patches which Red Hat rawhide has taken for 2.6.32.)

                                    - Ted

Revision history for this message
Martin Erik Werner (arand) wrote :

Hmm, it could be a good idea, however the server side of things is something I know nothing about, so I reckon it's completely someone else's decision to take whether the other bug-fixes are 10.04-updates material, or if indeed pulling the whole new version of e2fsprogs in 10.04 is the best way to go.

In the meantime, here is a debdiff for this particular issue, at least.

Revision history for this message
Martin Erik Werner (arand) wrote :

Hmm, there really should be a LP reference in there, new debdiff, removing the other.

As far as the Maverick version goes, Scott James Remnant indicated that the Ubuntu package is in fact kept in git and will be simply pulled from your (Ted's) tree, so it's not even going through Debian merging as I previously thought (provided I've understood things correctly this time around...).

Revision history for this message
Martin Erik Werner (arand) wrote :

Hmm, that is quite obviously the wrong debdiff, remove and re-attach...

tags: added: patch
Changed in e2fsprogs (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Martin Erik Werner (arand) wrote :

Okay, with the new version this change is in the e2fsprogs in Maverick.

I would argue that this is something that'd be worth an SRU, hence subscribing sponsors and sru teams.

Also fixing debdiff to use lucid-proposed instead.

Revision history for this message
Benjamin Drung (bdrung) wrote :

uploaded to lucid-proposed

Changed in e2fsprogs (Ubuntu Lucid):
status: New → Fix Committed
Revision history for this message
Jonathan Riddell (jr) wrote :

waiting for ubuntu-sru approval

Revision history for this message
John Dong (jdong) wrote :

ACK from SRU team.

Revision history for this message
Martin Pitt (pitti) wrote : Please test proposed package

Accepted e2fsprogs into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

SRU verification for Lucid:
I have reproduced the problem with e2fsprogs 1.41.11-1ubuntu2 in lucid and have verified that the version of e2fsprogs 1.41.11-1ubuntu2.1 in -proposed fixes the issue.

Marking as verification-done

description: updated
tags: added: verification-done
removed: verification-needed
Revision history for this message
Martin Pitt (pitti) wrote :

Copied to lucid-updates. Closing manually due to typo in the changelog.

Changed in e2fsprogs (Ubuntu Lucid):
status: Fix Committed → Fix Released
tags: added: testcase