lvm2 hangs when creating snapshot, requires reboot to resolve

Bug #605551 reported by Chris Irwin
This bug affects 13 people
Affects: linux (Ubuntu) | Status: New | Importance: Undecided | Assigned to: Unassigned

Bug Description

Binary package hint: lvm2

Unable to create snapshots. Whichever filesystem the snapshot is attempted on hangs afterwards, and a reboot is required to recover.

$ sudo lvcreate -s -L 500M -n lv_root_lucid_snap /dev/vg_thinkpad/lv_root_lucid

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: lvm2 2.02.54-1ubuntu4
ProcVersionSignature: Ubuntu 2.6.32-24.38-generic 2.6.32.15+drm33.5
Uname: Linux 2.6.32-24-generic x86_64
NonfreeKernelModules: nvidia
Architecture: amd64
Date: Wed Jul 14 14:10:32 2010
EcryptfsInUse: Yes
ProcEnviron:
 PATH=(custom, user)
 LANG=en_CA.utf8
 SHELL=/bin/bash
SourcePackage: lvm2

Chris Irwin (chrisirwin) wrote :
Chris Irwin (chrisirwin) wrote :
Jka (jka-pub) wrote :

After rebooting from the hang, the snapshot is still there, so use the following command to remove it:

$ sudo lvremove -f /dev/vghi/snap_root
  Logical volume "snap_root" successfully removed
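If several leftover snapshots need cleaning up after such a reboot, a small POSIX sh filter can pick them out of `lvs` output before anything is removed. This is a sketch: `list_snapshots` is a hypothetical helper, and it assumes the `lvs` field list shown in the comment plus the LVM convention that the first `lv_attr` character is `s` (or `S`) for snapshot volumes.

```sh
#!/bin/sh
# list_snapshots: read "vg lv attr" lines, as printed by
#   lvs --noheadings -o vg_name,lv_name,lv_attr
# on stdin and print /dev/<vg>/<lv> for every snapshot volume.
list_snapshots() {
    # read with the default IFS also strips the leading spaces lvs prints
    while read -r vg lv attr; do
        case "$attr" in
            s*|S*) printf '/dev/%s/%s\n' "$vg" "$lv" ;;  # snapshot volumes only
        esac
    done
}
```

Usage: `sudo lvs --noheadings -o vg_name,lv_name,lv_attr | list_snapshots`, then check the list by hand before running `sudo lvremove -f` on each path.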

grouch (grouch) wrote :

I had the exact same problem. I managed to get around it though. Here's what happened on my system.

I created a snapshot of a logical volume containing my root partition (ext4). The lvcreate command hung (unkillable), my system locked up and needed a hardware reset to recover.

I reinstalled the lvm2, udev and device-mapper-related packages to no avail: the problem remained. I did find, though, that the problem does not arise when creating snapshots of unmounted logical volumes. When creating a snapshot of a mounted logical volume in another volume group, i.e. one without system-critical partitions, the lvcreate command also hangs (unkillable), but at least the system remains operational. It turns out that the problem only affects the volume group in which the snapshot is being created. Still, the only way to shut down was with a hardware reset.
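Since snapshots of unmounted logical volumes reportedly did not hang, a check before running lvcreate on an affected kernel could look like the minimal POSIX sh sketch below. `dm_name` and `lv_is_mounted` are hypothetical helpers. The sketch assumes the filesystem was mounted through `/dev/mapper/<vg>-<lv>` or `/dev/<vg>/<lv>`; a mount made through a by-uuid path or `/dev/root` would be missed. It only avoids the trigger reported here and does not fix anything.

```sh
#!/bin/sh
# dm_name VG LV: the device-mapper name LVM gives a logical volume.
# LVM doubles every '-' inside the VG and LV names, then joins them with '-'.
dm_name() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '%s-%s\n' "$vg" "$lv"
}

# lv_is_mounted VG LV [MOUNTS_FILE]: succeed if the LV appears as a mount
# source in MOUNTS_FILE (default /proc/mounts), under either of the two
# device paths assumed above.
lv_is_mounted() {
    mapper="/dev/mapper/$(dm_name "$1" "$2")"
    direct="/dev/$1/$2"
    awk -v a="$mapper" -v b="$direct" \
        '$1 == a || $1 == b { found = 1 } END { exit !found }' \
        "${3:-/proc/mounts}"
}
```

For example, `lv_is_mounted vg_thinkpad lv_root_lucid && echo "mounted, skipping snapshot on this kernel"`.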

When I boot into the 2.6.32-23 kernel instead of the 2.6.32-24 kernel, the problem is gone and everything works fine. So this may be a kernel bug rather than an lvm2 bug. Dunno. I'm just happy that I found this workaround, albeit after many frustrating hours :-). Hopefully this message will save someone else that time or help someone fix the bug. Keep up the good work, folks.

Cheers,
Oscar

PS Bugs 604807 and 595489 look like duplicates of this one.

David Hoffman (hoffman)
tags: added: regression
tags: added: regression-update
removed: regression
tags: removed: regression-update
tags: added: regression-update
affects: lvm2 (Ubuntu) → linux (Ubuntu)
Stefan Bader (smb) wrote :

Somehow it seems that one of the ext4 changes brought back from upstream (which will also be part of the upstream stable release 2.6.32.17) triggers some of the known regressions in the writeback code. Bug 585092 has a reference to some test kernels (I recently added a newer version, so only the -24.39+ package is required). If possible, I would like feedback on these test kernels and on whether they solve this problem, as I would like to see the backported patch set included in them proposed for upstream stable as well.

Patrick Pfeifer (patrick2000) wrote :

@stefan

I have tested your -24.39+lp585092v4 kernel from bug 585092, i.e. http://people.canonical.com/~smb/lp585092/linux-image-2.6.32-24-generic_2.6.32-24.39+lp585092v4_amd64.deb, and it does NOT fix this issue. dmesg output included.

Stefan Bader (smb) wrote :

Thanks a lot for testing, Patrick. It would be awesome if you could help out with some more tests.

http://kernel.ubuntu.com/~kernel-ppa/mainline/

The link above contains pre-compiled kernel packages for upstream kernel versions. It would be quite valuable to see whether 2.6.35-rc6 has the same problem. If that also shows the problem, then it needs to be worked on upstream. Otherwise maybe have a look at 2.6.33 and 2.6.34 to narrow things down. The ext4 changes that were backported went into upstream during that timeframe but I don't know exactly right now which portions went in at which time. Having a few more pointers would help to narrow the search for other changes.

Patrick Pfeifer (patrick2000) wrote :

Let's see... at the moment I'm running a working 2.6.32-34-generic kernel, so I will probably be able to do some further testing with the help of KVM guests running with writable LVM snapshots as their root fs (for the modules, ...).

Anyway, I built this currently running 2.6.32-34-generic by recompiling it from source with the fix-ext4.patch from bug #595489 (http://launchpadlibrarian.net/50912732/fix-ext4.patch) included. Is there a problem with that patch, and if so, what is it?

Stefan Bader (smb) wrote :

OK, reading through the patch and the comments in the other bug as well, I suspect that upstream is still affected, too. So .35-rc6 would still be affected, while .33 and .34 would be fine, as the patch identified as breaking things was part of 2.6.35-rc1. The patch itself looks reasonable. If you can also verify that it resolves the snapshot problem, we should try to get that fix out as soon as possible. At the same time, we should check why it is not upstream yet.

Stefan Bader (smb) wrote :

OK, I sent a query about this to Eric Sandeen and the upstream stable review list. 2.6.32.17 includes the same patches that we got in our 2.6.32-24.37.
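For a quick triage of a running Lucid kernel against what has been reported in this thread so far, a POSIX sh sketch can read the version field of `/proc/version_signature` (e.g. `2.6.32-24.38-generic` from the report above). `classify_kernel` is a hypothetical helper, and its ranges encode only the reports here: -23 worked, -24.37 received the ext4 backport, and -24.38 plus the -24.39 test kernel still hung. It does not assume any later upload is fixed; everything else is reported as `unknown`.

```sh
#!/bin/sh
# classify_kernel VERSION: VERSION is the second field of
# /proc/version_signature, e.g. "2.6.32-24.38-generic".
# Prints reported-ok, affected or unknown, based only on this bug's reports.
classify_kernel() {
    v=${1%%-[a-z]*}              # drop the flavour: 2.6.32-24.38
    case $v in
        *-*.*) ;;                # need both an ABI and an upload number
        *) echo unknown; return ;;
    esac
    abi=${v%.*}                  # 2.6.32-24
    upload=${v##*.}              # 38
    case $upload in
        ''|*[!0-9]*) echo unknown; return ;;
    esac
    if [ "$abi" = "2.6.32-23" ]; then
        echo reported-ok
    elif [ "$abi" = "2.6.32-24" ] && [ "$upload" -ge 37 ] && [ "$upload" -le 39 ]; then
        echo affected
    else
        echo unknown
    fi
}
```

Usage: `classify_kernel "$(awk '{print $2}' /proc/version_signature)"`.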

Patrick Pfeifer (patrick2000) wrote :

Well, I would really have liked to test some more kernels, but I'm hopelessly distracted by learning make-kpkg and git, fiddling with make-kpkg --append-to-version, and wondering why a "+" still gets appended when I compile a patched vanilla 2.6.35, which causes make-kpkg to fail.

(It's because the tree is not in a clean state: there is no annotated (note: annotated, as in "git tag -a") git tag for the non-existent commit, and therefore scripts/setlocalversion spits out the "+".)

Anyway... by now I have figured out (1% of) how to use make-kpkg as well as git, and managed to run "git clone --reference linus git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".

$ cd linux-next
$ git describe
next-20100803
$ git diff v2.6.35 -- fs/ext4/super.c | grep FREEZE
- vfs_check_frozen(sb, SB_FREEZE_WRITE);
+ vfs_check_frozen(sb, SB_FREEZE_TRANS);
- vfs_check_frozen(sb, SB_FREEZE_WRITE);
+ vfs_check_frozen(sb, SB_FREEZE_TRANS);

At the end of the day, this (IMHO) means that 2.6.35 as well as 2.6.32.17 are non-functional with respect to live ext4 LVM snapshots, but there is a good chance that the future 2.6.36 and 2.6.32.18 will work again.
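The `git diff | grep FREEZE` comparisons above can also be run against a single checked-out `fs/ext4/super.c` to see which form a given tree carries. The sketch below is only a plain text match, not a code analysis. `freeze_check` is a hypothetical helper, and the labels follow the diffs in this comment: `SB_FREEZE_WRITE` is the form added in 2.6.35-rc1 and 2.6.32.17, and `SB_FREEZE_TRANS` is the form in linux-next (next-20100803).

```sh
#!/bin/sh
# freeze_check FILE: report which freeze level FILE (an ext4 super.c)
# passes to vfs_check_frozen(). Prints write, trans or none.
freeze_check() {
    if grep -qF 'vfs_check_frozen(sb, SB_FREEZE_WRITE)' "$1"; then
        echo write    # form seen in 2.6.35 and 2.6.32.17 (regressed)
    elif grep -qF 'vfs_check_frozen(sb, SB_FREEZE_TRANS)' "$1"; then
        echo trans    # form seen in linux-next
    else
        echo none     # neither call found in this file
    fi
}
```

For example, `freeze_check linux-next/fs/ext4/super.c` in the clones described above.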

cheers

PS:
For reference, I also did "git clone --reference linus git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.32.y.git" and
$ cd linux-2.6.32.y
$ git describe
v2.6.32.17
$ git diff v2.6.32.16 -- fs/ext4/super.c | grep FREEZE
+ vfs_check_frozen(sb, SB_FREEZE_WRITE);
+ vfs_check_frozen(sb, SB_FREEZE_WRITE);
... this means v2.6.32.16 should be ok as well

PPS:
$ cd linus (git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git)
$ git diff v2.6.34 v2.6.35-rc1 -- fs/ext4/super.c | grep FREEZE
+ vfs_check_frozen(sb, SB_FREEZE_WRITE);
+ vfs_check_frozen(sb, SB_FREEZE_WRITE);
It looks like the change breaking the current Ubuntu 2.6.32-24-generic kernel came from a change Linus published only in an -rc tree (although it has now, sadly, spread to the released 2.6.35 and 2.6.32.17 as well)... why was that backported to the Lucid kernel so early?

Stefan Bader (smb) wrote :

Actually, the testing results were predictable, and I should have pointed
this out more clearly. As the patch mentioned in the fix only went into rc1
of 2.6.35, the previous kernels were not affected. So it is pretty clear that all
of 2.6.35 and 2.6.32.17 are affected and need the mentioned patch soon.

This has gone into current upstream releases and now also into the stable
update 2.6.32.17, as Greg had done the release before looking over the replies.
We did that backporting early because Ted Tso was pushing for them and
filesystem tests were showing some real and also severe problems with the
current ext4. So the patches came from a branch of Ted's which he keeps
specifically for things that should get back to 2.6.32.
According to Eric Sandeen, the patch has recently been acked by Ted and should
show up upstream soon. As you saw, it already was in linux-next, and I hope
Linus will pick it up soon.
Depending on the timing, upstream stable, too, needs to either revert the
offending patch and re-release, or add the fix as soon as it is officially out.

I need to wait for the release of a security kernel, but then we will try to get
fixed kernels uploaded as soon as possible. Other process issues will cause
delays, but for affected parties the pre-proposed PPA will contain
the fixes first.

https://launchpad.net/~kernel-ppa/+archive/pre-proposed

Patrick Pfeifer (patrick2000) wrote :

Oo-h Ok. Thanks for the update! :-)

I guess I will install 2.6.32-23-generic and just wait for 2.6.32-25-generic then.

Right now I just wonder whether Greg is about to do "2. The insane thing" with v2.6.32.17, then? :-)

http://www.kernel.org/pub/software/scm/git/docs/git-tag.html#_on_re_tagging

Stefan Bader (smb) wrote :

> I guess I will install 2.6.32-23-generic and just wait for
> 2.6.32-25-generic then.

Given that the changes are small, there won't be a -25, but rather a 2.6.32-24.40.

> Right now I just wonder if Greg is about to do "2. The insane thing"
> with v2.6.32.17 then ? :-)

Definitely no retagging. Either 2.6.32.18 consists of just a revert or the
additional fix and nothing else if it is deemed important enough.

Peter Passchier (peter-passchier) wrote :

It's not strictly a duplicate, because it pertains to different versions of the kernel.

BTW, I have this on Lucid 10.04.4, fully updated as of 2013-01-01, with the kernels:
2.6.32-45-pae-latest
2.6.35-32.28-pae

Which kernel should I use if I want to make an lvm2 snapshot of an ext4 root? Would I avoid this bug if I mounted the root filesystem as ext3 instead?
