BUG: Dentry ffff81003ac17410{i=161b,n=cow} still in use (1) [unmount of rootfs rootfs]

Bug #251223 reported by Matt Zimmerman
Affects         Status        Importance  Assigned to  Milestone
linux (Ubuntu)  Fix Released  High        Ben Collins
Intrepid        Fix Released  High        Ben Collins

Bug Description

This bug reproducibly hangs the system toward the end of an installation of the desktop CD (I tested with 20080723 Ubuntu amd64 desktop). A trace can be found below. After a few seconds, some further traces appear on the console which are presumably fallout. They were not logged to kmsg, and thus I had to take photos of them. I can provide these on request.

<3>[ 2888.979902] BUG: Dentry ffff81003ac17410{i=161b,n=cow} still in use (1) [unmount of rootfs rootfs]
<0>[ 2888.979902] ------------[ cut here ]------------
<2>[ 2888.979902] kernel BUG at /build/buildd/linux-2.6.26/fs/dcache.c:640!
<0>[ 2888.979902] invalid opcode: 0000 [1] SMP
<4>[ 2888.979902] CPU 1
<4>[ 2888.979902] Modules linked in: sbp2 usb_storage libusual mtd zlib_deflate lzo_decompress lzo_compress nls_utf8 ufs qnx4 hfsplus hfs minix ntfs msdos xfs reiserfs jfs vfat fat ext2 ipv6 af_packet i915 drm rfcomm l2cap bluetooth uinput ppdev parport_pc lp parport acpi_cpufreq cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand freq_table cpufreq_conservative sbs sbshc container iptable_filter ip_tables x_tables arc4 ecb crypto_blkcipher sierra video joydev usbserial battery iwlcore ac pcmcia rfkill output mac80211 yenta_socket psmouse sdhci rsrc_nonstatic serio_raw cfg80211 ricoh_mmc pcmcia_core mmc_core button wmi_acer snd_hda_intel snd_seq_dummy snd_seq_oss snd_pcsp snd_seq_midi snd_pcm_oss snd_rawmidi iTCO_wdt snd_mixer_oss iTCO_vendor_support snd_seq_midi_event snd_seq snd_seq_device snd_pcm snd_timer snd soundcore shpchp snd_page_alloc pci_hotplug intel_agp thinkpad_acpi evdev led_class nvram squashfs loop nls_cp437 isofs ext3 jbd mbcache sg sr_mod cdrom sd_mod
 ata_piix pata_acpi ohci1394 ieee1394 ahci ata_generic libata scsi_mod uhci_hcd ehci_hcd usbcore e1000e dock thermal processor fan fbcon tileblit font bitblit softcursor uvesafb cn fuse [last unloaded: jffs2]
<4>[ 2888.979902] Pid: 14276, comm: gparted.postrm Not tainted 2.6.26-4-generic #1
<4>[ 2888.979902] RIP: 0010:[shrink_dcache_for_umount_subtree+0x25c/0x270] [shrink_dcache_for_umount_subtree+0x25c/0x270] shrink_dcache_for_umount_subtree+0x25c/0x270
<4>[ 2888.979902] RSP: 0018:ffff81001e95de88 EFLAGS: 00010292
<4>[ 2888.979902] RAX: 0000000000000069 RBX: ffff81003d4020e0 RCX: ffff81000392fe60
<4>[ 2888.979902] RDX: ffffffff805fd348 RSI: 0000000000000086 RDI: 0000000000000296
<4>[ 2888.979902] RBP: ffff81003ac17410 R08: fffffffffd9da600 R09: 000000c1d7d6b5de
<4>[ 2888.979902] R10: 0000000000000000 R11: ffffffff802243d0 R12: ffff81003ac17470
<4>[ 2888.979902] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000001000
<4>[ 2888.979902] FS: 00007f28490186e0(0000) GS:ffff81003d802780(0000) knlGS:0000000000000000
<4>[ 2888.979902] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2888.979902] CR2: 0000000001dcd438 CR3: 000000001c5fb000 CR4: 00000000000006e0
<4>[ 2888.979902] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 2888.979902] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[ 2888.979902] Process gparted.postrm (pid: 14276, threadinfo ffff81001e95c000, task ffff8100025d0000)
<4>[ 2888.979902] Stack: ffff81003d9ac668 ffff81003d9ac400 ffffffff804f4fc0 ffff81001e95df28
<4>[ 2888.979902] ffff81001e95df38 ffffffff802da519 ffff81003d9ac400 ffffffff802c7509
<4>[ 2888.979902] ffffffff8060ba20 0000000000000001 ffffffff8060ba20 ffffffff802c7639
<4>[ 2888.979902] Call Trace:
<4>[ 2888.979902] [shrink_dcache_for_umount+0x29/0x50] ? shrink_dcache_for_umount+0x29/0x50
<4>[ 2888.979902] [mtd:generic_shutdown_super+0x19/0x1a0] ? generic_shutdown_super+0x19/0x120
<4>[ 2888.979902] [fuse:kill_anon_super+0x9/0x40] ? kill_anon_super+0x9/0x40
<4>[ 2888.979902] [mtd:deactivate_super+0x69/0x6170] ? deactivate_super+0x69/0xa0
<4>[ 2888.979902] [sys_getcwd+0x108/0x180] ? sys_getcwd+0x108/0x180
<4>[ 2888.979902] [system_call_after_swapgs+0x8a/0x8f] ? system_call_after_swapgs+0x8a/0x8f
<4>[ 2888.979902]
<4>[ 2888.979902]
<0>[ 2888.979902] Code: 8b 08 48 8b 45 10 48 85 c0 74 04 48 8b 50 40 48 8d 86 68 02 00 00 48 c7 c7 08 0a 5b 80 48 89 ee 48 89 04 24 31 c0 e8 26 53 1f 00 <0f> 0b eb fe 0f 0b eb fe 66 66 66 2e 0f 1f 84 00 00 00 00 00 53
<1>[ 2888.979902] RIP [shrink_dcache_for_umount_subtree+0x25c/0x270] shrink_dcache_for_umount_subtree+0x25c/0x270
<4>[ 2888.979902] RSP <ffff81001e95de88>
<4>[ 2888.998399] ---[ end trace c1b471ca0e041271 ]---

Tags: iso-testing
Matt Zimmerman (mdz) wrote :

Attaching a complete dmesg for the system where this occurs

Changed in linux:
importance: Undecided → High
Leann Ogasawara (leannogasawara) wrote :

Thanks Matt. Marking this as Triaged and will bring to the attention of the kernel team.

Changed in linux:
assignee: nobody → ubuntu-kernel-team
status: New → Triaged
Colin Watson (cjwatson) wrote :

Does union=unionfs on the kernel command line still work? If so, that would be a viable fallback for Alpha 3.

Colin Watson (cjwatson) wrote :

Sigh, unionfs seems to have been deleted, thus no possibility of fallback. (How about not deleting old modules until their replacements have been proven to work?)

Matt Zimmerman (mdz) wrote :

I'm uploading more traces (photos) to http://people.ubuntu.com/~mdz/251223/ in case they're useful. All of them were captured after the initial BUG on one occasion or another, not all from the same boot.

Matt Zimmerman (mdz) wrote :

I'm attaching an strace -f of the dpkg process showing the events that lead up to the BUG().

Ignore the long sleep() and the SIGSTOP/SIGCONTs; that's just me arranging to attach to it at the appropriate time.

Matt Zimmerman (mdz) wrote :

I've reduced my test case down to this:

1. kvm -m 256 -hda hda.img -cdrom intrepid-desktop-amd64.iso -boot d
2. boot with 'single' and select root shell
3. mkdir /target
4. mount /dev/sda1 /target # ext3 filesystem containing a copy of the contents of /rofs
5. chroot /target /bin/true

This simple case crashes, but with a different trace:

[ 110.376037] double fault: 0000 [1] SMP
[ 110.376037] CPU 0
[ 110.376037] Modules linked in: iptable_filter ip_tables x_tables snd_pcsp snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi psmouse snd_seq_midi_event serio_raw snd_seq snd_timer snd_seq_device snd soundcore snd_page_alloc i2c_piix4 i2c_core button evdev battery squashfs loop nls_cp437 isofs ext3 jbd mbcache sg sr_mod cdrom sd_mod 8139too ata_piix pata_acpi ata_generic 8139cp mii libata scsi_mod dock thermal processor fan fbcon tileblit font bitblit softcursor uvesafb cn fuse
[ 110.376037] Pid: 5639, comm: true Not tainted 2.6.26-4-generic #1
[ 110.376037] RIP: 0010:[<000000008022a2c0>] [<000000008022a2c0>]
[ 110.376037] RSP: 0018:0000000000000000 EFLAGS: 00010092
[ 110.376037] RAX: 000000000000002d RBX: 0000000000000000 RCX: 00000000f7fceff4
[ 110.376037] RDX: 0000000000000000 RSI: 00000000f7fd0b10 RDI: 0000000000021000
[ 110.376037] RBP: 00000000ffafc898 R08: 0000000000000000 R09: 0000000000000000
[ 110.376037] R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
[ 110.376037] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 110.376037] FS: 0000000000000000(0000) GS:ffffffff8063d000(0063) knlGS:00000000f7e748c0
[ 110.376037] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 110.376037] CR2: 000000008022a2c0 CR3: 0000000009d49000 CR4: 00000000000006e0
[ 110.376037] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 110.376037] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 110.376037] Process true (pid: 5639, threadinfo ffff81000f5d2000, task ffff81000cd0e090)
[ 110.376037] Stack: ffffffff80714e68 000000008022a295 ffffffff80714f58 000000008022a2c0
[ 110.376037] 0000000000000000 0000000000000040 000000000000002b ffffffff8020e1b9
[ 110.376037] 0000000000000000 0000000000000000 ffffffff80714f58 ffffffff804db758
[ 110.376037] Call Trace:
[ 110.376037] <#DF> [<ffffffff8020e1b9>] ? show_registers+0xf9/0x270
[ 110.376037] [<ffffffff804c951f>] ? __die+0xaf/0x110
[ 110.376037] [<ffffffff8020e470>] ? die+0x40/0x90
[ 110.376037] [<ffffffff8020efa2>] ? do_double_fault+0x62/0x70
[ 110.376037] [<ffffffff8020d819>] ? double_fault+0x89/0xa0
[ 110.376037] <<EOE>>
[ 110.376037]
[ 110.376037] Code: Bad RIP value.
[ 110.376037] RIP [<000000008022a2c0>]
[ 110.376037] RSP <0000000000000000>
[ 110.376037] ---[ end trace f085c05b5bc36a73 ]---

Matt Zimmerman (mdz) wrote :

Interestingly, replacing "/bin/true" with "/bin/ld_static" (statically linked) avoids the crash. ldconfig.real similarly works.

Matt Zimmerman (mdz) wrote :

Ignore my last two comments: my test rig had changed inadvertently, and this was a 32-bit chroot rather than a matching 64-bit one. There is surely still a bug there, but I can't say that it's the same one.

My test case is back up to 'dpkg --root=/target --purge gparted'

Notably, 'chroot /target dpkg --purge gparted' works fine.

Matt Zimmerman (mdz) wrote :

I've reduced the test case down to the attached program. Note that nothing needs to be mounted on /mnt, and uncommenting the chdir call avoids the crash.
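
The attached program is not reproduced in this report. As a rough sketch only, and assuming (from the sys_getcwd frame in the trace and the chdir remark above) that the trigger is a getcwd() call made after chroot() without an intervening chdir(), the pattern might look like the following. The helper name getcwd_after_chroot and its do_chdir flag are hypothetical and are not taken from the attachment.

```c
#define _DEFAULT_SOURCE         /* for chroot() on glibc */
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (NOT the attached program): chroot into new_root,
 * optionally chdir("/") afterwards, then ask the kernel for the cwd.
 * Returns 0 on success, -1 on failure with errno set by the failing call.
 */
int getcwd_after_chroot(const char *new_root, int do_chdir,
                        char *buf, size_t len)
{
    assert(new_root != NULL && buf != NULL && len > 0);

    /* Requires CAP_SYS_CHROOT, i.e. normally root. */
    if (chroot(new_root) != 0)
        return -1;

    /* Without this step the cwd still refers to a directory that lies
     * outside the new root. */
    if (do_chdir && chdir("/") != 0)
        return -1;

    /* getcwd() is the system call that appears in the call trace. */
    if (getcwd(buf, len) == NULL)
        return -1;

    return 0;
}
```

Called as root with do_chdir set to 0 and new_root set to "/mnt", this follows the failing pattern described above; passing 1 corresponds to uncommenting the chdir call. Run it only on a disposable test system.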

Ben Collins (ben-collins) wrote :

Tough bug to find. This is basically caused by AppArmor's VFS patches, most notably the unambiguous-__d_path.diff patch. In sys_getcwd() it changes the call to __d_path() and has two mistakes:

* First, it passes the actual struct path root, which ends up being changed when we are looking up chroots (and most likely bind/union mounts too). When it calls path_put(root) it's doing so on something other than what we started with, hence improper ref counting.
* Second, it does not pass D_PATH_FAIL_DELETED like it should. So we end up not properly catching failures.

Fixing both of these; will also send upstream.
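
The first mistake can be illustrated with a small userspace model. This is a toy sketch, not the kernel code or the actual patch: struct obj, obj_get(), obj_put() and lookup() are made-up stand-ins for struct path, path_get(), path_put() and the patched __d_path(), and the "fixed" variant shows only one plausible way to avoid the imbalance (handing the lookup a scratch copy).

```c
#include <assert.h>

/* Toy stand-in for a refcounted kernel object; not the real struct path. */
struct obj { int refs; };

static void obj_get(struct obj *o) { o->refs++; }
static void obj_put(struct obj *o) { o->refs--; }

/* Toy lookup that, like the __d_path() call described above, may
 * overwrite the caller's root pointer with a different object. */
static void lookup(struct obj **root, struct obj *other) { *root = other; }

/* Buggy pattern: the put lands on whatever lookup() left in root. */
void buggy(struct obj *start, struct obj *other)
{
    struct obj *root = start;

    assert(start != 0 && other != 0);
    obj_get(root);
    lookup(&root, other);
    obj_put(root);      /* drops a reference on 'other', not 'start' */
}

/* Balanced pattern: lookup() gets a scratch copy, the put uses the original. */
void fixed(struct obj *start, struct obj *other)
{
    struct obj *root = start;
    struct obj *tmp = root;

    assert(start != 0 && other != 0);
    obj_get(root);
    lookup(&tmp, other);
    obj_put(root);      /* get and put both act on 'start' */
}
```

In the buggy variant the starting object keeps an extra reference while the other object loses one it never gained, which is the kind of imbalance that can later surface as a "still in use" dentry or a premature teardown.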

Changed in linux:
assignee: ubuntu-kernel-team → ben-collins
milestone: none → intrepid-alpha-4
status: Triaged → In Progress
Ben Collins (ben-collins) wrote :

Applied to our git tree. Should be in next upload.

Changed in linux:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.26-4.12

---------------
linux (2.6.26-4.12) intrepid; urgency=low

  [ Ben Collins ]

  * e1000e: Upgraded module to 0.4.1.7 upstream. Placed in ubuntu/,
    in-kernel driver disabled
  * config: Disable e1000e in-kernel, and enable newer driver in ubuntu/
  * rfkill: Update to 1.3 drivers, and move to common location
  * ubuntu: Actually link kconfig/kbuild into rfkill subdir
  * config: Enable loading dsdt from initramfs
    - LP: #246222
  * ubuntu: [compcache] Update to fix crashes in improper BUG()
  * build: Create a retag scripts to recover tags from rebases
  * build: Updates for dbg pkg
  * build: Make sure no empty lines show up in debian/files
  * ubuntu: atl1e: Add new driver from 2.6.27-pre-rc1
    - LP: #243894
  * sys_getcwd: Fix some brokeness introduced by AppArmor __d_path
    changes
    - LP: #251223
  * ubuntu: unionfs: Added v1.4 module from hardy
  * build: Add sub-flavour infrastructure, and virtual subflav

  [ Eric Piel ]

  * ACPI: Allow custom DSDT tables to be loaded from initramfs

  [ Kees Cook ]

  * AppArmor: Smack VFS patches

  [ Mario Limonciello ]

  * Work around ACPI corruption upon suspend on some Dell machines.
    - LP: #183033

  [ Tim Gardner ]

  * Export usbhid_modify_dquirk for LBM module bcm5974
    - LP: #250838
  * VIA - Add VIA DRM Chrome9 3D engine
    - LP: #251862
  * Define TRUE/FALSE for VIA DRM driver.

 -- Ben Collins <email address hidden> Tue, 15 Jul 2008 12:51:39 -0400

Changed in linux:
status: Fix Committed → Fix Released
Colin Watson (cjwatson) wrote :

For the record, we worked around this for Alpha 3 like this:

dpkg (1.14.20ubuntu3) intrepid; urgency=low

  * src/help.c: chdir("/") after chroot(). Not only is this good practice,
    but it works around bug #251223

 -- Matt Zimmerman <email address hidden> Thu, 24 Jul 2008 18:31:52 +0100
