I can confirm this using a bleeding-edge 2.6.34-rc2+ kernel; with a tmpfs mounted on /tmp, it takes about two minutes on a T400 laptop.
Using blktrace, it looks like we're doing a whole ton of journal writes after the umount.
Here's the blktrace summary of the reproducer:
Total (loop0):
 Reads Queued:           0,        0KiB   Writes Queued:       74944,   299776KiB
 Read Dispatches:        0,        0KiB   Write Dispatches:        0,        0KiB
 Reads Requeued:         0                Writes Requeued:         0
 Reads Completed:        0,        0KiB   Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB   Write Merges:            0,        0KiB
 IO unplugs:         66567                Timer unplugs:           0
And here's the blktrace summary of the reproducer with a "sync" added before the "time umount /mnt/test":
Total (loop1):
 Reads Queued:           2,        8KiB   Writes Queued:        9438,    37752KiB
 Read Dispatches:        0,        0KiB   Write Dispatches:        0,        0KiB
 Reads Requeued:         0                Writes Requeued:         0
 Reads Completed:        0,        0KiB   Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB   Write Merges:            0,        0KiB
 IO unplugs:            55                Timer unplugs:           0
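As a quick sanity check on the two summaries above, here is a small Python sketch that pulls the "Writes Queued" and "IO unplugs" counters out of the pasted text and compares them. It is not part of the reproducer, and the function and variable names (parse_summary, writes_per_unplug, NO_SYNC, WITH_SYNC) are made up for illustration. It only looks at the counters quoted in this comment, not at a full blkparse log.

```python
import re

# Counters copied from the two blktrace summaries quoted above.
NO_SYNC = """\
Reads Queued: 0, 0KiB Writes Queued: 74944, 299776KiB
IO unplugs: 66567 Timer unplugs: 0
"""

WITH_SYNC = """\
Reads Queued: 2, 8KiB Writes Queued: 9438, 37752KiB
IO unplugs: 55 Timer unplugs: 0
"""


def parse_summary(text):
    """Extract queued-write and IO-unplug counters from a summary snippet."""
    writes = re.search(r"Writes Queued:\s*(\d+),\s*(\d+)KiB", text)
    unplugs = re.search(r"IO unplugs:\s*(\d+)", text)
    if writes is None or unplugs is None:
        raise ValueError("summary is missing the expected counters")
    return {
        "writes_queued": int(writes.group(1)),
        "write_kib": int(writes.group(2)),
        "io_unplugs": int(unplugs.group(1)),
    }


def writes_per_unplug(stats):
    """Average number of queued writes per IO unplug."""
    return stats["writes_queued"] / stats["io_unplugs"]


if __name__ == "__main__":
    no_sync = parse_summary(NO_SYNC)
    with_sync = parse_summary(WITH_SYNC)

    for label, stats in (("no sync", no_sync), ("with sync", with_sync)):
        print(f"{label}: {stats['writes_queued']} writes, "
              f"{stats['io_unplugs']} unplugs, "
              f"{writes_per_unplug(stats):.1f} writes per unplug")

    # How much more work the plain umount generates than sync + umount.
    print("write ratio: "
          f"{no_sync['writes_queued'] / with_sync['writes_queued']:.1f}x")
    print("unplug ratio: "
          f"{no_sync['io_unplugs'] / with_sync['io_unplugs']:.0f}x")
```

With the numbers above this works out to roughly 1.1 queued writes per unplug without the sync, versus about 172 with it: about 8 times as many writes and over 1200 times as many unplugs.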
Analyzing the blktrace logs more carefully, it looks like the umount path is doing a synchronous inode sync for each inode, which means that we're executing a journal transaction and a barrier between every single inode update. Doh!
I haven't analyzed the kernel code yet (probably won't have time tonight), but that does seem to be what's going on. Hopefully the kernel fix should be simple....