cloud-init growpart race with udev

Bug #1834875 reported by Tobias Koch
Affects                          Status        Importance  Assigned to  Milestone
cloud-initramfs-tools            Fix Released  Medium      Unassigned
cloud-utils                      Fix Released  Undecided   Unassigned
cloud-initramfs-tools (Ubuntu)   Fix Released  Undecided   Unassigned
  Eoan                           Won't Fix     Undecided   Unassigned
cloud-utils (Ubuntu)             Fix Released  Undecided   Unassigned
  Eoan                           Won't Fix     Undecided   Unassigned

Bug Description

[impact]
On Azure, cloud-init's growpart module regularly (20-30% of the time) fails to extend the partition to its full size.

For example:

========================================

2019-06-28 12:24:18,666 - util.py[DEBUG]: Running command ['growpart', '--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-06-28 12:24:19,157 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-06-28 12:24:19,726 - util.py[DEBUG]: resize_devices took 1.075 seconds
2019-06-28 12:24:19,726 - handlers.py[DEBUG]: finish: init-network/config-growpart: FAIL: running config-growpart with frequency always
2019-06-28 12:24:19,727 - util.py[WARNING]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
2019-06-28 12:24:19,727 - util.py[DEBUG]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 812, in _run_modules
    freq=freq)
  File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
    return self._runners.run(name, functor, args, freq, clear_on_fail)
  File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
    results = functor(*args)
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 351, in handle
    func=resize_devices, args=(resizer, devices))
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2521, in log_time
    ret = func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 298, in resize_devices
    (old, new) = resizer.resize(disk, ptnum, blockdev)
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 159, in resize
    return (before, get_size(partdev))
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 198, in get_size
    fd = os.open(filename, os.O_RDONLY)
FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3'

========================================

@rcj suggested this is a race with udev. This seems to only happen on Cosmic and later.

[test case]
Somewhat unfortunately, the only way we have been able to reliably reproduce this is to have the CPC team run their tests on Azure, so we will have to rely on them for SRU verification as well.

[regression potential]
The change is fairly simple and was thoroughly reviewed and tested before it was uploaded to focal, and the backport is trivial, so I am confident that the regression potential is slight.


Revision history for this message
Ryan Harper (raharper) wrote :

Hi,

Could you run cloud-init collect-logs and attach the tarball provided?

Are there any particular instance-specific defaults? What instance type, what region? What size is the ephemeral disk?

Changed in cloud-init:
importance: Undecided → Medium
status: New → Incomplete
Revision history for this message
Tobias Koch (tobijk) wrote :

This seems hard to reproduce with the azure-cli, but happens regularly on instances we launch directly via the API. I'm not aware of any special configurations. These are stock images launched in northeurope. The ephemeral disk is 30GB.

Revision history for this message
Tobias Koch (tobijk) wrote :

This happens across different instance types without a clear pattern.

Tobias Koch (tobijk)
Changed in cloud-init:
status: Incomplete → New
Revision history for this message
Dan Watkins (oddbloke) wrote :

Hey Tobi, could you clarify what the failure here actually is? Looking at the journal (specifically the kernel EXT4 entries), it looks like some resizing _does_ happen. (Obviously we don't want a traceback in the log so we can chase that down, but I'd like to understand the concrete effect that this is having on users in order to prioritise it. :)

(Also, if you could get us access to a failing instance, that would be great! I don't have cloud-test-framework checked out any longer, else I'd try to do it.)

Changed in cloud-init:
status: New → Incomplete
Revision history for this message
Dan Watkins (oddbloke) wrote :

OK, so Ryan and I just kicked this back and forth a bit in a meeting. Looking at the failure, I believe that we successfully resize the disks/partitions, but when we look up the partition by partuuid to double-check what changed we fail. (Ryan's hypothesis is that the symlink is in the process of being recreated by udev when we go to check it.)

(The reason I believe we successfully resize the disk is that the traceback occurs after we've successfully run the growpart command which we can see in the cloud-init log and in dmesg.)

Access to a failing instance would be really helpful, so I can double-check our thinking here.

Revision history for this message
Tobias Koch (tobijk) wrote :

Duh, you're right, the resize happens. So the only problem with this is that the tests cannot tell why the cloud-init service starts degraded.

Test instance at 137.135.172.1.

Revision history for this message
Dan Watkins (oddbloke) wrote :

Thanks for the test instance! Yeah, even though the resize is successful, we don't want tracebacks, warnings or degraded boot! (Also, I haven't checked, but I imagine this would cause functional issues if there was more than a single disk/partition that needed resizing; I expect we would resize the first only.)

Looking at the test instance, the by-partuuid symlinks are created after we hit this error:

$ ls -l --time-style=full-iso /dev/disk/by-partuuid/
total 0
lrwxrwxrwx 1 root root 10 2019-07-03 11:39:34.578632000 +0000 251ca684-01 -> ../../sdb1
lrwxrwxrwx 1 root root 11 2019-07-03 11:39:34.718632000 +0000 46150cea-0763-4a42-b688-5d7c32604a99 -> ../../sda14
lrwxrwxrwx 1 root root 11 2019-07-03 11:39:34.674632000 +0000 76944892-a450-4661-ad65-fbe719c872b0 -> ../../sda15
lrwxrwxrwx 1 root root 10 2019-07-03 11:39:34.698632000 +0000 b429205c-8c8c-434b-bb30-f80882728e23 -> ../../sda1

That said, those timestamps are _so_ far after the error (~28s) that I _guess_ they were regenerated again. (It's also possible that nothing poked udev for that long, I suppose.)

Toby, if I give you a patched cloud-init in a PPA, would you be able to build a test image from it?

Dan Watkins (oddbloke)
Changed in cloud-init:
status: Incomplete → Confirmed
Revision history for this message
Tobias Koch (tobijk) wrote :

Thanks for looking into this! Sure, send over the PPA.

Revision history for this message
Dan Watkins (oddbloke) wrote :
Revision history for this message
Tobias Koch (tobijk) wrote :

I still see this happening with the test package from the PPA.

Revision history for this message
Tobias Koch (tobijk) wrote :

It definitely happens less frequently. Test instance at 52.169.141.202.

Revision history for this message
Dan Watkins (oddbloke) wrote :

Thanks for the feedback Tobi. I've built a new cloud-init package in the same PPA, which just adds a little more debug logging. Please give this a spin and then give me SSH access to both a passing and a failing instance.

Thanks!

Revision history for this message
Tobias Koch (tobijk) wrote :

Sorry for the slow feedback. It's a pain to produce these instances.

failed instance: 52.178.178.45
passed instance: 13.74.143.107

Revision history for this message
Dan Watkins (oddbloke) wrote :

No worries, thanks for the instances!

OK, here are the relevant lines from the log:

== SUCCESSFUL ==

2019-07-24 06:34:11,361 - util.py[DEBUG]: Running command ['ls', '--full-time', '/dev/disk/by-partuuid/23f70cea-8174-4426-bc82-462217b97245'] with allowed return codes [0] (shell=False, capture=True)
2019-07-24 06:34:11,363 - cc_growpart.py[CRITICAL]: DEBUG ODD_BLOKE: ls output:
lrwxrwxrwx 1 root root 10 2019-07-24 06:33:59.017040800 +0000 /dev/disk/by-partuuid/23f70cea-8174-4426-bc82-462217b97245 -> ../../sda1

2019-07-24 06:34:11,377 - util.py[DEBUG]: Running command ['growpart', '--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-07-24 06:34:11,649 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-07-24 06:34:12,907 - util.py[DEBUG]: Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=True)
2019-07-24 06:34:12,991 - util.py[DEBUG]: Running command ['ls', '--full-time', '/dev/disk/by-partuuid/23f70cea-8174-4426-bc82-462217b97245'] with allowed return codes [0] (shell=False, capture=True)
2019-07-24 06:34:12,994 - cc_growpart.py[CRITICAL]: DEBUG ODD_BLOKE: ls output:
lrwxrwxrwx 1 root root 10 2019-07-24 06:34:12.981040800 +0000 /dev/disk/by-partuuid/23f70cea-8174-4426-bc82-462217b97245 -> ../../sda1

== FAILURE ==

2019-07-24 06:25:42,590 - cc_growpart.py[CRITICAL]: DEBUG ODD_BLOKE: ls output:
lrwxrwxrwx 1 root root 10 2019-07-24 06:25:27.707830000 +0000 /dev/disk/by-partuuid/23f70cea-8174-4426-bc82-462217b97245 -> ../../sda1

2019-07-24 06:25:42,590 - util.py[DEBUG]: Running command ['growpart', '--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-07-24 06:25:43,097 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-07-24 06:25:43,624 - util.py[DEBUG]: Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=True)
2019-07-24 06:25:43,633 - util.py[DEBUG]: Running command ['ls', '--full-time', '/dev/disk/by-partuuid/23f70cea-8174-4426-bc82-462217b97245'] with allowed return codes [0] (shell=False, capture=True)
2019-07-24 06:25:43,636 - util.py[DEBUG]: resize_devices took 1.059 seconds
2019-07-24 06:25:43,636 - handlers.py[DEBUG]: finish: init-network/config-growpart: FAIL: running config-growpart with frequency always
2019-07-24 06:25:43,636 - util.py[WARNING]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
2019-07-24 06:25:43,637 - util.py[DEBUG]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 827, in _run_modules
    freq=freq)
  File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
    return self._runners.run(name, functor, args, freq, clear_on_fail)
  File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
...


Revision history for this message
Dan Watkins (oddbloke) wrote :

So looking at that output, I have these observations:

* we are dealing with a fairly tight window; in the successful run, the timestamp of the symlink is 06:34:12.981040800 and we run the `ls` command to show us that at 06:34:12,991; that's about 10ms
* based on the timestamp of the next log message in each case, `udevadm settle` on success appears to take substantially longer (06:34:12,907 -> 06:34:12,991; 84ms) than on failure (06:25:43,624 -> 06:25:43,633; 9ms)
* on success, the timestamp of the symlink is between the `udevadm settle` and the next log message, which suggests that it was created as a result of the settle

So none of this points to something _obvious_ (or, at least, not something obvious to me). Given the disparity in the apparent `udevadm settle` execution times, I wonder if the udev events from the resize haven't made their way into the queue before we execute the settle, meaning it just exits without waiting at all?

Revision history for this message
Ryan Harper (raharper) wrote :

Looking at the growpart command: it does two primary things. 1) It issues commands to update the size of the partition (via sgdisk or sfdisk). 2) It issues a "rereadpt" command.

I really suspect you're right that the "rereadpt" may have been issued but not yet completed, including the uevent which would trigger udev events. Now, since the partition already exists, we really shouldn't see the symlink go away; that is the _really_ strange part.

So, I think we should fire up udevadm monitor

sudo udevadm monitor --property --kernel --udev --subsystem-match=block

And then issue growpart commands on a disk whose partition can be grown.

You can launch an instance in Azure with user-data to disable growpart, then issue the same growpart command after logging in; I'd also run growpart under bash -x for a trace so we can see what's happening.

Lastly, we can also use an inotify watch on /dev/disk/by-partuuid

% inotifywait -mr --timefmt "%s" --format "%T %w%f %e" /dev/disk/by-partuuid
Setting up watches. Beware: since -r was given, this may take a while!
Watches established.
1563987982 /dev/disk/by-partuuid/foo CREATE
1563987982 /dev/disk/by-partuuid/foo OPEN
1563987982 /dev/disk/by-partuuid/foo ATTRIB
1563987982 /dev/disk/by-partuuid/foo CLOSE_WRITE,CLOSE
1563987985 /dev/disk/by-partuuid/foo DELETE

Revision history for this message
Dan Watkins (oddbloke) wrote :

So manually running `growpart /dev/sda 1` resulted in the following inotifywait output:

1563992577 /dev/disk/by-partuuid/60a13b74-44c8-4d0d-b415-6c697e398204 ATTRIB
1563992577 /dev/disk/by-partuuid/43b8aeb4-14df-4ae2-8a6a-d3ada9ea01dd ATTRIB
1563992577 /dev/disk/by-partuuid/7f427e61-bcb8-491d-b067-689ba6f4702b ATTRIB

and the attached `udevadm monitor` output. The fact we don't see a removal here suggests we aren't tickling the bug, so I'm going to launch another instance and see if running the cloud-init module makes a difference.

Revision history for this message
Dan Watkins (oddbloke) wrote :

I tried a few more times to get a manual reproducer and failed. So I've created two systemd units, one[0] to collect the `udevadm monitor` info, and one[1] to collect the `inotifywait` info. (As these commands output to stdout, their output can be found in the systemd journal.) I put these in place, and iterated a few reboots to get them working OK.

I then `cloud-init clean`'d and captured the image (following Azure's instructions[2]). I launched my captured image multiple times (I didn't keep track of exactly how many, but N>10 for sure) and didn't hit a single failure. So either (a) I got "lucky", (b) the extra instrumentation fixes the race condition, (c) the differences between my image and a truly pristine one are too big, (d) how I'm launching instances and how the CPC test framework is launching instances is sufficiently different to not tickle the bug, or, of course, (e) something else entirely.

Tobi, I know we're pushing things a bit now, but are you able to add these two units to an image to see what they produce? (Make sure to enable them as well as install them! It took me two reboots to understand why they weren't executing. ¬.¬)

(The `inotifywait` unit[1] appears to not yet be perfect, as it sometimes emits "Current command vanished from the unit file, execution of the command list won't be resumed" and then stops emitting anything else. I haven't been able to work out what this error message actually means either via search or by looking at the systemd source code briefly, and asking in #systemd on Freenode hasn't yet yielded any further info.)

[0] https://paste.ubuntu.com/p/FgbZqwZR6Y/
[1] https://paste.ubuntu.com/p/57HkmCKN67/
[2] https://docs.microsoft.com/en-us/azure/virtual-machines/linux/capture-image

Changed in cloud-init:
status: Confirmed → In Progress
assignee: nobody → Dan Watkins (daniel-thewatkins)
Revision history for this message
Dan Watkins (oddbloke) wrote :

(Marked as In Progress, though depending on the outcome of the triage we're doing ATM this might go back to Triaged rather than get fixed immediately.)

Revision history for this message
Tobias Koch (tobijk) wrote :

Is the problem with inotifywait an instance of this systemd bug [1]? I don't see a work-around mentioned.

With the two services from above included:

failed instance: 52.178.156.32
passed instance: 13.94.113.245

[1] https://github.com/systemd/systemd/issues/12258

Revision history for this message
Dan Watkins (oddbloke) wrote :

Thanks for the continued debugging assistance! It definitely could be that bug, though looking at the failing instance, we're getting enough information from the inotifywait service, so thankfully I don't think we need to worry about it:

Jul 26 10:39:01.324469 tobijk-x-alan-eoan-base-tgqxhgulkkndpdyjlolx inotifywait[462]: 1564137541 /dev/disk/by-partuuid/23f70cea-8174-4426-bc82-462217b97245 DELETE

Revision history for this message
Dan Watkins (oddbloke) wrote :

OK, so we're definitely seeing a DELETE on failure, and it _looks_ like it happens during the growpart:

2019-07-26 10:39:00,725 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
Jul 26 10:39:01.324469 tobijk-x-alan-eoan-base-tgqxhgulkkndpdyjlolx inotifywait[462]: 1564137541 /dev/disk/by-partuuid/23f70cea-8174-4426-bc82-462217b97245 DELETE
2019-07-26 10:39:01,339 - util.py[DEBUG]: resize_devices took 1.334 seconds

Looking at the `udevadm monitor` logs from each instance, we only have one udev/kernel remove on each (listed twice, once for the kernel event, once for the corresponding udev event):

ubuntu@<failing>:~$ journalctl -o short-precise -u udevmon.service | grep ACTION=remove -B1
Jul 26 10:39:03.571001 tobijk-x-alan-eoan-base-tgqxhgulkkndpdyjlolx udevadm[251]: KERNEL[36.612409] remove /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/00000000-0001-8899-0000-000000000000/host3/target3:0:1/3:0:1:0/block/sdb/sdb1 (block)
Jul 26 10:39:03.571001 tobijk-x-alan-eoan-base-tgqxhgulkkndpdyjlolx udevadm[251]: ACTION=remove
--
Jul 26 10:39:03.588454 tobijk-x-alan-eoan-base-tgqxhgulkkndpdyjlolx udevadm[251]: UDEV [36.629717] remove /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/00000000-0001-8899-0000-000000000000/host3/target3:0:1/3:0:1:0/block/sdb/sdb1 (block)
Jul 26 10:39:03.588454 tobijk-x-alan-eoan-base-tgqxhgulkkndpdyjlolx udevadm[251]: ACTION=remove

ubuntu@<passing>:~$ journalctl -o short-precise -u udevmon.service | grep ACTION=remove -B1
Jul 26 10:43:33.613194 tobijk-x-alan-eoan-base-afjyuaryhtemavrzofcr udevadm[247]: KERNEL[33.632423] remove /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/00000000-0001-8899-0000-000000000000/host3/target3:0:1/3:0:1:0/block/sdb/sdb1 (block)
Jul 26 10:43:33.613194 tobijk-x-alan-eoan-base-afjyuaryhtemavrzofcr udevadm[247]: ACTION=remove
--
Jul 26 10:43:33.627533 tobijk-x-alan-eoan-base-afjyuaryhtemavrzofcr udevadm[247]: UDEV [33.644486] remove /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/00000000-0001-8899-0000-000000000000/host3/target3:0:1/3:0:1:0/block/sdb/sdb1 (block)
Jul 26 10:43:33.627533 tobijk-x-alan-eoan-base-afjyuaryhtemavrzofcr udevadm[247]: ACTION=remove

I'm going to dig in to the logs a little further to see if I can find anything.

Revision history for this message
Dan Watkins (oddbloke) wrote :

I've confirmed that the udev and kernel ACTION=change event on /dev/sda is the same across the failing and succeeding instances (except for some timestamps which I edited out of these two documents, and the ordering/naming of the DEVLINKS):

$ diff -u failing.sda_event success.sda_event
--- failing.sda_event 2019-07-26 10:25:13.088884853 -0400
+++ success.sda_event 2019-07-26 10:23:51.465073329 -0400
@@ -41,5 +41,5 @@
 ID_FS_TYPE=
 MAJOR=8
 MINOR=0
-DEVLINKS=/dev/disk/by-id/wwn-0x60022480d7a4872310ff8d8a6789eec9 /dev/disk/azure/root /dev/disk/by-path/acpi-VMBUS:01-vmbus-00000000000088990000000000000000-lun-0 /dev/disk/by-id/scsi-14d53465420202020d7a4872310ff41908c3b8d8a6789eec9 /dev/disk/cloud/azure_root /dev/disk/by-id/scsi-360022480d7a4872310ff8d8a6789eec9
+DEVLINKS=/dev/disk/by-path/acpi-VMBUS:01-vmbus-00000000000088990000000000000000-lun-0 /dev/disk/azure/root /dev/disk/by-id/scsi-360022480d7a4872310ff8d8a6789eec9 /dev/disk/by-id/wwn-0x60022480d7a4872310ff8d8a6789eec9 /dev/disk/by-id/scsi-14d53465420202020d7a4872310ff41908c3b8d8a6789eec9 /dev/disk/cloud/azure_root
 TAGS=:systemd:

After the /dev/sda udev event is logged, each partition on /dev/sda {sda1, sda14, sda15} then gets an ACTION=change udev event logged. One difference between the two instances is the order of these events. On the failing system, the order is (sda1, sda15, sda14), and on the succeeding system, it's (sda15, sda14, sda1). This vaguely suggests a race is involved, as the timing is different, but it's certainly not categorical. It's worth noting that the _kernel_ events (which are all emitted before even the sda udev event is processed) happen in the same order on each instance: sda, sda1, sda14, sda15.

Revision history for this message
Dan Watkins (oddbloke) wrote :

OK, the sda1 events provide a substantial difference:

--- failing.sda1_event 2019-07-26 10:42:50.424486349 -0400
+++ success.sda1_event 2019-07-26 12:02:54.536766079 -0400
@@ -53,9 +53,16 @@
 ID_FS_VERSION=1.0
 ID_FS_TYPE=ext4
 ID_FS_USAGE=filesystem
+ID_PART_ENTRY_SCHEME=gpt
+ID_PART_ENTRY_UUID=23f70cea-8174-4426-bc82-462217b97245
+ID_PART_ENTRY_TYPE=0fc63daf-8483-4772-8e79-3d69d8477de4
+ID_PART_ENTRY_NUMBER=1
+ID_PART_ENTRY_OFFSET=227328
+ID_PART_ENTRY_SIZE=62689247
+ID_PART_ENTRY_DISK=8:0
 fabric_name=root
 .ID_FS_TYPE_NEW=ext4
 MAJOR=8
 MINOR=1
-DEVLINKS=/dev/disk/by-id/scsi-360022480d7a4872310ff8d8a6789eec9-part1 /dev/disk/cloud/azure_root-part1 /dev/disk/by-id/scsi-14d53465420202020d7a4872310ff41908c3b8d8a6789eec9-part1 /dev/disk/by-uuid/d800ba44-3054-4cb3-81f7-dfe32f5e8b48 /dev/disk/by-label/cloudimg-rootfs /dev/disk/azure/root-part1 /dev/disk/by-id/wwn-0x60022480d7a4872310ff8d8a6789eec9-part1 /dev/disk/by-path/acpi-VMBUS:01-vmbus-00000000000088990000000000000000-lun-0-part1
+DEVLINKS=/dev/disk/by-partuuid/23f70cea-8174-4426-bc82-462217b97245 /dev/disk/by-id/wwn-0x60022480d7a4872310ff8d8a6789eec9-part1 /dev/disk/by-uuid/d800ba44-3054-4cb3-81f7-dfe32f5e8b48 /dev/disk/by-label/cloudimg-rootfs /dev/disk/by-id/scsi-14d53465420202020d7a4872310ff41908c3b8d8a6789eec9-part1 /dev/disk/azure/root-part1 /dev/disk/by-id/scsi-360022480d7a4872310ff8d8a6789eec9-part1 /dev/disk/by-path/acpi-VMBUS:01-vmbus-00000000000088990000000000000000-lun-0-part1 /dev/disk/cloud/azure_root-part1
 TAGS=:systemd:

As we can see here, there are quite a few differences, which appear to reflect what we're seeing: DEVLINKS on the failing instance doesn't include the by-partuuid path, and we're missing all of the ID_PART_ENTRY_* lines from its udev event.

So looks like the next step is probably to determine why the udev event doesn't contain what it should!

Revision history for this message
Dan Watkins (oddbloke) wrote :

After some discussion with rharper on IRC[0], our working hypothesis is that the events that are being processed are as a result of the sgdisk command that does the partition table modification[1], _not_ the pt_update call (which boils down to `partx --update`[2]). As a result, we think we are racing the re-read of the partition table; this is consonant with the different ordering of the udev events on failure.

To test this hypothesis, I'm going to create a cloud-init package which performs its own partition table re-read (which should trigger udev events for those partitions) and `udevadm settle` after the resize. This _should_ mean that the symlink is correctly re-created. If this succeeds, we can examine what change we need to make to cloud-utils.
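
A rough sketch of that extra step (illustrative only, not the actual test package; the function name here is invented, and util.subp is cloud-init's existing command helper seen in the logs above):

    from cloudinit import util

    def reread_partitions_and_settle(diskdev):
        # Ask the kernel to re-read the already-rewritten partition table;
        # this should emit fresh uevents for each partition on the disk.
        util.subp(['partx', '--update', diskdev])
        # Wait for udev to drain its queue, which should include recreating
        # the /dev/disk/by-partuuid/* symlinks for the resized partition.
        util.subp(['udevadm', 'settle'])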

Tobi, I'll upload that package to the same PPA as the others, and will let you know here when it's ready. We'll want an image with that cloud-init as well as the debugging systemd units.

[0] Which should eventually show up here: https://irclogs.ubuntu.com/2019/07/26/%23cloud-init.html
[1] https://git.launchpad.net/cloud-utils/tree/bin/growpart#n505
[2] https://git.launchpad.net/cloud-utils/tree/bin/growpart#n650

Revision history for this message
Dan Watkins (oddbloke) wrote :

OK, the PPA now has a package with the partx call added, and I've tested that it does something along the lines of what we want in a test instance of my own. Tobi, when you have time, please build a test image with the PPA cloud-init and the debugging units and point us at a failed and successful instance again. Thanks!

Revision history for this message
Tobias Koch (tobijk) wrote :

passed instance: 104.41.204.32
failed instance: 104.41.207.197

Revision history for this message
Dan Watkins (oddbloke) wrote :

Thanks! TL;DR: the partx call didn't change much.

On the failing instance, this is when the partx call happens:

2019-07-29 12:09:25,391 - util.py[DEBUG]: Running command ['partx', '--update', '/dev/sda']

Looking in the journal at that timestamp, I see (lines omitted and/or edited for brevity):

2019-07-29T12:09:25.395099+0000 udevadm[246]: KERNEL[27.837161] remove /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/00000000-0000-8899-0000-000000000000/host0/target0:0:0/0:0:0:0/block/sda/sda14 (block)
2019-07-29T12:09:25.396119+0000 tobijk-x-alan-eoan-base-uutvyfllcflbntzjhhes udevadm[246]: UDEV [27.838088] remove /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/00000000-0000-8899-0000-000000000000/host0/target0:0:0/0:0:0:0/block/sda/sda14 (block)
2019-07-29T12:09:25.397301+0000 udevadm[246]: KERNEL[27.838192] add /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/00000000-0000-8899-0000-000000000000/host0/target0:0:0/0:0:0:0/block/sda/sda14 (block)
2019-07-29T12:09:25.405384+0000 inotifywait[462]: 1564402165 /dev/disk/by-partuuid/9c72c02a-6134-4811-86d8-feee960b40b2 CREATE
2019-07-29T12:09:25.418906+0000 udevadm[246]: UDEV [27.860977] add /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/00000000-0000-8899-0000-000000000000/host0/target0:0:0/0:0:0:0/block/sda/sda14 (block)

These remove/add lines for sda14 are consistent with what I saw when just running `partx` by hand in an instance.

Looking at the passing instance, there is one noticeable difference: the sda14 remove/add KERNEL events are emitted to the log _before_ any of the UDEV events from the resizing are emitted. This means that:

* On the failing instance, we see the first resizing KERNEL event (sda) emitted at 12:09:25.347818 and the sda1 UDEV event emitted at 12:09:25.372478; .025s.
* On the passing instance, we see the first resizing KERNEL event (sda) emitted at 12:14:21.925924 and the sda1 UDEV event emitted at 12:14:22.002377; .076s.

(And, once again, we see that the order in which udev processes those events goes sda15, sda14, sda1.)

Revision history for this message
Dan Watkins (oddbloke) wrote :

OK, I chatted briefly with Ryan and he agrees with my assessment that this is happening too deep underneath us for us to readily work around it. The two obvious places to look are udev and the kernel.

To help triangulate, Tobi, could you attempt an eoan build and test using the generic kernel instead of linux-azure?

Revision history for this message
Tobias Koch (tobijk) wrote :

Yep, this doesn't happen with linux-generic.

Revision history for this message
Dan Watkins (oddbloke) wrote :

Now one wrinkle about it being a linux-azure bug: if you _aren't_ seeing this on bionic but _were_ seeing it on cosmic, then there must be something else involved, because I believe (based on the publishing history[0]) that cosmic was on the same kernel version as bionic.

This could, of course, still be a kernel bug that's just not hit by bionic's userspace. But, to be sure of where we stand, can you confirm the statement in the original report that this was an issue on cosmic? Do you have enough test history to determine when this started happening?

[0] https://launchpad.net/ubuntu/+source/linux-meta-azure/+publishinghistory

Revision history for this message
Dan Watkins (oddbloke) wrote :

(Oh, and thank you for that test run!)

Revision history for this message
Tobias Koch (tobijk) wrote :

Oh, good observation, Dan. Yes, it was happening on Cosmic and later. But I cannot say when it started. Now, in a moment of despair, I tried something very unorthodox: I copied udevadm from Bionic to the Eoan image and ran the tests again. Lo and behold, all pass.

no longer affects: linux-azure (Ubuntu)
Revision history for this message
Dan Watkins (oddbloke) wrote :

That certainly is unorthodox! When you say "copied udevadm", what specifically do you mean? Just the binary, the udev package, ...?

Revision history for this message
Tobias Koch (tobijk) wrote :

When I say unorthodox, I mean I copied the binary ;) So that should narrow it down quite a bit. Unless there is more funniness involved.

Revision history for this message
Dan Watkins (oddbloke) wrote :

So, to reset on status: I believe that we've narrowed this down to a systemd/udev problem, as it goes away when a different udevadm binary is used. I don't have a good sense of how to go about digging into that issue, so I don't think I can take it much further.

(Also, as it appears to be a systemd issue, I'm going to mark the cloud-init task Invalid. Feel free to move it back to New if you believe this is a mistake!)

Changed in cloud-init:
status: In Progress → Invalid
assignee: Dan Watkins (daniel-thewatkins) → nobody
Revision history for this message
Tobias Koch (tobijk) wrote :

@daniel-thewatkins, I'm not convinced that this bug is invalid for cloud-init. After reading through all of this again, I still don't understand what guarantee there is that, when `udevadm settle` is called, all relevant events have already been queued.

Copying udevadm over, and with that suppressing the error, suggests that maybe the event queue handling is spurious, but on the other hand, it might just be that previous versions were slower at something and the 10ms window discussed above is always exceeded because of that anyway.

I'm not saying there cannot also be a bug somewhere else. But if there is no specification that says there cannot under any circumstances be a race condition in what cloud-init is doing, then cloud-init should handle this more robustly.

I'm not an expert on that level, but somehow in a world of multi-core CPUs and fancy schedulers, invoking command line tools in a certain order does not seem to preclude the possibility of a race in how the event is submitted, routed and queued without there being an explicit locking mechanism.
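
For illustration, the kind of defensive handling being asked for might look something like this (a sketch only; the helper name and timeout are invented, and this is not actual cloud-init code):

    import os
    import time

    def open_when_present(path, timeout=5.0, interval=0.1):
        # Retry the open for a bounded time instead of failing the module on
        # the first ENOENT while udev is still (re)creating the symlink.
        deadline = time.monotonic() + timeout
        while True:
            try:
                return os.open(path, os.O_RDONLY)
            except FileNotFoundError:
                if time.monotonic() >= deadline:
                    raise
                time.sleep(interval)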

Tobias Koch (tobijk)
Changed in cloud-init:
status: Invalid → New
Revision history for this message
Ryan Harper (raharper) wrote :

The sequence is:

exec growpart
  exec sgdisk --info # read-only
  exec sgdisk --pretend # read-only
  exec sgdisk --backup # read-only copy
  # modification of disk starts
  exec sgdisk --move-second-header \
              --delete=PART \
              --new=PART \
              --typecode --partition-guid --change-name
  # now that sgdisk has *closed* the filehandle on the disk, systemd-udevd will
  # get an inotify signal and trigger udevd to run udev scripts on the disk.
  # this includes the *removal* of symlinks due to the --delete portion of the sgdisk call,
  # and following the removal, the --new will trigger the 'add' run of the rules, which
  # recreates the symlinks.

  # update kernel partition sizes; this is an ioctl so it does not trigger any udev events
  exec partx --update
  # the kernel has the new partition sizes, and udev scripts/events are all queued (and possibly in flight)
exit growpart

cloud-init invokes get_size() operation which:
   # this is where the race occurs if the symlink created by udev is *not* present
   os.open(/dev/disk/by-id/fancy-symlink-with-partuuid-points-to-sdb1)

Dan had put a udevadm settle in this spot like so

def get_size(filename):
    util.subp(['udevadm', 'settle'])
    fd = os.open(filename, os.O_RDONLY)
    ...

So, you're suggesting that somehow _not all_ of the uevents triggered by the sgdisk command in growpart would have been queued before we call udevadm settle?

If some other events are happening how is cloud-init to know such that it can take action to "handle this race" more robustly?

Lastly, if there is a *race* in the symlink creation/removal or a delay in uevent propagation, why is that a userspace issue, let alone a cloud-init one? This isn't universally reproducible; rather, it occurs under pretty narrow circumstances with certain kernel/udev combinations, all while the growpart/cloud-init code remains the same.

Changed in cloud-init:
status: New → Incomplete
Revision history for this message
Dan Watkins (oddbloke) wrote :

[N.B. I wrote the below before I saw Ryan's comment, so there is some repetition.]

OK, I've spent some time catching up on this properly so I can summarise: per comment #24, the issue is that when udev processes the events emitted by the kernel, it (sometimes) doesn't determine the correct partition information. The kernel _does_ emit all the events we would expect, and udev _does_ handle all the events we would expect (which is to say that `udevadm settle` doesn't change behaviour here, it merely ensures that the broken behaviour has completed before we proceed).

The hypothesised race condition is somewhere between the kernel and udev: I believe the kernel event is emitted before the partition table has necessarily been fully updated, so when udev processes the event and reads the partition table, sometimes it finds the partition and sometimes it doesn't. To be clear, the kernel event generation and the buggy udev event handling all happen as a result of the resize command, _not_ as a result of anything else cloud-init runs subsequently.

So as far as I can tell, this bug would occur regardless of what runs the resize command, and no matter what commands are executed after the resize command. (It might be possible to work around this bug by issuing commands that force a re-read of the partition table on a disk, for example, but this bug _would_ still have occurred before then.)

cloud-init could potentially work around a (kernel|systemd) that isn't handling partitions correctly, but we really shouldn't have to. Until we're satisfied that they cannot actually be fixed, we shouldn't do that. (I am _not_ convinced that this cannot be fixed in (the kernel|systemd), because using a different kernel and using a different udevadm have both caused the issue to stop reproducing.)

So, let me be a little more categorical. The information we have at the moment indicates an issue in the interactions between the kernel and udev on partition resize. cloud-init's involvement is merely as the initiator of that resize. Until we have more information that indicates the issue to be in cloud-init, this isn't a valid cloud-init issue. Once we have more information from the kernel and/or systemd folks, if it indicates that cloud-init _is_ at fault, please move this back to New.

Revision history for this message
Dan Streetman (ddstreet) wrote :

> Dan had put a udevadm settle in this spot like so
>
> def get_size(filename)
> util.subp(['udevadm', 'settle'])
> os.open(....)

if you know you've just changed (e.g.) /dev/sda, possibly its kernel-generated udev events just haven't reached udev yet, so the immediate call to 'udev settle' has nothing to wait for; maybe you should tell udev to explicitly request a new full set of events and settle on that?

udevadm trigger --settle /dev/sda

tags: added: id-5dbb311b74e3f364d8f98c56
Changed in systemd (Ubuntu):
status: New → Incomplete
Revision history for this message
Dimitri John Ledkov (xnox) wrote : Re: [Bug 1834875] Re: cloud-init growpart race with udev

On Thu, 7 Nov 2019 at 17:50, Ryan Harper <email address hidden> wrote:
>
> @ddstreet
>
> Yes, settle does not help.
>
> Re-triggering udevadm trigger --action=add /sys/class/block/sda
>
> Would re-run all of them after the partition change has occurred, which
> is what I'm currently suggesting as a heavy-handed workaround.
>
> I would like to understand *why* the udevd/kernel pair exhibits this
> racy behavior whereas other kernels/udevd from Bionic do not.
>

Just because a race is always won, doesn't mean it didn't always exist ;-)

--
Regards,

Dimitri.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

On Thu, 7 Nov 2019 at 20:05, Scott Moser <email address hidden> wrote:
>
> > > So that means we have this sequence of events:
> > > a.) growpart change partition table
> > > b.) growpart call partx
> > > c.) udev created and events being processed
>
> > That is not true. whilst sfdisk is deleting, creating, finishing
> > partition table (a) and partx is called (b), udev events are already fired
> > and running in parallel and may complete against deleted, partially new,
> > completely new partition table, with or without partx completed.
>
> You're correct... I left out some 'events created and handled' after 'a'.
> But that doesn't change anything. The problem we're seeing here is *not*
> that 'b' had any issue.
>
> >
> > No amount of settling for events will fix the fact that events were run
> > against racy state of the partition table _during_ sfdisk and partx calls.
>
> complete non-sense. I dont care about any racy state *during* anything. I
> call 'udevadm settle'. That means "block until stuff is done." I think
> you're saying that I cannot:
> 1.) do something that causes udev events
> 2.) wait until all udev events caused by that something are finished
>
> if that is the case, then nothing ever can fix this, and we might as well
> go find jobs on a farm.
>

Both those things happen, but udev events start being processed while
the partition table changes have not yet completed. This is what is
documented in the sfdisk manpage as a known bug that nobody has yet
managed to figure out and de-race.
Meaning: even if the udev events happened and one waits for their
processing to finish, there is no guarantee that they were processed
against a consistent disk state.

This is why sfdisk recommends taking a flock, and this is why udev also
tries to take a flock.

In the past, IBM demonstrated a similar race in
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1571707
where they tried to rapidly partition 256 devices in parallel,
with only 89 of them successfully showing partitions after the limit
test was executed (all appeared after a reboot); that was in April 2016
on top of Xenial.

--
Regards,

Dimitri.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

So I've been handed the job of moving this bug forward this iteration.

Having just re-read all the comments and read some source code while listening to loud techno music, my observations/questions are these:

1) this is a doozy

2) I would really like to see complete udevadm monitor and inotify output for successful and failing boots

3) As far as I can read the gdisk source, there is no point at which there is no entry for the partition on disk (i.e. it only writes the partition tables once)

4) I think this comment "I believe the kernel event is emitted before the partition table has necessarily been fully updated so when udev processes the event and reads the partition table, sometimes it finds the partition and sometimes it doesn't." is not quite right, or at least there is another possibility entirely consistent with the observed facts: the diff in comment #24 looks more like udev finds the partition OK but fails to fill out the ID_PART_ENTRY_* entries. These come from libblkid via udev-builtin-blkid.c. The part about this theory that doesn't make sense is that the ID_FS_* entries are there and they come from the same codepath. So who knows!

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Ah, no, I think this might be along the right lines: udev is calling blkid on the _partition_ of course, so it can probe for the filesystem etc. without looking at the partition table. After it's done that, it does look for the partition table so it can read the ID_PART_ENTRY_* values from it, but if it fails to load the partition table it just gives up and still returns success.

As to how this might happen, well, there are certainly moments during sgdisk's writing of the partition table when the GPT data is not completely consistent (it contains checksums and is not written atomically). Two things are still a bit unsatisfactory about this explanation though: 1) it's a mighty tight race, and hitting it with any regularity borders on the incredible, and 2) I have no idea why this only occurs on Azure from Cosmic up. Although I guess 1) may explain 2).

Setting LIBBLKID_DEBUG=lowprobe in systemd-udevd's environment during a failing boot would be very interesting to confirm or deny this line of reasoning -- although this will make udevd produce gobs of output and possibly disturb any race.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Oh yeah and one other thing I don't understand: why udev is processing the partition while sgdisk is still running.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Hi, could someone prepare an image with https://paste.ubuntu.com/p/HYGv6m8gCc/ at /etc/systemd/system/systemd-udevd.service.d/udevd-debugging.conf, boot it on azure until it fails and put the journalctl output (probably a few megs) somewhere I can read it? Output from a successful boot would also be interesting.

Revision history for this message
Pat Viafore (patviafore) wrote :

I can get that going, and will report back on this bug when I have more information.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

On another tangent, I wonder if the change that brought this to light was the switch to booting without initrd. The timing is about right, and fits with the fact that it doesn't occur with the generic kernel (which cannot boot without initrd in Azure). So if someone has an excess of time to test <wink>, testing bionic but with initrdless boot and focal with initrdful boot would be interesting. I don't know exactly what runes have to be cast to do this.

Getting the udevd debug output is still higher priority though.

Revision history for this message
Pat Viafore (patviafore) wrote :

I'm attaching the output of a failing build

Revision history for this message
Ryan Harper (raharper) wrote :

Cloud-init service starts and will run growpart, etc

Feb 06 00:37:26 ubuntu systemd[1]: Starting Initial cloud-init job (pre-networking)...
Feb 06 00:37:37 test-xrdpdnvfctsofyygmzan systemd[1]: Starting Initial cloud-init job (metadata service crawler)...

Something has modified sdb1 (growpart/sgdisk) so we see a change event, and removal of symlink
Feb 06 00:37:42 test-xrdpdnvfctsofyygmzan systemd-udevd[265]: sdb1: Device (SEQNUM=3353, ACTION=change) is queued
Feb 06 00:37:42 test-xrdpdnvfctsofyygmzan systemd-udevd[556]: sdb1: Updating old name, '/dev/disk/by-partuuid/5e01dd62-6f50-4cd7-8f62-30bb372b58ea' no longer belonging to '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/00000000-0000-8899-0000-000000000000/host0/target0:0:0/0:0:0:0/block/sdb/sdb1'
Feb 06 00:37:42 test-xrdpdnvfctsofyygmzan systemd-udevd[556]: read: off=2244476928 len=26214
Feb 06 00:37:42 test-xrdpdnvfctsofyygmzan systemd-udevd[556]: sdb1: No reference left, removing '/dev/disk/by-partuuid/5e01dd62-6f50-4cd7-8f62-30bb372b58ea'

cc_growpart attempts to check if the block device has been resized and fails
Feb 06 00:37:42 test-xrdpdnvfctsofyygmzan cloud-init[570]: 2020-02-06 00:37:42,658 - util.py[WARNING]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed

cloud-init now runs resizefs on sdb1
Feb 06 00:37:43 test-xrdpdnvfctsofyygmzan kernel: EXT4-fs (sdb1): resizing filesystem from 548091 to 7836155 blocks
Feb 06 00:37:44 test-xrdpdnvfctsofyygmzan kernel: EXT4-fs (sdb1): resized filesystem to 7836155

About half a minute later, something triggers a rescan and we see the link return (it looks like walinuxagent poking)

Feb 06 00:38:14 test-xrdpdnvfctsofyygmzan systemd-udevd[1482]: sdb1: /usr/lib/udev/rules.d/60-persistent-storage.rules:106 LINK 'disk/by-partuuid/5e01dd62-6f50-4cd7-8f62-30bb372b58ea'
Feb 06 00:38:14 test-xrdpdnvfctsofyygmzan systemd-udevd[1482]: sdb1: Creating symlink '/dev/disk/by-partuuid/5e01dd62-6f50-4cd7-8f62-30bb372b58ea' to '../../sdb1'

Revision history for this message
Pat Viafore (patviafore) wrote :

Adding in a passing version of journalctl

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

> Ah, no I think this might be along right lines: udev is calling blkid
> on the _partition_ of course, so it can probe for filesystem etc without
> looking at the partition table. After it's done that, it does look for
> the partition table so it can read the ID_PART_ENTRY_* values from it,
> but if it fails to load the partition table it just gives up and still
> returns success.

Ah, so this was in the right direction, but not completely right: the failure is not in reading the partition table of the disk being probed, but in figuring out which entry in that table corresponds to the partition being probed:

Feb 06 00:37:42 test-xrdpdnvfctsofyygmzan systemd-udevd[556]: 556: libblkid: LOWPROBE: trying to convert devno 0x811 to partition
Feb 06 00:37:42 test-xrdpdnvfctsofyygmzan systemd-udevd[556]: 556: libblkid: LOWPROBE: searching by offset/size
Feb 06 00:37:42 test-xrdpdnvfctsofyygmzan systemd-udevd[556]: 556: libblkid: LOWPROBE: not found partition for device
Feb 06 00:37:42 test-xrdpdnvfctsofyygmzan systemd-udevd[556]: 556: libblkid: LOWPROBE: parts: end probing for partition entry [nothing]

The function of interest in libblkid here is blkid_partlist_devno_to_partition, which (unless some APIs have very misleading names) reads the offset and size of the partition from sysfs. What must be happening here is that udev sees the disk being closed by gdisk and somehow runs the builtin blkid command on the partition before the kernel has been informed of the resize of the partition. And indeed, we can see when the kernel notices this:

Feb 06 00:37:43 test-xrdpdnvfctsofyygmzan kernel: EXT4-fs (sdb1): resizing filesystem from 548091 to 7836155 blocks
Feb 06 00:37:44 test-xrdpdnvfctsofyygmzan kernel: EXT4-fs (sdb1): resized filesystem to 7836155

At least a second later. (In passing_journalctl_output.txt, this message is printed before udev even gets the inotify event about sdb being closed).

So there's always a race here but I don't know why we only see this on Azure with newer releases. A proper fix would be to get sgdisk to call partx_resize_partition before closing the disk but it would also be interesting to see why it takes so long to get to that part sometimes. growpart is too much shell for me, but maybe it's possible to get it to run partx under strace and get the output of that out somehow?
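
To make the sysfs point concrete, the kernel's view that libblkid matches against can be read directly (a standalone sketch, not udev or libblkid code; the helper name is invented):

    import sys
    from pathlib import Path

    def kernel_geometry(partition="sda1"):
        # The kernel's current idea of the partition, in 512-byte sectors.
        # Until the kernel has been told about the resize (e.g. via
        # 'partx --update'), these values still describe the old geometry,
        # so libblkid's by-offset/size lookup can come up empty.
        base = Path("/sys/class/block") / partition
        start = int((base / "start").read_text())
        size = int((base / "size").read_text())
        return start, size

    if __name__ == "__main__":
        part = sys.argv[1] if len(sys.argv) > 1 else "sda1"
        start, size = kernel_geometry(part)
        print(f"{part}: start={start} size={size} (512-byte sectors)")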

Revision history for this message
Ryan Harper (raharper) wrote :

Yes, this is my read on the issue as well. The trigger is related to the inotify watch that systemd-udevd puts on the disk. Something that might help, and that we could try per xnox's comment, is the use of flock.

If growpart were to flock /dev/sda (we need to sort out what flags are needed to prevent udev probe and rule execution) prior to running sgdisk, and not release this flock until after partx runs (to inform the kernel of the update) and a udevadm settle (and possibly a trigger of a CHANGE event), this should prevent early reads of the modified, but not yet up-to-date, partition data in the kernel.

Lastly, another area of exploration is: why isn't a change event emitted when partx runs to update the kernel with new partition data?

If there is one, and the rules then generate the correct symlink, another approach for growpart would be to block until the size of the new partition is correct. growpart calculates what the new size should be, so it could poll for this value, thus blocking cloud-init's size check from running until growpart is confident that the kernel has the correct value (and the subsequent rules have completed execution).
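
To make the flock idea concrete, a very rough sketch in Python (illustrative only; the real change belongs in the growpart shell script, and the function name here is invented):

    import fcntl
    import os
    import subprocess

    def with_disk_locked(diskdev, edit_partition_table):
        # Hold an exclusive flock on the whole-disk node while the partition
        # table is rewritten and the kernel is updated. systemd-udevd also
        # takes a lock on the device before probing, so it should not process
        # the disk in a half-updated state while we hold this lock.
        fd = os.open(diskdev, os.O_RDONLY)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)
            edit_partition_table()                        # e.g. the sgdisk calls
            subprocess.run(['partx', '--update', diskdev], check=True)
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)
        subprocess.run(['udevadm', 'settle'], check=True)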

Revision history for this message
Ryan Harper (raharper) wrote :

Here's the upstream changes to growpart I'm suggesting:

https://code.launchpad.net/~raharper/cloud-utils/+git/cloud-utils/+merge/379177

I've also proposed modifications to cloud-init's cc_growpart as a further method
to aid debugging if this is hit, as well as some mitigation around the race.

https://github.com/canonical/cloud-init/pull/213

Scott Moser (smoser)
Changed in cloud-utils:
status: New → Fix Committed
Changed in cloud-utils (Ubuntu):
status: New → Confirmed
Revision history for this message
Scott Moser (smoser) wrote :

The fix is in cloud-utils upstream now.
Still to do:
 a.) review/merge cloud-initramfs-tools pull request
   https://code.launchpad.net/~raharper/cloud-utils/+git/cloud-utils/+merge/379177
 b.) upload cloud-initramfs-tools to focal
 c.) upload cloud-utils to focal
 d.) any SRU

The order of 'b' and 'c' is not *terribly* important, because I do not think that there are many users of 'growroot' at this point. That said... the change to cloud-utils *does* break cloud-initramfs-tools, so they should be done in that order.

I personally do not really want to babysit these getting in. Can someone else sign up for that?

Changed in cloud-initramfs-tools (Ubuntu):
status: New → Confirmed
Revision history for this message
Joshua Powers (powersj) wrote :

@smoser

a looks merged

b has not had any changes since 2018 and looks up-to-date in Focal. What am I missing?

c I see a devel branch, is it assumed that it uses a packaging structure similar to cloud-init or can someone just grab and upload master?

Revision history for this message
Scott Moser (smoser) wrote :

a.) I gave the wrong link. ugh. It should have been:
  https://code.launchpad.net/~smoser/cloud-initramfs-tools/+git/cloud-initramfs-tools/+merge/379774

b.) the fixed link to 'a' probably makes more sense now. But basically
you need a newer cloud-initramfs-tools to adjust for the fact that
growpart needs flock now.

cloud-initramfs-tools is, I believe, still a "native package", so it just
has the packaging alongside the source.

c.) you can use new-upstream-snapshot. cloud-utils is set to use the
same basic workflow as cloud-init or curtin.

I'm OK to upload cloud-initramfs-tools and also cloud-utils, but I think
in both cases it makes sense to get someone from server team to do it.
Just to have someone other than myself having done it.

Revision history for this message
Ryan Harper (raharper) wrote :

@Scott,

cloud-utils isn't quite new-upstream-snapshot out of the box; the debian dir does not contain the changelog. However, I think I've got this sorted out. I have an MP I can put up, but it will only show the addition of the changelog file. I'll attach a debdiff and a source package.

Revision history for this message
Ryan Harper (raharper) wrote :
Revision history for this message
Ryan Harper (raharper) wrote :
Revision history for this message
Scott Moser (smoser) wrote :

this seemed to "just work" for me.
http://paste.ubuntu.com/p/93dWDPZfZT/

Revision history for this message
Ryan Harper (raharper) wrote :

On Tue, Feb 25, 2020 at 2:35 PM Scott Moser <email address hidden>
wrote:

> this seemed to "just work" for me.
> http://paste.ubuntu.com/p/93dWDPZfZT/

Ah, I didn't check that there was an existing ubuntu/devel branch. Sorry.

I've pushed a MR here:

https://code.launchpad.net/~raharper/cloud-utils/+git/cloud-utils/+merge/379851


Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-initramfs-tools - 0.45ubuntu1

---------------
cloud-initramfs-tools (0.45ubuntu1) focal; urgency=medium

  * Add dependency on flock for growroot's use of growpart.
    (LP: #1834875)

 -- Scott Moser <email address hidden> Tue, 25 Feb 2020 13:08:22 -0500

Changed in cloud-initramfs-tools (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-utils - 0.31-7-gd99b2d76-0ubuntu1

---------------
cloud-utils (0.31-7-gd99b2d76-0ubuntu1) focal; urgency=medium

  * New upstream snapshot.
    - growpart: add flock support to prevent udev races (LP: #1834875)

 -- Ryan Harper <email address hidden> Tue, 25 Feb 2020 14:41:21 -0600

Changed in cloud-utils (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Can someone from CPC confirm that this has fixed the issues in the Azure tests?

Revision history for this message
Pat Viafore (patviafore) wrote :

Sorry, we were waiting for a bug regarding dictionary keys being modified to be resolved before we could verify this. We have not seen this in the past couple of runs in our testing, so I think we're good to proceed.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Great, thanks for confirming.

I wonder if we should SRU this. You see it on Eoan I presume? The bug is theoretically present in Bionic too aiui but if it never crops up I don't know if it's worth it.

Revision history for this message
Pat Viafore (patviafore) wrote :

Yes, we see it on Eoan and blocking tests, so if you could SRU it there, it'd be much appreciated. We have never seen it on Bionic to my knowledge.

no longer affects: linux-azure (Ubuntu)
no longer affects: systemd (Ubuntu)
Revision history for this message
Markus Schade (lp-markusschade) wrote :

It seems that the changes to cloud-initramfs-growroot have a side effect that was not present before focal.

After initially provisioning an image, the root partition is resized in the initramfs, but cannot be mounted afterwards. A second boot from the now-resized disk then works without problems.

Removing cloud-initramfs-growroot provides a working focal image.

Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... [ 2.261862] Btrfs loaded, crc32c=crc32c-intel
Scanning for Btrfs filesystems
done.
Begin: Will now check root file system ... fsck from util-linux 2.34
[/usr/sbin/fsck.ext4 (1) -- /dev/sda1] fsck.ext4 -a -C0 /dev/sda1
/dev/sda1: clean, 32500/188048 files, 340923/751616 blocks
done.
[ 2.397086] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
done.
Begin: Running /scripts/local-bottom ... [ 2.723549] sda: sda1 sda14 sda15
GROWROOT: CHANGED: partition=1 start=129024 old: size=6012928 end=6141952 new: size=39872479 end=40001503
done.
Begin: Running /scripts/init-bottom ... mount: mounting /dev on /root/dev failed: No such file or directory
mount: mounting /dev on /root/dev failed: No such file or directory
done.
mount: mounting /run on /root/run failed: No such file or directory
BusyBox v1.30.1 (Ubuntu 1:1.30.1-4ubuntu6) multi-call binary.

Revision history for this message
Scott Moser (smoser) wrote :

@Markus,
Please open a bug against cloud-initramfs-growroot and reference it here (and reference this bug there).

Please also provide a full console log.

FWIW, generally speaking, cloud-initramfs-growroot has not been necessary since at least 14.04, quite possibly even before.

description: updated
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Tobias, or anyone else affected,

Accepted cloud-utils into eoan-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-utils/0.31-5-gef42f6b5-0ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-eoan to verification-done-eoan. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-eoan. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in cloud-utils (Ubuntu Eoan):
status: New → Fix Committed
tags: added: verification-needed verification-needed-eoan
Changed in cloud-initramfs-tools (Ubuntu Eoan):
status: New → Fix Committed
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Tobias, or anyone else affected,

Accepted cloud-initramfs-tools into eoan-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-initramfs-tools/0.44ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-eoan to verification-done-eoan. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-eoan. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (cloud-utils/0.31-5-gef42f6b5-0ubuntu1.1)

All autopkgtests for the newly accepted cloud-utils (0.31-5-gef42f6b5-0ubuntu1.1) for eoan have finished running.
The following regressions have been reported in tests triggered by the package:

lxc/3.0.4-0ubuntu1 (i386)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/eoan/update_excuses.html#cloud-utils

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Mathew Hodson (mhodson)
affects: cloud-init → cloud-initramfs-tools
Changed in cloud-initramfs-tools:
status: Incomplete → Fix Released
Revision history for this message
Pat Viafore (patviafore) wrote :

We are seeing some failures still in Eoan (not Focal) with tracebacks involving growpart:

2020-05-29 10:14:29,218 - util.py[DEBUG]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 851, in _run_modules
    freq=freq)
  File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
    return self._runners.run(name, functor, args, freq, clear_on_fail)
  File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
    results = functor(*args)
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 360, in handle
    func=resize_devices, args=(resizer, devices))
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2558, in log_time
    ret = func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 307, in resize_devices
    (old, new) = resizer.resize(disk, ptnum, blockdev)
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 159, in resize
    return (before, get_size(partdev))
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 206, in get_size
    fd = os.open(filename, os.O_RDONLY)
FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-partuuid/465f461c-537a-4d09-acae-cf4469c962c5'

It is still an intermittent failure.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Huh. Is that with cloud-utils 0.31-5-gef42f6b5-0ubuntu1.1 from proposed?

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Anyway given Eoan's imminent demise, maybe we don't care.

Mathew Hodson (mhodson)
Changed in cloud-utils:
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Proposed package removed from archive

The version of cloud-utils in the proposed pocket of Eoan that was purported to fix this bug report has been removed because the bugs that were to be fixed by the upload were not verified in a timely (105 days) fashion.

tags: removed: verification-needed-eoan
Changed in cloud-utils (Ubuntu Eoan):
status: Fix Committed → Won't Fix
tags: removed: verification-needed
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

The version of cloud-initramfs-tools in the proposed pocket of Eoan that was purported to fix this bug report has been removed because the bugs that were to be fixed by the upload were not verified in a timely (105 days) fashion.

Changed in cloud-initramfs-tools (Ubuntu Eoan):
status: Fix Committed → Won't Fix