Comment 75 for bug 1834875

Dimitri John Ledkov (xnox) wrote : Re: [Bug 1834875] Re: cloud-init growpart race with udev

On Thu, 7 Nov 2019 at 20:05, Scott Moser <email address hidden> wrote:
>
> > > So that means we have this sequence of events:
> > > a.) growpart change partition table
> > > b.) growpart call partx
> > > c.) udev created and events being processed
>
> > That is not true. whilst sfdisk is deleting, creating, finishing
> > partition table (a) and partx is called (b), udev events are already fired
> > and running in parallel and may complete against deleted, partially new,
> > completely new partition table, with or without partx completed.
>
> You're correct... I left out some 'events created and handled' after 'a'.
> But that doesn't change anything. The problem we're seeing here is *not*
> that 'b' had any issue.
>
> >
> > No amount of settling for events will fix the fact that events were run
> > against racy state of the partition table _during_ sfdisk and partx calls.
>
> complete non-sense. I dont care about any racy state *during* anything. I
> call 'udevadm settle'. That means "block until stuff is done." I think
> you're saying that I cannot:
> 1.) do something that causes udev events
> 2.) wait until all udev events caused by that something are finished
>
> if that is the case, then nothing ever can fix this, and we might as well
> go find jobs on a farm.
>

Both of those things happen, but udev starts processing events while
the partition table changes have not yet completed. This is
documented in the sfdisk manpage as a known bug that nobody has yet
managed to figure out and derace.
That means that even if the udev events fired, and one waits for their
processing to finish, there is no guarantee that they were processed
against a consistent disk state.

This is why sfdisk recommends taking an flock. And this is why udev also
tries to take an flock.
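
To make the locking idea concrete, here is a minimal sketch in Python of
holding an exclusive flock on a path while work is done against it. The
helper names exclusive_lock and is_locked are hypothetical and only for
illustration. The sketch assumes that other cooperating processes (udev,
for example) also take an flock on the same device node before touching
it. It is not how growpart or cloud-init currently behave.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def exclusive_lock(path):
    """Hold an exclusive flock(2) on `path` for the duration of the block.

    Hypothetical helper: `path` would be the whole-disk device node.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Blocks until no other open file description holds a lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)


def is_locked(path):
    """Return True if someone else currently holds a conflicting flock."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Non-blocking attempt: fails at once if the lock is already held.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True
    else:
        return False
    finally:
        os.close(fd)
```

Under those assumptions, the sfdisk and partx calls would go inside a
"with exclusive_lock(device):" block, and 'udevadm settle' would be
called after the block exits, so that events are handled against the
finished partition table rather than an intermediate one.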

IBM demonstrated a race similar to this one in April 2016, on Xenial, in
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1571707
where they tried to partition 256 devices rapidly and in parallel.
Only 89 of them showed partitions after the limit test had run; the
partitions appeared fully only after a reboot.

--
Regards,

Dimitri.