silently breaking raid: root raid_members opened as luks

Bug #531240 reported by ceg
This bug affects 6 people

Affects                     Status        Importance  Assigned to
Release Notes for Ubuntu    Invalid       Undecided   Unassigned
cryptsetup (Ubuntu)         Fix Released  Wishlist    Unassigned
util-linux (Ubuntu)         Invalid       Low         Unassigned

Bug Description

Note:
When using luks encryption on top of software raid devices, the setup can eventually break because linux_raid_member devices get opened directly as luks instead of being assembled into md devices (Bug #531240), and all of this happens silently because mdadm monitoring is not set up (Bug #491443). Additionally, luks-on-raid root filesystems cannot boot if the array is degraded (Bug #488317). It is advisable not to rely on luks on raid until a fix is released.

----

After a member was opened as a luks device, the system booted from it instead of from the md device, while the raid remained "inactive". I don't know what triggered this, but it happened repeatedly after a couple of reboots. It may depend on the order of device enumeration on boot (effectively random assembly).

I only noticed this by chance, because /proc/mdstat reported the root raid as inactive (although the system seemed to run fine!).

Looking further, it seems that during boot the system unlocked and mounted the rootfs directly from (only) one raid_member (located on an external usb disk). ("dmsetup deps" pointed to it for mdX_crypt.)

"sudo blkid" now also reported what actually is an USB "linux_raid_member" as TYPE="crypto_LUKS".

After booting into a rescue system and reassembling the array, everything seemed back to normal (including the blkid output), until this bug hit again a couple of reboots/days later.
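
A quick way to check whether a running system is currently affected, sketched with the mapping name md3_crypt used later in this report (adjust to your own crypt target):

  cat /proc/mdstat            # an array listed as "inactive" with a single (S)pare member is suspect
  sudo dmsetup deps md3_crypt # lists the underlying device(s), as major:minor, backing the crypt mapping
  sudo blkid                  # raid members should show TYPE="linux_raid_member", not "crypto_LUKS"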

A more reliable way to replicate:

Boot the luks-on-raid1 system with the alternate CD's rescue mode, and it will want to open all raid member devices directly as luks.

---

@util-linux
I found the following in the util-linux changelog:
  * Always return encrypted block devices as the first detected encryption
     system (ie. LUKS, since that's the only one) rather than probing for
     additional metadata and returning an ambivalent result. LP: #428435.
Might that cause luks to take precedence over raid_member when reporting the usb disk?

@cryptsetup
Also "cryptsetup isLuks" gives a false positve. "cryptsetup isLuks <raid member device with luks on it>" returns $?=0 (success/true) That is the reason why the initramfs check implemented in scripts/local-top/cryptroot does not prevent this bug from having its bad effects.

Revision history for this message
ceg (ceg) wrote :

A (more or less wild) guess why this misreporting may not have surfaced with any bad effects before:

After booting and reassembling the array manually, I noticed the usb disk failed after a while and was dropped from the array, so it has become unreliable.

So maybe it is this condition on boot (the usb disk was dropped from the array) where the array is not yet running (possibly waiting for the disk, which now has a different device name) and cryptsetup grabs the raid_member instead of waiting for the array.

Revision history for this message
ceg (ceg) wrote :

New observation: when booting the alternate CD into rescue mode on a luks-on-raid system, it asks for passphrases for the raid members instead of for the md device.

Problem: cryptsetup (being boot-script driven rather than udev driven, and looking for luks headers on its own?) really needs to be hooked into the event-driven startup properly.

The md devices may usually come up quickly enough that the member devices are already in use when cryptsetup wants to grab them. But cryptsetup will grab luks-on-raid members if the mds are not set up at all or a member got marked faulty.

I don't know whether blkid reports raid_members as luks only after cryptsetup has opened them, or already before.

ceg (ceg)
summary: - blkid reports root raid_member (on usb) as luks, which is booted while
- raid remains "inactive"
+ breaking raid: root raid_member opened as luks
ceg (ceg)
description: updated
Revision history for this message
Steve Langasek (vorlon) wrote : Re: breaking raid: root raid_member opened as luks

cryptsetup is event-driven in lucid.

Changed in cryptsetup (Ubuntu):
status: New → Invalid
Revision history for this message
ceg (ceg) wrote :

>cryptsetup is event-driven in lucid.

good news!
including in initramfs?

Can you tell if "cryptsetup isLuks" correctly reports false for luks on raid members?

(Or with what test command can I tell? This gives me nothing:
root@localhost:~# cryptsetup isLuks /dev/hda7
root@localhost:~#
)

Revision history for this message
Steve Langasek (vorlon) wrote :

> including in initramfs?

The initramfs is always event driven. It waits for the physical device to be available, decrypts it as needed, then mounts it. No other event handling is required or appropriate.

> can you tell if "cryptsetup isLuks" correctly reports false for luks on raid members?

Sorry, I don't know.

Revision history for this message
ceg (ceg) wrote :

>[cryptsetup] waits for the physical device to be available, decrypts it as needed, then mounts it. No other event handling is required or appropriate.

Think of the following example.

As far as I can see, cryptsetup in the initramfs is not called on the event that a crypt device appears. It seems cryptsetup in the initramfs is currently rather linear, script driven: the cryptsetup script has its own while loop waiting for $cryptsource after all other "local-top" scripts. "Failure hooks" have been introduced in initramfs and mdadm, but I don't see one for cryptsetup. And I have doubts that such a two-step design can cope with the general case of devices depending on others.

The simple case: rootfs on lvm on crypt on raid:

0) The md0 raid (sda,sdb) got degraded during power down,
1) udev/mdadm does not start the array,
2) crypt on raid does not come up for 2.5 minutes until the ROOTDELAY timeout, and init-top fails (putting aside this bug, where cryptsetup wrongly opens a member device),
3) the mdadm failure hook runs the array degraded,
4) the boot currently fails nevertheless (but could be made to work with a cryptsetup failure hook for this case).

If, however, the rootfs is actually located on md1, assembled from md0 (the internal disks) and sdc, then with the failure-hook design there is no further timeout or failure hook available after md0 is started degraded, to bring the rootfs up if sdc (the external backup disk) is not connected.

(This is not far-fetched, because only stacking raids this way allows taking advantage of write-intent bitmaps for a (backup) disk that is not connected all the time (sdc in this case).)

So to handle the general case with unforeseen combinations, I think in the initramfs:

- cryptsetup udev rules should be supplied in the initramfs as well (new_crypt_device event, restricted to the rootfs dependency devices).
- The initramfs should have just one ROOTDELAY waiting loop in its script (or a faster-loading upstart/mountall binary?) started upon initramfs_start, which is however paused while cryptsetup is prompting for a passphrase (prompt_start/stop events).
- The mdadm package needs to supply a MIN_COMPLETION_WAIT value and the dependency tree of arrays for the root device at mkinitramfs time.
- During boot, when time_elapsed == MIN_COMPLETION_WAIT (raid_start_degraded event):
   - If a next level in the dependency tree exists and the remaining root delay timer is lower than MIN_COMPLETION_WAIT, the rootdelay_timer is increased by MIN_COMPLETION_WAIT.
   - The degraded arrays of the current dependency level are started degraded.

Related cryptsetup bugs:
Bug #251164 boot impossible due to missing initramfs failure hook integration
Bug #247153 encrypted root initialisation races/fails on hotplug devices (does not wait)

To fix this bug, however, cryptsetup must not open individual raid members directly.
I can run "cryptsetup isLuks" on such a raid member but don't know how to get to the response code (true/false?).
:~# cryptsetup isLuks /dev/hda7
:~# (empty prompt with no visible output)

Revision history for this message
ceg (ceg) wrote :

Err, it happened again: redundancy in the system is destroyed, cryptsetup opened a raid member and it got mounted, leaving the raid incomplete/inactive:

md3 : inactive sda7[0](S)

blkid (now again) returns TYPE="crypto_LUKS" for the raid member.

When I run "sudo cryptsetup isLuks" on raid members or on another (active) real luks md device, "echo $?" returns 0. For a vfat partition, "echo $?" returns 234.

I see this bug set to Invalid for cryptsetup, but as cryptsetup is determining $cryptsource (the devices it waits for to appear) on its own, and isLuks wrongly reports raid members as being luks, I think this is a valid cryptsetup bug. (Even if I don't understand what triggers the raid member to be opened and mounted.)

Changed in cryptsetup (Ubuntu):
status: Invalid → New
Revision history for this message
Steve Langasek (vorlon) wrote :

> blkid (now again) returns TYPE="crypto_LUKS" for the raid member.

> I see this Bug set to invalid for cryptsetup, but as cryptsetup is
> determining $cryptsource (devices it waits for to appear) on its own,
> and isLuks wrongly reports raid members as being luks, I think this this
> is valid cryptsetup bug.

No, it's still a bug in blkid. It's blkid that wrongly detects the LUKS UUID on a device that's a RAID member.

Changed in cryptsetup (Ubuntu):
status: New → Invalid
Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

Steve: why is this a bug in blkid? You yourself proposed the patch that whenever LUKS metadata is detected, it *always* returns LUKS.

If we drop that patch, we'll have all the old bugs back again when people added LUKS to a partition without clearing what was there before.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

ceg: I'm a little confused as to the devices involved here. Could you provide the uncensored output of running "sudo blkid" on your system, then point out to me which line you think is wrong?

Changed in util-linux (Ubuntu):
status: New → Incomplete
importance: Undecided → Low
Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 531240] Re: breaking raid: root raid_member opened as luks

On Tue, Mar 09, 2010 at 10:53:29PM -0000, Scott James Remnant wrote:
> Steve: why is this a bug in blkid? You yourself proposed the patch that
> whenever LUKS metadata is detected, it *always* returns LUKS.

Scott, AIUI this is a case where a given partition has both LUKS and RAID
signatures, and the LUKS signature is given precedence. I believe this is a
specific corner case that we overlooked in the previous patch, and that RAID
signatures need to be given precedence over LUKS.

For the general case of filesystem types, it still holds that LUKS should
take precedence; this particular bug only arises because both RAID and LUKS
are treated specially in blkid (on the grounds that both require manual
configuration to be activated) but the order of precedence is wrong.


Revision history for this message
ceg (ceg) wrote : Re: breaking raid: root raid_member opened as luks

> it's still a bug in blkid. It's blkid that wrongly detects the LUKS UUID on a device that's a RAID member.

Yes, you're of course right, and if "cryptsetup isLuks" is itself scanning for a luks signature (without giving RAID precedence) it has the same bug. Can you rule that out / know for sure that cryptsetup depends on or uses blkid? (The cryptsetup package does not depend on the util-linux package.)

> Could you provide...

Yes (as it happened again just now), I can provide the output, with UUIDs shortened:

#blkid
/dev/sda1: UUID="de" TYPE="ntfs"
/dev/sda2: UUID="a0" TYPE="ext2"
/dev/sda3: UUID="a1" TYPE="linux_raid_member"
/dev/sda5: UUID="3a" TYPE="linux_raid_member"
/dev/sda7: UUID="19" TYPE="linux_raid_member"
/dev/md1: UUID="f1" TYPE="linux_raid_member"
/dev/md0: UUID="b1" TYPE="crypto_LUKS"
/dev/md2: UUID="df" TYPE="crypto_LUKS"
/dev/sdb1: UUID="fd" TYPE="ext2"
/dev/sdb2: UUID="A9" TYPE="vfat"
/dev/sdb3: UUID="a1" TYPE="linux_raid_member"
/dev/sdb5: UUID="f1" TYPE="linux_raid_member"
/dev/sdb6: LABEL="boot" UUID="df" TYPE="ext2"
/dev/sdb7: UUID="11" TYPE="crypto_LUKS"
/dev/mapper/md3_crypt: UUID="aL" TYPE="LVM2_member"
/dev/mapper/vg0-lv_swap: UUID="4b" TYPE="swap"
/dev/mapper/vg0-lv_root: UUID="66" TYPE="ext4"
/dev/mapper/md2_crypt: UUID="09" TYPE="ext4"

# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sdb5[1] md1[0]
      17719488 blocks [2/2] [UU]
      bitmap: 3/136 pages [12KB], 64KB chunk

md0 : active raid1 sdb3[0] sda3[1]
      78123968 blocks [2/2] [UU]
      bitmap: 0/150 pages [0KB], 256KB chunk

md1 : active raid1 sda5[0]
      17719552 blocks [2/1] [U_]
      bitmap: 80/136 pages [320KB], 64KB chunk

md3 : inactive sda7[0](S)
      19334080 blocks

sdb7 should be the second member of md3 (/home fs) and have the same RAID UUID as sda7 "19". Instead sdb7 is reported with "11", the UUID of the luks signature of md3 (stored on sda7 and sdb7 of course).
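
To cross-check which signatures are actually present on disk, independently of blkid's precedence rules, something like the following can be run against a member partition (a sketch, read-only):

  sudo mdadm --examine /dev/sdb7      # prints the md superblock, if one is present
  sudo cryptsetup luksDump /dev/sdb7  # prints the LUKS header, if one is present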

> this is a case where a given partition has both LUKS and RAID

Yes, always give RAID and LUKS signatures preference over filesystem signatures because they are containers, and you can give RAID preference over LUKS because it is a container for LUKS; when it is set up/meant the other way around (RAID member contained in a LUKS partition), the RAID signature is not visible.

AIUI this doesn't have much to do with requiring configuration to be activated. Redundant raids should never need user intervention to be run, even degraded after a timeout, and luks can be supplied with a keyfile that becomes available by opening another (possibly unencrypted) partition, or from a card reader or fingerprint reader that may become available. (As you see, there are lots of other events that the initramfs has to handle for the general case, so it may pay off to use the same, known upstart/mountall mechanisms in the initramfs, too.)

Revision history for this message
ceg (ceg) wrote :

Let's think some more about signature rules.

Crypt vs Container
-> any crypt should rank lower than other containers (not fs), because whatever is inside it wouldn't be visible or is random.

Crypt vs FS
<- old fs signatures may have remained.

RAID vs LVM-LV
<- LVs never reside on RAIDs directly (always on PVs).
<- A RAID should never contain an LV directly, so it's a new raid member.

RAID vs LVM-PV
<- A RAID cannot be assembled from PVs, only from LVs.
<- A RAID can contain a PV.

(Only implement rules as far as they are understandable, never a generalization without a general rule.)

Upstream seems to generally refrain from prioritizing. There is of course the basic rule to return -1 (no type) if there are conflicts and no resolving rule applies.
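
For reference, the low-level prober can be asked directly what it sees, which makes conflicts like the above visible; a sketch (wipefs ships with util-linux, availability depends on the installed version):

  sudo blkid -p /dev/sda7   # probe the device itself, bypassing the blkid cache
  sudo wipefs /dev/sda7     # list all detected signatures and their offsets (read-only without -a)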

ceg (ceg)
description: updated
Changed in util-linux (Ubuntu):
status: Incomplete → Confirmed
Changed in cryptsetup (Ubuntu):
status: Invalid → New
Revision history for this message
ceg (ceg) wrote :

My current conception: (WARNING)

Both blkid and "cryptsetup isLuks" misreport a raid_member as luks when checking devices, but blkid correctly reports RAID if the device is set up (mounted/active) in the system as intended (after install/update/recover).

On boot what happens depends on the order in which the partitions show up.
In my case boot works ok once or twice before one raid member gets luksOpen'ed directly while the other remains an inactive spare.

This is present at least in all systems that use luks on top of raid devices. And in those cases it is very problematic, because on boot individual raid members get mounted randomly, possibly in between raid resyncs, leading to raid desyncing and spurious automatic (rollback) recoveries with data loss.

Because Ubuntu ships raid monitoring support that is broken, no notifications about raid status changes are shown to users/admins. Bug #535417

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 531240] Re: breaking raid: root raid_member opened as luks

On Wed, 2010-03-10 at 11:34 +0000, ceg wrote:

> Upstream seems to generally refrain from prioritizing. There is of
> course the basic rule to return -1 (no type) if there are conficts, and
> no resolving rule applies.
>
Err, Ubuntu ships unmodified upstream code here (to which we are a major
contributor) -- these same rules are used by all distributions.

Scott

Revision history for this message
ceg (ceg) wrote : Re: breaking raid: root raid_member opened as luks

Cool, I didn't know that from the thread my search turned up about blkid. (It increases the bug's weight, though.)

Revision history for this message
Steve Langasek (vorlon) wrote :

> Yes, your of course right, and if "cryptsetup isLuks" is itself scanning
> for a luks signature (without giving RAID precedence) it got the same
> bug.

Well, we can consider it a wishlist bug that 'cryptsetup isLuks' returns true in this case; fixing that isn't actually relevant to resolving the problem you're having.

> Can you rule that out / know for sure that cryptsetup depends
> on/uses blkid? (the cryptsetup package does not depend on util-linux
> package)

cryptsetup is never *invoked* unless blkid first returns the UUID it's looking for.
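
For reference, that UUID lookup can be reproduced by hand; a sketch with a placeholder UUID:

  sudo blkid -U <uuid-from-crypttab>  # prints the device node blkid maps that UUID to
  ls -l /dev/disk/by-uuid/            # udev symlinks populated from the same probe results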

Changed in cryptsetup (Ubuntu):
importance: Undecided → Wishlist
Revision history for this message
ceg (ceg) wrote :

concerning cryptsetup "wishlist":

Precautions like this show the level of safety and quality in implementing basic OS operations. If cryptsetup checked the given device before opening it, this data loss would not occur now, or at any later time when blkid, an admin, or another script makes an error.

It's just a wrong default to allow accessing individual raid members (it's a dangerous operation).

i.e. "mount" returns "unknown filesytem type linux_raid_member“ for filesystems on raid members and will only mount it given the -t option, cryptsetup (or the kernel even?) should refuse by default, too, and maybe allow opening a "cypt on raid member" directly only with --force.

ceg (ceg)
description: updated
Revision history for this message
ceg (ceg) wrote :

Finally found that "cryptsetup isLuks" is actually used in scripts/local-top/cryptroot, line 236:
 if /sbin/cryptsetup isLuks $cryptsource > /dev/null 2>&1; then

Please remember to have the boot scripts output messages about what they are doing and their results to the (hidden) text console. Not only would this issue have shown up much earlier, it would have been a piece of cake to debug.

ceg (ceg)
description: updated
Revision history for this message
ceg (ceg) wrote :

> Well, we can consider it a wishlist bug that 'cryptsetup isLuks' returns true in this case;
> fixing that isn't actually relevant to resolving the problem you're having.

As the cryptsetup init script checks the device with "cryptsetup isLuks", fixing that check would actually stop the raid desyncing (i.e. fix the problem).

Even if blkid or some confused user continues to throw wrong/arbitrary UUIDs at it.

So IMHO it's a real bug for cryptsetup upstream.

Revision history for this message
Steve Langasek (vorlon) wrote :

No, you're looking at it backwards. cryptsetup should not have to carry any embedded information about what identifies a RAID container, that's the responsibility of blkid. The bug is that the cryptsetup job is *being called at all* for this block device.

Revision history for this message
ceg (ceg) wrote :

Agreed.

Cryptsetup can just use blkid (if available) as a safety check that the user, or a buggy/incorrect script of theirs, is not trying to open a RAID-type device.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

My understanding of this problem:

blkid currently, deliberately, returns only the first detected RAID or LUKS container for a filesystem - with the checking order being raid first.

This means if a partition has both a RAID and LUKS signature, it will always be returned as RAID.

You request that we invert this, so that a partition that has both a RAID and LUKS signature is now returned as LUKS.

My concern here is that in the case where we added LUKS to that break-out rule, we were only changing a "no answer" result to one with a probably valid (or at least not dangerously invalid) answer.

The change you're asking for is *changing* an existing answer.

Why is it impossible that there are systems out there whose RAID devices have left-over LUKS metadata on them?

Changed in util-linux (Ubuntu):
status: Confirmed → Triaged
status: Triaged → Incomplete
Revision history for this message
ceg (ceg) wrote :

> blkid currently, deliberately, returns only the first detected RAID or LUKS container
> for a filesystem - with the checking order being raid first.

Did someone change it already?

The issue I reported is actually the other way around. A RAID member device partition (version 0.90 metadata, located at the end of the device), where the assembled raid contained LUKS, was identified as being LUKS, leading to a random raid member being opened and mounted instead of the then-degraded raid device (corrupting the raid's integrity).

AFAIK:
blkid was patched more recently to give LUKS priority over fs signatures, because cryptsetup failed to wipe existing metadata, at least in the past (in addition to giving RAID priority over FSs). But with this change, LUKS was also (erroneously) given priority over RAID, leading to the issue at hand.

I think RAID must have priority not only over FSs but also over LUKS.

Because RAID should regularly be able to contain LUKS, and in that case both sets of metadata will be there; but if LUKS really contains RAID, the RAID metadata would normally be encrypted.

Now, if the device really is LUKS and there is leftover RAID metadata present (is cryptsetup actually wiping the end of devices, or only filesystem metadata at the beginning?), there is the following risk: if different luks partitions are created on raid1 partitions marked clean and consistent, they would get assembled into an inconsistent array.

If it is possible for cryptsetup to open such a device (if the chunk size is larger than the metadata?), cryptsetup would have to safeguard against opening it.

Two things to check with cryptsetup: does it wipe the end of devices? Would it open an inconsistent raid?
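
For reference, leftover 0.90 metadata (which sits in the last 64-128 KiB of the member device) can be checked for and removed explicitly once a partition is no longer meant to be a raid member; a sketch, and the second command is destructive by design:

  sudo mdadm --examine /dev/sdb7          # confirm a stale superblock really is there first
  sudo mdadm --zero-superblock /dev/sdb7  # then erase it, so blkid/mdadm stop seeing a raid member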

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 531240] Re: breaking raid: root raid_member opened as luks

On Thu, 2010-04-01 at 21:18 +0000, ceg wrote:

> > blkid currently, deliberately, returns only the first detected RAID or LUKS container
> > for a filesystem - with the checking order being raid first.
>
> Did someone change it already?
>
Not that I know of. blkid has been refactored, but the order looks the
same to me.

> The issue I reported is actually the other way around. A RAID member
> device partition (version 0.90 metadata, located at the end of the
> device) where the assembled raid contained LUKS was identified as being
> LUKS. Leading to a random raid member being opened and mounted instead
> of the then degraded raid device (corrupting of the raid integrity).
>
My understanding of the code is that (with 2.17.2 anyway) this will
return RAID.

Have you tried this with current lucid?

Scott

Revision history for this message
ceg (ceg) wrote : Re: breaking raid: root raid_member opened as luks

Yeah, it was a production karmic machine where this got in with update to 2.16-1ubuntu5.

I'm sorry, I noticed I only mentioned that in the issue that led to adding this behaviour.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 531240] Re: breaking raid: root raid_member opened as luks

On Thu, 2010-04-01 at 22:59 +0000, ceg wrote:

> Yeah, it was a production karmic machine where this got in with update
> to 2.16-1ubuntu5.
>
So you *haven't* tried this with lucid?

Scott

Revision history for this message
ceg (ceg) wrote : Re: breaking raid: root raid_member opened as luks

Can you confirm the behavior for 2.16-1ubuntu5?

Else:
As I reported, the type was correctly reported as raid member upon (re)setting up the array (while it was mounted). But on one of the subsequent reboots things got changed around, and then blkid reported the luks type (while it was mounted as such). Maybe some other part involved is the cause? What identifies the UUID in the initramfs for cryptsetup to open on boot?
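
For reference, the device cryptsetup will open at boot is recorded inside the initramfs itself; a sketch for inspecting it (the conf/conf.d/cryptroot path is taken from the Debian/Ubuntu cryptsetup hook of that era and may differ between releases):

  TMP=$(mktemp -d) && cd "$TMP"
  zcat /boot/initrd.img-$(uname -r) | cpio -id --quiet  # unpack the running kernel's initramfs
  cat conf/conf.d/cryptroot                             # e.g. target=md3_crypt,source=UUID=...,key=none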

Changed in util-linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
ceg (ceg) wrote :

Sorry, I was unclear: I did not try raid with lucid.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

The lucid util-linux code looks right to me; it should report RAID over LUKS since RAID is higher in the probe list, and the first successful probe is returned.

If you can reproduce this problem on lucid, please re-open this bug.

Changed in util-linux (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
ceg (ceg) wrote :

I only have one of the luks-on-raid member drives left to examine (I reinstalled the system, since I have no use for data-corrupting raid setups and do not have unlimited disks available), but I booted the lucid beta1 CD and looked at that drive.

The blkid output looks just the same as on 9.10 to me:

/dev/sdd1: UUID="fd" TYPE="ext2"
/dev/sdd2: UUID="A9" TYPE="vfat"
/dev/sdd3: UUID="a1" TYPE="linux_raid_member"
/dev/sdd5: UUID="f1" TYPE="linux_raid_member"
/dev/sdd6: LABEL="boot" UUID="df" TYPE="ext2"
/dev/sdd7: UUID="11" TYPE="crypto_LUKS" <- this was set up as a raid member and got messed up

Running cfdisk gives *conflicting* results and looks even worse:

                              cfdisk (util-linux-ng 2.17)

                                 Disk Drive: /dev/sdc
                          Size: 120060444672 bytes, 120.0 GB
                 Heads: 255 Sectors per Track: 63 Cylinders: 14596

    Name    Flags   Part Type   FS Type        [Label]   Size (MB)
 -------------------------------------------------------------------------------------
    sdc1    Boot    Primary     ext2                          8.23
    sdc2            Primary     vfat                       1998.75
    sdc3            Primary     linux_raid_m              79999.08
    sdc5            Logical     crypto_LUKS               18144.97
    sdc6            Logical     ext2           [boot]       148.06
    sdc7            Logical     crypto_LUKS               19757.13

Changed in util-linux (Ubuntu):
status: Fix Released → Confirmed
Revision history for this message
ceg (ceg) wrote :

(The sdc/sdd change was due to me disconnecting the USB drive in between.)

Revision history for this message
ceg (ceg) wrote :

At the same time "fdisk -l" output looks OK:

Disk /dev/sdc: 120.1 GB, 120060444672 bytes
255 heads, 63 sectors/track, 14596 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical / optimal IO): 512 bytes / 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1   *            1           1        8001   83  Linux
/dev/sdc2                2         244    1951897+    b  W95 FAT32
/dev/sdc3              245        9970    78124095   fd  Linux raid autodetect
/dev/sdc4             9971       14596    37158345    5  Extended
/dev/sdc5             9971       12176   17719663+   fd  Linux raid autodetect
/dev/sdc6            12177       12194     144553+   83  Linux
/dev/sdc7            12195       14596   19294033+   fd  Linux raid autodetect

Revision history for this message
ceg (ceg) wrote :

Back on 9.10: cfdisk and fdisk see those partitions as raid and only blkid gives luks (just as reported)

Revision history for this message
ceg (ceg) wrote :

Are newer blkid and cfdisk still probing the end of devices (older metadata)?

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 531240] Re: breaking raid: root raid_member opened as luks

On Sat, 2010-04-03 at 14:07 +0000, ceg wrote:

> I only have one of the luks on raid member drives left to examine (I
> reinstalled the system since I have no use for data corrupting raid
> setups and not unlimited disks available.), but I booted lucid beta1 CD
> and looked at that drive.
>
Could you attach the first and last ~1MB of this drive - and let me know
how large it is - this should allow me to trace through blkid to see why
it's not working as intended.

 status incomplete

Scott

Changed in cryptsetup (Ubuntu):
status: New → Incomplete
Revision history for this message
ceg (ceg) wrote : Re: breaking raid: root raid_member opened as luks

I got the beginning with
dd if=/dev/sdc7 of=raid-part-head bs=1M count=1

The output of fdisk etc., with various units and "block sizes" but never stating the actual unit sizes, is a mess.

#blockdev --report /dev/sdc7
RO RA SSZ BSZ StartSec Size Device
rw 256 512 512 195896673 19757090304 /dev/sdc7

#blockdev --getsize /dev/sdc7
38588067

Finally, this hopefully got me the last 1MB of the partition.

#dd if=/dev/sdc7 of=raid-part-tail bs=512 skip=38586019
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB)
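
For reference, the skip value can be derived directly from blockdev instead of by hand; a sketch using the same device:

  DEV=/dev/sdc7
  SECTORS=$(blockdev --getsize "$DEV")                            # device size in 512-byte sectors
  dd if="$DEV" of=raid-part-tail bs=512 skip=$((SECTORS - 2048))  # exactly the last 1 MiB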

Note:
I noticed the fdisk output added a "+" to the size column of the partition (whatever that means), it being an odd number, and found these "Notes_on_dd_and_Odd_Sized_Disks" (last sector missing with linux):

http://74.125.77.132/search?q=cache:wxd4K1L8Bc4J:www.cftt.nist.gov/Notes_on_dd_and_Odd_Sized_Disks4.doc+dd+copy+last+sector&cd=7&hl=de&ct=clnk&gl=de

Revision history for this message
ceg (ceg) wrote :

Does cfdisk (in lucid) deliberately show blkid output instead of the types given in the partition table? If so, would it make sense to show that additionally, not exclusively?

ceg (ceg)
Changed in cryptsetup (Ubuntu):
status: Incomplete → New
Revision history for this message
ceg (ceg) wrote :

Has anybody tested whether karmic-to-lucid upgrades work with raid_members wrongly identified as luks?

Is there an Ubuntu VM farm/cloud where test cases and bug replications can be run?

ceg (ceg)
description: updated
description: updated
Revision history for this message
Steve Langasek (vorlon) wrote :

I don't see anything to release note here. Scott has said the code appears to do the right thing in lucid; the only counterexample given is of a disk that has had its metadata irreversibly *altered* by a previous version of util-linux in karmic. So we don't even really have a confirmed bug here to be documented.

Changed in ubuntu-release-notes:
status: New → Invalid
Revision history for this message
ceg (ceg) wrote :

I can't tell if the metadata has been altered. Scott, do the dd files contain data that looks altered or irregular in any way?

Revision history for this message
ceg (ceg) wrote :

Just installed lucid beta2-alternate with luks on raid in virtualbox.
Guess what: all the turning-down of this seems to be based on false speculation.

If I boot the CD again in rescue mode after the install:
* It wants to open all md member devices as luks and asks for a passphrase for each member.
* It does not seem to detect, nor be able to open, the actual luks on the md devices.

Changed in ubuntu-release-notes:
status: Invalid → New
description: updated
ceg (ceg)
summary: - breaking raid: root raid_member opened as luks
+ silently breaking raid: root raid_members opened as luks
Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 531240] Re: breaking raid: root raid_member opened as luks

On Wed, 2010-04-21 at 11:52 +0000, ceg wrote:

> I can't tell if the metadata has been altered. Scott do the dd files
> contain data that looks altered or irregular in any way?
>
I wouldn't know how to tell that.

Scott

ceg (ceg)
description: updated
ceg (ceg)
description: updated
Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

ceg was able to provide me with a copy of the front and back 1MB of his block device. Having carefully looked through it, I cannot see what this bug is about.

The block device given to me is LUKS encrypted with no RAID metadata.

Changed in util-linux (Ubuntu):
status: Confirmed → Invalid
Steve Langasek (vorlon)
Changed in ubuntu-release-notes:
status: New → Invalid
Revision history for this message
ceg (ceg) wrote :

Thank you for helping to track this down.

So the superblock on the member that got mis-opened appears to have been overwritten. Maybe by writing to / using the opened luks device, which assumed the full partition was used by luks, without respecting the space reserved for the raid superblock at the end of the disk.

How can opening a member happen? The cryptsetup hook tries to identify the luks dependencies of the rootfs.
http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/lucid/cryptsetup/lucid/annotate/head%3A/debian/initramfs/cryptroot-hook

Maybe cryptsetup selects a raid member as the source device, or the right luks UUID is identified but gets tested against the raid member and incorrectly matches ("cryptsetup isLuks" does not recognize that it really is a raid member, not a luks device).

Revision history for this message
ceg (ceg) wrote :

Importance was set to "wishlist" for cryptsetup, could you please revisit that with the new findings in mind?

Revision history for this message
Steve Langasek (vorlon) wrote :

No. It's still a wishlist request on cryptsetup for the reasons already stated, and the util-linux bug does not exist in lucid.

Revision history for this message
ceg (ceg) wrote :

I would not think that blkid (util-linux) has altered the metadata on disk.

As it got altered, blkid may actually never have (occasionally) misreported anything to cause the behaviour. (The util-linux bug may not have existed in karmic or lucid.) I have seen blkid report raid correctly after installation and after rebuilding the array; then things repeatedly got messed up and it reported luks (correctly, as we have to assume now).

Can we be sure that blkid ever had a time when it would actually (occasionally) misreport "luks on raid" as luks in karmic?

If not, this mostly points all the possible causes at cryptsetup.

The question would then not only be whether "cryptsetup isLuks" could correctly check for luks devices to prevent any mis-opening (possibly by simply checking with blkid prior to probing the luks metadata). Cryptsetup's initramfs scripts may themselves select/detect/open a wrong device on occasion (at mkinitramfs or boot time).

Revision history for this message
ceg (ceg) wrote :

I'd suggest making "cryptsetup isLuks" check with blkid prior to probing the luks header (and having it report errors), as the next step in finding/fixing the root cause and re-establishing trust in using luks on raid.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 531240] Re: silently breaking raid: root raid_members opened as luks

On Tue, 2010-04-27 at 11:21 +0000, ceg wrote:

> Can we be sure that blkid ever had a time when it would actually
> (occasionally) misreport "luks on raid" as luks in karmic?
>
I don't think so; reading the code history, blkid has always "tried" to
report RAID before LUKS - however this may have been broken, of course.

But my hunch would be that blkid has generally worked.

Scott

Revision history for this message
ceg (ceg) wrote :

Just booted a 10.04 machine with "root on lvm on luks on raid" with an incomplete array.

It failed to degrade the array and boot, of course, because the mdadm package has been broken and unmaintained for years in Ubuntu, but what is relevant here is the cryptsetup prompt that showed up before failing, instead of asking for the password:

cryptsetup: lvm device name (/dev/disk/by-uuid/...) does not begin with /dev/mapper/

Boy, that was close! It seems like cryptsetup was about to open a root raid_member as luks.
I am glad somebody introduced at least that check and that it was able to prevent breaking the raid array,
despite you guys denying that cryptsetup checks would be relevant.

Yet this shows there is still a fundamental problem, in the long-term stable release, with raid_members being detected as the devices they contain.

Revision history for this message
Surbhi Palande (csurbhi) wrote :

@ceg, won't the UUID for the raid array be different from that of the individual raid member device?

Revision history for this message
ceg (ceg) wrote :

The UUID of the filesystem created on a raid mirror is of course present and identical on every member device of the raid mirror.

The raid member device superblocks are probably also all tagged with (another) UUID of the raid device they assemble into, plus the device ID. Mdadm seems to handle that correctly; it's something else that wants to mount the filesystem directly from a member device instead of (waiting for) the raid device.

Revision history for this message
alfonso (alfonso-fiore) wrote :

Hi,

I'm using LUKS over a RAID6 array created with mdadm...

does this bug still exist?

When using luks encryption on top of software raid devices, it can eventually break because linux_raid_member devices get opened directly as luks instead of being assembled into md devices (Bug #531240), and all this happens silently because mdadm monitoring is not set up (Bug #491443).

Does it mean that if I manage to set up emails, the system will tell me when something is going wrong? What kind of message should I get?

thank you!

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cryptsetup (Ubuntu):
status: New → Confirmed
Changed in rescue (Ubuntu):
status: New → Confirmed
Revision history for this message
Phillip Susi (psusi) wrote :

I tried to reproduce this with 12.04, and it seems that both blkid and cryptsetup now report the underlying raid component as such, and not as a luks volume, so I think it's time to put this bug to bed.

Changed in cryptsetup (Ubuntu):
status: Confirmed → Fix Released
no longer affects: rescue (Ubuntu)
Revision history for this message
ceg (ceg) wrote :

The best way to reproduce the actual wrong opening seemed to be in a virtualbox VM, as in comment #42.

Revision history for this message
Phillip Susi (psusi) wrote :

I used qemu-kvm. The disk partition was identified as a raid member and
bound to mdadm, the md array was identified as a luks member and bound
to dm-crypt. When trying to boot with one disk missing, the system
would not boot at all, which appears to be another bug ( mdadm won't
activate the array degraded ). I couldn't get dm-crypt to bypass mdadm
and decrypt the disk directly, and by all indications from SJR etc, this
should not be possible since blkid correctly identifies the disk as a
raid member, and cryptsetup is only called on things identified as a
luks member.

Can you reproduce this in virtualbox with 12.04?

Revision history for this message
ceg (ceg) wrote :

If you still have the vm set up, you could just try to boot the text installer media again, enter the rescue mode and see how it wants to mount the existing raid disks (not degraded, both present) in the vm. That's where it always used to happen. Otherwise, it only happened occasionally during normal reboots.

> When trying to boot with one disk missing, the system would not boot

Right, after way too many release cycles with the unmaintained mdadm modifications in ubuntu, I moved upstream.

Revision history for this message
Phillip Susi (psusi) wrote :

When I try to boot the alternate cd in rescue mode, it gives me a screen to choose the root fs, which has an option to activate raid arrays. After activating the raid array, and selecting it, it tells me the mount failed. It looks like rescue mode has no cryptsetup support at all, but it isn't trying to activate the raid member by mistake, and blkid and cryptsetup isLuks do not think the raid members are luks volumes.

Revision history for this message
ceg (ceg) wrote :

Thank you for testing, Phillip!

Cryptsetup support should be on the CD. But it only seemed to run in the first boot-up stage of the rescue CD, and it used to try to open the raid members there, even before you set up the raids with the debian installer.
Good that it does not do that any more.

I thus concur that the bug cannot be reproduced with new installs anymore. It could be, though, that this is only due to a newer mdadm superblock version being used in newer installs. (I think the 0.90 version is at the end of the partition.)

Nevertheless, as you can see, mdadm, lvm, and cryptsetup will need an improved event-driven (udev) setup in the rootfs and initramfs (both also used in the installer system) to set up the devices no matter how they are stacked. Bug #251164

Revision history for this message
ceg (ceg) wrote :

@alfonso: mdadm monitor will send you an email if the raid cannot be set up completely (i.e. is degraded).
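
For reference, the destination address lives in /etc/mdadm/mdadm.conf and the mail path can be exercised without degrading an array; a sketch (the address is illustrative):

  # /etc/mdadm/mdadm.conf
  MAILADDR admin@example.com

  # send a one-off TestMessage alert for every array found, to verify mail delivery
  sudo mdadm --monitor --scan --oneshot --test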

Revision history for this message
alfonso (alfonso-fiore) wrote :

@ceg: thank you, I'm aware of that now. There are several howtos about setting up ubuntu to send out emails using postfix and any gmail account, so I get all mdadm notifications in my email.
Also, I can confirm the system doesn't boot with a degraded array (I added GRUB_CMDLINE_LINUX="bootdegraded=true" in /etc/default/grub).
