curtin fails when part of a VG is around

Bug #1870037 reported by Frank Heimes
This bug affects 4 people
Affects                   Status         Importance   Assigned to   Milestone
Ubuntu on IBM z Systems   Fix Released   Undecided    Unassigned
curtin                    Fix Released   Critical     Ryan Harper
subiquity                 Fix Released   Undecided    Unassigned

Bug Description

On an LPAR installation with DASD disks, where two DASD disks were activated but the installation should be done on the second one only, I came across the following issue:

At 'Guided storage configuration' by default the first disk is pre-selected '0X260B':

================================================================================
  Guided storage configuration [ Help ]
================================================================================
  Configure a guided storage layout, or create a custom one:

  (X) Use an entire disk

       [ 0X260B local disk 6.877G v ]

       [ ] Set up this disk as an LVM group

            [ ] Encrypt the LVM group with LUKS

                         Passphrase:

                 Confirm passphrase:

  ( ) Custom storage layout

                                 [ Done ]
                                 [ Back ]

I then selected the second disk 'LX260C':

================================================================================
  Guided storage configuration [ Help ]
================================================================================
  Configure a guided storage layout, or create a custom one:

  (X) Use an entire disk

       [ LX260C local disk 6.877G v ]

       [ ] Set up this disk as an LVM group

            [ ] Encrypt the LVM group with LUKS

                         Passphrase:

                 Confirm passphrase:

  ( ) Custom storage layout

                                  [ Done ]
                                  [ Back ]

I get the following summary, separated into 'available devices' and 'used devices':

================================================================================
  Storage configuration [ Help ]
================================================================================
  FILE SYSTEM SUMMARY ^
                                                                             
    MOUNT POINT SIZE TYPE DEVICE TYPE │
  [ / 6.875G new ext4 new partition of local disk > ] │
                                                                             
                                                                             
  AVAILABLE DEVICES │
                                                                             
    DEVICE TYPE SIZE │
  [ 0X260B local disk 6.877G > ]│
    partition 1 existing, already formatted as ext4, not 1.000G >
                 mounted
    partition 2 existing, unused 5.876G >

  [ Create software RAID (md) > ]
  [ Create volume group (LVM) > ] v

                                 [ Done ]
                                 [ Reset ]
                                 [ Back ]

================================================================================
  Storage configuration [ Help ]
================================================================================
    DEVICE TYPE SIZE ^
  [ 0X260B local disk 6.877G > ]
    partition 1 existing, already formatted as ext4, not 1.000G >
                 mounted
    partition 2 existing, unused 5.876G >

  [ Create software RAID (md) > ] │
  [ Create volume group (LVM) > ] │
                                                                             
                                                                             
  USED DEVICES │
                                                                             
    DEVICE TYPE SIZE │
  [ LX260C local disk 6.877G > ]│
    partition 1 new, to be formatted as ext4, mounted at / 6.875G > │
                                                                             v

                                 [ Done ]
                                 [ Reset ]
                                 [ Back ]

After proceeding I later face this crash:

================================================================================
  Storage configuration [ Help ]
================================================================================
    DEVICE TYPE SIZE ^
  [ 0X260B local disk 6.877G > ]

   ┌────────────────────── Confirm destructive action ──────────────────────┐
   │ │
   │ Selecting Continue below will begin the installation process and │
   │ result in the loss of data on the disks selected to be formatted. │
   │ │
   │ You will not be able to return to this or a previous screen once the │
   │ installation has started. │
   │ │
   │ Are you sure you want to continue? │
   │ │
   │ [ No ] │
   │ [ Continue ] │
   │ │
   └────────────────────────────────────────────────────────────────────────┘

                                 [ Reset ]
                                 [ Back ]

================================================================================
  Profile setup [ Help ]
================================================================================
  Enter the username and password you will use to log in to the system. You
  can configure SSH access on the next screen but a password is still needed
  for sudo.

              Your name: ubuntu

     Your server's name: s1lp15
                          The name it uses when it talks to other computers.

        Pick a username: ubuntu

      Choose a password: ********

  Confirm your password: ********

                                 [ Done ]

================================================================================
  An error occurred during installation [ Help ]
================================================================================

   ┌────────────────────────────────────────────────────────────────────────┐
   │ │
   │ Sorry, there was a problem completing the installation. │
   │ │
   │ [ View full report ] │
   │ │
   │ If you want to help improve the installer, you can send an error │
   │ report. │
   │ │
   │ [ Send to Canonical ] │
   │ │
   │ Do you want to try starting the installation again? │
   │ │
   │ [ Restart the installer ] │
   │ │
   │ [ Close report ] │
   │ │
   └────────────────────────────────────────────────────────────────────────┘

ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu21
Architecture: s390x
CasperVersion: 1.441
CrashDB: {'impl': 'launchpad', 'project': 'subiquity'}
CurrentDmesg:
 [ 0.091308] Linux version 5.4.0-18-generic (buildd@bos02-s390x-009) (gcc version 9.2.1 20200306 (Ubuntu 9.2.1-31ubuntu3)) #22-Ubuntu SMP Sat Mar 7 18:05:50 UTC 2020 (Ubuntu 5.4.0-18.22-generic 5.4.24)
 [ 0.091311] setup.6bac7a: Linux is running natively in 64-bit mode
 [ 0.091342] setup.b050d0: The maximum memory size is 43008MB
 [ 0.091359] numa.196305: NUMA mode: plain
 [ 0.091399] cpu.33a262: 8 configured CPUs, 0 standby CPUs
 [ 0.091410] cpu.643eaf: The CPU configuration topology of the machine is: 0 0 4 2 3 8 / 4
 [ 0.091994] Write protected kernel read-only data: 11696k
 [ 0.092302] Zone ranges:
 [ 0.092303] DMA [mem 0x0000000000000000-0x000000007fffffff]
 [ 0.092304] Normal [mem 0x0000000080000000-0x0000000a7fffffff]
 [ 0.092305] Movable zone start for each node
 [ 0.092306] Early memory node ranges
 [ 0.092307] node 0: [mem 0x0000000000000000-0x00000007ffffffff]
 [ 0.092406] Initmem setup node 0 [mem 0x0000000000000000-0x00000007ffffffff]
/var/crash/1585726685.214678526.install_fail.crash

curtin.util.ProcessExecutionError: Unexpected error while running command.
Command: ['vgchange', '--activate=y']
Exit code: 5
Reason: -
Stdout: 0 logical volume(s) in volume group "s1lp15_vg" now active

Stderr: WARNING: Couldn't find device with uuid 4uOyAr-gwjV-Wx0F-BX2q-oxF4-IQHs-9Ic6Xh.
          WARNING: VG s1lp15_vg is missing PV 4uOyAr-gwjV-Wx0F-BX2q-oxF4-IQHs-9Ic6Xh (last written to /dev/dasda2).
          Refusing activation of partial LV s1lp15_vg/s1lp15_lv. Use '--activationmode partial' to override.

Unexpected error while running command.
Command: ['vgchange', '--activate=y']
Exit code: 5
Reason: -
Stdout: 0 logical volume(s) in volume group "s1lp15_vg" now active

Stderr: WARNING: Couldn't find device with uuid 4uOyAr-gwjV-Wx0F-BX2q-oxF4-IQHs-9Ic6Xh.
          WARNING: VG s1lp15_vg is missing PV 4uOyAr-gwjV-Wx0F-BX2q-oxF4-IQHs-9Ic6Xh (last written to /dev/dasda2).

I only chose to reformat the entire disk, which seems to have happened, and afterwards the entire disk is used for the new installation.
I'm wondering about LVM and VG here: is using LVM always the default when an installation is done on an entire disk? I don't think so ...

I've attached the full /var/log and /var/crash ...

Revision history for this message
Frank Heimes (fheimes) wrote :
Changed in ubuntu-z-systems:
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

It looks like one disk that was previously part of an LVM volume group is present, but not all of the disks that were part of that group are. I guess curtin needs to be more tolerant about this?
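To make the "partial VG" diagnosis concrete, here is a minimal Python sketch that pulls the volume group name and the missing PV UUID out of the vgchange stderr quoted in the crash report. The function name find_missing_pvs is hypothetical, and the pattern is only based on the warning wording seen in this report; other LVM versions may phrase it differently.

```python
import re

# Assumption: the warning wording matches the stderr quoted in this report.
MISSING_PV = re.compile(r"VG (\S+) is missing PV (\S+)")


def find_missing_pvs(stderr):
    """Return {vg_name: set of missing PV UUIDs} found in vgchange stderr."""
    missing = {}
    for vg_name, pv_uuid in MISSING_PV.findall(stderr):
        missing.setdefault(vg_name, set()).add(pv_uuid)
    return missing
```

A non-empty result would indicate that at least one volume group is only partially present on the system.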

summary: - subiquity fails to handle multi disk configuration (DASDs on s390x)
+ curtin fails when part of a VG is around
Revision history for this message
Frank Heimes (fheimes) wrote :

Hi yes, could be that one of the two or both disks were part of an LVM before.
Since I reformatted one disk and the other is 'not in use' I think an installation should succeed.
I agree more tolerance (or freedom) is needed in such cases ...

Changed in ubuntu-z-systems:
assignee: Canonical Foundations Team (canonical-foundations) → nobody
Revision history for this message
Ryan Harper (raharper) wrote :

Thanks for the bug. Curtin's clear-holders attempt to activate the VG should tolerate failure. I'm now working on recreating the partial VG in our testing harness. Once I can recreate it, I expect the change is for curtin to eat any non-zero exit on the vgchange. The physical devices, if used in the storage configuration, will end up getting wiped at the partition or disk level, and we don't need to worry about logical volumes becoming active if the VG is not available.
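As an illustration of "eating" a non-zero exit, here is a minimal Python sketch using only the standard library. It is not the actual curtin change; the function name activate_vgs_best_effort is hypothetical, and only the command line vgchange --activate=y is taken from the crash report.

```python
import subprocess
import sys


def activate_vgs_best_effort(cmd=("vgchange", "--activate=y")):
    """Run the activation command and report, but never raise on, a non-zero exit."""
    proc = subprocess.run(
        list(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        # A partial VG makes vgchange exit with 5 in this report; log it and move on,
        # since the devices in the storage config get wiped later anyway.
        print(
            "ignoring exit code {} from {}: {}".format(
                proc.returncode, " ".join(cmd), proc.stderr.strip()
            ),
            file=sys.stderr,
        )
    return proc.returncode
```

The caller can still inspect the returned exit code, but a failed activation no longer aborts the installation.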

Changed in curtin:
assignee: nobody → Ryan Harper (raharper)
importance: Undecided → Critical
status: New → In Progress
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → In Progress
Revision history for this message
Ryan Harper (raharper) wrote :

Hrm, it's a scary warning, but it's not fatal.

root@ubuntu:/home/ubuntu# vgchange -ay
  WARNING: Couldn't find device with uuid tdDmVY-Wx2J-4sFM-pZCc-fNFM-Bt4L-bdo8jA.
  WARNING: VG vg8 is missing PV tdDmVY-Wx2J-4sFM-pZCc-fNFM-Bt4L-bdo8jA (last written to /dev/vdb1).
  0 logical volume(s) in volume group "vg8" now active
root@ubuntu:/home/ubuntu# echo $?
0

However, I see the logs for this one, vgchange returns 5, and says "Refusing to " ... I wonder if there's a different lvm policy on s390x or something about dasd...

Working on reproducing the failure. Trying with SCSI disks instead of virtio.

Revision history for this message
Ryan Harper (raharper) wrote :

If you have an LPAR where you can reproduce this, it would be really great to get a shell there so I can understand why we see the exit code of 5 on s390x.

Revision history for this message
Frank Heimes (fheimes) wrote :

I recreated that situation and left the system 's1lp15' in the state where the error just happened and got displayed in d-i.
One can now open the "Integrated ASCII Console" (press F1 to trigger a refresh, in case needed) and enter the shell.
Feel free to directly reach out to me (e.g. via IRC) ...

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

I hate all of the broken copy & paste screens, they look hideous.

Please take screenshots of a window and attach them as pngs going forward.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Or at least attach them as .txt attachments to the bug, so that Launchpad doesn't wrap them and break the layout.

Revision history for this message
Ryan Harper (raharper) wrote :

OK. I've recreated the failure. The key was to use striped logical volumes and *fill* them with data to allocate extents on each physical volume.
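The reproduction recipe can be sketched as a list of commands. This is only a guess at the shape of such a setup, not the actual test harness: the function name, the LV name lv0, and the device paths are hypothetical, and the sketch builds the command list without running anything. The VG name vg8 is borrowed from the earlier vgchange output.

```python
def partial_vg_repro_plan(pvs, vg="vg8", lv="lv0"):
    """Build (but do not run) commands for a striped LV filled with data."""
    pvs = list(pvs)
    if len(pvs) < 2:
        raise ValueError("striping needs at least two physical volumes")
    lv_path = "/dev/{}/{}".format(vg, lv)
    plan = [["pvcreate", pv] for pv in pvs]
    plan.append(["vgcreate", vg] + pvs)
    # One stripe per PV so that extents land on every physical volume.
    plan.append(
        ["lvcreate", "--stripes", str(len(pvs)), "--extents", "100%FREE",
         "--name", lv, vg]
    )
    # Fill the LV with data; dd stops with an error once the device is full.
    plan.append(["dd", "if=/dev/zero", "of=" + lv_path, "bs=1M"])
    return plan
```

After running such a plan as root on scratch disks, detaching one of the disks would leave a partial VG behind for the installer to trip over.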

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit 3240d84b to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=3240d84b

Changed in curtin:
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
tags: added: req4focal
Changed in subiquity:
status: New → Incomplete
status: Incomplete → Invalid
Changed in ubuntu-z-systems:
status: Fix Committed → In Progress
Changed in curtin:
status: Fix Committed → In Progress
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

@Andrew you cannot add req4focal tags without agreement that we are accepting these. I'm not sure if there are any actions on this bug report.

Closing tasks, dropping req4focal tag, and setting z-systems as incomplete.

tags: removed: req4focal
Changed in curtin:
status: In Progress → Fix Released
Changed in ubuntu-z-systems:
status: In Progress → Incomplete
Revision history for this message
John George (jog) wrote :

When will the curtin fix be incorporated into the subiquity snap?
The 19.3-56-ga6cd01f01 version of curtin gets stuck at vgremove.
Even the subiquity edge snap still has this version.

root@ubuntu-server:/# curtin version
19.3-56-ga6cd01f01
root@ubuntu-server:/# snap info subiquity
name: subiquity
summary: Ubuntu installer
publisher: Canonical*
store-url: https://snapcraft.io/subiquity
license: unset
description: |
  The Ubuntu server installer
commands:
  - subiquity.console-conf
  - subiquity.probert
  - subiquity
  - subiquity.subiquity-configure-apt
  - subiquity.subiquity-configure-run
  - subiquity.subiquity-loadkeys
services:
  subiquity.subiquity-service: simple, enabled, active
snap-id: ba2aj8guta0zSRlT3QM5aJNAUXPlBtf9
tracking: latest/stable/ubuntu-20.04
refresh-date: today at 13:21 UTC
channels:
  latest/stable: 20.03.2 2020-03-20 (1570) 51MB classic
  latest/candidate: 20.03.3 2020-03-27 (1582) 55MB classic
  latest/beta: ^
  latest/edge: 20.04.3 2020-04-22 (1773) 55MB classic
installed: 20.04.3 (1773) 55MB classic

root@ubuntu-server:/# snap refresh --channel latest/edge subiquity
subiquity (edge) 20.04.3 from Canonical* refreshed

root@ubuntu-server:/# curtin version
19.3-56-ga6cd01f01
root@ubuntu-server:/# snap list subiquity
Name Version Rev Tracking Publisher Notes
subiquity 20.04.3 1773 latest/edge canonical* classic

Revision history for this message
John George (jog) wrote :
Revision history for this message
Ryan Harper (raharper) wrote :

Thanks for the log. I do not see any curtin errors. But something is odd, curtin starts the install, successfully removes the LVs, and then there is no more output, and no error:

Current device storage tree:
dasdb
`-- dasdb1
    |-- dm-1
    `-- dm-0
dasdb1
|-- dm-1
`-- dm-0
Shutdown Plan:
{'level': 6, 'device': '/sys/class/block/dm-1', 'dev_type': 'lvm'}
{'level': 6, 'device': '/sys/class/block/dm-0', 'dev_type': 'lvm'}
{'level': 4, 'device': '/sys/class/block/dasdb/dasdb1', 'dev_type': 'partition'}
{'level': 2, 'device': '/sys/class/block/dasdb', 'dev_type': 'disk'}
shutdown running on holder type: 'lvm' syspath: '/sys/class/block/dm-1'
Running command ['dmsetup', 'splitname', 's5lp1--gen01--vg-swap_1', '-c', '--noheadings', '--separator', '=', '-o', 'vg_name,lv_name'] with allowed return codes [0] (capture=True)
Wiping lvm logical volume: /dev/s5lp1-gen01-vg/swap_1
wiping 1M on /dev/s5lp1-gen01-vg/swap_1 at offsets [0, -1048576]
using "lvremove" on s5lp1-gen01-vg/swap_1
Running command ['lvremove', '--force', '--force', 's5lp1-gen01-vg/swap_1'] with allowed return codes [0] (capture=False)
  Logical volume "swap_1" successfully removed
Running command ['lvdisplay', '-C', '--separator', '=', '--noheadings', '-o', 'vg_name,lv_name'] with allowed return codes [0] (capture=True)
Running command ['pvscan'] with allowed return codes [0] (capture=True)
Running command ['vgscan', '--mknodes'] with allowed return codes [0] (capture=True)
shutdown running on holder type: 'lvm' syspath: '/sys/class/block/dm-0'
Running command ['dmsetup', 'splitname', 's5lp1--gen01--vg-root', '-c', '--noheadings', '--separator', '=', '-o', 'vg_name,lv_name'] with allowed return codes [0] (capture=True)
Wiping lvm logical volume: /dev/s5lp1-gen01-vg/root
wiping 1M on /dev/s5lp1-gen01-vg/root at offsets [0, -1048576]
using "lvremove" on s5lp1-gen01-vg/root
Running command ['lvremove', '--force', '--force', 's5lp1-gen01-vg/root'] with allowed return codes [0] (capture=False)
  Logical volume "root" successfully removed
Running command ['lvdisplay', '-C', '--separator', '=', '--noheadings', '-o', 'vg_name,lv_name'] with allowed return codes [0] (capture=True)
Running command ['pvdisplay', '-C', '--separator', '=', '--noheadings', '-o', 'vg_name,pv_name'] with allowed return codes [0] (capture=True)
Running command ['vgremove', '--force', '--force', 's5lp1-gen01-vg'] with allowed return codes [0, 5] (capture=False)
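The dmsetup splitname calls in the log above turn a device-mapper name such as s5lp1--gen01--vg-swap_1 into the VG name s5lp1-gen01-vg and the LV name swap_1. Device-mapper doubles any hyphen inside a VG or LV name and uses a single hyphen as the separator. A minimal Python sketch of that unmangling follows; split_dm_name is a hypothetical helper, it ignores the optional layer component, and it does not cover names with unusual hyphen runs.

```python
import re


def split_dm_name(dm_name):
    """Split a device-mapper LVM name into (vg_name, lv_name)."""
    # A lone hyphen separates the components; doubled hyphens are escaped ones.
    parts = re.split(r"(?<!-)-(?!-)", dm_name)
    if len(parts) < 2:
        raise ValueError("not an LVM device-mapper name: {!r}".format(dm_name))
    vg_name, lv_name = (part.replace("--", "-") for part in parts[:2])
    return vg_name, lv_name
```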

The crashes are all "block probe failures"

 2020-04-27 13:30:58,072 ERROR block-discover:151 block probing failed restricted=False
 Traceback (most recent call last):
   File "/snap/subiquity/1773/lib/python3.6/site-packages/subiquity/controllers/filesystem.py", line 144, in _probe
     self._probe_once_task.task, 15.0)
   File "/snap/subiquity/1773/usr/lib/python3.6/asyncio/tasks.py", line 362, in wait_for
     raise futures.TimeoutError()
 concurrent.futures._base.TimeoutError
 2020-04-27 13:30:58,074 INFO subiquity.core:438 saving crash report 'block probing crashed with TimeoutError' to /var/crash/1587994258.074117661.block_probe_fail.crash
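The traceback shows the probe task being awaited through asyncio's wait_for with a 15.0 second limit. A minimal sketch of how a probe that never returns turns into a TimeoutError is given below. The coroutine hanging_probe is a stand-in for the real block probe, and all names here are hypothetical rather than subiquity's own code.

```python
import asyncio


async def hanging_probe():
    # Stand-in for a block probe stuck behind a hung LVM command.
    await asyncio.sleep(3600)


async def probe_with_timeout(timeout):
    task = asyncio.ensure_future(hanging_probe())
    try:
        # wait_for cancels the task and raises once the timeout expires.
        await asyncio.wait_for(task, timeout)
        return "ok"
    except asyncio.TimeoutError:
        return "timeout"


def run_demo(timeout=0.05):
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(probe_with_timeout(timeout))
    finally:
        loop.close()
```

With a short timeout the demo reports a timeout instead of hanging, which matches the "block probing crashed with TimeoutError" report rather than a curtin failure.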

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Can you drop to a shell (f2, ctrl-z, or select the option in the help menu) and run "probert"? Does that hang forever? It looks like some LVM-related commands are hanging. In any case, this seems to be a separate bug from the initially reported one.

Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Moving back to "incomplete" for subiquity following John George's reproduction of this issue in comment #13.

Changed in subiquity:
status: Invalid → Incomplete
Revision history for this message
John George (jog) wrote :

@mwhudson Yes, running "probert" in the shell hangs forever.

Revision history for this message
John George (jog) wrote :

I've opened https://bugs.launchpad.net/subiquity/+bug/1875948 to track this new issue.

Revision history for this message
Frank Heimes (fheimes) wrote :

The original issue reported here is Fix Released, hence I am changing the project entries of this ticket to Fix Released, too, since the new issue is now reported and tracked in LP 1875948.

Changed in ubuntu-z-systems:
status: Incomplete → Fix Released
Changed in subiquity:
status: Incomplete → Fix Released