cloud-init does not apply network configuration from NoCloud resource

Bug #1958377 reported by Martin Steigerwald
This bug affects 1 person
Affects              Status      Importance  Assigned to  Milestone
subiquity            New         Undecided   Unassigned
cloud-init (Ubuntu)  Incomplete  High        Chad Smith

Bug Description

Yesterday I installed a fresh Ubuntu 20.04.3 LTS from the server ISO to prepare a new template for my Proxmox VE-based training setup.

It works, but cloud-init does not apply network configuration from NoCloud resource.

I have:

root@ubuntutemplate:~# cat /mnt/tmp/network-config
version: 1
config:
    - type: physical
      name: eth0
      mac_address: '66:50:19:8c:97:ef'
      subnets:
      - type: static
        address: '10.0.88.35'
        netmask: '255.0.0.0'
        gateway: '10.254.254.254'
    - type: nameserver
      address:
      - '10.0.88.90'
      search:
      - 'tux.lab'

as well as a matching network interface:

root@ubuntutemplate:~# ip link sh eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 66:50:19:8c:97:ef brd ff:ff:ff:ff:ff:ff

Yet cloud-init does not apply this configuration. It does not apply the hostname either.

It may be due to:

2022-01-18 12:14:07,520 - cc_final_message.py[WARNING]: Used fallback datasource

Confusingly, some time ago I prepared an Ubuntu 20.04 LTS VM with cloud-init, and there it works. I made sure that the new VM uses exactly the same cloud-init configuration.

I have run into this several times now: cloud-init does not apply a configuration, and so far it has been a mystery to me.

I tried:

- cloud-init clean ; cloud-init init
- rm /etc/machine-id ; cloud-init clean; reboot
- apt remove cloud-init*; rm -r /var/lib/cloud; apt install cloud-init packages; reboot

because at first I thought it might not be detecting a second boot.

None of this worked.

So apparently cloud-init does not recognize the NoCloud configuration. The NoCloud data is provided on an ISO image generated by Proxmox VE 7.1-9. This works with Debian, Devuan, Ubuntu 18.04 LTS, another Ubuntu 20.04 LTS VM, CentOS 7/8 and SLES 12/15.

The other information in there is:

root@ubuntutemplate:~# cat /mnt/tmp/meta-data
instance-id: 61a74c24a0b88039cc7ee3e0560d6ffe0a91f956

root@ubuntutemplate:~# cat /mnt/tmp/user-data
#cloud-config
hostname: ubuntutemplate
manage_etc_hosts: true
fqdn: ubuntutemplate.tux.lab
ssh_authorized_keys:
  - ssh-rsa […]
chpasswd:
  expire: False
users:
  - default
package_upgrade: true

root@ubuntutemplate:~# cat /mnt/tmp/vendor-data
root@ubuntutemplate:~#

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: cloud-init 21.4-0ubuntu1~20.04.1
ProcVersionSignature: Ubuntu 5.4.0-96.109-generic 5.4.157
Uname: Linux 5.4.0-96-generic x86_64
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: amd64
CasperMD5CheckResult: pass
Date: Wed Jan 19 11:32:10 2022
InstallationDate: Installed on 2022-01-17 (1 days ago)
InstallationMedia: Ubuntu-Server 20.04.3 LTS "Focal Fossa" - Release amd64 (20210824)
PackageArchitecture: all
ProcEnviron:
 TERM=screen-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
SourcePackage: cloud-init
UpgradeStatus: No upgrade log present (probably fresh install)
cloud-init-log-warnings:
 2022-01-17 15:14:00,189 - cc_final_message.py[WARNING]: Used fallback datasource
 2022-01-18 12:14:07,520 - cc_final_message.py[WARNING]: Used fallback datasource
mtime.conffile..etc.cloud.cloud.cfg: 2022-01-19T10:34:45.660002
mtime.conffile..etc.cloud.cloud.cfg.d.05_logging.cfg: 2022-01-18T13:57:17.179925
user_data.txt: Error: path contained symlinks.

Revision history for this message
Martin Steigerwald (ms-proact) wrote :

One more comment. I switched to

GRUB_CMDLINE_LINUX_DEFAULT="net.ifnames=0 biosdevname=0"

as well, in order to make sure cloud-init finds the network interface after it did not work with "ens18".

However, on the other Ubuntu 20.04 LTS VM I did not do this, and there cloud-init generates a config with "set-name: eth0" in it. I'd consider uploading the working template to the Proxmox VE instance that hosts the non-working one, but that would take a considerable amount of time over my uplink.

So I'd prefer to fix the non-working one and finally understand what is going on when cloud-init simply does not apply the configuration. I had the same problem with a Debian image that was provided to me some time ago: it also did not apply the configuration, while the Debian image I made myself did.

Revision history for this message
Martin Steigerwald (ms-proact) wrote :

Ah, and of course in order to send the additional data for this bug report, I manually fixed up the network configuration.

Revision history for this message
Martin Steigerwald (ms-proact) wrote :

I believe the ISO has the correct volume label ('cidata') as well:

root@ubuntutemplate:~# file -sk /dev/sr0
/dev/sr0: ISO 9660 CD-ROM filesystem data 'cidata'\012- (Lepton 3.x), scale 0-0, spot sensor temperature 0.000000, unit celsius, color scheme 0, calibration: offset 0.000000, slope 0.000000\012- (Lepton 2.x), scale 0-0, spot sensor temperature 0.000000, unit celsius, color scheme 0, calibration: offset 0.000000, slope 0.000000\012- data

Revision history for this message
James Falcon (falcojr) wrote :

Looking at the cloud-init.log, it appears that network is intentionally disabled.

2022-01-17 15:13:47,267 - stages.py[DEBUG]: network config disabled by system_cfg
2022-01-17 15:13:47,267 - stages.py[INFO]: network config is disabled by system_cfg

which means network is disabled via the "cloud config" section here:
https://cloudinit.readthedocs.io/en/latest/topics/network-config.html#disabling-network-configuration

That said, it doesn't appear that this log matches what you're describing in your report. Looking at run/cloud-init/instance-data-sensitive.json in logs.tgz.gz, the userdata and metadata don't match what you described in the bug description.

Can you verify that the logs are correct? If not, can you attach logs pertaining to your run?

Changed in cloud-init (Ubuntu):
status: New → Incomplete
Revision history for this message
Martin Steigerwald (ms-proact) wrote :

I used "ubuntu-bug cloud-init" to report the bug. I do not see how the logs it sent could be incorrect. I did not disable network configuration. I disable a lot of things I do not need, but initial network configuration is mostly what I use cloud-init for. I made sure the cloud-init configuration is identical to that of the other Ubuntu 20.04 LTS VM where cloud-init actually works.

I do not have the affected Ubuntu VM available at the moment, but I can check later when I have it available again. But I would be highly surprised if "ubuntu-bug" had altered the log files or somehow picked up the wrong ones.

Revision history for this message
Martin Steigerwald (ms-proact) wrote :

I can confirm the log lines you posted with exactly the same time stamp on the affected VM.

One thing to note: there was already an initial cloud-init setup straight after the installation. I adapted cloud.cfg and the logging configuration to my needs afterwards. As far as I remember, the initial configuration did not disable networking either, but I am not completely sure about that. The initial configuration may have been provided by the Ubuntu 20.04.3 LTS server install ISO, or it may be the default configuration from the cloud-init package itself.

I made a clone of ubuntutemplate. I will fix up its network configuration using iproute2 and then use "ubuntu-bug cloud-init" to provide another set of log files. This cloned VM is supposed to be called "ubuntu01", but cloud-init did not adapt its hostname, so don't be surprised to see the same "ubuntutemplate" hostname as before.

Revision history for this message
Martin Steigerwald (ms-proact) wrote :

Hmmm, Launchpad does not seem to let me add the log files from the cloned VM "ubuntu01". Well, I confirmed that the provided log matches what I see on the original "ubuntutemplate" VM, so I bet this will do.

Revision history for this message
James Falcon (falcojr) wrote :

Sorry, I should have been more clear in my comment.

In your bug report, you mentioned:
root@ubuntutemplate:~# cat /mnt/tmp/meta-data
instance-id: 61a74c24a0b88039cc7ee3e0560d6ffe0a91f956

root@ubuntutemplate:~# cat /mnt/tmp/user-data
#cloud-config
hostname: ubuntutemplate
manage_etc_hosts: true
fqdn: ubuntutemplate.tux.lab
ssh_authorized_keys:
  - ssh-rsa […]
chpasswd:
  expire: False
users:
  - default
package_upgrade: true

but in logs.tgz.gz, at cloud-init-logs-2022-01-19/run/cloud-init/instance-data-sensitive.json, we have:
"instance-id": "3f3046df-c334-4b08-b37f-53f80bca337a"

"userdata_raw": "#cloud-config\ngrowpart:\n mode: 'off'\nlocale: de_DE.UTF-8\npreserve_hostname: true\nresize_rootfs: false\nssh_pwauth: true\nusers:\n- gecos: tux\n groups: !!set\n adm: null\n cdrom: null\n dip: null\n lxd: null\n plugdev: null\n sudo: null\n lock_passwd: false\n name: tux\n passwd: $6$hHzVL5ddgPuMNw/A$D2.oXZyEnuw34910K7TjTGJxp7Lx6AFTl76nNvA2svCndStUwJ8wS9nc7mrTfL3cA0BludPDHnwCAFqkO9clj1\n shell: /bin/bash\n"

That doesn't match what you posted in the bug report, so I'm just wondering why there's a mismatch between the logs and text of your bug report. You mentioned an initial cloud-init installation. Is one config from that installation with the other being from after your updates?

Either way, if the log line I posted matches the time stamp on your VM, then I think that points to the problem. cloud-init found a network config with the contents of 'config: disabled' in one of these locations:
https://cloudinit.readthedocs.io/en/latest/topics/network-config.html#default-behavior

Do you have a "network:" entry in /etc/cloud/cloud.cfg or any files under /etc/cloud/cloud.cfg.d ? Do you have "ip=", "ip6=" or "network-config=" in your kernel cmdline? Since it's a NoCloud instance, does /var/lib/cloud/seed/nocloud-net/network-config contain a "network: disabled" entry?
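The three checks suggested above can be bundled into one small script. This is a sketch, not part of cloud-init: the function name `check_network_overrides` and its optional root-directory argument are hypothetical additions (so it can also be pointed at a mounted image), and the patterns only look for the literal strings mentioned above, so an entry written in a different YAML layout could slip through.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with cloud-init): list places where
# cloud-init's network configuration may be disabled or overridden.
# Usage: check_network_overrides [ROOT]   (ROOT defaults to /)
# Returns 0 if anything was found, 1 otherwise (like grep).
check_network_overrides() {
    root="${1:-/}"
    root="${root%/}"          # "/" becomes "" so paths print as /etc/...
    found=1

    # 1. A top-level "network:" key in cloud.cfg or any cloud.cfg.d/*.cfg
    for f in "$root/etc/cloud/cloud.cfg" "$root"/etc/cloud/cloud.cfg.d/*.cfg; do
        [ -f "$f" ] || continue   # also skips the literal glob if nothing matched
        if grep -Hn '^network:' "$f"; then
            found=0
        fi
    done

    # 2. ip=, ip6= or network-config= on the kernel command line
    if [ -r "$root/proc/cmdline" ] &&
        grep -Eo '(^| )(ip|ip6|network-config)=[^ ]*' "$root/proc/cmdline"; then
        found=0
    fi

    # 3. A "disabled" entry in the NoCloud seed's network-config
    seed="$root/var/lib/cloud/seed/nocloud-net/network-config"
    if [ -f "$seed" ] && grep -Hn 'disabled' "$seed"; then
        found=0
    fi

    return "$found"
}
```

Called as root without arguments on the affected VM, it would be expected to print the subiquity-disable-cloudinit-networking.cfg entry that turns up in the next comment.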

Revision history for this message
Martin Steigerwald (ms-proact) wrote :

I certainly did not put one of those in there. But Subiquity did:

root@ubuntutemplate:~# cat /etc/cloud/cloud.cfg.d/subiquity-disable-cloudinit-networking.cfg
network: {config: disabled}

I may have missed a warning from the installer about this, but if it did not warn about it, IMO it definitely should.

Now, after removing that distraction, I am back at the original issue: cloud-init does not apply the network settings from the NoCloud data source.

I have

root@ubuntutemplate:~# cat /mnt/tmp/meta-data
instance-id: 61a74c24a0b88039cc7ee3e0560d6ffe0a91f956
root@ubuntutemplate:~# cat /mnt/tmp/network-config
version: 1
config:
    - type: physical
      name: eth0
      mac_address: '66:50:19:8c:97:ef'
      subnets:
      - type: static
        address: '10.0.88.35'
        netmask: '255.0.0.0'
        gateway: '10.254.254.254'
    - type: nameserver
      address:
      - '10.0.88.90'
      search:
      - 'tux.lab'

versus

root@ubuntutemplate:~# cat /etc/netplan/50-cloud-init.yaml
# This file is generated from information provided by the datasource. Changes
# to it will not persist across an instance reboot. To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    ethernets:
        eth0:
            dhcp4: true
            match:
                macaddress: 66:50:19:8c:97:ef
            set-name: eth0
    version: 2

This is clearly not a match.

Excerpt from "/run/cloud-init/instance-data-sensitive.json":

{
 "base64_encoded_keys": [],
 "ds": {
  "_doc": "EXPERIMENTAL: The structure and format of content scoped under the 'ds' key may change in subsequent releases of cloud-init.",
  "meta_data": {
   "instance-id": "3f3046df-c334-4b08-b37f-53f80bca337a"
  }

  "datasource": {
   "None": {
    "metadata": {
     "instance-id": "3f3046df-c334-4b08-b37f-53f80bca337a"
    },
    "userdata_raw": …
   }
  },
  "datasource_list": [
   "None"
  ],

  "instance-id": "iid-datasource-none",
  "instance_id": "iid-datasource-none",

Why does it say datasource "none" instead of NoCloud?

I bet the different instance ID comes from the "None" datasource.

To me it still appears that for some reason it does not pick up the NoCloud resource that Proxmox VE provides, despite cloud-init recognizing it on all of my other VMs including another Ubuntu LTS 20.04 one.

I would like to find out why.

root@ubuntutemplate:~# df -hT -t iso9660
Filesystem Type Size Used Avail Use% Mounted on
/dev/sr0 iso9660 356K 356K 0 100% /mnt/tmp

root@ubuntutemplate:~# file -sk /dev/sr0
/dev/sr0: ISO 9660 CD-ROM filesystem data 'cidata'\012- (Lepton 3.x), scale 0-0, spot sensor temperature 0.000000, unit celsius, color scheme 0, calibration: offset 0.000000, slope 0.000000\012- (Lepton 2.x), scale 0-0, spot sensor temperature 0.000000, unit celsius, color scheme 0, calibration: offset 0.000000, slope 0.000000\012- data

root@ubuntutemplate:~# find /mnt/tmp
/mnt/tmp
/mnt/tmp/meta-data
/mnt/tmp/network-config
/mnt/tmp/user-data
/mnt/tmp/vendor-data

seems perfectly reasonable to me.

Regarding /run/cloud-init/instance-data...


Revision history for this message
Chad Smith (chad.smith) wrote :

TL;DR: for anyone looking to create custom Ubuntu Server golden images, we generally suggest starting from a stock Ubuntu Server cloud image[1] instead of the subiquity-based Ubuntu Server Live installer images[2].

[1] Ubuntu Server cloud images for 20.04: https://cloud-images.ubuntu.com/releases/focal/release/
[2] Ubuntu Server Live installer (subiquity) images: https://releases.ubuntu.com/20.04/

Sorry for the delay in responding here. I wanted to gather all the right artifacts for this bug, given the way the subiquity installer currently uses cloud-init on focal 20.04, to make sure we understand the "why".

The primary reason you are not seeing your NoCloud config is that, in this case, subiquity artifacts in /etc/cloud/cloud.cfg.d are preventing cloud-init from even detecting DataSourceNoCloud:
  Check any of:
   - DataSourceNone detected in /run/cloud-init/status.json
   - sudo cloud-id # returns none instead of nocloud
   - sudo cloud-init query v1.platform # none instead of NoCloud
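As a scripted version of the first check above, the detected datasource can be read straight out of status.json. This is a hedged sketch: the helper names are hypothetical, it uses sed rather than a JSON parser, and it assumes the pretty-printed layout cloud-init writes, with the `"datasource": "..."` pair on a line of its own. `cloud-id` remains the simpler interactive check.

```shell
#!/bin/sh
# Hypothetical helper: print the datasource recorded in cloud-init's
# status.json (default /run/cloud-init/status.json).
# Assumes the "datasource": "..." pair sits on one line, as in the
# pretty-printed file; this is a text match, not a real JSON parser.
detected_datasource() {
    status="${1:-/run/cloud-init/status.json}"
    [ -r "$status" ] || return 2
    sed -n 's/.*"datasource": *"\([^"]*\)".*/\1/p' "$status" | head -n 1
}

# Returns 0 if cloud-init fell back to DataSourceNone (the symptom in this
# bug); "DataSourceNoCloud ..." does not match the DataSourceNone* pattern.
fell_back_to_none() {
    case "$(detected_datasource "$@")" in
        DataSourceNone*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `fell_back_to_none && echo "NoCloud seed was not picked up"` on the affected VM would be expected to print the message until the subiquity artifacts are removed.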

Your desire looks to be that you want NoCloud datasource detected instead of "None".

The failures you are seeing are expected at the moment, though I agree they are not optimal for your use case.

Both the cloud-init and subiquity teams are working to iteratively improve support for reusing subiquity-installed images as golden images, so that this path is handled better in the future. That said, here's a workaround and some reasoning behind the current behavior.

-- workaround to create a golden image from a subiquity-installed Ubuntu Server --
I expect you can "clean up" the subiquity artifacts and re-enable cloud-init if you wish to use that manually installed server image as a golden image for cloning:

1. Clean up any cloud-init artifacts to ensure the next boot of the image will be seen as a greenfield (fresh) cloud-init install

  sudo cloud-init clean --logs # best-practice to always use on all cloud-init-based golden images

2. Clean up subiquity install config artifacts which had disabled cloud-init in this image:

sudo rm -f /etc/netplan/00-installer-config.yaml /etc/cloud/cloud.cfg.d/curtin-preserve-sources.cfg /etc/cloud/cloud.cfg.d/99-installer.cfg /etc/cloud/cloud.cfg.d/subiquity-disable-cloudinit-networking.cfg

The next boot of the machine will have an active cloud-init that should detect the proper datasource config.
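The two steps above can be wrapped in a small script for repeatable template preparation. This is a sketch, not an official tool: the function names are hypothetical, the file list is copied verbatim from step 2, and the optional root argument to the removal step is an addition so it can be exercised against a scratch directory. Presumably it should be run as root as the last thing before shutting the VM down and converting it to a template.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the two workaround steps above.

# Step 2: remove the installer artifacts that disabled cloud-init networking.
# ROOT defaults to /; a trailing slash is stripped to avoid "//etc/..." paths.
remove_subiquity_artifacts() {
    root="${1:-/}"
    root="${root%/}"
    for f in \
        etc/netplan/00-installer-config.yaml \
        etc/cloud/cloud.cfg.d/curtin-preserve-sources.cfg \
        etc/cloud/cloud.cfg.d/99-installer.cfg \
        etc/cloud/cloud.cfg.d/subiquity-disable-cloudinit-networking.cfg
    do
        rm -f "$root/$f"      # -f: missing files are not an error
    done
}

# Steps 1 and 2 together, on the running system (must be run as root).
prepare_golden_image() {
    # Step 1: wipe cloud-init state and logs so the next boot is a fresh run
    cloud-init clean --logs || return 1
    # Step 2: drop the subiquity/curtin config from the live root filesystem
    remove_subiquity_artifacts /
}
```

Only the four listed files are touched; other drop-ins such as a custom 05_logging.cfg are left in place.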

-- background --
Subiquity uses curtin and cloud-init as tools to accomplish a one-shot install of server images, using cloud-init's DataSourceNone configuration (note "None", not "NoCloud").

There is a tension between the subiquity install use case and cloud-init, which generally checks on every boot whether it needs to re-run, so that it can adapt if the instance metadata configuration changes across boots. This is where the rub comes.

I believe we want cloud-init to be in a disabled state after a manual Ubuntu Server Live installer (subiquity) install, because typical human-driven installer configuration is generally for a one-time, unique server deployment, not so much for deployment at scale. I'd expect s...


Revision history for this message
Chad Smith (chad.smith) wrote :

Marking the subiquity project for reference tracking and information while we better define which cloud-init/subiquity use cases we need to support in the future. It may very well be a WON'T FIX from subiquity's point of view at the moment, as this bug doesn't yet represent an actionable feature request for subiquity.

Revision history for this message
Martin Steigerwald (ms-proact) wrote :

Dear Chad, thank you very much for your very detailed comment about the subiquity artifacts.

I did

root@ubuntutemplate:~# cloud-init clean --logs
root@ubuntutemplate:~# sudo rm -f /etc/netplan/00-installer-config.yaml /etc/cloud/cloud.cfg.d/curtin-preserve-sources.cfg /etc/cloud/cloud.cfg.d/99-installer.cfg /etc/cloud/cloud.cfg.d/subiquity-disable-cloudinit-networking.cfg
root@ubuntutemplate:~# cloud-init clean --logs
root@ubuntutemplate:~# cloud-init clean

I do get "nocloud" as result for "cloud-id" now.

I also get a valid network configuration in '/etc/netplan/50-cloud-init.yaml' for the right MAC address.

Netplan also seems to apply it:

root@ubuntutemplate:~# ls -l /run/systemd/network/
total 8
-rw-r--r-- 1 root root 69 Feb 17 14:50 10-netplan-eth0.link
-rw-r--r-- 1 root root 158 Feb 17 14:50 10-netplan-eth0.network

However, boot then hung for a long time on "a start job is running for Waiting for Network to be Configured". When it continued after about 2 minutes, the network was unconfigured, and I got this:

root@ubuntutemplate:~# systemctl --state=failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
● systemd-networkd-wait-online.service loaded failed failed Wait for Network to be Configured

Excerpt from journalctl:

Feb 17 14:38:33 ubuntutemplate systemd-networkd[520]: Enumeration completed
Feb 17 14:38:33 ubuntutemplate systemd[1]: Started Network Service.
Feb 17 14:38:33 ubuntutemplate systemd[1]: Starting Wait for Network to be Configured...
Feb 17 14:38:33 ubuntutemplate systemd-networkd[520]: eth0: IPv6 successfully enabled
Feb 17 14:38:33 ubuntutemplate systemd-networkd[520]: eth0: DHCP6 CLIENT: Failed to set DUID: No such file or directory
Feb 17 14:38:33 ubuntutemplate systemd-networkd[520]: eth0: Failed
Feb 17 14:38:33 ubuntutemplate systemd[1]: Starting Network Name Resolution...
Feb 17 14:38:33 ubuntutemplate systemd-networkd-wait-online[521]: managing: eth0

Feb 17 14:40:33 ubuntutemplate systemd-networkd-wait-online[521]: Event loop failed: Connection timed out
Feb 17 14:40:33 ubuntutemplate systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Feb 17 14:40:33 ubuntutemplate systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Feb 17 14:40:33 ubuntutemplate systemd[1]: Failed to start Wait for Network to be Configured.

However I did not tell cloud-init to do anything about IPv6.

Then I found

systemd, no internet after install [solved]
https://forums.gentoo.org/viewtopic-t-1144917.html

I verified and I did indeed not have an "/etc/machine-id".

So I did:

root@ubuntutemplate:~# systemd-machine-id-setup
Initializing machine ID from KVM UUID.

and now it works.
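A guard for this can be folded into template preparation. This is a hedged sketch: the helper names and the path argument are hypothetical; it only checks that the file exists and is non-empty (an empty file is treated like a missing one), and `systemd-machine-id-setup` is the same command used above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a machine-id file exists and is
# non-empty (default path /etc/machine-id).
has_machine_id() {
    [ -s "${1:-/etc/machine-id}" ]
}

# On the running system (as root): create the ID only when it is missing
# or empty, otherwise just report the existing one.
ensure_machine_id() {
    if has_machine_id; then
        echo "machine-id present: $(cat /etc/machine-id)"
    else
        systemd-machine-id-setup
    fi
}
```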

So okay, I finally have a solution. But it has been a very long way, and there are some open questions:

1) Why didn't the VM have a machine-id to begin with?

2) And why on earth does systemd fail to configure the network if there is no machine-id?

3) Why does it not configure at least the IPv4 part of the network when the IPv6 part does not work?

I bet there are detailed and complex answers to each of those, but with my admin hat I don't e...


Chad Smith (chad.smith)
Changed in cloud-init (Ubuntu):
importance: Undecided → High
assignee: nobody → Chad Smith (chad.smith)