[SRU] Creating snapshot fails due to nonexistent temporary directory

Bug #1244694 reported by Daniel Speichert
This bug affects 5 people

Affects                    Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)   Invalid       Undecided   Unassigned
libvirt (Ubuntu)           Fix Released  Undecided   Unassigned
libvirt (Ubuntu Saucy)     Won't Fix     High        Unassigned

Bug Description

--- SRU Justification ---

[Impact]

In a libvirt-based OpenStack deployment, Nova fails to snapshot instances with the following error:

internal error: unable to execute QEMU command 'drive-mirror': Could not open '/var/lib/nova/instances/snapshots/tmp5DdrIO/236df3e170e64fabaeb3c7601e2d6c47.delta'

I had originally discovered this bug using the Tempest test suite while verifying an unrelated OpenStack SRU, but other users are experiencing this in the wild.

[Test Case]

Deploy an OpenStack cloud based on Ubuntu Saucy and OpenStack Havana, then attempt to snapshot a running instance. The Tempest integration test suite contains a snapshot test case: tempest.api.compute.images.test_images_oneserver.ImagesOneServerTestJSON.test_create_delete_image [1]
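
As a rough sketch, one way to invoke just that test from a Tempest checkout (the checkout path here is an assumption, and the exact runner invocation varies by Tempest version):

    cd /opt/stack/tempest   # assumed Tempest checkout location
    testr run tempest.api.compute.images.test_images_oneserver.ImagesOneServerTestJSON.test_create_delete_image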

[Regression Potential]

The proposed fix is isolated to the libvirt packaging and simply appends an additional directory exception to the package's AppArmor configuration, so that libvirt has appropriate access to the directory used while snapshotting an instance.
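
For reference, the fix adds the single rule below to the virt-aa-helper profile (debian/apparmor/usr.lib.libvirt.virt-aa-helper), as recorded in the changelog entry later in this report:

    /var/lib/nova/instances/snapshots/** r,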

[1] https://github.com/openstack/tempest/blob/master/tempest/api/compute/images/test_images_oneserver.py#L92

--- Original Bug ---

In some cases (not for all instances, just for some) the following error prevents creating the snapshot:

2013-10-25 14:49:30.724 22980 AUDIT nova.compute.manager [req-6e9326d7-64df-40f7-bc81-190ec5234de2 657f1aca48d24eaf9655e0b77b2bc6d9 35b2b08cc3f44a538cf3535043793a2a] [instance: db9c8a72-6ce2-41b7-8f7a-be0be8468667] instance snapshotting
2013-10-25 14:49:30.944 22980 INFO nova.virt.libvirt.driver [req-6e9326d7-64df-40f7-bc81-190ec5234de2 657f1aca48d24eaf9655e0b77b2bc6d9 35b2b08cc3f44a538cf3535043793a2a] [instance: db9c8a72-6ce2-41b7-8f7a-be0be8468667] Beginning live snapshot process
2013-10-25 14:49:32.006 22980 INFO nova.virt.libvirt.driver [req-6e9326d7-64df-40f7-bc81-190ec5234de2 657f1aca48d24eaf9655e0b77b2bc6d9 35b2b08cc3f44a538cf3535043793a2a] [instance: db9c8a72-6ce2-41b7-8f7a-be0be8468667] Snapshot extracted, beginning image upload
2013-10-25 14:49:32.329 22980 ERROR nova.openstack.common.rpc.amqp [req-6e9326d7-64df-40f7-bc81-190ec5234de2 657f1aca48d24eaf9655e0b77b2bc6d9 35b2b08cc3f44a538cf3535043793a2a] Exception during message handling
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 461, in _process_data
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp **args)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 353, in decorated_function
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 90, in wrapped
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp payload)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 73, in wrapped
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 243, in decorated_function
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp pass
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 229, in decorated_function
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 271, in decorated_function
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp e, sys.exc_info())
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 258, in decorated_function
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 319, in decorated_function
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp % image_id, instance=instance)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 309, in decorated_function
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp *args, **kwargs)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2293, in snapshot_instance
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp task_states.IMAGE_SNAPSHOT)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2324, in _snapshot_instance
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp update_task_state)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1397, in snapshot
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp image_format)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1483, in _live_snapshot
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp libvirt.VIR_DOMAIN_BLOCK_REBASE_SHALLOW)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 179, in doit
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp result = proxy_call(self._autowrap, f, *args, **kwargs)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 139, in proxy_call
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp rv = execute(f,*args,**kwargs)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 77, in tworker
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp rv = meth(*args,**kwargs)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/libvirt.py", line 646, in blockRebase
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp if ret == -1: raise libvirtError ('virDomainBlockRebase() failed', dom=self)
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp libvirtError: internal error: unable to execute QEMU command 'drive-mirror': Could not open '/var/lib/nova/instances/snapshots/tmp5DdrIO/236df3e170e64fabaeb3c7601e2d6c47.delta'
2013-10-25 14:49:32.329 22980 TRACE nova.openstack.common.rpc.amqp

The directory /var/lib/nova/instances/snapshots/tmp5DdrIO does not exist immediately after this error shows up.
The parent directory is owned nova:nova with mode 755, which I believe is correct.

Package: nova-compute
Version: 1:2013.2-0ubuntu1~cloud0

Daniel Speichert (dasp) wrote:

Creating a snapshot of the same instance on the same physical node when the instance is shut off works.

Dirk Grunwald (grunwald) wrote:

I'm having the same problem - I had a working Havana setup (with Ceph/RBD as the Cinder storage) and within the last few days, whammo, this starts occurring. I had updated Ubuntu 13.10 patches across parts of the cluster.

The nova-compute log from the node with the exception is below:

2013-11-03 08:37:03.487 1796 ERROR nova.openstack.common.rpc.amqp [req-16923a1f-cf2b-43fb-bf81-3cdceb698fb0 f2b1d092d8144d17aa1c51eecfd067bb 305f4fa9adcc48b09b8d07df5b247df7] Exception during message handling
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 461, in _process_data
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp **args)
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 353, in decorated_function
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 90, in wrapped
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp payload)
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 73, in wrapped
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 243, in decorated_function
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp pass
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 229, in decorated_function
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 271, in decorated_function
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp e, sys.exc_info())
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 258, in decorated_function
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 319, in decorated_function
2013-11-03 08:37:03.487 1796 TRACE nova.openstack.common.rpc....


Launchpad Janitor (janitor) wrote:

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nova (Ubuntu):
status: New → Confirmed

Dirk Grunwald (grunwald) wrote:

In my previous comment, I mentioned that this was a Havana install - that's incorrect. It's Grizzly from Ubuntu 13.10.

Daniel Speichert (dasp), did you check the contents of the snapshot that you were able to make when the system was shut off? I too was able to make a snapshot, but I had modified /etc/apt/sources.list prior to syncing and shutting down the virtualized system. When I booted up the original server image, the changes to /etc/apt/sources.list were still in the virtualized system. But when I shut it down, took a snapshot and booted from the snapshot, the contents had reverted to the original /etc/apt/sources.list -- i.e. it appeared that the snapshot was using the underlying volume and not the modifications to the booted image.

I have no idea why this occurs, but wanted to see if you had a similar experience with your setup?

Dirk Grunwald (grunwald) wrote:

Double-checking indicates that this may be specific to e.g. /etc/apt/sources.list -- it might be that the image I'm using resets that. I created a file (/foobar/baz), snapped and booted from the snap. /etc/apt/sources.list was reset to the default mirror (http://nova.clouds.archive.ubuntu.com/ubuntu/), but the /foobar/baz directory was there and the package updates I had applied were also there.

So, it's just the issue of not being able to do a snap of a running system.

Dirk Grunwald (grunwald) wrote:

Daniel Speichert (dasp) - were you using a ceph/RBD backend?

Were you trying to snapshot a QCOW image? Try it with a RAW one.
OpenStack has issues snapping QCOW (I think there's a Cinder bug filed on this).

I can create a snapshot of RAW images and create a volume from that snapshot.
I cannot, however, boot directly from the snapshotted image.

Daniel Speichert (dasp) wrote:

Dirk,
I am not using Ceph; I'm using shared NFS storage at /var/lib/nova/instances only.

All the snapshots we take are of QCOW2 images. OpenStack by default converts all images to QCOW2 when you run an instance from them.
We had QCOW snapshots working perfectly in Grizzly; it only stopped working in Havana.

As a result of the above, we cannot create any snapshots, so I don't know whether we would have any problems booting from them. We can boot properly from snapshots created before the upgrade.

Tino Schmeier (tis-x) wrote:

Hello, we have the same problem. I found log entries in syslog.

For example:

kernel: [866752.543340] type=1400 audit(1384526598.408:216): apparmor="DENIED" operation="open" parent=28895 profile="/usr/lib/libvirt/virt-aa-helper" name="/var/lib/nova/instances/snapshots/tmpCqBRJZ/ac7c0eb07b474579a02775ce9cd4301c.delta" pid=17696 comm="virt-aa-helper" requested_mask="r" denied_mask="r" fsuid=0 ouid=109

Maybe the AppArmor profile /etc/apparmor.d/usr.lib.libvirt.virt-aa-helper should be extended, or how does this work at runtime?

/**.delta r, or something like that

Tino Schmeier (tis-x) wrote:

Another log entry points to the dynamic profiles:

[868559.234898] type=1400 audit(1384528406.841:246): apparmor="DENIED" operation="open" parent=1 profile="libvirt-7aa8a408-8ca0-46b3-8ab8-e89b334082d2" name="/var/lib/nova/instances/snapshots/tmpflKt2v/9f52ec7cb15d458488df1a1efad52d9c.delta" pid=7957 comm="qemu-system-x86" requested_mask="rw" denied_mask="rw" fsuid=110 ouid=110

Daniel Speichert (dasp) wrote:

Disabling apparmor didn't help in our case. Did you try disabling it?

Tino Schmeier (tis-x) wrote:

I set the dynamic profile to complain mode:

aa-complain libvirt-7aa8a408-8ca0-46b3-8ab8-e89b334082d2

then shut off and started the instance. After that I got a snapshot.

Tino Schmeier (tis-x) wrote:

OK, I can reproduce the problem. If I enforce the profile and restart the machine, the snapshot fails. The profile seems to be OK:

Before snapshot:
# DO NOT EDIT THIS FILE DIRECTLY. IT IS MANAGED BY LIBVIRT.
  "/var/log/libvirt/**/instance-0000001e.log" w,
  "/var/lib/libvirt/**/instance-0000001e.monitor" rw,
  "/var/run/libvirt/**/instance-0000001e.pid" rwk,
  "/run/libvirt/**/instance-0000001e.pid" rwk,
  "/var/run/libvirt/**/*.tunnelmigrate.dest.instance-0000001e" rw,
  "/run/libvirt/**/*.tunnelmigrate.dest.instance-0000001e" rw,
  "/var/lib/nova/instances/7aa8a408-8ca0-46b3-8ab8-e89b334082d2/disk" rw,
  "/var/lib/nova/instances/_base/50c5b15302a6962ee9f93d2e70ce27fedce888ed" r,
  # don't audit writes to readonly files
  deny "/var/lib/nova/instances/_base/50c5b15302a6962ee9f93d2e70ce27fedce888ed" w,
  "/var/lib/nova/instances/7aa8a408-8ca0-46b3-8ab8-e89b334082d2/console.log" rw,
  "/var/lib/nova/instances/7aa8a408-8ca0-46b3-8ab8-e89b334082d2/console.log" rw,

After snapshot:
# DO NOT EDIT THIS FILE DIRECTLY. IT IS MANAGED BY LIBVIRT.
  "/var/log/libvirt/**/instance-0000001e.log" w,
  "/var/lib/libvirt/**/instance-0000001e.monitor" rw,
  "/var/run/libvirt/**/instance-0000001e.pid" rwk,
  "/run/libvirt/**/instance-0000001e.pid" rwk,
  "/var/run/libvirt/**/*.tunnelmigrate.dest.instance-0000001e" rw,
  "/run/libvirt/**/*.tunnelmigrate.dest.instance-0000001e" rw,
  "/var/lib/nova/instances/snapshots/tmpZzbTj8/9ce55aae608343b987a3b287da39cc59.delta" rw,
  "/var/lib/nova/instances/7aa8a408-8ca0-46b3-8ab8-e89b334082d2/console.log" rw,
  "/var/lib/nova/instances/7aa8a408-8ca0-46b3-8ab8-e89b334082d2/console.log" rw,

Is this a race condition? Maybe the profile is replaced/reloaded too late in a parallel task?

Tino Schmeier (tis-x) wrote:

I added the following line to abstractions/libvirt-qemu:

/var/lib/nova/instances/snapshots/*/*.delta rw,

This is not very secure but at the moment it works. Keep in mind, I had to restart the KVM process to apply the new/reloaded profile.
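
A minimal sketch of that workaround end to end, assuming stock Ubuntu paths (the instance ID is a placeholder; the grep mirrors the DENIED syslog entries quoted earlier):

    # line added inside /etc/apparmor.d/abstractions/libvirt-qemu:
    #   /var/lib/nova/instances/snapshots/*/*.delta rw,
    # a running guest keeps its old profile, so power-cycle it; libvirt then
    # regenerates and reloads the per-domain profile (libvirt-<uuid>) that
    # includes the abstraction
    nova stop <instance-id>
    nova start <instance-id>
    # confirm nothing is still being denied
    grep DENIED /var/log/syslog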

Dirk Grunwald (grunwald) wrote:

I concur that the AppArmor solution works. I haven't tried to figure out whether it's a race or not.

Is this an OpenStack issue or an Ubuntu issue? Sounds like OpenStack to me.

Matt Riedemann (mriedem)
tags: added: libvirt

Tino Schmeier (tis-x) wrote:

I tried to reproduce the error repeatedly. I removed the rw permission from abstractions/libvirt-qemu and reloaded the AppArmor profiles. I started a new instance and tried to take a snapshot.

I added some debug output to the source code of /usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py to check whether the delta file exists or not.

            disk_delta = out_path + '.delta'
            ...
            import os.path
            # check whether the delta file exists before the rebase
            disk_delta_exists = os.path.isfile(disk_delta)
            LOG.debug(_("disk_delta_exists: %s") % disk_delta_exists)

            domain.blockRebase(disk_path, disk_delta, 0,
                               libvirt.VIR_DOMAIN_BLOCK_REBASE_COPY |
                               libvirt.VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT |
                               libvirt.VIR_DOMAIN_BLOCK_REBASE_SHALLOW)

            # re-check after the rebase
            disk_delta_exists = os.path.isfile(disk_delta)
            LOG.debug(_("disk_delta_exists2: %s") % disk_delta_exists)

This leads to the following debug log:

2013-11-27 09:21:21.172 5568 DEBUG nova.virt.libvirt.driver [req-5bd8a420-2747-490f-a688-7468ffcd3c00 b6d2602f38914f20a0fec492fd1482e8 32c4869a7fee41ecb953cd00c3e71eac] disk_delta_exists: True _live_snapshot /usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py:1484
2013-11-27 09:21:21.500 5568 INFO nova.virt.libvirt.driver [req-5bd8a420-2747-490f-a688-7468ffcd3c00 b6d2602f38914f20a0fec492fd1482e8 32c4869a7fee41ecb953cd00c3e71eac] [instance: df82e015-077c-48d7-8008-5b5f64efb6da] Snapshot extracted, beginning image upload
2013-11-27 09:21:21.500 5568 DEBUG nova.compute.manager [req-5bd8a420-2747-490f-a688-7468ffcd3c00 b6d2602f38914f20a0fec492fd1482e8 32c4869a7fee41ecb953cd00c3e71eac] [instance: df82e015-077c-48d7-8008-5b5f64efb6da] Cleaning up image a2074ffb-e556-4263-9843-698867e79dae decorated_function /usr/lib/python2.7/dist-packages/nova/compute/manager.py:313
2013-11-27 09:21:21.500 5568 TRACE nova.compute.manager [instance: df82e015-077c-48d7-8008-5b5f64efb6da] Traceback (most recent call last):
2013-11-27 09:21:21.500 5568 TRACE nova.compute.manager [instance: df82e015-077c-48d7-8008-5b5f64efb6da] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 309, in decorated_function
2013-11-27 09:21:21.500 5568 TRACE nova.compute.manager [instance: df82e015-077c-48d7-8008-5b5f64efb6da] *args, **kwargs)
2013-11-27 09:21:21.500 5568 TRACE nova.compute.manager [instance: df82e015-077c-48d7-8008-5b5f64efb6da] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2293, in snapshot_instance
2013-11-27 09:21:21.500 5568 TRACE nova.compute.manager [instance: df82e015-077c-48d7-8008-5b5f64efb6da] task_states.IMAGE_SNAPSHOT)
2013-11-27 09:21:21.500 5568 TRACE nova.compute.manager [instance: df82e015-077c-48d7-8008-5b5f64efb6da] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2324, in _snapshot_instance
2013-11-27 09:21:21.500 5568 TRACE nova.compute.manager [instance: df82e015-077c-48d7-8008-5b5f64efb6da] update_task_state)
2013-11-27 09:21:21.500 5568 TRACE nova.compute.manager [instance: df82e015-077c-48d7-8008-5b5f64efb6da] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1400, in snapshot
2013-11-27 09:21:21.500 ...


Chuck Short (zulcss)
affects: nova (Ubuntu) → libvirt (Ubuntu)

Serge Hallyn (serge-hallyn) wrote:

This may be related to, if not actually a dup of, bug 1004606.

Serge Hallyn (serge-hallyn) wrote:

The existing virt-aa-helper profile only allows:

  /var/lib/nova/images/** r,
  /var/lib/nova/instances/_base/** r,

Judging by comment #8, it sounds like adding

  /var/lib/nova/instances/snapshots/** r,

should do the trick. This should then allow virt-aa-helper to add the snapshot images to the domain's whitelist as well.

Can someone test that? If no one reports in the next few days whether that works or not, then I will optimistically add the rule to the trusty package.
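
A sketch of how such a test might look, assuming the stock profile path on Ubuntu (the rule must go inside the profile body, not after its closing brace):

    # add inside the profile body of /etc/apparmor.d/usr.lib.libvirt.virt-aa-helper:
    #   /var/lib/nova/instances/snapshots/** r,
    sudo apparmor_parser -r /etc/apparmor.d/usr.lib.libvirt.virt-aa-helper
    # restart the guest so virt-aa-helper regenerates its per-domain profile,
    # then retry the live snapshot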

Dirk Grunwald (grunwald) wrote:

I added that to virt-aa-helper but it's still broken. The libvirt AppArmor file is below, as is the log output.
I don't know if it matters, but I'm using Ceph for the volume store.

The instance-specific AppArmor file is:

# DO NOT EDIT THIS FILE DIRECTLY. IT IS MANAGED BY LIBVIRT.
  "/var/log/libvirt/**/instance-000000ea.log" w,
  "/var/lib/libvirt/**/instance-000000ea.monitor" rw,
  "/var/run/libvirt/**/instance-000000ea.pid" rwk,
  "/run/libvirt/**/instance-000000ea.pid" rwk,
  "/var/run/libvirt/**/*.tunnelmigrate.dest.instance-000000ea" rw,
  "/run/libvirt/**/*.tunnelmigrate.dest.instance-000000ea" rw,
  "/var/lib/nova/instances/snapshots/tmpJXEx_R/ca1ff017f53741eea3ee7532205ff7fe.delta" rw,
  "/var/lib/nova/instances/e2b47b8a-19ae-40bd-8bea-fdb885b1e49b/console.log" rw,
  "/var/lib/nova/instances/e2b47b8a-19ae-40bd-8bea-fdb885b1e49b/console.log" rw,

=========================

The log output is:

2014-01-03 18:58:16.242 2255 INFO nova.compute.resource_tracker [-] Compute_service record updated for slab-07b:slab-07b
2014-01-03 18:58:25.707 2255 AUDIT nova.compute.manager [req-b2f5930e-2235-4bdf-ad74-c15809b58aac e9bb2814b1364a738d7c313a37a18b60 305f4fa9adcc48b09b8d07df5b247df7] [instance: e2b47b8a-19ae-40bd-8bea-fdb885b1e49b] instance snapshotting
2014-01-03 18:58:25.927 2255 INFO nova.virt.libvirt.driver [req-b2f5930e-2235-4bdf-ad74-c15809b58aac e9bb2814b1364a738d7c313a37a18b60 305f4fa9adcc48b09b8d07df5b247df7] [instance: e2b47b8a-19ae-40bd-8bea-fdb885b1e49b] Beginning live snapshot process
2014-01-03 18:58:26.987 2255 INFO nova.virt.libvirt.driver [req-b2f5930e-2235-4bdf-ad74-c15809b58aac e9bb2814b1364a738d7c313a37a18b60 305f4fa9adcc48b09b8d07df5b247df7] [instance: e2b47b8a-19ae-40bd-8bea-fdb885b1e49b] Snapshot extracted, beginning image upload
2014-01-03 18:58:27.395 2255 ERROR nova.openstack.common.rpc.amqp [req-b2f5930e-2235-4bdf-ad74-c15809b58aac e9bb2814b1364a738d7c313a37a18b60 305f4fa9adcc48b09b8d07df5b247df7] Exception during message handling
2014-01-03 18:58:27.395 2255 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2014-01-03 18:58:27.395 2255 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 461, in _process_data
2014-01-03 18:58:27.395 2255 TRACE nova.openstack.common.rpc.amqp **args)
2014-01-03 18:58:27.395 2255 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2014-01-03 18:58:27.395 2255 TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2014-01-03 18:58:27.395 2255 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 353, in decorated_function
2014-01-03 18:58:27.395 2255 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2014-01-03 18:58:27.395 2255 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 90, in wrapped
2014-01-03 18:58:27.395 2255 TRACE nova.openstack.common.rpc.amqp payload)
2014-01-03 18:58:27.395 2255 TRACE nova.openstack...


Serge Hallyn (serge-hallyn) wrote: Re: [Bug 1244694] Re: Creating snapshot fails due to nonexistent temporary directory

Thanks, Dirk. So is /var/lib/nova/instances/snapshots/tmpJXEx_R/ca1ff017f53741eea3ee7532205ff7fe.delta also on Ceph? Where is Ceph mounted? Can you install the auditd package, then retry this and attach /var/log/audit/audit.log?

Dirk Grunwald (grunwald) wrote: Re: Creating snapshot fails due to nonexistent temporary directory

I'm using RBD (the Ceph volume interface) and not a file system.

However, it appears that this IS working -- I restarted the server rather than just the services, and now I can correctly snapshot a running instance.

This is all on a 13.10 base system, so I must have forgotten to restart some service. Is this patch being pushed into the 13.10 releases? If not, I'll manually update the other servers in my (small) cluster.

Btw, I looked through the audit log for things that were DENIED, and it looks like qemu-ifdown is not in the libvirt-qemu file -- is this something that should be added? I've included the log output below.

root@zfs2:/var/log/audit# grep -i denied audit.log
type=AVC msg=audit(1389036336.913:4605): apparmor="DENIED" operation="exec" parent=6994 profile="libvirt-1f2b61f6-0cd1-4bc0-b3fe-4575e0478e0a" name="/etc/qemu-ifdown" pid=11733 comm="qemu-system-x86" requested_mask="x" denied_mask="x" fsuid=110 ouid=0
type=AVC msg=audit(1389036373.338:4920): apparmor="DENIED" operation="capable" parent=1 profile="/usr/sbin/libvirtd" pid=4173 comm="libvirtd" pid=4173 comm="libvirtd" capability=29 capname="audit_write"
type=AVC msg=audit(1389036968.075:7337): apparmor="DENIED" operation="exec" parent=12382 profile="libvirt-97be2846-7e24-4538-90bf-d3783fece0ef" name="/etc/qemu-ifdown" pid=15063 comm="qemu-system-x86" requested_mask="x" denied_mask="x" fsuid=110 ouid=0
root@zfs2:/var/log/audit#

Launchpad Janitor (janitor) wrote:

This bug was fixed in the package libvirt - 1.2.0-0ubuntu3

---------------
libvirt (1.2.0-0ubuntu3) trusty; urgency=medium

  * debian/apparmor/usr.lib.libvirt.virt-aa-helper: add
    /var/lib/nova/instances/snapshots/** r to allow virt-aa-helper to read
    the snapshot directory to find images which VMs should be granted access
    to. (LP: #1244694)
 -- Serge Hallyn <email address hidden> Thu, 09 Jan 2014 16:39:13 -0600

Changed in libvirt (Ubuntu):
status: Confirmed → Fix Released
description: updated
summary: - Creating snapshot fails due to nonexistent temporary directory
+ [SRU] Creating snapshot fails due to nonexistent temporary directory

Brian Murray (brian-murray) wrote: Please test proposed package

Hello Daniel, or anyone else affected,

Accepted libvirt into saucy-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/libvirt/1.1.1-0ubuntu8.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in libvirt (Ubuntu Saucy):
importance: Undecided → High
status: New → Triaged
status: Triaged → Fix Committed
tags: added: verification-needed

Adam Gandelman (gandelman-a) wrote:

I'm having some trouble reproducing the original issue using the current package in saucy-updates (1.1.1-0ubuntu8.2). It seems I was hitting this consistently last week, but now I can no longer trigger the original issue.

@Dirk - I'm curious whether you might revert some of the local changes you've made and see if the issue surfaces, and whether you'd be willing to test the proposed package on your cluster? The proposed update will be released to 13.10 once we can get it verified.

Dirk Grunwald (grunwald) wrote:

Yes, I will revert changes and test this. I just did an update and see a large number of changes to OpenStack. I'll apply that, disable the patch, verify the failure and then apply proposed.

Marc Deslauriers (mdeslaur) wrote:

Can someone please test this? I have a libvirt security update I need to push out, and if nobody verifies this, the update will supersede it...

Changed in nova:
status: New → Invalid

James Page (james-page) wrote:

Does the 8.9 in -proposed still contain this fix? I was able to run the failing tempest test that detected this in OpenStack against the version in -proposed:

tempest.api.compute.images.test_images_oneserver.ImagesOneServerTestJSON
    test_create_delete_image[gate,smoke] OK 42.70
Slowest 1 tests took 42.70 secs:
tempest.api.compute.images.test_images_oneserver.ImagesOneServerTestJSON
    test_create_delete_image[gate,smoke] 42.70

Ran 1 test in 68.976s

OK
ubuntu@james-page-bastion:~/openstack-charm-testing/tempest⟫ juju run --unit nova-compute/0 "apt-cache policy libvirt-bin"
libvirt-bin:
  Installed: 1.1.1-0ubuntu8.9
  Candidate: 1.1.1-0ubuntu8.9
  Version table:
 *** 1.1.1-0ubuntu8.9 0
        500 http://archive.ubuntu.com/ubuntu/ saucy-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1.1.1-0ubuntu8.7 0
        500 http://archive.ubuntu.com/ubuntu/ saucy-updates/main amd64 Packages
     1.1.1-0ubuntu8.5 0
        500 http://security.ubuntu.com/ubuntu/ saucy-security/main amd64 Packages
     1.1.1-0ubuntu8 0
        500 http://archive.ubuntu.com/ubuntu/ saucy/main amd64 Packages

James Page (james-page) wrote:

Hmm - actually maybe not (this one from hardware rather than :

2014-04-04 08:48:52.378 24424 TRACE nova.openstack.common.rpc.amqp libvirtError: internal error: unable to execute QEMU command 'drive-mirror': Could not open '/var/lib/nova/instances/snapshots/tmpBcAos9/2a76340dc56241b995ae734887bab851.delta'

I'm having trouble getting a consistent result.

Serge Hallyn (serge-hallyn) wrote:

@james-page,

No, that fix is not in 8.9.

Changed in libvirt (Ubuntu Saucy):
status: Fix Committed → Confirmed

Rolf Leggewie (r0lf) wrote:

Saucy has reached the end of its life and is no longer receiving any updates. Marking the Saucy task for this ticket as "Won't Fix".

Changed in libvirt (Ubuntu Saucy):
status: Confirmed → Won't Fix