suddenly no instance starts anymore - bad date formatting?

Bug #619970 reported by C de-Avillez
This bug affects 1 person
Affects              Status        Importance  Assigned to      Milestone
eucalyptus (Ubuntu)  Fix Released  High        Dustin Kirkland
Maverick             Fix Released  High        Dustin Kirkland

Bug Description

Here's the deal: I was running a stress test of 600 instances. Life was good: almost all successes, with the (very) rare failure. Suddenly, nearly every attempt to start a new instance failed. If I keep on running, *no* new instance succeeds: they all go into pending, stay there for a while, and are terminated by timeout.

A quick look at the NC logs shows errors like "Failed to prepare images for instance <whatever> (error=1)":

On sapodilla:

ubuntu@sapodilla:/var/log/eucalyptus$ grep "Failed to prepare images" *
nc.log:[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images for instance (error=1)
nc.log:[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images for instance (error=1)
nc.log:[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-50EC08E8 (error=1)
nc.log:[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-51EF08FC (error=1)
nc.log:[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-51EF08FC (error=1)
nc.log:[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images for instance ��d8� (error=1)
nc.log:[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images for instance `�8� (error=1)
nc.log:[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-4FC10831 (error=1)
nc.log:[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-4A8B0811 (error=1)
nc.log:[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images for instance (error=1)
nc.log:[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images for instance P�{8� (error=1)
nc.log:[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-43C50871 (error=1)
nc.log:[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-456D0882 (error=1)
nc.log:[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-37B107A3 (error=1)
nc.log:[Wed Aug 18 10:37:12 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-50EC08E8 (error=1)
nc.log:[Wed Aug 18 10:37:12 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-397606C6 (error=1)
nc.log:[Wed Aug 18 10:37:12 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-49B0087A (error=1)
nc.log:[Wed Aug 18 10:37:12 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-4A8B0811 (error=1)
nc.log:[Wed Aug 18 10:37:12 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-51EF08FC (error=1)
nc.log:[Wed Aug 18 10:37:12 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-50EC08E8 (error=1)
nc.log:[Wed Aug 18 10:37:12 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-33C805F7 (error=1)
nc.log:[Wed Aug 18 10:37:12 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-46AC06F8 (error=1)
nc.log:[Wed Aug 18 10:37:21 2010][001992][EUCAFATAL ] Failed to prepare images for instance � (error=1)
ubuntu@sapodilla:/var/log/eucalyptus$

An equivalent list can be found on the other test NC.
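For anyone triaging a similar nc.log, a minimal Python sketch of how the grep output above could be summarized: it counts failures per instance ID and separately counts lines whose ID is empty or garbled. The function name `summarize_failures` and the assumption that a valid ID looks like `i-` followed by 8 uppercase hex digits are mine, taken from the IDs visible in this log, not from Eucalyptus itself.

```python
import re
from collections import Counter

# Matches the message seen in nc.log; the instance field may be empty or garbage.
FAIL_RE = re.compile(r"Failed to prepare images for instance\s*(.*?)\s*\(error=(\d+)\)")
# Assumption: well-formed IDs look like the ones in this report (i-XXXXXXXX).
ID_RE = re.compile(r"i-[0-9A-F]{8}")


def summarize_failures(lines):
    """Return (Counter of failures per well-formed ID, count of malformed IDs)."""
    per_instance = Counter()
    malformed = 0
    for line in lines:
        match = FAIL_RE.search(line)
        if match is None:
            continue  # not a "Failed to prepare images" line
        instance_id = match.group(1)
        if ID_RE.fullmatch(instance_id):
            per_instance[instance_id] += 1
        else:
            malformed += 1  # empty or corrupted instance ID
    return per_instance, malformed


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < nc.log
    per_instance, malformed = summarize_failures(sys.stdin)
    for instance_id, count in per_instance.most_common():
        print(instance_id, count)
    print("malformed or empty instance IDs:", malformed)
```

Separating the malformed IDs matters here because several lines in the list above carry an empty or corrupted instance field.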

Following one such instance, we can see (from the nc.log):

[Wed Aug 18 10:22:08 2010][001992][EUCAINFO ] doRunInstance() invoked (id=i-50EC08E8 cores=1 disk=5 memory=256)
[Wed Aug 18 10:22:08 2010][001992][EUCAINFO ] image=emi-3C651C4F at http://10.55.55.5:8773/services/Walrus/maverick-20100817-amd64-20100818080418/maverick-server-uec-amd64.img.manifest.xml
[Wed Aug 18 10:22:08 2010][001992][EUCAINFO ] krnel=eki-D9512160 at http://10.55.55.5:8773/services/Walrus/maverick-20100817-amd64-20100818080418/maverick-server-uec-amd64-vmlinuz-virtual.manifest.xml
[Wed Aug 18 10:22:08 2010][001992][EUCAINFO ] vlan=12 priMAC=D0:0D:50:EC:08:E8 privIp=172.19.5.5
[Wed Aug 18 10:22:08 2010][001992][EUCADEBUG ] state change for instance i-50EC08E8: Unknown -> Staging (Pending)
[Wed Aug 18 10:22:08 2010][001992][EUCAINFO ] network started for instance i-50EC08E8
[Wed Aug 18 10:22:08 2010][001992][EUCAINFO ] retrieving images for instance i-50EC08E8 (disk limit=5120MB)...
...
[Wed Aug 18 10:37:11 2010][001992][EUCAERROR ] error: file /var/lib/eucalyptus/instances//eucalyptus/cache/eki-D9512160/kernel not found
[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images for instance i-50EC08E8 (error=1)
[Wed Aug 18 10:37:11 2010][001992][EUCAERROR ] [Wed Aug 18 10:37:11 2010][001992][EUCADEBUG ] state change for instance i-50EC08E8: Teardown -> Shutoff (Extant)

I am not sure whether this is a cause or a consequence of a previous error. Logs are being saved.
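To put a number on the gap in the excerpt above, here is a small Python sketch that parses the bracketed nc.log timestamps and computes the time between doRunInstance() and the fatal error for i-50EC08E8. The helper name `parse_stamp` is hypothetical, and the format string assumes English day and month names, as in this log.

```python
import re
from datetime import datetime

# The first bracketed field of an nc.log line is the timestamp.
STAMP_RE = re.compile(r"^\[([^\]]+)\]")


def parse_stamp(line):
    """Return the leading timestamp of an nc.log line, or None if absent."""
    match = STAMP_RE.match(line)
    if match is None:
        return None
    # Assumes English names (C/en locale), e.g. "Wed Aug 18 10:22:08 2010".
    return datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")


start = parse_stamp(
    "[Wed Aug 18 10:22:08 2010][001992][EUCAINFO ] doRunInstance() invoked "
    "(id=i-50EC08E8 cores=1 disk=5 memory=256)"
)
end = parse_stamp(
    "[Wed Aug 18 10:37:11 2010][001992][EUCAFATAL ] Failed to prepare images "
    "for instance i-50EC08E8 (error=1)"
)
elapsed = end - start
print(elapsed)  # 0:15:03
```

So this instance sat for about 15 minutes between the request and the failure.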

C de-Avillez (hggdh2)
Changed in eucalyptus (Ubuntu):
importance: Undecided → High
C de-Avillez (hggdh2) wrote :

Logs saved on /uec-qa/maverick/2.0~bzr1231-0ubuntu2/rig-topo2-logs_20100818-113709.

Using saved push location: bzr+ssh://bazaar.launchpad.net/~hggdh2/%2Bjunk/uec-qa/
Pushed up to revision 39.

C de-Avillez (hggdh2)
summary: - suddenly no instance starts anymore - walrus issue?
+ suddenly no instance starts anymore - bad date formatting?
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eucalyptus - 2.0~bzr1233-0ubuntu1

---------------
eucalyptus (2.0~bzr1233-0ubuntu1) maverick; urgency=low

  * New upstream snapshot, -r1233, fixes:
    - LP: #619970 - more robust Date: generation on NC
 -- Dustin Kirkland <email address hidden> Thu, 19 Aug 2010 20:19:34 -0500
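
The changelog names "Date: generation" as the fix but does not show the change. Purely as background, and not as the upstream fix (the NC is not written in Python), this sketch shows one way to build an HTTP-style Date value that does not depend on the process locale. The wrapper name `http_date` is hypothetical; `email.utils.formatdate` is the standard-library call.

```python
from email.utils import formatdate


def http_date(epoch_seconds=None):
    """Format a timestamp as an HTTP-style date in GMT.

    formatdate() always emits English day and month names, so the result
    does not change with the process locale. With epoch_seconds=None the
    current time is used.
    """
    return formatdate(timeval=epoch_seconds, localtime=False, usegmt=True)


print(http_date(0))  # Thu, 01 Jan 1970 00:00:00 GMT
```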

Changed in eucalyptus (Ubuntu Maverick):
status: New → Fix Released
Changed in eucalyptus (Ubuntu Maverick):
assignee: nobody → Dustin Kirkland (kirkland)
C de-Avillez (hggdh2) wrote :

Overnight I ran a 2,000-instance test and got a lot of failures. Analysing the output afterwards, I noticed that along with the 1233 update, a series of other system libraries had also been updated (including a new kernel). So I discarded this test, rebooted all machines, and ran an initial 500-instance test. The logs are saved in the standard location, at revision 44.

Summary of this run:

success_rate: 0.98999999999999999

2010-08-20 09:45:03,628 SUMMARY:INFO not-tested=0
2010-08-20 09:45:03,628 SUMMARY:INFO being-tested=0
2010-08-20 09:45:03,628 SUMMARY:INFO success=495
2010-08-20 09:45:03,628 SUMMARY:INFO failed=0
2010-08-20 09:45:03,628 SUMMARY:INFO rescheduled=1
2010-08-20 09:45:03,628 SUMMARY:INFO boot-failed=4
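
The counters above add up to 500, and 495/500 is 0.99, which matches the reported success_rate up to floating-point printing. A short Python sketch of that arithmetic follows. It assumes success_rate is simply success divided by the sum of all counters; the test harness's own definition is not shown in this report, and the function names are hypothetical.

```python
import re

# Matches lines such as "... SUMMARY:INFO success=495".
SUMMARY_RE = re.compile(r"SUMMARY:INFO (\S+)=(\d+)")


def parse_summary(lines):
    """Collect the name=value counters from SUMMARY:INFO lines."""
    counts = {}
    for line in lines:
        match = SUMMARY_RE.search(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def success_rate(counts):
    """Assumed definition: successes over the sum of all counters."""
    total = sum(counts.values())
    return counts.get("success", 0) / total if total else 0.0


summary = parse_summary([
    "2010-08-20 09:45:03,628 SUMMARY:INFO not-tested=0",
    "2010-08-20 09:45:03,628 SUMMARY:INFO being-tested=0",
    "2010-08-20 09:45:03,628 SUMMARY:INFO success=495",
    "2010-08-20 09:45:03,628 SUMMARY:INFO failed=0",
    "2010-08-20 09:45:03,628 SUMMARY:INFO rescheduled=1",
    "2010-08-20 09:45:03,628 SUMMARY:INFO boot-failed=4",
])
print(sum(summary.values()))           # 500
print("%.17f" % success_rate(summary))  # 0.98999999999999999
```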

So we are looking quite good. I will start a 1,000-instance run now.
