euca-* commands stopped responding

Bug #639639 reported by C de-Avillez
Affects              Status        Importance  Assigned to  Milestone
Eucalyptus           Fix Released  Undecided   Unassigned
eucalyptus (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

Eucalyptus 2.0+bzr1239-0ubuntu3.2 (actually, r1240).

After about 1,300 instances had been started, Eucalyptus suddenly stopped responding to euca-* commands. A brief look at the Eucalyptus logs shows a series of:

23:44:20 DEBUG [AbstractClusterMessageDispatcher:Hashed wheel timer #1] org.jboss.netty.handler.timeout.ReadTimeoutException
org.jboss.netty.handler.timeout.ReadTimeoutException
        at org.jboss.netty.handler.timeout.ReadTimeoutHandler.<clinit>(ReadTimeoutHandler.java:59)
        at com.eucalyptus.ws.util.ChannelUtil.addPipelineMonitors(ChannelUtil.java:134)
        at com.eucalyptus.cluster.handlers.AbstractClusterMessageDispatcher.getPipeline(AbstractClusterMessageDispatcher.java:103)
        at com.eucalyptus.ws.client.NioBootstrap.connect(NioBootstrap.java:177)
        at com.eucalyptus.ws.client.NioBootstrap.connect(NioBootstrap.java:163)
        at com.eucalyptus.cluster.handlers.AbstractClusterMessageDispatcher.write(AbstractClusterMessageDispatcher.java:123)
        at com.eucalyptus.cluster.handlers.ClusterCertificateHandler.trigger(ClusterCertificateHandler.java:32)
        at com.eucalyptus.cluster.handlers.ClusterCertificateHandler.fireEvent(ClusterCertificateHandler.java:57)
        at com.eucalyptus.event.ReentrantListenerRegistry.fireEvent(ReentrantListenerRegistry.java:87)
        at com.eucalyptus.event.ReentrantListenerRegistry.fireEvent(ReentrantListenerRegistry.java:68)
        at com.eucalyptus.event.ListenerRegistry.fireEvent(ListenerRegistry.java:69)
        at com.eucalyptus.cluster.Clusters.start(Clusters.java:135)
        at com.eucalyptus.cluster.ClusterBuilder.fireStart(ClusterBuilder.java:93)
        at com.eucalyptus.component.Component.startService(Component.java:179)
        at com.eucalyptus.ws.ServiceDispatchBootstrapper.start(ServiceDispatchBootstrapper.java:143)
        at com.eucalyptus.bootstrap.Bootstrap$Stage.start(Bootstrap.java:112)
        at com.eucalyptus.bootstrap.SystemBootstrapper.start(SystemBootstrapper.java:148)

Test logs uploaded to lp:~hggdh2/+junk/uec-qa, revision 56.
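
For triage, the burst of exceptions described above can be counted per minute with a small script. This is only a minimal sketch: it assumes every log entry starts with an HH:MM:SS timestamp, a level, and a bracketed source, as in the single log line quoted in this report. The function name `count_timeouts_per_minute` is hypothetical and not part of Eucalyptus.

```python
import re
from collections import Counter

# Assumed entry shape, inferred from the one log line quoted in this report:
#   23:44:20 DEBUG [Source:thread name] message
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):\d{2}\s+(\w+)\s+\[[^\]]*\]\s+(.*)$")


def count_timeouts_per_minute(lines, needle="ReadTimeoutException"):
    """Return a Counter mapping 'HH:MM' to the number of matching log entries.

    Only lines that start with a timestamp are counted, so the repeated
    exception name and the 'at ...' stack frames below each entry are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and needle in match.group(4):
            counts[f"{match.group(1)}:{match.group(2)}"] += 1
    return counts


# Sample taken verbatim from the trace in this report.
sample = [
    "23:44:20 DEBUG [AbstractClusterMessageDispatcher:Hashed wheel timer #1] org.jboss.netty.handler.timeout.ReadTimeoutException",
    "org.jboss.netty.handler.timeout.ReadTimeoutException",
    "        at org.jboss.netty.handler.timeout.ReadTimeoutHandler.<clinit>(ReadTimeoutHandler.java:59)",
]
print(count_timeouts_per_minute(sample))
```

A sharp rise in the per-minute count shortly before the euca-* commands hang would help pin down when the CLC started failing.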

Dmitrii Zagorodnov (dmitrii) wrote :

Carlos, do you have a PPA with this code? (1240 is not in the main repository from what I can tell.)

Dave Walker (davewalker) wrote :

Hi Dmitrii,

I uploaded r1240 (rather confusingly titled 2.0+bzr1239-0ubuntu3.2) to:
https://launchpad.net/~davewalker/+archive/uec-devel

Thanks.

C de-Avillez (hggdh2)
description: updated
Dmitrii Zagorodnov (dmitrii) wrote :

It appears to be an out-of-memory error. It would help us track it down if the CLC ran with the "--debug" flag. I don't know how long your run takes, but would you be able to re-run the experiment with the flag added? (Having it in there for these QA runs is a good idea in general.) Thanks, Carlos!

C de-Avillez (hggdh2) wrote :

I was already running a second try -- I *always* do it, Just In Case ;-)

But instead I hit bug 639781. I wonder if this could be the same OOM error? Will try again.

C de-Avillez (hggdh2) wrote :

I was unable to repeat it on a third run; instead, I hit bug 639781 again. I think I should go to a brand new install, and try again.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eucalyptus - 2.0+bzr1241-0ubuntu1

---------------
eucalyptus (2.0+bzr1241-0ubuntu1) maverick; urgency=low

  * New upstream bug fix snapshot, -r1241
    - Fixes euca-* commands stop responding. (LP: #639639)
    - Fixes metadata service returning 500 error. (LP: #637659)
  * debian/patches/06-UEC-webinterface.patch: Improved cross
    browser compatibility, particularly on the login screen.
 -- Dave Walker (Daviey) <email address hidden> Thu, 16 Sep 2010 17:10:38 +0100

Changed in eucalyptus (Ubuntu):
status: New → Fix Released
C de-Avillez (hggdh2) wrote :

I ran a total of 4,000 instances during the night (logs uploaded to the usual place, revisions 58 and 59). I consider this fix released.

Neil Soman (neilsoman)
Changed in eucalyptus:
status: New → Incomplete
Daniel Nurmi (nurmi)
Changed in eucalyptus:
status: Incomplete → Fix Released