Jobs whose commands can't be found or exit with signals result in strange behavior and dead backends

Bug #1024541 reported by Daniel Manrique
This bug affects 1 person
Affects            Status        Importance  Assigned to      Milestone
checkbox (Ubuntu)  Fix Released  High        Daniel Manrique

Bug Description

This was observed when /usr/share/checkbox/scripts was emptied by accident. In this case, the backend tries to execute a nonexistent command and then dies silently. However, this affects checkbox.job generically, so a non-root job that specifies a command that cannot be found may also fail (albeit with a trace in the log).

cr3 found the cause: the Job.execute method could return either bytes (the usual case, where the result of the command is returned as bytes) or a string (a generic error message assigned by Job when the command isn't present or exits with a signal). This confused the backend, which always tries to decode the data returned from jobs.

The fix is to explicitly encode the error strings assigned by Jobs, so that the data received by the backend (and other invokers of Job.execute) is always consistent.
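To illustrate the mismatch and the fix, here is a minimal sketch. It is not the actual checkbox code: the function names, the error text and the use of subprocess are assumptions made for the example. It only shows why a caller that always decodes breaks on a str, and how encoding the error message keeps the return type consistent.

```python
import subprocess

# Hypothetical error text; the message checkbox really assigns may differ.
ERROR_MESSAGE = "Command not found or terminated abnormally."


def execute_inconsistent(command):
    """Buggy pattern: bytes on success, str on failure."""
    try:
        return subprocess.check_output(command)
    except (OSError, subprocess.CalledProcessError):
        # OSError covers a missing command; CalledProcessError covers
        # non-zero exits, including termination by a signal.
        return ERROR_MESSAGE  # str, unlike the bytes returned above


def execute_consistent(command):
    """Fixed pattern: always bytes, the error message is encoded."""
    try:
        return subprocess.check_output(command)
    except (OSError, subprocess.CalledProcessError):
        return ERROR_MESSAGE.encode("utf-8")


def backend_handle(data):
    """Stand-in for the backend, which always decodes job output."""
    return data.decode("utf-8")
```

In Python 3 a str has no decode method, so backend_handle(execute_inconsistent([...])) raises AttributeError for a missing command, while the same call with execute_consistent returns the error text. The report does not say which exception the real backend hit, so treat that detail as illustrative.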

Steps to reproduce:

- Either move /usr/share/checkbox/scripts out of the way, or create a job with a nonexistent command and user: root.
- Launch checkbox

Expected result:
- List of tests to select from
- Backend remains there, awaiting our commands.

Actual result:
- Backend dies silently.
- Frontend stalls forever waiting for backend.

Since the backend dies quite silently, the way to determine what was happening was to add some debugging statements to the backend, so that it reported exceptions other than UnicodeDecodeError after executing the job.
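The kind of debugging statement described above could look roughly like the following sketch. The function name and the handling of UnicodeDecodeError are assumptions, not the actual backend code; the point is only that any other exception gets logged with a traceback instead of killing the process silently.

```python
import logging


def decode_job_output(data):
    """Decode job output, logging any unexpected failure before re-raising."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed to be the case the backend already handled:
        # undecodable bytes are replaced rather than treated as fatal.
        return data.decode("utf-8", errors="replace")
    except Exception:
        # Anything else (for example, data that is not bytes at all)
        # is reported with a full traceback, then re-raised.
        logging.exception(
            "Unexpected error decoding job output of type %s",
            type(data).__name__,
        )
        raise
```

With this in place, passing a str instead of bytes leaves a traceback in the log that names the offending type, which is enough to point at the job's return value.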

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: checkbox 0.14.1
ProcVersionSignature: Ubuntu 3.4.0-3.8-generic 3.4.0
Uname: Linux 3.4.0-3-generic i686
ApportVersion: 2.1.1-0ubuntu2
Architecture: i386
Date: Fri Jul 13 16:26:53 2012
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Alpha i386 (20120601)
ProcEnviron:
 LANGUAGE=en_CA:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_CA.UTF-8
 SHELL=/bin/bash
SourcePackage: checkbox
UpgradeStatus: No upgrade log present (probably fresh install)

Daniel Manrique (roadmr) wrote :
Changed in checkbox (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Daniel Manrique (roadmr)
milestone: none → quantal-alpha-3
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package checkbox - 0.14.2

---------------
checkbox (0.14.2) quantal; urgency=low

  * New upstream release (LP: #1025869)

  [Jeff Marcom]
  * scripts/gpu_test - Fixed potential thread exiting issue.

  [Javier Collado]
  * Fixed detection of circular references in resolver.

  [Jeff Lane]
  * New version 0.14.2 for Quantal Quetzal development.
  * jobs/cpu.txt.in: added cpu_scaling_test log attachment job
  * jobs/disk.txt.in: modified block_device requirements so they'll work right
    jobs/info.txt.in: added block_device resource requirements to hdparm job so
    it won't run on removable stuff where it's not necessary.
  * jobs/info.txt.in: removed extraneous fwts_log job
    jobs/miscellanea.txt.in: modified fwts_results.log job
  * scripts/optical_detect: minor tweak to send error output to stderr
    scripts/optical_read_test: added root user check because this needs to be
    run with root privileges. Added some additional output for stderr for
    failures so we will know WHY a test or the script failed. Replaced
    sys.stdout.write() and flush() calls with simple print statements.
  * scripts/ipmi_test: output tweaks so error messages now go to stderr. No BMC
    message is a little more clear. Module failed to load now generates an
    error rather than a simple exit.
  * scripts/network_device_info: minor change so that the fail message now
    specifies that it was an error and outputs to stderr properly.
  * scripts/disk_smart: Improvements to the logging and output during testing.
  * scripts/cpu_scaling_test: lots of output changes using logging module.
    renamed script to frequency_governors_test to be more descriptive and less
    confusing. Added a --log option to write logs to an actual file
    jobs/cpu.txt.in: added an attachment job to attach the freq_governors log.
    Modified cpu/frequency_governors to write to log file
  * scripts/cpu_offlining: added an extra bit of output in case of failures.
  * scripts/fwts_test: improved console output so that the info displayed in
    submission.xml is more useful.
    jobs/power-management.txt.in: added job to attach fwts_wakealarm.log to
    results.
  * scripts/network_ntp_test: Tweaked output to use log levels more
    appropriately. Added some decoding so that bytes output show up as strings
    properly in output. Converted from optparse to argparse. Added a root
    check because this needs to be root to properly run.
  * scripts/disk_read_performance_test: Added extra targeted output so that
    users can understand what's going on. Moved the exit bits so the test will
    actually run on multiple drives as originally intended and not exit on the
    first failure.
  * scripts/removable_storage_test: vastly improved the output from that script
    and also introduced some new error handling to cover a couple conditions
    that generated unhelpful tracebacks.
  * scripts/memory_compare: changed the output a little so failures now dump
    data to stderr and success to stdout. Also added a try/except block to
    catch possible ZeroDivisionError cases if dmi or meminfo return 0 (found on
    my local system due to a library issue)
  * jobs/p...


Changed in checkbox (Ubuntu):
status: In Progress → Fix Released