Comment 13 for bug 1670291

Dave Jones (waveform) wrote :

I've now pushed a PR which should *partially* alleviate the reboot issue. The complexity in fixing this arises from the differing behaviours of the various implementations of the shutdown, poweroff, and reboot commands. Upstart's poweroff and reboot commands, for example, return immediately, (briefly) permitting the caller to query their exit codes. In contrast, systemd's don't return at all unless the poweroff or reboot fails. None of the reboot or poweroff implementations provide a scheduling facility like the /sbin/shutdown implementation (unsurprisingly), but systemd's poweroff and reboot commands *do* work even in the post-trusty upgrade environment (even though systemd's shutdown command doesn't).

The quandary is as follows: we can reliably detect when the shutdown command fails (it exits with an error code and some error messages), and when it succeeds ... sort of. We currently schedule the shutdown for 4 minutes' time and, if it hasn't died within 10 seconds, we assume it's going to work. This assumes that the implementation won't fail at the end of the schedule, but so far that seems a reliable assumption under both upstart and systemd. We cannot *reliably* detect when the poweroff or reboot commands either succeed or fail, but they do seem reliable in practice, even in aberrant environments like the post-trusty upgrade.

So, the method I've chosen is as follows: treat shutdown and restart requests as we presently do (schedule /sbin/shutdown for 4 minutes' time, monitor for 10 seconds). If it succeeds, nothing changes. If it fails, report the failure with an extra message indicating we are going to attempt to force the procedure, then run /sbin/poweroff or /sbin/reboot as appropriate. The result is that in the post-trusty environment (tested on containers and VMs) a reboot request does succeed, but is still reported as failing (albeit with an extended message explaining we're going to try and force it).
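
For illustration only, the flow described above could be sketched roughly as follows. This is not the code in the PR: the function name, its parameters, and the message wording are all hypothetical, it uses the standard library's subprocess module rather than whatever the client actually uses, and it assumes that a clean (zero) exit within the monitoring window also counts as success.

```python
import subprocess


def request_shutdown(reboot=False, delay_minutes=4, grace_seconds=10,
                     shutdown_cmd=None, force_cmd=None):
    """Try a scheduled shutdown first; force it if that visibly fails.

    Returns (succeeded, message). All names here are hypothetical.
    """
    if shutdown_cmd is None:
        # -r requests a reboot, -h a halt/poweroff; "+4" means in 4 minutes.
        shutdown_cmd = ["/sbin/shutdown", "-r" if reboot else "-h",
                        "+%d" % delay_minutes]
    if force_cmd is None:
        force_cmd = ["/sbin/reboot" if reboot else "/sbin/poweroff"]

    proc = subprocess.Popen(shutdown_cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    try:
        output, _ = proc.communicate(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # Still alive after the grace period: assume the schedule will hold.
        return True, "shutdown scheduled"

    if proc.returncode == 0:
        # Assumption: a clean exit inside the grace period counts as success.
        return True, "shutdown scheduled"

    # Failure is the one outcome we can detect reliably. Report it, then
    # fall back to the direct command without waiting on it, because its
    # exit status cannot be observed reliably anyway.
    subprocess.Popen(force_cmd)
    message = "%s\nAttempting to force %s." % (
        output.decode(errors="replace").strip(),
        "reboot" if reboot else "shutdown")
    return False, message
```

Note that the fallback path returns a failure even though the forced command may well work, which matches the reporting behaviour described above. The shutdown_cmd and force_cmd parameters exist only so the sketch can be exercised with harmless commands; calling it with the defaults as root would really attempt to shut the machine down.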

We *could* report it as succeeding, but frankly there are no guarantees, so it's a choice between giving the user bad news ("sorry your reboot request failed ...") with a possible silver lining ("... but we're going to try another way which *might* work, however we won't be able to tell"), or giving the user good news ("your reboot request worked after we forced it") and hoping we're not lying! The former sounds like the more sensible approach to me, and hence is what is in the PR (https://github.com/CanonicalLtd/landscape-client/pull/55).