Hardy, Intrepid, Jaunty 64 "hiccup" on Sager notebook.

Bug #217849 reported by QuentinHartman
This bug affects 4 people
Affects         Status      Importance   Assigned to    Milestone
linux (Ubuntu)  Won't Fix   Low          Stefan Bader
Karmic          Won't Fix   Undecided    Unassigned
Lucid           Won't Fix   Undecided    Unassigned

Bug Description

Binary package hint: debian-installer

I have a Sager NP2092 (specs can be seen at http://www.sagernotebook.com/product_customed.php?pid=46090) laptop which "stalls" during the installation of 64-bit Ubuntu Hardy Beta, as well as the nightlies from 4-13 and 4-14. The behavior is most pronounced when choosing the whole-drive encryption option, though it also occurs elsewhere. Basically, progress on the installation just stops at seemingly random intervals until I hit a key on the keyboard. Once I hit any key, drive activity picks up again and things carry on. It can happen when accessing the CD-ROM or the HDD. This behavior does not exist on Gutsy 64, and I do not yet know if it exists on older Hardy builds. I will check the earlier alphas and report back.

This behavior also does not seem to exist when installing to a VMware virtual machine. I'm not sure whether it exists on any of the other real hardware I own.

In googling around for this problem I found a VMware KB article which seems to describe a similar problem when installing Gutsy into a guest. The article can be seen at http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004384 . I have tried the workaround they suggest there, but it did not change the behavior.

Any suggestions as to what other information would be useful to collect would be welcome.

Revision history for this message
QuentinHartman (qhartman) wrote :

perhaps a duplicate of bug #217815

Revision history for this message
QuentinHartman (qhartman) wrote :

Ok, so I just sat and ran through the installation process, letting the installer sit when it stalled and timing the lags. I got to the point that I had previously thought was an indefinite hang, where it was preparing the encrypted LVM stuff, and went to lunch. When I came back an hour later, it was sitting waiting for input (LVM password, username, etc). The stalls I timed during the remainder of the install ranged anywhere from 2-5 minutes each. After sitting through that process for about 45 minutes I just started tapping the keyboard to keep the install moving. The remainder of the install took about 20 minutes. All told, it took about two and a half hours to complete, including some indeterminate amount of time it was sitting waiting for input while I was gone. I was actually here watching it for one and a half hours of that time, though, sitting through numerous stalls.

I've also done an install using the default "use whole disk" option on this machine and it exhibited the same stalling behavior, though it did not take nearly as long to complete, presumably because encryption was no longer a factor and I was tapping the keyboard as soon as I noticed a stall rather than waiting for stalls to unstick on their own. This installation finished in less than 30 minutes, a much more acceptable amount of time.

As a point of reference I did an install on a VM using the same ISO and install options on my desktop machine. It took about 45 minutes including time to wipe an 8GB virtual drive, with no apparent stalling. The drives on the machine remained active through all portions of the install that I expected them to be active.

I really don't know where to go from here. While this technically "works", if this problem makes it into the final release, it could potentially be quite a black eye if it surfaces on even a relatively small number of machines. I can only assume that this problem was introduced relatively recently (it did not exist in Gutsy on this hardware) and only affects certain hardware combinations.

Revision history for this message
QuentinHartman (qhartman) wrote :

I really don't think partman-crypto is responsible for this as the stalls exist even when doing an unencrypted install. The partman-crypto related portions of the installation seem to just be creating particularly large stalls.

Revision history for this message
Colin Watson (cjwatson) wrote :

Ben indicated on the phone that this is probably to do with lost interrupts, and that keyboard and mouse actions tickle the kernel into action; apparently we've had some similar issues before. Reassigning over to linux.

Rick Clark (dendrobates)
Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → Critical
milestone: none → ubuntu-8.04
status: New → Confirmed
Revision history for this message
QuentinHartman (qhartman) wrote :

I've been adding my experiences to bug #217815. Over there, Colin King suggested trying different clock sources.

On my machine I have hpet, acpi_pm, jiffies, and tsc as available clock sources. hpet seems to be the default. Here are my results using the daily from the 21st (I think...)

hpet: stalls
acpi_pm: stalls (perhaps worse than hpet)
jiffies: Works! If this does not genuinely eliminate the stalls, it at least makes them so short they are easily confused with normal processing.
tsc: stalls really badly, worse than hpet or acpi_pm

I'm going to test jiffies a couple more times, but the one run-through worked great! The only times that normal disc activity was interrupted were very brief (a few seconds each) pauses while packages were being configured. Otherwise the CD-ROM or HDD was busy the entire time I would expect it to be.
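For anyone repeating the clock source comparison above, the kernel exposes the available and active clock sources through sysfs. The following is a minimal sketch, assuming the usual sysfs location; the function names are made up for illustration and are not part of any Ubuntu tool.

```shell
#!/bin/sh
# Directory where the kernel exposes clocksource information (the usual sysfs
# location; overridable so the functions can be pointed at another directory).
CS_DIR="${CS_DIR:-/sys/devices/system/clocksource/clocksource0}"

# Print the clocksource currently in use, e.g. "hpet".
current_clocksource() {
    cat "$CS_DIR/current_clocksource"
}

# Print the clocksources the kernel offers, one per line.
available_clocksources() {
    tr -s ' ' '\n' < "$CS_DIR/available_clocksource" | sed '/^$/d'
}

# Succeed if the named clocksource (hypothetical helper) is offered.
has_clocksource() {
    available_clocksources | grep -qx "$1"
}

# Switching at runtime needs root, for example:
#   echo acpi_pm | sudo tee "$CS_DIR/current_clocksource"
```

Switching at runtime only lasts until the next reboot, so it is mainly useful for quick comparisons on an installed system; for the installer itself the clocksource= boot option is the practical route.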

Revision history for this message
Steve Langasek (vorlon) wrote :

Since there appears to be a workaround identified for this bug (changing the clock source), I'm tentatively milestoning it for 8.04.1. The kernel team should advise if this isn't actually feasible.

Changed in linux:
milestone: ubuntu-8.04 → ubuntu-8.04.1
Revision history for this message
Steve Langasek (vorlon) wrote :

Can someone from the kernel team please comment here on the feasibility of fixing this in an SRU?

Revision history for this message
Louis-Dominique Dubeau (ldd) wrote :

I've also encountered this problem while installing Kubuntu 8.04 on my wife's Sager NP2092.

However, this same problem also happens when booting a *fully installed* and *fully updated* Kubuntu. After entering the passphrase to decrypt the filesystems, it is sometimes necessary to hit keys from time to time to keep the boot process moving forward. Otherwise, the system just waits there forever. The problem does not seem to recur once the system is fully up.

Revision history for this message
QuentinHartman (qhartman) wrote :

Louis-Dominique:

If you add "clocksource=jiffies" to your kernel options at boot time, the stalls will go away. I've been running my machine with this setting since I discovered it in April and it has been working well.
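To confirm that a boot option such as clocksource=jiffies actually reached the kernel, one can look at /proc/cmdline after booting. This is only a sketch; has_boot_param is a hypothetical helper name, and the optional second argument exists so the check can be tried against an arbitrary command line string.

```shell
#!/bin/sh
# Succeed if the given kernel parameter (e.g. "clocksource=jiffies") appears
# as a whole word in a kernel command line.
# $1: parameter to look for
# $2: command line text (defaults to the contents of /proc/cmdline)
has_boot_param() {
    case " ${2:-$(cat /proc/cmdline)} " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   has_boot_param clocksource=jiffies && echo "jiffies workaround is active"
```

The same check works for the other options tried later in this report, since it simply matches whole words on the command line.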

Stefan Bader (smb)
Changed in linux:
assignee: ubuntu-kernel-team → stefan-bader-canonical
Revision history for this message
Stefan Bader (smb) wrote :

Could someone please verify that this problem is still occurring on the latest Hardy kernel (and maybe also try the latest Intrepid alpha for comparison). If the same problem still is there with the work-around still good, please add the output of dmidecode of that system. Thanks.

Changed in linux:
status: Confirmed → Incomplete
Revision history for this message
Gustavo (gustavocavallin-deactivatedaccount) wrote :

I'm not sure how relevant this is, but I have installed Ubuntu 8.04.1 LTS Desktop two days ago in a VMWare Workstation (version 6.0.4 build 93057) and it also stalls (for several hours) during boot sequence.

I'm about to try the suggestions mentioned here. One question: how do I obtain the dmidecode output?

Revision history for this message
Gustavo (gustavocavallin-deactivatedaccount) wrote :

Folks, I could get around the issue by disabling the CD-ROM drive (I also disabled the floppy disk drive, but this seems to be of no importance).
Once I disabled the CD-ROM the boot sequence happened in a matter of seconds.

Revision history for this message
QuentinHartman (qhartman) wrote : Re: [Bug 217849] Re: Hardy 64-bit beta and nightly alternate installation stalls

On Wed, Aug 20, 2008 at 8:19 AM, Stefan Bader
<email address hidden> wrote:
> Could someone please verify that this problem is still occurring on the
> latest Hardy kernel (and maybe also try the latest Intrepid alpha for
> comparison). If the same problem still is there with the work-around
> still good, please add the output of dmidecode of that system. Thanks.

I'll check what happens on my machine when I remove the "jiffies"
workaround this weekend. FWIW, the machine has been performing great
with that workaround in place.

-QH-

Revision history for this message
Leann Ogasawara (leannogasawara) wrote : Re: Hardy 64-bit beta and nightly alternate installation stalls

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Revision history for this message
Progressive (vbgraphix2003) wrote :

The kubuntu intrepid 64bit alpha 5 live CD booted up for me in lightning speed. As for the installation process, that is another matter.

Revision history for this message
QuentinHartman (qhartman) wrote :

I just upgraded my machine to Intrepid beta and removed the "jiffies" work around. It seems to be performing well!

I'll throw my second HDD in here this week, do an install using the Intrepid beta installer, and report back...

Revision history for this message
QuentinHartman (qhartman) wrote :

After further testing, this still seems to be a problem on this machine. The boot process usually stalls unless I tap keys, so I turned the "jiffies" workaround back on. I'm currently running the 2.6.27-5 kernel because last time I tried 27-7 the machine would not boot at all. I'll be trying that some more in the next couple of days... I still haven't had a chance to test a clean install; I'll try to get that done at the same time as testing 27-7.

Revision history for this message
Stefan Bader (smb) wrote :

Please, if the problem occurs, can you provide the dmesg output and maybe the output of 'cat /proc/interrupts'? Thanks
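The information requested here can be gathered in one go shortly after boot. The following is a rough sketch, not an official collection script; the function name and output file names are arbitrary choices.

```shell
#!/bin/sh
# Save the kernel log, the interrupt counters and the kernel version into a
# directory, ready for attaching to the bug.
# $1: output directory (default: a new temporary directory)
# Prints the directory that was used.
collect_stall_info() {
    out=${1:-$(mktemp -d)}
    mkdir -p "$out" || return 1
    # dmesg may need root on some systems; keep going if it fails.
    dmesg > "$out/dmesg.txt" 2>&1 || echo "warning: dmesg failed (try sudo)" >&2
    cat /proc/interrupts > "$out/interrupts.txt" 2>&1 \
        || echo "warning: could not read /proc/interrupts" >&2
    uname -a > "$out/uname.txt"
    printf '%s\n' "$out"
}

# Example:
#   dir=$(collect_stall_info) && ls -l "$dir"
```

Running it once without any workaround on the boot command line and once with the workaround gives two sets of files that can be compared side by side.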

Revision history for this message
Louis-Dominique Dubeau (ldd) wrote :

I've experienced a problem which may be related to this but it is on a Sager NP2090, not NP2092. I've been able to run Hardy for months without the problem manifesting itself but for the past month or so I've experienced quasi-freezes on Hardy and the same on Intrepid. At this point, I'm pretty convinced that I did not do anything wrong and that there is a real bug somewhere.

All of my Ubuntu partitions are encrypted. If I let the kernel boot normally and issue:

$ dd if=/dev/zero of=~/tmp/garbage bs=500M count=10

Pretty soon the system becomes unusable. The GUI freezes (applets no longer update their information) except for the mouse pointer which moves but with a huge lag. Keyboard response is also severely lagged as evidenced by the fact that it takes several seconds for a "CapsLock" toggle to register. Eventually the system gets out of that quasi-freeze but it takes a while. At the time the test is run, the following appears in the syslog:

Nov 1 11:06:26 bodhi kernel: [ 608.405383] CE: hpet increasing min_delta_ns to 15000 nsec
Nov 1 11:10:26 bodhi kernel: [ 848.045122] CE: hpet increasing min_delta_ns to 22500 nsec
Nov 1 11:20:43 bodhi kernel: [ 1465.497279] CE: hpet increasing min_delta_ns to 33750 nsec
Nov 1 11:36:14 bodhi kernel: [ 2396.253626] CE: hpet increasing min_delta_ns to 50624 nsec
Nov 1 11:37:05 bodhi kernel: [ 2447.124177] CE: hpet increasing min_delta_ns to 75936 nsec
Nov 1 12:31:06 bodhi kernel: [ 5688.448082] CE: hpet increasing min_delta_ns to 113904 nsec
Nov 1 13:21:01 bodhi kernel: [ 8682.672764] CE: hpet increasing min_delta_ns to 170856 nsec

In cases where the problem occurred by itself, the system could still be hung even after more than 5 minutes.

If I boot with clocksource=jiffies and run the same command as above, the system remains responsive. The GUI does not freeze: the applets I use to report CPU usage and temperature continue to update as normal. The system is a bit laggy but still usable. Also, the "hpet increasing" messages are absent from the syslog.
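To check another machine for the "hpet increasing" messages, the values can be pulled out of the syslog with a one-line filter. This is a minimal sketch; hpet_min_delta_values is a made-up name, and the pattern is taken directly from the log lines quoted above.

```shell
#!/bin/sh
# Print the successive min_delta_ns values (in nanoseconds) from
# "CE: hpet increasing min_delta_ns to N nsec" kernel messages.
# Reads log text from the files given as arguments, or from stdin.
hpet_min_delta_values() {
    sed -n 's/.*CE: hpet increasing min_delta_ns to \([0-9][0-9]*\) nsec.*/\1/p' "$@"
}

# Example:
#   hpet_min_delta_values /var/log/syslog
```

In the excerpt above each value is roughly 1.5 times the previous one, so an empty result with clocksource=jiffies and a growing series without it would match what is described here.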

$ uname -a
Linux bodhi 2.6.27-7-generic #1 SMP Thu Oct 30 04:12:22 UTC 2008 x86_64 GNU/Linux

I'm suspecting the problems may be related because the NP2090's hardware is very similar to that of the NP2092 and the solution in either case is to set clocksource=jiffies

Quentin, have you ever run into the "hpet increasing" message? Can you try the "dd" command above and see what happens without and with the clocksource=jiffies parameter?

Revision history for this message
QuentinHartman (qhartman) wrote :

Stefan Bader:

I'll post the dmesg output of the system without the "jiffies" workaround and the output of /proc/interrupts shortly. I'll get both of those pieces of information collected as shortly after a boot as possible to limit the amount of extraneous data.

I've finally been able to do a clean install of Intrepid on this machine, and it still hangs during installation and it still needs clocksource=jiffies in order to operate normally once installed.

Louis-Dominique Dubeau:

I have never seen the "hpet increasing" messages, but when I try your dd test, the machine does indeed become unresponsive. In fact, it remains somewhat unresponsive even after the dd completes. Programs that were open during the test "die" even though they remain drawn on the screen.

In a nutshell, the behavior you describe matches what I am seeing, though I am missing the "hpet increasing" messages.

Setting the clocksource to "jiffies" eliminates these problems. During the dd test, the system does become quite slow, but only as much as I would expect the system to given the high IO and processor load that puts the system under, and upon completion of the test, performance and other behaviors return to normal.

Revision history for this message
QuentinHartman (qhartman) wrote :

dmesg output attached.

Revision history for this message
QuentinHartman (qhartman) wrote :

/proc/interrupts output attached.

Revision history for this message
Stefan Bader (smb) wrote :

@Quentin, could you try whether the following combination would also circumvent the problem: "noapic lapic"? This should prevent usage of the ioapic for interrupt routing and was reported to help in some cases.

Revision history for this message
QuentinHartman (qhartman) wrote :

@Stefan

So, "noapic lapic" does seem to reduce the severity of the problem, but it does not eliminate it like the "clocksource=jiffies" workaround seems to. There was still at least one pause during the boot process that had to be "tickled" away, and while the interactive performance is much better, I'd have to (completely subjectively, and after limited testing) say that it is not as good as when using the jiffies clocksource.

contents of "/proc/timer_list" attached.

Revision history for this message
Stefan Bader (smb) wrote :

Thanks for trying. The timer list also looks sane, at least for the time it was taken, and unfortunately I cannot really think of a way to take it when the pause occurs, since typing disturbs that. Have you tried to play with idle= in your experiments? Either idle=halt or idle=nomwait?

Revision history for this message
QuentinHartman (qhartman) wrote :

I have not. Do you have specific suggestions? I take it using alternate clocksources is not a desirable work around?

Revision history for this message
Stefan Bader (smb) wrote :

To answer the last first: it is a work around. jiffies is the least fine grained source of clock and does not really explain what is wrong. And there are also other cases that are very similar but cannot be worked around with the jiffies setting. So I really would like to get down to the source. Yet, none of the systems I got can reproduce this.
As for which of those to use: idle=halt limits the used C-state to C1 only but uses the mwait to enter this state while idle=nomwait is supposed to not use the mwait call (in theory). So both might be interesting...

Revision history for this message
jscc88 (jscc88-deactivatedaccount) wrote :

this version has this bug

Changed in linux:
assignee: stefan-bader-canonical → sebastiancobaleda
status: Incomplete → Confirmed
Revision history for this message
jscc88 (jscc88-deactivatedaccount) wrote :

this is a duplicate of another bug. the bug #217815

Revision history for this message
QuentinHartman (qhartman) wrote :

This is not a dupe of that bug. Though the behavior is similar, that bug was found to be a flaw in KVM. That was fixed, according to the comments in that bug. This problem happens on _real_ hardware. I believe it is a kernel problem.

Revision history for this message
jscc88 (jscc88-deactivatedaccount) wrote :

Yes, I too believe that this is a kernel problem.

I recompiled with the -19 kernel version and this problem does not appear to be present.

Changed in linux:
status: Confirmed → Fix Committed
Changed in linux:
status: Fix Committed → Fix Released
Revision history for this message
Julian Alarcon (julian-alarcon) wrote :

The user Juan Sebastian Cobaleda Cano is a troll, or something. We, in the Ubuntu-Co Team are checking all his changes in Launchpad. Sorry for the problems.

Changed in linux:
assignee: sebastiancobaleda → nobody
status: Fix Released → Incomplete
Revision history for this message
QuentinHartman (qhartman) wrote : Re: [Bug 217849] Re: Hardy 64-bit beta and nightly alternate installation stalls

Gotcha, thanks for the info. FWIW, I updated this machine to Jaunty
over the weekend, and it still seems to stall out randomly without the
"jiffies fix". The video is also kinda broken, but I know that's
fixable. As soon as I get that straightened out, I'll start poking at
this problem again. If there's any information I can provide, or
anything I can do, let me know. This machine is used for testing and
whatnot anymore, so I have no problem with flattening it and
rebuilding or trying other kernel builds or whatever.

Revision history for this message
QuentinHartman (qhartman) wrote : Re: Hardy 64-bit beta and nightly alternate installation stalls

Returning to this. Problem seems to exist in the latest Jaunty as of Feb 2nd 2009...

Revision history for this message
TJ (tj) wrote :

I've just taken a look at the original dmesg and noticed something which could be significant:

[ 0.004000] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 0.004000] Placing software IO TLB between 0x20000000 - 0x24000000

Quentin, when you get the opportunity could you attach a new Jaunty dmesg captured without any workarounds in place, and also:

uname -a
sudo lspci -vvnn > /tmp/lspci-vvnn.log
lsmod >/tmp/lsmod.log

Secondly, would it be possible to try temporarily removing 2GB of RAM (I'm guessing the system has 2 x 2GB RAM modules) and seeing if the problem still occurs?
With only 2GB of RAM the software bounce buffering required by Intel CPUs that have no IOMMU controller will not be used. The main test is to reduce total RAM to less than 4GB.
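Before pulling memory modules, it may be worth checking how much RAM the kernel sees and whether the SWIOTLB message is present in the current boot log. This is a sketch only; both helper names are hypothetical, and the SWIOTLB check simply greps for the string shown in the dmesg excerpt above.

```shell
#!/bin/sh
# Print total RAM in MiB, read from a meminfo-format file
# (default: /proc/meminfo, where MemTotal is reported in kB).
total_ram_mib() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Succeed if the given kernel log text (file arguments or stdin) mentions
# software bounce buffering (SWIOTLB).
uses_swiotlb() {
    grep -q 'SWIOTLB' "$@"
}

# Example:
#   total_ram_mib
#   dmesg | uses_swiotlb && echo "SWIOTLB in use"
```

If the theory above holds, the SWIOTLB line should disappear once the reported total drops well below 4096 MiB.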

Revision history for this message
QuentinHartman (qhartman) wrote : Re: [Bug 217849] Re: Hardy 64-bit beta and nightly alternate installation stalls

I'm working on producing the information you asked for. An interesting
data point though is that using the Jaunty Alpha 5 64 alt installer,
there have been no signs of "stalling" during the installation, which
is where I first saw the behavior before. A good sign? I hope so....

-QH-

Stefan Bader (smb)
Changed in linux:
assignee: nobody → stefan-bader-canonical
Revision history for this message
TJ (tj) wrote : Re: Hardy 64-bit beta and nightly alternate installation stalls

Quentin, any news on this?

Revision history for this message
QuentinHartman (qhartman) wrote : Re: [Bug 217849] Re: Hardy 64-bit beta and nightly alternate installation stalls

On Thu, Mar 19, 2009 at 7:58 PM, TJ <email address hidden> wrote:

> Quentin, any news on this?
>

I just finished upgrading to the Jaunty Beta last night and took out the
"jiffies" hack. In the few minutes I had to work on the machine after that,
it seemed to be behaving normally. Hopefully I'll have time tonight to pull
the information you requested before. Sorry for the delays on this.

-QH-

Revision history for this message
QuentinHartman (qhartman) wrote :
  • lsmod.log (2.9 KiB, text/x-log; charset=US-ASCII; name="lsmod.log")
  • lspci-vvnn.log (33.9 KiB, text/x-log; charset=US-ASCII; name="lspci-vvnn.log")

Ack, looks like the few minutes were wrong. Got some "stalls" now that I
have jiffies turned off.

Output from uname -a:

Linux sage 2.6.28-11-generic #38-Ubuntu SMP Fri Mar 27 10:01:17 UTC 2009
x86_64 GNU/Linux

logs of requested info attached

I'll remove half the RAM next...

Revision history for this message
QuentinHartman (qhartman) wrote :
  • dmesg.txt (51.9 KiB, text/plain; charset=US-ASCII; name="dmesg.txt")
  • lsmod.log (2.9 KiB, text/x-log; charset=US-ASCII; name="lsmod.log")
  • lspci-vvnn.log (34.0 KiB, text/x-log; charset=US-ASCII; name="lspci-vvnn.log")

Ok, removed half the RAM (so I'm at 2GB) and the problem still exists.

Attached are new logs and dmesg output.

Revision history for this message
Paul Dufresne (paulduf) wrote : Re: Hardy 64-bit beta and nightly alternate installation stalls

As the bug reporter has given the requested info, marking the bug as Confirmed.

There are many interesting things in this latest dmesg.txt (http://launchpadlibrarian.net/24728819/dmesg.txt):
[ 0.000000] Scanning 2 areas for low memory corruption
[ 0.000000] modified physical RAM map:
...
[ 0.465400] pci 0000:00:1c.5: PME# supported from D0 D3hot D3cold
[ 0.465404] pci 0000:00:1c.5: PME# disabled
...
[ 1.177439] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[ 1.179534] ACPI: EC: non-query interrupt received, switching to interrupt mode
[ 1.182713] ACPI: EC: GPE storm detected, transactions will use polling mode
[ 1.680004] ACPI: EC: missing confirmations, switch off interrupt mode.
...
[ 0.528840] pci 0000:01:00.0: BAR 6: can't allocate mem resource [0xe0000000-0xdfffffff]
...
[ 3.290134] EDD information not available.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
QuentinHartman (qhartman) wrote :

Just modified the title of this bug to make it more accurate.

summary: - Hardy 64-bit beta and nightly alternate installation stalls
+ Hardy, Intrepid, Jaunty 64 "hiccup" on Sager notebook.
Revision history for this message
QuentinHartman (qhartman) wrote :

just found a BIOS update for this machine, upgraded to version 1.16. doesn't seem to have made a difference. still getting "hiccups" that have to be tickled away with kb / mouse input.

Revision history for this message
Justin Lloyd (captlloyd) wrote :

I too have this issue on a Sager NP2092/Compal JFL92. I have found that switching the hard drive controller to IDE mode seems to help. However this is a less than ideal solution.

I would like to help resolve this issue if anyone has any suggestions of what I could do to track down the cause.

Revision history for this message
QuentinHartman (qhartman) wrote : Re: [Bug 217849] Re: Hardy, Intrepid, Jaunty 64 "hiccup" on Sager notebook.

On Thu, May 7, 2009 at 5:52 PM, Justin Lloyd <email address hidden> wrote:

> I too have this issue on a Sager NP2092/Compal JFL92. I have found that
> switching the hard drive controller to IDE mode seems to help. However
> this is a less than ideal solution.
>

Yay! I'm not alone! Does switching to IDE mode seem to have any sort of
performance impact? Presumably it would preclude taking advantage of things
like native command queuing and whatnot, so there is likely a bit of a hit.
Also, when you say it "helps", does it eliminate the problem, or just make
it less frequent? I'll do some tests and benchmarks on my machine as soon as
I'm able

> I would like to help resolve this issue if anyone has any suggestions of
> what I could do to track down the cause.
>

Thanks for any assistance you can offer!

-QH-

Revision history for this message
Stefan Bader (smb) wrote :

Hi Quentin, sorry for not getting back for that long. If you have some time it would be great if you could verify that it still hangs with the latest daily kernel from https://wiki.ubuntu.com/KernelMainlineBuilds
We want to make sure this problem still exists upstream. That kernel can be installed in parallel to your current kernel, so you can quickly change back and forth. If that upstream kernel still shows the problem, could you try booting with
"acpi_skip_timer_override debug lapic=debug" and post the result here?

Revision history for this message
QuentinHartman (qhartman) wrote :

On Thu, Jun 4, 2009 at 5:59 AM, Stefan Bader <email address hidden> wrote:

> Hi Quentin, sorry for not getting back for that long. If you have some time
> it would be great if you could verify that it still hangs with the latest daily
> kernel from https://wiki.ubuntu.com/KernelMainlineBuilds
> We want to make sure this problem still exists upstream. That kernel can be
> installed in parallel to your current kernel, so you can quickly change back
> and forth. If that upstream kernel still shows the problem, could you try
> booting with
> "acpi_skip_timer_override debug lapic=debug" and post the result here?
>

No worries. I'll test it out just as soon as I can. Probably not until the
weekend though. Thanks for staying on this!

QH

Revision history for this message
QuentinHartman (qhartman) wrote :

Ok, So I am running the daily mainline from today, June 11, and the problem still exists. After rebooting I had to recompile my Nvidia drivers. During the compilation the progress stopped at 30% for about 3 minutes. Once that time passed and I was quite sure it wasn't just thinking _real hard_, I hit the right-arrow cursor and the compilation continued, completing in only a few seconds.

I've added the kernel flags you suggest above ("acpi_skip_timer_override debug lapic=debug") and rebooted, and so far, I haven't noticed any hiccups. was that expected? I'm not sure what you are after with "post the result here". I'm going to setup a few long-running processes and see if they hang up. Are you after some debugging output?

Revision history for this message
Stefan Bader (smb) wrote :

Yes I am. When using the debug options in conjunction with the skip argument, the dmesg will contain some information about the system trying to figure out the right routing for the timer interrupt. That might be interesting to look at.

Revision history for this message
QuentinHartman (qhartman) wrote :

dmesg output. Hope it's useful!

Revision history for this message
Stefan Bader (smb) wrote :

It looks useful, at least to understand things better. It looks like the timer interrupt is actually connected to pin 0 of the apic, not pin 2 as the override tells the system. Sadly I messed up with the debug options, so the apic table was not dumped (the right one is apic=debug). But even with what is there, it looks like "cat /proc/interrupts" will show IO-APIC as interrupt source, which means Linux found the right apic pin by guessing.
This proves the BIOS is incorrect at that point. What I cannot say is, whether Linux has a chance to detect this automatically. In other cases I saw, this required some knowledge of the chipset and in some cases this was NDA. You have an Intel chipset, maybe there are chances.
All in all you should, with that work-around, have no more troubles with that sort of hangs.

wm3 (y.s)
Changed in linux (Ubuntu):
assignee: Stefan Bader (stefan-bader-canonical) → 3n!Gma (wm3)
Stefan Bader (smb)
Changed in linux (Ubuntu):
assignee: 3n!Gma (wm3) → Stefan Bader (stefan-bader-canonical)
Juliet (titania-bianca)
Changed in linux (Ubuntu):
status: Confirmed → Fix Committed
Revision history for this message
Stefan Bader (smb) wrote :

This seems to have been attacked by a troll. There is no fix for this bug. We have a workaround, but I don't yet know how to detect and fix misrouted irq0 overrides.

Changed in linux (Ubuntu):
status: Fix Committed → Triaged
Revision history for this message
Chris Johnston (cjohnston) wrote :

The nominations may not be appropriate. Please investigate and fix as appropriate.

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 217849] Re: Hardy, Intrepid, Jaunty 64 "hiccup" on Sager notebook.

Hi Chris,

On Mon, Feb 08, 2010 at 06:38:10PM -0000, Chris Johnston wrote:
> The nominations may not be appropriate. Please investigate and fix as
> appropriate.

It's not necessary to follow up asking for investigations of these
nominations. This generates more noise than the nominations themselves do.
:)

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
<email address hidden>                              <email address hidden>

Revision history for this message
Stefan Bader (smb) wrote :

This is very often a BIOS issue which cannot be solved (very often due to missing chipset documentation) by Linux. My recommendation would be to try "acpi_skip_timer_override" added to the kernel boot arguments. This will cause Linux to try finding a working connection to the timer (see below).

[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: BIOS IRQ0 pin2 override ignored.
[ 0.036552] ..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
[ 0.044001] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[ 0.044001] ...trying to set up timer (IRQ0) through the 8259A ...
[ 0.044001] ..... (found apic 0 pin 0) ...
[ 0.085678] ....... works.
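After booting with acpi_skip_timer_override, one way to see how the timer interrupt ended up being routed is to look at the IRQ 0 line in /proc/interrupts. The sketch below assumes the usual layout of that file (a header line with one column per CPU, followed by one line per interrupt); timer_irq_controller is a made-up helper name.

```shell
#!/bin/sh
# Print the interrupt controller column for IRQ 0 (the timer) from an
# interrupts-format file (default: /proc/interrupts).
# The header line has one field per CPU, so the controller name is the
# field right after the IRQ number and the per-CPU counters.
timer_irq_controller() {
    awk 'NR == 1 { ncpu = NF; next }
         $1 == "0:" { print $(ncpu + 2); exit }' "${1:-/proc/interrupts}"
}

# Example:
#   timer_irq_controller
```

A value starting with IO-APIC would be consistent with the "found apic 0 pin 0" line in the log above; the exact spelling of the column varies between kernel versions.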

Revision history for this message
QuentinHartman (qhartman) wrote :

I'll give that a shot.

Revision history for this message
Rami Autiomäki (rami-autiomaki) wrote :

I have a JFL92 laptop. Until now I used the kernel parameter clocksource=jiffies to prevent machine pauses. This has the downside that the processor polls all the time and consumes a lot of power.
I discovered that a better alternative is to use pci=nomsi as a kernel parameter. This also allows me to enable AHCI mode for the hard drive in the BIOS.

https://bugzilla.redhat.com/show_bug.cgi?id=502536
http://www.mjmwired.net/kernel/Documentation/MSI-HOWTO.txt
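A quick way to see whether pci=nomsi has taken effect is to count the interrupt lines that still use MSI. This is only a sketch, assuming that MSI interrupts are labelled with "PCI-MSI" in /proc/interrupts; msi_irq_count is a hypothetical helper.

```shell
#!/bin/sh
# Print how many lines of an interrupts-format file (default:
# /proc/interrupts) belong to MSI interrupts.
msi_irq_count() {
    grep -c 'PCI-MSI' "${1:-/proc/interrupts}"
}

# Example: with pci=nomsi on the kernel command line this is expected
# to print 0.
#   msi_irq_count
```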

Revision history for this message
QuentinHartman (qhartman) wrote : Re: [Bug 217849] Re: Hardy, Intrepid, Jaunty 64 "hiccup" on Sager notebook.

On Mon, May 10, 2010 at 9:49 PM, Rami Autiomäki
<email address hidden> wrote:
> I have a JFL92 laptop. Until now I used the kernel parameter clocksource=jiffies to prevent machine pauses. The downside is that the processor is polling all the time and consuming a lot of power.
> I discovered that a better alternative is to use pci=nomsi as a kernel parameter. This also allows me to enable AHCI mode for the hard drive in the BIOS.

This also works for me, except for the first few moments while the
kernel is loading. I still have to "tickle" the machine with a keypress
to get the boot started, but once it's going, it seems to carry on fine.
Thanks for the tip! It seems we are doomed to live with this bug until
these machines are retired.

Revision history for this message
Rami Autiomäki (rami-autiomaki) wrote :

I think https://bugzilla.kernel.org/show_bug.cgi?id=12118 is the same kind of bug report in the kernel bug tracker. This bug also seems to affect laptops other than Compal or Sager.

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Please try the Maverick Live CD.

Revision history for this message
QuentinHartman (qhartman) wrote : Re: [Bug 217849] Re: Hardy, Intrepid, Jaunty 64 "hiccup" on Sager notebook.

On Wed, Dec 8, 2010 at 8:32 AM, Tim Gardner <email address hidden> wrote:
> Please try the Maverick Live CD.

I've upgraded to Maverick on this machine and I am pretty sure the
problem still exists. Is there reason to believe the LiveCD would
behave differently? I'll confirm that the problem still exists under
Maverick when I get home tonight.

Q

Revision history for this message
Tim Gardner (timg-tpi) wrote :

QuentinHartman - It would be helpful if we could get your machine information attached. Please run 'apport-collect 217849' which will upload a bunch of HW info.

Revision history for this message
Stefan Bader (smb) wrote :

I just want to repeat some observations I made over time here: I have seen this on quite a few machines, though solutions vary and I am not sure there is a good automated way yet. The basic problem is that using lower C-states has some effects on the timer tick sources normally used (lapic and tsc stop). In general, the kernel tries to detect that and use a broadcast mechanism that relies on the timer device to fire an interrupt (irq0) when the kernel should wake up. On newer boards that timer is hardware emulated using the hpet.

In the simplest case, the BIOS does not initialize the chipset correctly. There is an ACPI BIOS entry which tells the OS which apic pin the timer is really connected to. Sometimes this is wrong and "acpi_skip_timer_override" does help. One way to test for it is to look at irq0 in /proc/interrupts. This number should increment on at least one cpu.
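The irq0 check described above could be scripted roughly as follows. This is only a sketch: the function names (irq0_counts, timer_is_ticking, check_live) are made up for illustration, and it assumes the usual /proc/interrupts layout with a header row of CPU columns followed by one row per interrupt.

```python
import time


def irq0_counts(interrupts_text):
    """Return the per-CPU counts for IRQ 0 from /proc/interrupts-style text."""
    lines = interrupts_text.splitlines()
    if not lines:
        return []
    # The header row lists one column per CPU: CPU0 CPU1 ...
    ncpus = len(lines[0].split())
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "0:":
            return [int(value) for value in fields[1:1 + ncpus]]
    return []


def timer_is_ticking(before, after):
    """True if the IRQ 0 count went up on at least one CPU."""
    return any(new > old for old, new in zip(before, after))


def check_live(path="/proc/interrupts", interval=2.0):
    """Sample the file twice, `interval` seconds apart (hypothetical helper)."""
    with open(path) as handle:
        before = irq0_counts(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = irq0_counts(handle.read())
    return timer_is_ticking(before, after)
```

Calling check_live() on the affected machine and getting False would suggest the timer interrupt is not arriving, which is the case where "acpi_skip_timer_override" is worth trying.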

But I have seen an odd case on my netbook, where the timer interrupt count increments and I still had those weird issues with the system being stuck until I pressed a key. This seems to be a strange case of the timer interrupt not being sufficient to wake up the system (which is something I still need to get explained by experts). In that case the i915 card also has the issue of not triggering interrupts on sync, interrupts which usually seem to keep systems alive enough. On top of that, the other devices that usually keep the system busy (ethernet and ahci) were using MSI interrupts, which also seems to be insufficient. In this constellation "pci=nomsi" seems to help well enough as it causes enough other interrupts to be triggered.

If nothing else helps, there might be a chance to prevent the system from going into the deeper C-states at all: "processor.max_cstate=<nr>" (or, when intel_idle is used, "intel_idle.max_cstate=<nr>"). One would see the result in powertop. This comes at the expense of higher power usage though.
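When reporting back, it helps to state exactly which of the boot options mentioned in this report were active for a given test. A minimal sketch for pulling them out of a kernel command line (for example the contents of /proc/cmdline) is below; the names TRACKED and tracked_options are made up for illustration, and the list only covers the options discussed here.

```python
# Option names discussed in this bug report; values (if any) follow "=".
TRACKED = {
    "acpi_skip_timer_override",
    "pci",
    "clocksource",
    "processor.max_cstate",
    "intel_idle.max_cstate",
}


def tracked_options(cmdline):
    """Map each tracked option on the command line to its value.

    Flags without "=" (such as acpi_skip_timer_override) map to None.
    """
    result = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name in TRACKED:
            result[name] = value if sep else None
    return result
```

On a running system this could be used as tracked_options(open("/proc/cmdline").read()), and the resulting dictionary pasted into the report next to the observed behaviour.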

Revision history for this message
Stefan Bader (smb) wrote :

At this point I think two things should be tried and reported:
1. Does experimenting with the boot options described in the previous comment lead to any success?
2. It might be interesting to see the results of a run of the firmware test suite (without suspend and hibernate tests). This can be found at https://launchpad.net/~firmware-testing-team/+archive/ppa-firmware-test-suite-natty-stable/+packages. You can just force the natty package to be installed.

Meanwhile I would set this report to incomplete.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Revision history for this message
QuentinHartman (qhartman) wrote :

Ok, I will try those as well. Sorry for the delay in getting
information collected, this week got kinda hectic.

Pete Graner (pgraner)
Changed in linux (Ubuntu Karmic):
status: New → Won't Fix
Changed in linux (Ubuntu Lucid):
status: New → Won't Fix
Changed in linux (Ubuntu):
importance: Critical → Low
Revision history for this message
Brad Figg (brad-figg) wrote : Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Incomplete → Won't Fix