Call trace when testing fstat stressor on ppc64el with virtual keyboard and mouse present

Bug #1652132 reported by Mike Rushton
This bug affects 1 person
Affects          Status        Importance  Assigned to     Milestone
Linux            Unknown       Unknown
linux (Ubuntu)   Fix Released  High        Colin Ian King
  Nominated for Yakkety by Thadeu Lima de Souza Cascardo
  Xenial         Fix Released  Undecided   Colin Ian King

Bug Description

== SRU REQUEST [Xenial, Yakkety, Zesty] ==

Ubuntu 16.04.1
Kernel = 4.4.0-53-generic-74-Ubuntu ppc64le

Running the stress-ng "fstat" stressor triggers access to the USB bus, which produces a call trace and locks up all further USB activity (lsusb hangs). So far this has only been seen on OpenPOWER systems (Firestone and Garrison) where a virtual USB keyboard and mouse is built into the BMC.

From lsusb (before crashing): Bus 001 Device 004: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse

Another OpenPOWER server (Briggs) has no virtual USB devices and does not experience the failure.
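
To check whether a machine carries the virtual device before starting a run, the one-line lsusb format quoted above can be matched on its vendor:product pair. The following is a minimal sketch, not part of the original report; the function names are hypothetical, and it assumes the lsusb output was captured before any hang, since lsusb itself blocks afterwards.

```python
import re

# IDs taken from the lsusb line quoted in this report.
AMI_VENDOR_ID = 0x046B
AMI_VIRT_KBD_MOUSE_ID = 0xFF10

# Matches the short one-line-per-device lsusb format, e.g.
# "Bus 001 Device 004: ID 046b:ff10 American Megatrends, Inc. ..."
LSUSB_LINE = re.compile(
    r"Bus (\d{3}) Device (\d{3}): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)"
)


def parse_lsusb_line(line):
    """Return (bus, device, vendor_id, product_id, description) or None."""
    match = LSUSB_LINE.search(line)
    if match is None:
        return None
    bus, dev, vendor, product, desc = match.groups()
    return int(bus), int(dev), int(vendor, 16), int(product, 16), desc.strip()


def has_ami_virtual_hid(lsusb_output):
    """True if any line of the lsusb output is the AMI virtual keyboard/mouse."""
    for line in lsusb_output.splitlines():
        parsed = parse_lsusb_line(line)
        if parsed and parsed[2:4] == (AMI_VENDOR_ID, AMI_VIRT_KBD_MOUSE_ID):
            return True
    return False
```

Feeding it the saved output of a plain `lsusb` call is enough; a machine like Briggs, with no virtual USB devices, would be expected to return False.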

Please see attached kern.log and dmesg output for further details.

== Fix ==

Quirking the Virtual AMI keyboard and mouse with ALWAYS_POLL addresses the issue. The patch has been accepted into the upstream queue for 4.11, see http://www.spinics.net/lists/linux-usb/msg152977.html
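
For readers unfamiliar with HID quirks, the idea is a table keyed on USB vendor and product ID that switches on special handling for specific devices. The sketch below models that lookup in Python purely for illustration; the real fix is a C change in the kernel's usbhid driver, and the flag value, table, and function names here are assumptions rather than kernel code. It also assumes, in line with the later comment about the URB kill, that without the quirk the driver stops input polling when the device is closed.

```python
# Hypothetical flag value for illustration only; the kernel defines its own
# HID_QUIRK_* constants in C.
QUIRK_ALWAYS_POLL = 0x400

# (vendor_id, product_id) -> quirk flags. The single entry uses the IDs that
# lsusb reports for the AMI virtual keyboard and mouse.
QUIRK_TABLE = {
    (0x046B, 0xFF10): QUIRK_ALWAYS_POLL,
}


def lookup_quirks(vendor_id, product_id):
    """Return the quirk flags for a device, or 0 if it is not listed."""
    return QUIRK_TABLE.get((vendor_id, product_id), 0)


def should_stop_polling_on_close(vendor_id, product_id):
    """Model of the behaviour change: quirked devices keep polling."""
    return not (lookup_quirks(vendor_id, product_id) & QUIRK_ALWAYS_POLL)
```

Because the table is keyed on one exact ID pair, every other device keeps its default behaviour, which is the basis of the low regression risk argued below.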

== Test Case ==

Run 10 times:
sudo stress-ng --fstat 128 -t 60 -v

Without the fix, the system hangs; with the fix, there are no hangs or USB error messages.
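
The ten runs can be scripted. This is a minimal sketch under stated assumptions: the 180-second hang timeout and the injectable `runner` parameter are hypothetical choices, not part of the original test case, and the timeout is best-effort only, since a process stuck in uninterruptible sleep may not respond to being killed.

```python
import subprocess

# The command from the test case above.
STRESS_CMD = ["sudo", "stress-ng", "--fstat", "128", "-t", "60", "-v"]


def run_test_case(runs=10, cmd=STRESS_CMD, hang_timeout=180, runner=subprocess.run):
    """Run cmd `runs` times; return a list of 'pass', 'fail' or 'hang'."""
    results = []
    for _ in range(runs):
        try:
            completed = runner(cmd, timeout=hang_timeout)
        except subprocess.TimeoutExpired:
            results.append("hang")
            break  # a hung USB stack is not expected to recover; stop here
        results.append("pass" if completed.returncode == 0 else "fail")
    return results
```

A result of ten "pass" entries, with no USB errors in dmesg, would correspond to the fixed behaviour described above.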

== Regression Potential ==

This only quirks a specific AMI virtual keyboard and mouse into a poll mode, so it touches one device. Furthermore, the poll mode shouldn't affect operation; it just makes the URB handling less efficient.

Mike Rushton (leftyfb) wrote :
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1652132

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Mike Rushton (leftyfb)
summary: Call trace when testing fstat stressor on ppc64el with virtual keyboard
- and mouse
+ and mouse present
tags: added: blocks-hwcert-server
Mike Rushton (leftyfb) wrote :

apport-collect is not functioning on this server:

The collected information can be sent to the developers to improve the
application. This might take a few minutes.
tar: Removing leading `/' from member names
....tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: /var/log/opal-elog: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
...........dpkg-query: no packages found matching linux
.....................................

The dots go on forever

Jeff Lane  (bladernr)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in stress-ng:
assignee: nobody → Colin Ian King (colin-king)
Jeff Lane  (bladernr) wrote :

FYI, I added a stress-ng task because a specific test case in that tool is exposing this issue. I just want to be sure this is a legitimate kernel issue rather than a test tool issue.

Colin Ian King (colin-king) wrote :

It looks to me as though a USB device was reset from an atomic context, causing __usb_queue_reset_device to be executed from a worker thread, which then hung for some reason. The fstat stressor was running at the time, and it also locks up. So my current hunch is that this was triggered by a USB device issue that then snarls things up, causing the subsequent hangs. Can this test be re-run to see if it fails again? My expectation is that it won't, because it's not an issue triggered specifically by stress-ng.

Mike Rushton (leftyfb) wrote :

This test has been run dozens of times across multiple machines and deployments and reboots. It is completely reproducible.

Colin Ian King (colin-king) wrote :

OK, I've got access to the machine and can easily reproduce this. This is not a stress-ng bug per se; it's a locking issue, so I'll remove stress-ng from the bug.

no longer affects: stress-ng
Changed in linux (Ubuntu):
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
status: Confirmed → In Progress
Colin Ian King (colin-king) wrote :

The hang seems to occur on a race when fstat'ing /dev/psaux.
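
The access pattern behind that race can be approximated with several workers that repeatedly open, fstat, and close the same path. This is a rough sketch only and not how stress-ng implements its fstat stressor; the function names, worker count, and iteration count are hypothetical. Pointing it at /dev/psaux on an affected machine may hang the USB stack, so a harmless path such as /dev/null is the safer default for trying it out.

```python
import os
import threading


def fstat_loop(path, iterations):
    """Open, fstat and close `path` repeatedly; return the successful count."""
    done = 0
    for _ in range(iterations):
        try:
            fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        except OSError:
            continue  # e.g. device busy or permission denied
        try:
            os.fstat(fd)
            done += 1
        finally:
            os.close(fd)
    return done


def hammer(path, workers=4, iterations=1000):
    """Run fstat_loop concurrently in `workers` threads; return total count."""
    counts = [0] * workers

    def work(index):
        counts[index] = fstat_loop(path, iterations)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(counts)
```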

Colin Ian King (colin-king) wrote :

Fails on mainline 4.9 too.

Colin Ian King (colin-king) wrote :

Can't break the kernel with 4.10-rc4

Colin Ian King (colin-king) wrote :

4.9 fails, 4.10-rc1 OK, bisecting on that

Colin Ian King (colin-king) wrote :

OK, so this is an annoying race condition, and bisecting it has been causing me some pain today. It turns out that 4.10-rc4 seems OK, whereas 4.10-rc3 and earlier will hang after two or three iterations of my stress test.

Colin Ian King (colin-king) wrote :

I checked this from 4.0 right through to 4.10-rc5; it can be triggered with any of these kernels given enough run time.

Colin Ian King (colin-king) wrote :

Going to quirk this with polling to avoid the issue with the urb kill.

description: updated
description: updated
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Colin Ian King (colin-king)
status: New → Fix Committed
Jeff Lane  (bladernr) wrote :

FWIW, I have run this on a Xeon Phi system and not reproduced the failure using stress-ng 0.07.16 (built in our PPA for Xenial).

This system appears to have the same AMI fake keyboard and mouse as the failing Power system:
Bus 003 Device 003: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse
Device Descriptor:
  bLength 18
  bDescriptorType 1
  bcdUSB 2.00
  bDeviceClass 0 (Defined at Interface level)
  bDeviceSubClass 0
  bDeviceProtocol 0
  bMaxPacketSize0 64
  idVendor 0x046b American Megatrends, Inc.
  idProduct 0xff10 Virtual Keyboard and Mouse
  bcdDevice 1.00
  iManufacturer 1
  iProduct 2
  iSerial 0
  bNumConfigurations 1
  Configuration Descriptor:
    bLength 9
    bDescriptorType 2
    wTotalLength 59
    bNumInterfaces 2
    bConfigurationValue 1
    iConfiguration 0
    bmAttributes 0xe0
      Self Powered
      Remote Wakeup
    MaxPower 0mA
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 0
      bNumEndpoints 1
      bInterfaceClass 3 Human Interface Device
      bInterfaceSubClass 1 Boot Interface Subclass
      bInterfaceProtocol 1 Keyboard
      iInterface 3
        HID Device Descriptor:
          bLength 9
          bDescriptorType 33
          bcdHID 1.10
          bCountryCode 0 Not supported
          bNumDescriptors 1
          bDescriptorType 34 Report
          wDescriptorLength 65
         Report Descriptors:
           ** UNAVAILABLE **
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0040 1x 64 bytes
        bInterval 1
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 1
      bAlternateSetting 0
      bNumEndpoints 1
      bInterfaceClass 3 Human Interface Device
      bInterfaceSubClass 1 Boot Interface Subclass
      bInterfaceProtocol 2 Mouse
      iInterface 4
        HID Device Descriptor:
          bLength 9
          bDescriptorType 33
          bcdHID 1.10
          bCountryCode 0 Not supported
          bNumDescriptors 1
          bDescriptorType 34 Report
          wDescriptorLength 63
         Report Descriptors:
           ** UNAVAILABLE **
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type ...


Jeff Lane  (bladernr) wrote :

Note, the above comment was validated on a Fujitsu CX1640 Xeon Phi system with Xenial and the stock Ubuntu kernel:
Linux cx1640-1 4.4.0-59-generic #80-Ubuntu SMP Fri Jan 6 17:47:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Thadeu Lima de Souza Cascardo (cascardo) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
tags: added: verification-needed-yakkety
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Colin Ian King (colin-king) wrote :

Tested with Xenial 4.4.0-63-generic #84-Ubuntu, no failures.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Colin Ian King (colin-king) wrote :

Tested with Yakkety 4.8.0-38-generic #41-Ubuntu, no failures

tags: added: verification-done-yakkety
removed: verification-needed-yakkety
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-63.84

---------------
linux (4.4.0-63.84) xenial; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1660704

  * Backport Dirty COW patch to prevent wineserver freeze (LP: #1658270)
    - SAUCE: mm: Respect FOLL_FORCE/FOLL_COW for thp

  * Kdump through NMI SMP and single core not working on Ubuntu16.10
    (LP: #1630924)
    - x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
    - SAUCE: hv: don't reset hv_context.tsc_page on crash

  * [regression 4.8.0-14 -> 4.8.0-17] keyboard and touchscreen lost on Acer
    Chromebook R11 (LP: #1630238)
    - [Config] CONFIG_PINCTRL_CHERRYVIEW=y

  * Call trace when testing fstat stressor on ppc64el with virtual keyboard and
    mouse present (LP: #1652132)
    - SAUCE: HID: usbhid: Quirk a AMI virtual mouse and keyboard with ALWAYS_POLL

  * VLAN SR-IOV regression for IXGBE driver (LP: #1658491)
    - ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths

  * "Out of memory" errors after upgrade to 4.4.0-59 (LP: #1655842)
    - mm, page_alloc: convert alloc_flags to unsigned
    - mm, compaction: change COMPACT_ constants into enum
    - mm, compaction: distinguish COMPACT_DEFERRED from COMPACT_SKIPPED
    - mm, compaction: simplify __alloc_pages_direct_compact feedback interface
    - mm, compaction: distinguish between full and partial COMPACT_COMPLETE
    - mm, compaction: abstract compaction feedback to helpers
    - mm, oom: protect !costly allocations some more
    - mm: consider compaction feedback also for costly allocation
    - mm, oom, compaction: prevent from should_compact_retry looping for ever for
      costly orders
    - mm, oom: protect !costly allocations some more for !CONFIG_COMPACTION
    - mm, oom: prevent premature OOM killer invocation for high order request

  * Backport 3 patches to fix bugs with AIX clients using IBMVSCSI Target Driver
    (LP: #1657194)
    - SAUCE: ibmvscsis: Fix max transfer length
    - SAUCE: ibmvscsis: fix sleeping in interrupt context
    - SAUCE: ibmvscsis: Fix srp_transfer_data fail return code

  * NVMe: adapter is missing after abnormal shutdown followed by quick reboot,
    quirk needed (LP: #1656913)
    - nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too

  * Ubuntu 16.10 KVM SRIOV: if enable sriov while ping flood is running ping
    will stop working (LP: #1625318)
    - PCI: Do any VF BAR updates before enabling the BARs
    - PCI: Ignore BAR updates on virtual functions
    - PCI: Update BARs using property bits appropriate for type
    - PCI: Separate VF BAR updates from standard BAR updates
    - PCI: Don't update VF BARs while VF memory space is enabled
    - PCI: Remove pci_resource_bar() and pci_iov_resource_bar()
    - PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE
    - PCI: Add comments about ROM BAR updating

  * Linux rtc self test fails in a VM under xenial (LP: #1649718)
    - kvm: x86: Convert ioapic->rtc_status.dest_map to a struct
    - kvm: x86: Track irq vectors in ioapic->rtc_status.dest_map
    - kvm: x86: Check dest_map->vector to match eoi signals for rtc

  * Xenial update to v4.4.44 stable releas...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-38.41

---------------
linux (4.8.0-38.41) yakkety; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1661232

  * Backport Dirty COW patch to prevent wineserver freeze (LP: #1658270)
    - SAUCE: mm: Respect FOLL_FORCE/FOLL_COW for thp

  * Kdump through NMI SMP and single core not working on Ubuntu16.10
    (LP: #1630924)
    - x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
    - SAUCE: hv: don't reset hv_context.tsc_page on crash

  * Call trace when testing fstat stressor on ppc64el with virtual keyboard and
    mouse present (LP: #1652132)
    - HID: usbhid: Quirk a AMI virtual mouse and keyboard with ALWAYS_POLL

  * regression in linux-libc-dev in yakkety: C++ style comments are not allowed
    in ISO C90 (LP: #1659654)
    - generic syscalls: kill cruft from removed pkey syscalls

  * [16.04.2] POWER9 patches on top of 4.8 (LP: #1650263)
    - powerpc/book3s: Add a cpu table entry for different POWER9 revs
    - powerpc/mm/radix: Use different RTS encoding for different POWER9 revs
    - powerpc/mm/radix: Use different pte update sequence for different POWER9
      revs
    - powerpc/mm: Update the HID bit when switching from radix to hash
    - powerpc/64/kexec: NULL check "clear_all" in kexec_sequence
    - powerpc/64/kexec: Fix MMU cleanup on radix
    - powerpc/mm: Add radix flush all with IS=3
    - powerpc/64/kexec: Copy image with MMU off when possible
    - powerpc/64: Simplify adaptation to new ISA v3.00 HPTE format
    - powerpc/mm/radix: Invalidate ERAT on tlbiel for POWER9 DD1
    - powerpc/mm: Fix missing update of HID register on secondary CPUs
    - powerpc/64: Add some more SPRs and SPR bits for POWER9
    - powerpc/64: Provide functions for accessing POWER9 partition table
    - powerpc/powernv: Define real-mode versions of OPAL XICS accessors
    - powerpc/64: Define new ISA v3.00 logical PVR value and PCR register value
    - mm: update mmu_gather range correctly
    - mm/hugetlb: add tlb_remove_hugetlb_entry for handling hugetlb pages
    - mm: add tlb_remove_check_page_size_change to track page size change
    - powerpc: Revert Load Monitor Register Support
    - powerpc/mm: Correct process and partition table max size
    - powernv: Clear SPRN_PSSCR when a POWER9 CPU comes online
    - powerpc/mm/radix: Setup AMOR in HV mode to allow key 0
    - powerpc/mm: Detect instruction fetch denied and report
    - powerpc/mm/radix: Prevent kernel execution of user space
    - powerpc/mm: Rename hugetlb-radix.h to hugetlb.h
    - powerpc/mm/hugetlb: Handle hugepage size supported by hash config
    - powerpc/mm: Introduce _PAGE_LARGE software pte bits
    - powerpc/mm: Add radix__tlb_flush_pte_p9_dd1()
    - powerpc/mm: update radix__ptep_set_access_flag to not do full mm tlb flush
    - powerpc/mm: update radix__pte_update to not do full mm tlb flush
    - powerpc/mm: Batch tlb flush when invalidating pte entries
    - powerpc/sparse: Make a bunch of things static
    - powerpc/perf: factor out the event format field
    - powerpc/perf: update attribute_group data structure
    - powerpc/perf: power9 raw event format en...

Changed in linux (Ubuntu):
status: In Progress → Fix Released
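
To tell whether a given Xenial or Yakkety kernel already carries the quirk, its release string (as printed by `uname -r`) can be compared with the package versions reported as fixed in this bug, 4.4.0-63.84 and 4.8.0-38.41. The sketch below is an illustration only; the function names are hypothetical, and it deliberately returns None for any series not named in this report rather than guessing.

```python
import re

# First fixed ABI per kernel series, taken from the changelog entries
# quoted in this report.
FIXED_IN = {
    (4, 4): (4, 4, 0, 63),  # xenial, linux 4.4.0-63.84
    (4, 8): (4, 8, 0, 38),  # yakkety, linux 4.8.0-38.41
}


def parse_release(release):
    """Turn '4.4.0-63-generic' into the tuple (4, 4, 0, 63)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def has_fix(release):
    """True or False for a known series, None if the series is not listed."""
    version = parse_release(release)
    fixed = FIXED_IN.get(version[:2])
    if fixed is None:
        return None
    return version >= fixed
```

For example, the kernel from the original report, 4.4.0-53-generic, predates the fix, while the verified 4.4.0-63-generic includes it.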