ISST:LTE: Regression: roselp2 Oops in kernel during setup io

Bug #1546439 reported by bugproxy
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Tim Gardner
Xenial           Fix Released   Undecided    Tim Gardner

Bug Description

== Comment: #0 - Alton L. Pundt <email address hidden> - 2016-02-08 17:14:17 ==
---Problem Description---
Was setting up I/O scenarios and the kernel oopsed.

Contact Information = A.P. Pundt <email address hidden>

---uname output---
4.4.0-2-generic

Machine Type = 8286-42A

---System Hang---
 system hung. does not respond to ping

---Debugger---
A debugger is not configured

---Steps to Reproduce---
 login as root:
cd /kte/tools/
setup io

Stack trace output:
 [ 148.026515] NIP [c000000000529cd0] blk_account_io_start+0x150/0x290
[ 148.026518] LR [c000000000529cac] blk_account_io_start+0x12c/0x290
[ 148.026520] Call Trace:
[ 148.026523] [c0000006da9437d0] [c000000000529cac] blk_account_io_start+0x12c/0x290 (unreliable)
[ 148.026527] [c0000006da943810] [c00000000052b20c] blk_queue_bio+0x1dc/0x4c0
[ 148.026530] [c0000006da943870] [c00000000052881c] generic_make_request+0x16c/0x240
[ 148.026534] [c0000006da9438d0] [c0000000005289c4] submit_bio+0xd4/0x1f0
[ 148.026537] [c0000006da943980] [c00000000033b1a4] mpage_readpages+0x174/0x1c0
[ 148.026541] [c0000006da943a70] [c000000000333b7c] blkdev_readpages+0x4c/0x70
[ 148.026545] [c0000006da943ab0] [c00000000023cf18] __do_page_cache_readahead+0x1a8/0x2e0
[ 148.026548] [c0000006da943b80] [c00000000023d1c8] ondemand_readahead+0x178/0x2f0
[ 148.026552] [c0000006da943be0] [c00000000022aadc] generic_file_read_iter+0x41c/0x6b0
[ 148.026555] [c0000006da943cc0] [c000000000334d38] blkdev_read_iter+0x68/0xa0
[ 148.026559] [c0000006da943cf0] [c0000000002dc80c] new_sync_read+0xcc/0x110
[ 148.026562] [c0000006da943d90] [c0000000002dd614] vfs_read+0xa4/0x1c0
[ 148.026565] [c0000006da943de0] [c0000000002de71c] SyS_read+0x6c/0x110
[ 148.026568] [c0000006da943e30] [c000000000009204] system_call+0x38/0xb4
[ 148.026571] Instruction dump:

Oops output:
 [ 148.026436] Oops: Kernel access of bad area, sig: 11 [#1]

System Dump Info:
  The system was configured to capture a dump, however a dump was not produced.

== Comment: #2 - Alton L. Pundt <email address hidden> - 2016-02-08 17:16:09 ==
console screen during time of oops:

root@roselp2:~# [ 148.026417] Unable to handle kernel paging request for data at address 0x779ea0000
[ 148.026433] Faulting instruction address: 0xc000000000529cd0
[ 148.026436] Oops: Kernel access of bad area, sig: 11 [#1]
[ 148.026438] SMP NR_CPUS=2048 NUMA pSeries
[ 148.026442] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache binfmt_misc dm_round_robin pseries_rng dm_multipath rtc_generic nfsd auth_rpcgss nfs_acl lockd grace sunrpc autofs4 btrfs xor raid6_pq mlx4_en vxlan ip6_udp_tunnel udp_tunnel ibmvscsi mlx4_core
[ 148.026460] CPU: 192 PID: 5726 Comm: parted Not tainted 4.4.0-2-generic #16-Ubuntu
[ 148.026463] task: c0000006da8d5100 ti: c0000006da940000 task.ti: c0000006da940000
[ 148.026466] NIP: c000000000529cd0 LR: c000000000529cac CTR: 0000000000000000
[ 148.026468] REGS: c0000006da943550 TRAP: 0300 Not tainted (4.4.0-2-generic)
[ 148.026471] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24002828 XER: 00000000
[ 148.026478] CFAR: c000000000008468 DAR: 0000000779ea0000 DSISR: 40000000 SOFTE: 0
GPR00: c000000000529cac c0000006da9437d0 c000000001583800 0000000000000001
GPR04: 0000000000001000 c00000074eec14a0 c00000074eec14e8 0000000000000800
GPR08: 0000000779ea0000 0000000000000000 0000000000000000 c000000000ae08e0
GPR12: 0000000024002222 c000000007b52000 0000000051633600 0000000000080001
GPR16: c0000006d944efa0 0000000000000001 0000000000001000 0000000000000001
GPR20: c0000006d944ef00 0000000000020000 0000000000000001 c000000000332700
GPR24: c00000070d405000 0000000000000100 0000000000000200 fffffffffffffffb
GPR28: c000000749810730 c00000071eb4d400 0000000000000000 00000000000000c0
[ 148.026515] NIP [c000000000529cd0] blk_account_io_start+0x150/0x290
[ 148.026518] LR [c000000000529cac] blk_account_io_start+0x12c/0x290
[ 148.026520] Call Trace:
[ 148.026523] [c0000006da9437d0] [c000000000529cac] blk_account_io_start+0x12c/0x290 (unreliable)
[ 148.026527] [c0000006da943810] [c00000000052b20c] blk_queue_bio+0x1dc/0x4c0
[ 148.026530] [c0000006da943870] [c00000000052881c] generic_make_request+0x16c/0x240
[ 148.026534] [c0000006da9438d0] [c0000000005289c4] submit_bio+0xd4/0x1f0
[ 148.026537] [c0000006da943980] [c00000000033b1a4] mpage_readpages+0x174/0x1c0
[ 148.026541] [c0000006da943a70] [c000000000333b7c] blkdev_readpages+0x4c/0x70
[ 148.026545] [c0000006da943ab0] [c00000000023cf18] __do_page_cache_readahead+0x1a8/0x2e0
[ 148.026548] [c0000006da943b80] [c00000000023d1c8] ondemand_readahead+0x178/0x2f0
[ 148.026552] [c0000006da943be0] [c00000000022aadc] generic_file_read_iter+0x41c/0x6b0
[ 148.026555] [c0000006da943cc0] [c000000000334d38] blkdev_read_iter+0x68/0xa0
[ 148.026559] [c0000006da943cf0] [c0000000002dc80c] new_sync_read+0xcc/0x110
[ 148.026562] [c0000006da943d90] [c0000000002dd614] vfs_read+0xa4/0x1c0
[ 148.026565] [c0000006da943de0] [c0000000002de71c] SyS_read+0x6c/0x110
[ 148.026568] [c0000006da943e30] [c000000000009204] system_call+0x38/0xb4
[ 148.026571] Instruction dump:
[ 148.026573] e89c0060 e87c00c0 48016631 60000000 7c7d1b78 e9230358 792a07a1 408200e4
[ 148.026578] 39400000 886d02ca 994d02ca e90d0030 <7d49402a> 394a0001 7d49412a 4bae6cbd
[ 148.026584] ---[ end trace 8c0008700e05e0ed ]---
[ 148.028168]
[ 148.028173] ------------[ cut here ]------------
[ 148.028175] WARNING: at /build/linux-X_Ge0M/linux-4.4.0/kernel/exit.c:661
[ 148.028177] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache binfmt_misc dm_round_robin pseries_rng dm_multipath rtc_generic nfsd auth_rpcgss nfs_acl lockd grace sunrpc autofs4 btrfs xor raid6_pq mlx4_en vxlan ip6_udp_tunnel udp_tunnel ibmvscsi mlx4_core
[ 148.028193] CPU: 192 PID: 5726 Comm: parted Tainted: G D 4.4.0-2-generic #16-Ubuntu
[ 148.028195] task: c0000006da8d5100 ti: c0000006da940000 task.ti: c0000006da940000
[ 148.028198] NIP: c0000000000bc2a8 LR: c0000000000bc28c CTR: 00000000006338e4
[ 148.028201] REGS: c0000006da9430a0 TRAP: 0700 Tainted: G D (4.4.0-2-generic)
[ 148.028203] MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 28002224 XER: 00000001
[ 148.028209] CFAR: c000000000144b9c SOFTE: 0
GPR00: c0000000000bc28c c0000006da943320 c000000001583800 0000000000000000
GPR04: 0000000000000000 c0000006da8d5100 ffffffffffffffff 0000000000000000
GPR08: 0000000000000000 c0000006da943ad0 c000000749810730 0000000000003ff0
GPR12: 0000000000002200 c000000007b52000 0000000051633600 0000000000080001
GPR16: c0000006d944efa0 0000000000000001 0000000000001000 0000000000000001
GPR20: c0000006d944ef00 0000000000020000 0000000000000001 c0000006da8d5100
GPR24: c00000070d405000 0000000000000100 00000000000000c0 c0000000015e8260
GPR28: 0000000000000000 000000000000000b 000000000000000b c0000006da943550
[ 148.028241] NIP [c0000000000bc2a8] do_exit+0x78/0xbf0
[ 148.028244] LR [c0000000000bc28c] do_exit+0x5c/0xbf0
[ 148.028245] Call Trace:
[ 148.028247] [c0000006da943320] [c0000000000bc28c] do_exit+0x5c/0xbf0 (unreliable)
[ 148.028251] [c0000006da9433e0] [c000000000020af4] die+0x314/0x470
[ 148.028255] [c0000006da943470] [c000000000050ab8] bad_page_fault+0xd8/0x150
[ 148.028258] [c0000006da9434e0] [c000000000008680] handle_page_fault+0x2c/0x30
[ 148.028262] --- interrupt: 300 at blk_account_io_start+0x150/0x290
[ 148.028262] LR = blk_account_io_start+0x12c/0x290
[ 148.028266] [c0000006da943810] [c00000000052b20c] blk_queue_bio+0x1dc/0x4c0
[ 148.028269] [c0000006da943870] [c00000000052881c] generic_make_request+0x16c/0x240
[ 148.028272] [c0000006da9438d0] [c0000000005289c4] submit_bio+0xd4/0x1f0
[ 148.028275] [c0000006da943980] [c00000000033b1a4] mpage_readpages+0x174/0x1c0
[ 148.028279] [c0000006da943a70] [c000000000333b7c] blkdev_readpages+0x4c/0x70
[ 148.028282] [c0000006da943ab0] [c00000000023cf18] __do_page_cache_readahead+0x1a8/0x2e0
[ 148.028285] [c0000006da943b80] [c00000000023d1c8] ondemand_readahead+0x178/0x2f0
[ 148.028289] [c0000006da943be0] [c00000000022aadc] generic_file_read_iter+0x41c/0x6b0
[ 148.028291] [c0000006da943cc0] [c000000000334d38] blkdev_read_iter+0x68/0xa0
[ 148.028294] [c0000006da943cf0] [c0000000002dc80c] new_sync_read+0xcc/0x110
[ 148.028297] [c0000006da943d90] [c0000000002dd614] vfs_read+0xa4/0x1c0
[ 148.028300] [c0000006da943de0] [c0000000002de71c] SyS_read+0x6c/0x110
[ 148.028304] [c0000006da943e30] [c000000000009204] system_call+0x38/0xb4
[ 148.028306] Instruction dump:
[ 148.028307] 60000000 60000000 eaed02a0 7ee3bb78 480888d1 60000000 e93707c0 2fa90000
[ 148.028312] 419e0014 e9490000 7faa4800 419e093c <0fe00000> 78290464 8129000c 552502ef
[ 148.028318] ---[ end trace 8c0008700e05e0ee ]---
[ 148.028821] Unable to handle kernel paging request for data at address 0x00000348
[ 148.028824] Faulting instruction address: 0xc000000000528e9c
[ 148.028827] Oops: Kernel access of bad area, sig: 11 [#2]
[ 148.028829] SMP NR_CPUS=2048 NUMA pSeries
[ 148.028831] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache binfmt_misc dm_round_robin pseries_rng dm_multipath rtc_generic nfsd auth_rpcgss nfs_acl lockd grace sunrpc autofs4 btrfs xor raid6_pq mlx4_en vxlan ip6_udp_tunnel udp_tunnel ibmvscsi mlx4_core
[ 148.028846] CPU: 192 PID: 0 Comm: swapper/192 Tainted: G D W 4.4.0-2-generic #16-Ubuntu
[ 148.028850] task: c000000773e0bd90 ti: c00000077f9c0000 task.ti: c000000773ea4000
[ 148.028852] NIP: c000000000528e9c LR: c000000000528f80 CTR: c00000000075b870
[ 148.028855] REGS: c00000077f9c38a0 TRAP: 0300 Tainted: G D W (4.4.0-2-generic)
[ 148.028857] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 48022882 XER: 00000001
[ 148.028864] CFAR: c000000000008468 DAR: 0000000000000348 DSISR: 40000000 SOFTE: 1
GPR00: c000000000528f80 c00000077f9c3b20 c000000001583800 c000000749810730
GPR04: 0000000000010000 c0000000015baa60 0000000000000000 0000000000000600
GPR08: c000000001873800 0000000000000000 c0000000015b3800 d0000000073d4d90
GPR12: 0000000000002200 c000000007b52000 c000000000b03a18 0000000000200040
GPR16: 0000000000000000 c00000077f9c0000 c000000000f7dd00 c0000000015b2200
GPR20: 00000000ffff6b84 0000000000000009 c00000000187aaf8 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000005
GPR28: c000000749810730 0000000000010000 0000000000000080 c000000749810730
[ 148.028896] NIP [c000000000528e9c] blk_account_io_completion+0x8c/0x100
[ 148.028900] LR [c000000000528f80] blk_update_request+0x70/0x490
[ 148.028902] Call Trace:
[ 148.028904] [c00000077f9c3b20] [c00000077f9c3b70] 0xc00000077f9c3b70 (unreliable)
[ 148.028907] [c00000077f9c3b50] [c000000000528f80] blk_update_request+0x70/0x490
[ 148.028912] [c00000077f9c3be0] [c0000000007467d0] scsi_end_request+0x70/0x250
[ 148.028915] [c00000077f9c3c50] [c00000000074a294] scsi_io_completion+0xe4/0x710
[ 148.028918] [c00000077f9c3d10] [c00000000073dcf8] scsi_finish_command+0x148/0x200
[ 148.028922] [c00000077f9c3d90] [c000000000749788] scsi_softirq_done+0x198/0x200
[ 148.028925] [c00000077f9c3e10] [c000000000534354] blk_done_softirq+0xb4/0xe0
[ 148.028928] [c00000077f9c3e50] [c0000000000be518] __do_softirq+0x188/0x3a0
[ 148.028932] [c00000077f9c3f40] [c0000000000be9a8] irq_exit+0xc8/0x100
[ 148.028935] [c00000077f9c3f60] [c00000000001130c] __do_irq+0x8c/0x190
[ 148.028938] [c00000077f9c3f90] [c000000000024760] call_do_irq+0x14/0x24
[ 148.028941] [c000000773ea79f0] [c0000000000114a8] do_IRQ+0x98/0x140
[ 148.028944] [c000000773ea7a40] [c000000000002594] hardware_interrupt_common+0x114/0x180
[ 148.028950] --- interrupt: 501 at plpar_hcall_norets+0x1c/0x28
[ 148.028950] LR = check_and_cede_processor+0x34/0x50
[ 148.028954] [c000000773ea7d30] [c0000000008f61d0] check_and_cede_processor+0x20/0x50 (unreliable)
[ 148.028959] [c000000773ea7d90] [c0000000008f63f8] shared_cede_loop+0x68/0x170
[ 148.028961] [c000000773ea7dd0] [c0000000008f3460] cpuidle_enter_state+0x160/0x3c0
[ 148.028965] [c000000773ea7e30] [c000000000118c18] call_cpuidle+0x78/0xd0
[ 148.028968] [c000000773ea7e70] [c000000000118fac] cpu_startup_entry+0x33c/0x450
[ 148.028972] [c000000773ea7f30] [c00000000004559c] start_secondary+0x33c/0x360
[ 148.028976] [c000000773ea7f90] [c000000000008b6c] start_secondary_prolog+0x10/0x14
[ 148.028978] Instruction dump:
[ 148.028980] ebe1fff8 7c0803a6 4e800020 60000000 60420000 a0ed0008 3d420003 e8df00c8
[ 148.028985] 79291f28 38aa7260 7bdeba42 78e71f24 <e9460348> 7d05382a 7d4a4214 7d0a482a
[ 148.028990] ---[ end trace 8c0008700e05e0ef ]---
[ 148.030823]
[ 150.030883] Kernel panic - not syncing: Fatal exception in interrupt
[ 150.042512] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
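
A minimal sketch, assuming the standard Power ISA X-form field layout, of how the faulting word marked <7d49402a> in the first instruction dump above can be decoded. The helper name decode_xform is hypothetical. With r9 = 0 and r8 = 0x779ea0000 taken from the GPR08 row of the first oops, the effective address r9 + r8 equals the reported DAR. That would be consistent with a zero base pointer being indexed by a per-CPU offset, but this reading is an interpretation and not something the log itself states.

```python
# Decode a 32-bit PowerPC X-form instruction word into its fields.
# Layout (big-endian bit numbering): opcode[0:5] RT[6:10] RA[11:15]
# RB[16:20] XO[21:30] Rc[31].

# Only the two extended opcodes that appear around the fault are named here.
XO_NAMES = {21: "ldx", 149: "stdx"}


def decode_xform(word):
    opcode = (word >> 26) & 0x3F
    rt = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    rb = (word >> 11) & 0x1F
    xo = (word >> 1) & 0x3FF
    name = XO_NAMES.get(xo, "unknown") if opcode == 31 else "unknown"
    return name, rt, ra, rb


name, rt, ra, rb = decode_xform(0x7D49402A)
print(f"{name} r{rt},r{ra},r{rb}")

# Register values copied from the first oops (GPR08 row): r8 and r9.
gpr = {8: 0x0000000779EA0000, 9: 0x0}
effective_address = gpr[ra] + gpr[rb]
print(hex(effective_address))
```

The two instructions after the faulting one (394a0001 and 7d49412a) form an add-one and a store back to the same address, so the sequence looks like an increment of a per-CPU counter.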

root@conelp2:~# multipath -ll
mpathd (1IBM_IPR-0_68405D4000002DC0) dm-1 IBM,IPR-0 68405D40
size=361G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 1:2:0:0 sde 8:64 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 0:2:1:0 sdb 8:16 active ready running
mpathc (1IBM_IPR-0_68405D4000002CC0) dm-0 IBM,IPR-0 68405D40
size=361G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 1:2:1:0 sdf 8:80 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 0:2:0:0 sda 8:0 active ready running
mpatha (1IBM_IPR-0_68405D4000003800) dm-2 IBM,IPR-0 68405D40
size=723G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 1:2:2:0 sdg 8:96 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 0:2:2:0 sdc 8:32 active ready running
root@conelp2:~# service multipath-tools stop
root@conelp2:~# multipath -F
root@conelp2:~# multipath -ll
root@conelp2:~# ls -la /dev/mapper/
total 0
drwxr-xr-x 2 root root 60 Feb 9 12:37 .
drwxr-xr-x 16 root root 6960 Feb 9 12:37 ..
crw------- 1 root root 10, 236 Feb 9 12:33 control

root@conelp2:~# /kte/tools/setup io
...
Removing unprotected SW RAID devices...
Removing unprotected LVM devices...
Configuring MPIO...
Configuring partitions...
Configuring 12 Partitions for sdc...done
   created sdc1 sdc2 sdc3 sdc5 sdc6 sdc7 sdc8 sdc9 sdc10 sdc11 sdc12 sdc13 successfully
Configuring 12 Partitions for sdb...[ 625.917304] Unable to handle kernel paging request for data at address 0x27ada0000
[ 625.917323] Faulting instruction address: 0xc000000000529cd0
cpu 0x94: Vector: 300 (Data Access) at [c000000251ed7550]
    pc: c000000000529cd0: blk_account_io_start+0x150/0x290
    lr: c000000000529cac: blk_account_io_start+0x12c/0x290
    sp: c000000251ed77d0
   msr: 8000000000009033
   dar: 27ada0000
 dsisr: 40000000
  current = 0xc000000251e57b20
  paca = 0xc000000007b37e00 softe: 0 irq_happened: 0x01
    pid = 8979, comm = parted
enter ? for help
[c000000251ed7810] c00000000052b20c blk_queue_bio+0x1dc/0x4c0
[c000000251ed7870] c00000000052881c generic_make_request+0x16c/0x240
[c000000251ed78d0] c0000000005289c4 submit_bio+0xd4/0x1f0
[c000000251ed7980] c00000000033b1a4 mpage_readpages+0x174/0x1c0
[c000000251ed7a70] c000000000333b7c blkdev_readpages+0x4c/0x70
[c000000251ed7ab0] c00000000023cf18 __do_page_cache_readahead+0x1a8/0x2e0
[c000000251ed7b80] c00000000023d1c8 ondemand_readahead+0x178/0x2f0
[c000000251ed7be0] c00000000022aadc generic_file_read_iter+0x41c/0x6b0
[c000000251ed7cc0] c000000000334d38 blkdev_read_iter+0x68/0xa0
[c000000251ed7cf0] c0000000002dc80c new_sync_read+0xcc/0x110
[c000000251ed7d90] c0000000002dd614 vfs_read+0xa4/0x1c0
[c000000251ed7de0] c0000000002de71c SyS_read+0x6c/0x110
[c000000251ed7e30] c000000000009204 system_call+0x38/0xb4
--- Exception: c01 (System Call) at 00003fff98b63f38
SP (3fffe47e64c0) is in userspace

== Comment: #8 - YUECHANG E. MEI <email address hidden> - 2016-02-09 16:05:10 ==
I have verified that setup io triggers a system crash when we configure the partitions on single-path disks. If we enable the multipath service and configure partitions on /dev/mapper/mpathn, everything is fine.

Problem was recreated in roselp2, and it is in xmon now. Please take a look!

== Comment: #19 - YUECHANG E. MEI <email address hidden> - 2016-02-15 18:39:23 ==
conelp2 has finished the ST test, and I could recreate the issue with the following steps:

Steps to recreate:
1. run "service multipath-tools stop"
2. run "multipath -F"
3. run "multipath -ll" to make sure no multipath devices are seen by the LPAR
4. modify "/kte/tools/io/scen/__lpar__" on the kte server to contain:

partition|sdc|12
partition|sdb|12
partition|sdd|12

admndisk|sdc1|ext3
blast|sdc2|ext4
fstest|sdc3|ext3
aio|sdc5|ext3

admndisk|sdb1|ext3
blast|sdb2|ext4
fstest|sdb3|ext3
aio|sdb5|ext3

admndisk|sdd1|ext3
blast|sdd2|ext4
fstest|sdd3|ext3
aio|sdd5|ext3
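
A minimal sketch of how the scenario lines above could be read. The field meanings are assumptions inferred from the setup io output in this report: a "partition" line names a disk and a partition count, and every other line names a test, a partition, and a filesystem. The real parser in the kte tools is not shown in this bug, and parse_scenario is a hypothetical name.

```python
def parse_scenario(text):
    """Split pipe-delimited scenario lines into partition and test entries."""
    partitions = {}   # disk name -> number of partitions requested
    tests = []        # (test name, partition, filesystem) tuples
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        kind, target, value = line.split("|")
        if kind == "partition":
            partitions[target] = int(value)
        else:
            tests.append((kind, target, value))
    return partitions, tests


sample = """
partition|sdc|12
partition|sdb|12

admndisk|sdc1|ext3
blast|sdc2|ext4
"""
partitions, tests = parse_scenario(sample)
print(partitions)
print(tests)
```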

Error:
root@conelp2:~# /kte/tools/setup io
.....
Configuring partitions...
Configuring 12 Partitions for sdc...done
   created sdc1 sdc2 sdc3 sdc5 sdc6 sdc7 sdc8 sdc9 sdc10 sdc11 sdc12 sdc13 successfully
Configuring 12 Partitions for sdb...[ 675.658649] Unable to handle kernel paging request for data at address 0x27aae0000
[ 675.658669] Faulting instruction address: 0xc000000000529e30
[ 675.658675] Oops: Kernel access of bad area, sig: 11 [#1]
[ 675.658677] SMP NR_CPUS=2048 NUMA pSeries
[ 675.658683] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache rpadlpar_io rpaphp binfmt_misc pseries_rng rtc_generic scsi_dh_alua dm_round_robin dm_multipath sunrpc autofs4 xfs libcrc32c btrfs xor raid6_pq ses enclosure be2net lpfc vxlan scsi_transport_fc ipr ip6_udp_tunnel udp_tunnel
[ 675.658711] CPU: 137 PID: 11532 Comm: parted Not tainted 4.4.0-4-generic #19-Ubuntu
[ 675.658715] task: c00000025b1f5f90 ti: c00000025b214000 task.ti: c00000025b214000
[ 675.658719] NIP: c000000000529e30 LR: c000000000529e0c CTR: 0000000000000000
[ 675.658722] REGS: c00000025b217560 TRAP: 0300 Not tainted (4.4.0-4-generic)
[ 675.658725] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24002828 XER: 00000000
[ 675.658736] CFAR: c000000000008468 DAR: 000000027aae0000 DSISR: 40000000 SOFTE: 0
GPR00: c000000000529e0c c00000025b2177e0 c000000001593900 0000000000000001
GPR04: 00000000039cf900 c000000253c700c0 c000000253c70150 00000000039cf800
GPR08: 000000027aae0000 0000000000000000 0000000000000000 c00000027faac300
GPR12: 0000000024002222 c000000007b31580 000000005fe63600 0000000000080001
GPR16: c0000002287bf2a0 0000000000000001 0000000000000200 0000000000000001
GPR20: c0000002287bf200 0000000000020000 0000000000000001 c000000000332840
GPR24: c00000026d01e400 0000000000000100 0000000000000200 c000000244230258
GPR28: c0000002504a5ee0 c00000022fc46c00 0000000000000000 0000000000000089
[ 675.658785] NIP [c000000000529e30] blk_account_io_start+0x150/0x290
[ 675.658789] LR [c000000000529e0c] blk_account_io_start+0x12c/0x290
[ 675.658792] Call Trace:
[ 675.658795] [c00000025b2177e0] [c000000000529e0c] blk_account_io_start+0x12c/0x290 (unreliable)
[ 675.658800] [c00000025b217820] [c00000000052b36c] blk_queue_bio+0x1dc/0x4c0
[ 675.658805] [c00000025b217880] [c000000000528964] generic_make_request+0x144/0x230
[ 675.658810] [c00000025b2178d0] [c000000000528b24] submit_bio+0xd4/0x1f0
[ 675.658815] [c00000025b217980] [c00000000033b2e4] mpage_readpages+0x174/0x1c0
[ 675.658820] [c00000025b217a70] [c000000000333cbc] blkdev_readpages+0x4c/0x70
[ 675.658826] [c00000025b217ab0] [c00000000023d038] __do_page_cache_readahead+0x1a8/0x2e0
[ 675.658830] [c00000025b217b80] [c00000000023d2e8] ondemand_readahead+0x178/0x2f0
[ 675.658835] [c00000025b217be0] [c00000000022abfc] generic_file_read_iter+0x41c/0x6b0
[ 675.658840] [c00000025b217cc0] [c000000000334e78] blkdev_read_iter+0x68/0xa0
[ 675.658844] [c00000025b217cf0] [c0000000002dc92c] new_sync_read+0xcc/0x110
[ 675.658848] [c00000025b217d90] [c0000000002dd734] vfs_read+0xa4/0x1c0
[ 675.658853] [c00000025b217de0] [c0000000002de83c] SyS_read+0x6c/0x110
[ 675.658858] [c00000025b217e30] [c000000000009204] system_call+0x38/0xb4
[ 675.658861] Instruction dump:
[ 675.658863] e89c0060 e87c00c0 480165d1 60000000 7c7d1b78 e9230358 792a07a1 408200e4
[ 675.658871] 39400000 886d02ca 994d02ca e90d0030 <7d49402a> 394a0001 7d49412a 4bae6b5d
[ 675.658882] ---[ end trace be1c5abd542c2196 ]---
[ 675.661579]
[ 675.661626] Sending IPI to other CPUs
[ 675.679881] IPI complete
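
The oopses on 4.4.0-2 (#16) and 4.4.0-4 (#19) report different NIP addresses but the same symbol and offset. A small sketch of how that can be checked mechanically from the NIP lines is below; the regular expression and the fault_site helper are hypothetical and only cover the line format seen in this report.

```python
import re

# Matches lines such as:
#   [ 148.026515] NIP [c000000000529cd0] blk_account_io_start+0x150/0x290
NIP_RE = re.compile(
    r"NIP \[(?P<addr>[0-9a-f]{16})\] "
    r"(?P<sym>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def fault_site(line):
    """Return (symbol, offset, function size, address) or None."""
    m = NIP_RE.search(line)
    if m is None:
        return None
    return (m["sym"], int(m["off"], 16), int(m["size"], 16), int(m["addr"], 16))


lines = [
    "[ 148.026515] NIP [c000000000529cd0] blk_account_io_start+0x150/0x290",
    "[ 675.658785] NIP [c000000000529e30] blk_account_io_start+0x150/0x290",
]
sites = [fault_site(l) for l in lines]
# Same symbol and offset, different absolute addresses.
print(sites[0][:3] == sites[1][:3], hex(sites[1][3] - sites[0][3]))
```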

bugproxy (bugproxy) wrote : xmon log

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-136788 severity-critical targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1546439/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-02-29 17:57 EDT-------
*** Bug 138015 has been marked as a duplicate of this bug. ***

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-29 18:01 EDT-------
*** Bug 138047 has been marked as a duplicate of this bug. ***

------- Comment From <email address hidden> 2016-02-29 18:06 EDT-------
*** Bug 138047 has been marked as a duplicate of this bug. ***

bugproxy (bugproxy)
tags: added: targetmilestone-inin1604
removed: targetmilestone-inin---
bugproxy (bugproxy) wrote : Detailed setup io execution leading crash

------- Comment (attachment only) From <email address hidden> 2016-03-01 10:19 EDT-------

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-03-01 14:03 EDT-------
As some folks have previously pointed out, we end up with a problem with the struct hd_struct *part in blk_account_io_start():

2342 part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
2343 if (!hd_struct_try_get(part)) {

r28 holds the address of the struct request *rq, from which we get rq->rq_disk, located at offset 0xc0 from what r28 points to. Offset 0x60 of the request holds the sector on the disk for which we want to find the containing partition.

d:mon> d c0000007490f0450
c0000007490f0450 d0ba6cda060000c0 d0ba6cda060000c0 |..l.......l.....|
c0000007490f0460 0000000000000000 0000000000000000 |................|
c0000007490f0470 0000000000000000 0000000000000000 |................|
c0000007490f0480 e0083a4d070000c0 0000000000000000 |..:M............|
c0000007490f0490 0000402400000000 0100000000000000 |..@$............|
c0000007490f04a0 0000000000000000 0d00000000000100 |................|
c0000007490f04b0 8008000000000000 00860749070000c0 |...........I....|
c0000007490f04c0 00860749070000c0 0000000000000000 |...I............|
c0000007490f04d0 0000000000000000 d8040f49070000c0 |...........I....|
c0000007490f04e0 0000000000000000 0000000000000000 |................|
c0000007490f04f0 0000000000000000 0000000000000000 |................|
c0000007490f0500 0000000000000000 0000000000000000 |................|
c0000007490f0510 00d86668070000c0 0000000000000000 |..fh............|
c0000007490f0520 3781ffff00000000 4098b162070000c0 |7.......@..b....|
c0000007490f0530 84b745a127000000 0000000000000000 |..E.'...........|
c0000007490f0540 0100000000000000 0000000000000000 |................|

This should be the struct gendisk *rq_disk that is passed to disk_map_sector_rcu(). At offset 0x40 we have the struct disk_part_tbl __rcu *part_tbl, and after that, at offset 0x48, is the struct hd_struct part0.

d:mon> d c00000076866d800
c00000076866d800 0800000020000000 1000000073646300 |.... .......sdc.|
c00000076866d810 0000000000000000 0000000000000000 |................|
c00000076866d820 0000000000000000 0000630000000000 |..........c.....|
c00000076866d830 0000000000000000 0000000000000000 |................|
c00000076866d840 4038257e070000c0 0000000000000000 |@8%~............|
c00000076866d850 00c0092100000000 0000000000000000 |...!............|
c00000076866d860 0000000000000000 0000000000000000 |................|
c00000076866d870 6881067e070000c0 c000a54e070000c0 |h..~.......N....|
c00000076866d880 2800d44e070000c0 409c2068070000c0 |(..N....@. h....|
c00000076866d890 18fccc50070000c0 6000f448070000c0 |...P....`..H....|
c00000076866d8a0 00004a68070000c0 602f4e01000000c0 |..Jh....`/N.....|
c00000076866d8b0 500af148070000c0 0e00000007000000 |P..H............|
c00000076866d8c0 0000000000000000 08274d01000000c0 |.........'M.....|
c00000076866d8d0 0100000000000000 d8d86668070000c0 |..........fh....|
c00000076866d8e0 d8d86668070000c0 0000000000000000 |..fh............|
c00000076866d8f0 0000000000000000 0000000000000000 |................|

struct disk_part_tbl __rcu *part_tbl. Offset 0x18 is part_tbl->last_lookup

d:mon> d 0xC00000077E253840
c000...
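
xmon prints memory in byte order, and this is a little-endian kernel (LE is set in the MSR), so each 8-byte word in the dumps has to be byte-reversed before it reads as a pointer. A minimal sketch of following the pointers described above is below; the helper names le64 and word_at are hypothetical, and only the rows needed for the lookup are copied from the dumps.

```python
def le64(word_hex):
    """Convert a 16-hex-digit word, shown in memory byte order, to an integer."""
    return int.from_bytes(bytes.fromhex(word_hex), "little")


def word_at(dump, base, offset):
    """Look up the 8-byte word at base+offset in a {row address: [words]} dump."""
    addr = base + offset
    row = addr & ~0xF             # xmon prints 16 bytes per row
    index = (addr - row) // 8     # first or second word in the row
    return le64(dump[row][index])


# Rows copied from the two dumps above (only the ones needed here).
request_dump = {
    0xC0000007490F04B0: ["8008000000000000", "00860749070000c0"],
    0xC0000007490F0510: ["00d86668070000c0", "0000000000000000"],
}
gendisk_dump = {
    0xC00000076866D840: ["4038257e070000c0", "0000000000000000"],
}

rq = 0xC0000007490F0450
rq_disk = word_at(request_dump, rq, 0xC0)       # rq->rq_disk
sector = word_at(request_dump, rq, 0x60)        # sector of the request
part_tbl = word_at(gendisk_dump, rq_disk, 0x40) # gendisk part_tbl pointer
print(hex(rq_disk), sector, hex(part_tbl))
```

The decoded rq_disk and part_tbl values match the addresses used in the second and third "d" commands in the xmon session.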


bugproxy (bugproxy) wrote : xmon log for io ran on Leaf adapter on irislp2

------- Comment (attachment only) From <email address hidden> 2016-03-07 16:22 EDT-------

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-03-08 00:27 EDT-------
*** Bug 138537 has been marked as a duplicate of this bug. ***

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-03-08 15:11 EDT-------
Script to trigger the problem
(from comments #35 #36)

# cat <<"EOF" >test-bz136788.sh
#!/bin/sh
set -ex

DEVICE="$1"
[ -n "$DEVICE" ]

parted -s $DEVICE mklabel msdos
parted -s -a optimal $DEVICE mkpart primary 0% 12%
parted -s -a optimal $DEVICE mkpart primary 12% 24%
parted -s -a optimal $DEVICE mkpart primary 24% 36%
parted -s -a optimal $DEVICE mkpart extended 36% 100%
parted -s -a optimal $DEVICE mkpart logical 36% 48%
parted -s -a optimal $DEVICE mkpart logical 48% 60%
parted -s -a optimal $DEVICE mkpart logical 60% 72%
parted -s -a optimal $DEVICE mkpart logical 72% 84%
parted -s -a optimal $DEVICE mkpart logical 84% 96%
parted -s $DEVICE print
EOF

# chmod +x test-bz136788.sh

# while true; do ./test-bz136788.sh /dev/sdc; done

Then, after a short 'while' (pun intended):

...
+ parted -s -a optimal /dev/sdc mkpart primary 24% 36%
[ 669.234353] Unable to handle kernel paging request for data at address 0x7760d0000
[ 669.234400] Faulting instruction address: 0xc000000000532010
cpu 0x19: Vector: 300 (Data Access) at [c0000006d8b3b560]
pc: c000000000532010: blk_account_io_start+0x150/0x290
lr: c000000000531fec: blk_account_io_start+0x12c/0x290
...

bugproxy (bugproxy) wrote :

*** Bug 139317 has been marked as a duplicate of this bug. ***
*** Bug 138916 has been marked as a duplicate of this bug. ***

Naveen Kaje (nkaje) wrote :

A similar crash was observed on Linux Kernel 4.3 and 4.5 on Qualcomm Technologies QDF2432 platform.

********************* Begin Crash Logs ******************************************
Model: Mass Storage Device (scsi)
Disk /dev/sdb: 7948MB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number Start End Size Type File system Flags
 1 512B 32.0MB 32.0MB primary
 2 32.0MB 2000MB 1968MB primary

[ 112.685168] Unable to handle kernel paging request at virtual address 3f9629000
[ 112.691762] pgd = ffff800365d79000
[ 112.695022] [3f9629000] *pgd=000000436917a003, *pud=0000000000000000
[ 112.701294] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 112.706854] Modules linked in:
[ 112.709887] CPU: 4 PID: 2916 Comm: udevd Not tainted 4.5.0+ #4
[ 112.715702] Hardware name: Qualcomm QDF2432 DP/Server Development Platform, BIOS Engineering Build
[ 112.724760] task: ffff800365d4b400 ti: ffff800364c0c000 task.ti: ffff800364c0c000
[ 112.732239] PC is at blk_account_io_start+0x224/0x4b0
[ 112.737233] LR is at blk_account_io_start+0x20c/0x4b0
[ 112.742265] pc : [<ffff8000004dba30>] lr : [<ffff8000004dba18>] pstate: 20000145
[ 112.749665] sp : ffff800364c0f830
[ 112.752937] x29: ffff800364c0f830 x28: 0000000000000000
[ 112.758227] x27: ffff800365defcb8 x26: 0000000000000001
[ 112.763522] x25: ffff800364c0c018 x24: ffff800364c0c000
[ 112.768818] x23: ffff800365def980 x22: ffff800366f85ac8
[ 112.774112] x21: 0000000000000004 x20: 0000000000000000
[ 112.779408] x19: ffff800366f85a08 x18: 0000000000000003
[ 112.784702] x17: 0000000000000000 x16: 0000000000000000
[ 112.789997] x15: 0000000000000000 x14: 0000000000000000
[ 112.795292] x13: 0000000000000000 x12: 0000000000000000
[ 112.800587] x11: 0000000000000000 x10: 0000000000000007
[ 112.805883] x9 : 0000000000000000 x8 : ffff800366f85b60
[ 112.811178] x7 : 0000000000000000 x6 : 000000000000003f
[ 112.816473] x5 : 1ffff0006cb8974d x4 : ffff10006c981803
[ 112.821769] x3 : ffff8000004dba18 x2 : 00000003f9629000
[ 112.827063] x1 : 0000000000000000 x0 : 00000003f9629000
[ 112.832358]
[ 112.833840] Process udevd (pid: 2916, stack limit = 0xffff800364c0c020)
[ 112.840453] Stack: (0xffff800364c0f830 to 0xffff800364c10000)
[ 112.846179] f820: ffff800364c0f890 ffff8000004dd0e8
[ 112.854011] f840: ffff800366f85a08 ffff800364c0fba0 ffff800364c0fba0 0000000000000003
[ 112.861826] f860: 0000000000000004 ffff800364c0c000 00000000fffffffb ffff800364c0c000
[ 112.869637] f880: ffff800364c0f950 ffff800364c0fb90 ffff800364c0f8f0 ffff8000004da1e8
[ 112.877448] f8a0: ffff8000f7760500 00000000ffffffff ffff800366d0d0f0 ffff800364c0c010
[ 112.885263] f8c0: ffff800364c0c010 0000000000000001 ffff8000f7760500 ffff8000f7760300
[ 112.893074] f8e0: 0000000064c0f8f0 ffff800366f85a08 ffff800364c0f960 ffff8000004da3e0
[ 112.900888] f900: ffff800001453000 ffff8000f7760500 0000000000000100 00000000103a0000
[ 112.908702] f920: ffff800364c0c000 ffff800365d4b400 ffff800364c0fb98 ffff8000f7760500
[ 112.916511] f940: ffff800364c0fb90 ffff800364c0fb90 ffff8000f7760500 ffff8000f7760500
[ 112.924324] f...

bugproxy (bugproxy) wrote : Detailed setup io execution leading crash

Default Comment by Bridge

Ming Lei (tom-leiming) wrote :

Hi,

The attached patch should fix one related issue, could anyone test it?

Thanks,

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "fix crash" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
affects: ubuntu → linux (Ubuntu)
Naveen Kaje (nkaje) wrote :

Pulled in Ming's patch and the issue did not reproduce with it.

bugproxy (bugproxy) wrote : linux-header

Default Comment by Bridge

Tim Gardner (timg-tpi) wrote :

Applied 0001-block-partition-initialize-percpuref-before-sending-.patch
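
The patch file name indicates that the fix initializes a per-CPU reference before something is sent out (the name is truncated). The toy model below is not kernel code. It sketches why that ordering would matter, under the assumption that what is sent out is the notification that makes a new partition visible to readers. All class and function names are hypothetical.

```python
class Partition:
    def __init__(self, name):
        self.name = name
        self.ref = None          # stands in for the per-CPU reference counter

    def try_get(self):
        # A reader taking a reference; fails if the counter is not set up yet.
        if self.ref is None:
            raise RuntimeError(f"{self.name}: reference used before init")
        self.ref += 1
        return True


def add_partition(name, notify, init_before_notify):
    part = Partition(name)
    if init_before_notify:
        part.ref = 1             # initialize first, then publish
        notify(part)
    else:
        notify(part)             # publish first: readers may run right away
        part.ref = 1
    return part


def reader(part):
    # Models a process reacting to the notification and issuing I/O at once.
    part.try_get()


add_partition("sdb1", reader, init_before_notify=True)
try:
    add_partition("sdb1", reader, init_before_notify=False)
    outcome = "no failure"
except RuntimeError as exc:
    outcome = str(exc)
print(outcome)
```

In the model the reader runs synchronously inside the notification, so the failure is deterministic. In the real system it would be a race, which would fit the repro script needing several loop iterations before the oops appears.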

Changed in linux (Ubuntu Xenial):
assignee: Taco Screen team (taco-screen-team) → Tim Gardner (timg-tpi)
status: New → Fix Committed
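
The patch filename above is truncated in this report, so what exactly is "sent" is not stated here. As a rough illustration only, the toy Python model below assumes the fix is an ordering change: the partition's per-CPU reference counter is initialized before the partition is announced to anyone who might start I/O on it. `Partition`, `add_partition`, and `start_io` are hypothetical names for this sketch, not kernel functions.

```python
class Partition:
    """Toy stand-in for a partition object; this is not kernel code."""

    def __init__(self, name):
        self.name = name
        self.refcount = None  # None models "counter not initialized yet"

    def init_ref(self):
        self.refcount = 1

    def get_ref(self):
        # Using the counter before init models the crash in this toy.
        if self.refcount is None:
            raise RuntimeError(f"{self.name}: reference counter used before init")
        self.refcount += 1
        return self.refcount


def start_io(part):
    """Hypothetical listener that begins I/O as soon as it is notified."""
    part.get_ref()


def add_partition(part, listeners, init_first):
    """Announce `part` to listeners, initializing the counter before or after."""
    if init_first:
        part.init_ref()
    for notify in listeners:
        notify(part)  # a listener may touch the counter immediately
    if not init_first:
        part.init_ref()
```

With `init_first=True` the listener finds a usable counter; with `init_first=False` the same listener raises, which is the ordering hazard the sketch assumes the patch removes.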
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

Thank you, Breno and Rick!

The patched kernel (attached to the bug) fixed the issue.

Installed patched kernel:
-----------------
root@roselp2:/kte/tools# uname -r
4.4.0-16-generic
root@roselp2:/kte/tools# dpkg -l |grep 16-generic
ii linux-headers-4.4.0-16-generic 4.4.0-16.32-internal ppc64el Linux kernel headers for version 4.4.0 on PowerPC 64el SMP
ii linux-image-4.4.0-16-generic 4.4.0-16.32-internal ppc64el Linux kernel image for version 4.4.0 on PowerPC 64el SMP
ii linux-image-extra-4.4.0-16-generic 4.4.0-16.32-internal ppc64el Linux kernel extra modules for version 4.4.0 on PowerPC 64el SMP
iU linux-tools-4.4.0-16-generic 4.4.0-16.32-internal ppc64el Linux kernel version specific tools for version 4.4.0-16

Disk partitioning now completes successfully:
--------------------------

Creating xfs filesystem on sdb3 for admndisk...done
mounting /dev/sdb3 on /admntest/sdb3...done
Add entry /dev/sdb3 /admntest/sdb3 to /etc/fstab
Creating ext3 filesystem on sdb5 for admndisk...done
mounting /dev/sdb5 on /admntest/sdb5...done
Add entry /dev/sdb5 /admntest/sdb5 to /etc/fstab
Creating btrfs filesystem on sdc3 for admndisk...done
mounting /dev/sdc3 on /admntest/sdc3...done
Creating ext4 filesystem on sdc5 for admndisk...done
mounting /dev/sdc5 on /admntest/sdc5...done
Creating xfs filesystem on sdb1 for aio...done
mounting /dev/sdb1 on /aio-stress/sdb1...done
Add entry /dev/sdb1 /aio-stress/sdb1 to /etc/fstab
Creating ext3 filesystem on sdb2 for aio...done
mounting /dev/sdb2 on /aio-stress/sdb2...done
Add entry /dev/sdb2 /aio-stress/sdb2 to /etc/fstab
Creating ext4 filesystem on sdc1 for aio...done
mounting /dev/sdc1 on /aio-stress/sdc1...done
Creating btrfs filesystem on sdc2 for aio...done
mounting /dev/sdc2 on /aio-stress/sdc2...done
Creating xfs filesystem on sdb6 for blast...done
mounting /dev/sdb6 on /blast/sdb6...done
Add entry /dev/sdb6 /blast/sdb6 to /etc/fstab
Creating ext3 filesystem on sdb7 for blast...done
mounting /dev/sdb7 on /blast/sdb7...done
Add entry /dev/sdb7 /blast/sdb7 to /etc/fstab
Creating btrfs filesystem on sdc6 for blast...done
mounting /dev/sdc6 on /blast/sdc6...done
Creating ext4 filesystem on sdc7 for blast...done
mounting /dev/sdc7 on /blast/sdc7...done
Creating xfs filesystem on sdb8 for fstest...done
mounting /dev/sdb8 on /fs_test/sdb8...done
Add entry /dev/sdb8 /fs_test/sdb8 to /etc/fstab
Creating ext3 filesystem on sdb9 for fstest...done
mounting /dev/sdb9 on /fs_test/sdb9...done
Add entry /dev/sdb9 /fs_test/sdb9 to /etc/fstab
Creating btrfs filesystem on sdc8 for fstest...done
mounting /dev/sdc8 on /fs_test/sdc8...done
Creating ext4 filesystem on sdc9 for fstest...done
mounting /dev/sdc9 on /fs_test/sdc9...done

Thanks,
Manju
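
A long setup log like the one above is easier to verify mechanically than by eye. The following is a minimal sketch that parses the three line shapes seen in the log and reports any filesystem that was created but never mounted. The function names and the `suite` field (the word after "for", such as `aio` or `blast`) are labels invented for this sketch, not part of the `setup io` tool.

```python
import re

# Patterns match the exact line shapes shown in the log above.
CREATE_RE = re.compile(r"Creating (\w+) filesystem on (\w+) for (\w+)\.\.\.done")
MOUNT_RE = re.compile(r"mounting /dev/(\w+) on (\S+?)\.\.\.done")


def summarize_setup_log(text):
    """Map each device to its filesystem, suite label, and mount point."""
    devices = {}
    for line in text.splitlines():
        line = line.strip()
        m = CREATE_RE.fullmatch(line)
        if m:
            fs, dev, suite = m.groups()
            devices[dev] = {"fs": fs, "suite": suite, "mountpoint": None}
            continue
        m = MOUNT_RE.fullmatch(line)
        if m and m.group(1) in devices:
            devices[m.group(1)]["mountpoint"] = m.group(2)
    return devices


def unmounted(devices):
    """Return devices with a created filesystem but no completed mount."""
    return sorted(d for d, info in devices.items() if info["mountpoint"] is None)
```

Feeding the log text to `summarize_setup_log` and then calling `unmounted` on the result should give an empty list when every "Creating" step has a matching "mounting" step. Lines that do not match either pattern, such as the `/etc/fstab` entries, are ignored.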

bugproxy (bugproxy) wrote : xmon log for io ran on Leaf adapter on irislp2

Default Comment by Bridge

bugproxy (bugproxy) wrote : linux-header

Default Comment by Bridge

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-18.34

---------------
linux (4.4.0-18.34) xenial; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1566868

  * [i915_bpo] Fix RC6 on SKL GT3 & GT4 (LP: #1564759)
    - SAUCE: i915_bpo: drm/i915/skl: Fix rc6 based gpu/system hang
    - SAUCE: i915_bpo: drm/i915/skl: Fix spurious gpu hang with gt3/gt4 revs

  * CONFIG_ARCH_ROCKCHIP not enabled in armhf generic kernel (LP: #1566283)
    - [Config] CONFIG_ARCH_ROCKCHIP=y

  * [Feature] Memory Bandwidth Monitoring (LP: #1397880)
    - perf/x86/cqm: Fix CQM handling of grouping events into a cache_group
    - perf/x86/cqm: Fix CQM memory leak and notifier leak
    - x86/cpufeature: Carve out X86_FEATURE_*
    - Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    - x86/topology: Create logical package id
    - perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and init
    - perf/x86/mbm: Add memory bandwidth monitoring event management
    - perf/x86/mbm: Implement RMID recycling
    - perf/x86/mbm: Add support for MBM counter overflow handling

  * User namespace mount updates (LP: #1566505)
    - SAUCE: quota: Require that qids passed to dqget() be valid and map into s_user_ns
    - SAUCE: fs: Allow superblock owner to change ownership of inodes with unmappable ids
    - SAUCE: fuse: Don't initialize user_id or group_id in mount options
    - SAUCE: cgroup: Use a new super block when mounting in a cgroup namespace
    - SAUCE: fs: fix a posible leak of allocated superblock

  * [arm64] kernel BUG at /build/linux-StrpB2/linux-4.4.0/fs/ext4/inode.c:2394!
    (LP: #1566518)
    - arm64: Honour !PTE_WRITE in set_pte_at() for kernel mappings
    - arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission

  * [Feature]USB core and xHCI tasks for USB 3.1 SuperSpeedPlus (SSP) support
    for Alpine Ridge on SKL (LP: #1519623)
    - usb: define USB_SPEED_SUPER_PLUS speed for SuperSpeedPlus USB3.1 devices
    - usb: set USB 3.1 roothub device speed to USB_SPEED_SUPER_PLUS
    - usb: show speed "10000" in sysfs for USB 3.1 SuperSpeedPlus devices
    - usb: add device descriptor for usb 3.1 root hub
    - usb: Support USB 3.1 extended port status request
    - xhci: Make sure xhci handles USB_SPEED_SUPER_PLUS devices.
    - xhci: set roothub speed to USB_SPEED_SUPER_PLUS for USB3.1 capable controllers
    - xhci: USB 3.1 add default Speed Attributes to SuperSpeedPlus device capability
    - xhci: set slot context speed field to SuperSpeedPlus for USB 3.1 SSP devices
    - usb: Add USB3.1 SuperSpeedPlus Isoc Endpoint Companion descriptor
    - usb: Parse the new USB 3.1 SuperSpeedPlus Isoc endpoint companion descriptor
    - usb: Add USB 3.1 Precision time measurement capability descriptor support
    - xhci: refactor and cleanup endpoint initialization.
    - xhci: Add SuperSpeedPlus high bandwidth isoc support to xhci endpoints
    - xhci: cleanup isoc tranfers queuing code
    - xhci: Support extended burst isoc TRB structure used by xhci 1.1 for USB 3.1
    - SAUCE: (noup) usb: fix regression in SuperSpeed endpoint descriptor parsing

  * wrong/missing permissions for device f...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
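
The changelog entry above names 4.4.0-18.34 as the package version that carries the fix. Note that `uname -r` output such as `4.4.0-16-generic` omits the upload number, so the check needs the package version as shown by `dpkg -l`. The sketch below is a simplified numeric comparison with hypothetical helper names; it ignores suffixes, and on a real system `dpkg --compare-versions` is the authoritative tool.

```python
import re

FIXED_IN = "4.4.0-18.34"  # package version taken from the changelog entry above


def version_key(version):
    """Split a version such as '4.4.0-18.34' into a tuple of integers.

    Simplification: only the digit runs are compared; suffixes such as
    '-internal' are ignored, unlike real Debian version ordering.
    """
    return tuple(int(n) for n in re.findall(r"\d+", version))


def has_fix(installed):
    """Return True if `installed` is at or beyond the fixed package version."""
    return version_key(installed) >= version_key(FIXED_IN)
```

For example, `has_fix("4.4.0-18.34")` is true, while the internal test build `4.4.0-16.32-internal` reported earlier compares as older even though it carried the patch, so this check only makes sense for official archive packages.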