Question about a Linux oops on an ARM board
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = cfb1c000
[00000000] *pgd=2fbfa831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1]
last sysfs file:
Modules linked in:
CPU: 0 Not tainted (2.6.39 #216)
PC is at kmem_cache_alloc+0x24/0x98
LR is at getname_flags+0x20/0xe8
pc : [] lr : [] psr: 40000093
sp : cfbb5f40 ip : 00000000 fp : 00000000
r10: 00000000 r9 : cfbb4000 r8 : ffffff9c
r7 : 00000000 r6 : 00000000 r5 : 40000013 r4 : 40202e7e
r3 : 40000093 r2 : 00000100 r1 : 000000d0 r0 : 00000000
Flags: nZcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 0005317f Table: 2fb1c000 DAC: 00000015
Process MEG (pid: 589, stack limit = 0xcfbb4270)
Stack: (0xcfbb5f40 to 0xcfbb6000)
5f40: 00000000 40202e7e 00000000 00000000 00000000 c00a1e9c 00014220 00000001
5f60: 00000000 00000000 00000005 c0094efc 00000000 00000011 00000000 00000000
5f80: 00000024 00000100 00000000 00000000 4020e040 4020dee0 00000005 c0030b28
5fa0: 00000000 c0030980 00000000 4020e040 40202e7e 00000000 00000000 00000050
5fc0: 00000000 4020e040 4020dee0 00000005 402123dc 00000000 be3ffea0 00000000
5fe0: 40202e87 be3ffd58 401d949c 401c3010 60000010 40202e7e 00000000 00000000
[] (kmem_cache_alloc+0x24/0x98) from [] (getname_flags+0x20/0xe8)
[] (getname_flags+0x20/0xe8) from [] (do_sys_open+0xa8/0x1ac)
[] (do_sys_open+0xa8/0x1ac) from [] (ret_fast_syscall+0x0/0x2c)
Code: e0016003 e10f5000 e3853080 e121f003 (e5901000)
---[ end trace d33e1f5c547d52cb ]---
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 17 [#2]
last sysfs file:
Modules linked in:
CPU: 0 Tainted: G D (2.6.39 #216)
PC is at kmem_cache_free+0x18/0xcc
LR is at rcu_process_callbacks+0x6c/0x84
pc : [] lr : [] psr: 20000093
sp : cf833f68 ip : cf833f30 fp : 00000000
r10: 00000000 r9 : c0058dc4 r8 : cfbff540
r7 : 20000013 r6 : c042df1c r5 : cfbff540 r4 : cfbbf760
r3 : 20000093 r2 : 00000000 r1 : cfbff540 r0 : 00000000
Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: 0005317f Table: 2fb1c000 DAC: 00000017
Process rcu_kthread (pid: 6, stack limit = 0xcf832270)
Stack: (0xcf833f68 to 0xcf834000)
3f60: cfbbf760 cfbff540 c042df1c cf833f94 cf833fa0 c00705d0
3f80: cf815040 cf815040 cf832000 c00706a4 c031057c 00000000 cf815040 c0058dc4
3fa0: cf833fa0 cf833fa0 cf833fd4 cf81bf74 00000000 c00705e8 00000000 00000000
3fc0: 00000000 c0058a04 c003185c 00000000 00000000 00000000 cf833fd8 cf833fd8
3fe0: 00000000 cf81bf74 c0058984 c003185c 00000013 c003185c 00020018 18000004
[] (kmem_cache_free+0x18/0xcc) from [] (rcu_process_callbacks+0x6c/0x84)
[] (rcu_process_callbacks+0x6c/0x84) from [] (rcu_kthread+0xbc/0xe4)
[] (rcu_kthread+0xbc/0xe4) from [] (kthread+0x80/0x88)
[] (kthread+0x80/0x88) from [] (kernel_thread_exit+0x0/0x8)
Code: e1a08001 e10f7000 e3873080 e121f003 (e5904000)
---[ end trace d33e1f5c547d52cc ]---
Kernel panic - not syncing: Fatal exception in interrupt
[] (unwind_backtrace+0x0/0xec) from [] (panic+0x4c/0x180)
[] (panic+0x4c/0x180) from [] (die+0x180/0x1c4)
[] (die+0x180/0x1c4) from [] (__do_kernel_fault+0x64/0x84)
[] (__do_kernel_fault+0x64/0x84) from [] (do_page_fault+0x1b8/0x1d0)
[] (do_page_fault+0x1b8/0x1d0) from [] (do_DataAbort+0x34/0x94)
[] (do_DataAbort+0x34/0x94) from [] (__dabt_svc+0x4c/0x60)
Exception stack(0xcf833f20 to 0xcf833f68)
3f20: 00000000 cfbff540 00000000 20000093 cfbbf760 cfbff540 c042df1c 20000013
3f40: cfbff540 c0058dc4 00000000 00000000 cf833f30 cf833f68 c00705d0 c0093080
3f60: 20000093 ffffffff
[] (__dabt_svc+0x4c/0x60) from [] (kmem_cache_free+0x18/0xcc)
[] (kmem_cache_free+0x18/0xcc) from [] (rcu_process_callbacks+0x6c/0x84)
[] (rcu_process_callbacks+0x6c/0x84) from [] (rcu_kthread+0xbc/0xe4)
[] (rcu_kthread+0xbc/0xe4) from [] (kthread+0x80/0x88)
[] (kthread+0x80/0x88) from [] (kernel_thread_exit+0x0/0x8)
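The kernel version is 2.6.39, and this happened on an ARM board.
Four instances of the MEG process were running at the time.
Following the first oops message, I found that in kmem_cache_alloc the address of cachep, the cache descriptor structure, was 0. The oops occurred while dereferencing that address 0, and the second oops and the panic below appear to have followed from it as a chain reaction.
That cachep is initialized once at boot and should never change afterwards, so I don't understand why it became 0.
Judging from the oops message, the process doesn't appear to have been using any sysfs file or module either.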
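As a cross-check of the analysis above, the faulting instruction can be read directly from the `Code:` line, where the word in parentheses is the instruction at PC. The following is only a rough sketch: the helper names `faulting_word` and `decode_ldr_str_imm` are made up for illustration, and it handles only the ARM immediate-offset LDR/STR encoding without writeback.

```python
import re

def faulting_word(code_line):
    """Return the parenthesized instruction word from an oops 'Code:' line."""
    match = re.search(r"\(([0-9a-fA-F]{8})\)", code_line)
    return int(match.group(1), 16) if match else None

def decode_ldr_str_imm(word):
    """Decode an ARM (A32) LDR/STR with a 12-bit immediate offset.

    Only the plain offset form (P=1, W=0) is handled; any other
    encoding returns None.
    """
    if (word >> 28) == 0xF:           # unconditional space (PLD etc.), not LDR/STR
        return None
    if (word >> 26) & 0b11 != 0b01:   # bits 27:26 must be 01 for LDR/STR
        return None
    if (word >> 25) & 1:              # I=1 means register offset, not handled
        return None
    pre_indexed = (word >> 24) & 1
    add_offset = (word >> 23) & 1
    byte_access = (word >> 22) & 1
    writeback = (word >> 21) & 1
    is_load = (word >> 20) & 1
    if not pre_indexed or writeback:
        return None
    rn = (word >> 16) & 0xF           # base register
    rd = (word >> 12) & 0xF           # destination/source register
    imm = word & 0xFFF
    mnemonic = ("ldr" if is_load else "str") + ("b" if byte_access else "")
    if imm == 0:
        address = f"[r{rn}]"
    else:
        sign = "" if add_offset else "-"
        address = f"[r{rn}, #{sign}{imm}]"
    return f"{mnemonic} r{rd}, {address}"

# The two 'Code:' lines copied from the log in this post.
first = faulting_word("Code: e0016003 e10f5000 e3853080 e121f003 (e5901000)")
second = faulting_word("Code: e1a08001 e10f7000 e3873080 e121f003 (e5904000)")
print(decode_ldr_str_imm(first))   # ldr r1, [r0]
print(decode_ldr_str_imm(second))  # ldr r4, [r0]
```

Both faulting instructions load through r0, and r0 is 00000000 in both register dumps. That is consistent with a NULL cache pointer being passed as the first argument to kmem_cache_alloc and kmem_cache_free, though it does not say what cleared the pointer.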
I kept digging on my own all weekend, but I just can't find the cause.
Any help would be appreciated. _ _
I forgot to mention this in the post:
the oops does not occur right after startup. It appears suddenly after the system has been running for a day or more.
Try searching the Git reviews
This is one of the approaches I use in cases like this.
It looks like a kernel bug
I have the same problem on the same kernel,
and when I went to kernel.org
I found reports of issues like this one.
Download kernel 3.15.1 and compare the slab.c code.
http://oops.kernel.org/oops/kernel-paging-request-at-____cache_alloc-2/
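One way to do the comparison suggested above is a plain unified diff of the two versions of the file. This is a minimal sketch using Python's standard `difflib`; the function names and the directory names in the usage comment are assumptions, not paths from this thread.

```python
import difflib
from pathlib import Path

def diff_lines(old_lines, new_lines, old_name="old", new_name="new"):
    """Return a unified diff between two lists of lines (no trailing newlines)."""
    return list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile=old_name, tofile=new_name, lineterm=""))

def diff_files(old_path, new_path):
    """Read two source files and return their unified diff as a list of lines."""
    old = Path(old_path).read_text(errors="replace").splitlines()
    new = Path(new_path).read_text(errors="replace").splitlines()
    return diff_lines(old, new, str(old_path), str(new_path))

# Hypothetical usage, assuming both trees are unpacked side by side:
#   for line in diff_files("linux-2.6.39/mm/slab.c", "linux-3.15.1/mm/slab.c"):
#       print(line)
```

Across that many releases the diff may be large, so it may help to narrow the reading to the functions named in the trace, such as kmem_cache_alloc and kmem_cache_free.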
Thanks for the answers.
:)