src-openEuler / kernel

[20.03-LTS-SP1~SP4] 4.19 kernel: loading and then unloading the vkms module crashes and reboots the system

Status: Done
Type: Bug
Created: 2024-03-12 11:36

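[Title] Briefly describe the problem: in what scenario, which operation was performed, and what went wrong (prefer positive phrasing)

I. Defect information
[20.03-LTS-SP1~SP4] 4.19 kernel: loading and then unloading the vkms module crashes and reboots the system
Affected kernel versions: tested kernel-4.19.90-2402.4.0.0238.oe1.x86_64 through kernel-4.19.90-2403.1.0.0241.oe1.x86_64; all of them show the problem
Kernel information:
kernel-4.19.90-2402.4.0.0238.oe1.x86_64
Component:
kernel
Affected versions:
kernel-4.19.90-2402.4.0.0238.oe1.x86_64 to kernel-4.19.90-2403.1.0.0241.oe1.x86_64
Summary:
After installing the system, run: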

modprobe vkms && modprobe -r vkms

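The system then crashes and reboots.

[Environment]
Hardware
Any virtual machine with 20.03-LTS-SP1 installed

Software
The problem was found on 20.03-LTS-SP1 with kernel versions released from February onward
Kernel versions: kernel-4.19.90-2402.4.0.0238.oe1.x86_64 to kernel-4.19.90-2403.1.0.0241.oe1.x86_64

Network
- None

[Steps to reproduce] Describe the exact steps
After installing the system, run: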

modprobe vkms && modprobe -r vkms

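[Actual result]
The system crashes and reboots.

[Other attachments]
For example system message logs, component logs, dump information, screenshots

Reference link for defect details:

Reference link for defect analysis:
Preliminary analysis: the problem was introduced by the fix for CVE-2023-51043. The 5.10 kernel (22.03 LTS SP1) also carries the fix for this CVE, but the problem has not been seen there. The original upstream patch fixes an issue in v6.5~v6.8, so 4.19 may be missing a prerequisite patch, which would explain the problem.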

Comments (8)

sigui created the bug

openeuler-ci-bot added the sig/Kernel label
sigui edited the description

Problem analysis summary
1. Why does it crash?
Start with the call chain that leads to the crash:
vkms_exit
  drm_dev_put
    drm_dev_release
      dev->driver->release(dev)   ===== vkms_release
        drm_atomic_helper_shutdown
          __drm_atomic_helper_disable_all
            alloc
              init -----> get
            drm_atomic_state_put
              __drm_atomic_state_free
                drm_dev_put
                  drm_dev_release
                    dev->driver->release(dev)   ===== vkms_release
                      platform_device_unregister
                        platform_device_del
                          device_del
                            dpm_sysfs_remove
                              sysfs_unmerge_group
                                kernfs_find_and_get_ns
                                  kernfs_find_ns   ---> dev->kobj.sd is a NULL pointer
The direct cause is that dev->kobj.sd is a NULL pointer: kernfs_find_ns reads sd->flags and the system crashes.
The following explains where the NULL sd comes from:
1) vkms_exit, the exit function of the vkms module, calls drm_dev_put. This function is the key; its logic is:
void drm_dev_put(struct drm_device *dev)
{
	if (dev)
		kref_put(&dev->ref, drm_dev_release);
}
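kref_put is simple: it decrements dev->ref by 1 and, if the result is 0, calls drm_dev_release.
Because vkms module initialization calls drm_dev_init, which sets dev->ref to 1, the drm_dev_put call in vkms_exit drops the count to 0 and drm_dev_release runs.
drm_dev_release then calls the vkms driver's vkms_release.

2) vkms_release first calls platform_device_unregister -> platform_device_del -> device_del -> kobject_del -> sysfs_put, which releases kobj->sd. (This is the crucial step.)

3) In the rest of vkms_release, __drm_atomic_helper_disable_all is the key function:
It first calls drm_atomic_state_alloc -> drm_atomic_state_init. drm_atomic_state_init increments dev->ref by 1, so dev->ref, which had already dropped to 0, becomes 1 again, and drm_atomic_state_alloc returns to __drm_atomic_helper_disable_all.
__drm_atomic_helper_disable_all then calls drm_atomic_state_put, which eventually calls drm_dev_put again. dev->ref is 1 at this point, so the put drops it to 0 and vkms_release is called a second time.
4) The second vkms_release produces the crash call chain shown at the start and ends up in kernfs_find_ns dereferencing the NULL dev->kobj.sd.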
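
The re-entry described in steps 1) to 4) can be sketched with a small user-space model. This is only an illustration under stated assumptions: `struct model_dev` and the `model_*` functions are hypothetical names, not kernel APIs, the counter is a plain `int` rather than a `kref`, and the increment is unchecked (0 -> 1 is allowed), which is the behaviour the analysis above assumes.

```c
#include <assert.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_dev {
    int ref;              /* stands in for dev->ref */
    int release_calls;    /* how many times the release callback was entered */
    int sysfs_node_valid; /* stands in for "kobj.sd != NULL" */
    int null_deref;       /* set where the real kernel would fault */
};

void model_release(struct model_dev *dev);

/* Unchecked increment: going from 0 back to 1 is silently allowed. */
void model_get(struct model_dev *dev)
{
    dev->ref++;
}

/* Mirrors kref_put: decrement, and run release when the count hits 0. */
void model_put(struct model_dev *dev)
{
    if (--dev->ref == 0)
        model_release(dev);
}

/* Mirrors the order described for vkms_release: unregister, then shutdown. */
void model_release(struct model_dev *dev)
{
    dev->release_calls++;
    if (!dev->sysfs_node_valid) {
        dev->null_deref = 1; /* second entry: the sysfs node is already gone */
        return;
    }
    dev->sysfs_node_valid = 0; /* platform_device_unregister step */
    model_get(dev);            /* atomic state init takes a reference */
    model_put(dev);            /* atomic state put drops it: release re-enters */
}

/* Runs the unload path and returns how many times release was entered. */
int model_unload_buggy(struct model_dev *dev)
{
    dev->ref = 1;
    dev->release_calls = 0;
    dev->sysfs_node_valid = 1;
    dev->null_deref = 0;
    model_put(dev); /* the drm_dev_put call made by vkms_exit */
    return dev->release_calls;
}
```

Calling `model_unload_buggy` on a fresh `struct model_dev` enters `model_release` twice, and the second entry hits the already-released node, which corresponds to the NULL dereference in kernfs_find_ns.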

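2. Why was there no problem before, and why does it appear once the CVE-2023-51043 patch is merged?
1) The original flow
In the original flow, the path __drm_atomic_helper_disable_all -> drm_atomic_state_alloc -> drm_atomic_state_init does not increment dev->ref, so vkms_release is not called a second time and the already released dev->kobj.sd is never accessed again. Hence no problem.
2) With the CVE-2023-51043 patch merged, drm_atomic_state_init increments dev->ref by 1, vkms_release is called a second time, and the call ends up in kernfs_find_ns dereferencing the NULL dev->kobj.sd.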

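Thank you very much for reporting this problem.
Could you post the original call stack log?
On qemu I ran insmod vkms.ko followed by rmmod vkms.ko and got the stack below, but the system has not crashed. I would like to confirm whether it is the same stack.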

root@buildroot:~# insmod vkms.ko
[   28.132658] vkms: module verification failed: signature and/or required key missing - tainting kernel
[   28.177024] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   28.177508] [drm] Driver supports precise vblank timestamp query.
[   28.198651] [drm] Initialized vkms 1.0.0 20180514 for virtual device on minor 0
root@buildroot:~# lsmod
Module                  Size  Used by    Tainted: G
vkms                   28672  0
root@buildroot:~# rmmod vkms.ko
[   34.573093] ------------[ cut here ]------------
[   34.573747] refcount_t: increment on 0; use-after-free.
[   34.576323] WARNING: CPU: 0 PID: 154 at ../lib/refcount.c:156 refcount_inc_checked+0x5c/0x80
[   34.577169] Modules linked in: vkms(E-)
[   34.578614] CPU: 0 PID: 154 Comm: rmmod Tainted: G            E     4.19.90+ #70
[   34.578981] Hardware name: linux,dummy-virt (DT)
[   34.579475] pstate: 60000005 (nZCv daif -PAN -UAO)
[   34.580050] pc : refcount_inc_checked+0x5c/0x80
[   34.580294] lr : refcount_inc_checked+0x5c/0x80
[   34.580543] sp : ffff8000c818fb20
[   34.580749] x29: ffff8000c818fb20 x28: ffff8000c6f2e000
[   34.581098] x27: 0000000000000000 x26: 0000000000000000
[   34.581379] x25: ffff2000028f5418 x24: ffff20000db37000
[   34.581664] x23: 1fffe4000051ea18 x22: ffff20000ca2c000
[   34.581945] x21: ffff8000c7d75d00 x20: ffff20000d8c7434
[   34.582224] x19: 0000000000000000 x18: 0000000000000000
[   34.582523] x17: 0000000000000000 x16: 0000000000000000
[   34.582818] x15: 0000000000000000 x14: ffff20000974507c
[   34.583114] x13: 000000000008c133 x12: ffff200009710a04
[   34.583405] x11: 1fffe40001b4608f x10: ffff040001b4608f
[   34.583944] x9 : dfff200000000000 x8 : 657466612d657375
[   34.584262] x7 : 203b30206e6f2074 x6 : 0000000000000030
[   34.584544] x5 : 1ffff0018801b5c6 x4 : 0000000000000000
[   34.584831] x3 : 0000000000000000 x2 : ffffffffffffffff
[   34.585082] x1 : 61bf317b17390d00 x0 : 0000000000000000
[   34.585592] Call trace:
[   34.585802]  refcount_inc_checked+0x5c/0x80
[   34.586033]  drm_dev_get+0x24/0x30
[   34.586217]  drm_atomic_state_init+0x150/0x240
[   34.586434]  drm_atomic_state_alloc+0xb8/0xe8
[   34.586647]  __drm_atomic_helper_disable_all.isra.4+0x28/0x438
[   34.586932]  drm_atomic_helper_shutdown+0xac/0x118
[   34.588049]  vkms_release+0x40/0x68 [vkms]
[   34.588272]  drm_dev_put.part.0+0x7c/0xb0
[   34.588489]  drm_dev_put+0x24/0x30
[   34.588669]  vkms_exit+0x38/0x50 [vkms]
[   34.588874]  __arm64_sys_delete_module+0x334/0x538
[   34.589126]  el0_svc_common+0x10c/0x488
[   34.589318]  el0_svc_handler+0x170/0x240
[   34.589521]  el0_svc+0x10/0x640
[   34.589787] ---[ end trace 546d5b7622744ad6 ]---
[   34.604467] ------------[ cut here ]------------
[   34.604767] refcount_t: underflow; use-after-free.
[   34.605223] WARNING: CPU: 0 PID: 154 at ../lib/refcount.c:190 refcount_sub_and_test_checked+0xe8/0x108
[   34.605659] Modules linked in: vkms(E-)
[   34.605872] CPU: 0 PID: 154 Comm: rmmod Tainted: G        W   E     4.19.90+ #70
[   34.606198] Hardware name: linux,dummy-virt (DT)
[   34.606454] pstate: 60000005 (nZCv daif -PAN -UAO)
[   34.606707] pc : refcount_sub_and_test_checked+0xe8/0x108
[   34.606968] lr : refcount_sub_and_test_checked+0xe8/0x108
[   34.607238] sp : ffff8000c818fb10
[   34.607411] x29: ffff8000c818fb10 x28: ffff8000c6f2e000
[   34.607905] x27: 0000000000000000 x26: 0000000000000000
[   34.608252] x25: ffff2000028f5418 x24: ffff20000db37000
[   34.608505] x23: 1ffff000196d7121 x22: ffff800bc5480000
[   34.608747] x21: ffff2000028f20c0 x20: ffff20000d8c7434
[   34.608983] x19: 0000000000000000 x18: 0000000000000000
[   34.609211] x17: 0000000000000000 x16: 0000000000000000
[   34.609439] x15: 0000000000000000 x14: ffff20000974507c
[   34.609667] x13: 0000000000093a5f x12: ffff200009710a04
[   34.609922] x11: 1fffe40001b4608e x10: ffff040001b4608e
[   34.610210] x9 : dfff200000000000 x8 : 72657466612d6573
[   34.610508] x7 : 75203b776f6c6672 x6 : 0000000000000030
[   34.610823] x5 : 1ffff0018801b5c6 x4 : 0000000000000000
[   34.611108] x3 : 0000000000000000 x2 : ffffffffffffffff
[   34.611359] x1 : 61bf317b17390d00 x0 : 0000000000000000
[   34.611721] Call trace:
[   34.611981]  refcount_sub_and_test_checked+0xe8/0x108
[   34.612455]  refcount_dec_and_test_checked+0x14/0x20
[   34.612726]  drm_dev_put.part.0+0x20/0xb0
[   34.612943]  drm_dev_put+0x24/0x30
[   34.613101]  __drm_atomic_state_free+0xa4/0xe0
[   34.613346]  __drm_atomic_helper_disable_all.isra.4+0x340/0x438
[   34.613647]  drm_atomic_helper_shutdown+0xac/0x118
[   34.613929]  vkms_release+0x40/0x68 [vkms]
[   34.614168]  drm_dev_put.part.0+0x7c/0xb0
[   34.614410]  drm_dev_put+0x24/0x30
[   34.614586]  vkms_exit+0x38/0x50 [vkms]
[   34.614812]  __arm64_sys_delete_module+0x334/0x538
[   34.615083]  el0_svc_common+0x10c/0x488
[   34.615269]  el0_svc_handler+0x170/0x240
[   34.615481]  el0_svc+0x10/0x640
[   34.615831] ---[ end trace 546d5b7622744ad7 ]---

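drm_atomic_helper_shutdown needs to initialize a new drm_atomic_state. The CVE-2023-51043 patch unconditionally adds a dev reference count increment to the initialization of every drm_atomic_state, so drm_atomic_helper_shutdown now modifies the dev reference count.

The vkms driver calls drm_atomic_helper_shutdown from its release callback, and the release callback only runs after the dev reference count has reached 0. That is what produces the "refcount_t: increment on 0;" warning in the call stack above.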
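
One possible reading of why the qemu run above only warns and does not crash: the trace goes through refcount_inc_checked and refcount_sub_and_test_checked, that is, the checked refcount variant. The sketch below is a simplified, non-atomic user-space model of how such checked operations are assumed to behave. `checked_inc_not_zero` and `checked_dec_and_test` are hypothetical names and this is not the kernel's code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Checked increment: refuses to resurrect a counter that is already 0.
 * The kernel would emit "increment on 0; use-after-free" in that case.
 */
bool checked_inc_not_zero(unsigned int *r)
{
    if (*r == 0)
        return false;      /* counter untouched, caller should warn */
    if (*r != UINT_MAX)
        (*r)++;            /* saturate instead of overflowing */
    return true;
}

/*
 * Checked decrement: returns true only for a legitimate 1 -> 0 transition.
 * A decrement on 0 is reported as an underflow and never triggers release.
 */
bool checked_dec_and_test(unsigned int *r)
{
    if (*r == UINT_MAX)
        return false;      /* saturated counters stay saturated */
    if (*r == 0)
        return false;      /* underflow: warn, do not wrap around */
    return --(*r) == 0;
}
```

Under this model the get inside the release callback leaves the counter at 0 and the following put reports an underflow without running release a second time, which would match the two warnings in the log. Whether the crashing x86_64 build uses an unchecked counter instead is an assumption that the thread does not confirm.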

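On driver unload, should the correct call order be drm_atomic_helper_shutdown first and drm_dev_put afterwards?
That guarantees dev still holds a reference while drm_atomic_helper_shutdown runs.
In that case it is best not to put drm_atomic_helper_shutdown in the release callback at all.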
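
The proposed ordering can be sketched with the same kind of user-space model. All names here (`struct fixed_dev`, `fixed_*`) are hypothetical, and this is not the patch from the PR, only an illustration of running the shutdown step while a reference is still held.

```c
#include <assert.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct fixed_dev {
    int ref;                 /* stands in for dev->ref */
    int release_calls;       /* how many times release was entered */
    int shutdown_after_zero; /* set if shutdown ever runs with ref == 0 */
};

/* Release now only frees resources; it takes no references. */
void fixed_release(struct fixed_dev *dev)
{
    dev->release_calls++;
}

void fixed_get(struct fixed_dev *dev)
{
    dev->ref++;
}

void fixed_put(struct fixed_dev *dev)
{
    if (--dev->ref == 0)
        fixed_release(dev);
}

/* Stands in for drm_atomic_helper_shutdown: takes and drops one reference. */
void fixed_shutdown(struct fixed_dev *dev)
{
    if (dev->ref == 0)
        dev->shutdown_after_zero = 1;
    fixed_get(dev);
    fixed_put(dev);
}

/* Exit path with the proposed order: shutdown first, final put last. */
int fixed_unload(struct fixed_dev *dev)
{
    dev->ref = 1;
    dev->release_calls = 0;
    dev->shutdown_after_zero = 0;
    fixed_shutdown(dev); /* ref goes 1 -> 2 -> 1, release is not triggered */
    fixed_put(dev);      /* ref goes 1 -> 0, release runs exactly once */
    return dev->release_calls;
}
```

With this order `fixed_unload` enters release once, and the shutdown step never sees a counter of 0.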

By the same reasoning, other drivers may have the same problem:
drivers/gpu/drm/xen/xen_drm_front.c

Reproduction environment:

OS: openEuler 20.03 LTS SP4
kernel: upgraded to kernel-4.19.90-2402.4.0.0264.oe2003sp4.x86_64

Reproduction command:

modprobe vkms; modprobe -r vkms

stack trace

crash> bt
PID: 5317   TASK: ffff8ecb85e82f00  CPU: 25  COMMAND: "modprobe"
 #0 [ffff9aeb049d3a58] machine_kexec at ffffffff9ca5466f
 #1 [ffff9aeb049d3ab0] __crash_kexec at ffffffff9cb57791
 #2 [ffff9aeb049d3b70] crash_kexec at ffffffff9cb5868d
 #3 [ffff9aeb049d3b88] oops_end at ffffffff9ca231ef
 #4 [ffff9aeb049d3ba8] no_context at ffffffff9ca63ec5
 #5 [ffff9aeb049d3c00] __do_page_fault at ffffffff9ca64688
 #6 [ffff9aeb049d3c70] do_page_fault at ffffffff9ca64ac1
 #7 [ffff9aeb049d3ca0] async_page_fault at ffffffff9d4011fe
    [exception RIP: kernfs_find_ns+17]
    RIP: ffffffff9cd6de31  RSP: ffff9aeb049d3d58  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffffffff9d8c5168  RDI: 0000000000000000
    RBP: ffffffff9d8c5168   R8: 0000000000000000   R9: ffffffffc055c6f6
    R10: ffffcd7884103600  R11: 0000000000000001  R12: 0000000000000000
    R13: ffffffff9ddae9c0  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff9aeb049d3d78] kernfs_find_and_get_ns at ffffffff9cd6defc
 #9 [ffff9aeb049d3d98] sysfs_unmerge_group at ffffffff9cd717e8
#10 [ffff9aeb049d3db0] dpm_sysfs_remove at ffffffff9cfe0e9d
#11 [ffff9aeb049d3dc8] device_del at ffffffff9cfd37e7
#12 [ffff9aeb049d3e18] platform_device_del at ffffffff9cfda81e
#13 [ffff9aeb049d3e30] platform_device_unregister at ffffffff9cfda8b3
#14 [ffff9aeb049d3e40] vkms_release at ffffffffc042a015 [vkms]
#15 [ffff9aeb049d3e50] __drm_atomic_helper_disable_all.constprop.29 at ffffffffc063b7d0 [drm_kms_helper]
#16 [ffff9aeb049d3e78] drm_atomic_helper_shutdown at ffffffffc063b840 [drm_kms_helper]
#17 [ffff9aeb049d3ec8] vkms_release at ffffffffc042a01d [vkms]
#18 [ffff9aeb049d3ed8] cleanup_module at ffffffffc042a890 [vkms]
#19 [ffff9aeb049d3ee0] __x64_sys_delete_module at ffffffff9cb51b49
#20 [ffff9aeb049d3f38] do_syscall_64 at ffffffff9ca0430f
#21 [ffff9aeb049d3f50] entry_SYSCALL_64_after_hwframe at ffffffff9d4000a0
    RIP: 00007f8c532469b7  RSP: 00007ffe8f2f5898  RFLAGS: 00000206
    RAX: ffffffffffffffda  RBX: 0000563cd32d6d20  RCX: 00007f8c532469b7
    RDX: 0000000000000000  RSI: 0000000000000800  RDI: 0000563cd32d6d88
    RBP: 0000563cd32d6d20   R8: 00007ffe8f2f4841   R9: 0000000000000000
    R10: 00007f8c532b8aa0  R11: 0000000000000206  R12: 0000563cd32d6d88
    R13: 0000000000000001  R14: 0000563cd32d6d88  R15: 0000563cd32d6d20
    ORIG_RAX: 00000000000000b0  CS: 0033  SS: 002b
crash>

@郭梦琪
Moving the module unload operations from vkms_release to vkms_exit fixes the problem.
A PR has been submitted; please help review it.

https://gitee.com/openeuler/kernel/pulls/5935
