From fc1a2ec7da531223b3473185dc2584f8a7c6c659 Mon Sep 17 00:00:00 2001
From: hongmianquan
Date: Fri, 27 Mar 2026 21:10:24 +0800
Subject: [PATCH] monitor: Fix deadlock in monitor_cleanup

During qemu_cleanup, if a non-coroutine QMP command (e.g.,
query-commands) is concurrently received and processed by the
mon_iothread, it can lead to a deadlock in monitor_cleanup.

The root cause is a race condition between the main thread's shutdown
sequence and the coroutine's dispatching mechanism. When handling a
non-coroutine QMP command, qmp_dispatcher_co schedules the actual
command execution as a bottom half in iohandler_ctx and then yields.
At this suspension point, qmp_dispatcher_co_busy remains true.

Subsequently, the main thread, in monitor_cleanup(), sets
qmp_dispatcher_co_shutdown and calls qmp_dispatcher_co_wake(). Since
qmp_dispatcher_co_busy is already true, the aio_co_wake is skipped.
The main thread then enters the AIO_WAIT_WHILE_UNLOCKED loop, where it
executes the scheduled BH (do_qmp_dispatch_bh) via
aio_poll(iohandler_ctx, false). The BH attempts to wake up the
coroutine, but aio_co_wake only schedules a new wake-up BH in
iohandler_ctx. The main thread then blocks indefinitely in
aio_poll(qemu_aio_context, true) while the coroutine's wake-up BH is
starved in iohandler_ctx. qmp_dispatcher_co never reaches termination,
resulting in a deadlock.

The execution sequence is illustrated below:

IO Thread                     Main Thread (qemu_aio_context)          qmp_dispatcher_co (iohandler_ctx)
|                             |                                       |
|-- query-commands            |                                       |
|-- qmp_dispatcher_co_wake()  |                                       |
|   (sets busy = true)        |                                       |
|                             |  <-- Wakes up in iohandler_ctx -->    |
|                             |                                       |-- qmp_dispatch()
|                             |                                       |-- Schedules BH (do_qmp_dispatch_bh)
|                             |                                       |-- qemu_coroutine_yield()
|                             |                                       [State: Suspended, busy=true]
|                             [ quit triggered ]                      |
|                             |-- monitor_cleanup()
|                             |-- qmp_dispatcher_co_shutdown = true
|                             |-- qmp_dispatcher_co_wake()
|                             |   -> Checks busy flag. It's TRUE!
|                             |   -> Skips aio_co_wake().
|                             |
|                             |-- AIO_WAIT_WHILE_UNLOCKED:
|                             |   |-- aio_poll(iohandler_ctx, false)
|                             |   |   -> Executes do_qmp_dispatch_bh
|                             |   |   -> Schedules 'co_schedule_bh' in iohandler_ctx
|                             |   |
|                             |   |-- aio_poll(qemu_aio_context, true)
|                             |   |   -> Blocks indefinitely! (Deadlock)
|                             |
|                             X (Main thread sleeping)                X (Waiting for next iohandler_ctx poll)

To fix this, we add an explicit aio_wait_kick() in do_qmp_dispatch_bh()
to break the main loop out of its blocking poll, allowing it to
re-evaluate the loop condition and poll iohandler_ctx.

Suggested-by: Kevin Wolf
Signed-off-by: hongmianquan
Signed-off-by: wubo.bob
Message-ID: <20260327131024.51947-1-hongmianquan@bytedance.com>
Acked-by: Markus Armbruster
Reviewed-by: Kevin Wolf
Signed-off-by: Kevin Wolf
---
 qapi/qmp-dispatch.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 9bb1e6a9f4..e3897d5197 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -128,6 +128,16 @@ static void do_qmp_dispatch_bh(void *opaque)
     data->cmd->fn(data->args, data->ret, data->errp);
     monitor_set_cur(qemu_coroutine_self(), NULL);
     aio_co_wake(data->co);
+
+    /*
+     * If the QMP dispatcher coroutine is waiting to be scheduled
+     * in iohandler_ctx, we must kick the main loop. This ensures
+     * that AIO_WAIT_WHILE_UNLOCKED() in monitor_cleanup() doesn't
+     * block indefinitely waiting for an event in qemu_aio_context,
+     * but actually gets the chance to poll iohandler_ctx and resume
+     * the coroutine.
+     */
+    aio_wait_kick();
 }
 
 /*