Files
hongmianquan fc1a2ec7da monitor: Fix deadlock in monitor_cleanup
During qemu_cleanup, if a non-coroutine QMP command (e.g.,
query-commands) is concurrently received and processed by the
mon_iothread, it can lead to a deadlock in monitor_cleanup.

The root cause is a race condition between the main thread's shutdown
sequence and the coroutine's dispatching mechanism. When handling a
non-coroutine QMP command, qmp_dispatcher_co schedules the actual
command execution as a bottom half in iohandler_ctx and then yields. At
this suspended point, qmp_dispatcher_co_busy remains true.

Subsequently, the main thread in monitor_cleanup(), sets
qmp_dispatcher_co_shutdown, and calls qmp_dispatcher_co_wake(). Since
qmp_dispatcher_co_busy is already true, the aio_co_wake is skipped. The
main thread then enters the AIO_WAIT_WHILE_UNLOCKED loop, it executes
the scheduled BH (do_qmp_dispatch_bh) via aio_poll(iohandler_ctx,
false), which attempts to wake up the coroutine, aio_co_wake schedules a
new wake-up BH in iohandler_ctx. The main thread then blocks
indefinitely in aio_poll(qemu_aio_context, true), while the coroutine's
wake-up BH is starved in iohandler_ctx, qmp_dispatcher_co never reaches
termination, resulting in a deadlock.

The execution sequence is illustrated below:

 IO Thread                 Main Thread (qemu_aio_context)        qmp_dispatcher_co (iohandler_ctx)
    |                                 |                                        |
    |-- query-commands                |                                        |
    |-- qmp_dispatcher_co_wake()      |                                        |
    |    (sets busy = true)           |                                        |
    |                                 |   <-- Wakes up in iohandler_ctx -->    |
    |                                 |                                        |-- qmp_dispatch()
    |                                 |                                        |-- Schedules BH (do_qmp_dispatch_bh)
    |                                 |                                        |-- qemu_coroutine_yield()
    |                                 |                                            [State: Suspended, busy=true]
    |   [ quit triggered ]            |
    |                                 |-- monitor_cleanup()
    |                                 |-- qmp_dispatcher_co_shutdown = true
    |                                 |-- qmp_dispatcher_co_wake()
    |                                 |    -> Checks busy flag. It's TRUE!
    |                                 |    -> Skips aio_co_wake().
    |                                 |
    |                                 |-- AIO_WAIT_WHILE_UNLOCKED:
    |                                 |   |-- aio_poll(iohandler_ctx, false)
    |                                 |   |    -> Executes do_qmp_dispatch_bh
    |                                 |   |    -> Schedules 'co_schedule_bh' in iohandler_ctx
    |                                 |   |
    |                                 |   |-- aio_poll(qemu_aio_context, true)
    |                                 |   |    -> Blocks indefinitely! (Deadlock)
    |                                 |
    |                                 X (Main thread sleeping)                 X (Waiting for next iohandler_ctx poll)

To fix this, we add an explicit aio_wait_kick() in do_qmp_dispatch_bh()
to break the main loop out of its blocking poll, allowing it to evaluate
the loop condition and poll iohandler_ctx.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
Signed-off-by: wubo.bob <wubo.bob@bytedance.com>
Message-ID: <20260327131024.51947-1-hongmianquan@bytedance.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2026-03-31 15:35:19 +02:00
..
2026-02-13 10:00:02 +01:00
2025-07-29 14:51:20 +02:00