Commit Graph

120656 Commits

Author SHA1 Message Date
Michael Tokarev
561f025ae2 Update version for 10.0.7 release
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
v10.0.7
2025-12-05 10:22:05 +03:00
Markus Armbruster
ed6a627f85 kvm: Fix kvm_vm_ioctl() and kvm_device_ioctl() return value
These functions wrap ioctl().  When ioctl() fails, it sets @errno.
The wrappers then return that @errno negated.

Except they call accel_ioctl_end() between calling ioctl() and reading
@errno.  accel_ioctl_end() can clobber @errno, e.g. when a futex()
system call fails.  Seems unlikely, but it's a bug all the same.

Fix by retrieving @errno before calling accel_ioctl_end().

Fixes: a27dd2de68 (KVM: keep track of running ioctls)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251128152050.3417834-1-armbru@redhat.com>
(cherry picked from commit 88be119fb1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-12-03 09:16:39 +03:00
Peter Maydell
95b0af630f docs/devel: Update URL for make-pullreq script
In the submitting-a-pull-request docs, we have a link to the
make-pullreq script which might be useful for maintainers.  The
canonical git repo for this script has moved; update the link.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20251125164511.255550-1-peter.maydell@linaro.org
(cherry picked from commit ebb625262c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-12-02 16:12:24 +03:00
Harald van Dijk
06793a36b7 target/arm: Fix assert on BRA.
trans_BRA does

    gen_a64_set_pc(s, dst);
    set_btype_for_br(s, a->rn);

gen_a64_set_pc does

    s->pc_save = -1;

set_btype_for_br (if aa64_bti is enabled and the register is not x16 or
x17) does

    gen_pc_plus_diff(s, pc, 0);

gen_pc_plus_diff does

    assert(s->pc_save != -1);

Hence, this assert is getting hit. We need to call set_btype_for_br
before gen_a64_set_pc, and there is nothing in set_btype_for_br that
depends on gen_a64_set_pc having already been called, so this commit
simply swaps the calls.

(The commit message for 64678fc45d says that set_brtype_for_br()
must be "moved after" get_a64_set_pc(), but this is a mistake in
the commit message -- the actual changes in that commit move
set_brtype_for_br() *before* get_a64_set_pc() and this is necessary
to avoid the assert.)

Cc: qemu-stable@nongnu.org
Fixes: 64678fc45d ("target/arm: Fix BTI versus CF_PCREL")
Signed-off-by: Harald van Dijk <hdijk@accesssoftek.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: d2265ebb-84bc-41b7-a2d7-05dc9a5a2055@accesssoftek.com
[PMM: added note about 64678fc45d to commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 7248dab3c9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-12-02 16:11:52 +03:00
Cédric Le Goater
88c28f9f9c hw/aspeed/{xdma, rtc, sdhci}: Fix endianness to DEVICE_LITTLE_ENDIAN
When the XDMA, RTC and SDHCI device models of the Aspeed SoCs were
first introduced, their MMIO regions inherited of a DEVICE_NATIVE_ENDIAN
endianness. It should be DEVICE_LITTLE_ENDIAN. Fix that.

Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251125142631.676689-1-clg@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 57756aa01f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-26 17:06:16 +03:00
Peter Xu
475563823c hw/core/machine: Provide a description for aux-ram-share property
It was forgotten when being introduced in commit 91792807d1 ("machine:
aux-ram-share option").

Cc: qemu-stable@nongnu.org
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20251124191408.783473-1-peterx@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 98ee8aa92e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-26 17:06:16 +03:00
Peter Maydell
58c32325f2 hw/pci: Make msix_init take a uint32_t for nentries
msix_init() and msix_init_exclusive_bar() take an "unsigned short"
argument for the number of MSI-X vectors to try to use.  This is big
enough for the maximum permitted number of vectors, which is 2048.
Unfortunately, we have several devices (most notably virtio) which
allow the user to specify the desired number of vectors, and which
use uint32_t properties for this.  If the user sets the property to a
value that is too big for a uint16_t, the value will be truncated
when it is passed to msix_init(), and msix_init() may then return
success if the truncated value is a valid one.

The resulting mismatch between the number of vectors the msix code
thinks the device has and the number of vectors the device itself
thinks it has can cause assertions, such as the one in issue 2631,
where "-device virtio-mouse-pci,vectors=19923041" is interpreted by
msix as "97 vectors" and by the virtio-pci layer as "19923041
vectors"; a guest attempt to access vector 97 thus passes the
virtio-pci bounds checking and hits an essertion in
msix_vector_use().

Avoid this by making msix_init() and its wrapper function
msix_init_exclusive_bar() take the number of vectors as a uint32_t.
The erroneous command line will now produce the warning

 qemu-system-i386: -device virtio-mouse-pci,vectors=19923041:
   warning: unable to init msix vectors to 19923041

and proceed without crashing.  (The virtio device warns and falls
back to not using MSIX, rather than complaining that the option is
not a valid value this is the same as the existing behaviour for
values that are beyond the MSI-X maximum possible value but fit into
a 16-bit integer, like 2049.)

To ensure this doesn't result in potential overflows in calculation
of the BAR size in msix_init_exclusive_bar(), we duplicate the
nentries error-check from msix_init() at the top of
msix_init_exclusive_bar(), so we know nentries is sane before we
start using it.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2631
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251107131044.1321637-1-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit ef44cc0a76)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-26 17:06:16 +03:00
Fiona Ebner
eb3e5de450 block/io_uring: avoid potentially getting stuck after resubmit at the end of ioq_submit()
Note that this issue seems already fixed as a consequence of the large
io_uring rework with 047dabef97 ("block/io_uring: use aio_add_sqe()")
in current master, so this is purely for QEMU stable branches.

At the end of ioq_submit(), there is an opportunistic call to
luring_process_completions(). This is the single caller of
luring_process_completions() that doesn't use the
luring_process_completions_and_submit() wrapper.

Other callers use the wrapper, because luring_process_completions()
might require a subsequent call to ioq_submit() after resubmitting a
request. As noted for luring_resubmit():

> Resubmit a request by appending it to submit_queue.  The caller must ensure
> that ioq_submit() is called later so that submit_queue requests are started.

So the caller at the end of ioq_submit() violates the contract and can
in fact be problematic if no other requests come in later. In such a
case, the request intended to be resubmitted will never be actually be
submitted via io_uring_submit().

A reproducer exposing this issue is [0], which is based on user
reports from [1]. Another reproducer is iotest 109 with '-i io_uring'.

I had the most success to trigger the issue with [0] when using a
BTRFS RAID 1 storage. With tmpfs, it can take quite a few iterations,
but also triggers eventually on my machine. With iotest 109 with '-i
io_uring' the issue triggers reliably on my ext4 file system.

Have ioq_submit() submit any resubmitted requests after calling
luring_process_completions(). The return value from io_uring_submit()
is checked to be non-negative before the opportunistic processing of
completions and going for the new resubmit logic, to ensure that a
failure of io_uring_submit() is not missed. Also note that the return
value already was not necessarily the total number of submissions,
since the loop might've been iterated more than once even before the
current change.

Only trigger the resubmission logic if it is actually necessary to
avoid changing behavior more than necessary. For example iotest 109
would produce more 'mirror ready' events if always resubmitting after
luring_process_completions() at the end of ioq_submit().

Note iotest 109 still does not pass as is when run with '-i io_uring',
because of two offset values for BLOCK_JOB_COMPLETED events being zero
instead of non-zero as in the expected output. Note that the two
affected test cases are expected failures and still fail, so they just
fail "faster". The test cases are actually not triggering the resubmit
logic, so the reason seems to be different ordering of requests and
completions of the current aio=io_uring implementation versus
aio=threads.

[0]:

> #!/bin/bash -e
> #file=/mnt/btrfs/disk.raw
> file=/tmp/disk.raw
> filesize=256
> readsize=512
> rm -f $file
> truncate -s $filesize $file
> ./qemu-system-x86_64 --trace '*uring*' --qmp stdio \
> --blockdev raw,node-name=node0,file.driver=file,file.cache.direct=off,file.filename=$file,file.aio=io_uring \
> <<EOF
> {"execute": "qmp_capabilities"}
> {"execute": "human-monitor-command", "arguments": { "command-line": "qemu-io node0 \"read 0 $readsize \"" }}
> {"execute": "quit"}
> EOF

[1]: https://forum.proxmox.com/threads/170045/

Cc: qemu-stable@nongnu.org
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-26 17:05:59 +03:00
Kevin Wolf
9d496598da block-backend: Fix race when resuming queued requests
When new requests arrive at a BlockBackend that is currently drained,
these requests are queued until the drain section ends.

There is a race window between blk_root_drained_end() waking up a queued
request in an iothread from the main thread and blk_wait_while_drained()
actually being woken up in the iothread and calling blk_inc_in_flight().
If the BlockBackend is drained again during this window, drain won't
wait for this request and it will sneak in when the BlockBackend is
already supposed to be quiesced. This causes assertion failures in
bdrv_drain_all_begin() and can have other unintended consequences.

Fix this by increasing the in_flight counter immediately when scheduling
the request to be resumed so that the next drain will wait for it to
complete.

Cc: qemu-stable@nongnu.org
Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251119172720.135424-1-kwolf@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Tested-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 8eeaa706ba)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-25 23:43:16 +03:00
AlanoSong@163.com
b3c10df2e2 ui/vnc: Fix qemu abort when query vnc info
When there is no display device on qemu machine,
and user only access qemu by remote vnc.
At the same time user input `info vnc` by QMP,
the qemu will abort.

To avoid the abort above, I add display device check,
when query vnc info in qmp_query_vnc_servers().

Reviewed-by: Marc-AndréLureau <marcandre.lureau@redhat.com>
Signed-off-by: Alano Song <AlanoSong@163.com>
[ Marc-André - removed useless Error *err ]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <20251125131955.7024-1-AlanoSong@163.com>
(cherry picked from commit 4c1646e23f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-25 23:43:16 +03:00
Philippe Mathieu-Daudé
1978079d2a chardev/char-pty: Do not ignore chr_write() failures
Cc: qemu-stable@nongnu.org
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <20251022150743.78183-6-philmd@linaro.org>
(cherry picked from commit 303f604935)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-25 23:43:16 +03:00
Peter Maydell
f4fa2d06dd hw/display/exynos4210_fimd: Account for zero length in fimd_update_memory_section()
In fimd_update_memory_section() we attempt ot find and map part of
the RAM MR which backs the framebuffer, based on guest-configurable
size and start address.

If the guest configures framebuffer settings which result in a
zero-sized framebuffer, we hit an assertion(), because
memory_region_find() will return a NULL mem_section.mr.

Explicitly check for the zero-size case and treat this as a
guest error.

Because we now have a code path which can reach error_return without
calling memory_region_find to set w->mem_section, we must NULL out
w->mem_section.mr after the unref of the old MR, so that error_return
does not incorrectly double-unref the old MR.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1407
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20251107143913.1341358-1-peter.maydell@linaro.org
(cherry picked from commit 579be921f5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-25 23:43:16 +03:00
Peter Maydell
35819127d1 hw/arm/armv7m: Disable reentrancy guard for v7m_sysreg_ns_ops MRs
For M-profile cores which support TrustZone, there are some memory
areas which are "NS aliases" -- a Secure access to these addresses
really performs an NS access to a different part of the device.  We
implement these using MemoryRegionOps read and write functions which
pass the access on with adjusted attributes using
memory_region_dispatch_read() and memory_region_dispatch_write().

Since the MR we are dispatching to is owned by the same device that
owns the NS-alias MR (the TYPE_ARMV7M container object), this trips
the reentrancy-guard that is applied by access_with_adjusted_size().

Mark the NS alias MemoryRegions as disable_reentrancy_guard; this is
safe because v7m_sysreg_ns_read() and v7m_sysreg_ns_write() do not
touch any of the device's state.  (Any further reentrancy attempts by
the underlying MR will still be caught.)

Without this fix, an attempt to read from an address like 0xe002e010,
which is a register in the NS systick alias, will fail and provoke

 qemu-system-arm: warning: Blocked re-entrant IO on MemoryRegion: v7m_systick at addr: 0x0

We didn't notice this earlier because almost all code accesses
the registers and systick via the non-alias addresses; the NS
aliases are only need for the rarer case of Secure code that needs
to manage the NS timer or system state on behalf of NS code.

Note that although the v7m_systick_ops read and write functions
also call memory_region_dispatch_{read,write}, this MR does not
need to have the reentrancy-guard disabled because the underlying
MR that it forwards to is owned by a different device (the
TYPE_SYSTICK timer device).

Reported via a stackoverflow question:
https://stackoverflow.com/questions/79808107/what-this-error-is-even-about-qemu-system-arm-warning-blocked-re-entrant-io

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20251114155304.2662414-1-peter.maydell@linaro.org
(cherry picked from commit 4a934d284d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-25 23:43:16 +03:00
Jamin Lin
f78cda72cb hw/arm/aspeed: Fix missing SPI IRQ connection causing DMA interrupt failure
It did not connect SPI IRQ to the Interrupt Controller, so even the SPI
model raised the IRQ, the interrupt was not received. The CPU therefore
did not trigger an interrupt via the controller, and the firmware never
received the interrupt.

Fixes: 356b230ed1 ("aspeed/soc: Add AST1030 support")
Fixes: f25c0ae107 ("aspeed/soc: Add AST2600 support")
Fixes: 5dd883ab06 ("aspeed/soc: Add AST2700 support")
Signed-off-by: Jamin Lin <jamin_lin@aspeedtech.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20251106084925.1253704-2-jamin_lin@aspeedtech.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit 510d5c61ad)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-25 23:43:16 +03:00
Li Zhijian
c41f52f5d7 migration: Fix transition to COLO state from precopy
Commit 4881411136 ("migration: Always set DEVICE state") set a new DEVICE
state before completed during migration, which broke the original transition
to COLO. The migration flow for precopy has changed to:
active -> pre-switchover -> device -> completed.

This patch updates the transition state to ensure that the Pre-COLO
state corresponds to DEVICE state correctly.

Cc: qemu-stable <qemu-stable@nongnu.org>
Fixes: 4881411136 ("migration: Always set DEVICE state")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Tested-by: Zhang Chen <zhangckid@gmail.com>
Link: https://lore.kernel.org/r/20251104013606.1937764-1-lizhijian@fujitsu.com
Signed-off-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit 0b5bf4ea76)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-25 23:43:16 +03:00
Jack Wang
20a7e4d3e7 qmp: Fix a typo for a USO feature
There is a copy & paste error, USO6 should be there.

Fixes: 58f8168978 ("qmp: update virtio feature maps, vhost-user-gpio introspection")
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 5fbcbf76a1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-25 23:43:16 +03:00
Thomas Huth
62f897b7d3 MAINTAINERS: Add functional tests that are not covered yet
Some functional tests are currently not covered by the entries
in MAINTAINERS yet, so scripts/get_maintainers.pl fails to suggest
the right people who should be CC:-ed for related patches.
Add the uncovered tests to the right sections to close this gap.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250414121520.213665-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 12c6b61530)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
bb09656bdf tests/functional: Remove unnecessary import statements
pylint complains about these unnecessary import statements,
so let's remove them.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250414145457.261734-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 99fb9256b7)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
112d01ce9f tests/functional: Remove semicolons at the end of lines
Yes, we are all C coders who try to write Python code for testing...
but still, let's better avoid semicolons at the end of the lines
to keep "pylint" happy!

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Message-ID: <20250327201305.996241-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 858640eaee)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
cf4b882ed3 Remove the remainders of the Avocado tests
Now that all Avocado tests have been converted to or been replaced by
other functional tests, we can delete the remainders of the Avocado
tests from the QEMU source tree.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250414113031.151105-16-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 52e9ed6d3a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
0818105ab6 docs/devel/testing: Dissolve the ci-definitions.rst.inc file
This file was meant for defining the vocabulary for our testing
efforts, but it did not age well: First, the definitions are not
only about the CI part, but also about testing in general, so most
of the information should rather reside in main.rst instead.
Second, some vocabulary has never been really adopted by the QEMU
project, for example we never really use the word "system testing"
since "system" rather means the system emulator binaries in the
QEMU project (and we also don't do any testing with other components
like libvirt and virt-managers here). It also defines that the qtests
are the "functional" tests in QEMU, which is not really up to date
anymore after the "tests/functional" framework has been introduced
a couple of months ago (FWIW, the qtests could rather be seen as a
mix between unit testing and functional testing).

To solve this problem, move the useful parts of this file into
main.rst and directly into ci.rst, and drop the ones (like "system
testing") that we don't really need anymore.

Message-ID: <20250314085959.1585568-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 5748e46415)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
bb4444ba8f gitlab-ci: Update QEMU_JOB_AVOCADO and QEMU_CI_AVOCADO_TESTING
Since we don't run the Avocado jobs in the CI anymore, rename
these variables to QEMU_JOB_FUNCTIONAL and QEMU_CI_FUNCTIONAL.

Also, there was a mismatch between the documentation and the
implementation of QEMU_CI_AVOCADO_TESTING: While the documentation
said that you had to "Set this variable to have the tests using the
Avocado framework run automatically", you indeed needed to set it
to make the pipelines appear in your dashboard - but they were never
run automatically in forks and had to be triggered manually. Let's
improve this now: No need to hide these pipelines from the users
by default anymore (the functional tests should be stable enough
nowadays), and rather allow the users to run the pipelines auto-
matically with this switch now instead, as was documented.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250414113031.151105-15-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit f8c5484417)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
eb45f72de0 tests/functional: Convert the SMMU test to the functional framework
This test was using cloudinit and a "dnf install" command in the guest
to exercise the NIC with SMMU enabled. Since we don't have the cloudinit
stuff in the functional framework and we should not rely on having access
to external networks (once our ASSETs have been cached), we rather boot
into the initrd first, manually mount the root disk and then use the
check_http_download() function from the functional framework here instead
for testing whether the network works as expected.

Unfortunately, there seems to be a small race when using the files
from Fedora 33: To enter the initrd shell, we have to send a "return"
once. But it does not seem to work if we send it too early. Using a
sleep(0.2) makes it work reliably for me, but to make it even more
unlikely to trigger this situation, let's better limit the Fedora 33
tests to only run with KVM.

Finally, while we're at it, we also add some lines for testing writes
to the hard disk, as we already do it in the test_intel_iommu test.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Message-ID: <20250414113031.151105-14-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 5c2bae2155)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
26267e726d tests/functional: Use the tuxrun kernel for the aarch64 replay test
This way we can do a full boot in record-replay mode and
should get a similar test coverage compared to the old
replay test from tests/avocado/replay_linux.py.

Since the aarch64 test was the last avocado test in the
tests/avocado/replay_linux.py file, we can remove this
file now completely.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250414113031.151105-13-thuth@redhat.com>
(cherry picked from commit a820caf844)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
43d94f4c41 tests/functional: Use the tuxrun kernel for the x86 replay test
This way we can do a full boot in record-replay mode and
should get a similar test coverage compared to the old
replay test from tests/avocado/replay_linux.py. Thus remove
the x86 avocado replay_linux test now.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250414113031.151105-12-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 7fecdb0acd)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
d50027a984 tests/avocado: Remove the boot_linux.py tests
These tests are based on the cloudinit functions from Avocado.
The cloudinit is very, very slow compared to our other tests,
so most of these Avocado tests have either been disabled by default
with a decorator, or have been marked to only run with KVM.

We won't include this sluggish cloudinit stuff in the functional
framework, and we've already got plenty of other tests there that
check pretty much the same things, so let's simply get rid of these
old tests now.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250414113031.151105-11-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit e83aee9c6a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
0236373fa6 tests/functional: Convert the 64-bit big endian Wheezy mips test
Reuse the test function from the 32-bit big endian test to easily
convert the 64-bit big endian Wheezy mips test.

Since this was the last test in tests/avocado/linux_ssh_mips_malta.py,
we can remove this avocado file now, too.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250414113031.151105-10-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit f79592f427)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
9821c1cb60 tests/functional: Convert the 64-bit little endian Wheezy mips test
Reuse the test function from the 32-bit big endian test to easily
convert the 64-bit little endian Wheezy mips test.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250414113031.151105-9-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 8e3461c3a6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
902fc01dc2 tests/functional: Convert the 32-bit little endian Wheezy mips test
Reuse the test function from the big endian test to easily
convert the 32-bit little endian Wheezy mips test.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250414113031.151105-8-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 689a8b56a6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
ebfa6996dc tests/functional: Convert the 32-bit big endian Wheezy mips test
The test checks some entries in /proc and the output of some commands ...
we put these checks into exportable functions now so that they can
be reused more easily.

Additionally the linux_ssh_mips_malta.py uses SSH to test the networking
of the guest. Since we don't have a SSH module in the functional
framework yet, let's use the check_http_download() function here instead.

And while we're at it, also switch the NIC to e1000 now to get some more
test coverage, since the "pcnet" device is already tested in the test
test_mips_malta_cpio.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250414113031.151105-7-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 42a87f0ce7)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
f95d41b486 tests/avocado: Remove the LinuxKernelTest class
All tests that used this class have been converted to the functional
framework, so we can remove the boot_linux_console.py file now.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250414113031.151105-6-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 574f71bc1f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
27cfc5ea42 tests/functional: Convert the i386 replay avocado test
Since this was the last test in tests/avocado/replay_kernel.py,
we can remove that Avocado file now.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250414113031.151105-5-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 0e756f404d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
c4747e3beb tests/functional: Convert reverse_debugging tests to the functional framework
These tests are using the gdb-related library functions from the
Avocado framework which we don't have in the functional framework
yet. So for the time being, keep those imports and skip the test
if the Avocado framework is not installed on the host.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250414113031.151105-4-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 951ededf12)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
1c491b7186 tests/functional: Move the check for the parameters from avocado to functional
test_x86_64_pc in tests/avocado/boot_linux_console.py only checks
whether the kernel parameters have correctly been passed to the
kernel in the guest by looking for them in the console output of the
guest. Let's move that to the functional test framework now, but
instead of doing it in a separate test, let's do it for all tuxrun
tests instead, so it is done automatically for all targets that have
a tuxrun test.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250414113031.151105-3-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit bc65ae6961)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Thomas Huth
d18488775f gitlab-ci: Remove the avocado tests from the CI pipelines
We are going to move the remaining Avocado tests step by step
into the functional test framework. Unfortunately, Avocado fails
with an error if it cannot determine a test to run, so disable
the tests here now to avoid failures in the Gitlab-CI during the
next steps.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250414113031.151105-2-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 22baa5f340)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Cornelia Huck
83aeacaafa tests/functional/test_vnc: skip test if no crypto backend available
The test_change_password test will fail if no cryptographic backend is
available (e.g. if QEMU was built on a system with no cryptographic
library development packages installed); just skip the test in that
case.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250414093732.220498-1-cohuck@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 4e3823c68c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-23 11:00:59 +03:00
Paolo Bonzini
e3f70ada90 target/i386: fix stack size when delivering real mode interrupts
The stack can be 32-bit even in real mode, and in this case
the stack pointer must be updated in its entirety rather than
just the bottom 16 bits.  The same is true of real mode IRET,
for which there was even a comment suggesting the right thing
to do.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1506
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 106d766c9d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-21 16:01:48 +03:00
Paolo Bonzini
9898237cf2 target/i386: svm: fix sign extension of exit code
The exit_code parameter of cpu_vmexit is declared as uint32_t, but exit
codes are 64 bits wide according to the AMD SVM specification.  And because
uint32_t is unsigned, this causes exit codes to be zero-extended, for example
writing SVM_EXIT_ERR as 0xffff_ffff instead of the expected 0xffff_ffff_ffff_ffff.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2977
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 9c3afb9d9b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-21 16:01:48 +03:00
Paolo Bonzini
fdda67ec29 target/i386/tcg: validate segment registers
Correctly reject invalid segment registers, including CS when used as
the destination of a MOV.  Ignore the REX prefix as well.

Fixes: 5e9e21bcc4 ("target/i386: move 60-BF opcodes to new decoder", 2024-05-07)
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3195
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit ebb46ba6a4)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-21 16:01:48 +03:00
Peter Maydell
1a854ae15f target/i386: Mark VPERMILPS as not valid with prefix 0
There are a small set of binary SSE insns which have no MMX
equivalent, which we create the gen functions for with the
BINARY_INT_SSE() macro.  This forwards to gen_binary_int_sse() with a
NULL pointer for 'mmx'.

For almost all of these insns we correctly mark them in the decode
table as not permitting a zero prefix byte; however we got this wrong
for VPERMILPS, with the result that a bogus instruction would get
through the decode checks and end up in gen_binary_int_sse() trying
to call a NULL pointer.

Correct the decode table entry for VPERMILPS so that we get the
expected #UD exception.

In the x86 SDM, table A-4 "Three-byte Opcode Map: 08H-FFH
(First Two Bytes are 0F 38H)" confirms that there is no pfx 0
version of VPERMILPS.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3199
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Link: https://lore.kernel.org/r/20251114175417.2794804-1-peter.maydell@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit ebd9ea2947)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-21 16:01:48 +03:00
Philippe Mathieu-Daudé
ec87d57a17 hw/southbridge/lasi: Correct LasiState parent
TYPE_LASI_CHIP inherits from TYPE_SYS_BUS_DEVICE, not
TYPE_PCI_HOST_BRIDGE, so its parent structure is of
SysBusDevice type.

Cc: qemu-stable@nongnu.org
Fixes: 376b851909 ("hppa: Add support for LASI chip with i82596 NIC")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20251117091804.56529-1-philmd@linaro.org>
(cherry picked from commit 9c3b76a0d4)
(Mjt: context fixup)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-21 16:01:47 +03:00
Yannick Voßen
3873acda93 hw/dma/zynq-devcfg: Fix register memory
Registers are always 32 bit aligned. R_MAX is not the maximum
register address, it is the maximum register number. The memory
size can be determined by 4 * R_MAX.

Currently every register with an offset bigger than 0x40 will be
ignored, because the memory size is set wrong. This effects the
MCTRL register and makes it useless. This commit restores the
correct behaviour.

Cc: qemu-stable@nongnu.org
Fixes: 034c2e6902 ("dma: Add Xilinx Zynq devcfg device model")
Signed-off-by: YannickV <Y.Vossen@beckhoff.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251111102836.212535-9-corvin.koehne@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit a344e22917)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-21 16:01:47 +03:00
Daniel P. Berrangé
abd554bafe tests/functional: handle URLError when fetching assets
We treat most HTTP errors as non-fatal when fetching assets,
but forgot to handle network level errors. This adds catching
of URLError so that we retry on failure, and will ultimately
trigger graceful skipping in the pre-cache task.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250829142616.2633254-4-berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 335da23abe)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-21 16:01:47 +03:00
Daniel P. Berrangé
5dfc7d7cc6 tests/functional: fix formatting of exception args
The catch-all exception handler forgot the placeholder for
the exception details.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250829142616.2633254-3-berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 124ab930ba)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-21 16:01:47 +03:00
Hanna Czenczek
da9a175f36 block/io: Take reqs_lock for tracked_requests
bdrv_co_get_self_request() does not take a lock around iterating through
bs->tracked_requests.  With multiqueue, it may thus iterate over a list
that is in the process of being modified, producing an assertion
failure:

../block/file-posix.c:3702: raw_do_pwrite_zeroes: Assertion `req' failed.

[0] abort() at /lib64/libc.so.6
[1] __assert_fail_base.cold() at /lib64/libc.so.6
[2] raw_do_pwrite_zeroes() at ../block/file-posix.c:3702
[3] bdrv_co_do_pwrite_zeroes() at ../block/io.c:1910
[4] bdrv_aligned_pwritev() at ../block/io.c:2109
[5] bdrv_co_do_zero_pwritev() at ../block/io.c:2192
[6] bdrv_co_pwritev_part() at ../block/io.c:2292
[7] bdrv_co_pwritev() at ../block/io.c:2225
[8] handle_alloc_space() at ../block/qcow2.c:2573
[9] qcow2_co_pwritev_task() at ../block/qcow2.c:2625

Fix this by taking reqs_lock.

Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-11-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 9b9ee60c07)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-21 16:01:47 +03:00
Hanna Czenczek
361bd4a6c8 nvme: Fix coroutine waking
nvme wakes the request coroutine via qemu_coroutine_enter() from a BH
scheduled in the BDS AioContext.  This may not be the same context as
the one in which the request originally ran, which would be wrong:
- It could mean we enter the coroutine before it yields,
- We would move the coroutine in to a different context.

(Can be reproduced with multiqueue by adding a usleep(100000) before the
`while (data.ret == -EINPROGRESS)` loop.)

To fix that, use aio_co_wake() to run the coroutine in its home context.
Just like in the preceding iscsi and nfs patches, we can drop the
trivial nvme_rw_cb_bh() and use aio_co_wake() directly.

With this, we can remove NVMeCoData.ctx.

Note the check of data->co == NULL to bypass the BH/yield combination in
case nvme_rw_cb() is called from nvme_submit_command(): We probably want
to keep this fast path for performance reasons, but we have to be quite
careful about it:
- We cannot overload .ret for this, but have to use a dedicated
  .skip_yield field.  Otherwise, if nvme_rw_cb() runs in a different
  thread than the coroutine, it may see .ret set and skip the yield,
  while nvme_rw_cb() will still schedule a BH for waking.   Therefore,
  the signal to skip the yield can only be set in nvme_rw_cb() if waking
  too is skipped, which is independent from communicating the return
  value.
- We can only skip the yield if nvme_rw_cb() actually runs in the
  request coroutine.  Otherwise (specifically if they run in different
  AioContexts), the order between this function’s execution and the
  coroutine yielding (or not yielding) is not reliable.
- There is no point to yielding in a loop; there are no spurious wakes,
  so once we yield, we will only be re-entered once the command is done.
  Replace `while` by `if`.

Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-9-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 0f142cbd91)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-21 16:01:47 +03:00
Hanna Czenczek
ca02694bbc nvme: Kick and check completions in BDS context
nvme_process_completion() must run in the main BDS context, so schedule
a BH for requests that aren’t there.

The context in which we kick does not matter, but let’s just keep kick
and process_completion together for simplicity’s sake.

(For what it’s worth, a quick fio bandwidth test indicates that on my
test hardware, if anything, this may be a bit better than kicking
immediately before scheduling a pure nvme_process_completion() BH.  But
I wouldn’t take more from those results than that it doesn’t really seem
to matter either way.)

Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-8-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 7a501bbd51)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-21 16:01:47 +03:00
Hanna Czenczek
03778618b1 curl: Fix coroutine waking
If we wake a coroutine from a different context, we must ensure that it
will yield exactly once (now or later), awaiting that wake.

curl’s current .ret == -EINPROGRESS loop may lead to the coroutine not
yielding if the request finishes before the loop gets run.  To fix it,
we must drop the loop and yield exactly once, if we need to yield.

Finding out that latter part ("if we need to yield") makes it a bit
complicated: Requests may be served from a cache internal to the curl
block driver, or fail before being submitted.  In these cases, we must
not yield.  However, if we find a matching but still ongoing request in
the cache, we will have to await that, i.e. still yield.

To address this, move the yield inside of the respective functions:
- Inside of curl_find_buf() when awaiting ongoing concurrent requests,
- Inside of curl_setup_preadv() when having created a new request.

Rename curl_setup_preadv() to curl_do_preadv() to reflect this.

(Can be reproduced with multiqueue by adding a usleep(100000) before the
`while (acb.ret == -EINPROGRESS)` loop.)

Also, add a comment why aio_co_wake() is safe regardless of whether the
coroutine and curl_multi_check_completion() run in the same context.

Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-6-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 53d5c7ffac)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-21 16:01:47 +03:00
Hanna Czenczek
aa2ec06680 nfs: Run co BH CB in the coroutine’s AioContext
Like in “rbd: Run co BH CB in the coroutine’s AioContext”, drop the
completion flag, yield exactly once, and run the BH in the coroutine’s
AioContext.

(Can be reproduced with multiqueue by adding a usleep(100000) before the
`while (!task.complete)` loops.)

Like in “iscsi: Run co BH CB in the coroutine’s AioContext”, this makes
nfs_co_generic_bh_cb() trivial, so we can drop it in favor of just
calling aio_co_wake() directly.

Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-5-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit deb35c129b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-21 16:01:47 +03:00
Hanna Czenczek
23f0865f1b rbd: Run co BH CB in the coroutine’s AioContext
qemu_rbd_completion_cb() schedules the request completion code
(qemu_rbd_finish_bh()) to run in the BDS’s AioContext, assuming that
this is the same thread in which qemu_rbd_start_co() runs.

To explain, this is how both latter functions interact:

In qemu_rbd_start_co():

    while (!task.complete)
        qemu_coroutine_yield();

In qemu_rbd_finish_bh():

    task->complete = true;
    aio_co_wake(task->co); // task->co is qemu_rbd_start_co()

For this interaction to work reliably, both must run in the same thread
so that qemu_rbd_finish_bh() can only run once the coroutine yields.
Otherwise, finish_bh() may run before start_co() checks task.complete,
which will result in the latter seeing .complete as true immediately and
skipping the yield altogether, even though finish_bh() still wakes it.

With multiqueue, the BDS’s AioContext is not necessarily the thread
start_co() runs in, and so finish_bh() may be scheduled to run in a
different thread than start_co().  With the right timing, this will
cause the problems described above; waking a non-yielding coroutine is
not good, as can be reproduced by putting e.g. a usleep(100000) above
the while loop in start_co() (and using multiqueue), giving finish_bh()
a much better chance at exiting before start_co() can yield.

So instead of scheduling finish_bh() in the BDS’s AioContext, schedule
finish_bh() in task->co’s AioContext.

In addition, we can get rid of task.complete altogether because we will
get woken exactly once, when the task is indeed complete, no need to
check.

(We could go further and drop the BH, running aio_co_wake() directly in
qemu_rbd_completion_cb() because we are allowed to do that even if the
coroutine isn’t yet yielding and we’re in a different thread – but the
doc comment on qemu_rbd_completion_cb() says to be careful, so I decided
not to go so far here.)

Buglink: https://issues.redhat.com/browse/RHEL-67115
Reported-by: Junyao Zhao <junzhao@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-3-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 89d22536d1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-21 16:01:47 +03:00