6334 Commits

Philippe Mathieu-Daudé
3115691855 bswap: Include missing 'qemu/bswap.h' header
All these files indirectly include the "qemu/bswap.h" header.
Make this inclusion explicit to avoid build errors when
refactoring unrelated headers.
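
For instance, a file using helpers such as le32_to_cpu() now spells
out the dependency itself instead of relying on an indirect include:

    #include "qemu/bswap.h"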

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20260109164742.58041-4-philmd@linaro.org>
2026-01-22 10:48:45 +01:00
Richard Henderson
d9fe147955 block: Drop use of Stat64
The Stat64 structure is an aid for 32-bit hosts, and
is no longer required.  Use plain 64-bit types.

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2026-01-17 10:46:42 +11:00
Markus Armbruster
1f674a2b7a block/file-win32: Improve an error message
Two out of three calls of CreateFile() use error_setg_win32() to
report errors.  The third uses error_setg_errno(), mapping
ERROR_ACCESS_DENIED to EACCES, and everything else to EINVAL, throwing
away detail.  Switch it to error_setg_win32().
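
The switched call then looks roughly like this (a sketch; the
variable names are assumed, not copied from file-win32.c):

    if (hfile == INVALID_HANDLE_VALUE) {
        error_setg_win32(errp, GetLastError(),
                         "Could not open '%s'", filename);
        return -EINVAL;
    }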

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20251121121438.1249498-16-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2026-01-08 07:50:32 +01:00
Markus Armbruster
9c3a66f5c4 blkdebug: Use error_setg_file_open() for a better error message
The error message changes from

    Could not read blkdebug config file: REASON

to

    Could not open 'FNAME': REASON

I think the exact file name is more useful to know than the file's
purpose.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20251121121438.1249498-10-armbru@redhat.com>
2026-01-07 11:26:57 +01:00
Markus Armbruster
19552884e4 error: Consistently name Error * objects err, and not errp
This touches code in xen_enable_tpm() that is obviously wrong.  Since
I don't know how to fix it properly, I'm adding a FIXME there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20251119130855.105479-6-armbru@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
2026-01-07 11:26:57 +01:00
Nguyen Dinh Phi
0c9f429ec8 util: Move qemu_ftruncate64 from block/file-win32.c to oslib-win32.c
qemu_ftruncate64() is a general-purpose utility function that may be
used outside of the block layer. Move it to util/oslib-win32.c where
other Windows-specific utility functions reside.

Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20251218085446.462827-3-phind.uet@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-12-30 20:38:41 +01:00
Paolo Bonzini
12e50722e4 block: rename block/aio-wait.h to qemu/aio-wait.h
AIO_WAIT_WHILE is used even outside the block layer; move the header file
out of block/ just like the implementation is in util/.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-12-27 10:11:12 +01:00
Paolo Bonzini
ba773aded3 block: rename block/aio.h to qemu/aio.h
AioContexts are used as a generic event loop even outside the block
layer; move the header file out of block/ just like the implementation
is in util/.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-12-27 10:11:12 +01:00
Paolo Bonzini
ddab0ef124 block: extract include/qemu/aiocb.h out of include/block/aio.h
Create a new header corresponding to functions defined in
util/aiocb.c, and include it whenever AIOCBs are used but
AioContext is not.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-12-27 10:11:12 +01:00
Paolo Bonzini
7f548b8f23 include: reorganize memory API headers
Move RAMBlock functions out of ram_addr.h and cpu-common.h;
move memory API headers out of include/exec and into include/system.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-12-27 10:11:09 +01:00
Paolo Bonzini
d1000ecae2 include: move hw/qdev-core.h to hw/core/, rename
Call it hw/core/qdev.h to avoid the duplication in the name.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-12-27 10:11:07 +01:00
Paolo Bonzini
1942b61b74 include: move hw/boards.h to hw/core/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-12-27 10:11:06 +01:00
Hanna Czenczek
b002acacc1 Revert "nvme: Fix coroutine waking"
This reverts commit 0f142cbd91.

Said commit changed the replay_bh_schedule_oneshot_event() in
nvme_rw_cb() to aio_co_wake(), allowing the request coroutine to be
entered directly (instead of only being scheduled for later execution).
This can cause the device to become stalled like so:

It is possible that after completion the request coroutine goes on to
submit another request without yielding, e.g. a flush after a write to
emulate FUA.  This will likely cause a nested nvme_process_completion()
call because nvme_rw_cb() itself is called from there.

(After submitting a request, we invoke nvme_process_completion() through
defer_call(); but the fact that nvme_process_completion() ran in the
first place indicates that we are not in a call-deferring section, so
defer_call() will call nvme_process_completion() immediately.)

If this inner nvme_process_completion() loop then processes any
completions, it will write the final completion queue (CQ) head index to
the CQ head doorbell, and subsequently execution will return to the
outer nvme_process_completion() loop.  Even if this loop now finds no
further completions, it still processed at least one completion before,
or it would not have called the nvme_rw_cb() which led to nesting.
Therefore, it will now write the exact same CQ head index value to the
doorbell, which effectively is an unrecoverable error[1].

Therefore, nesting of nvme_process_completion() does not work at this
point.  Reverting said commit removes the nesting (by scheduling the
request coroutine instead of entering it immediately), and so fixes the
stall.

On the downside, reverting said commit breaks multiqueue for nvme, but
better to have single-queue working than neither.  For 11.0, we will
have a solution that makes both work.

A side note: There is a comment in nvme_process_completion() above
qemu_bh_schedule() that claims nesting works, as long as it is done
through the completion_bh.  I am quite sure that is not true, for two
reasons:
- The problem described above, which is even worse when going through
  nvme_process_completion_bh() because that function unconditionally
  writes to the CQ head doorbell,
- nvme_process_completion_bh() never takes q->lock, so
  nvme_process_completion() unlocking it will likely abort.

Given the lack of reports of such aborts, I believe that completion_bh
simply is unused in practice.

[1] See the NVMe Base Specification revision 2.3, page 180, figure 152:
    “Invalid Doorbell Write Value: A host attempted to write an invalid
    doorbell value. Some possible causes of this error are: [...] the
    value written is the same as the previously written doorbell value.”

    To even be notified of this error, we would need to send an
    Asynchronous Event Request to the admin queue (p. 178ff), which we
    don’t do, and then to handle it, we would need to delete and
    recreate the queue (p. 88, section 3.3.1.2 Queue Usage).

Cc: qemu-stable@nongnu.org
Reported-by: Lukáš Doktor <ldoktor@redhat.com>
Tested-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-id: 20251215141540.88915-1-hreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-12-15 09:50:41 -05:00
Cédric Le Goater
326e620fc0 Fix const qualifier build errors with recent glibc
A recent change in glibc 2.42.9000 [1] changes the return type of
strstr() and other string functions to be 'const char *' when the
input is a 'const char *'.

This breaks the build in various files with errors such as:

  error: initialization discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers]
    208 |         char *pidstr = strstr(filename, "%");
        |                        ^~~~~~

Fix this by changing the type of the variables that store the result
of these functions to 'const char *'.
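
Applied to the example above, the fix is a one-line type change:

    const char *pidstr = strstr(filename, "%");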

[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=cd748a63ab1a7ae846175c532a3daab341c62690

Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251209174328.698774-1-clg@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-12-09 21:00:15 +01:00
Kevin Wolf
2c3165a1a6 file-posix: Handle suspended dm-multipath better for SG_IO
When introducing DM_MPATH_PROBE_PATHS, we already anticipated that
dm-multipath devices might be suspended for a short time when the DM
tables are reloaded and that they return -EAGAIN in this case. We then
wait for a millisecond and retry.

However, meanwhile it has also turned out that libmpathpersist (which is
used by qemu-pr-helper) may need to perform more complex recovery
operations to get reservations back to expected state if a path failure
happened in the middle of a PR operation. In this case, the device is
suspended for a longer time compared to the case we originally expected.

This patch changes hdev_co_ioctl() to treat -EAGAIN separately so that
it doesn't result in an immediate failure if the device is suspended for
more than 1ms, and moves to incremental backoff to cover both quick and
slow cases without excessive delays.
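
A sketch of the resulting retry logic (simplified and with assumed
names; the actual hdev_co_ioctl() code differs in detail):

    int64_t backoff_ns = 1 * SCALE_MS;

    for (;;) {
        ret = do_sg_io(s, req, buf);    /* hypothetical helper */
        if (ret != -EAGAIN) {
            break;
        }
        /* Device suspended (e.g. DM table reload): back off and
         * retry with a growing delay instead of failing after a
         * single fixed 1 ms wait. */
        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, backoff_ns);
        backoff_ns = MIN(backoff_ns * 2, NANOSECONDS_PER_SECOND);
    }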

Buglink: https://issues.redhat.com/browse/RHEL-121543
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251128221440.89125-1-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-12-04 18:34:15 +01:00
Stefan Hajnoczi
d704a13d2c block: use pwrite_zeroes_alignment when writing first sector
Since commit 5634622bcb ("file-posix: allow BLKZEROOUT with -t
writeback"), qemu-img create errors out on a Linux loop block device
with a 4 KB sector size:

  # dd if=/dev/zero of=blockfile bs=1M count=1024
  # losetup --sector-size 4096 /dev/loop0 blockfile
  # qemu-img create -f raw /dev/loop0 1G
  Formatting '/dev/loop0', fmt=raw size=1073741824
  qemu-img: /dev/loop0: Failed to clear the new image's first sector: Invalid argument

Use the pwrite_zeroes_alignment block limit to avoid misaligned
fallocate(2) or ioctl(BLKZEROOUT) in the block/file-posix.c block
driver.
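
A sketch of the idea (not the exact patch):

    BlockDriverState *bs = blk_bs(blk);
    int64_t bytes = MAX(BDRV_SECTOR_SIZE, bs->bl.pwrite_zeroes_alignment);

    /* Clear a whole aligned region rather than a bare 512-byte
     * sector, so fallocate(2)/ioctl(BLKZEROOUT) see an aligned
     * request. */
    ret = blk_pwrite_zeroes(blk, 0, bytes, 0);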

Cc: qemu-stable@nongnu.org
Fixes: 5634622bcb ("file-posix: allow BLKZEROOUT with -t writeback")
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Buglink: https://gitlab.com/qemu-project/qemu/-/issues/3127
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20251007141700.71891-3-stefanha@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-25 15:26:22 +01:00
Stefan Hajnoczi
98e788b91a file-posix: populate pwrite_zeroes_alignment
Linux block devices require write zeroes alignment whereas files do not.

It may come as a surprise that block devices opened in buffered I/O mode
require the alignment for write zeroes requests although normal
read/write requests do not.

Therefore it is necessary to populate the pwrite_zeroes_alignment field.

Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20251007141700.71891-2-stefanha@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-25 15:26:22 +01:00
Kevin Wolf
8eeaa706ba block-backend: Fix race when resuming queued requests
When new requests arrive at a BlockBackend that is currently drained,
these requests are queued until the drain section ends.

There is a race window between blk_root_drained_end() waking up a queued
request in an iothread from the main thread and blk_wait_while_drained()
actually being woken up in the iothread and calling blk_inc_in_flight().
If the BlockBackend is drained again during this window, drain won't
wait for this request and it will sneak in when the BlockBackend is
already supposed to be quiesced. This causes assertion failures in
bdrv_drain_all_begin() and can have other unintended consequences.

Fix this by increasing the in_flight counter immediately when scheduling
the request to be resumed so that the next drain will wait for it to
complete.
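
A sketch of the idea (simplified; the actual patch touches
block/block-backend.c in a few more places):

    /* blk_root_drained_end(), running in the main thread: */

    /* Count the queued request as in-flight *now*, not only when
     * the woken coroutine eventually runs in its iothread;
     * otherwise a new drain starting inside that window would not
     * wait for it. */
    blk_inc_in_flight(blk);
    aio_co_wake(co);    /* resumed request ends with blk_dec_in_flight() */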

Cc: qemu-stable@nongnu.org
Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251119172720.135424-1-kwolf@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Tested-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-25 15:26:22 +01:00
Hanna Czenczek
837c04e9fc win32-aio: Run CB in original context
AIO callbacks must be called in the originally calling AioContext,
regardless of the BDS’s “main” AioContext.

Note: I tried to test this (under wine), but failed.  Whenever I tried
to use multiqueue or even just an I/O thread for a virtio-blk (or
virtio-scsi) device, I/O stalled, both with and without this patch.

For what it’s worth, when not using an I/O thread, I/O continued to work
with this patch.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-20-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:57 +01:00
Hanna Czenczek
f18782a8f5 null-aio: Run CB in original AioContext
AIO callbacks must be called in the originally calling AioContext,
regardless of the BDS’s “main” AioContext.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-19-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:57 +01:00
Hanna Czenczek
63d15c7aa5 iscsi: Create AIO BH in original AioContext
AIO callbacks must be called in the original request’s AioContext,
regardless of the BDS’s “main” AioContext.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-18-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:57 +01:00
Hanna Czenczek
b0cc742f84 blkreplay: Run BH in coroutine’s AioContext
While it does not matter in which AioContext we run aio_co_wake() to
continue an exactly-once-yielding coroutine, making this commit not
strictly necessary, there is also no reason why the BH should run in any
context but the request’s AioContext.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-16-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:55 +01:00
Hanna Czenczek
7c3e9b87f5 ssh: Run restart_coroutine in current AioContext
restart_coroutine() is attached as an FD handler just to wake the
current coroutine after yielding.  It makes most sense to attach it to
the current (request) AioContext instead of the BDS main context.  This
way, the coroutine can be entered directly from the BH instead of having
yet another indirection through AioContext.co_schedule_bh.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-15-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:55 +01:00
Hanna Czenczek
94ce870f60 qcow2: Schedule cache-clean-timer in realtime
There is no reason why the cache cleaning timer should run in virtual
time; run it in realtime instead.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-14-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:55 +01:00
Hanna Czenczek
f86dde9a15 qcow2: Fix cache_clean_timer
The cache-cleaner runs as a timer CB in the BDS AioContext.  With
multiqueue, it can run concurrently to I/O requests, and because it does
not take any lock, this can break concurrent cache accesses, corrupting
the image.  While the chances of this happening are low, it can be
reproduced e.g. by modifying the code to schedule the timer CB every
5 ms (instead of at most once per second) and modifying the last (inner)
while loop of qcow2_cache_clean_unused() like so:

    while (i < c->size && can_clean_entry(c, i)) {
        for (int j = 0; j < 1000 && can_clean_entry(c, i); j++) {
            usleep(100);
        }
        c->entries[i].offset = 0;
        c->entries[i].lru_counter = 0;
        i++;
        to_clean++;
    }

i.e. making it wait on purpose for the point in time where the cache is
in use by something else.

The solution chosen in this patch is hopefully not the best solution,
but I admittedly can’t come up with anything strictly better.

We can protect from concurrent cache accesses either by taking the
existing s->lock, or we introduce a new (non-coroutine) mutex
specifically for cache accesses.  I would prefer to avoid the latter so
as not to introduce additional (very slight) overhead.

Using s->lock, which is a coroutine mutex, however means that we need to
take it in a coroutine, so the timer must run in a coroutine.  We can
transform it from the current timer CB style into a coroutine that
sleeps for the set interval.  As a result, however, we can no longer
just deschedule the timer to instantly guarantee it won’t run anymore,
but have to await the coroutine’s exit.
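
A sketch of the resulting shape (simplified; the exit flag is an
assumed name, not the actual qcow2 code):

    static void coroutine_fn cache_clean_co(void *opaque)
    {
        BDRVQcow2State *s = opaque;

        while (!s->cache_clean_exiting) {
            qemu_co_sleep_ns(QEMU_CLOCK_REALTIME,
                             s->cache_clean_interval * NANOSECONDS_PER_SECOND);
            /* Running in a coroutine lets us take the CoMutex that
             * I/O requests also take, protecting cache accesses. */
            qemu_co_mutex_lock(&s->lock);
            qcow2_cache_clean_unused(s->l2_table_cache);
            qcow2_cache_clean_unused(s->refcount_block_cache);
            qemu_co_mutex_unlock(&s->lock);
        }
    }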

(Note even before this patch there were places that may not have been so
guaranteed after all: Anything calling cache_clean_timer_del() from the
QEMU main AioContext could have been running concurrently to an existing
timer CB invocation.)

Polling until the timer has actually settled seems very complicated
for something that’s rather a minor problem, but I can’t come up with
any better solution that doesn’t again just overlook potential
problems.

(Not Cc-ing qemu-stable, as the issue is quite unlikely to be hit, and
I’m not too fond of this solution.)

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-13-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:55 +01:00
Hanna Czenczek
90db3a1721 qcow2: Re-initialize lock in invalidate_cache
After clearing our state (memset()-ing it to 0), we should
re-initialize objects that need it.  Specifically, that applies to
s->lock, which is originally initialized in qcow2_open().

Given qemu_co_mutex_init() is just a memset() to 0, this is functionally
a no-op, but still seems like the right thing to do.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-12-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:55 +01:00
Hanna Czenczek
9b9ee60c07 block/io: Take reqs_lock for tracked_requests
bdrv_co_get_self_request() does not take a lock around iterating through
bs->tracked_requests.  With multiqueue, it may thus iterate over a list
that is in the process of being modified, producing an assertion
failure:

../block/file-posix.c:3702: raw_do_pwrite_zeroes: Assertion `req' failed.

[0] abort() at /lib64/libc.so.6
[1] __assert_fail_base.cold() at /lib64/libc.so.6
[2] raw_do_pwrite_zeroes() at ../block/file-posix.c:3702
[3] bdrv_co_do_pwrite_zeroes() at ../block/io.c:1910
[4] bdrv_aligned_pwritev() at ../block/io.c:2109
[5] bdrv_co_do_zero_pwritev() at ../block/io.c:2192
[6] bdrv_co_pwritev_part() at ../block/io.c:2292
[7] bdrv_co_pwritev() at ../block/io.c:2225
[8] handle_alloc_space() at ../block/qcow2.c:2573
[9] qcow2_co_pwritev_task() at ../block/qcow2.c:2625

Fix this by taking reqs_lock.
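
Roughly, bdrv_co_get_self_request() becomes (a sketch close to, but
not literally, the patched code):

    BdrvTrackedRequest *req;
    Coroutine *self = qemu_coroutine_self();

    QEMU_LOCK_GUARD(&bs->reqs_lock);
    QLIST_FOREACH(req, &bs->tracked_requests, list) {
        if (req->co == self) {
            return req;
        }
    }
    return NULL;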

Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-11-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:54 +01:00
Hanna Czenczek
ac3520f599 nvme: Note in which AioContext some functions run
Sprinkle comments throughout block/nvme.c noting for some functions
(where it may not be obvious) that they require a certain AioContext, or
in which AioContext they do happen to run (for callbacks, BHs, event
notifiers).

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-10-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:53 +01:00
Hanna Czenczek
0f142cbd91 nvme: Fix coroutine waking
nvme wakes the request coroutine via qemu_coroutine_enter() from a BH
scheduled in the BDS AioContext.  This may not be the same context as
the one in which the request originally ran, which would be wrong:
- It could mean we enter the coroutine before it yields,
- We would move the coroutine into a different context.

(Can be reproduced with multiqueue by adding a usleep(100000) before the
`while (data.ret == -EINPROGRESS)` loop.)

To fix that, use aio_co_wake() to run the coroutine in its home context.
Just like in the preceding iscsi and nfs patches, we can drop the
trivial nvme_rw_cb_bh() and use aio_co_wake() directly.

With this, we can remove NVMeCoData.ctx.

Note the check of data->co == NULL to bypass the BH/yield combination in
case nvme_rw_cb() is called from nvme_submit_command(): We probably want
to keep this fast path for performance reasons, but we have to be quite
careful about it:
- We cannot overload .ret for this, but have to use a dedicated
  .skip_yield field.  Otherwise, if nvme_rw_cb() runs in a different
  thread than the coroutine, it may see .ret set and skip the yield,
  while nvme_rw_cb() will still schedule a BH for waking.  Therefore,
  the signal to skip the yield can only be set in nvme_rw_cb() if waking
  too is skipped, which is independent from communicating the return
  value.
- We can only skip the yield if nvme_rw_cb() actually runs in the
  request coroutine.  Otherwise (specifically if they run in different
  AioContexts), the order between this function’s execution and the
  coroutine yielding (or not yielding) is not reliable.
- There is no point to yielding in a loop; there are no spurious wakes,
  so once we yield, we will only be re-entered once the command is done.
  Replace `while` by `if`.
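
A sketch of the shape this takes (heavily simplified, derived from
the description above rather than the exact code):

    /* In nvme_rw_cb(); data->co is only set once the request
     * coroutine has decided it will yield: */
    if (data->co == NULL) {
        /* Synchronous call from nvme_submit_command(), i.e. still
         * in the request coroutine: skip both waking and yielding.
         * A dedicated field is needed; overloading .ret could make
         * the coroutine skip the yield while a wake is still
         * scheduled. */
        data->skip_yield = true;
    } else {
        aio_co_wake(data->co);
    }

    /* Back in the request coroutine, after submission: */
    if (!data.skip_yield) {
        qemu_coroutine_yield();    /* woken exactly once; no loop */
    }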

Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-9-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:51 +01:00
Hanna Czenczek
7a501bbd51 nvme: Kick and check completions in BDS context
nvme_process_completion() must run in the main BDS context, so schedule
a BH for requests that aren’t there.

The context in which we kick does not matter, but let’s just keep kick
and process_completion together for simplicity’s sake.

(For what it’s worth, a quick fio bandwidth test indicates that on my
test hardware, if anything, this may be a bit better than kicking
immediately before scheduling a pure nvme_process_completion() BH.  But
I wouldn’t take more from those results than that it doesn’t really seem
to matter either way.)

Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-8-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:50 +01:00
Hanna Czenczek
7214ad20da gluster: Do not move coroutine into BDS context
The request coroutine may not run in the BDS AioContext.  We should wake
it in its own context, not move it.

With that, we can remove GlusterAIOCB.aio_context.

Also add a comment why aio_co_schedule() is safe to use in this way.

**Note:** Due to a lack of a gluster set-up, I have not tested this
commit.  It seemed safe enough to send anyway, just maybe not to
qemu-stable.  To be clear, I don’t know of any user-visible bugs that
would arise from the state without this patch; the request coroutine is
moved into the main BDS AioContext, so guest device completion code will
run in a different context than where the request started, which can’t
be good, but I haven’t actually confirmed any bugs (due to not being
able to test it).

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-7-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:50 +01:00
Hanna Czenczek
53d5c7ffac curl: Fix coroutine waking
If we wake a coroutine from a different context, we must ensure that it
will yield exactly once (now or later), awaiting that wake.

curl’s current .ret == -EINPROGRESS loop may lead to the coroutine not
yielding if the request finishes before the loop gets run.  To fix it,
we must drop the loop and yield exactly once, if we need to yield.

Finding out that latter part ("if we need to yield") makes it a bit
complicated: Requests may be served from a cache internal to the curl
block driver, or fail before being submitted.  In these cases, we must
not yield.  However, if we find a matching but still ongoing request in
the cache, we will have to await that, i.e. still yield.

To address this, move the yield inside of the respective functions:
- Inside of curl_find_buf() when awaiting ongoing concurrent requests,
- Inside of curl_setup_preadv() when having created a new request.

Rename curl_setup_preadv() to curl_do_preadv() to reflect this.

(Can be reproduced with multiqueue by adding a usleep(100000) before the
`while (acb.ret == -EINPROGRESS)` loop.)

Also, add a comment why aio_co_wake() is safe regardless of whether the
coroutine and curl_multi_check_completion() run in the same context.

Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-6-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:50 +01:00
Hanna Czenczek
deb35c129b nfs: Run co BH CB in the coroutine’s AioContext
Like in “rbd: Run co BH CB in the coroutine’s AioContext”, drop the
completion flag, yield exactly once, and run the BH in the coroutine’s
AioContext.

(Can be reproduced with multiqueue by adding a usleep(100000) before the
`while (!task.complete)` loops.)

Like in “iscsi: Run co BH CB in the coroutine’s AioContext”, this makes
nfs_co_generic_bh_cb() trivial, so we can drop it in favor of just
calling aio_co_wake() directly.
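
The completion callback then reduces to (sketch):

    /* nfs_co_generic_cb(), possibly in a different thread: */
    aio_co_wake(task->co);    /* wakes in task->co's own AioContext */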

Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-5-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:50 +01:00
Hanna Czenczek
a9500527db iscsi: Run co BH CB in the coroutine’s AioContext
For rbd (and others), as described in “rbd: Run co BH CB in the
coroutine’s AioContext”, the pattern of setting a completion flag and
waking a coroutine that yields while the flag is not set can only work
when both run in the same thread.

iscsi has the same pattern, but the details are a bit different:
iscsi_co_generic_cb() can (as far as I understand) only run through
iscsi_service(), not just from a random thread at a random time.
iscsi_service() in turn can only be run after iscsi_set_events() set up
an FD event handler, which is done in iscsi_co_wait_for_task().

As a result, iscsi_co_wait_for_task() will always yield exactly once,
because iscsi_co_generic_cb() can only run after iscsi_set_events(),
after the completion flag has already been checked, and the yielding
coroutine will then be woken only once the completion flag was set to
true.  So as far as I can tell, iscsi has no bug and already works fine.

Still, we don’t need the completion flag because we know we have to
yield exactly once, so we can drop it.  This simplifies the code and
makes it more obvious that the “rbd bug” isn’t present here.

This makes iscsi_co_generic_bh_cb() and iscsi_retry_timer_expired() a
bit boring, so at least the former we can drop and call aio_co_wake()
directly from iscsi_co_generic_cb() to the same effect.  As for the
latter, the timer needs a CB, so we can’t drop it (I suppose we could
technically use aio_co_wake directly as the CB, but that would be
nasty), but we can put it into the coroutine’s AioContext to make its
aio_co_wake() a simple wrapper around qemu_coroutine_enter() without a
further BH indirection.

Finally, remove the iTask->co != NULL checks: This field is set by
iscsi_co_init_iscsitask(), which all users of IscsiTask run before even
setting up iscsi_co_generic_cb() as the callback, and it is never set or
cleared elsewhere, so it is impossible to not be set in
iscsi_co_generic_cb().

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-4-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:50 +01:00
Hanna Czenczek
89d22536d1 rbd: Run co BH CB in the coroutine’s AioContext
qemu_rbd_completion_cb() schedules the request completion code
(qemu_rbd_finish_bh()) to run in the BDS’s AioContext, assuming that
this is the same thread in which qemu_rbd_start_co() runs.

To explain, this is how both latter functions interact:

In qemu_rbd_start_co():

    while (!task.complete)
        qemu_coroutine_yield();

In qemu_rbd_finish_bh():

    task->complete = true;
    aio_co_wake(task->co); // task->co is qemu_rbd_start_co()

For this interaction to work reliably, both must run in the same thread
so that qemu_rbd_finish_bh() can only run once the coroutine yields.
Otherwise, finish_bh() may run before start_co() checks task.complete,
which will result in the latter seeing .complete as true immediately and
skipping the yield altogether, even though finish_bh() still wakes it.

With multiqueue, the BDS’s AioContext is not necessarily the thread
start_co() runs in, and so finish_bh() may be scheduled to run in a
different thread than start_co().  With the right timing, this will
cause the problems described above; waking a non-yielding coroutine is
not good, as can be reproduced by putting e.g. a usleep(100000) above
the while loop in start_co() (and using multiqueue), giving finish_bh()
a much better chance at exiting before start_co() can yield.

So instead of scheduling finish_bh() in the BDS’s AioContext, schedule
finish_bh() in task->co’s AioContext.

In addition, we can get rid of task.complete altogether because we will
get woken exactly once, when the task is indeed complete, no need to
check.
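
A sketch of the change (simplified):

    /* Before: the BDS context, which with multiqueue may not be
     * the thread qemu_rbd_start_co() runs in: */
    aio_bh_schedule_oneshot(bdrv_get_aio_context(bs),
                            qemu_rbd_finish_bh, task);

    /* After: always the coroutine's own context: */
    aio_bh_schedule_oneshot(qemu_coroutine_get_aio_context(task->co),
                            qemu_rbd_finish_bh, task);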

(We could go further and drop the BH, running aio_co_wake() directly in
qemu_rbd_completion_cb() because we are allowed to do that even if the
coroutine isn’t yet yielding and we’re in a different thread – but the
doc comment on qemu_rbd_completion_cb() says to be careful, so I decided
not to go so far here.)

Buglink: https://issues.redhat.com/browse/RHEL-67115
Reported-by: Junyao Zhao <junzhao@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-3-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:50 +01:00
Eric Blake
2e909d7ca9 qcow2, vmdk: Restrict creation with secondary file using protocol
Ever since CVE-2024-4467 (see commit 7ead9469 in qemu v9.1.0), we have
intentionally treated the opening of secondary files whose name is
specified in the contents of the primary file, such as a qcow2
data_file, as something that must be a local file and not a protocol
prefix (it is still possible to open a qcow2 file that wraps an NBD
data image by using QMP commands, but that is from the explicit action
of the QMP overriding any string encoded in the qcow2 file).  At the
time, we did not prevent the use of protocol prefixes on the secondary
image while creating a qcow2 file, but it results in a qcow2 file that
records an empty string for the data_file, rather than the protocol
passed in during creation:

$ qemu-img create -f raw datastore.raw 2G
$ qemu-nbd -e 0 -t -f raw datastore.raw &
$ qemu-img create -f qcow2 -o data_file=nbd://localhost:10809/ \
  datastore_nbd.qcow2 2G
Formatting 'datastore_nbd.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2147483648 data_file=nbd://localhost:10809/ lazy_refcounts=off refcount_bits=16
$ qemu-img info datastore_nbd.qcow2 | grep data
image: datastore_nbd.qcow2
    data file:
    data file raw: false
    filename: datastore_nbd.qcow2

And since an empty string was recorded in the file, attempting to open
the image without using QMP to supply the NBD data store fails, with a
somewhat confusing error message:

$ qemu-io -f qcow2 datastore_nbd.qcow2
qemu-io: can't open device datastore_nbd.qcow2: The 'file' block driver requires a file name

Although the ability to create an image with a convenience reference
to a protocol data file is not a security hole (unlike the case with
open, the image is not untrusted if we are the ones creating it), the
above demo shows that it is still inconsistent.  Thus, it makes more
sense if we also insist that image creation rejects a protocol prefix
when using the same syntax.  Now, the above attempt produces:

$ qemu-img create -f qcow2 -o data_file=nbd://localhost:10809/ \
  datastore_nbd.qcow2 2G
Formatting 'datastore_nbd.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2147483648 data_file=nbd://localhost:10809/ lazy_refcounts=off refcount_bits=16
qemu-img: datastore_nbd.qcow2: Could not create 'nbd://localhost:10809/': No such file or directory

with datastore_nbd.qcow2 no longer created.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250915213919.3121401-6-eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00
Eric Blake
1bd7bfbc2b block: Allow drivers to control protocol prefix at creation
This patch is pure refactoring: instead of hard-coding permission to
use a protocol prefix when creating an image, the drivers can now pass
in a parameter, comparable to what they could already do for opening a
pre-existing image.  This patch is purely mechanical (all drivers pass
in true for now), but it will enable the next patch to cater to
drivers that want to differ in behavior for the primary image vs. any
secondary images that are opened at the same time as creating the
primary image.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250915213919.3121401-5-eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00
Jean-Louis Dupond
524d5ba8c0 qcow2: put discards in discard queue when discard-no-unref is enabled
When discard-no-unref is enabled, discards are not queued as they
should be.  This has been broken ever since discard-no-unref was
added.

Add a helper function qcow2_discard_cluster() that handles some
common checks and, where needed, calls the queue_discards function to
add the discard request to the queue.

Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
Message-ID: <20250513132628.1055549-3-jean-louis@dupond.be>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00
Jean-Louis Dupond
31242df6ca qcow2: rename update_refcount_discard to queue_discard
The function just queues discards and doesn't change any refcounts,
so let's rename it to match what it actually does.

Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
Message-ID: <20250513132628.1055549-2-jean-louis@dupond.be>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00
Yeqi Fu
9730b9974d block: replace TABs with space
Bring the block files in line with the QEMU coding style, using
spaces for indentation.  This patch partially resolves issue 371.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/371
Signed-off-by: Yeqi Fu <fufuyqqqqqq@gmail.com>
Message-ID: <20230325085224.23842-1-fufuyqqqqqq@gmail.com>
[thuth: Rebased the patch to the current master branch]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20251007163511.334178-1-thuth@redhat.com>
[kwolf: Fixed up vertical alignment]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00
Stefan Hajnoczi
684363fa3b block/io_uring: use non-vectored read/write when possible
The io_uring_prep_readv2/writev2() man pages recommend using the
non-vectored read/write operations when possible for performance
reasons.

I didn't measure a significant difference but it doesn't hurt to have
this optimization in place.
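
The pattern is simply (sketch using liburing's public API; the qiov
variable stands in for the request's QEMUIOVector):

    if (qiov->niov == 1) {
        io_uring_prep_read(sqe, fd, qiov->iov[0].iov_base,
                           qiov->iov[0].iov_len, offset);
    } else {
        io_uring_prep_readv2(sqe, fd, qiov->iov, qiov->niov, offset, 0);
    }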

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20251104022933.618123-16-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00
Stefan Hajnoczi
047dabef97 block/io_uring: use aio_add_sqe()
AioContext has its own io_uring instance for file descriptor monitoring.
The disk I/O io_uring code was developed separately. Originally I
thought the characteristics of file descriptor monitoring and disk I/O
were too different, requiring separate io_uring instances.

Now it has become clear to me that it's feasible to share a single
io_uring instance for file descriptor monitoring and disk I/O. We're not
using io_uring's IOPOLL feature or anything else that would require a
separate instance.

Unify block/io_uring.c and util/fdmon-io_uring.c using the new
aio_add_sqe() API that allows user-defined io_uring sqe submission. Now
block/io_uring.c just needs to submit readv/writev/fsync and most of the
io_uring-specific logic is handled by fdmon-io_uring.c.

There are two immediate advantages:
1. Fewer system calls. There is no need to monitor the disk I/O io_uring
   ring fd from the file descriptor monitoring io_uring instance. Disk
   I/O completions are now picked up directly. Also, sqes are
   accumulated in the sq ring until the end of the event loop iteration
   and there are fewer io_uring_enter(2) syscalls.
2. Less code duplication.

Note that error_setg() messages are not supposed to end with
punctuation, so I removed a '.' for the non-io_uring build error
message.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251104022933.618123-15-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00
Kevin Wolf
5b4b3bfdfc qemu-img info: Optionally show block limits
Add a new --limits option to 'qemu-img info' that displays the block
limits for the image and all of its children, making the information
more accessible for human users than in QMP. This option is not enabled
by default because it can be a lot of output that isn't usually relevant
if you're not specifically trying to diagnose some I/O problem.

This makes the same information automatically also available in HMP
'info block -v'.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251024123041.51254-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-10-29 12:10:10 +01:00
Kevin Wolf
d2634e1828 block: Expose block limits for images in QMP
This information can be useful both for debugging and for management
tools trying to configure guest devices with the optimal limits
(possibly across multiple hosts). There is no reason not to make it
available, so just add it to BlockNodeInfo.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251024123041.51254-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-10-29 12:10:10 +01:00
Fiona Ebner
08736e7584 block: make bdrv_co_parent_cb_resize() a proper IO API function
In preparation for calling it via the bdrv_child_cb_resize() callback
that will be added by the next commit. Rename it to include the "_co_"
part while at it.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20250917115509.401015-3-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-10-29 12:10:09 +01:00
Chandan Somani
9f0c763e16 block: enable stats-intervals for storage devices
This patch allows stats-intervals to be used for storage
devices with the -device option. It accepts a list of interval
lengths in JSON format.

It configures and collects the stats in the BlockBackend layer
through the storage device that consumes the BlockBackend.
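
An illustrative invocation using -device's JSON syntax (the driver,
drive id, and interval values are assumptions, not mandated by this
patch):

  -device '{"driver": "virtio-blk-pci", "drive": "d0",
            "stats-intervals": [1000, 60000]}'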

Signed-off-by: Chandan Somani <csomani@redhat.com>
Message-ID: <20251003220039.1336663-1-csomani@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-10-29 12:10:09 +01:00
Richard W.M. Jones
ad97769e9d block/curl.c: Fix CURLOPT_VERBOSE parameter type
In commit ed26056d90 ("block/curl.c: Use explicit long constants in
curl_easy_setopt calls") we missed a further call that takes a long
parameter.
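
The fix follows the same pattern as ed26056d90 (the curl handle name
is assumed):

    /* curl_easy_setopt() is variadic; CURLOPT_VERBOSE expects a
     * long, so the constant must be spelled as one: */
    curl_easy_setopt(state->curl, CURLOPT_VERBOSE, 1L);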

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Message-ID: <20251013124127.604401-1-rjones@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-10-29 12:10:09 +01:00
Bin Guo
dec83ac02b block/monitor: Use hmp_handle_error to report error
According to writing-monitor-commands.rst, best practice is to
use the 'hmp_handle_error' function, which ensures that the
message gets an 'Error: ' prefix.

Signed-off-by: Bin Guo <guobin@linux.alibaba.com>
Message-ID: <20250916054850.40963-1-guobin@linux.alibaba.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
[kwolf: Fixed up iotests reference output]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-10-29 12:10:09 +01:00
Daniel P. Berrangé
c86488abaf block: fix luks 'amend' when run in coroutine
Launch QEMU with

  $ qemu-img create \
      --object secret,id=sec0,data=123456 \
      -f luks -o key-secret=sec0 demo.luks 1g

  $ qemu-system-x86_64 \
      --object secret,id=sec0,data=123456 \
      -blockdev  driver=luks,key-secret=sec0,file.filename=demo.luks,file.driver=file,node-name=luks

Then in QMP shell attempt

  x-blockdev-amend job-id=fish node-name=luks options={'state':'active','new-secret':'sec0','driver':'luks'}

It will result in an assertion

  #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
  #1  0x00007fad18b73f63 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:89
  #2  0x00007fad18b19f3e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
  #3  0x00007fad18b016d0 in __GI_abort () at abort.c:77
  #4  0x00007fad18b01639 in __assert_fail_base
      (fmt=<optimized out>, assertion=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>) at assert.c:118
  #5  0x00007fad18b120af in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>)
      at assert.c:127
  #6  0x000055ff74fdbd46 in bdrv_graph_rdlock_main_loop () at ../block/graph-lock.c:260
  #7  0x000055ff7548521b in graph_lockable_auto_lock_mainloop (x=<optimized out>)
      at /usr/src/debug/qemu-9.2.4-1.fc42.x86_64/include/block/graph-lock.h:266
  #8  block_crypto_read_func (block=<optimized out>, offset=4096, buf=0x55ffb6d66ef0 "", buflen=256000, opaque=0x55ffb5edcc30, errp=0x55ffb6f00700)
      at ../block/crypto.c:71
  #9  0x000055ff75439f8b in qcrypto_block_luks_load_key
      (block=block@entry=0x55ffb5edbe90, slot_idx=slot_idx@entry=0, password=password@entry=0x55ffb67dc260 "123456", masterkey=masterkey@entry=0x55ffb5fb0c40 "", readfunc=readfunc@entry=0x55ff754851e0 <block_crypto_read_func>, opaque=opaque@entry=0x55ffb5edcc30, errp=0x55ffb6f00700)
      at ../crypto/block-luks.c:927
  #10 0x000055ff7543b90f in qcrypto_block_luks_find_key
      (block=<optimized out>, password=<optimized out>, masterkey=<optimized out>, readfunc=<optimized out>, opaque=<optimized out>, errp=<optimized out>) at ../crypto/block-luks.c:1045
  #11 qcrypto_block_luks_amend_add_keyslot
      (block=0x55ffb5edbe90, readfunc=0x55ff754851e0 <block_crypto_read_func>, writefunc=0x55ff75485100 <block_crypto_write_func>, opaque=0x55ffb5edcc30, opts_luks=0x7fad1715aef8, force=<optimized out>, errp=0x55ffb6f00700) at ../crypto/block-luks.c:1673
  #12 qcrypto_block_luks_amend_options
      (block=0x55ffb5edbe90, readfunc=0x55ff754851e0 <block_crypto_read_func>, writefunc=0x55ff75485100 <block_crypto_write_func>, opaque=0x55ffb5edcc30, options=0x7fad1715aef0, force=<optimized out>, errp=0x55ffb6f00700) at ../crypto/block-luks.c:1865
  #13 0x000055ff75485b95 in block_crypto_amend_options_generic_luks
      (bs=<optimized out>, amend_options=<optimized out>, force=<optimized out>, errp=<optimized out>) at ../block/crypto.c:949
  #14 0x000055ff75485c28 in block_crypto_co_amend_luks (bs=<optimized out>, opts=<optimized out>, force=<optimized out>, errp=<optimized out>)
      at ../block/crypto.c:1008
  #15 0x000055ff754778e5 in blockdev_amend_run (job=0x55ffb6f00640, errp=0x55ffb6f00700) at ../block/amend.c:52
  #16 0x000055ff75468b90 in job_co_entry (opaque=0x55ffb6f00640) at ../job.c:1106
  #17 0x000055ff755a0fc2 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:175

This changes the read/write callbacks to not assert that they
are run in mainloop context if already in a coroutine.

This is also reproduced by qemu-iotests cases 295 and 296.

Fixes: 1f051dcbdf
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250919112213.1530079-1-berrange@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-10-29 12:10:09 +01:00
Daniel P. Berrangé
6eda39a87f block: remove 'detached-header' option from opts after use
The code for creating LUKS devices references a 'detached-header'
option in the QemuOpts data, but does not consume (remove) the
option.

Thus when the code later tries to convert the remaining unused
QemuOpts into a QCryptoBlockCreateOptions struct, an error is
reported by the QAPI code that 'detached-header' is not a valid
field.
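
The fix is to consume the option explicitly, along the lines of this
sketch (the surrounding variable names are assumed):

    /* Read *and* delete the option, so that the later QemuOpts ->
     * QCryptoBlockCreateOptions conversion no longer sees it. */
    detached_hdr = qemu_opt_get_bool_del(opts, "detached-header", false);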

This fixes a regression caused by

  commit e818c01ae6
  Author: Daniel P. Berrangé <berrange@redhat.com>
  Date:   Mon Feb 19 15:12:59 2024 +0000

    qapi: drop unused QCryptoBlockCreateOptionsLUKS.detached-header

which identified that the QAPI field was unused, but failed to
realize the QemuOpts -> QCryptoBlockCreateOptions conversion
was seeing the left-over 'detached-header' option which had not
been removed from QemuOpts.

This problem was identified by the 'luks-detached-header' I/O
test, but unfortunately I/O tests are not run regularly for the
LUKS format.

Fixes: e818c01ae6
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250919103810.1513109-1-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-10-29 12:10:09 +01:00