Commit Graph

2067 Commits

Author SHA1 Message Date
Peter Maydell
4577ac30d8 Merge tag 'bsd-user-2026q1-upstream-pull-request' of ssh://github.com/bsdimp/qemu into staging
git-publish --base upstream/master --pull --to qemu-devel@nongnu.org --no-check-url

bsd-user: Upstream for 11.0

This combines several batch streams that:
(1) Upstream the bsd-misc.c system calls:
    quoatctl, reboot, getdtablesize, uuidgen, semget, semop, semctl, msgctl
(2) common-user drop __linux__ ifdef
(3) Remove NetBSD and OpenBSD specific code for bsd-user (hasn't built in years)
(4) Fix inotify issues on FreeBSD 15
(5) Fix issues with gdb on aarch64

All of thse have been reviewed, and the only problems with the check patch line
length and about added files.

# -----BEGIN PGP SIGNATURE-----
# Comment: GPGTools - https://gpgtools.org
#
# iQIzBAABCgAdFiEEIDX4lLAKo898zeG3bBzRKH2wEQAFAmmlD6EACgkQbBzRKH2w
# EQANZw//edwiQF/H+07EBKdZNF/QJsBwsH5OwHh/rgyq6OPUHWtu00gxNDFd/e/D
# O+FisLvDbNa9v2es1RX0lDzdgXRwi2LRIc4tMW3ifEjK7Jj8np09tfWkghwc2u9Z
# RShNxlCHfg/lTFkkm5wbHEpl1W1sImcLhYSLdoXAdUhK8lQOoUiFYOtg9s6xq6LH
# 3NHH4roY+HQE2zpK6gY45BsD1Fi3qdg5VNwTHkvcducdC5jjXnJ1UikL48zM72An
# LK8EqQfGx06RVkPgPyxTeUjniJj9SyixZjBD8YzqlmhSCt3RD4e0V+5/wd8YlPpI
# dBaYqzLSfft+vtJEqUyds/SilMHqf2brvJ9e2chwIqBlghxPb9GpPjHASDqk1/t8
# +ckFaOtdtamw0H8JFp1ixzFn7WLvUp3jpQJbSzZxmKwC0hZCxl/aXFKcq+gDg3k5
# 1wt/su+1zfb1Qjp8M8tKHLWy2/aXT/yY7IeWAk2hpOel3e4L9pDU6bsgQMz4kOE8
# WO6GHDu2YA688EArVL8ErTkKw04+mGdTMmjqrF00O/MWnW8LNKNTHIHaxWtCfXVv
# mHSUyHt94CoDtScwCdLmyZslHiO0XgUFhnK+EPd+sHyaAPu2uH6ezfFMRF8F1vs8
# WXsOnZArDg+r02PnltEjbIEOJ8t+tYTZqZ/3IKn2Gecixqhqdmc=
# =yPBa
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon Mar  2 04:18:41 2026 GMT
# gpg:                using RSA key 2035F894B00AA3CF7CCDE1B76C1CD1287DB01100
# gpg: Good signature from "Warner Losh <wlosh@netflix.com>" [unknown]
# gpg:                 aka "Warner Losh <imp@bsdimp.com>" [unknown]
# gpg:                 aka "Warner Losh <imp@freebsd.org>" [unknown]
# gpg:                 aka "Warner Losh <imp@village.org>" [unknown]
# gpg:                 aka "Warner Losh <wlosh@bsdimp.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2035 F894 B00A A3CF 7CCD  E1B7 6C1C D128 7DB0 1100

* tag 'bsd-user-2026q1-upstream-pull-request' of ssh://github.com/bsdimp/qemu: (27 commits)
  bsd-user: update aarch64-bsd-user.mak gdb XML list
  bsd-user: Add miscellaneous BSD syscall implementations
  bsd-user: Add System V message queue syscalls
  bsd-user: Implement System V semaphore calls
  bsd-user: Add bsd-misc.c to build
  bsd-user: Add message queue implementations
  bsd-user: Add do_bsd_msgctl implementation
  bsd-user: Add do_bsd___semctl implementation
  bsd-user: Add do_bsd_semop implementation
  bsd-user: Add do_bsd_semget implementation
  bsd-user: Add do_bsd_uuidgen implementation
  bsd-user: Add do_bsd_quotactl, do_bsd_reboot and do_bsd_getdtablesize
  bsd-user: Add semaphore operation constants and structures
  bsd-user: Add host_to_target_msqid_ds for msgctl(2)
  bsd-user: Add target_to_host_msqid_ds for msgctl(2)
  bsd-user: Add host_to_target_semid_ds for semctl(2)
  bsd-user: Add target_to_host_semid_ds for semctl(2)
  bsd-user: Add host_to_target_semarray for semaphore operations
  bsd-user: Add target_to_host_semarray for semaphore operations
  bsd-user: Add host_to_target_uuid for uuidgen(2)
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2026-03-02 14:01:31 +00:00
Warner Losh
bba2f724f1 freebsd: FreeBSD 15 has native inotify
Check to make sure that we have inotify in libc, before looking for it
in libinotify.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Daniel P. Berrange <berrange@redhat.com>
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Warner Losh <imp@bsdimp.com>
2026-03-01 20:49:39 -07:00
Akihiko Odaki
649a78aa32 Reapply "rcu: Unify force quiescent state"
This reverts commit ddb4d9d174.

The commit says:
> This reverts commit 55d98e3ede.
>
> The commit introduced a regression in the replay functional test
> on alpha (tests/functional/alpha/test_replay.py), that causes CI
> failures regularly. Thus revert this change until someone has
> figured out what is going wrong here.

Reapply the change as alpha is fixed.

Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20260217-alpha-v1-2-0dcc708c9db3@rsg.ci.i.u-tokyo.ac.jp
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-02-27 14:48:04 +01:00
Peter Maydell
2f9c4ba196 Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging
Pull request

Jens Axboe's fixes for fdmon-io_uring.

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCgAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmmcYhEACgkQnKSrs4Gr
# c8i4BAgAgI7UlEbQs0iLV0N4qUJ7pc8p/bgRNP9/TkjFYtFIcmRZtNeRvlkIVdgD
# F2QAMJAP0KzP3T2mo3a9f5kcvKIqLFsOpwKmMiUa98aA4dDhkT6TNOqLjgKmaXmR
# muEv7TKjBPzZqNBWDyEMIcnIzxSGjnbATN1BdvEy+7awJJDqaYZEkxTT6u+Vv/2l
# MlaYwnNbfye7nFM/xCBt/wH07XLZMC3pezHQxJGq7CGGTG0JFFeXIhm0HrTjlAt/
# EBVf41zsAtYqBj9egl5Y570NXkTirBbY+Niv/rqtavknC/tAxN+PQTBoobHm2Wna
# yU7+L+lidtIcdwhbzeZar0KdoDhsEQ==
# =zdFX
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon Feb 23 14:20:01 2026 GMT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* tag 'block-pull-request' of https://gitlab.com/stefanha/qemu:
  fdmon-io_uring: check CQ ring directly in gsource_check
  aio-posix: notify main loop when SQEs are queued

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2026-02-24 11:32:35 +00:00
Jens Axboe
961fcc0f22 fdmon-io_uring: check CQ ring directly in gsource_check
gsource_check() only looks at the ppoll revents for the io_uring fd,
but CQEs can be posted during gsource_prepare()'s io_uring_submit()
call via kernel task_work processing on syscall exit. These completions
are already sitting in the CQ ring but the ring fd may not be signaled
yet, causing gsource_check() to return false.

Add a fallback io_uring_cq_ready() check so completions that arrive
during submission are dispatched immediately rather than waiting for
the next ppoll() cycle.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Message-ID: <20260213143225.161043-3-axboe@kernel.dk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2026-02-23 08:50:04 -05:00
Jens Axboe
2ae361ef1d aio-posix: notify main loop when SQEs are queued
When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the
block I/O coroutine inline on the vCPU thread because
qemu_get_current_aio_context() returns the main AioContext when BQL is
held. The coroutine calls luring_co_submit() which queues an SQE via
fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens
in gsource_prepare() on the main loop thread.

Since the coroutine ran inline (not via aio_co_schedule()), no BH is
scheduled and aio_notify() is never called. The main loop remains asleep
in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until
the next timer fires.

Fix this by calling aio_notify() after queuing the SQE. This wakes the
main loop via the eventfd so it can run gsource_prepare() and submit the
pending SQE promptly.

This is a generic fix that benefits all devices using aio=io_uring.
Without it, AHCI/SATA devices see MUCH worse I/O latency since they use
MMIO (not ioeventfd like virtio) and have no other mechanism to wake the
main loop after queuing block I/O.

This is usually a bit hard to detect, as it also relies on the ppoll
loop not waking up for other activity, and micro benchmarks tend not to
see it because they don't have any real processing time. With a
synthetic test case that has a few usleep() to simulate processing of
read data, it's very noticeable. The below example reads 128MB with
O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before
each batch submit, and a 1ms delay after processing each completion.
Running it on /dev/sda yields:

time sudo ./iotest /dev/sda

________________________________________________________
Executed in   25.76 secs	  fish           external
   usr time    6.19 millis  783.00 micros    5.41 millis
   sys time   12.43 millis  642.00 micros   11.79 millis

while on a virtio-blk or NVMe device we get:

time sudo ./iotest /dev/vdb

________________________________________________________
Executed in    1.25 secs      fish           external
   usr time    1.40 millis    0.30 millis    1.10 millis
   sys time   17.61 millis    1.43 millis   16.18 millis

time sudo ./iotest /dev/nvme0n1

________________________________________________________
Executed in    1.26 secs      fish           external
   usr time    6.11 millis    0.52 millis    5.59 millis
   sys time   13.94 millis    1.50 millis   12.43 millis

where the latter are consistent. If we run the same test but keep the
socket for the ssh connection active by having activity there, then
the sda test looks as follows:

time sudo ./iotest /dev/sda

________________________________________________________
Executed in    1.23 secs      fish           external
   usr time    2.70 millis   39.00 micros    2.66 millis
   sys time    4.97 millis  977.00 micros    3.99 millis

as now the ppoll loop is woken all the time anyway.

After this fix, on an idle system:

time sudo ./iotest /dev/sda

________________________________________________________
Executed in    1.30 secs      fish           external
   usr time    2.14 millis    0.14 millis    2.00 millis
   sys time   16.93 millis    1.16 millis   15.76 millis

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Message-Id: <07d701b9-3039-4f9b-99a2-abeae51146a5@kernel.dk>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
[Generalize the comment since this applies to all vCPU thread activity,
not just coroutines, as suggested by Kevin Wolf <kwolf@redhat.com>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2026-02-23 08:49:51 -05:00
Marc-André Lureau
ba63a9643a util: add some extra stubs for qemu modules initialization
Avoid extra ifdef-ery when optionally supporting modules, as done in
audio-test (and vl.c).

Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2026-02-23 14:28:57 +01:00
Vladimir Sementsov-Ogievskiy
2eed5472ec error-report: make real_time_iso8601() public
To be reused in the following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <20260201173633.413934-3-vsementsov@yandex-team.ru>
2026-02-13 10:00:02 +01:00
Thomas Huth
ddb4d9d174 Revert "rcu: Unify force quiescent state"
This reverts commit 55d98e3ede.

The commit introduced a regression in the replay functional test
on alpha (tests/functional/alpha/test_replay.py), that causes CI
failures regularly. Thus revert this change until someone has
figured out what is going wrong here.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/3197
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20260209120336.41454-1-thuth@redhat.com>
2026-02-12 08:43:13 +01:00
Vladimir Sementsov-Ogievskiy
7404d6852d tests/unit: add unit test for qemu_hexdump()
Test that the fix in commit 20aa05edc2 ("util/hexdump: fix
QEMU_HEXDUMP_LINE_WIDTH logic") make sense.

To not break compilation when we build without 'block', move
hexdump.c out of "if have_block" in meson.build.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20260202112826.38018-1-philmd@linaro.org>
2026-02-02 12:34:14 +01:00
Richard Henderson
9c4c090d27 Merge tag 'pull-target-arm-20260123' of https://gitlab.com/pm215/qemu into staging
target-arm queue:
 * hw/arm/imx8mp-evk: Provide some defaults matching real hardware
 * hw/intc: endianness fixes
 * various: Clean up includes
 * kernel-doc.py: sync with upstream Kernel v6.19-rc4
 * scripts/clean-includes: Minor improvements; exclude list update
 * docs/system/arm/imx8mp-evk: Avoid suggesting redundant CLI parameters
 * docs/system/arm/xlnx-zynq.rst: Improve docs rendering
 * docs: Be consistent about capitalization of 'Arm' (again)
 * docs: Avoid unintended mailto: hyperlinks
 * qemu-options.hx: Drop uses of @var
 * qemu-options.hx: Improve formatting in colo-compare docs

# -----BEGIN PGP SIGNATURE-----
#
# iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmlzju4ZHHBldGVyLm1h
# eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3uU/D/9IHpo57UIHAF7vU9gsWm5k
# TxLl9PBw3ev2Uv6zWWza0wYZQF2ZcvqwMiU/AlBFuyJFyXTLocL3iN6Rsw+8kcjh
# jaq2hCtzSNJWj41CEU22l7iUfJ5PdOVdRYhhwlrQqxXDJj8Oj3plliRc6AL1EZYD
# mxAJ+YQ8pfJ/2ibO66sqwGMLjPsjCmmgfloTm/qFzk7QccQkPZKzDrC9CGGRmmRA
# tcdBGMtu+DOqpCRKIRul0S8ed2qaTecIK3+fUID0+qEzb10VWgFs/AAQiwLPkwyi
# RvMmIbC9lYVCnP+YC4HlvYMfd61V3lpzsUXgMIbdRZYsN/IlTVfetJUOVmn3LTQ/
# gGU0b+t6D/OZAt1L6toBngKVh89VPqbpGXEx4UMHCNIcvfI1Xo+HRT9ZV5WCw6b8
# sVKOZUwKs9ZbFAcrgBgskXp/5KWZAb92IFjwbfwxxl/2NRK3B3y7CDHBoOM/zQ9a
# rZ7rfJHhQVGR2+1QonNbpG0IFwbgs0zPQwBjPreGh6TWf2UiXvx1ku94Wxe2lA+5
# CPeju+swbFKRNjwSas6NZjJWazacohYG3nhmhF7HtcgX279BzIV0d+ZIl786Juls
# 4Vt4dPUxU/kHHZHjE52AZUS/opIy+UHAj0FKPAPpTrc7UfuHlY3gqoI7UfVpciau
# q3DqM7PlF2X91kw4xJ6JCA==
# =bE6w
# -----END PGP SIGNATURE-----
# gpg: Signature made Sat 24 Jan 2026 02:08:30 AM AEDT
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [unknown]
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>" [unknown]
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [unknown]
# gpg:                 aka "Peter Maydell <peter@archaic.org.uk>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* tag 'pull-target-arm-20260123' of https://gitlab.com/pm215/qemu: (22 commits)
  qemu-options.hx: Improve formatting in colo-compare docs
  qemu-options.hx: Drop uses of @var
  docs: avoid unintended mailto: hyperlinks
  docs/system/arm/xlnx-zynq.rst: Improve docs rendering
  hw/intc: avoid byte swap fiddling in gicv3 its path
  hw/intc: declare GICv3 regions as little endian
  hw/intc: declare GIC regions as little endian
  hw/intc: declare NVIC regions as little endian
  all: Clean up includes
  misc: Clean up includes
  bsd-user: Clean up includes
  mshv: Clean up includes
  scripts/clean-includes: Update exclude list
  scripts/clean-includes: Give the args in git commit messages
  scripts/clean-includes: Do all our exclusions with REGEXFILE
  scripts/clean-includes: Make ignore-regexes one per line
  scripts/clean-includes: Remove outdated comment
  scripts/clean-includes: Allow directories on command line
  docs: Be consistent about capitalization of 'Arm' (again)
  kernel-doc.py: sync with upstream Kernel v6.19-rc4
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2026-01-24 07:59:34 +11:00
Peter Maydell
dc249aaf57 misc: Clean up includes
This commit deals with various .c files that included system
headers that are already pulled in by osdep.h, where the .c
file includes osdep.h already itself.

This commit was created with scripts/clean-includes:
 ./scripts/clean-includes '--git' 'misc' 'hw/core' 'semihosting' 'target/arm' 'target/i386/kvm/kvm.c' 'target/loongarch' 'target/riscv' 'tools' 'util'

All .c should include qemu/osdep.h first.  The script performs three
related cleanups:

* Ensure .c files include qemu/osdep.h first.
* Including it in a .h is redundant, since the .c  already includes
  it.  Drop such inclusions.
* Likewise, including headers qemu/osdep.h includes is redundant.
  Drop these, too.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20260116125830.926296-4-peter.maydell@linaro.org
2026-01-22 11:23:31 +00:00
Philippe Mathieu-Daudé
e44dc42f3a bswap: Use 'qemu/bswap.h' instead of 'qemu/host-utils.h'
These files only require "qemu/bswap.h", not "qemu/host-utils.h".

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20260109163730.57087-2-philmd@linaro.org>
2026-01-22 10:48:45 +01:00
Philippe Mathieu-Daudé
3115691855 bswap: Include missing 'qemu/bswap.h' header
All these files indirectly include the "qemu/bswap.h" header.
Make this inclusion explicit to avoid build errors when
refactoring unrelated headers.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20260109164742.58041-4-philmd@linaro.org>
2026-01-22 10:48:45 +01:00
Michael Tokarev
a2b429b114 Revert "gdbstub: Try unlinking the unix socket before binding"
This reverts commit fccb744f41.

This commit introduced dependency of linux-user on qemu-sockets.c.
The latter includes handling of various socket types, while gdbstub
only needs unix sockets.  Including different kinds of sockets
makes it more problematic to build linux-user statically.

The original issue - the need to unlink unix socket before binding -
will be addressed in the next change.

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2026-01-20 14:21:06 +03:00
Richard Henderson
239b9d0488 include/qemu/atomic: Drop aligned_{u}int64_t
As we no longer support i386 as a host architecture,
this abstraction is no longer required.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2026-01-17 10:46:51 +11:00
Richard Henderson
71adccb6f7 include/qemu/atomic: Drop qatomic_{read,set}_[iu]64
Replace all uses with the normal qatomic_{read,set}.

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2026-01-17 10:46:51 +11:00
Richard Henderson
90e2e8ada7 util: Remove stats64
This API is no longer used.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2026-01-17 10:46:42 +11:00
Richard Henderson
25512d6865 *: Remove __i386__ tests
Remove instances of __i386__, except from tests and imported headers.

Drop a block containing sanity check and fprintf error message for
i386-on-i386 or x86_64-on-x86_64 emulation.  If we really want
something like this, we would do it via some form of compile-time check.

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2026-01-17 10:45:39 +11:00
Farhan Ali
0ffc8f3625 util/vfio-helper: Fix endianness in PCI config read/write functions
The VFIO pread/pwrite functions use little-endian data format. Currently, the
qemu_vfio_pci_read_config() and qemu_vfio_pci_write_config() don't correctly
convert from CPU native endian format to little-endian (and vice versa) when
using the pread/pwrite functions. Fix this by limiting read/write to 32 bits
and handling endian conversion in qemu_vfio_pci_read_config() and
qemu_vfio_pci_write_config().

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260105222029.2423-1-alifm@linux.ibm.com
[ clg: Fixed typo in subject ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2026-01-13 08:29:59 +01:00
Ilya Leoshkevich
f098c32db4 target/s390x: Fix infinite loop during replay
Replaying even trivial s390x kernels hangs, because:

- cpu_post_load() fires the TOD timer immediately.

- s390_tod_load() schedules work for firing the TOD timer.

- If rr loop sees work and then timer, we get one timer expiration.

- If rr loop sees timer and then work, we get two timer expirations.

- Record and replay may diverge due to this race.

- In this particular case divergence makes replay loop spin: it sees that
  TOD timer has expired, but cannot invoke its callback, because there
  is no recorded CHECKPOINT_CLOCK_VIRTUAL.

- The order in which rr loop sees work and timer depends on whether
  and when rr loop wakes up during load_snapshot().

- rr loop may wake up after the main thread kicks the CPU and drops
  the BQL, which may happen if it calls, e.g., qemu_cond_wait_bql().

Firing TOD timer twice is duplicate work, but it was introduced
intentionally in commit 7c12f710ba ("s390x/tcg: rearm the CKC timer
during migration") in order to avoid dependency on migration order.

The key culprits here are timers that are armed ready expired. They
break the ordering between timers and CPU work, because they are not
constrained by instruction execution, thus introducing non-determinism
and record-replay divergence.

Fix by converting such timer callbacks to CPU work. Also add TOD clock
updates to the save path, mirroring the load path, in order to have the
same CHECKPOINT_CLOCK_VIRTUAL during recording and replaying.

Link: https://lore.kernel.org/qemu-devel/20251128133949.181828-1-thuth@redhat.com/
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20251201215514.1751994-1-iii@linux.ibm.com>
[thuth: Add SPDX license identifiers to the new stubs files]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2026-01-12 13:56:28 +01:00
Markus Armbruster
d84870f2b8 error: Use error_setg_file_open() for simplicity and consistency
Replace

    error_setg_errno(errp, errno, MSG, FNAME);

by

    error_setg_file_open(errp, errno, FNAME);

where MSG is "Could not open '%s'" or similar.

Also replace equivalent uses of error_setg().

A few messages lose prefixes ("net dump: ", "SEV: ", __func__ ": ").
We could put them back with error_prepend().  Not worth the bother.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org>
Message-ID: <20251121121438.1249498-11-armbru@redhat.com>
[Conflict with commit 26b4a6ffe7 (monitor/hmp: Merge
hmp-cmds-target.c within hmp-cmds.c) resolved]
2026-01-07 13:24:41 +01:00
Nguyen Dinh Phi
0c9f429ec8 util: Move qemu_ftruncate64 from block/file-win32.c to oslib-win32.c
qemu_ftruncate64() is a general-purpose utility function that may be
used outside of the block layer. Move it to util/oslib-win32.c where
other Windows-specific utility functions reside.

Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20251218085446.462827-3-phind.uet@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-12-30 20:38:41 +01:00
Paolo Bonzini
12e50722e4 block: rename block/aio-wait.h to qemu/aio-wait.h
AIO_WAIT_WHILE is used even outside the block layer; move the header file
out of block/ just like the implementation is in util/.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-12-27 10:11:12 +01:00
Paolo Bonzini
ba773aded3 block: rename block/aio.h to qemu/aio.h
AioContexts are used as a generic event loop even outside the block
layer; move the header file out of block/ just like the implementation
is in util/.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-12-27 10:11:12 +01:00
Paolo Bonzini
238449947d block: reduce files included by block/aio.h
Avoid including all of qdev everywhere (the hw/core/qdev.h header in fact
brings in a lot more headers too), instead declare a couple structs for
which only a pointer type is needed.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-12-27 10:11:12 +01:00
Paolo Bonzini
ddab0ef124 block: extract include/qemu/aiocb.h out of include/block/aio.h
Create a new header corresponding to functions defined in
util/aiocb.c, and include it whenever AIOCBs are used but
AioContext is not.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-12-27 10:11:12 +01:00
Marc Morcos
e77508292c thread-pool: Fix thread race
Fix a data race occurred between `worker_thread()` writing and
`thread_pool_completion_bh()` reading shared data in `util/thread-pool.c`.

Signed-off-by: Marc Morcos <marcmorcos@google.com>
Link: https://lore.kernel.org/r/20251213001443.2041258-3-marcmorcos@google.com
[Use qatomic_set for writes to ret->ret. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-12-27 10:11:12 +01:00
Paolo Bonzini
7f548b8f23 include: reorganize memory API headers
Move RAMBlock functions out of ram_addr.h and cpu-common.h;
move memory API headers out of include/exec and into include/system.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-12-27 10:11:09 +01:00
Cédric Le Goater
326e620fc0 Fix const qualifier build errors with recent glibc
A recent change in glibc 2.42.9000 [1] changes the return type of
strstr() and other string functions to be 'const char *' when the
input is a 'const char *'.

This breaks the build in various files with errors such as :

  error: initialization discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers]
    208 |         char *pidstr = strstr(filename, "%");
        |                        ^~~~~~

Fix this by changing the type of the variables that store the result
of these functions to 'const char *'.

[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=cd748a63ab1a7ae846175c532a3daab341c62690

Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251209174328.698774-1-clg@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-12-09 21:00:15 +01:00
Stefan Hajnoczi
047dabef97 block/io_uring: use aio_add_sqe()
AioContext has its own io_uring instance for file descriptor monitoring.
The disk I/O io_uring code was developed separately. Originally I
thought the characteristics of file descriptor monitoring and disk I/O
were too different, requiring separate io_uring instances.

Now it has become clear to me that it's feasible to share a single
io_uring instance for file descriptor monitoring and disk I/O. We're not
using io_uring's IOPOLL feature or anything else that would require a
separate instance.

Unify block/io_uring.c and util/fdmon-io_uring.c using the new
aio_add_sqe() API that allows user-defined io_uring sqe submission. Now
block/io_uring.c just needs to submit readv/writev/fsync and most of the
io_uring-specific logic is handled by fdmon-io_uring.c.

There are two immediate advantages:
1. Fewer system calls. There is no need to monitor the disk I/O io_uring
   ring fd from the file descriptor monitoring io_uring instance. Disk
   I/O completions are now picked up directly. Also, sqes are
   accumulated in the sq ring until the end of the event loop iteration
   and there are fewer io_uring_enter(2) syscalls.
2. Less code duplication.

Note that error_setg() messages are not supposed to end with
punctuation, so I removed a '.' for the non-io_uring build error
message.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251104022933.618123-15-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00
Stefan Hajnoczi
1eebdab3c3 aio-posix: add aio_add_sqe() API for user-defined io_uring requests
Introduce the aio_add_sqe() API for submitting io_uring requests in the
current AioContext. This allows other components in QEMU, like the block
layer, to take advantage of io_uring features without creating their own
io_uring context.

This API supports nested event loops just like file descriptor
monitoring and BHs do. This comes at a complexity cost: CQE callbacks
must be placed on a list so that nested event loops can invoke pending
CQE callbacks from parent event loops. If you're wondering why
CqeHandler exists instead of just a callback function pointer, this is
why.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251104022933.618123-14-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00
Stefan Hajnoczi
87e7a0f423 aio-posix: add fdmon_ops->dispatch()
The ppoll and epoll file descriptor monitoring implementations rely on
the event loop's generic file descriptor, timer, and BH dispatch code to
invoke user callbacks.

The io_uring file descriptor monitoring implementation will need
io_uring-specific dispatch logic for CQE handlers for custom SQEs.

Introduce a new FDMonOps ->dispatch() callback that allows file
descriptor monitoring implementations to invoke user callbacks. The next
patch will use this new callback.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20251104022933.618123-13-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00
Stefan Hajnoczi
a63e41f2a4 aio-posix: unindent fdmon_io_uring_destroy()
Reduce the level of indentation to make further code changes easier to
read.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20251104022933.618123-12-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00
Stefan Hajnoczi
59202c98c0 aio-posix: gracefully handle io_uring_queue_init() failure
io_uring may not be available at runtime due to system policies (e.g.
the io_uring_disabled sysctl) or creation could fail due to file
descriptor resource limits.

Handle failure scenarios as follows:

If another AioContext already has io_uring, then fail AioContext
creation so that the aio_add_sqe() API is available uniformly from all
QEMU threads. Otherwise fall back to epoll(7) if io_uring is
unavailable.

Notes:
- Update the comment about selecting the fastest fdmon implementation.
  At this point it's not about speed anymore, it's about aio_add_sqe()
  API availability.
- Uppercase the error message when converting from error_report() to
  error_setg_errno() for consistency (but there are instances of
  lowercase in the codebase).
- It's easier to move the #ifdefs from aio-posix.h to aio-posix.c.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251104022933.618123-11-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00
Stefan Hajnoczi
421dcc8023 aio: add errp argument to aio_context_setup()
When aio_context_new() -> aio_context_setup() fails at startup it
doesn't really matter whether errors are returned to the caller or the
process terminates immediately.

However, it is not acceptable to terminate when hotplugging --object
iothread at runtime. Refactor aio_context_setup() so that errors can be
propagated. The next commit will set errp when fdmon_io_uring_setup()
fails.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251104022933.618123-10-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00
Stefan Hajnoczi
3769b9abe9 aio: free AioContext when aio_context_new() fails
g_source_destroy() only removes the GSource from the GMainContext it's
attached to, if any. It does not free it.

Use g_source_unref() instead so that the AioContext (which embeds a
GSource) is freed. There is no need to call g_source_destroy() in
aio_context_new() because the GSource isn't attached to a GMainContext
yet.

aio_ctx_finalize() expects everything to be set up already, so introduce
the new ctx->initialized boolean and do nothing when called with
!initialized. This also requires moving aio_context_setup() down after
event_notifier_init() since aio_ctx_finalize() won't release any
resources that aio_context_setup() acquired.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251104022933.618123-9-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00
Stefan Hajnoczi
d1f42b600a aio: remove aio_context_use_g_source()
There is no need for aio_context_use_g_source() now that epoll(7) and
io_uring(7) file descriptor monitoring works with the glib event loop.
AioContext doesn't need to be notified that GSource is being used.

On hosts with io_uring support this now enables fdmon-io_uring.c by
default, replacing fdmon-poll.c and fdmon-epoll.c. In other words, the
event loop will use io_uring!

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251104022933.618123-8-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00
Stefan Hajnoczi
ded29e64c6 aio-posix: integrate fdmon into glib event loop
AioContext's glib integration only supports ppoll(2) file descriptor
monitoring. epoll(7) and io_uring(7) disable themselves and switch back
to ppoll(2) when the glib event loop is used. The main loop thread
cannot use epoll(7) or io_uring(7) because it always uses the glib event
loop.

Future QEMU features may require io_uring(7). One example is uring_cmd
support in FUSE exports. Each feature could create its own io_uring(7)
context and integrate it into the event loop, but this is inefficient
due to extra syscalls. It would be more efficient to reuse the
AioContext's existing fdmon-io_uring.c io_uring(7) context because
fdmon-io_uring.c will already be active on systems where Linux io_uring
is available.

In order to keep fdmon-io_uring.c's AioContext operational even when the
glib event loop is used, extend FDMonOps with an API similar to
GSourceFuncs so that file descriptor monitoring can integrate into the
glib event loop.

A quick summary of the GSourceFuncs API:
- prepare() is called each event loop iteration before waiting for file
  descriptors and timers.
- check() is called to determine whether events are ready to be
  dispatched after waiting.
- dispatch() is called to process events.

More details here: https://docs.gtk.org/glib/struct.SourceFuncs.html

Move the ppoll(2)-specific code from aio-posix.c into fdmon-poll.c and
also implement epoll(7)- and io_uring(7)-specific file descriptor
monitoring code for glib event loops.

Note that it's still faster to use aio_poll() rather than the glib event
loop since glib waits for file descriptor activity with ppoll(2) and
does not support adaptive polling. But at least epoll(7) and io_uring(7)
now work in glib event loops.

Splitting this into multiple commits without temporarily breaking
AioContext proved difficult so this commit makes all the changes. The
next commit will remove the aio_context_use_g_source() API because it is
no longer needed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251104022933.618123-7-stefanha@redhat.com>
[kwolf: Build fixes; fix AioContext.list_lock use after destroy]
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:04:53 +01:00
Stefan Hajnoczi
511c62a2c6 aio-posix: keep polling enabled with fdmon-io_uring.c
Commit 816a430c51 ("util/aio: Defer disabling poll mode as long as
possible") kept polling enabled when the event loop timeout is 0. Since
there is no timeout the event loop will continue immediately and the
overhead of disabling and re-enabling polling can be avoided.

fdmon-io_uring.c is unable to take advantage of this optimization
because its ->need_wait() function returns true whenever there are new
io_uring SQEs to submit:

  if (timeout || ctx->fdmon_ops->need_wait(ctx)) {
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Polling will be disabled even when timeout == 0.

Extend the optimization to handle the case when need_wait() returns true
and timeout == 0.

Cc: Chao Gao <chao.gao@intel.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251104022933.618123-5-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 17:32:48 +01:00
Stefan Hajnoczi
5f8741fca5 aio-posix: fix spurious return from ->wait() due to signals
io_uring_enter(2) only returns -EINTR in some cases when interrupted by
a signal. Therefore the while loop in fdmon_io_uring_wait() is
incomplete and can lead to a spurious early return.

Handle the case when a signal interrupts io_uring_enter(2) but the
syscall returns the number of SQEs submitted (that takes priority over
-EINTR).

This patch probably makes little difference for QEMU, but the test suite
relies on the exact pattern of aio_poll() return values, so it's best to
hide this io_uring syscall interface quirk.

Here is the strace of test-aio receiving 3 SIGCONT signals after this
fix has been applied. Notice how the io_uring_enter(2) return value is 1
the first time because an SQE was submitted, but -EINTR the other times:

  eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = 9
  io_uring_enter(7, 1, 0, 0, NULL, 8) = 1
  clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe38a46240) = 0
  io_uring_enter(7, 1, 1, IORING_ENTER_GETEVENTS, NULL, 8) = 1
  --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=596096, si_uid=1000} ---
  io_uring_enter(7, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8) = -1 EINTR (Interrupted system call)
  --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=596096, si_uid=1000} ---
  io_uring_enter(7, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8 <unfinished ...>
  <... io_uring_enter resumed>) = -1 EINTR (Interrupted system call)
  --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=596096, si_uid=1000} ---
  io_uring_enter(7, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8 <unfinished ...>
  <... io_uring_enter resumed>) = 0

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20251104022933.618123-4-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 17:32:48 +01:00
Stefan Hajnoczi
c31a445749 aio-posix: fix fdmon-io_uring.c timeout stack variable lifetime
io_uring_prep_timeout() stashes a pointer to the timespec struct rather
than copying its fields. That means the struct must live until after the
SQE has been submitted by io_uring_enter(2). add_timeout_sqe() violates
this constraint because the SQE is not submitted within the function.

Inline add_timeout_sqe() into fdmon_io_uring_wait() so that the struct
lives at least as long as io_uring_enter(2).

This fixes random hangs (bogus timeout values) when the kernel loads
undefined timespec struct values from userspace after the original
struct on the stack has been destroyed.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20251104022933.618123-3-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 17:32:48 +01:00
Stefan Hajnoczi
dbf70f0a03 aio-posix: fix race between io_uring CQE and AioHandler deletion
When an AioHandler is enqueued on ctx->submit_list for removal, the
fill_sq_ring() function will submit an io_uring POLL_REMOVE operation to
cancel the in-flight POLL_ADD operation.

There is a race when another thread enqueues an AioHandler for deletion
on ctx->submit_list when the POLL_ADD CQE has already appeared. In that
case POLL_REMOVE is unnecessary. The code already handled this, but
forgot that the AioHandler itself is still on ctx->submit_list when the
POLL_ADD CQE is being processed. It's unsafe to delete the AioHandler at
that point in time (use-after-free).

Solve this problem by keeping the AioHandler alive but setting a flag so
that it will be deleted by fill_sq_ring() when it runs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251104022933.618123-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 17:32:48 +01:00
Vladimir Sementsov-Ogievskiy
20aa05edc2 util/hexdump: fix QEMU_HEXDUMP_LINE_WIDTH logic
QEMU_HEXDUMP_LINE_WIDTH calculation doesn't correspond to
qemu_hexdump_line(). This leads to last line of the dump (when
length is not multiply of 16) has badly aligned ASCII part.

Let's calculate length the same way.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251031190246.257153-2-vsementsov@yandex-team.ru>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-11-03 11:59:32 +01:00
Alex Bennée
91634cc331 timers: properly prefix init_clocks()
Otherwise we run the risk of name clashing, for example with
stm32l4x5_usart-test.c should we shuffle the includes.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251030173302.1379174-1-alex.bennee@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-11-03 11:59:32 +01:00
Peter Xu
c89d1c879a bql: Fix bql_locked status with condvar APIs
QEMU has a per-thread "bql_locked" variable stored in TLS section, showing
whether the current thread is holding the BQL lock.

It's a pretty handy variable.  Function-wise, QEMU have codes trying to
conditionally take bql, relying on the var reflecting the locking status
(e.g. BQL_LOCK_GUARD), or in a GDB debugging session, we could also look at
the variable (in reality, co_tls_bql_locked), to see which thread is
currently holding the bql.

When using that as a debugging facility, sometimes we can observe multiple
threads holding bql at the same time. It's because QEMU's condvar APIs
bypassed the bql_*() API, hence they do not update bql_locked even if they
have released the mutex while waiting.

It can cause confusion if one does "thread apply all p co_tls_bql_locked"
and see multiple threads reporting true.

Fix this by moving the bql status updates into the mutex debug hooks.  Now
the variable should always reflect the reality.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250904223158.1276992-1-peterx@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-11-03 11:59:32 +01:00
Akihiko Odaki
55d98e3ede rcu: Unify force quiescent state
Borrow the concept of force quiescent state from Linux to ensure readers
remain fast during normal operation and to avoid stalls.

Background
==========

The previous implementation had four steps to begin reclamation.

1. call_rcu_thread() would wait for the first callback.

2. call_rcu_thread() would periodically poll until a decent number of
   callbacks piled up or it timed out.

3. synchronize_rcu() would statr a grace period (GP).

4. wait_for_readers() would wait for the GP to end. It would also
   trigger the force_rcu notifier to break busy loops in a read-side
   critical section if drain_call_rcu() had been called.

Problem
=======

The separation of waiting logic across these steps led to suboptimal
behavior:

The GP was delayed until call_rcu_thread() stops polling.

force_rcu was not consistently triggered when call_rcu_thread() detected
a high number of pending callbacks or a timeout. This inconsistency
sometimes led to stalls, as reported in a virtio-gpu issue where memory
unmapping was blocked[1].

wait_for_readers() imposed unnecessary overhead in non-urgent cases by
unconditionally executing qatomic_set(&index->waiting, true) and
qemu_event_reset(&rcu_gp_event), which are necessary only for expedited
synchronization.

Solution
========

Move the polling in call_rcu_thread() to wait_for_readers() to prevent
the delay of the GP. Additionally, reorganize wait_for_readers() to
distinguish between two states:

Normal State: it relies exclusively on periodic polling to detect
the end of the GP and maintains the read-side fast path.

Force Quiescent State: Whenever expediting synchronization, it always
triggers force_rcu and executes both qatomic_set(&index->waiting, true)
and qemu_event_reset(&rcu_gp_event). This avoids stalls while confining
the read-side overhead to this state.

This unified approach, inspired by the Linux RCU, ensures consistent and
efficient RCU grace period handling and confirms resolution of the
virtio-gpu issue.

[1] https://lore.kernel.org/qemu-devel/20251014111234.3190346-9-alex.bennee@linaro.org/

Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20251016-force-v1-1-919a82112498@rsg.ci.i.u-tokyo.ac.jp
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-28 12:39:59 +01:00
Philippe Mathieu-Daudé
2ff8c9a298 buildsys: Remove support for 32-bit PPC hosts
Stop detecting 32-bit PPC host as supported.
See previous commit for rationale.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
[rth: Retain _ARCH_PPC64 check in udiv_qrnnd]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20251014173900.87497-4-philmd@linaro.org>
2025-10-16 14:58:53 -07:00
Paolo Bonzini
67913e95bf timer: constify some functions
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-14 11:04:11 +02:00
Paolo Bonzini
5142397c79 async: access bottom half flags with qatomic_read
Running test-aio-multithread under TSAN reveals data races on bh->flags.
Because bottom halves may be scheduled or canceled asynchronously,
without taking a lock, adjust aio_compute_bh_timeout() and aio_ctx_check()
to use a relaxed read to access the flags.

Use an acquire load to ensure that anything that was written prior to
qemu_bh_schedule() is visible.

Closes: https://gitlab.com/qemu-project/qemu/-/issues/2749
Closes: https://gitlab.com/qemu-project/qemu/-/issues/851
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-14 11:03:59 +02:00