The original design of QIOTask was intended to simplify lifecycle
management by automatically freeing it when the task was marked as
complete. This overlooked the fact that when a QIOTask is used in
combination with a GSource, there may be times when the source
callback is never invoked. This typically happens when a GSource is
released before any I/O event arrives. In such cases it is not
desirable to mark a QIOTask as complete, but it still needs to be
freed. To satisfy this, the task must be released manually.
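As a rough illustration of the ownership change described above, the
following sketch uses a stand-in ToyTask type rather than the real QIOTask
API; every name in it is hypothetical. It only shows the idea that marking a
task complete no longer frees it, so the owner frees it explicitly on both
the "completed" path and the "callback never ran" path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a task object; this is not the real QIOTask. */
typedef struct ToyTask {
    bool completed;
    void (*func)(struct ToyTask *task, void *opaque);
    void *opaque;
} ToyTask;

/* Number of tasks allocated and not yet freed, used to spot leaks. */
static int toy_live_tasks;

static ToyTask *toy_task_new(void (*func)(ToyTask *, void *), void *opaque)
{
    ToyTask *task = calloc(1, sizeof(*task));
    assert(task != NULL);
    task->func = func;
    task->opaque = opaque;
    toy_live_tasks++;
    return task;
}

/* Marking the task complete runs the callback but does NOT free the task. */
static void toy_task_complete(ToyTask *task)
{
    task->completed = true;
    if (task->func) {
        task->func(task, task->opaque);
    }
}

/* Freeing is a separate, explicit step that is valid in either state. */
static void toy_task_free(ToyTask *task)
{
    if (task) {
        free(task);
        toy_live_tasks--;
    }
}

/* Example callback: counts how many times a task was completed. */
static void toy_count_cb(ToyTask *task, void *opaque)
{
    (void)task;
    (*(int *)opaque)++;
}
```

A caller whose event source is torn down before any I/O arrives would call
only toy_task_free(); a caller whose callback did run would call
toy_task_complete() followed by toy_task_free().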
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
A new machine-specific option 'x-change-vmfd-on-reset' is introduced for
debugging and testing only (hence the 'x-' prefix). When enabled, this option
forces the KVM VM file descriptor to be changed upon guest reset, as happens
for confidential guests. This can be used to exercise the code changes that
are specific to confidential guests on non-confidential guests as well
(except changes that require hardware support for confidential guests).
The next patch adds a functional test that uses this new parameter to test
the VM file descriptor changes.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-33-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When new vcpu file descriptors are created and bound to the new kvm file
descriptor as a part of the confidential guest reset mechanism, various
subsystems need to know about it. This change adds notifiers so that these
subsystems can take appropriate action when vcpu fds change by registering
their handlers with this notifier.
Subsequent changes will register specific handlers with this notifier.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-31-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Confidential guests must reload their BIOS ROM upon reset. This is because
BIOS memory is encrypted and, upon reset, the contents of the old BIOS memory
are lost and cannot be reused. To this end, export a new x86 function
x86_bios_rom_reload() to reload the BIOS. This function will be used in
subsequent patches.
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-14-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Various subsystems might need to take some steps before the KVM file descriptor
for a virtual machine is changed. So a new boolean attribute is added to the
vmfd_notifier structure which is passed to the notifier callbacks.
vmfd_notifier.pre is true for pre-notification of a vmfd change and false for
post-notification. Notifier callback implementations can simply check
the boolean value of (vmfd_notifier*)->pre and take actions for pre or
post vmfd change based on the value.
Subsequent patches will add callback implementations for specific components
that need this pre-notification.
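The pre/post notification scheme described above can be sketched roughly as
follows. This is a self-contained toy, not the QEMU notifier API: the
ToyVmfdNotifier type, the fixed-size callback table and all toy_* functions
are assumptions made for illustration. Only the idea of a boolean 'pre'
field passed to each callback comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical payload; only the 'pre' field is taken from the description. */
typedef struct ToyVmfdNotifier {
    bool pre; /* true before the vmfd changes, false afterwards */
} ToyVmfdNotifier;

typedef void (*ToyVmfdCallback)(const ToyVmfdNotifier *n, void *opaque);

#define TOY_MAX_CALLBACKS 8

static struct {
    ToyVmfdCallback cb;
    void *opaque;
} toy_callbacks[TOY_MAX_CALLBACKS];
static size_t toy_nr_callbacks;

/* Register a subsystem handler; returns false if the table is full. */
static bool toy_vmfd_register(ToyVmfdCallback cb, void *opaque)
{
    assert(cb != NULL);
    if (toy_nr_callbacks == TOY_MAX_CALLBACKS) {
        return false;
    }
    toy_callbacks[toy_nr_callbacks].cb = cb;
    toy_callbacks[toy_nr_callbacks].opaque = opaque;
    toy_nr_callbacks++;
    return true;
}

/* Call every registered handler with the given phase. */
static void toy_vmfd_notify(bool pre)
{
    ToyVmfdNotifier n = { .pre = pre };
    for (size_t i = 0; i < toy_nr_callbacks; i++) {
        toy_callbacks[i].cb(&n, toy_callbacks[i].opaque);
    }
}

/* Simulated change: notify with pre=true, swap the fd, notify with pre=false. */
static void toy_change_vmfd(int *vmfd, int new_fd)
{
    toy_vmfd_notify(true);
    *vmfd = new_fd;
    toy_vmfd_notify(false);
}

/* Example handler: one callback serves both phases by checking n->pre. */
typedef struct ToyCounts {
    int pre_calls;
    int post_calls;
} ToyCounts;

static void toy_subsystem_cb(const ToyVmfdNotifier *n, void *opaque)
{
    ToyCounts *c = opaque;
    if (n->pre) {
        c->pre_calls++;   /* e.g. release state tied to the old fd */
    } else {
        c->post_calls++;  /* e.g. re-create state on the new fd */
    }
}
```

Using a single callback with a phase flag, rather than two separate notifier
lists, keeps registration in one place for each subsystem.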
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-9-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A notifier callback can be used by various subsystems to perform actions when
the KVM file descriptor for a virtual machine changes as a part of the
confidential guest reset process. This change adds this notifier mechanism.
Subsequent patches will add specific implementations of the notifier callbacks
for the various subsystems that need to take action when the KVM VM file
descriptor changes.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-8-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This change adds common KVM-specific support for handling a KVM VM file
descriptor change. The KVM VM file descriptor can change as a part of the
confidential guest reset mechanism. A new per-architecture function,
kvm_arch_on_vmfd_change(), is added in order to implement the
architecture-specific changes required to support it. A subsequent patch will
add the x86-specific implementation of kvm_arch_on_vmfd_change(), as currently
only x86 supports confidential guest reset.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-6-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After the guest KVM file descriptor has changed as a part of the
confidential guest reset mechanism, existing memory needs to be reattached to
the new file descriptor. This change adds a helper function ram_block_rebind()
for this purpose. The next patch will make use of this function.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-5-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When a confidential virtual machine is reset, a new guest context in the
accelerator must be generated post reset. Therefore, the old accelerator guest
file handle must be closed and a new one created. To this end, a per-accelerator
callback, "rebuild_guest", is introduced that is called when a confidential
guest is reset. Subsequent patches will introduce a specific implementation of
this callback for the KVM accelerator.
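A per-accelerator callback of this kind could look roughly like the sketch
below. The ToyAccelOps structure, the callback signature (an opaque state
pointer in, 0 or a negative value out) and the toy state are all assumptions
for illustration; only the callback name "rebuild_guest" and its purpose come
from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table; the real accelerator class is not shown here. */
typedef struct ToyAccelOps {
    const char *name;
    /* Optional hook: returns 0 on success, a negative value on failure. */
    int (*rebuild_guest)(void *accel_state);
} ToyAccelOps;

/* Generic reset path: accelerators without the hook cannot rebuild. */
static int toy_reset_confidential_guest(const ToyAccelOps *ops, void *state)
{
    assert(ops != NULL);
    if (!ops->rebuild_guest) {
        return -1;
    }
    return ops->rebuild_guest(state);
}

/* Toy accelerator state with a fake guest handle. */
typedef struct ToyAccelState {
    int guest_fd;
    int generation;
} ToyAccelState;

static int toy_rebuild_guest(void *opaque)
{
    ToyAccelState *s = opaque;
    s->guest_fd = -1;                   /* stands in for closing the old handle */
    s->generation++;
    s->guest_fd = 100 + s->generation;  /* stands in for creating a new handle */
    return 0;
}
```

Leaving the hook NULL for accelerators that do not support it lets the
generic reset code report the failure instead of each accelerator having to
provide a stub.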
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-4-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As a part of the confidential guest reset process, the existing encrypted guest
state must be made mutable since it would be discarded after reset. A new
encrypted and locked guest state must be established after the reset. To this
end, a new boolean member per confidential guest support class
(e.g. tdx or sev-snp) is added that indicates whether it is possible to
rebuild guest state:
bool can_rebuild_guest_state;
This is true if rebuilding guest state is possible, false otherwise.
A KVM-based confidential guest reset is only possible when
the existing state is locked but it is possible to rebuild guest state.
Otherwise, the guest is not resettable.
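The resettability rule above can be expressed as a small predicate. The
sketch below is a toy model: ToyCgsClass, ToyGuest and toy_guest_resettable()
are hypothetical names, and treating a guest whose state is not locked as
always resettable is an assumption of this sketch. Only the
can_rebuild_guest_state member comes from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a confidential guest support class. */
typedef struct ToyCgsClass {
    bool can_rebuild_guest_state;
} ToyCgsClass;

typedef struct ToyGuest {
    const ToyCgsClass *cgs; /* NULL for a non-confidential guest (assumption) */
    bool state_locked;      /* true once the encrypted state is finalized */
} ToyGuest;

static bool toy_guest_resettable(const ToyGuest *g)
{
    assert(g != NULL);
    if (!g->state_locked) {
        /* Assumption: unlocked state can be reset the ordinary way. */
        return true;
    }
    /* Locked state: reset works only if the state can be rebuilt. */
    return g->cgs != NULL && g->cgs->can_rebuild_guest_state;
}
```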
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-3-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a machine model to spawn a Nitro Enclave. Unlike the existing -M
nitro-enclave, this machine model works exclusively with the -accel
nitro accelerator to drive real Nitro Enclave creation. It supports
memory allocation and CPU count selection on both x86_64 and
aarch64, and implements the Enclave heartbeat logic and a debug serial
console.
To use it, create an EIF file and run
$ qemu-system-x86_64 -accel nitro,debug-mode=on -M nitro -nographic \
-kernel test.eif
or
$ qemu-system-aarch64 -accel nitro,debug-mode=on -M nitro -nographic \
-kernel test.eif
Signed-off-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20260225220807.33092-9-graf@amazon.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nitro Enclaves expect the parent instance to host a vsock heartbeat listener
at port 9000. To host a Nitro Enclave with the nitro accel in QEMU, add
such a heartbeat listener as a device model, so that the machine can
easily instantiate it.
Signed-off-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20260225220807.33092-7-graf@amazon.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nitro Enclaves support a special "debug" mode. When in debug mode, the
Nitro Hypervisor provides a vsock port that the parent can connect to in order
to receive serial console output of the Enclave. Add a new nitro-serial-vsock
driver that implements short-circuit logic to establish the vsock
connection to that port and feed its data into a chardev, so that a machine
model can use it as a serial device.
Signed-off-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20260225220807.33092-6-graf@amazon.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nitro Enclaves are a confidential compute technology which
allows a parent instance to carve out resources from itself
and spawn a confidential sibling VM next to itself. Similar
to other confidential compute solutions, this sibling is
controlled by an underlying VMM, but still has a higher-level
VMM (QEMU) to implement some of its I/O functionality and
lifecycle.
Add an accelerator to drive this interface. In combination with
follow-on patches to enhance the Nitro Enclaves machine model, this
will allow users to run a Nitro Enclave using QEMU.
Signed-off-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20260225220807.33092-5-graf@amazon.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a dedicated bus for Nitro Enclave vsock devices. In Nitro Enclaves,
communication between parent and enclave/hypervisor happens almost
exclusively through vsock. The nitro-vsock-bus models this dependency
in QEMU, which allows devices in this bus to implement individual services
on top of vsock.
The nitro machine spawns this bus by creating the included
nitro-vsock-bridge sysbus device.
The nitro accel then advertises the Enclave's CID to the bus by calling
nitro_vsock_bridge_start_enclave() on the bridge device as soon as it
knows the CID.
Nitro vsock devices can listen to that event and learn the Enclave's CID
when it is available to perform actions, such as connecting to the debug
serial vsock port.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20260225220807.33092-4-graf@amazon.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We log a GUEST_ERROR message "PL011 data written to disabled UART" if
the guest writes data to the TX FIFO when it has not set the enable
bit in the UART. The idea is to note that the guest has done
something dubious but let it work anyway. However, since we print
this message for every output character, it floods the logs when
running a guest that does this.
Keep a note of whether we've printed the log message or not, so we
only output it once. If the guest actively disables the UART, we
re-arm the log message.
Notably, the Linux kernel does not bother to enable the UART if it is
used for earlycon, relying on the firmware having already done that.
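The log-once behaviour described above can be sketched with a small toy
model. This is not the PL011 device code: ToyUart, its fields and the toy_*
functions are hypothetical, and re-arming on the enabled-to-disabled
transition is this sketch's reading of "actively disables".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical UART model; field names are not taken from QEMU. */
typedef struct ToyUart {
    bool enabled;
    bool warned_disabled_tx; /* true once the warning has been printed */
    int warnings;            /* how many warnings were emitted */
    int tx_chars;            /* how many characters were transmitted */
} ToyUart;

static void toy_uart_set_enabled(ToyUart *u, bool enabled)
{
    assert(u != NULL);
    if (u->enabled && !enabled) {
        /* The guest actively disabled the UART: re-arm the warning. */
        u->warned_disabled_tx = false;
    }
    u->enabled = enabled;
}

static void toy_uart_write_tx(ToyUart *u, char ch)
{
    assert(u != NULL);
    if (!u->enabled && !u->warned_disabled_tx) {
        fprintf(stderr, "toy-uart: data written to disabled UART\n");
        u->warned_disabled_tx = true;
        u->warnings++;
    }
    /* The write is let through regardless, as in the description. */
    (void)ch;
    u->tx_chars++;
}
```

With this shape, a guest that never enables the UART produces a single
warning no matter how many characters it writes, while a guest that enables
and later disables the UART gets one fresh warning.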
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Mohamed Mediouni <mohamed@unpredictable.fr>
Message-id: 20260210101702.3980804-1-peter.maydell@linaro.org
Windows ARM64 guests detect virtio-mmio devices declared in ACPI
tables even when no backend is attached. This causes "Unknown
devices" (ACPI\LNRO0005) to appear in Device Manager.
Until Windows fixes that, add a new machine
property 'virtio-mmio-transports' to control the number of
virtio-mmio transports instantiated. The default remains
NUM_VIRTIO_TRANSPORTS (32) for backward compatibility.
Setting it to 0 allows users to disable virtio-mmio entirely.
Usage: -machine virt,virtio-mmio-transports=0
Signed-off-by: Mohammadfaiz Bawa <mbawa@redhat.com>
Message-id: 20260219173256.152743-1-mbawa@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Remove the need for a per-target QEMU_ARCH definition. Define the
QEMU_ARCH_* constants based on the SYS_EMU_TARGET_* ones, which are
already exposed via target_arch(), making it possible to check
whether the current target is included in @arch_bitmask.
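The idea of deriving one bit per architecture from a target enumeration can
be sketched as follows. The ToyTarget enum, the TOY_ARCH_* macros and
toy_arch_available() are hypothetical stand-ins, not the QEMU definitions,
and the current target is passed in explicitly here so the example stays
self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical target enumeration standing in for SYS_EMU_TARGET_*. */
typedef enum ToyTarget {
    TOY_TARGET_ARM,
    TOY_TARGET_I386,
    TOY_TARGET_RISCV,
    TOY_TARGET_MAX
} ToyTarget;

/* A uint32_t mask only works while there are at most 32 targets. */
_Static_assert(TOY_TARGET_MAX <= 32, "too many targets for a 32-bit mask");

/* Each architecture constant is derived from its enum value. */
#define TOY_ARCH_BIT(t)  (UINT32_C(1) << (t))
#define TOY_ARCH_ARM     TOY_ARCH_BIT(TOY_TARGET_ARM)
#define TOY_ARCH_I386    TOY_ARCH_BIT(TOY_TARGET_I386)
#define TOY_ARCH_RISCV   TOY_ARCH_BIT(TOY_TARGET_RISCV)

/* True if the current target's bit is set in arch_bitmask. */
static bool toy_arch_available(ToyTarget current, uint32_t arch_bitmask)
{
    assert(current < TOY_TARGET_MAX);
    return (arch_bitmask & TOY_ARCH_BIT(current)) != 0;
}
```

Because the constants are computed from the enum, no per-target definition
is needed: the same binary can test any mask against whichever target it is
running as.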
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20260213175032.32121-5-philmd@linaro.org>
qemu_arch_available() is used to check if a broadly available
feature should be exposed to a particular set of target
architectures.
Since its argument is a mask of bits, rename it to @arch_bitmask.
We have fewer than 32 target architectures so far, so restrict it
to the uint32_t type.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20260213175032.32121-2-philmd@linaro.org>
The endianness field used an int to represent a boolean concept, with
0 meaning little-endian and 1 meaning big-endian. This required runtime
validation to reject invalid values and made the code less readable.
Replace it with a bool big_endian field that is self-documenting and
type-safe. The compiler now enforces valid values, eliminating the
need for the validation check in audio_validate_settings().
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
The audio_pcm_info structure stored three fields (bits, is_signed,
is_float) that were always derived from the AudioFormat enum. This
redundancy meant the same information was represented twice, with no
type-level guarantee that they stayed in sync.
Replace these fields with a single AudioFormat field, and add helper
functions to extract the derived properties when needed:
- audio_format_bits()
- audio_format_is_signed()
- audio_format_is_float()
This improves type safety by making AudioFormat the single source of
truth, eliminating the possibility of inconsistent state between the
format enum and its derived boolean/integer representations.
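The "single source of truth" approach can be sketched as below. The
ToyAudioFormat enum and ToyPcmInfo structure are local stand-ins rather than
the QEMU types, the toy_ prefix marks every function as hypothetical, and
treating the float format as signed is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical format enum; a stand-in for the real AudioFormat. */
typedef enum ToyAudioFormat {
    TOY_FMT_U8,
    TOY_FMT_S8,
    TOY_FMT_U16,
    TOY_FMT_S16,
    TOY_FMT_U32,
    TOY_FMT_S32,
    TOY_FMT_F32
} ToyAudioFormat;

/* Sample width in bits, derived from the format on demand. */
static int toy_audio_format_bits(ToyAudioFormat fmt)
{
    switch (fmt) {
    case TOY_FMT_U8:
    case TOY_FMT_S8:
        return 8;
    case TOY_FMT_U16:
    case TOY_FMT_S16:
        return 16;
    case TOY_FMT_U32:
    case TOY_FMT_S32:
    case TOY_FMT_F32:
        return 32;
    }
    assert(!"unknown audio format");
    return 0;
}

static bool toy_audio_format_is_float(ToyAudioFormat fmt)
{
    return fmt == TOY_FMT_F32;
}

/* Assumption of this sketch: float samples count as signed. */
static bool toy_audio_format_is_signed(ToyAudioFormat fmt)
{
    return fmt == TOY_FMT_S8 || fmt == TOY_FMT_S16 ||
           fmt == TOY_FMT_S32 || fmt == TOY_FMT_F32;
}

/* The info struct stores only the format, not bits/is_signed/is_float. */
typedef struct ToyPcmInfo {
    ToyAudioFormat af;
    int freq;
    int nchannels;
} ToyPcmInfo;

static int toy_pcm_bytes_per_frame(const ToyPcmInfo *info)
{
    return toy_audio_format_bits(info->af) / 8 * info->nchannels;
}
```

Since the derived values are computed from the enum each time, there is no
stored copy that could drift out of sync with the format.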
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Reviewed-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Code clean-up, to allow building the bare abstract class separately.
The original file is MIT-licensed.
Reviewed-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>