QEMU SMMUv3 currently reports no SubstreamID support, forcing SSID to
zero. This prevents accelerated use cases such as Shared Virtual
Addressing (SVA), which require multiple Stage-1 context descriptors
indexed by SubstreamID.
Add a new "ssidsize" property to explicitly configure the number of bits
used for SubstreamIDs. A value greater than zero enables SubstreamID
support and advertises PASID capability to the vIOMMU.
The requested SSIDSIZE is validated against host SMMUv3 capabilities and
is only supported when accel=on.
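For illustration, enabling 16-bit SubstreamIDs might look like the
following sketch (apart from ssidsize and accel, the device and property
names are assumptions, not taken from this patch):

  -device arm-smmuv3,primary-bus=pcie.0,accel=on,ssidsize=16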
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Message-id: 20260126104342.253965-38-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Add support for synthesizing a PCIe PASID extended capability for
vfio-pci devices when PASID is enabled via a vIOMMU and supported by
the host IOMMU backend.
PASID capability parameters are retrieved via IOMMUFD APIs and the
capability is inserted into the PCIe extended capability list using
the insertion helper. A new x-vpasid-cap-offset property allows
explicit control over the placement; by default the capability is
placed at the end of the PCIe extended configuration space.
If the kernel does not expose PASID information or insertion fails,
the device continues without PASID support.
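A hedged usage sketch (the host BDF, iommufd id and offset are purely
illustrative; only x-vpasid-cap-offset comes from this patch):

  -device vfio-pci,host=0000:75:00.1,iommufd=iommufd0,x-vpasid-cap-offset=0x200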
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Message-id: 20260126104342.253965-37-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Add pcie_insert_capability(), a helper to insert a PCIe extended
capability into an existing extended capability list at a caller
specified offset.
Unlike pcie_add_capability(), which always appends a capability to the
end of the list, this helper preserves the existing list ordering while
allowing insertion at an arbitrary offset.
The helper only validates that the insertion does not overwrite an
existing PCIe extended capability header, since corrupting a header
would break the extended capability linked list. Validation of overlaps
with other configuration space registers or capability-specific
register blocks is left to the caller.
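A hypothetical call site, assuming the new helper mirrors the
pcie_add_capability() signature (the PASID constants below come from the
standard Linux pci_regs.h headers; the offset is illustrative):

  /* Sketch only: insert an 8-byte PASID capability at offset 0x200,
   * assuming pcie_insert_capability(dev, cap_id, cap_ver, offset, size). */
  pcie_insert_capability(pdev, PCI_EXT_CAP_ID_PASID, 1,
                         0x200, PCI_EXT_CAP_PASID_SIZEOF);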
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Message-id: 20260126104342.253965-35-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
QEMU SMMUv3 currently sets the output address size (OAS) to 44 bits.
With accelerator mode enabled, a device may use SVA, where CPU page tables
are shared with the SMMU, requiring an OAS at least as large as the
CPU’s output address size. A user option is added to configure this.
However, the OAS value advertised by the virtual SMMU must remain
compatible with the capabilities of the host SMMUv3. In accelerated
mode, the host SMMU performs stage-2 translation and must be able to
consume the intermediate physical addresses (IPA) produced by stage-1.
The OAS exposed by the virtual SMMU defines the maximum IPA width that
stage-1 translations may generate. For AArch64 implementations, the
maximum usable IPA size on the host SMMU is determined by its own OAS.
Check that the configured OAS does not exceed what the host SMMU
can safely support.
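The check itself reduces to a comparison of address-size bit counts; a
minimal sketch, with variable names that are illustrative rather than
taken from the code:

  /* Sketch: reject a vSMMU OAS wider than the host SMMU's OAS, since
   * stage-1 output IPAs must be consumable by the host stage-2. */
  if (vsmmu_oas_bits > host_oas_bits) {
      error_setg(errp, "OAS %u exceeds host SMMUv3 limit of %u bits",
                 vsmmu_oas_bits, host_oas_bits);
      return false;
  }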
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Message-id: 20260126104342.253965-32-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
QEMU SMMUv3 does not enable ATS (Address Translation Services) by default.
When accelerated mode is enabled and the host SMMUv3 supports ATS, it can
be useful to report ATS capability to the guest so it can take advantage
of it if the device also supports ATS.
Note: ATS support cannot be reliably detected from the host SMMUv3 IDR
registers alone, as firmware ACPI IORT tables may override them. The
user must therefore verify host ATS support before enabling it.
The ATS support enabled here is only relevant for vfio-pci endpoints,
as SMMUv3 accelerated mode does not support emulated endpoint devices.
QEMU’s SMMUv3 implementation still lacks support for handling ATS
translation requests, which would be required for emulated endpoints.
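An illustrative command line (a sketch; apart from accel, the property
name is an assumption based on this description):

  -device arm-smmuv3,primary-bus=pcie.0,accel=on,ats=on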
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Message-id: 20260126104342.253965-31-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
To handle SMMUv3 accel=on mode (which configures the host SMMUv3 in
nested mode), it is practical to expose to the guest reserved memory
regions (RMRs) covering the IOVAs used by the host kernel to map
physical MSI doorbells.
Those IOVAs fall within [0x8000000, 0x8100000], matching the
MSI_IOVA_BASE and MSI_IOVA_LENGTH definitions in the kernel arm-smmu-v3
driver. This is the window used to allocate IOVAs for physical MSI
doorbells.
With those RMRs, the guest is forced to use a flat mapping for this range.
Hence the assigned device is programmed with one IOVA from this range.
Stage 1, owned by the guest, has a flat mapping for this IOVA. Stage 2,
owned by the VMM, then enforces a mapping from this IOVA to the physical
MSI doorbell.
The creation of those RMR nodes is only relevant if nested stage SMMU is
in use, along with VFIO. As VFIO devices can be hotplugged, all RMRs need
to be created in advance.
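For reference, the window matches the software MSI region definitions in
the Linux arm-smmu-v3 driver:

  /* Linux drivers/iommu/arm/arm-smmu-v3: IOVA window reserved for
   * mapping physical MSI doorbells; the RMRs cover exactly this range. */
  #define MSI_IOVA_BASE    0x8000000
  #define MSI_IOVA_LENGTH  0x100000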
Initialise AcpiIortSMMUv3Dev structures to avoid using uninitialised
state when building the IORT, as the legacy and device SMMUv3 paths
now populate different fields (e.g. accel).
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Message-id: 20260126104342.253965-26-skolothumtho@nvidia.com
Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Introduce a new pci_preserve_config field in the virt machine state which
allows the generation of _DSM #5. This field is only set if an accel SMMU
is instantiated.
In a subsequent patch, SMMUv3 accel mode will make use of IORT RMR nodes
to enable nested translation of MSI doorbell addresses. IORT RMR requires
_DSM #5 to be set for the PCI host bridge so that the Guest kernel
preserves the PCI boot configuration.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Message-id: 20260126104342.253965-24-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Add a 'preserve_config' field in struct GPEXConfig and, if set, generate
the _DSM function #5 for preserving PCI boot configurations.
This will be used for SMMUv3 accel=on support in a subsequent patch. When
SMMUv3 acceleration (accel=on) is enabled, QEMU exposes IORT Reserved
Memory Region (RMR) nodes to support MSI doorbell translations. As per
the Arm IORT specification, using IORT RMRs mandates the presence of
_DSM function #5 so that the OS retains the firmware-assigned PCI
configuration. Hence, this patch adds conditional support for generating
_DSM #5.
According to the ACPI Specification, Revision 6.6, Section 9.1.1 -
“_DSM (Device Specific Method)”,
"
If Function Index is zero, the return is a buffer containing one bit for
each function index, starting with zero. Bit 0 indicates whether there
is support for any functions other than function 0 for the specified
UUID and Revision ID. If set to zero, no functions are supported (other
than function zero) for the specified UUID and Revision ID. If set to
one, at least one additional function is supported. For all other bits
in the buffer, a bit is set to zero to indicate if that function index
is not supported for the specific UUID and Revision ID. (For example,
bit 1 set to 0 indicates that function index 1 is not supported for the
specific UUID and Revision ID.)
"
Please refer to the PCI Firmware Specification, Revision 3.3, Section
4.6.5 — "_DSM for Preserving PCI Boot Configurations", for Function 5 of
the _DSM method.
Also, while at it, move the byte_list declaration to the top of the
function for clarity.
At the moment, DSM generation is not yet enabled.
The resulting AML when preserve_config=true is:
Method (_DSM, 4, NotSerialized)
{
    If ((Arg0 == ToUUID ("e5c937d0-3553-4d7a-9117-ea4d19c3434d")))
    {
        If ((Arg2 == Zero))
        {
            Return (Buffer (One)
            {
                 0x21
            })
        }
        If ((Arg2 == 0x05))
        {
            Return (Zero)
        }
    }
    ...
}
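The 0x21 in the Function 0 reply follows from the quoted rule: 0x21 is
binary 0010_0001, so bit 0 is set (at least one function beyond function
0 is supported) and bit 5 is set (Function 5, "Preserving PCI Boot
Configurations", is supported); all other function indexes are reported
as unsupported.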
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20260126104342.253965-23-skolothumtho@nvidia.com
[Shameer: Removed possible duplicate _DSM creations]
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Just before the device gets attached to the SMMUv3, make sure QEMU SMMUv3
features are compatible with the host SMMUv3.
Not all fields in the host SMMUv3 IDR registers are meaningful for userspace.
Only the following fields can be used:
- IDR0: ST_LEVEL, TERM_MODEL, STALL_MODEL, TTENDIAN, CD2L, ASID16, TTF
- IDR1: SIDSIZE, SSIDSIZE
- IDR3: BBML, RIL
- IDR5: VAX, GRAN64K, GRAN16K, GRAN4K
For now, the check is to make sure the features are in sync to enable
basic accelerated SMMUv3 support. AIDR is not checked, as hardware
implementations often provide a mix of architecture features regardless
of the revision reported in AIDR.
Note that the SSIDSIZE check will be added later when support for PASID
is introduced.
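A minimal sketch of the per-field comparison, using QEMU's register
field helpers (the surrounding code and the host_idr0 variable are
illustrative; ASID16 stands in for any of the IDR0 fields listed above):

  /* Sketch: a feature advertised by the vSMMU must also be present on
   * the host SMMUv3, otherwise the check fails. */
  if (FIELD_EX32(s->idr[0], IDR0, ASID16) &&
      !FIELD_EX32(host_idr0, IDR0, ASID16)) {
      error_setg(errp, "Host SMMUv3 lacks 16-bit ASID support");
      return false;
  }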
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Message-id: 20260126104342.253965-22-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Accelerated SMMUv3 instances rely on the physical SMMUv3 for nested
translation (guest Stage-1, host Stage-2). In this mode, the guest Stage-1
tables are programmed directly into hardware, and QEMU must not attempt to
walk them for translation, as doing so is not reliably safe. For vfio-pci
endpoints behind such a vSMMU, the only translation QEMU needs to perform
is for the MSI doorbell used during KVM MSI setup.
Implement the callback so that kvm_arch_fixup_msi_route() can retrieve the
MSI doorbell GPA directly, instead of attempting a software walk of the
guest translation tables.
Also introduce an SMMUv3 device property to carry the MSI doorbell GPA.
This property will be set by the virt machine in a subsequent patch.
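The callback itself can be trivial; a sketch with hypothetical names
(only the MSI doorbell GPA property is described by this patch):

  /* Sketch: with accel=on the doorbell GPA is fixed and known up front,
   * so report the property value instead of walking guest tables. */
  static uint64_t smmuv3_accel_get_msi_doorbell_gpa(SMMUv3State *s)
  {
      return s->msi_doorbell_gpa;   /* set by the virt machine */
  }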
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Message-id: 20260126104342.253965-18-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
For certain vIOMMU implementations, such as SMMUv3 in accelerated mode,
the translation tables are programmed directly into the physical SMMUv3
in a nested configuration. While QEMU knows where the guest tables live,
safely walking them in software would require trapping and ordering all
guest invalidations on every command queue. Without this, QEMU could race
with guest updates and walk stale or freed page tables.
This constraint is fundamental to the design of HW-accelerated vSMMU when
used with downstream vfio-pci endpoint devices, where QEMU must never walk
guest translation tables and must rely on the physical SMMU for
translation. Future accelerated vSMMU features, such as virtual CMDQ, will
also prevent trapping invalidations, reinforcing this restriction.
For vfio-pci endpoints behind such a vSMMU, the only translation QEMU
needs is for the MSI doorbell used when setting up KVM MSI route tables.
Instead of attempting a software walk, introduce an optional vIOMMU
callback that returns the MSI doorbell GPA directly.
kvm_arch_fixup_msi_route() uses this callback when available and ignores
the guest-provided IOVA in that case.
If the vIOMMU does not implement the callback, we fall back to the
existing IOMMU-based address space translation path.
This ensures correct MSI routing for accelerated SMMUv3 + VFIO passthrough
while avoiding unsafe software walks of guest translation tables.
As a related change, replace RCU_READ_LOCK_GUARD() with explicit
rcu_read_lock()/rcu_read_unlock(). The introduction of an early goto
(set_doorbell) path means the RCU read side critical section can no longer
be safely scoped using RCU_READ_LOCK_GUARD().
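The resulting control flow in kvm_arch_fixup_msi_route(), in outline
(simplified pseudo-C; the callback name is an assumption):

  rcu_read_lock();
  if (get_msi_doorbell_gpa(dev, &address)) {   /* new optional callback */
      goto set_doorbell;                       /* guest IOVA ignored */
  }
  /* ... existing IOMMU address space translation of the MSI address ... */
  set_doorbell:
  /* program route->u.msi with the resolved doorbell address */
  rcu_read_unlock();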
Cc: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Message-id: 20260126104342.253965-17-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
A device placed behind a vSMMU instance must have corresponding vSTEs
(bypass, abort, or translate) installed. The bypass and abort proxy nested
HWPTs are pre-allocated.
For the translate HWPT, a vDEVICE object is allocated and associated with the
vIOMMU for each guest device. This allows the host kernel to establish a
virtual SID to physical SID mapping, which is required for handling
invalidations and event reporting.
A translate HWPT is allocated based on the guest STE configuration and
attached to the device when the guest issues SMMU_CMD_CFGI_STE or
SMMU_CMD_CFGI_STE_RANGE, provided the STE enables S1 translation.
If the guest STE is invalid or S1 translation is disabled, the device is
attached to one of the pre-allocated ABORT or BYPASS HWPTs instead.
While at it, export smmu_find_ste() for use here.
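In outline, the CFGI_STE handling becomes something like the sketch
below (helper names other than smmu_find_ste() are assumptions):

  /* Sketch: locate the vSTE for the SID and pick the attach target. */
  smmu_find_ste(s, sid, &ste, &event);
  if (!ste_valid(&ste) || !ste_s1_enabled(&ste)) {
      attach_hwpt(dev, accel->bypass_or_abort_hwpt_id); /* pre-allocated */
  } else {
      hwpt_id = alloc_translate_hwpt(viommu, &ste);     /* vDEVICE-backed */
      attach_hwpt(dev, hwpt_id);
  }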
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Message-id: 20260126104342.253965-15-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Implement the VFIO/PCI callbacks to attach and detach a HostIOMMUDevice
to a vSMMUv3 when accel=on:
- set_iommu_device(): attach a HostIOMMUDevice to a vIOMMU
- unset_iommu_device(): detach and release associated resources
In SMMUv3 accel=on mode, the guest SMMUv3 is backed by the host SMMUv3 via
IOMMUFD. A vIOMMU object (created via IOMMU_VIOMMU_ALLOC) provides a per-VM,
security-isolated handle to the physical SMMUv3. Without a vIOMMU, the
vSMMUv3 cannot relay guest operations to the host hardware nor maintain
isolation across VMs or devices. Therefore, set_iommu_device() allocates
a vIOMMU object if one does not already exist.
There are two main points to consider in this implementation:
1) VFIO core allocates and attaches an S2 HWPT that acts as the nesting
parent for nested HWPTs (IOMMU_DOMAIN_NESTED). This parent HWPT will
be shared across multiple vSMMU instances within a VM.
2) A device cannot attach directly to a vIOMMU. Instead, it attaches
through a proxy nested HWPT (IOMMU_DOMAIN_NESTED). Based on the STE
configuration, there are three types of nested HWPTs: bypass, abort,
and translate.
- The bypass and abort proxy HWPTs are pre-allocated. When the SMMUv3
operates in global abort or bypass mode, as controlled by the GBPA
register, or the guest issues a vSTE for bypass or abort, we attach
these pre-allocated nested HWPTs.
- The translate HWPT requires a vDEVICE to be allocated first, since
invalidations and events depend on a valid vSID.
- The vDEVICE allocation and attach operations for vSTE-based HWPTs
are implemented in subsequent patches.
In summary, a device placed behind a vSMMU instance must have a vSID for
the translate vSTE. The bypass and abort vSTEs are pre-allocated as proxy
nested HWPTs and are attached based on the GBPA register. The core-managed
nesting parent S2 HWPT serves as the parent for all the nested HWPTs and
is intended to be shared across vSMMU instances within the same VM.
set_iommu_device():
- Reuse an existing vIOMMU for the same physical SMMU if available.
If not, allocate a new one using the nesting parent S2 HWPT.
- Pre-allocate two proxy nested HWPTs (bypass and abort) under the
vIOMMU and install one based on GBPA.ABORT value.
- Add the device to the vIOMMU’s device list.
unset_iommu_device():
- Re-attach device to the nesting parent S2 HWPT.
- Remove the device from the vIOMMU’s device list.
- If the list is empty, free the proxy HWPTs (bypass and abort)
and release the vIOMMU object.
Introduce struct SMMUv3AccelState, representing an accelerated SMMUv3
instance backed by an iommufd vIOMMU object, and storing the bypass and
abort proxy HWPT IDs.
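A sketch of that state (field names beyond those described in the text
are assumptions):

  /* Sketch: per-instance accel state backed by an iommufd vIOMMU. */
  typedef struct SMMUv3AccelState {
      uint32_t viommu_id;        /* handle from IOMMU_VIOMMU_ALLOC */
      uint32_t bypass_hwpt_id;   /* pre-allocated proxy nested HWPT */
      uint32_t abort_hwpt_id;    /* pre-allocated proxy nested HWPT */
      QLIST_HEAD(, SMMUv3AccelDevice) device_list; /* attached devices */
  } SMMUv3AccelState;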
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Message-id: 20260126104342.253965-13-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Accelerated SMMUv3 is only meaningful when a device can leverage the host
SMMUv3 in nested mode (S1+S2 translation). To keep the model consistent
and correct, this mode is restricted to vfio-pci endpoint devices using
the iommufd backend.
Non-endpoint emulated devices such as PCIe root ports and bridges are also
permitted so that vfio-pci devices can be attached downstream. All other
device types are unsupported in accelerated mode.
Implement the supports_address_space() callback to reject all such
unsupported devices.
This restriction also avoids complications with IOTLB invalidations. Some
TLBI commands (e.g. CMD_TLBI_NH_ASID) lack an associated SID, making it
difficult to trace the originating device. Allowing emulated endpoints
would require invalidating both QEMU’s software IOTLB and the host’s
hardware IOTLB, which can significantly degrade performance.
A key design choice is the address space returned for accelerated vfio-pci
endpoints. VFIO core has a container that manages an HWPT. By default, it
allocates a stage-1 normal HWPT, unless the vIOMMU requests a nesting
parent HWPT for accelerated cases.
VFIO core adds a listener for that HWPT and sets up a handler
vfio_container_region_add() where it checks the memory region.
- If the region is a non-IOMMU translated one (system address space), VFIO
treats it as RAM and handles all stage-2 mappings for the core-allocated
nesting parent HWPT.
- If the region is an IOMMU address space, VFIO instead enables IOTLB
notifier handling and translation replay, skipping the RAM listener and
therefore not installing stage-2 mappings.
For accelerated SMMUv3, correct operation requires the S1+S2 nesting
model, and therefore VFIO must take the "system address space" path so
that stage-2 mappings are properly built. Returning an alias of the
system address space ensures this happens. Returning the IOMMU address
space would omit stage-2 mapping and break nested translation.
Another option considered was forcing a pre-registration path using
vfio_prereg_listener() to set up stage-2 mappings, but this requires
changes in VFIO core and was not adopted. Returning an alias of the
system address space keeps the design aligned with existing VFIO/iommufd
nesting flows and avoids the need for cross-subsystem changes.
In summary:
- vfio-pci devices (with iommufd as backend) return an address space
aliased to the system address space.
- bridges and root ports return the IOMMU address space.
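Sketched, the get_address_space() decision looks like this (names
illustrative, not the actual code):

  /* Sketch: vfio-pci endpoints get the system-AS alias so VFIO builds
   * stage-2 mappings; bridges and root ports keep the IOMMU AS. */
  if (object_dynamic_cast(OBJECT(pdev), TYPE_VFIO_PCI)) {
      return &smmu->accel_as;    /* alias of the system address space */
  }
  return &sdev->as;              /* emulated bridges and root ports */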
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Message-id: 20260126104342.253965-11-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Introduce an optional supports_address_space() callback in PCIIOMMUOps to
allow a vIOMMU implementation to reject devices that should not be attached
to it.
Currently, get_address_space() is the first and mandatory callback into the
vIOMMU layer, which always returns an address space. For certain setups, such
as hardware accelerated vIOMMUs (e.g. ARM SMMUv3 with accel=on), attaching
emulated endpoint devices is undesirable as it may impact the behavior or
performance of VFIO passthrough devices, for example, by triggering
unnecessary invalidations on the host IOMMU.
The new callback allows a vIOMMU to check and reject unsupported devices
early during PCI device registration.
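In outline (the exact signature is defined by this patch; what follows
is an assumption sketched from the description):

  /* Sketch of PCIIOMMUOps with the new optional hook. */
  typedef struct PCIIOMMUOps {
      AddressSpace *(*get_address_space)(PCIBus *bus, void *opaque,
                                         int devfn);
      bool (*supports_address_space)(PCIBus *bus, void *opaque,
                                     int devfn, Error **errp);
      /* ... other existing callbacks ... */
  } PCIIOMMUOps;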
Cc: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Message-id: 20260126104342.253965-9-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
During PCI hotplug, in do_pci_register_device(), pci_init_bus_master()
is called before storing the pci_dev pointer in bus->devices[devfn].
This causes a problem if pci_init_bus_master() (via its
get_address_space() callback) attempts to retrieve the device using
pci_find_device(), since the PCI device is not yet visible on the bus.
Fix this by moving the pci_init_bus_master() call to after the device
has been added to bus->devices[devfn].
This prepares for a subsequent patch where the accel SMMUv3
get_address_space() callback retrieves the pci_dev to identify the
attached device type.
No functional change intended.
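The reordering, in outline (simplified from do_pci_register_device()):

  /* Before: pci_init_bus_master() ran while bus->devices[devfn] was
   * still NULL, so get_address_space() could not find the device.
   * After (sketch): publish the device on the bus first. */
  bus->devices[devfn] = pci_dev;
  pci_init_bus_master(pci_dev);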
Cc: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Message-id: 20260126104342.253965-8-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
To support accelerated SMMUv3 instances, introduce a shared system-wide
AddressSpace (shared_as_sysmem) that aliases the global system memory.
This shared AddressSpace will be used in a subsequent patch for all
vfio-pci devices behind all accelerated SMMUv3 instances within a VM.
Sharing a single system AddressSpace ensures that all devices behind
accelerated SMMUv3s use the same system address space pointer. This
allows VFIO/iommufd to reuse a single IOAS ID in iommufd_cdev_attach(),
enabling the Stage-2 page tables to be shared within the VM rather than
duplicated for each SMMUv3 instance.
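A sketch of the construction using the standard memory API (placement
and owner handling are illustrative):

  /* Sketch: one alias of global system memory, shared by all accel
   * SMMUv3 instances so VFIO/iommufd can reuse a single IOAS. */
  static MemoryRegion shared_sysmem_alias;
  static AddressSpace shared_as_sysmem;

  memory_region_init_alias(&shared_sysmem_alias, NULL, "shared-sysmem",
                           get_system_memory(), 0, UINT64_MAX);
  address_space_init(&shared_as_sysmem, &shared_sysmem_alias,
                     "shared-as-sysmem");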
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Message-id: 20260126104342.253965-7-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Set up dedicated PCIIOMMUOps for the accel SMMUv3, since it will need
different callback handling in upcoming patches. This also adds a
CONFIG_ARM_SMMUV3_ACCEL build option so the feature can be disabled
at compile time. Because we now include CONFIG_DEVICES in the header to
check for ARM_SMMUV3_ACCEL, the meson file entry for smmuv3.c needs to
be changed to arm_ss.add.
The “accel” property isn’t user-visible yet; it will be introduced in
a later patch once all the supporting pieces are ready.
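The build wiring, in outline (a sketch of the meson shape, not the
verbatim diff; the smmuv3-accel.c file name is an assumption):

  # hw/arm/meson.build (sketch): smmuv3.c moves to arm_ss, and the accel
  # code is gated behind the new Kconfig symbol.
  arm_ss.add(files('smmuv3.c'))
  arm_ss.add(when: 'CONFIG_ARM_SMMUV3_ACCEL', if_true: files('smmuv3-accel.c'))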
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Message-id: 20260126104342.253965-6-skolothumtho@nvidia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
A previous commit wrongly skipped including the generated modinfo when
the hw_arch dictionary ends up being empty.
Fix that by adding an empty source set to the dictionary in this case.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3280
Fixes: e8efe5ff4 ("meson: Do not try to build module for empty per-target hw/ directory")
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
* gitlab: preserve base rules for container template
* Fix some issues to make QEMU compilable on non-mainstream distros again
* Enforce sha256 as hashsum algorithm for all functional tests
* tag 'pull-request-2026-01-27' of https://gitlab.com/thuth/qemu:
tests/functional: Enforce sha256 as hashsum algorithm for all tests
tests/vm: Make the haiku VM usable again
tests/vm: Update netbsd VM to version 10.1
pc-bios/optionrom: Use 32-bit linker emulation for the optionroms
tests/tracetool: Honor the Python interpreter that "configure" detected
tests/functional/x86_64: Use the right Python interpreter & fix format string
tests/functional/x86_64: Limit the memlock test to Linux hosts
tests/functional/riscv64: Silence warnings from Pylint in the boston test
gitlab: preserve base rules for container template
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
MAINTAINER updates
- fix some malformed entries (names, lists, status)
- drop Mads from HVF and Tracing reviews
- add Pierrick for overall docs catcher
- add Pierrick as a linux-user reviewer
- add Pierrick as a co-maintainer for plugins
- set linux-user to Odd Fixes
- update core Arm to "Supported"
* tag 'pull-11.0-maintainer-updates-270126-1' of https://gitlab.com/stsquad/qemu:
MAINTAINERS: add co-maintainer for TCG Plugins
MAINTAINERS: be realistic about *-user
MAINTAINERS: add reviewer for linux-user
MAINTAINERS: update Arm to Supported status
MAINTAINERS: add maintainer for docs/
MAINTAINERS: remove myself as reviewer
MAINTAINERS: regularise the status fields
MAINTAINERS: fix libvirt entry
MAINTAINERS: fix missing names
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* rust: move binding generation to bindings/
* rust: fixes for Windows
* target/i386/tcg: fix a few instructions that do not support VEX.L=1
* target/i386/tcg: various cleanups
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
rust/hpet: remove stale TODO comment
target/i386/tcg: cleanup #ifdef TARGET_X86_64
target/i386/tcg: replace havesib variable with the SIB byte itself
target/i386/tcg: merge decode_modrm and decode_modrm_address split
target/i386/tcg: remove dead constants
target/i386/tcg: fix typo in dpps/dppd instructions
target/i386/tcg: fix a few instructions that do not support VEX.L=1
qdev: add hw/core/gpio.c to libhwcore
rust: move hwcore::sysbus to system crate
rust: move binding generation to bindings/
rust: move class_init to an extension trait
rust: hwcore: add chardev symbols to integration tests
rust: trace: libc does not have syslog on windows
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The functional testing framework currently supports both sha256 and
sha512 as hashsums for the assets. However, all but one test currently
use only sha256, which should also be sufficient according to current
security standards. Having two algorithms around has already caused
some confusion (e.g. the clean_functional_cache.py script only supports
sha256 right now), so standardize now on enforcing sha256 before more
tests use a mix of the two algorithms.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Aditya Gupta <adityag@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20260121101957.82477-1-thuth@redhat.com>
The haiku VM has bitrotted over time. Make sure to use the
latest version of the repositories here and install missing pieces
like "pip" and "tomli" now.
Since we nowadays also install our own version of meson in our venv,
this also requires a change to our configure script: On Haiku, the
meson binary shows up as pyvenv/non-packaged/bin/meson here, and not
in the expected location pyvenv/bin/meson. Adjust the "meson" variable
to point to that Haiku-specific location to fix this issue. See also:
https://github.com/haiku/haiku/blob/r1beta5/docs/user/storage/storageintro.dox
And finally, with the new toolchain from the beta 5, we also have to
compile with "-pie", otherwise the linker complains about bad relocations
in the object files, so allow compiling with PIE in the configure script
now.
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20260123184429.5278-1-thuth@redhat.com>
NetBSD 10.1 was released more than a year ago, so it's time to
update our VM to that version.
Apart from the usual changes in the installation process, we also have
to disable the installation of the "jpeg" package now, otherwise the
package installation fails with an error message like this:
pkg_add: jpeg-9fnb1: conflicts with `libjpeg-turbo-[0-9]*', and
`libjpeg-turbo-3.1.3' is installed.
We also have to drop the executable bits from scripts/qemu-plugin-symbols.py
to force meson to use the detected Python interpreter instead of executing
the file directly (which tries to use the Python interpreter from the file's
shebang line).
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20260113193554.123082-1-thuth@redhat.com>
Without this linker flag, the linking fails on NetBSD v10.1 with:
ld: i386 architecture of input file `multiboot.o' is incompatible with i386:x86-64 output
ld: i386 architecture of input file `multiboot_dma.o' is incompatible with i386:x86-64 output
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20260121074819.29396-1-thuth@redhat.com>