qemu-qemu-1

mirror of https://github.com/qemu/qemu.git synced 2026-02-04 02:24:51 +00:00

Author	SHA1	Message	Date
Shameer Kolothum	06b38473cd	hw/vfio/pci: Synthesize PASID capability for vfio-pci devices Add support for synthesizing a PCIe PASID extended capability for vfio-pci devices when PASID is enabled via a vIOMMU and supported by the host IOMMU backend. PASID capability parameters are retrieved via IOMMUFD APIs and the capability is inserted into the PCIe extended capability list using the insertion helper. A new x-vpasid-cap-offset property allows explicit control over the placement; by default the capability is placed at the end of the PCIe extended configuration space. If the kernel does not expose PASID information or insertion fails, the device continues without PASID support. Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Message-id: 20260126104342.253965-37-skolothumtho@nvidia.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2026-01-29 13:32:05 +00:00
Shameer Kolothum	550beca3d7	backends/iommufd: Retrieve PASID width from iommufd_backend_get_device_info() Retrieve PASID width from iommufd_backend_get_device_info() and store it in HostIOMMUDeviceCaps for later use. Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Message-id: 20260126104342.253965-33-skolothumtho@nvidia.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2026-01-29 13:32:05 +00:00
Nicolin Chen	8cfaf22668	hw/vfio/region: Create dmabuf for PCI BAR per region Linux now provides a VFIO dmabuf exporter to expose PCI BAR memory for P2P use cases. Create a dmabuf for each mapped BAR region after the mmap is set up, and store the returned fd in the region’s RAMBlock. This allows QEMU to pass the fd to dma_map_file(), enabling iommufd to import the dmabuf and map the BAR correctly in the host IOMMU page table. If the kernel lacks support or dmabuf setup fails, QEMU skips the setup and continues with normal mmap handling. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/qemu-devel/20260121114111.34045-4-skolothumtho@nvidia.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-26 08:30:04 +01:00
Shameer Kolothum	de36da106d	hw/vfio: Add helper to retrieve device feature Add vfio_device_get_feature() as a common helper to retrieve VFIO device features. No functional change intended. Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/qemu-devel/20260121114111.34045-3-skolothumtho@nvidia.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-26 08:30:04 +01:00
Jim Shu	0e387bd1df	hw/vfio: cpr-iommufd: Fix wrong usage of migrate_add_blocker_modes The return value of API is 0 for success and negative error code for failure. We'll check if the return value equals to 0. Also, the MIG_MODE should be CPR_TRANSFER and CPR_EXEC instead of 2 same bits. The API usage is aligned with 'hw/vfio/cpr-legacy.c' after these 2 changes. Fixes: `3ca0a0ab05` ("migration: Use bitset of MigMode instead of variable arguments") Signed-off-by: Jim Shu <jim.shu@sifive.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20260121063418.2001326-1-jim.shu@sifive.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-26 08:30:04 +01:00
Zhenzhong Duan	e3c659fee0	vfio/migration: Fix page size calculation Coverity detected an issue of left shifting int by more than 31 bits leading to undefined behavior. In practice bcontainer->dirty_pgsizes always have some common page sizes when dirty tracking is supported. Resolves: Coverity CID 1644186 Resolves: Coverity CID 1644187 Resolves: Coverity CID 1644188 Fixes: `46c7633114` ("vfio/migration: Add migration blocker if VM memory is too large to cause unmap_bitmap failure"). Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20260116060315.65723-1-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-26 08:30:04 +01:00
Zhenzhong Duan	68d3a2a24d	Workaround for ERRATA_772415_SPR17 On a system influenced by ERRATA_772415, IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is repored by IOMMU_DEVICE_GET_HW_INFO. Due to this errata, even the readonly range mapped on second stage page table could still be written. Reference from 4th Gen Intel Xeon Processor Scalable Family Specification Update, Errata Details, SPR17. Link https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/eagle-stream/sapphire-rapids-specification-update/ Backup https://cdrdv2.intel.com/v1/dl/getContent/772415 Also copied the SPR17 details from above link: "Problem: When remapping hardware is configured by system software in scalable mode as Nested (PGTT=011b) and with PWSNP field Set in the PASID-table-entry, it may Set Accessed bit and Dirty bit (and Extended Access bit if enabled) in first-stage page-table entries even when second-stage mappings indicate that corresponding first-stage page-table is Read-Only. Implication: Due to this erratum, pages mapped as Read-only in second-stage page-tables may be modified by remapping hardware Access/Dirty bit updates. Workaround: None identified. System software enabling nested translations for a VM should ensure that there are no read-only pages in the corresponding second-stage mappings." Introduce a helper vfio_device_get_host_iommu_quirk_bypass_ro to check if readonly mappings should be bypassed. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/qemu-devel/20260106062808.316574-5-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-13 08:29:59 +01:00
Zhenzhong Duan	5c9da3d65d	vfio/listener: Bypass readonly region for dirty tracking When doing dirty tracking or calculating dirty tracking range, readonly regions can be bypassed, because corresponding DMA mappings are readonly and never become dirty. This can optimize dirty tracking a bit for passthrough device. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/qemu-devel/20260106062808.316574-4-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-13 08:29:59 +01:00
Zhenzhong Duan	0e3c1e2b2b	vfio/migration: Allow live migration with vIOMMU without VFs using device dirty tracking Commit `e46883204c` ("vfio/migration: Block migration with vIOMMU") introduces a migration blocker when vIOMMU is enabled, because we need to calculate the IOVA ranges for device dirty tracking. But this is unnecessary for iommu dirty tracking. Limit the vfio_viommu_preset() check to those devices which use device dirty tracking. This allows live migration with VFIO devices which use iommu dirty tracking. Suggested-by: Jason Zeng <jason.zeng@intel.com> Co-developed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com> Tested-by: Rohith S R <rohith.s.r@intel.com> Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-10-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-13 08:29:59 +01:00
Zhenzhong Duan	46c7633114	vfio/migration: Add migration blocker if VM memory is too large to cause unmap_bitmap failure With default config, kernel VFIO IOMMU type1 driver limits dirty bitmap to 256MB for unmap_bitmap ioctl so the maximum guest memory region is no more than 8TB size for the ioctl to succeed. Be conservative here to limit total guest memory to max value supported by unmap_bitmap ioctl or else add a migration blocker. IOMMUFD backend doesn't have such limit, one can use it if there is a need to migrate such large VM. Suggested-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-9-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-13 08:29:59 +01:00
Zhenzhong Duan	6e360c0617	vfio/listener: Add missing dirty tracking in region_del If a VFIO device in guest switches from passthrough(PT) domain to block domain, the whole memory address space is unmapped, but we passed a NULL iotlb entry to unmap_bitmap, then bitmap query didn't happen and we lost dirty pages. By constructing an iotlb entry with iova = gpa for unmap_bitmap, it can set dirty bits correctly. For IOMMU address space, we still send NULL iotlb because VFIO don't know the actual mappings in guest. It's vIOMMU's responsibility to send actual unmapping notifications, e.g., vtd_address_space_unmap_in_dirty_tracking(). Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-8-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-13 08:29:59 +01:00
Zhenzhong Duan	e98a1c7049	vfio/iommufd: Add IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR flag support Pass IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR when doing the last dirty bitmap query right before unmap, no PTEs flushes. This accelerates the query without issue because unmap will tear down the mapping anyway. Co-developed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com> Tested-by: Rohith S R <rohith.s.r@intel.com> Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-6-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-13 08:29:59 +01:00
Joao Martins	374e28d876	vfio: Add a backend_flag parameter to vfio_container_query_dirty_bitmap() This new parameter will be used in following patch, currently 0 is passed. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com> Tested-by: Rohith S R <rohith.s.r@intel.com> Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-5-zhenzhong.duan@intel.com [ clg: Fixed subject typo ] Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-13 08:29:59 +01:00
Zhenzhong Duan	e79bc265ef	vfio/container-legacy: rename vfio_dma_unmap_bitmap() to vfio_legacy_dma_unmap_get_dirty_bitmap() This is to follow naming style in container-legacy.c to have low level functions with vfio_legacy_ prefix. No functional changes. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-4-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-13 08:29:59 +01:00
Zhenzhong Duan	f051dbeb91	vfio/iommufd: Query dirty bitmap before DMA unmap When an existing mapping is unmapped, there could already be dirty bits which need to be recorded before unmap. If query dirty bitmap fails, we still need to do unmapping or else there is stale mapping and it's risky to guest. Co-developed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com> Tested-by: Rohith S R <rohith.s.r@intel.com> Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-3-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-13 08:29:59 +01:00
Zhenzhong Duan	31ec4aadd0	vfio/iommufd: Add framework code to support getting dirty bitmap before unmap Currently we support device and iommu dirty tracking, device dirty tracking is preferred. Add the framework code in iommufd_cdev_unmap() to choose either device or iommu dirty tracking, just like vfio_legacy_dma_unmap_one(). Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com> Tested-by: Rohith S R <rohith.s.r@intel.com> Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-2-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-13 08:29:59 +01:00
Zhenzhong Duan	c3459c6bfa	vfio/iommufd: Force creating nesting parent HWPT Call pci_device_get_viommu_flags() to get if vIOMMU supports VIOMMU_FLAG_WANT_NESTING_PARENT. If yes, create a nesting parent HWPT and add it to the container's hwpt_list, letting this parent HWPT cover the entire second stage mappings (GPA=>HPA). This allows a VFIO passthrough device to directly attach to this default HWPT and then to use the system address space and its listener. Introduce a vfio_device_get_viommu_flags_want_nesting() helper to facilitate this implementation. It is safe to do so because a vIOMMU will be able to fail in set_iommu_device() call, if something else related to the VFIO device or vIOMMU isn't compatible. Suggested-by: Nicolin Chen <nicolinc@nvidia.com> Suggested-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/qemu-devel/20260106061304.314546-9-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-13 08:29:58 +01:00
Philippe Mathieu-Daudé	78e630fcc4	hw/vfio/migration: Check base architecture at runtime Inline vfio_arch_wants_loading_config_after_iter() and replace the compile time check of the TARGET_ARM definition by a runtime call to target_base_arm(). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/20251021161707.8324-1-philmd@linaro.org Signed-off-by: Cédric Le Goater <clg@redhat.com>	2026-01-13 08:29:58 +01:00
Markus Armbruster	b351b49275	error: Use error_setg_errno() to improve error messages A few error messages show numeric errno codes. Use error_setg_errno() to show human-readable text instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20251121121438.1249498-13-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> [Trivial fixup to riscv_kvm_cpu_finalize_features()]	2026-01-08 07:49:23 +01:00
Paolo Bonzini	7f548b8f23	include: reorganize memory API headers Move RAMBlock functions out of ram_addr.h and cpu-common.h; move memory API headers out of include/exec and into include/system. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-12-27 10:11:09 +01:00
Paolo Bonzini	048a23851c	include: move hw/hw.h to hw/core/, rename Call it include/hw/core/hw-error.h since that is the only thing it contains. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-12-27 10:11:09 +01:00
Paolo Bonzini	e1e9a72500	include: move hw/qdev-properties-system.h to hw/core/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-12-27 10:11:08 +01:00
Paolo Bonzini	78d45220b4	include: move hw/qdev-properties.h to hw/core/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-12-27 10:11:07 +01:00
Paolo Bonzini	d1000ecae2	include: move hw/qdev-core.h to hw/core/, rename Call it hw/core/qdev.h to avoid the duplication in the name. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-12-27 10:11:07 +01:00
Paolo Bonzini	1942b61b74	include: move hw/boards.h to hw/core/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-12-27 10:11:06 +01:00
Yanghang Liu	5f9ac96373	Fix the typo of vfio-pci device's enable-migration option Signed-off-by: Yanghang Liu <yanghliu@redhat.com> Reported-by: Mario Casquero <mcasquer@redhat.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2025-11-21 15:53:06 +03:00
Peter Maydell	b1f4f4695c	vfio: Clean up includes This commit was created with scripts/clean-includes: ./scripts/clean-includes --git vfio hw/vfio hw/vfio-user All .c should include qemu/osdep.h first. The script performs three related cleanups: * Ensure .c files include qemu/osdep.h first. * Including it in a .h is redundant, since the .c already includes it. Drop such inclusions. * Likewise, including headers qemu/osdep.h includes is redundant. Drop these, too. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20251104160943.751997-9-peter.maydell@linaro.org	2025-11-14 13:18:04 +00:00
Markus Armbruster	3ca0a0ab05	migration: Use bitset of MigMode instead of variable arguments migrate_add_blocker_modes() and migration_add_notifier_modes use variable arguments for a set of migration modes. The variable arguments get collected into a bitset for processsing. Take a bitset argument instead, it's simpler. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251027064503.1074255-3-armbru@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
John Levon	ecbe424a63	vfio: only check region info cache for initial regions It is semantically valid for a VFIO device to increase the number of regions after initialization. In this case, we'd attempt to check for cached region info past the size of the ->reginfo array. Check for the region index and skip the cache in these cases. This also works around some VGPU use cases which appear to be a bug, where VFIO_DEVICE_QUERY_GFX_PLANE returns a region index beyond the reported ->num_regions. Fixes: `95cdb024` ("vfio: add region info cache") Signed-off-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex@shazbot.org> Link: https://lore.kernel.org/qemu-devel/20251014151227.2298892-3-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-10-22 08:12:52 +02:00
John Levon	aaca725884	vfio: rename field to "num_initial_regions" We set VFIODevice::num_regions at initialization time, and do not otherwise refresh it. As it is valid in theory for a VFIO device to later increase the number of supported regions, rename the field to "num_initial_regions" to better reflect its semantics. Signed-off-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex@shazbot.org> Link: https://lore.kernel.org/qemu-devel/20251014151227.2298892-2-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-10-22 08:12:52 +02:00
Zhenzhong Duan	271fec6f18	vfio/listener: Add an assertion for unmap_all Currently the maximum of iommu address space is 64bit. So when a maximum iommu memory section is deleted, it's in scope [0, 2^64). Add a assertion for that. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20251009040134.334251-4-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-10-22 08:12:52 +02:00
Zhenzhong Duan	b30823e561	vfio/iommufd: Support unmap all in one ioctl() IOMMUFD kernel uAPI supports unmapping whole address space in one call with [iova, size] set to [0, UINT64_MAX], this can simplify iommufd_cdev_unmap() a bit. See iommufd_ioas_unmap() in kernel for details. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20251009040134.334251-3-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-10-22 08:12:52 +02:00
Zhenzhong Duan	962bcf0911	vfio/container: Support unmap all in one ioctl() VFIO type1 kernel uAPI supports unmapping whole address space in one call since commit c19650995374 ("vfio/type1: implement unmap all"). Use the unmap_all variant whenever it's supported in kernel. Opportunistically pass VFIOLegacyContainer pointer in low level function vfio_legacy_dma_unmap_one(). Co-developed-by: John Levon <levon@movementarian.org> Signed-off-by: John Levon <levon@movementarian.org> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20251009040134.334251-2-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-10-22 08:12:52 +02:00
Zhenzhong Duan	8bf49fff0d	vfio/iommufd: Restore vbasedev's reference to hwpt after CPR transfer After CPR transfer, if there are more than one VFIO devices, device is not added to hwpt->device_list and its reference to hwpt isn't restored on destination. We still need to call iommufd_cdev_attach_container() to restore it after a matching container is found, or else SIGSEV triggers. Fixes: `4296ee0745` ("vfio/iommufd: reconstruct device") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/qemu-devel/20250928085432.40107-5-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-10-22 08:12:52 +02:00
Zhenzhong Duan	d59db04aed	vfio/iommufd: Set cpr.ioas_id on source side for CPR transfer On source side, if there are more than one VFIO devices and they attach to same container, only the first device sets cpr.ioas_id, the others are bypassed. We should set it for each device, or else only first device works. Fixes: `4296ee0745` ("vfio/iommufd: reconstruct device") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/qemu-devel/20250928085432.40107-4-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-10-22 08:12:52 +02:00
Zhenzhong Duan	9423094896	vfio/cpr-legacy: drop an erroneous assert vfio_legacy_cpr_dma_map() is not only used in post_load on destination but also error recovery path on source side. Assert it for destination is wrong. Fixes: `7e9f214113` ("vfio/container: restore DMA vaddr") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/qemu-devel/20250928085432.40107-3-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-10-22 08:12:52 +02:00
Zhenzhong Duan	5a78db7f80	vfio/container: Remap only populated parts in a section If there are multiple containers and unmap-all fails for some of them, we need to remap vaddr for the other containers for which unmap-all succeeded. When ram discard is enabled, we should only remap populated parts in a section instead of the whole section. Fixes: `eba1f657cb` ("vfio/container: recover from unmap-all-vaddr failure") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250928085432.40107-2-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-10-22 08:12:52 +02:00
Philippe Mathieu-Daudé	4db362f68c	system/physmem: Extract API out of 'system/ram_addr.h' header Very few files use the Physical Memory API. Declare its methods in their own header: "system/physmem.h". Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <20251001175448.18933-19-philmd@linaro.org>	2025-10-07 05:03:56 +02:00
Philippe Mathieu-Daudé	aa60bdb700	system/physmem: Drop 'cpu_' prefix in Physical Memory API The functions related to the Physical Memory API declared in "system/ram_addr.h" do not operate on vCPU. Remove the 'cpu_' prefix. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <20251001175448.18933-18-philmd@linaro.org>	2025-10-07 05:03:56 +02:00
Philippe Mathieu-Daudé	97480ca692	hw: Remove unnecessary 'system/ram_addr.h' header None of these files require definition exposed by "system/ram_addr.h", remove its inclusion. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Jagannathan Raman <jag.raman@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20251001175448.18933-7-philmd@linaro.org>	2025-10-07 05:03:56 +02:00
Philippe Mathieu-Daudé	edd1f91d38	hw/vfio/listener: Include missing 'exec/target_page.h' header The "exec/target_page.h" header is indirectly pulled from "system/ram_addr.h". Include it explicitly, in order to avoid unrelated issues when refactoring "system/ram_addr.h": hw/vfio/listener.c: In function ‘vfio_ram_discard_register_listener’: hw/vfio/listener.c:258:28: error: implicit declaration of function ‘qemu_target_page_size’; did you mean ‘qemu_ram_pagesize’? 258 \| int target_page_size = qemu_target_page_size(); \| ^~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <20251001175448.18933-5-philmd@linaro.org>	2025-10-07 05:03:56 +02:00
Richard Henderson	bd6aa0d1e5	Merge tag 'staging-pull-request' of https://gitlab.com/peterx/qemu into staging Migration/Memory Pull for 10.2 - PeterX's fix on tls warning for preempt channel when migratino completes - Arun's series to enhance error reporting for vTPM and migration framework - PeterX's patch to cleanup multifd send TLS BYE messages - Juraj's fix on postcopy start state transition when switchover failed - Yanfei's fix to migrate APIC before VFIO-PCI to avoid irq fallbacks - Dan's cleanup to simplify error reporting in qemu_fill_buffer() - PeterM's fix on address space leak when cpu hot plug / unplug - Steve's cpr-exec wholeset # -----BEGIN PGP SIGNATURE----- # # iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCaN/uIhIccGV0ZXJ4QHJl # ZGhhdC5jb20ACgkQO1/MzfOr1wZ+mAEA1l2RS9sZS1W3vXQMCNb+Nu8Uo2p+e5Qj # Uu6J0WVV+XsBANtzGZk2UM/frqlABywW3/ozJ4qBvIPKo758Mr6/lqUH # =asUv # -----END PGP SIGNATURE----- # gpg: Signature made Fri 03 Oct 2025 08:39:14 AM PDT # gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706 # gpg: issuer "peterx@redhat.com" # gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [unknown] # gpg: aka "Peter Xu <peterx@redhat.com>" [unknown] # gpg: WARNING: The key's User ID is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706 * tag 'staging-pull-request' of https://gitlab.com/peterx/qemu: (45 commits) migration-test: test cpr-exec vfio: cpr-exec mode migration: cpr-exec docs migration: cpr-exec mode migration: cpr-exec save and load migration: cpr-exec-command parameter oslib: qemu_clear_cloexec migration: add cpr_walk_fd migration: multi-mode notifier migration: simplify error reporting after channel read physmem: Destroy all CPU AddressSpaces on unrealize memory: New AS helper to serialize destroy+free include/system/memory.h: Clarify address_space_destroy() behaviour migration: ensure APIC is loaded prior to VFIO PCI devices migration: Fix state transition in postcopy_start() error handling migration/multifd/tls: Cleanup BYE message processing on sender side migration: HMP: Adjust the order of output fields migration: Make migration_has_failed() work even for CANCELLING io/crypto: Move tls premature termination handling into QIO layer backends/tpm: Propagate vTPM error on migration failure ... Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2025-10-04 09:10:58 -07:00
Steve Sistare	ee1ca09fc1	vfio: cpr-exec mode All blockers and notifiers for cpr-transfer mode also apply to cpr-exec. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/30750362-d4a1-4392-8dd6-016624d01be1@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	6f9fc6f501	migration: Remove error variant of vmstate_save_state() function This commit removes the redundant vmstate_save_state_with_err() function. Previously, commit `969298f9d7` introduced vmstate_save_state_with_err() to handle error propagation, while vmstate_save_state() existed for non-error scenarios. This is because there were code paths where vmstate_save_state_v() (called internally by vmstate_save_state) did not explicitly set errors on failure. This change unifies error handling by - updating vmstate_save_state() to accept an Error **errp argument. - vmstate_save_state_v() ensures errors are set directly within the errp object, eliminating the need for two separate functions. All calls to vmstate_save_state_with_err() are replaced with vmstate_save_state(). This simplifies the API and improves code maintainability. vmstate_save_state() that only calls vmstate_save_state_v(), by inference, also has errors set in errp in case of failure. The errors are reported using error_report_err(). If we want the function to exit on error, then &error_fatal is passed. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-24-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	c632ffbd74	migration: push Error **errp into vmstate_load_state() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that vmstate_load_state() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series, when we are actually able to propagate the error to the calling function using errp. Whereas, if we want the function to exit on error, then error_fatal is passed. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-2-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:01 -04:00
Philippe Mathieu-Daudé	f0b52aa08a	hw/vfio: Use uint64_t for IOVA mapping size in vfio_container_dma_*map The 'ram_addr_t' type is described as: a QEMU internal address space that maps guest RAM physical addresses into an intermediate address space that can map to host virtual address spaces. This doesn't represent well an IOVA mapping size. Simply use the uint64_t type. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250930123528.42878-5-philmd@linaro.org Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-10-02 10:41:23 +02:00
Philippe Mathieu-Daudé	0ca70d3bf7	hw/vfio: Avoid ram_addr_t in vfio_container_query_dirty_bitmap() The 'ram_addr_t' type is described as: a QEMU internal address space that maps guest RAM physical addresses into an intermediate address space that can map to host virtual address spaces. vfio_container_query_dirty_bitmap() doesn't expect such QEMU intermediate address, but a guest physical addresses. Use the appropriate 'hwaddr' type, rename as @translated_addr for clarity. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250930123528.42878-4-philmd@linaro.org Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-10-02 10:41:23 +02:00
Philippe Mathieu-Daudé	5764a71527	hw/vfio: Reorder vfio_container_query_dirty_bitmap() trace format Update the trace-events comments after the changes from commit `dcce51b193` ("hw/vfio/container-base.c: rename file to container.c") and commit `a3bcae62b6` ("hw/vfio/container.c: rename file to container-legacy.c"). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250930123528.42878-3-philmd@linaro.org Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-10-02 10:41:23 +02:00
Cédric Le Goater	1d9a832b58	vfio: Remove workaround for kernel DMA unmap overflow bug A kernel bug was introduced in Linux v4.15 via commit 71a7d3d78e3c ("vfio/type1: Check for address space wrap-around on unmap"), which added a test for address space wrap-around in the vfio DMA unmap path. Unfortunately, due to an integer overflow, the kernel would incorrectly detect an unmap of the last page in the 64-bit address space as a wrap-around, causing the unmap to fail with -EINVAL. A QEMU workaround was introduced in commit `567d7d3e6b` ("vfio/common: Work around kernel overflow bug in DMA unmap") to retry the unmap, excluding the final page of the range. The kernel bug was then fixed in Linux v5.0 via commit 58fec830fc19 ("vfio/type1: Fix dma_unmap wrap-around check"). Since the oldest supported LTS kernel is now v5.4, kernels affected by this bug are considered deprecated, and the workaround is no longer necessary. This change reverts `567d7d3e6b`, removing the workaround. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291 Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/qemu-devel/20250926085423.375547-1-clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-10-02 10:41:23 +02:00
Mark Cave-Ayland	5bdf0db823	vfio/pci.c: rename vfio_pci_nohotplug_dev_info to vfio_pci_nohotplug_info This changes the prefix to match the name of the QOM type. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-23-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-09-25 17:55:20 +02:00

1 2 3 4 5 ...

1132 Commits