commit e8a6d158db
Author: Chuang Xu <xuchuangxclwt@bytedance.com>
Date:   2025-12-23 09:24:34 -05:00

migration: merge fragmented clear_dirty ioctls

In our long-term experience at Bytedance, we have found that, under
the same load, live migration of larger VMs with more devices often
has more trouble converging (requiring a larger downtime limit).

Through testing and calculation, we concluded that the bitmap sync
time affects the live migration bandwidth calculation.

When the addresses being processed are not aligned, a large number of
clear_dirty ioctls occur (e.g. a misaligned 4MB memory region can
generate 2048 clear_dirty ioctls across two different memory_listeners),
which increases the time required for bitmap_sync and makes it harder
for dirty pages to converge.
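
As a back-of-envelope check on that figure (a sketch only, assuming
4 KiB pages and a worst case of one clear_dirty ioctl per page from
each of the two listeners):

    /* Not from the patch: just the arithmetic behind the 2048 figure. */
    #include <stdio.h>

    int main(void)
    {
        const unsigned long region_bytes = 4UL << 20; /* 4MB misaligned region */
        const unsigned long page_bytes   = 4UL << 10; /* 4 KiB page size (assumed) */
        const unsigned long listeners    = 2;         /* two memory_listeners issue clears */

        /* Worst case: one clear_dirty ioctl per page, per listener. */
        printf("%lu\n", region_bytes / page_bytes * listeners); /* prints 2048 */
        return 0;
    }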

For a 64C256G VM with 8 vhost-user-net devices (32 queues per NIC) and
16 vhost-user-blk devices (4 queues per disk), the sync time is as high
as *73ms* (measured with a dirty rate of 10GBps; the sync time increases
as the dirty page rate increases). The sync time breaks down as follows:

- sync from kvm to ram_list: 2.5ms
- vhost_log_sync: 3ms
- sync aligned memory from ram_list to RAMBlock: 5ms
- sync misaligned memory from ram_list to RAMBlock: 61ms

Merge those fragmented clear_dirty ioctls. With the merge in place,
syncing misaligned memory from ram_list to RAMBlock takes only about
1ms, and the total sync time drops to *12ms*.
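
For illustration only (this is not the patch itself): a minimal sketch
of the merging idea against the raw KVM_CLEAR_DIRTY_LOG ioctl. It
assumes a sorted list of dirty page numbers, uses made-up helper names,
and glosses over KVM's 64-page alignment rules for first_page/num_pages.

    /*
     * Sketch: instead of one KVM_CLEAR_DIRTY_LOG ioctl per page, walk a
     * sorted list of dirty page numbers and merge contiguous pages into
     * a single range per ioctl.  Helper names are hypothetical.
     */
    #include <stdint.h>
    #include <stddef.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int clear_dirty_range(int vm_fd, uint32_t slot,
                                 uint64_t first_page, uint32_t num_pages,
                                 void *bitmap)
    {
        struct kvm_clear_dirty_log clear = {
            .slot = slot,
            .num_pages = num_pages,
            .first_page = first_page,
            .dirty_bitmap = bitmap,
        };
        return ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear);
    }

    static int clear_dirty_pages_merged(int vm_fd, uint32_t slot,
                                        const uint64_t *pages, size_t count,
                                        void *bitmap)
    {
        size_t i = 0;

        while (i < count) {
            uint64_t start = pages[i];
            uint64_t len = 1;

            /* Extend the run while the next dirty page is adjacent. */
            while (i + len < count && pages[i + len] == start + len) {
                len++;
            }

            /* One ioctl covers the whole contiguous run. */
            int r = clear_dirty_range(vm_fd, slot, start, (uint32_t)len, bitmap);
            if (r < 0) {
                return r;
            }
            i += len;
        }
        return 0;
    }

The actual change works in QEMU's ram_list/RAMBlock sync path rather
than calling the ioctl directly, but the effect is the same: fewer,
larger clear requests instead of thousands of per-page ones.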

Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20251218114220.83354-1-xuchuangxclwt@bytedance.com
[peterx: drop var "offset" in physical_memory_sync_dirty_bitmap]
Signed-off-by: Peter Xu <peterx@redhat.com>