migration: Introduce POSTCOPY_DEVICE state

Currently, when postcopy starts, the source VM starts switchover and
sends a package containing the state of all non-postcopiable devices.
When the destination loads this package, the switchover is complete and
the destination VM starts. However, if the device state load fails or
the destination side crashes, the source side is already in
POSTCOPY_ACTIVE state and cannot be recovered, even when it has the most
up-to-date machine state as the destination has not yet started.

This patch introduces a new POSTCOPY_DEVICE state which is active while
the destination machine is loading the device state, is not yet running,
and the source side can be resumed in case of a migration failure.
Return-path is required for this state to function, otherwise it will be
skipped in favor of POSTCOPY_ACTIVE.

To transition from POSTCOPY_DEVICE to POSTCOPY_ACTIVE, the source
side uses a PONG message that is a response to a PING message processed
just before the POSTCOPY_RUN command that starts the destination VM.
Thus, this feature is effective even if the destination side does not
yet support this new state.

Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20251103183301.3840862-9-jmarcin@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
This commit is contained in:
Juraj Marcin
2025-11-03 19:32:57 +01:00
committed by Peter Xu
parent 0680dd185b
commit 7b842fe354
9 changed files with 75 additions and 11 deletions

View File

@@ -142,6 +142,12 @@
# @postcopy-active: like active, but now in postcopy mode.
# (since 2.5)
#
# @postcopy-device: like postcopy-active, but the destination is still
# loading device state and is not running yet. If migration fails
# during this state, the source side will resume. If there is no
# return-path from destination to source, this state is skipped.
# (since 10.2)
#
# @postcopy-paused: during postcopy but paused. (since 3.0)
#
# @postcopy-recover-setup: setup phase for a postcopy recovery
@@ -173,8 +179,8 @@
##
{ 'enum': 'MigrationStatus',
'data': [ 'none', 'setup', 'cancelling', 'cancelled',
'active', 'postcopy-active', 'postcopy-paused',
'postcopy-recover-setup',
'active', 'postcopy-device', 'postcopy-active',
'postcopy-paused', 'postcopy-recover-setup',
'postcopy-recover', 'completed', 'failed', 'colo',
'pre-switchover', 'device', 'wait-unplug' ] }
##