| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 00a4b65f8534c9e6521eab2e6ce796ae36037774 Sep 7 2010
"libxc: provide notification of final checkpoint to restore end"
broke migration from any version of Xen using tools from prior to that commit
Older tools have no idea about an XC_SAVE_ID_LAST_CHECKPOINT, causing newer
tools xc_domain_restore() to start reading the qemu save record, as
ctx->last_checkpoint is 0.
The failure looks like:
xc: error: Max batch size exceeded (1970103633). Giving up.
where 1970103633 = 0x756d6551 = *(uint32_t*)"Qemu"
With this fix in place, the behaviour for normal migrations is reverted to how
it was before the regression; the migration is considered non-checkpointed
right from the start. A XC_SAVE_ID_LAST_CHECKPOINT chunk seen in the
migration stream is a nop. For checkpointed migrations the behaviour is
unchanged.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> (Remus bits)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in current xen x86_64, the default libexec directory is /usr/lib/xen/bin,
while the private binder is /usr/lib64/xen/bin. but some commands(pygrub,
libxl-save-helper) located in private binder directory is called from
libexec directory which lead to the following error:
1, for pygrub bootloader:
libxl: debug: libxl_bootloader.c:429:bootloader_disk_attached_cb: /usr/lib/xen/bin/pygrub doesn't exist, falling back to config path
2, for libxl-save-helper:
libxl: cannot execute /usr/lib/xen/bin/libxl-save-helper: No such file or directory
libxl: error: libxl_utils.c:363:libxl_read_exactly: file/stream truncated reading ipc msg header from domain 3 save/restore helper stdout pipe
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: domain 3 save/restore helper [10222] exited with error status 255
there are two ways to fix above error. the first one is make such command
store in the /usr/lib/xen/bin and /usr/lib64/xen/bin(symbol link to
previous), e.g. qemu-dm. The second way is using private binder dir
instead of libexec dir. e.g. xenconsole.
For these cases, the latter one is suitable.
Signed-off-by: Bamvor Jian Zhang <bjzhang@suse.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current migration code in libxl instructs qemu to start or stop
logdirty, but it does not wait for an acknowledgement from qemu before
continuing. This might lead to memory corruption (!)
Fix this by waiting for qemu to acknowledge the command.
Unfortunately the necessary ao arrangements for waiting for this
command are unique because qemu has a special protocol for this
particular operation.
Also, this change means that the switch_qemu_logdirty callback
implementation in libxl can no longer synchronously produce its return
value, as it now needs to wait for xenstore. So we tell the
marshalling code generator that it is a message which does not need a
reply. This turns the callback function called by the marshaller into
one which returns void; the callback function arranges to later
explicitly sends the reply to the helper, when the xs watch triggers
and the appropriate value is read from xenstore.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
libxenctrl expects to be able to simply run the save or restore
operation synchronously. This won't work well in a process which is
trying to handle multiple domains.
The options are:
- Block such a whole process (eg, the whole of libvirt) while
migration completes (or until it fails).
- Create a thread to run xc_domain_save and xc_domain_restore on.
This is quite unpalatable. Multithreaded programming is error
prone enough without generating threads in libraries, particularly
if the thread does some very complex operation.
- Fork and run the operation in the child without execing. This is
no good because we would need to negotiate with the caller about
fds we would inherit (and we might be a very large process).
- Fork and exec a helper.
Of these options the latter is the most palatable.
Consequently:
* A new helper program libxl-save-helper (which does both save and
restore). It will be installed in /usr/lib/xen/bin. It does not
link against libxl, only libxc, and its error handling does not
need to be very advanced. It does contain a plumbing through of
the logging interface into the callback stream.
* A small ad-hoc protocol between the helper and libxl which allows
log messages and the libxc callbacks to be passed up and down.
Protocol doc comment is in libxl_save_helper.c.
* To avoid a lot of tedium the marshalling boilerplate (stubs for the
helper and the callback decoder for libxl) is generated with a
small perl script.
* Implement new functionality to spawn the helper, monitor its
output, provide responses, and check on its exit status.
* The functions libxl__xc_domain_restore_done and
libxl__xc_domain_save_done now turn out to want be called in the
same place. So make their state argument a void* so that the two
functions are type compatible.
The domain save path still writes the qemu savefile synchronously.
This will need to be fixed in a subsequent patch.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change the internal and external APIs for domain save (suspend) to be
capable of asynchronous operation. The implementation remains
synchronous. The interfaces surrounding device model saving are still
synchronous.
Public API changes:
* libxl_domain_save takes an ao_how.
* libxl_domain_remus_start takes an ao_how. If the
libxl_domain_remus_info is NULL, we abort rather than returning an
error.
* The `suspend_callback' function passed to libxl_domain_save is
never called by the existing implementation in libxl. Abolish it.
* libxl_domain_save takes its flags parameter as an argument.
Thus libxl_domain_suspend_info is abolished.
* XL_SUSPEND_* flags renamed to LIBXL_SAVE_*.
* Callers in xl updated.
Internal code restructuring:
* libxl__domain_suspend_state member types and names rationalised.
* libxl__domain_suspend renamed from libxl__domain_suspend_common.
(_common here actually meant "internal function").
* libxl__domain_suspend takes a libxl__domain_suspend_state, which
where the parameters to the operation are filled in by the caller.
* xc_domain_save is now called via libxl__xc_domain_save which can
itself become asynchronous.
* Consequently, libxl__domain_suspend is split into two functions at
the callback boundary; the second half is
libxl__xc_domain_save_done.
* libxl__domain_save_device_model is now called by the actual
implementation rather than by the public wrapper. It is already in
its proper place in the domain save execution sequence. So
officially make it part of that execution sequence, renaming it to
domain_save_device_model.
* Effectively, rewrite the public wrapper functions
libxl_domain_suspend and libxl_domain_remus_start.
* Remove a needless #include <xenctrl.h>
* libxl__domain_suspend aborts on unexpected domain types rather
than mysteriously returning EINVAL.
* struct save_callbacks moved from the stack to the dss.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
|
|
We are going to arrange that libxl, instead of calling
xc_domain_restore, calls a stub function which forks and execs a
helper program, so that restore can be asynchronous rather than
blocking the whole toolstack.
This stub function will be called libxl__xc_domain_restore.
However, its prospective call site is unsuitable for a function which
needs to make a callback, and is buried in two nested single-call-site
functions which are logically part of the domain creation procedure.
So we first abolish those single-call-site functions, integrate their
contents into domain creation in their proper temporal order, and
break out libxl__xc_domain_restore ready for its reimplementation.
No functional change - just the following reorganisation:
* Abolish libxl__domain_restore_common, as it had only one caller.
Move its contents into (what was) domain_restore.
* There is a new stage function domcreate_rebuild_done containing what
used to be the bulk of domcreate_bootloader_done, since
domcreate_bootloader_done now simply starts the restore (or does the
rebuild) and arranges to call the next stage.
* Move the contents of domain_restore into its correct place in the
domain creation sequence. We put it inside
domcreate_bootloader_done, which now either calls
libxl__xc_domain_restore which will call the new function
domcreate_rebuild_done, or calls domcreate_rebuild_done directly.
* Various general-purpose local variables (`i' etc.) and convenience
alias variables need to be shuffled about accordingly.
* Consequently libxl__toolstack_restore needs to gain external linkage
as it is now in a different file to its user.
* Move the xc_domain_save callbacks struct from the stack into
libxl__domain_create_state.
In general the moved code remains almost identical. Two returns in
what used to be libxl__domain_restore_common have been changed to set
the return value and "goto out", and the call sites for the abolished
and new functions have been adjusted.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
|