\documentclass[11pt,twoside,final,openright]{xenstyle}
\usepackage{a4,graphicx,setspace,times}
\usepackage{comment,parskip}
\setstretch{1.15}

\begin{document}

% TITLE PAGE
\pagestyle{empty}
\begin{center}
\vspace*{\fill}
\includegraphics{figs/xenlogo.eps}
\vfill
\vfill
\vfill
\begin{tabular}{l}
{\Huge \bf Interface manual} \\[4mm]
{\huge Xen v2.0 for x86} \\[80mm]
{\Large Xen is Copyright (c) 2002-2004, The Xen Team} \\[3mm]
{\Large University of Cambridge, UK} \\[20mm]
\end{tabular}
\end{center}

{\bf DISCLAIMER: This documentation is currently under active development
and as such there may be mistakes and omissions --- watch out for these and
please report any you find to the developers' mailing list. Contributions
of material, suggestions and corrections are welcome. }

\vfill
\cleardoublepage

% TABLE OF CONTENTS
\pagestyle{plain}
\pagenumbering{roman}
{ \parskip 0pt plus 1pt
  \tableofcontents }
\cleardoublepage

% PREPARE FOR MAIN TEXT
\pagenumbering{arabic}
\raggedbottom
\widowpenalty=10000
\clubpenalty=10000
\parindent=0pt
\parskip=5pt
\renewcommand{\topfraction}{.8}
\renewcommand{\bottomfraction}{.8}
\renewcommand{\textfraction}{.2}
\renewcommand{\floatpagefraction}{.8}
\setstretch{1.15}

\chapter{Introduction}

Xen allows the hardware resources of a machine to be virtualized and
dynamically partitioned, allowing multiple different {\em guest} operating
system images to be run simultaneously. Virtualizing the machine in this
manner provides considerable flexibility, for example allowing different
users to choose their preferred operating system (e.g., Linux, NetBSD, or a
custom operating system). Furthermore, Xen provides secure partitioning
between virtual machines (known as {\em domains} in Xen terminology), and
enables better resource accounting and QoS isolation than can be achieved
with a conventional operating system.

Xen essentially takes a `whole machine' virtualization approach as
pioneered by IBM VM/370. However, unlike VM/370 or more recent efforts such
as VMware and Virtual PC, Xen does not attempt to completely virtualize the
underlying hardware. Instead, parts of the hosted guest operating systems
are modified to work with the VMM; the operating system is effectively
ported to a new target architecture, typically requiring changes in just
the machine-dependent code. The user-level API is unchanged, and so
existing binaries and operating system distributions work without
modification.

In addition to exporting virtualized instances of CPU, memory, network and
block devices, Xen exposes a control interface to manage how these
resources are shared between the running domains. Access to the control
interface is restricted: it may only be used by one specially-privileged
VM, known as {\em domain-0}. This domain is a required part of any
Xen-based server and runs the application software that manages the
control-plane aspects of the platform. Running the control software in
{\it domain-0}, distinct from the hypervisor itself, allows the Xen
framework to separate the notions of mechanism and policy within the
system.

\chapter{Virtual Architecture}

On a Xen-based system, the hypervisor itself runs in {\it ring 0}. It has
full access to the physical memory available in the system and is
responsible for allocating portions of it to the domains. Guest operating
systems run in and use {\it rings 1}, {\it 2} and {\it 3} as they see fit.
Segmentation is used to prevent the guest OS from accessing the portion of
the address space that is reserved for Xen.
We expect most guest operating systems will use ring 1 for their own
operation and place applications in ring 3.

In this chapter we consider the basic virtual architecture provided by
Xen: the basic CPU state, exception and interrupt handling, and time.
Other aspects such as memory and device access are discussed in later
chapters.

\section{CPU state}

All privileged state must be handled by Xen. The guest OS has no direct
access to CR3 and is not permitted to update privileged bits in EFLAGS.
Guest OSes use \emph{hypercalls} to invoke operations in Xen; these are
analogous to system calls but occur from ring 1 to ring 0.

\section{Exceptions}

The IDT is virtualized by submitting to Xen a table of trap handlers. Most
trap handlers are identical to native x86 handlers, although the
page-fault handler is somewhat different.

\section{Interrupts and events}

Interrupts are virtualized by mapping them to events, which are delivered
asynchronously to the target domain. A guest OS can map these events onto
its standard interrupt dispatch mechanisms. Xen is responsible for
determining the target domain that will handle each physical interrupt
source.

\section{Time}

Guest operating systems need to be aware of the passage of both real (or
wallclock) time and their own `virtual time' (the time for which they have
been executing). Furthermore, Xen has a notion of time which is used for
scheduling. The following notions of time are provided:

\begin{description}
\item[Cycle counter time.] This provides a fine-grained time reference.
The cycle counter time is used to accurately extrapolate the other time
references. On SMP machines it is currently assumed that the cycle counter
time is synchronised between CPUs. The current x86-based implementation
achieves this within inter-CPU communication latencies.

\item[System time.] This is a 64-bit counter which holds the number of
nanoseconds that have elapsed since system boot.

\item[Wall clock time.] This is the time of day in a Unix-style
{\tt struct timeval} (seconds and microseconds since 1 January 1970,
adjusted by leap seconds). An NTP client hosted by {\it domain-0} can keep
this value accurate.

\item[Domain virtual time.] This progresses at the same pace as system
time, but only while a domain is executing --- it stops while a domain is
de-scheduled. Therefore the share of the CPU that a domain receives is
indicated by the rate at which its virtual time increases.
\end{description}

Xen exports timestamps for system time and wall-clock time to guest
operating systems through a shared page of memory. Xen also provides the
cycle counter time at the instant the timestamps were calculated, and the
CPU frequency in Hertz. This allows the guest to extrapolate system and
wall-clock times accurately based on the current cycle counter time.

Since all timestamps need to be updated and read \emph{atomically}, two
version numbers are also stored in the shared info page. The first is
incremented prior to an update, while the second is only incremented
afterwards. Thus a guest can be sure that it read a consistent state by
checking that the two version numbers are equal.

Xen includes a periodic ticker which sends a timer event to the currently
executing domain every 10ms. The Xen scheduler also sends a timer event
whenever a domain is scheduled; this allows the guest OS to adjust for the
time that has passed while it has been inactive. In addition, Xen allows
each domain to request that it receive a timer event at a specified system
time. Guest OSes may use this timer to implement timeout values when they
block.
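By way of illustration, the following sketch shows the kind of
consistent-read loop a guest might use to take a snapshot of the shared
time information. The structure layout, field names and the {\tt rmb()}
barrier macro are assumptions made for this example; they do not reproduce
Xen's actual shared-info definitions.

\begin{verbatim}
/* Illustrative layout only: the field names do not match Xen's real
 * shared-info structure. */
struct example_time_info {
    unsigned long      version1;        /* incremented before an update */
    unsigned long      version2;        /* incremented after an update  */
    unsigned long long tsc_at_update;   /* cycle counter at last update */
    unsigned long long system_time_ns;  /* nanoseconds since boot       */
    unsigned long      wc_sec, wc_usec; /* wall-clock time              */
};

/* Snapshot the time fields, retrying until the two version numbers
 * agree; this guarantees the read did not overlap an update by Xen. */
void example_read_time(volatile struct example_time_info *t,
                       struct example_time_info *snap)
{
    do {
        snap->version2       = t->version2;  /* 'after' counter first  */
        rmb();                               /* assumed read barrier   */
        snap->tsc_at_update  = t->tsc_at_update;
        snap->system_time_ns = t->system_time_ns;
        snap->wc_sec         = t->wc_sec;
        snap->wc_usec        = t->wc_usec;
        rmb();
        snap->version1       = t->version1;  /* 'before' counter last  */
    } while ( snap->version1 != snap->version2 );
}
\end{verbatim}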
\chapter{Memory}

Xen is responsible for managing the allocation of physical memory to
domains, and for ensuring safe use of the paging and segmentation
hardware.

\section{Memory Allocation}

Xen resides within a small fixed portion of physical memory and reserves
the top 64MB of every virtual address space. The remaining physical memory
is available for allocation to domains at a page granularity. Xen tracks
the ownership and use of each page, which allows it to enforce secure
partitioning between domains. Each domain has a maximum and current
physical memory allocation. A guest OS may run a `balloon driver' to
dynamically adjust its current memory allocation up to its limit.

\section{Page Table Updates}

In the default mode of operation, Xen enforces read-only access to page
tables and requires guest operating systems to explicitly request any
modifications. Xen validates all such requests and only applies updates
that it deems safe. This is necessary to prevent domains from adding
arbitrary mappings to their page tables.

To aid validation, Xen associates a type and reference count with each
memory page. A page has one of the following mutually-exclusive types at
any point in time: page directory ({\sf PD}), page table ({\sf PT}), local
descriptor table ({\sf LDT}), global descriptor table ({\sf GDT}), or
writable ({\sf RW}). Note that a guest OS may always create readable
mappings of its own memory regardless of its current type.

%%% XXX: possibly explain more about ref count 'lifecycle' here?

This mechanism is used to maintain the invariants required for safety; for
example, a domain cannot have a writable mapping to any part of a page
table as this would require the page concerned to simultaneously be of
types {\sf PT} and {\sf RW}.

%\section{Writable Page Tables}

Xen also provides an alternative mode of operation in which guests have
the illusion that their page tables are directly writable. Of course this
is not really the case, since Xen must still validate modifications to
ensure secure partitioning. To this end, Xen traps any write attempt to a
memory page of type {\sf PT} (i.e., one that is currently part of a page
table). If such an access occurs, Xen temporarily allows write access to
that page while at the same time {\em disconnecting} it from the page
table that is currently in use. This allows the guest to safely make
updates to the page because the newly-updated entries cannot be used by
the MMU until Xen revalidates and reconnects the page. Reconnection occurs
automatically in a number of situations: for example, when the guest
modifies a different page-table page, when the domain is preempted, or
whenever the guest uses Xen's explicit page-table update interfaces.

\section{Segment Descriptor Tables}

On boot a guest is supplied with a default GDT, which does not reside
within its own memory allocation. If the guest wishes to use other than
the default `flat' ring-1 and ring-3 segments that this GDT provides, it
must register a custom GDT and/or LDT with Xen, allocated from its own
memory. Note that a number of GDT entries are reserved by Xen -- any
custom GDT must also include sufficient space for these entries.

For example, the following hypercall is used to specify a new GDT:

\begin{quote}
int {\bf set\_gdt}(unsigned long *{\em frame\_list}, int {\em entries})

{\em frame\_list}: An array of up to 16 page frames within which the GDT
resides. Any frame registered as a GDT frame may only be mapped read-only
within the guest's address space (e.g., no writable mappings, no use as a
page-table page, and so on).

{\em entries}: The number of descriptor-entry slots in the GDT. Note that
the table must be large enough to contain Xen's reserved entries; thus we
must have `{\em entries $>$ LAST\_RESERVED\_GDT\_ENTRY}\ '. Note also
that, after registering the GDT, slots {\em FIRST\_RESERVED\_GDT\_ENTRY}
through {\em LAST\_RESERVED\_GDT\_ENTRY} are no longer usable by the guest
and may be overwritten by Xen.
\end{quote}
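For illustration, the fragment below sketches how a guest might register a
custom GDT using this hypercall. The {\tt HYPERVISOR\_set\_gdt()} wrapper
and the constants used here are assumptions made for the example rather
than definitions given in this manual.

\begin{verbatim}
/* Illustrative sketch: register a single-page custom GDT with Xen.
 * 'gdt_frame' is assumed to hold the machine frame number of a page
 * that the guest has allocated and populated with its descriptors
 * (leaving Xen's reserved slots untouched). */
#define EXAMPLE_GDT_ENTRIES 512   /* 512 x 8 bytes = one page; must
                                     exceed Xen's reserved slots */

int example_register_gdt(unsigned long gdt_frame)
{
    unsigned long frame_list[16];

    frame_list[0] = gdt_frame;    /* the GDT fits in one frame here */

    /* Assumed guest-side wrapper around the set_gdt hypercall; after
     * this succeeds the frame may only be mapped read-only. */
    return HYPERVISOR_set_gdt(frame_list, EXAMPLE_GDT_ENTRIES);
}
\end{verbatim}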
% XXX SMH: HERE

\section{Pseudo-Physical Memory}

The usual problem of external fragmentation means that a domain is
unlikely to receive a contiguous stretch of physical memory. However, most
guest operating systems do not have built-in support for operating in a
fragmented physical address space; for example, Linux has to have a
one-to-one mapping for its physical memory. A notion of
{\it pseudo-physical memory} is therefore introduced.

Xen maintains a {\it real physical} to {\it pseudo-physical} mapping which
can be consulted by every domain. Additionally, at its start of day, a
domain is supplied with a {\it pseudo-physical} to {\it real physical}
mapping which it must keep up to date itself. From that moment onwards,
{\it pseudo-physical} addresses are used instead of the discontiguous
{\it real physical} addresses. Thus, the rest of the guest OS code has the
impression of operating in a contiguous address space. Guest OS page
tables contain real physical addresses. Mapping {\it pseudo-physical} to
{\it real physical} addresses is needed on page table updates and also
when remapping memory regions within the guest OS.

\section{Start of Day}

Start-of-day issues such as building initial page tables for a domain,
loading its kernel image and so on are handled by the {\it domain builder}
running in user-space in {\it domain-0}. Paging to disk and swapping are
handled by the guest operating systems themselves, if required.

The amount of memory required by the domain is passed to the hypervisor by
the domain builder as one of the parameters of new-domain initialization.

\chapter{Network I/O}

Virtual network device services are provided by shared memory
communications with a `backend' domain. From the point of view of other
domains, the backend may be viewed as a virtual Ethernet switch element
with each domain having one or more virtual network interfaces connected
to it.

\section{Backend Packet Handling}

The backend driver is responsible primarily for {\it data-path}
operations. In terms of networking this means packet transmission and
reception.

On the transmission side, the backend needs to perform a number of key
actions:

\begin{itemize}
\item {\tt Validation:} A domain may only be allowed to emit packets
matching a certain specification; for example, ones in which the source IP
address matches one assigned to the virtual interface over which it is
sent. The backend would be responsible for ensuring any such requirements
are met, either by checking or by stamping outgoing packets with
prescribed values for certain fields. Validation functions can be
configured using standard firewall rules ({\tt iptables}, in the case of
Linux); a minimal illustration appears at the end of this section.

\item {\tt Scheduling:} Since a number of domains can share a single
``real'' network interface, the hypervisor must mediate access when
several domains each have packets queued for transmission. Of course, this
general scheduling function subsumes basic shaping or rate-limiting
schemes.

\item {\tt Logging and Accounting:} The hypervisor can be configured with
classifier rules that control how packets are accounted or logged. For
example, {\it domain-0} could request that it receive a log message or a
copy of the packet whenever another domain attempts to send a TCP packet
containing a SYN.
\end{itemize}

On the receive side, the backend's role is relatively straightforward:
once a packet is received, it just needs to determine the virtual
interface(s) to which it must be delivered and deliver it via
page-flipping.
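To make the validation action concrete, the fragment below checks the
source address of an outgoing IP packet against the address assigned to
the transmitting virtual interface. The structures and field names are
invented for this example; a real backend would normally rely on the
firewall rules mentioned above rather than open-coding such a check.

\begin{verbatim}
#include <stdint.h>

/* Invented types, for illustration only. */
struct example_vif   { uint32_t assigned_ip; };  /* network byte order */
struct example_ippkt { uint32_t src_ip, dst_ip;  /* ...rest of header */ };

/* Return non-zero if the packet may be transmitted on this interface:
 * its source address must match the address assigned to the vif. */
static int example_validate_tx(const struct example_vif *vif,
                               const struct example_ippkt *pkt)
{
    return pkt->src_ip == vif->assigned_ip;
}
\end{verbatim}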
\section{Data Transfer}

Each virtual interface uses two ``descriptor rings'', one for transmit,
the other for receive. Each descriptor identifies a block of contiguous
physical memory allocated to the domain. There are four cases:

\begin{itemize}

\item The transmit ring carries packets to transmit from the domain to the
hypervisor.

\item The return path of the transmit ring carries ``empty'' descriptors
indicating that the contents have been transmitted and the memory can be
re-used.

\item The receive ring carries empty descriptors from the domain to the
hypervisor; these provide storage space for that domain's received
packets.

\item The return path of the receive ring carries packets that have been
received.

\end{itemize}

Real physical addresses are used throughout, with the domain performing
translation from pseudo-physical addresses if that is necessary.

If a domain does not keep its receive ring stocked with empty buffers then
packets destined for it may be dropped. This provides some defense against
receiver-livelock problems because an overloaded domain will cease to
receive further data. Similarly, on the transmit path, it provides the
application with feedback on the rate at which packets are able to leave
the system.

Synchronization between the hypervisor and the domain is achieved using
counters held in shared memory that is accessible to both. Each ring has
associated producer and consumer indices indicating the area in the ring
that holds descriptors that contain data. After receiving {\it n} packets
or {\it t} nanoseconds after receiving the first packet, the hypervisor
sends an event to the domain.

\chapter{Block I/O}

\section{Virtual Block Devices (VBDs)}

All guest OS disk access goes through the VBD interface. The VBD interface
provides the administrator with the ability to selectively grant domains
access to portions of block storage devices visible to the block backend
device (usually domain 0).

VBDs can literally be backed by any block device accessible to the backend
domain, including network-based block devices (iSCSI, *NBD, etc.),
loopback devices and LVM/MD devices.

Old (Xen 1.2) virtual disks are not supported under Xen 2.0, since similar
functionality can be achieved using the (more advanced) LVM system, which
is already in widespread use.

\subsection{Data Transfer}

Domains which have been granted access to a logical block device are
permitted to read and write it via shared memory communications with the
backend domain.

In overview, the same style of descriptor-ring that is used for network
packets is used here. Each domain has one ring that carries operation
requests to the hypervisor and carries the results back again.

Rather than copying data, the backend simply maps the domain's buffers in
order to enable direct DMA to them. The act of mapping the buffers also
increases the reference counts of the underlying pages, so that the
unprivileged domain cannot try to return them to the hypervisor, install
them as page tables, or engage in any other unsafe behaviour.
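Since both the network and block interfaces rely on this shared-memory
ring pattern, the following sketch shows its general shape: a ring with
free-running producer and consumer indices. The layout and names are
simplified for illustration and are not the precise netif/blkif ring
formats; {\tt wmb()}/{\tt rmb()} are assumed memory-barrier macros.

\begin{verbatim}
#define RING_SIZE 256                /* example value; a power of two */

struct example_desc {
    unsigned long addr;              /* machine address of the buffer */
    unsigned int  len;               /* buffer length in bytes        */
};

struct example_ring {
    volatile unsigned int prod;      /* advanced only by the producer */
    volatile unsigned int cons;      /* advanced only by the consumer */
    struct example_desc   desc[RING_SIZE];
};

/* Producer side: returns 0 if the ring is currently full. */
static int example_push(struct example_ring *r, struct example_desc d)
{
    if ( (r->prod - r->cons) == RING_SIZE )
        return 0;                    /* no free slots */
    r->desc[r->prod % RING_SIZE] = d;
    wmb();                           /* descriptor visible before index */
    r->prod++;                       /* publish to the other end */
    return 1;
}

/* Consumer side: returns 0 if the ring is currently empty. */
static int example_pop(struct example_ring *r, struct example_desc *d)
{
    if ( r->cons == r->prod )
        return 0;                    /* nothing outstanding */
    rmb();                           /* read index before descriptor */
    *d = r->desc[r->cons % RING_SIZE];
    r->cons++;
    return 1;
}
\end{verbatim}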
%block API here

\chapter{Privileged operations}

{\it Domain-0} is responsible for building all other domains on the server
and providing control interfaces for managing scheduling, networking, and
block devices.

\chapter{CPU Scheduler}

Xen offers a uniform API for CPU schedulers. It is possible to choose from
a number of schedulers at boot and it should be easy to add more.

\paragraph*{Note: SMP host support}
Xen has always supported SMP host systems. Domains are statically assigned
to CPUs, either at creation time or when manually pinned to a particular
CPU. The current schedulers then run locally on each CPU to decide which
of the assigned domains should be run there.

\section{Standard Schedulers}

The BVT, Atropos and Round Robin schedulers are part of the normal Xen
distribution. BVT provides proportional fair shares of the CPU to the
running domains. Atropos can be used to reserve absolute shares of the CPU
for each domain. Round Robin is provided as an example of Xen's internal
scheduler API.

More information on the characteristics and use of these schedulers is
available in {\tt Sched-HOWTO.txt}.

\begin{comment}

\section{Scheduling API}

The scheduling API is used by the schedulers described above, and should
also be used by any new schedulers. It provides a generic interface and
also implements much of the ``boilerplate'' code.

Schedulers conforming to this API are described by the following
structure:

\begin{verbatim}
struct scheduler
{
    char *name;             /* full name for this scheduler   */
    char *opt_name;         /* option name for this scheduler */
    unsigned int sched_id;  /* ID for this scheduler          */

    int          (*init_scheduler) ();
    int          (*alloc_task)     (struct task_struct *);
    void         (*add_task)       (struct task_struct *);
    void         (*free_task)      (struct task_struct *);
    void         (*rem_task)       (struct task_struct *);
    void         (*wake_up)        (struct task_struct *);
    void         (*do_block)       (struct task_struct *);
    task_slice_t (*do_schedule)    (s_time_t);
    int          (*control)        (struct sched_ctl_cmd *);
    int          (*adjdom)         (struct task_struct *,
                                    struct sched_adjdom_cmd *);
    s32          (*reschedule)     (struct task_struct *);
    void         (*dump_settings)  (void);
    void         (*dump_cpu_state) (int);
    void         (*dump_runq_el)   (struct task_struct *);
};
\end{verbatim}

The only method that {\em must} be implemented is {\tt do\_schedule()}.
However, if the {\tt wake\_up()} method is not implemented then waking
tasks will not get put on the runqueue!

The fields of the above structure are described in more detail below.

\subsubsection{name}

The {\tt name} field should point to a descriptive ASCII string.

\subsubsection{opt\_name}

This field is the value of the {\tt sched=} boot-time option that will
select this scheduler.

\subsubsection{sched\_id}

This is an integer that uniquely identifies this scheduler. There should
be a macro corresponding to this scheduler ID in {\tt }.

\subsubsection{init\_scheduler}

\paragraph*{Purpose}

This is a function for performing any scheduler-specific initialisation.
For instance, it might allocate memory for per-CPU scheduler data and
initialise it appropriately.

\paragraph*{Call environment}

This function is called after the initialisation performed by the generic
layer. The function is called exactly once, for the scheduler that has
been selected.

\paragraph*{Return values}

This should return negative on failure --- this will cause an immediate
panic and the system will fail to boot.
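For illustration, a minimal {\tt init\_scheduler} implementation for a
hypothetical scheduler might look like the sketch below. The
{\tt example\_} names are invented for this example; only {\tt NR\_CPUS}
and the list primitives are assumed to be available to Xen code.

\begin{verbatim}
/* Hypothetical per-CPU private data for an example scheduler. */
struct example_cpu_data {
    struct list_head runqueue;       /* runnable tasks on this CPU */
};

static struct example_cpu_data example_cpu[NR_CPUS];

static int example_init_scheduler(void)
{
    int cpu;

    /* Initialise an empty runqueue for every CPU. A real scheduler
     * might instead allocate this data dynamically and attach it to
     * the generic layer's per-CPU scheduling structures. */
    for ( cpu = 0; cpu < NR_CPUS; cpu++ )
        INIT_LIST_HEAD(&example_cpu[cpu].runqueue);

    return 0;   /* a negative return would cause a panic at boot */
}
\end{verbatim}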
\subsubsection{alloc\_task}

\paragraph*{Purpose}

Called when a {\tt task\_struct} is allocated by the generic scheduler
layer. A particular scheduler implementation may use this method to
allocate per-task data for this task. It may use the {\tt sched\_priv}
pointer in the {\tt task\_struct} to point to this data.

\paragraph*{Call environment}

The generic layer guarantees that the {\tt sched\_priv} field will remain
intact from the time this method is called until the task is deallocated
(so long as the scheduler implementation does not change it explicitly!).

\paragraph*{Return values}

Negative on failure.

\subsubsection{add\_task}

\paragraph*{Purpose}

Called when a task is initially added by the generic layer.

\paragraph*{Call environment}

The fields in the {\tt task\_struct} are now filled out and available for
use. Schedulers should implement appropriate initialisation of any
per-task private information in this method.

\subsubsection{free\_task}

\paragraph*{Purpose}

Schedulers should free the space used by any associated private data
structures.

\paragraph*{Call environment}

This is called when a {\tt task\_struct} is about to be deallocated. The
generic layer will have done generic task removal operations and (if
implemented) called the scheduler's {\tt rem\_task} method before this
method is called.

\subsubsection{rem\_task}

\paragraph*{Purpose}

This is called when a task is being removed from scheduling (but is not
yet being freed).

\subsubsection{wake\_up}

\paragraph*{Purpose}

Called when a task is woken up, this method should put the task on the
runqueue (or do the scheduler-specific equivalent action).

\paragraph*{Call environment}

The task is already set to state RUNNING.

\subsubsection{do\_block}

\paragraph*{Purpose}

This function is called when a task is blocked. This function should not
remove the task from the runqueue.

\paragraph*{Call environment}

The EVENTS\_MASTER\_ENABLE\_BIT is already set and the task state changed
to TASK\_INTERRUPTIBLE on entry to this method. A call to the
{\tt do\_schedule} method will be made after this method returns, in order
to select the next task to run.

\subsubsection{do\_schedule}

This method must be implemented.

\paragraph*{Purpose}

The method is called each time a new task must be chosen for scheduling on
the current CPU. The current time is passed as the single argument (the
current task can be found using the {\tt current} macro). This method
should select the next task to run on this CPU and set its minimum time to
run as well as returning the data described below.

This method should also take the appropriate action if the previous task
has blocked, e.g. removing it from the runqueue.

\paragraph*{Call environment}

The other fields in the {\tt task\_struct} are updated by the generic
layer, which also performs all Xen-specific tasks and performs the actual
task switch (unless the previous task has been chosen again).

This method is called with the {\tt schedule\_lock} held for the current
CPU and local interrupts disabled.

\paragraph*{Return values}

Must return a {\tt struct task\_slice} describing which task to run and
for how long (at maximum).

\subsubsection{control}

\paragraph*{Purpose}

This method is called for global scheduler control operations.
It takes a pointer to a {\tt struct sched\_ctl\_cmd}, which it should
either source data from or populate with data, depending on the value of
the {\tt direction} field.

\paragraph*{Call environment}

The generic layer guarantees that when this method is called, the caller
selected the correct scheduler ID, hence the scheduler's implementation
does not need to sanity-check these parts of the call.

\paragraph*{Return values}

This function should return the value to be passed back to user space,
hence it should either be 0 or an appropriate errno value.

\subsubsection{sched\_adjdom}

\paragraph*{Purpose}

This method is called to adjust the scheduling parameters of a particular
domain, or to query their current values. The function should check the
{\tt direction} field of the {\tt sched\_adjdom\_cmd} it receives in order
to determine which of these operations is being performed.

\paragraph*{Call environment}

The generic layer guarantees that the caller has specified the correct
control interface version and scheduler ID and that the supplied
{\tt task\_struct} will not be deallocated during the call (hence it is
not necessary to {\tt get\_task\_struct}).

\paragraph*{Return values}

This function should return the value to be passed back to user space,
hence it should either be 0 or an appropriate errno value.

\subsubsection{reschedule}

\paragraph*{Purpose}

This method is called to determine if a reschedule is required as a result
of a particular task.

\paragraph*{Call environment}

The generic layer will cause a reschedule if the current domain is the
idle task or if it has exceeded its minimum time slice before a
reschedule. The generic layer guarantees that the task passed is not
currently running but is on the runqueue.

\paragraph*{Return values}

Should return a mask of CPUs to cause a reschedule on.

\subsubsection{dump\_settings}

\paragraph*{Purpose}

If implemented, this should dump any private global settings for this
scheduler to the console.

\paragraph*{Call environment}

This function is called with interrupts enabled.

\subsubsection{dump\_cpu\_state}

\paragraph*{Purpose}

This method should dump any private settings for the specified CPU.

\paragraph*{Call environment}

This function is called with interrupts disabled and the
{\tt schedule\_lock} for the specified CPU held.

\subsubsection{dump\_runq\_el}

\paragraph*{Purpose}

This method should dump any private settings for the specified task.

\paragraph*{Call environment}

This function is called with interrupts disabled and the
{\tt schedule\_lock} for the task's CPU held.

\chapter{Debugging}

Xen provides tools for debugging both Xen and guest OSes. Currently, the
Pervasive Debugger provides a GDB stub, which enables symbolic debugging
of Xen itself and of OS kernels running on top of Xen. The Trace Buffer
provides a lightweight means to log data about Xen's internal state and
behaviour at runtime, for later analysis.

\section{Pervasive Debugger}

Information on using the pervasive debugger is available in {\tt pdb.txt}.

\section{Trace Buffer}

The trace buffer provides a means to observe Xen's operation from domain
0. Trace events, inserted at key points in Xen's code, record data that
can be read by the {\tt xentrace} tool. Recording these events has a low
overhead and hence the trace buffer may be useful for debugging
timing-sensitive behaviours.

\subsection{Internal API}

To use the trace buffer functionality from within Xen, you must
{\tt \#include }, which contains definitions related to the trace buffer.

Trace events are inserted into the buffer using the {\tt TRACE\_xD}
({\tt x} = 0, 1, 2, 3, 4 or 5) macros. These all take an event number,
plus {\tt x} additional (32-bit) data words as their arguments. For trace
buffer-enabled builds of Xen these will insert the event ID and data into
the trace buffer, along with the current value of the CPU cycle-counter.
For builds without the trace buffer enabled, the macros expand to no-ops
and thus can be left in place without incurring overheads.
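For illustration, a trace point with two data words might be inserted as
sketched below. The event number and the surrounding function are invented
for this example, and the header location is an assumption.

\begin{verbatim}
#include <xen/trace.h>   /* assumed location of the TRACE_xD macros */

#define TRC_EXAMPLE_BLOCKED 0x8001   /* illustrative event number only */

void example_trace_block(unsigned int domid, unsigned long reason)
{
    /* Record a two-word trace event; in builds without the trace
     * buffer enabled this expands to a no-op. */
    TRACE_2D(TRC_EXAMPLE_BLOCKED, domid, reason);
}
\end{verbatim}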
\subsection{Trace-enabled builds}

By default, the trace buffer is enabled only in debug builds (i.e.
{\tt NDEBUG} is not defined). It can be enabled separately by defining
{\tt TRACE\_BUFFER}, either in {\tt } or on the gcc command line.

The size (in pages) of the per-CPU trace buffers can be specified using
the {\tt tbuf\_size=n} boot parameter to Xen. If the size is set to 0, the
trace buffers will be disabled.

\subsection{Dumping trace data}

When running a trace buffer build of Xen, trace data are written
continuously into the buffer data areas, with newer data overwriting older
data. This data can be captured using the {\tt xentrace} program in domain
0.

The {\tt xentrace} tool uses {\tt /dev/mem} in domain 0 to map the trace
buffers into its address space. It then periodically polls all the buffers
for new data, dumping out any new records from each buffer in turn. As a
result, for machines with multiple (logical) CPUs, the trace buffer output
will not be in overall chronological order.

The output from {\tt xentrace} can be post-processed using
{\tt xentrace\_cpusplit} (used to split trace data out into per-cpu log
files) and {\tt xentrace\_format} (used to pretty-print trace data). For
the predefined trace points, there is an example format file in
{\tt tools/xentrace/formats}.

For more information, see the manual pages for {\tt xentrace},
{\tt xentrace\_format} and {\tt xentrace\_cpusplit}.

\appendix

\newcommand{\hypercall}[1]{\vspace{5mm}{\large\sf #1}}

\chapter{Xen Hypercalls}

\hypercall{ set\_trap\_table(trap\_info\_t *table)}

Install trap handler table.

\hypercall{ mmu\_update(mmu\_update\_t *req, int count, int *success\_count)}

Update the page table for the domain. Updates can be batched.
{\it success\_count} will be updated to report the number of successful
updates. The update types are:

{\it MMU\_NORMAL\_PT\_UPDATE}: update a page directory or page table entry

{\it MMU\_MACHPHYS\_UPDATE}: update an entry in the machine-to-physical table

{\it MMU\_EXTENDED\_COMMAND}: extended command, e.g. pinning or unpinning
page-table pages, or flushing the TLB

\hypercall{ set\_gdt(unsigned long *frame\_list, int entries)}

Set the global descriptor table --- virtualization of the {\tt lgdt}
instruction.

\hypercall{ stack\_switch(unsigned long ss, unsigned long esp)}

Request a kernel stack switch from the hypervisor; {\it ss} and {\it esp}
give the new stack segment and stack pointer.

\hypercall{ set\_callbacks(unsigned long event\_selector, unsigned long
event\_address, unsigned long failsafe\_selector, unsigned long
failsafe\_address) }

Register OS event processing routine. In Linux both the event\_selector
and failsafe\_selector are the kernel's CS. The value event\_address
specifies the address of an interrupt handler dispatch routine and
failsafe\_address specifies a handler for application faults.

\hypercall{ fpu\_taskswitch(void)}

Notify the hypervisor that the FPU registers need to be saved on the next
context switch.

\hypercall{ sched\_op(unsigned long op)}

Request a scheduling operation from the hypervisor. The options are:
{\it yield}, {\it block}, and {\it shutdown}. {\it yield} keeps the
calling domain runnable but may cause a reschedule if other domains are
runnable. {\it block} removes the calling domain from the run queue and
the domain sleeps until an event is delivered to it.
{\it shutdown} is used to end the domain's execution and allows the caller
to specify whether the domain should reboot, halt or suspend.

\hypercall{ dom0\_op(dom0\_op\_t *op)}

Administrative domain operations for domain management. The options are:

{\it DOM0\_CREATEDOMAIN}: create a new domain, specifying the name and
memory usage in kilobytes

{\it DOM0\_PAUSEDOMAIN}: mark domain as unschedulable

{\it DOM0\_UNPAUSEDOMAIN}: mark domain as schedulable

{\it DOM0\_DESTROYDOMAIN}: deallocate resources associated with the domain

{\it DOM0\_GETMEMLIST}: get list of pages used by the domain

{\it DOM0\_SCHEDCTL}: invoke global scheduler control operations

{\it DOM0\_ADJUSTDOM}: adjust scheduling priorities for domain

{\it DOM0\_BUILDDOMAIN}: do final guest OS setup for domain

{\it DOM0\_GETDOMAINFO}: get statistics about the domain

{\it DOM0\_GETPAGEFRAMEINFO}: get information about a given page frame

{\it DOM0\_IOPL}: set IO privilege level

{\it DOM0\_MSR}: read or write machine-specific registers

{\it DOM0\_DEBUG}: interactively call pervasive debugger

{\it DOM0\_SETTIME}: set system time

{\it DOM0\_READCONSOLE}: read console content from hypervisor buffer ring

{\it DOM0\_PINCPUDOMAIN}: pin domain to a particular CPU

{\it DOM0\_GETTBUFS}: get information about the size and location of the
trace buffers (only on trace-buffer enabled builds)

{\it DOM0\_PHYSINFO}: get information about the host machine

{\it DOM0\_PCIDEV\_ACCESS}: modify PCI device access permissions

{\it DOM0\_SCHED\_ID}: get the ID of the current Xen scheduler

{\it DOM0\_SHADOW\_CONTROL}: control the domain's shadow page-table mode

{\it DOM0\_SETDOMAINNAME}: set the name of a domain

{\it DOM0\_SETDOMAININITIALMEM}: set initial memory allocation of a domain

{\it DOM0\_SETDOMAINMAXMEM}: set maximum memory allocation of a domain

{\it DOM0\_GETPAGEFRAMEINFO2}: batched version of {\it DOM0\_GETPAGEFRAMEINFO}

{\it DOM0\_SETDOMAINVMASSIST}: set domain VM assist options

\hypercall{ set\_debugreg(int reg, unsigned long value)}

Set debug register {\it reg} to {\it value}.

\hypercall{ get\_debugreg(int reg)}

Return the contents of debug register {\it reg}.

\hypercall{ update\_descriptor(unsigned long ma, unsigned long word1,
unsigned long word2)}

Update the descriptor at machine address {\it ma} with the contents
{\it word1}:{\it word2}, subject to validation.

\hypercall{ set\_fast\_trap(int idx)}

Install a `fast trap' for vector {\it idx}, allowing the guest OS to
handle that software interrupt without involving the hypervisor.

\hypercall{ dom\_mem\_op(unsigned int op, unsigned long *extent\_list,
unsigned long nr\_extents, unsigned int extent\_order)}

Increase or decrease memory reservations for guest OS.

\hypercall{ multicall(void *call\_list, int nr\_calls)}

Execute a series of hypervisor calls.

\hypercall{ update\_va\_mapping(unsigned long page\_nr, unsigned long val,
unsigned long flags)}

Update the page-table entry mapping virtual page {\it page\_nr} with the
new value {\it val}; {\it flags} control TLB-flush behaviour.

\hypercall{ set\_timer\_op(uint64\_t timeout)}

Request a timer event to be sent at the specified system time.

\hypercall{ event\_channel\_op(void *op)}

Inter-domain event-channel management.

\hypercall{ xen\_version(int cmd)}

Request Xen version number.

\hypercall{ console\_io(int cmd, int count, char *str)}

Interact with the console; the operations are:

{\it CONSOLEIO\_write}: output {\it count} characters from buffer {\it str}

{\it CONSOLEIO\_read}: input at most {\it count} characters into buffer
{\it str}

\hypercall{ physdev\_op(void *physdev\_op)}

Perform a physical-device operation (e.g. PCI configuration-space access)
on behalf of a suitably privileged domain.

\hypercall{ grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)}

Manage grant tables, which allow memory pages to be shared with or
transferred to other domains.

\hypercall{ vm\_assist(unsigned int cmd, unsigned int type)}

Enable or disable an optional VM assistance feature (e.g. writable page
tables) for the calling domain.

\hypercall{ update\_va\_mapping\_otherdomain(unsigned long page\_nr,
unsigned long val, unsigned long flags, uint16\_t domid)}

As {\it update\_va\_mapping}, except that the page being mapped belongs to
the domain specified by {\it domid}; restricted to privileged domains.

\end{comment}

\end{document}