Diffstat (limited to 'docs/src/interface.tex')
-rw-r--r--  docs/src/interface.tex | 2216
1 file changed, 0 insertions(+), 2216 deletions(-)
diff --git a/docs/src/interface.tex b/docs/src/interface.tex
deleted file mode 100644
index dd061cbfff..0000000000
--- a/docs/src/interface.tex
+++ /dev/null
@@ -1,2216 +0,0 @@
-\documentclass[11pt,twoside,final,openright,a4paper]{report}
-\usepackage{graphicx,html,setspace,times}
-\usepackage{parskip}
-\setstretch{1.15}
-
-% LIBRARY FUNCTIONS
-
-\newcommand{\hypercall}[1]{\vspace{2mm}{\sf #1}}
-
-\begin{document}
-
-% TITLE PAGE
-\pagestyle{empty}
-\begin{center}
-\vspace*{\fill}
-\includegraphics{figs/xenlogo.eps}
-\vfill
-\vfill
-\vfill
-\begin{tabular}{l}
-{\Huge \bf Interface manual} \\[4mm]
-{\huge Xen v3.0 for x86} \\[80mm]
-
-{\Large Xen is Copyright (c) 2002-2005, The Xen Team} \\[3mm]
-{\Large University of Cambridge, UK} \\[20mm]
-\end{tabular}
-\end{center}
-
-{\bf DISCLAIMER: This documentation is always under active development
-and as such there may be mistakes and omissions --- watch out for
-these and please report any you find to the developer's mailing list.
-The latest version is always available on-line. Contributions of
-material, suggestions and corrections are welcome. }
-
-\vfill
-\cleardoublepage
-
-% TABLE OF CONTENTS
-\pagestyle{plain}
-\pagenumbering{roman}
-{ \parskip 0pt plus 1pt
- \tableofcontents }
-\cleardoublepage
-
-% PREPARE FOR MAIN TEXT
-\pagenumbering{arabic}
-\raggedbottom
-\widowpenalty=10000
-\clubpenalty=10000
-\parindent=0pt
-\parskip=5pt
-\renewcommand{\topfraction}{.8}
-\renewcommand{\bottomfraction}{.8}
-\renewcommand{\textfraction}{.2}
-\renewcommand{\floatpagefraction}{.8}
-\setstretch{1.1}
-
-\chapter{Introduction}
-
-Xen allows the hardware resources of a machine to be virtualized and
-dynamically partitioned, allowing multiple different {\em guest}
-operating system images to be run simultaneously. Virtualizing the
-machine in this manner provides considerable flexibility, for example
-allowing different users to choose their preferred operating system
-(e.g., Linux, NetBSD, or a custom operating system). Furthermore, Xen
-provides secure partitioning between virtual machines (known as
-{\em domains} in Xen terminology), and enables better resource
-accounting and QoS isolation than can be achieved with a conventional
-operating system.
-
-Xen essentially takes a `whole machine' virtualization approach as
-pioneered by IBM VM/370. However, unlike VM/370 or more recent
-efforts such as VMware and Virtual PC, Xen does not attempt to
-completely virtualize the underlying hardware. Instead parts of the
-hosted guest operating systems are modified to work with the VMM; the
-operating system is effectively ported to a new target architecture,
-typically requiring changes in just the machine-dependent code. The
-user-level API is unchanged, and so existing binaries and operating
-system distributions work without modification.
-
-In addition to exporting virtualized instances of CPU, memory, network
-and block devices, Xen exposes a control interface to manage how these
-resources are shared between the running domains. Access to the
-control interface is restricted: it may only be used by one
-specially-privileged VM, known as {\em domain 0}. This domain is a
-required part of any Xen-based server and runs the application software
-that manages the control-plane aspects of the platform. Running the
-control software in {\it domain 0}, distinct from the hypervisor
-itself, allows the Xen framework to separate the notions of
-mechanism and policy within the system.
-
-
-\chapter{Virtual Architecture}
-
-In a Xen/x86 system, only the hypervisor runs with full processor
-privileges ({\it ring 0} in the x86 four-ring model). It has full
-access to the physical memory available in the system and is
-responsible for allocating portions of it to running domains.
-
-On a 32-bit x86 system, guest operating systems may use {\it rings 1},
-{\it 2} and {\it 3} as they see fit. Segmentation is used to prevent
-the guest OS from accessing the portion of the address space that is
-reserved for Xen. We expect most guest operating systems will use
-ring 1 for their own operation and place applications in ring 3.
-
-On 64-bit systems it is not possible to protect the hypervisor from
-untrusted guest code running in rings 1 and 2. Guests are therefore
-restricted to run in ring 3 only. The guest kernel is protected from
-its applications by switching the address space when context
-switching between the kernel and the currently running application.
-
-In this chapter we consider the basic virtual architecture provided by
-Xen: CPU state, exception and interrupt handling, and time.
-Other aspects such as memory and device access are discussed in later
-chapters.
-
-
-\section{CPU state}
-
-All privileged state must be handled by Xen. The guest OS has no
-direct access to CR3 and is not permitted to update privileged bits in
-EFLAGS. Guest OSes use \emph{hypercalls} to invoke operations in Xen;
-these are analogous to system calls but occur from ring 1 to ring 0.
-
-A list of all hypercalls is given in Appendix~\ref{a:hypercalls}.
-
-
-\section{Exceptions}
-
-A virtual IDT is provided --- a domain can submit a table of trap
-handlers to Xen via the {\bf set\_trap\_table} hypercall. The
-exception stack frame presented to a virtual trap handler is identical
-to its native equivalent.
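-
-As an illustration, a minimal sketch of registering a virtual IDT is
-shown below. It assumes the usual guest-side hypercall wrapper {\bf
-HYPERVISOR\_set\_trap\_table} and hypothetical handler symbols; the
-{\bf trap\_info\_t} layout (vector, privilege flags, code segment
-selector, handler address) follows {\bf xen/include/public/xen.h},
-and the table is terminated by a zeroed entry:
-
-\scriptsize
-\begin{verbatim}
-/* Hypothetical example: page-fault and breakpoint handlers. The
- * flags field gives the privilege level allowed to invoke the trap
- * (3 for int3, so applications can trigger it directly). */
-static trap_info_t traps[] = {
-    { 14, 0, FLAT_KERNEL_CS, (unsigned long)page_fault_handler },
-    {  3, 3, FLAT_KERNEL_CS, (unsigned long)int3_handler       },
-    {  0, 0, 0, 0 }                        /* zeroed terminator */
-};
-
-HYPERVISOR_set_trap_table(traps);
-\end{verbatim}
-\normalsize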
-
-
-\section{Interrupts and events}
-
-Interrupts are virtualized by mapping them to \emph{event channels},
-which are delivered asynchronously to the target domain using a callback
-supplied via the {\bf set\_callbacks} hypercall. A guest OS can map
-these events onto its standard interrupt dispatch mechanisms. Xen is
-responsible for determining the target domain that will handle each
-physical interrupt source. For more details on the binding of event
-sources to event channels, see Chapter~\ref{c:devices}.
-
-
-\section{Time}
-
-Guest operating systems need to be aware of the passage of both real
-(or wallclock) time and their own `virtual time' (the time for which
-they have been executing). Furthermore, Xen has a notion of time which
-is used for scheduling. The following notions of time are provided:
-
-\begin{description}
-\item[Cycle counter time.]
-
- This provides a fine-grained time reference. The cycle counter time
- is used to accurately extrapolate the other time references. On SMP
- machines it is currently assumed that the cycle counter time is
- synchronized between CPUs. The current x86-based implementation
- achieves this within inter-CPU communication latencies.
-
-\item[System time.]
-
- This is a 64-bit counter which holds the number of nanoseconds that
- have elapsed since system boot.
-
-\item[Wall clock time.]
-
- This is the time of day in a Unix-style {\bf struct timeval}
- (seconds and microseconds since 1 January 1970, adjusted by leap
- seconds). An NTP client hosted by {\it domain 0} can keep this
- value accurate.
-
-\item[Domain virtual time.]
-
- This progresses at the same pace as system time, but only while a
- domain is executing --- it stops while a domain is de-scheduled.
- Therefore the share of the CPU that a domain receives is indicated
- by the rate at which its virtual time increases.
-
-\end{description}
-
-
-Xen exports timestamps for system time and wall-clock time to guest
-operating systems through a shared page of memory. Xen also provides
-the cycle counter time at the instant the timestamps were calculated,
-and the CPU frequency in Hertz. This allows the guest to extrapolate
-system and wall-clock times accurately based on the current cycle
-counter time.
-
-Since the timestamps must be updated and read \emph{atomically}, a
-version number is also stored in the shared info page; it is
-incremented before and after the timestamps are updated. A guest can
-therefore be sure it has read a consistent state by reading the
-version before and after reading the timestamps, and checking that
-the two values are equal and even.
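-
-A minimal sketch of this check follows, assuming a pointer {\bf
-shared} to the mapped shared info page and a read barrier {\bf
-rmb()} supplied by the guest:
-
-\scriptsize
-\begin{verbatim}
-uint32_t version, sec, nsec;
-
-do {
-    version = shared->wc_version;   /* odd => update in progress */
-    rmb();
-    sec  = shared->wc_sec;
-    nsec = shared->wc_nsec;
-    rmb();
-} while ((version & 1) || (version != shared->wc_version));
-\end{verbatim}
-\normalsize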
-
-Xen includes a periodic ticker which sends a timer event to the
-currently executing domain every 10ms. The Xen scheduler also sends a
-timer event whenever a domain is scheduled; this allows the guest OS
-to adjust for the time that has passed while it has been inactive. In
-addition, Xen allows each domain to request that it receive a timer
-event at a specified system time, using the {\bf set\_timer\_op}
-hypercall. Guest OSes may use this timer to implement timeout values
-when they block.
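-
-For example, a guest that wishes to be woken in 10ms might issue the
-following (a sketch, assuming a hypothetical helper that returns the
-current extrapolated system time; the deadline passed to Xen is an
-absolute system time in nanoseconds):
-
-\scriptsize
-\begin{verbatim}
-uint64_t now = read_system_time_ns();       /* assumed helper */
-HYPERVISOR_set_timer_op(now + 10000000ULL); /* wake in 10ms   */
-\end{verbatim}
-\normalsize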
-
-
-\section{Xen CPU Scheduling}
-
-Xen offers a uniform API for CPU schedulers. It is possible to choose
-from a number of schedulers at boot and it should be easy to add more.
-The SEDF and Credit schedulers are part of the normal Xen
-distribution. SEDF is scheduled for removal and its use should be
-avoided once the Credit scheduler has stabilized and become the default.
-The Credit scheduler provides proportional fair shares of the
-host's CPUs to the running domains. It does this while transparently
-load balancing runnable VCPUs across the whole system.
-
-\paragraph*{Note: SMP host support}
-Xen has always supported SMP host systems. When using the credit scheduler,
-a domain's VCPUs will be dynamically moved across physical CPUs to maximise
-domain and system throughput. VCPUs can also be manually restricted to be
-mapped only on a subset of the host's physical CPUs, using the pinning
-mechanism.
-
-
-%% More information on the characteristics and use of these schedulers
-%% is available in {\bf Sched-HOWTO.txt}.
-
-
-\section{Privileged operations}
-
-Xen exports an extended interface to privileged domains (viz.\ {\it
- Domain 0}). This allows such domains to build and boot other domains
-on the server, and provides control interfaces for managing
-scheduling, memory, networking, and block devices.
-
-\chapter{Memory}
-\label{c:memory}
-
-Xen is responsible for managing the allocation of physical memory to
-domains, and for ensuring safe use of the paging and segmentation
-hardware.
-
-
-\section{Memory Allocation}
-
-As well as allocating a portion of physical memory for its own private
-use, Xen also reserves a small fixed portion of every virtual address
-space. This is located in the top 64MB on 32-bit systems, the top
-168MB on PAE systems, and a larger portion in the middle of the
-address space on 64-bit systems. Unreserved physical memory is
-available for allocation to domains at a page granularity. Xen tracks
-the ownership and use of each page, which allows it to enforce secure
-partitioning between domains.
-
-Each domain has a maximum and current physical memory allocation. A
-guest OS may run a `balloon driver' to dynamically adjust its current
-memory allocation up to its limit.
-
-
-\section{Pseudo-Physical Memory}
-
-Since physical memory is allocated and freed on a page granularity,
-there is no guarantee that a domain will receive a contiguous stretch
-of physical memory. However most operating systems do not have good
-support for operating in a fragmented physical address space. To aid
-porting such operating systems to run on top of Xen, we make a
-distinction between \emph{machine memory} and \emph{pseudo-physical
- memory}.
-
-Put simply, machine memory refers to the entire amount of memory
-installed in the machine, including that reserved by Xen, in use by
-various domains, or currently unallocated. We consider machine memory
-to comprise a set of 4kB \emph{machine page frames} numbered
-consecutively starting from 0. A machine frame number has the same
-meaning within Xen and within any domain.
-
-Pseudo-physical memory, on the other hand, is a per-domain
-abstraction. It allows a guest operating system to consider its memory
-allocation to consist of a contiguous range of physical page frames
-starting at physical frame 0, despite the fact that the underlying
-machine page frames may be sparsely allocated and in any order.
-
-To achieve this, Xen maintains a globally readable {\it
- machine-to-physical} table which records the mapping from machine
-page frames to pseudo-physical ones. In addition, each domain is
-supplied with a {\it physical-to-machine} table which performs the
-inverse mapping. Clearly the machine-to-physical table has size
-proportional to the amount of RAM installed in the machine, while each
-physical-to-machine table has size proportional to the memory
-allocation of the given domain.
-
-Architecture dependent code in guest operating systems can then use
-the two tables to provide the abstraction of pseudo-physical memory.
-In general, only certain specialized parts of the operating system
-(such as page table management) need to understand the difference
-between machine and pseudo-physical addresses.
-
-
-\section{Page Table Updates}
-
-In the default mode of operation, Xen enforces read-only access to
-page tables and requires guest operating systems to explicitly request
-any modifications. Xen validates all such requests and only applies
-updates that it deems safe. This is necessary to prevent domains from
-adding arbitrary mappings to their page tables.
-
-To aid validation, Xen associates a type and reference count with each
-memory page. A page has one of the following mutually-exclusive types
-at any point in time: page directory ({\sf PD}), page table ({\sf
- PT}), local descriptor table ({\sf LDT}), global descriptor table
-({\sf GDT}), or writable ({\sf RW}). Note that a guest OS may always
-create readable mappings of its own memory regardless of its current
-type.
-
-%%% XXX: possibly explain more about ref count 'lifecyle' here?
-This mechanism is used to maintain the invariants required for safety;
-for example, a domain cannot have a writable mapping to any part of a
-page table as this would require the page concerned to simultaneously
-be of types {\sf PT} and {\sf RW}.
-
-\hypercall{mmu\_update(mmu\_update\_t *req, int count, int *success\_count, domid\_t domid)}
-
-This hypercall is used to make updates to either the domain's
-pagetables or to the machine-to-physical mapping table. It supports
-submitting a queue of updates, allowing batching for maximal
-performance. Explicitly queuing updates using this interface will
-cause any outstanding writable pagetable state to be flushed from the
-system.
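-
-The sketch below batches two pagetable-entry writes into a single
-hypercall. It assumes a hypothetical helper {\bf
-pte\_machine\_addr()} returning the machine address of the PTE that
-maps a given virtual address; {\bf MMU\_NORMAL\_PT\_UPDATE} in the
-low bits of {\bf ptr} selects an ordinary pagetable update:
-
-\scriptsize
-\begin{verbatim}
-mmu_update_t req[2];
-int done;
-
-req[0].ptr = pte_machine_addr(va0) | MMU_NORMAL_PT_UPDATE;
-req[0].val = new_pte0;
-req[1].ptr = pte_machine_addr(va1) | MMU_NORMAL_PT_UPDATE;
-req[1].val = new_pte1;
-
-if (HYPERVISOR_mmu_update(req, 2, &done, DOMID_SELF) < 0)
-    /* Xen judged one of the updates unsafe */ ;
-\end{verbatim}
-\normalsize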
-
-\section{Writable Page Tables}
-
-Xen also provides an alternative mode of operation in which guests
-have the illusion that their page tables are directly writable. Of
-course this is not really the case, since Xen must still validate
-modifications to ensure secure partitioning. To this end, Xen traps
-any write attempt to a memory page of type {\sf PT} (i.e., that is
-currently part of a page table). If such an access occurs, Xen
-temporarily allows write access to that page while at the same time
-\emph{disconnecting} it from the page table that is currently in use.
-This allows the guest to safely make updates to the page because the
-newly-updated entries cannot be used by the MMU until Xen revalidates
-and reconnects the page. Reconnection occurs automatically in a
-number of situations: for example, when the guest modifies a different
-page-table page, when the domain is preempted, or whenever the guest
-uses Xen's explicit page-table update interfaces.
-
-Writable pagetable functionality is enabled when the guest requests
-it, using a {\bf vm\_assist} hypercall. Writable pagetables do {\em
-not} provide full virtualisation of the MMU, so the memory management
-code of the guest still needs to be aware that it is running on Xen.
-Since the guest's page tables are used directly, it must translate
-pseudo-physical addresses to real machine addresses when building page
-table entries. The guest may not attempt to map its own pagetables
-writably, since this would violate the memory type invariants; page
-tables will automatically be made writable by the hypervisor, as
-necessary.
-
-\section{Shadow Page Tables}
-
-Finally, Xen also supports a form of \emph{shadow page tables} in
-which the guest OS uses an independent copy of page tables which are
-unknown to the hardware (i.e.\ which are never pointed to by {\tt
- cr3}). Instead Xen propagates changes made to the guest's tables to
-the real ones, and vice versa. This is useful for logging page writes
-(e.g.\ for live migration or checkpointing). A full version of the shadow
-page tables also allows guest OS porting with less effort.
-
-
-\section{Segment Descriptor Tables}
-
-At start of day a guest is supplied with a default GDT, which does not
-reside within its own memory allocation. If the guest wishes to use
-segments other than the default `flat' ring-1 and ring-3 segments that
-this GDT provides, it must register a custom GDT and/or LDT with Xen,
-allocated from its own memory.
-
-The following hypercall is used to specify a new GDT:
-
-\begin{quote}
- int {\bf set\_gdt}(unsigned long *{\em frame\_list}, int {\em
- entries})
-
- \emph{frame\_list}: An array of up to 14 machine page frames within
- which the GDT resides. Any frame registered as a GDT frame may only
- be mapped read-only within the guest's address space (e.g., no
- writable mappings, no use as a page-table page, and so on). Only 14
- pages may be specified because pages 15 and 16 are reserved for
- the hypervisor's GDT entries.
-
- \emph{entries}: The number of descriptor-entry slots in the GDT.
-\end{quote}
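-
-As a sketch, registering a single-page GDT allocated from the guest's
-own memory might look as follows (assuming a hypothetical {\bf
-virt\_to\_mfn()} pseudo-physical translation helper; one 4kB page
-holds 512 eight-byte descriptors):
-
-\scriptsize
-\begin{verbatim}
-unsigned long frames[1] = { virt_to_mfn(my_gdt_page) };
-
-if (HYPERVISOR_set_gdt(frames, 512) != 0)
-    /* registration was rejected */ ;
-\end{verbatim}
-\normalsize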
-
-The LDT is updated via the generic MMU update mechanism (i.e., via the
-{\bf mmu\_update} hypercall).
-
-\section{Start of Day}
-
-The start-of-day environment for guest operating systems is rather
-different to that provided by the underlying hardware. In particular,
-the processor is already executing in protected mode with paging
-enabled.
-
-{\it Domain 0} is created and booted by Xen itself. For all subsequent
-domains, the analogue of the boot-loader is the {\it domain builder},
-user-space software running in {\it domain 0}. The domain builder is
-responsible for building the initial page tables for a domain and
-loading its kernel image at the appropriate virtual address.
-
-\section{VM assists}
-
-Xen provides a number of ``assists'' for guest memory management.
-These are available on an ``opt-in'' basis to provide commonly-used
-extra functionality to a guest.
-
-\hypercall{vm\_assist(unsigned int cmd, unsigned int type)}
-
-The {\bf cmd} parameter describes the action to be taken, whilst the
-{\bf type} parameter describes the kind of assist that is being
-referred to. Available commands are as follows:
-
-\begin{description}
-\item[VMASST\_CMD\_enable] Enable a particular assist type
-\item[VMASST\_CMD\_disable] Disable a particular assist type
-\end{description}
-
-And the available types are:
-
-\begin{description}
-\item[VMASST\_TYPE\_4gb\_segments] Provide emulated support for
- instructions that rely on 4GB segments (such as the techniques used
- by some TLS solutions).
-\item[VMASST\_TYPE\_4gb\_segments\_notify] Provide a callback (via trap number
- 15) to the guest if the above segment fixups are used: allows the guest to
- display a warning message during boot.
-\item[VMASST\_TYPE\_writable\_pagetables] Enable writable pagetable
- mode, described above.
-\end{description}
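-
-For example, a guest that wants writable pagetable mode might make
-the following call at boot (a sketch, assuming the usual {\bf
-HYPERVISOR\_vm\_assist} hypercall wrapper):
-
-\scriptsize
-\begin{verbatim}
-HYPERVISOR_vm_assist(VMASST_CMD_enable,
-                     VMASST_TYPE_writable_pagetables);
-\end{verbatim}
-\normalsize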
-
-
-\chapter{Xen Info Pages}
-
-The {\bf Shared info page} is used to share various CPU-related state
-between the guest OS and the hypervisor. This information includes VCPU
-status, time information and event channel (virtual interrupt) state.
-The {\bf Start info page} is used to pass build-time information to
-the guest when it boots and when it is resumed from a suspended state.
-This chapter documents the fields included in the {\bf
-shared\_info\_t} and {\bf start\_info\_t} structures for use by the
-guest OS.
-
-\section{Shared info page}
-
-The {\bf shared\_info\_t} is accessed at run time by both Xen and the
-guest OS. It is used to pass information relating to the
-virtual CPU and virtual machine state between the OS and the
-hypervisor.
-
-The structure is declared in {\bf xen/include/public/xen.h}:
-
-\scriptsize
-\begin{verbatim}
-typedef struct shared_info {
- vcpu_info_t vcpu_info[XEN_LEGACY_MAX_VCPUS];
-
- /*
- * A domain can create "event channels" on which it can send and receive
- * asynchronous event notifications. There are three classes of event that
- * are delivered by this mechanism:
- * 1. Bi-directional inter- and intra-domain connections. Domains must
- * arrange out-of-band to set up a connection (usually by allocating
- * an unbound 'listener' port and advertising that via a storage service
- * such as xenstore).
- * 2. Physical interrupts. A domain with suitable hardware-access
- * privileges can bind an event-channel port to a physical interrupt
- * source.
- * 3. Virtual interrupts ('events'). A domain can bind an event-channel
- * port to a virtual interrupt source, such as the virtual-timer
- * device or the emergency console.
- *
- * Event channels are addressed by a "port index". Each channel is
- * associated with two bits of information:
- * 1. PENDING -- notifies the domain that there is a pending notification
- * to be processed. This bit is cleared by the guest.
- * 2. MASK -- if this bit is clear then a 0->1 transition of PENDING
- * will cause an asynchronous upcall to be scheduled. This bit is only
- * updated by the guest. It is read-only within Xen. If a channel
- * becomes pending while the channel is masked then the 'edge' is lost
- * (i.e., when the channel is unmasked, the guest must manually handle
- * pending notifications as no upcall will be scheduled by Xen).
- *
- * To expedite scanning of pending notifications, any 0->1 pending
- * transition on an unmasked channel causes a corresponding bit in a
- * per-vcpu selector word to be set. Each bit in the selector covers a
- * 'C long' in the PENDING bitfield array.
- */
- unsigned long evtchn_pending[sizeof(unsigned long) * 8];
- unsigned long evtchn_mask[sizeof(unsigned long) * 8];
-
- /*
- * Wallclock time: updated only by control software. Guests should base
- * their gettimeofday() syscall on this wallclock-base value.
- */
- uint32_t wc_version; /* Version counter: see vcpu_time_info_t. */
- uint32_t wc_sec; /* Secs 00:00:00 UTC, Jan 1, 1970. */
- uint32_t wc_nsec; /* Nsecs 00:00:00 UTC, Jan 1, 1970. */
-
- arch_shared_info_t arch;
-
-} shared_info_t;
-\end{verbatim}
-\normalsize
-
-\begin{description}
-\item[vcpu\_info] An array of {\bf vcpu\_info\_t} structures, each of
- which holds either runtime information about a virtual CPU, or is
- ``empty'' if the corresponding VCPU does not exist.
-\item[evtchn\_pending] Guest-global array, with one bit per event
- channel. Bits are set if an event is currently pending on that
- channel.
-\item[evtchn\_mask] Guest-global array for masking notifications on
- event channels.
-\item[wc\_version] Version counter for current wallclock time.
-\item[wc\_sec] Whole seconds component of current wallclock time.
-\item[wc\_nsec] Nanoseconds component of current wallclock time.
-\item[arch] Host architecture-dependent portion of the shared info
- structure.
-\end{description}
-
-\subsection{vcpu\_info\_t}
-
-\scriptsize
-\begin{verbatim}
-typedef struct vcpu_info {
- /*
- * 'evtchn_upcall_pending' is written non-zero by Xen to indicate
- * a pending notification for a particular VCPU. It is then cleared
- * by the guest OS /before/ checking for pending work, thus avoiding
- * a set-and-check race. Note that the mask is only accessed by Xen
- * on the CPU that is currently hosting the VCPU. This means that the
- * pending and mask flags can be updated by the guest without special
- * synchronisation (i.e., no need for the x86 LOCK prefix).
- * This may seem suboptimal because if the pending flag is set by
- * a different CPU then an IPI may be scheduled even when the mask
- * is set. However, note:
- * 1. The task of 'interrupt holdoff' is covered by the per-event-
- * channel mask bits. A 'noisy' event that is continually being
- * triggered can be masked at source at this very precise
- * granularity.
- * 2. The main purpose of the per-VCPU mask is therefore to restrict
- * reentrant execution: whether for concurrency control, or to
- * prevent unbounded stack usage. Whatever the purpose, we expect
- * that the mask will be asserted only for short periods at a time,
- * and so the likelihood of a 'spurious' IPI is suitably small.
- * The mask is read before making an event upcall to the guest: a
- * non-zero mask therefore guarantees that the VCPU will not receive
- * an upcall activation. The mask is cleared when the VCPU requests
- * to block: this avoids wakeup-waiting races.
- */
- uint8_t evtchn_upcall_pending;
- uint8_t evtchn_upcall_mask;
- unsigned long evtchn_pending_sel;
- arch_vcpu_info_t arch;
- vcpu_time_info_t time;
-} vcpu_info_t; /* 64 bytes (x86) */
-\end{verbatim}
-\normalsize
-
-\begin{description}
-\item[evtchn\_upcall\_pending] This is set non-zero by Xen to indicate
- that there are pending events to be received.
-\item[evtchn\_upcall\_mask] This is set non-zero to disable all
- interrupts for this CPU for short periods of time. If individual
- event channels need to be masked, the {\bf evtchn\_mask} in the {\bf
- shared\_info\_t} is used instead.
-\item[evtchn\_pending\_sel] When an event is delivered to this VCPU, a
- bit is set in this selector to indicate which word of the {\bf
- evtchn\_pending} array in the {\bf shared\_info\_t} contains the
- event in question.
-\item[arch] Architecture-specific VCPU info. On x86 this contains the
- virtualized CR2 register (page fault linear address) for this VCPU.
-\item[time] Time values for this VCPU.
-\end{description}
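-
-To illustrate how these fields cooperate, the sketch below gives the
-shape of a guest's event upcall handler. The bit operations ({\bf
-xchg}, {\bf \_\_ffs}, {\bf clear\_bit}) are assumed to be provided by
-the guest kernel, and {\bf handle\_port()} is a hypothetical per-port
-dispatch routine:
-
-\scriptsize
-\begin{verbatim}
-static void event_upcall(shared_info_t *s, vcpu_info_t *v)
-{
-    v->evtchn_upcall_pending = 0;
-    unsigned long sel = xchg(&v->evtchn_pending_sel, 0);
-
-    while (sel != 0) {
-        unsigned int word = __ffs(sel);
-        sel &= ~(1UL << word);
-
-        unsigned long bits = s->evtchn_pending[word] &
-                             ~s->evtchn_mask[word];
-        while (bits != 0) {
-            unsigned int bit = __ffs(bits);
-            bits &= ~(1UL << bit);
-            clear_bit(bit, &s->evtchn_pending[word]);
-            handle_port(word * sizeof(long) * 8 + bit);
-        }
-    }
-}
-\end{verbatim}
-\normalsize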
-
-\subsection{vcpu\_time\_info}
-
-\scriptsize
-\begin{verbatim}
-typedef struct vcpu_time_info {
- /*
- * Updates to the following values are preceded and followed by an
- * increment of 'version'. The guest can therefore detect updates by
- * looking for changes to 'version'. If the least-significant bit of
- * the version number is set then an update is in progress and the guest
- * must wait to read a consistent set of values.
- * The correct way to interact with the version number is similar to
- * Linux's seqlock: see the implementations of read_seqbegin/read_seqretry.
- */
- uint32_t version;
- uint32_t pad0;
- uint64_t tsc_timestamp; /* TSC at last update of time vals. */
- uint64_t system_time; /* Time, in nanosecs, since boot. */
- /*
- * Current system time:
- * system_time + ((((tsc - tsc_timestamp) << tsc_shift) * tsc_to_system_mul) >> 32)
- * CPU frequency (Hz):
- * ((10^9 << 32) / tsc_to_system_mul) >> tsc_shift
- */
- uint32_t tsc_to_system_mul;
- int8_t tsc_shift;
- int8_t pad1[3];
-} vcpu_time_info_t; /* 32 bytes */
-\end{verbatim}
-\normalsize
-
-\begin{description}
-\item[version] Used to ensure the guest gets consistent time updates.
-\item[tsc\_timestamp] Cycle counter timestamp of the last time value;
- can be used to extrapolate between updates, for instance.
-\item[system\_time] Time since boot (nanoseconds).
-\item[tsc\_to\_system\_mul] Cycle counter to nanoseconds multiplier
-(used in extrapolating current time).
-\item[tsc\_shift] Cycle counter to nanoseconds shift (used in
-extrapolating current time).
-\end{description}
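-
-Putting these fields together, a guest might extrapolate the current
-system time as sketched below (assuming an {\bf rdtsc()} helper and a
-consistent snapshot {\bf t} obtained with the version protocol above;
-{\bf tsc\_to\_system\_mul} is a 32.32 fixed-point fraction, hence the
-final shift right by 32, and {\bf tsc\_shift} may be negative):
-
-\scriptsize
-\begin{verbatim}
-static uint64_t system_time_ns(const vcpu_time_info_t *t)
-{
-    uint64_t delta = rdtsc() - t->tsc_timestamp;
-
-    if (t->tsc_shift >= 0)
-        delta <<= t->tsc_shift;
-    else
-        delta >>= -t->tsc_shift;
-
-    /* 64x32-bit multiply, keeping the top 64 of 96 bits
-     * (a compiler-provided 128-bit type is assumed here). */
-    return t->system_time +
-           (uint64_t)(((__uint128_t)delta * t->tsc_to_system_mul) >> 32);
-}
-\end{verbatim}
-\normalsize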
-
-\subsection{arch\_shared\_info\_t}
-
-On x86, the {\bf arch\_shared\_info\_t} is defined as follows (from
-xen/public/arch-x86\_32.h):
-
-\scriptsize
-\begin{verbatim}
-typedef struct arch_shared_info {
- unsigned long max_pfn; /* max pfn that appears in table */
- /* Frame containing list of mfns containing list of mfns containing p2m. */
- unsigned long pfn_to_mfn_frame_list_list;
-} arch_shared_info_t;
-\end{verbatim}
-\normalsize
-
-\begin{description}
-\item[max\_pfn] The maximum PFN listed in the physical-to-machine
- mapping table (P2M table).
-\item[pfn\_to\_mfn\_frame\_list\_list] Machine address of the frame
- that contains the machine addresses of the P2M table frames.
-\end{description}
-
-\section{Start info page}
-
-The start info structure is declared as the following (in {\bf
-xen/include/public/xen.h}):
-
-\scriptsize
-\begin{verbatim}
-#define MAX_GUEST_CMDLINE 1024
-typedef struct start_info {
- /* THE FOLLOWING ARE FILLED IN BOTH ON INITIAL BOOT AND ON RESUME. */
- char magic[32]; /* "Xen-<version>.<subversion>". */
- unsigned long nr_pages; /* Total pages allocated to this domain. */
- unsigned long shared_info; /* MACHINE address of shared info struct. */
- uint32_t flags; /* SIF_xxx flags. */
- unsigned long store_mfn; /* MACHINE page number of shared page. */
- uint32_t store_evtchn; /* Event channel for store communication. */
- unsigned long console_mfn; /* MACHINE address of console page. */
- uint32_t console_evtchn; /* Event channel for console messages. */
- /* THE FOLLOWING ARE ONLY FILLED IN ON INITIAL BOOT (NOT RESUME). */
- unsigned long pt_base; /* VIRTUAL address of page directory. */
- unsigned long nr_pt_frames; /* Number of bootstrap p.t. frames. */
- unsigned long mfn_list; /* VIRTUAL address of page-frame list. */
- unsigned long mod_start; /* VIRTUAL address of pre-loaded module. */
- unsigned long mod_len; /* Size (bytes) of pre-loaded module. */
- int8_t cmd_line[MAX_GUEST_CMDLINE];
-} start_info_t;
-\end{verbatim}
-\normalsize
-
-The fields fall into two groups: the first group is filled in both
-when a domain is booted and when it is resumed; the second is used
-only at boot time.
-
-The always-available group is as follows:
-
-\begin{description}
-\item[magic] A text string identifying the Xen version to the guest.
-\item[nr\_pages] The number of real machine pages available to the
- guest.
-\item[shared\_info] Machine address of the shared info structure,
- allowing the guest to map it during initialisation.
-\item[flags] Flags for describing optional extra settings to the
- guest.
-\item[store\_mfn] Machine address of the Xenstore communications page.
-\item[store\_evtchn] Event channel to communicate with the store.
-\item[console\_mfn] Machine address of the console data page.
-\item[console\_evtchn] Event channel to notify the console backend.
-\end{description}
-
-The boot-only group may only be safely referred to during system boot:
-
-\begin{description}
-\item[pt\_base] Virtual address of the page directory created for us
- by the domain builder.
-\item[nr\_pt\_frames] Number of frames used by the builder's bootstrap
- pagetables.
-\item[mfn\_list] Virtual address of the list of machine frames this
- domain owns.
-\item[mod\_start] Virtual address of any pre-loaded modules
 (e.g.\ a ramdisk).
-\item[mod\_len] Size of pre-loaded module (if any).
-\item[cmd\_line] Kernel command line passed by the domain builder.
-\end{description}
-
-
-% by Mark Williamson <mark.williamson@cl.cam.ac.uk>
-
-\chapter{Event Channels}
-\label{c:eventchannels}
-
-Event channels are the basic primitive provided by Xen for event
-notifications. An event is the Xen equivalent of a hardware
-interrupt. Event channels essentially store one bit of information;
-the event of interest is signalled by transitioning this bit from 0 to 1.
-
-Notifications are received by a guest via an upcall from Xen,
-indicating when an event arrives (setting the bit). Further
-notifications are masked until the bit is cleared again (therefore,
-guests must check the value of the bit after re-enabling event
-delivery to ensure no missed notifications).
-
-Event notifications can be masked by setting a flag; this is
-equivalent to disabling interrupts and can be used to ensure atomicity
-of certain operations in the guest kernel.
-
-\section{Hypercall interface}
-
-\hypercall{event\_channel\_op(evtchn\_op\_t *op)}
-
-The event channel operation hypercall is used for all operations on
-event channels / ports. Operations are distinguished by the value of
-the {\bf cmd} field of the {\bf op} structure. The possible commands
-are described below:
-
-\begin{description}
-
-\item[EVTCHNOP\_alloc\_unbound]
- Allocate a new event channel port, ready to be connected to by a
- remote domain.
- \begin{itemize}
- \item Specified domain must exist.
- \item A free port must exist in that domain.
- \end{itemize}
- Unprivileged domains may only allocate their own ports, privileged
- domains may also allocate ports in other domains.
-\item[EVTCHNOP\_bind\_interdomain]
- Bind an event channel for interdomain communications.
- \begin{itemize}
- \item Caller domain must have a free port to bind.
- \item Remote domain must exist.
- \item Remote port must be allocated and currently unbound.
- \item Remote port must be expecting the caller domain as the ``remote''.
- \end{itemize}
-\item[EVTCHNOP\_bind\_virq]
- Allocate a port and bind a VIRQ to it.
- \begin{itemize}
- \item Caller domain must have a free port to bind.
- \item VIRQ must be valid.
- \item VCPU must exist.
- \item VIRQ must not currently be bound to an event channel.
- \end{itemize}
-\item[EVTCHNOP\_bind\_ipi]
- Allocate and bind a port for notifying other virtual CPUs.
- \begin{itemize}
- \item Caller domain must have a free port to bind.
- \item VCPU must exist.
- \end{itemize}
-\item[EVTCHNOP\_bind\_pirq]
- Allocate and bind a port to a real IRQ.
- \begin{itemize}
- \item Caller domain must have a free port to bind.
- \item PIRQ must be within the valid range.
- \item Another binding for this PIRQ must not exist for this domain.
- \item Caller must have an available port.
- \end{itemize}
-\item[EVTCHNOP\_close]
- Close an event channel (no more events will be received).
- \begin{itemize}
- \item Port must be valid (currently allocated).
- \end{itemize}
-\item[EVTCHNOP\_send] Send a notification on an event channel attached
- to a port.
- \begin{itemize}
- \item Port must be valid.
- \item Only valid for Interdomain, IPI or Allocated Unbound ports.
- \end{itemize}
-\item[EVTCHNOP\_status] Query the status of a port; what kind of port,
- whether it is bound, what remote domain is expected, what PIRQ or
- VIRQ it is bound to, what VCPU will be notified, etc.
- Unprivileged domains may only query the state of their own ports.
- Privileged domains may query any port.
-\item[EVTCHNOP\_bind\_vcpu] Bind event channel to a particular VCPU -
- receive notification upcalls only on that VCPU.
- \begin{itemize}
- \item VCPU must exist.
- \item Port must be valid.
- \item Event channel must be either: allocated but unbound, bound to
- an interdomain event channel, bound to a PIRQ.
- \end{itemize}
-
-\end{description}
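-
-As a concrete sketch, allocating an unbound port that a remote domain
-can later connect to might look as follows (assuming the usual {\bf
-HYPERVISOR\_event\_channel\_op} wrapper; the resulting port would
-typically be advertised to the peer via xenstore):
-
-\scriptsize
-\begin{verbatim}
-evtchn_op_t op;
-
-op.cmd = EVTCHNOP_alloc_unbound;
-op.u.alloc_unbound.dom        = DOMID_SELF;   /* our port space  */
-op.u.alloc_unbound.remote_dom = remote_domid; /* expected binder */
-
-if (HYPERVISOR_event_channel_op(&op) == 0)
-    listener_port = op.u.alloc_unbound.port;
-\end{verbatim}
-\normalsize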
-
-%%
-%% grant_tables.tex
-%%
-%% Made by Mark Williamson
-%% Login <mark@maw48>
-%%
-
-\chapter{Grant tables}
-\label{c:granttables}
-
-Xen's grant tables provide a generic mechanism for memory sharing
-between domains. This shared memory interface underpins the split
-device drivers for block and network IO.
-
-Each domain has its own {\bf grant table}. This is a data structure
-that is shared with Xen; it allows the domain to tell Xen what kind of
-permissions other domains have on its pages. Entries in the grant
-table are identified by {\bf grant references}. A grant reference is
-an integer, which indexes into the grant table. It acts as a
-capability which the grantee can use to perform operations on the
-granter's memory.
-
-This capability-based system allows shared-memory communications
-between unprivileged domains. A grant reference also encapsulates the
-details of a shared page, removing the need for a domain to know the
-real machine address of a page it is sharing. This makes it possible
-to share memory correctly with domains running in fully virtualised
-memory.
-
-\section{Interface}
-
-\subsection{Grant table manipulation}
-
-Creating and destroying grant references is done by direct access to
-the grant table. This removes the need to involve Xen when creating
-grant references, modifying access permissions, etc. The grantee
-domain will invoke hypercalls to use the grant references. Four main
-operations can be accomplished by directly manipulating the table:
-
-\begin{description}
-\item[Grant foreign access] allocate a new entry in the grant table
- and fill out the access permissions accordingly. The access
- permissions will be looked up by Xen when the grantee attempts to
- use the reference to map the granted frame.
-\item[End foreign access] check that the grant reference is not
- currently in use, then remove the mapping permissions for the frame.
- This prevents further mappings from taking place but does not allow
- forced revocations of existing mappings.
-\item[Grant foreign transfer] allocate a new entry in the table
- specifying transfer permissions for the grantee. Xen will look up
- this entry when the grantee attempts to transfer a frame to the
- granter.
-\item[End foreign transfer] remove permissions to prevent a transfer
- occurring in future. If the transfer is already committed,
- modifying the grant table cannot prevent it from completing.
-\end{description}
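-
-To make the first of these operations concrete, the sketch below
-grants a remote domain access to one of the granter's frames by
-filling in a {\bf grant\_entry\_t} directly. The {\bf wmb()} write
-barrier is assumed to be supplied by the guest; the flags must be
-written last so that the entry only becomes live once fully
-initialised:
-
-\scriptsize
-\begin{verbatim}
-typedef struct grant_entry {
-    uint16_t flags;   /* GTF_* access flags; written last    */
-    domid_t  domid;   /* domain allowed to use this entry    */
-    uint32_t frame;   /* machine frame number being granted  */
-} grant_entry_t;
-
-static void grant_foreign_access(grant_entry_t *gnttab, grant_ref_t ref,
-                                 domid_t remote, uint32_t mfn)
-{
-    gnttab[ref].frame = mfn;
-    gnttab[ref].domid = remote;
-    wmb();            /* payload visible before the live flag */
-    gnttab[ref].flags = GTF_permit_access;
-}
-\end{verbatim}
-\normalsize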
-
-\subsection{Hypercalls}
-
-Use of grant references is accomplished via a hypercall. The grant
-table op hypercall takes three arguments:
-
-\hypercall{grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)}
-
-{\bf cmd} indicates the grant table operation of interest. {\bf uop}
-is a pointer to a structure (or an array of structures) describing the
-operation to be performed. The {\bf count} field describes how many
-grant table operations are being batched together.
-
-The core logic is situated in {\bf xen/common/grant\_table.c}. The
-grant table operation hypercall can be used to perform the following
-actions:
-
-\begin{description}
-\item[GNTTABOP\_map\_grant\_ref] Given a grant reference from another
- domain, map the referred page into the caller's address space.
-\item[GNTTABOP\_unmap\_grant\_ref] Remove a mapping to a granted frame
- from the caller's address space. This is used to voluntarily
- relinquish a mapping to a granted page.
-\item[GNTTABOP\_setup\_table] Setup grant table for caller domain.
-\item[GNTTABOP\_dump\_table] Debugging operation.
-\item[GNTTABOP\_transfer] Given a transfer reference from another
- domain, transfer ownership of a page frame to that domain.
-\end{description}
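-
-For example, mapping a frame that another domain has granted to the
-caller might look like the following sketch (assuming the usual {\bf
-HYPERVISOR\_grant\_table\_op} wrapper):
-
-\scriptsize
-\begin{verbatim}
-struct gnttab_map_grant_ref map;
-
-map.host_addr = map_vaddr;        /* where to map the frame      */
-map.flags     = GNTMAP_host_map;  /* regular host mapping        */
-map.ref       = remote_gref;      /* reference from the granter  */
-map.dom       = remote_domid;     /* the granting domain         */
-
-if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &map, 1) != 0 ||
-    map.status != GNTST_okay)
-    /* mapping failed */ ;
-\end{verbatim}
-\normalsize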
-
-%%
-%% xenstore.tex
-%%
-%% Made by Mark Williamson
-%% Login <mark@maw48>
-%%
-
-\chapter{Xenstore}
-
-Xenstore is the mechanism by which control-plane activities occur.
-These activities include:
-
-\begin{itemize}
-\item Setting up shared memory regions and event channels for use with
- the split device drivers.
-\item Notifying the guest of control events (e.g.\ balloon driver
- requests).
-\item Reporting status information back from the guest
- (e.g.\ performance-related statistics).
-\end{itemize}
-
-The store is arranged as a hierarchical collection of key-value pairs.
-Each domain has a directory hierarchy containing data related to its
-configuration. Domains are permitted to register for notifications
-about changes in subtrees of the store, and to apply changes to the
-store transactionally.
-
-\section{Guidelines}
-
-A few principles govern the operation of the store:
-
-\begin{itemize}
-\item Domains should only modify the contents of their own
- directories.
-\item The setup protocol for a device channel should simply consist of
- entering the configuration data into the store.
-\item The store should allow device discovery without requiring the
- relevant device drivers to be loaded: a Xen ``bus'' should be
- visible to probing code in the guest.
-\item The store should be usable for inter-tool communications,
- allowing the tools themselves to be decomposed into a number of
- smaller utilities, rather than a single monolithic entity. This
- also facilitates the development of alternate user interfaces to the
- same functionality.
-\end{itemize}
-
-\section{Store layout}
-
-There are three main paths in XenStore:
-
-\begin{description}
-\item[/vm] stores configuration information about a domain
-\item[/local/domain] stores information about the domain on the local node (domid, etc.)
-\item[/tool] stores information for the various tools
-\end{description}
-
-The {\bf /vm} path stores configuration information for a domain.
-This information doesn't change and is indexed by the domain's UUID.
-A {\bf /vm} entry contains the following information:
-
-\begin{description}
-\item[uuid] uuid of the domain (somewhat redundant)
-\item[on\_reboot] the action to take on a domain reboot request (destroy or restart)
-\item[on\_poweroff] the action to take on a domain halt request (destroy or restart)
-\item[on\_crash] the action to take on a domain crash (destroy or restart)
-\item[vcpus] the number of allocated vcpus for the domain
-\item[memory] the amount of memory (in megabytes) for the domain. Note: this appears to sometimes be empty for domain-0
-\item[vcpu\_avail] the number of active vcpus for the domain (vcpus - number of disabled vcpus)
-\item[name] the name of the domain
-\end{description}
-
-
-{\bf /vm/$<$uuid$>$/image/}
-
-The image path is only available for Domain-Us and contains:
-\begin{description}
-\item[ostype] identifies the builder type (linux or vmx)
-\item[kernel] path to kernel on domain-0
-\item[cmdline] command line to pass to domain-U kernel
-\item[ramdisk] path to ramdisk on domain-0
-\end{description}
-
-{\bf /local}
-
-The {\tt /local} path currently contains only one directory, {\tt
-/local/domain}, which is indexed by domain id and contains the
-running domain information. The reason for having two storage areas
-is that during migration the uuid doesn't change but the domain id
-does. The {\tt /local/domain} directory can be created and populated
-before finalizing the migration, enabling localhost-to-localhost migration.
-
-{\bf /local/domain/$<$domid$>$}
-
-This path contains:
-
-\begin{description}
-\item[cpu\_time] xend start time (this is only around for domain-0)
-\item[handle] private handle for xend
-\item[name] see /vm
-\item[on\_reboot] see /vm
-\item[on\_poweroff] see /vm
-\item[on\_crash] see /vm
-\item[vm] the path to the VM directory for the domain
-\item[domid] the domain id (somewhat redundant)
-\item[running] indicates that the domain is currently running
-\item[memory] the current memory in megabytes for the domain (empty for domain-0?)
-\item[maxmem\_KiB] the maximum memory for the domain (in kilobytes)
-\item[memory\_KiB] the memory allocated to the domain (in kilobytes)
-\item[cpu] the current CPU the domain is pinned to (empty for domain-0?)
-\item[cpu\_weight] the weight assigned to the domain
-\item[vcpu\_avail] a bitmap telling the domain whether it may use a given VCPU
-\item[online\_vcpus] how many vcpus are currently online
-\item[vcpus] the total number of vcpus allocated to the domain
-\item[console/] a directory for console information
- \begin{description}
- \item[ring-ref] the grant table reference of the console ring queue
- \item[port] the event channel being used for the console ring queue (local port)
- \item[tty] the current tty via which the console data is exposed
- \item[limit] the limit (in bytes) of console data to buffer
- \end{description}
-\item[backend/] a directory containing all backends the domain hosts
- \begin{description}
- \item[vbd/] a directory containing vbd backends
- \begin{description}
- \item[$<$domid$>$/] a directory containing vbds for domid
- \begin{description}
- \item[$<$virtual-device$>$/] a directory for a particular
- virtual-device on domid
- \begin{description}
- \item[frontend-id] domain id of frontend
- \item[frontend] the path to the frontend domain
- \item[physical-device] backend device number
- \item[sector-size] backend sector size
- \item[info] 0 read/write, 1 read-only (is this right?)
- \item[domain] name of frontend domain
- \item[params] parameters for device
- \item[type] the type of the device
- \item[dev] the virtual device (as given by the user)
- \item[node] output from block creation script
- \end{description}
- \end{description}
- \end{description}
-
- \item[vif/] a directory containing vif backends
- \begin{description}
- \item[$<$domid$>$/] a directory containing vifs for domid
- \begin{description}
- \item[$<$vif number$>$/] a directory for each vif
- \item[frontend-id] the domain id of the frontend
- \item[frontend] the path to the frontend
- \item[mac] the mac address of the vif
- \item[bridge] the bridge the vif is connected to
- \item[handle] the handle of the vif
- \item[script] the script used to create/stop the vif
- \item[domain] the name of the frontend
- \end{description}
- \end{description}
-
- \item[vtpm/] a directory containing vtpm backends
- \begin{description}
- \item[$<$domid$>$/] a directory containing vtpms for domid
- \begin{description}
- \item[$<$vtpm number$>$/] a directory for each vtpm
- \item[frontend-id] the domain id of the frontend
- \item[frontend] the path to the frontend
- \item[instance] the instance of the virtual TPM that is used
- \item[pref{\textunderscore}instance] the instance number as given in the VM configuration file;
- may be different from {\bf instance}
- \item[domain] the name of the domain of the frontend
- \end{description}
- \end{description}
-
- \end{description}
-
- \item[device/] a directory containing the frontend devices for the
- domain
- \begin{description}
- \item[vbd/] a directory containing vbd frontend devices for the
- domain
- \begin{description}
- \item[$<$virtual-device$>$/] a directory containing the vbd frontend for
- virtual-device
- \begin{description}
- \item[virtual-device] the device number of the frontend device
- \item[backend-id] the domain id of the backend
- \item[backend] the path of the backend in the store (/local/domain
- path)
- \item[ring-ref] the grant table reference for the block request
- ring queue
- \item[event-channel] the event channel used for the block request
- ring queue
- \end{description}
-
- \item[vif/] a directory containing vif frontend devices for the
- domain
- \begin{description}
- \item[$<$id$>$/] a directory for vif id frontend device for the domain
- \begin{description}
- \item[backend-id] the backend domain id
- \item[mac] the mac address of the vif
- \item[handle] the internal vif handle
- \item[backend] a path to the backend's store entry
- \item[tx-ring-ref] the grant table reference for the transmission ring queue
- \item[rx-ring-ref] the grant table reference for the receiving ring queue
- \item[event-channel] the event channel used for the two ring queues
- \end{description}
- \end{description}
-
- \item[vtpm/] a directory containing the vtpm frontend device for the
- domain
- \begin{description}
- \item[$<$id$>$] a directory for vtpm id frontend device for the domain
- \begin{description}
- \item[backend-id] the backend domain id
- \item[backend] a path to the backend's store entry
- \item[ring-ref] the grant table reference for the tx/rx ring
- \item[event-channel] the event channel used for the ring
- \end{description}
- \end{description}
-
- \item[device-misc/] miscellaneous information for devices
- \begin{description}
- \item[vif/] miscellaneous information for vif devices
- \begin{description}
- \item[nextDeviceID] the next device id to use
- \end{description}
- \end{description}
- \end{description}
- \end{description}
-
- \item[security/] access control information for the domain
- \begin{description}
- \item[ssidref] security reference identifier used inside the hypervisor
- \item[access\_control/] security label used by management tools
- \begin{description}
- \item[label] security label name
- \item[policy] security policy name
- \end{description}
- \end{description}
-
- \item[store/] per-domain information for the store
- \begin{description}
- \item[port] the event channel used for the store ring queue
- \item[ring-ref] the grant table reference used for the store's
- communication channel
- \end{description}
-
- \item[image] private xend information
-\end{description}
-
-
-\chapter{Devices}
-\label{c:devices}
-
-Virtual devices under Xen are provided by a {\bf split device driver}
-architecture. The illusion of the virtual device is provided by two
-co-operating drivers: the {\bf frontend}, which runs in the
-unprivileged domain, and the {\bf backend}, which runs in a domain with
-access to the real device hardware (often called a {\bf driver
-domain}; in practice domain 0 usually fulfills this function).
-
-The frontend driver appears to the unprivileged guest as if it were a
-real device, for instance a block or network device. It receives IO
-requests from its kernel as usual; however, since it does not have
-access to the physical hardware of the system it must then issue
-requests to the backend. The backend driver is responsible for
-receiving these IO requests, verifying that they are safe and then
-issuing them to the real device hardware. The backend driver appears
-to its kernel as a normal user of in-kernel IO functionality. When
-the IO completes the backend notifies the frontend that the data is
-ready for use; the frontend is then able to report IO completion to
-its own kernel.
-
-Frontend drivers are designed to be simple; most of the complexity is
-in the backend, which has responsibility for translating device
-addresses, verifying that requests are well-formed and do not violate
-isolation guarantees, etc.
-
-Split drivers exchange requests and responses in shared memory, with
-an event channel for asynchronous notifications of activity. When the
-frontend driver comes up, it uses Xenstore to set up a shared memory
-frame and an interdomain event channel for communications with the
-backend. Once this connection is established, the two can communicate
-directly by placing requests / responses into shared memory and then
-sending notifications on the event channel. This separation of
-notification from data transfer allows message batching, and results
-in very efficient device access.
-
-This chapter focuses on some individual split device interfaces
-available to Xen guests.
-
-
-\section{Network I/O}
-
-Virtual network device services are provided by shared memory
-communication with a backend domain. From the point of view of other
-domains, the backend may be viewed as a virtual ethernet switch
-element with each domain having one or more virtual network interfaces
-connected to it.
-
-From the point of view of the backend domain itself, the network
-backend driver consists of a number of ethernet devices. Each of
-these has a logical direct connection to a virtual network device in
-another domain. This allows the backend domain to route, bridge,
-firewall, etc.\ the traffic to/from the other domains using normal
-operating system mechanisms.
-
-\subsection{Backend Packet Handling}
-
-The backend driver is responsible for a variety of actions relating to
-the transmission and reception of packets from the physical device.
-With regard to transmission, the backend performs these key actions:
-
-\begin{itemize}
-\item {\bf Validation:} To ensure that domains do not attempt to
- generate invalid (e.g. spoofed) traffic, the backend driver may
- validate headers ensuring that source MAC and IP addresses match the
- interface that they have been sent from.
-
- Validation functions can be configured using standard firewall rules
- ({\small{\tt iptables}} in the case of Linux).
-
-\item {\bf Scheduling:} Since a number of domains can share a single
- physical network interface, the backend must mediate access when
- several domains each have packets queued for transmission. This
- general scheduling function subsumes basic shaping or rate-limiting
- schemes.
-
-\item {\bf Logging and Accounting:} The backend domain can be
- configured with classifier rules that control how packets are
- accounted or logged. For example, log messages might be generated
- whenever a domain attempts to send a TCP packet containing a SYN.
-\end{itemize}
-
-On receipt of incoming packets, the backend acts as a simple
-demultiplexer: Packets are passed to the appropriate virtual interface
-after any necessary logging and accounting have been carried out.
-
-\subsection{Data Transfer}
-
-Each virtual interface uses two ``descriptor rings'', one for
-transmit, the other for receive. Each descriptor identifies a block
-of contiguous machine memory allocated to the domain.
-
-The transmit ring carries packets to transmit from the guest to the
-backend domain. The return path of the transmit ring carries messages
-indicating that the contents have been physically transmitted and the
-backend no longer requires the associated pages of memory.
-
-To receive packets, the guest places descriptors of unused pages on
-the receive ring. The backend will return received packets by
-exchanging these pages in the domain's memory with new pages
-containing the received data, and passing back descriptors regarding
-the new packets on the ring. This zero-copy approach allows the
-backend to maintain a pool of free pages to receive packets into, and
-then deliver them to appropriate domains after examining their
-headers.
-
-% Real physical addresses are used throughout, with the domain
-% performing translation from pseudo-physical addresses if that is
-% necessary.
-
-If a domain does not keep its receive ring stocked with empty buffers
-then packets destined to it may be dropped. This provides some
-defence against receive livelock problems because an overloaded domain
-will cease to receive further data. Similarly, on the transmit path,
-it provides the application with feedback on the rate at which packets
-are able to leave the system.
-
-Flow control on rings is achieved by including a pair of producer
-indexes on the shared ring page. Each side will maintain a private
-consumer index indicating the next outstanding message. In this
-manner, the domains cooperate to divide the ring into two message
-lists, one in each direction. Notification is decoupled from the
-immediate placement of new messages on the ring; the event channel
-will be used to generate notification when {\em either} a certain
-number of outstanding messages are queued, {\em or} a specified number
-of nanoseconds have elapsed since the oldest message was placed on the
-ring.
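-
-The sketch below shows the producer side of this scheme in the style
-of the ring macros in {\bf xen/include/public/io/ring.h}; the
-event-channel notification helper is assumed to be provided by the
-guest:
-
-\scriptsize
-\begin{verbatim}
-/* Queue one transmit request and notify the backend if needed. */
-static void queue_tx_request(netif_tx_front_ring_t *ring,
-                             const netif_tx_request_t *req,
-                             int evtchn_port)
-{
-    int notify;
-
-    *RING_GET_REQUEST(ring, ring->req_prod_pvt) = *req;
-    ring->req_prod_pvt++;
-
-    /* Publish the private producer index; notify only if the
-     * consumer may have stopped polling the ring. */
-    RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(ring, notify);
-    if (notify)
-        notify_remote_via_evtchn(evtchn_port);  /* assumed helper */
-}
-\end{verbatim}
-\normalsize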
-
-%% Not sure if my version is any better -- here is what was here
-%% before: Synchronization between the backend domain and the guest is
-%% achieved using counters held in shared memory that is accessible to
-%% both. Each ring has associated producer and consumer indices
-%% indicating the area in the ring that holds descriptors that contain
-%% data. After receiving {\it n} packets or {\t nanoseconds} after
-%% receiving the first packet, the hypervisor sends an event to the
-%% domain.
-
-
-\subsection{Network ring interface}
-
-The network device uses two shared memory rings for communication: one
-for transmit, one for receive.
-
-Transmit requests are described by the following structure:
-
-\scriptsize
-\begin{verbatim}
-typedef struct netif_tx_request {
- grant_ref_t gref; /* Reference to buffer page */
- uint16_t offset; /* Offset within buffer page */
- uint16_t flags; /* NETTXF_* */
- uint16_t id; /* Echoed in response message. */
- uint16_t size; /* Packet size in bytes. */
-} netif_tx_request_t;
-\end{verbatim}
-\normalsize
-
-\begin{description}
-\item[gref] Grant reference for the network buffer
-\item[offset] Offset to data
-\item[flags] Transmit flags (currently only NETTXF\_csum\_blank is
- supported, to indicate that the protocol checksum field is
- incomplete).
-\item[id] Echoed to guest by the backend in the ring-level response so
- that the guest can match it to this request
-\item[size] Buffer size
-\end{description}
-
-Each transmit request is followed by a transmit response at some later
-time. This is part of the shared-memory communication protocol and
-allows the guest to (potentially) retire internal structures related
-to the request. It does not imply a network-level response. This
-structure is as follows:
-
-\scriptsize
-\begin{verbatim}
-typedef struct netif_tx_response {
- uint16_t id;
- int16_t status;
-} netif_tx_response_t;
-\end{verbatim}
-\normalsize
-
-\begin{description}
-\item[id] Echo of the ID field in the corresponding transmit request.
-\item[status] Success / failure status of the transmit request.
-\end{description}
-
-Receive requests must be queued by the frontend, accompanied by a
-donation of page-frames to the backend. The backend transfers page
-frames full of data back to the guest:
-
-\scriptsize
-\begin{verbatim}
-typedef struct {
- uint16_t id; /* Echoed in response message. */
- grant_ref_t gref; /* Reference to incoming granted frame */
-} netif_rx_request_t;
-\end{verbatim}
-\normalsize
-
-\begin{description}
-\item[id] Echoed by the frontend to identify this request when
- responding.
-\item[gref] Transfer reference - the backend will use this reference
- to transfer a frame of network data to us.
-\end{description}
-
-Receive response descriptors are queued for each received frame. Note
-that these may only be queued in reply to an existing receive request,
-providing an in-built form of traffic throttling.
-
-\scriptsize
-\begin{verbatim}
-typedef struct {
- uint16_t id;
- uint16_t offset; /* Offset in page of start of received packet */
- uint16_t flags; /* NETRXF_* */
-  int16_t status; /* -ve: NETIF_RSP_* ; +ve: Rx'ed pkt size. */
-} netif_rx_response_t;
-\end{verbatim}
-\normalsize
-
-\begin{description}
-\item[id] ID echoed from the original request, used by the guest to
- match this response to the original request.
-\item[offset] Offset to data within the transferred frame.
-\item[flags] Receive flags (currently only NETRXF\_csum\_valid is
- supported, to indicate that the protocol checksum field has already
- been validated).
-\item[status] Success / error status for this operation.
-\end{description}
-
-Note that the receive protocol includes a mechanism for guests to
-receive incoming memory frames but there is no explicit transfer of
-frames in the other direction. Guests are expected to return memory
-to the hypervisor in order to use the network interface. They {\em
-must} do this or they will exceed their maximum memory reservation and
-will not be able to receive incoming frame transfers. When necessary,
-the backend is able to replenish its pool of free network buffers by
-claiming some of this free memory from the hypervisor.
-
-\section{Block I/O}
-
-All guest OS disk access goes through the virtual block device (VBD)
-interface. This interface allows domains access to portions of the
-block storage devices visible to the block backend domain. The VBD
-interface is a split driver, similar to the network interface
-described above. A single shared memory ring is used between the
-frontend and backend drivers for each virtual device, across which
-IO requests and responses are sent.
-
-Any block device accessible to the backend domain, including
-network-based block devices (iSCSI, *NBD, etc.), loopback and LVM/MD
-devices, can be exported as a VBD. Each VBD is mapped to a device node
-in the
-guest, specified in the guest's startup configuration.
-
-\subsection{Data Transfer}
-
-The per-(virtual)-device ring between the guest and the block backend
-supports two messages:
-
-\begin{description}
-\item [{\small {\tt READ}}:] Read data from the specified block
- device. The front end identifies the device and location to read
- from and attaches pages for the data to be copied to (typically via
- DMA from the device). The backend acknowledges completed read
- requests as they finish.
-
-\item [{\small {\tt WRITE}}:] Write data to the specified block
- device. This functions essentially as {\small {\tt READ}}, except
- that the data moves to the device instead of from it.
-\end{description}
-
-
-\subsection{Block ring interface}
-
-The block interface is defined by the structures passed over the
-shared memory interface. These structures are either requests (from
-the frontend to the backend) or responses (from the backend to the
-frontend).
-
-The request structure is defined as follows:
-
-\scriptsize
-\begin{verbatim}
-typedef struct blkif_request {
- uint8_t operation; /* BLKIF_OP_??? */
- uint8_t nr_segments; /* number of segments */
- blkif_vdev_t handle; /* only for read/write requests */
- uint64_t id; /* private guest value, echoed in resp */
- blkif_sector_t sector_number;/* start sector idx on disk (r/w only) */
- struct blkif_request_segment {
- grant_ref_t gref; /* reference to I/O buffer frame */
- /* @first_sect: first sector in frame to transfer (inclusive). */
- /* @last_sect: last sector in frame to transfer (inclusive). */
- uint8_t first_sect, last_sect;
- } seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
-} blkif_request_t;
-\end{verbatim}
-\normalsize
-
-The fields are as follows:
-
-\begin{description}
-\item[operation] operation ID: one of the operations described above
-\item[nr\_segments] number of segments for scatter / gather IO
- described by this request
-\item[handle] identifier for a particular virtual device on this
- interface
-\item[id] this value is echoed in the response message for this IO;
- the guest may use it to identify the original request
-\item[sector\_number] start sector on the virtual device for this
- request
-\item[seg] This array contains structures encoding the
-  scatter-gather IO to be performed:
- \begin{description}
- \item[gref] The grant reference for the foreign I/O buffer page.
- \item[first\_sect] First sector to access within the buffer page (0 to 7).
- \item[last\_sect] Last sector to access within the buffer page (0 to 7).
- \end{description}
- Data will be transferred into frames at an offset determined by the
- value of {\tt first\_sect}.
-\end{description}
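-
-For example, a single-segment read of two sectors might be assembled
-as follows ({\tt RING\_GET\_REQUEST} is the shared-ring access macro;
-the other names are hypothetical):
-
-\scriptsize
-\begin{verbatim}
-blkif_request_t *req = RING_GET_REQUEST(&blk_ring, prod);
-
-req->operation     = BLKIF_OP_READ;   /* read from the device        */
-req->nr_segments   = 1;               /* one scatter-gather segment  */
-req->handle        = vdev_handle;     /* which virtual device        */
-req->id            = alloc_blk_id();  /* echoed back in the response */
-req->sector_number = start_sector;    /* where to start on the disk  */
-
-req->seg[0].gref       = buf_gref;    /* grant for the I/O buffer page */
-req->seg[0].first_sect = 0;           /* sectors 0..1 of that page     */
-req->seg[0].last_sect  = 1;
-\end{verbatim}
-\normalsize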
-
-\section{Virtual TPM}
-
-Virtual TPM (VTPM) support provides TPM functionality to each virtual
-machine that requests it in its configuration file.
-The interface enables a domain to access its own private TPM as if it
-were a hardware TPM built into the machine.
-
-The virtual TPM interface is implemented as a split driver,
-similar to the network and block interfaces described above.
-The user domain hosting the frontend exports a character device
-{\tt /dev/tpm0} to user-level applications for communicating with the
-virtual TPM. This is the same device interface that is offered if a
-hardware TPM is available in the system. The backend provides a single
-interface {\tt /dev/vtpm} on which the virtual TPM listens for commands
-from all domains whose backend is located in that domain.
-
-\subsection{Data Transfer}
-
-A single shared memory ring is used between the frontend and backend
-drivers. TPM requests and responses are sent in pages where a pointer
-to those pages and other information is placed into the ring such that
-the backend can map the pages into its memory space using the grant
-table mechanism.
-
-The backend driver accepts only well-formed TPM requests: the length
-indicator in the TPM request must correctly indicate the length of the
-request, otherwise the device driver automatically sends back an error
-message.
-
-The virtual TPM implementation listens for TPM requests on
-{\tt /dev/vtpm}. Since it must be able to apply each TPM request packet
-to the virtual TPM instance associated with the requesting virtual
-machine, a 4-byte virtual TPM instance identifier is prepended to each
-packet by the backend driver (in network byte order) for internal
-routing of the request.
-
-\subsection{Virtual TPM ring interface}
-
-The TPM protocol is a strict request/response protocol, so a single
-ring carries requests from the frontend to the backend and responses
-in the opposite direction.
-
-The request/response structure is defined as follows:
-
-\scriptsize
-\begin{verbatim}
-typedef struct {
- unsigned long addr; /* Machine address of packet. */
- grant_ref_t ref; /* grant table access reference. */
- uint16_t unused; /* unused */
- uint16_t size; /* Packet size in bytes. */
-} tpmif_tx_request_t;
-\end{verbatim}
-\normalsize
-
-The fields are as follows:
-
-\begin{description}
-\item[addr] The machine address of the page associated with the TPM
-  request/response; a request/response may span multiple
-  pages.
-\item[ref] The grant table reference associated with the address.
-\item[size] The size of the remaining packet; up to
-  PAGE\_SIZE bytes can be found in the page referenced by {\tt addr}.
-\end{description}
-
-The frontend initially allocates several pages whose addresses
-are stored in the ring. Only these pages are used for the exchange of
-requests and responses.
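-
-For illustration, the frontend might describe a TPM command to the
-backend as follows; the helper names and ring layout here are
-hypothetical:
-
-\scriptsize
-\begin{verbatim}
-tpmif_tx_request_t *tx = next_free_slot(&tpm_ring);
-
-tx->addr = virt_to_machine(cmd_page); /* machine address of the packet */
-tx->ref  = grant_ref_for(cmd_page);   /* lets the backend map the page */
-tx->size = cmd_len;                   /* length of the TPM command     */
-\end{verbatim}
-\normalsize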
-
-
-\chapter{Further Information}
-
-If you have questions that are not answered by this manual, the
-sources of information listed below may be of interest to you. Note
-that bug reports, suggestions and contributions related to the
-software (or the documentation) should be sent to the Xen developers'
-mailing list (address below).
-
-
-\section{Other documentation}
-
-If you are mainly interested in using (rather than developing for)
-Xen, the \emph{Xen Users' Manual} is distributed in the {\tt docs/}
-directory of the Xen source distribution.
-
-
-\section{Online references}
-
-The official Xen web site can be found at:
-\begin{quote} {\tt http://www.xensource.com}
-\end{quote}
-
-
-This contains links to the latest versions of all online
-documentation, including the latest version of the FAQ.
-
-Information regarding Xen is also available at the Xen Wiki at
-\begin{quote} {\tt http://wiki.xen.org/wiki/}\end{quote}
-The Xen project uses Bugzilla as its bug tracking system. You'll find
-the Xen Bugzilla at
-\begin{quote} {\tt http://bugzilla.xensource.com/bugzilla/}\end{quote}
-
-
-\section{Mailing lists}
-
-There are several mailing lists that are used to discuss Xen-related
-topics. The most widely relevant are listed below. An official page of
-mailing lists and subscription information can be found at \begin{quote}
- {\tt http://lists.xensource.com/} \end{quote}
-
-\begin{description}
-\item[xen-devel@lists.xensource.com] Used for development
- discussions and bug reports. Subscribe at: \\
- {\small {\tt http://lists.xensource.com/xen-devel}}
-\item[xen-users@lists.xensource.com] Used for installation and usage
- discussions and requests for help. Subscribe at: \\
- {\small {\tt http://lists.xensource.com/xen-users}}
-\item[xen-announce@lists.xensource.com] Used for announcements only.
- Subscribe at: \\
- {\small {\tt http://lists.xensource.com/xen-announce}}
-\item[xen-changelog@lists.xensource.com] Changelog feed
-  from the unstable and 2.0 trees (developer-oriented). Subscribe at: \\
- {\small {\tt http://lists.xensource.com/xen-changelog}}
-\end{description}
-
-\appendix
-
-
-\chapter{Xen Hypercalls}
-\label{a:hypercalls}
-
-Hypercalls represent the procedural interface to Xen; this appendix
-categorizes and describes the current set of hypercalls.
-
-\section{Invoking Hypercalls}
-
-Hypercalls are invoked in a manner analogous to system calls in a
-conventional operating system; a software interrupt is issued which
-vectors to an entry point within Xen. On x86/32 machines the
-instruction required is {\tt int \$0x82}; the (real) IDT is set up so
-that this may only be issued from within ring 1. The particular
-hypercall to be invoked is contained in {\tt EAX} --- a list
-mapping these values to symbolic hypercall names can be found
-in {\tt xen/include/public/xen.h}.
-
-On some occasions a set of hypercalls will be required to carry
-out a higher-level function; a good example is when a guest
-operating system wishes to context switch to a new process, which
-requires updating various pieces of privileged CPU state. As an optimization
-for these cases, there is a generic mechanism to issue a set of
-hypercalls as a batch:
-
-\begin{quote}
-\hypercall{multicall(void *call\_list, int nr\_calls)}
-
-Execute a series of hypervisor calls; {\tt nr\_calls} is the length of
-the array of {\tt multicall\_entry\_t} structures pointed to by {\tt
-call\_list}. Each entry contains the hypercall operation code followed
-by up to 7 word-sized arguments.
-\end{quote}
-
-Note that multicalls are provided purely as an optimization; there is
-no requirement to use them when first porting a guest operating
-system.
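-
-As an illustration, a guest might batch the two context-switch
-operations described later in this appendix into a single multicall;
-the {\tt HYPERVISOR\_multicall} wrapper name follows the convention
-used in XenLinux, and the local variables are hypothetical:
-
-\scriptsize
-\begin{verbatim}
-multicall_entry_t calls[2];
-
-calls[0].op      = __HYPERVISOR_stack_switch;
-calls[0].args[0] = new_ss;            /* new kernel stack segment  */
-calls[0].args[1] = new_esp;           /* new kernel stack pointer  */
-
-calls[1].op      = __HYPERVISOR_fpu_taskswitch;
-calls[1].args[0] = 1;                 /* set cr0.TS for lazy FPU   */
-
-(void)HYPERVISOR_multicall(calls, 2); /* one trap instead of two   */
-\end{verbatim}
-\normalsize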
-
-
-\section{Virtual CPU Setup}
-
-At start of day, a guest operating system needs to set up the virtual
-CPU it is executing on. This includes installing vectors for the
-virtual IDT so that the guest OS can handle interrupts, page faults,
-etc. However the very first thing a guest OS must set up is a pair
-of hypervisor callbacks: these are the entry points which Xen will
-use when it wishes to notify the guest OS of an occurrence.
-
-\begin{quote}
-\hypercall{set\_callbacks(unsigned long event\_selector, unsigned long
- event\_address, unsigned long failsafe\_selector, unsigned long
- failsafe\_address) }
-
-Register the normal (``event'') and failsafe callbacks for
-event processing. In each case the code segment selector and
-address within that segment are provided. The selectors must
-have RPL 1; in XenLinux we simply use the kernel's CS for both
-{\bf event\_selector} and {\bf failsafe\_selector}.
-
-The value {\bf event\_address} specifies the address of the guest OS's
-event handling and dispatch routine; the {\bf failsafe\_address}
-specifies a separate entry point which is used only if a fault occurs
-when Xen attempts to use the normal callback.
-
-\end{quote}
-
-On x86/64 systems the hypercall takes slightly different
-arguments. This is because callback CS does not need to be specified
-(since the callbacks are entered via SYSRET), and also because an
-entry address needs to be specified for SYSCALLs from guest user
-space:
-
-\begin{quote}
-\hypercall{set\_callbacks(unsigned long event\_address, unsigned long
- failsafe\_address, unsigned long syscall\_address)}
-\end{quote}
-
-
-After installing the hypervisor callbacks, the guest OS can
-install a `virtual IDT' by using the following hypercall:
-
-\begin{quote}
-\hypercall{set\_trap\_table(trap\_info\_t *table)}
-
-Install one or more entries into the per-domain
-trap handler table (essentially a software version of the IDT).
-Each entry in the array pointed to by {\bf table} includes the
-exception vector number with the corresponding segment selector
-and entry point. Most guest OSes can use the same handlers on
-Xen as when running on the real hardware.
-
-
-\end{quote}
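-
-For example, a guest might route the page-fault vector to its own
-handler as follows; {\tt FLAT\_KERNEL\_CS} is the flat kernel code
-segment selector from the public headers, and the wrapper name follows
-the XenLinux convention:
-
-\scriptsize
-\begin{verbatim}
-trap_info_t traps[] = {
-    /* vector, flags (DPL), code segment, handler address */
-    { 14, 0, FLAT_KERNEL_CS, (unsigned long)page_fault_handler },
-    {  0, 0, 0, 0 }                   /* zero entry terminates table */
-};
-
-(void)HYPERVISOR_set_trap_table(traps);
-\end{verbatim}
-\normalsize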
-
-A further hypercall is provided for the management of virtual CPUs:
-
-\begin{quote}
-\hypercall{vcpu\_op(int cmd, int vcpuid, void *extra\_args)}
-
-This hypercall can be used to bootstrap VCPUs, to bring them up and
-down and to test their current status.
-
-\end{quote}
-
-\section{Scheduling and Timer}
-
-Domains are preemptively scheduled by Xen according to the
-parameters installed by domain 0 (see Section~\ref{s:dom0ops}).
-In addition, however, a domain may choose to explicitly
-control certain behavior with the following hypercall:
-
-\begin{quote}
-\hypercall{sched\_op\_new(int cmd, void *extra\_args)}
-
-Request scheduling operation from hypervisor. The following
-sub-commands are available:
-
-\begin{description}
-\item[SCHEDOP\_yield] voluntarily yields the CPU, but leaves the
-caller marked as runnable. No extra arguments are passed to this
-command.
-\item[SCHEDOP\_block] removes the calling domain from the run queue
-and causes it to sleep until an event is delivered to it. No extra
-arguments are passed to this command.
-\item[SCHEDOP\_shutdown] is used to end the calling domain's
-execution. The extra argument is a {\bf sched\_shutdown} structure
-which indicates the reason why the domain shut down (e.g., for reboot,
-halt, power-off).
-\item[SCHEDOP\_poll] allows a VCPU to wait on a set of event channels
-with an optional timeout (all of which are specified in the {\bf
-sched\_poll} extra argument). The semantics are similar to the UNIX
-{\bf poll} system call. The caller must have event-channel upcalls
-masked when executing this command.
-\end{description}
-\end{quote}
-
-{\bf sched\_op\_new} was not available prior to Xen 3.0.2. Older versions
-provide only the following hypercall:
-
-\begin{quote}
-\hypercall{sched\_op(int cmd, unsigned long extra\_arg)}
-
-This hypercall supports the following subset of {\bf sched\_op\_new} commands:
-
-\begin{description}
-\item[SCHEDOP\_yield] (extra argument is 0).
-\item[SCHEDOP\_block] (extra argument is 0).
-\item[SCHEDOP\_shutdown] (extra argument is numeric reason code).
-\end{description}
-\end{quote}
-
-To aid the implementation of a process scheduler within a guest OS,
-Xen provides a virtual programmable timer:
-
-\begin{quote}
-\hypercall{set\_timer\_op(uint64\_t timeout)}
-
-Request a timer event to be sent at the specified system time (time
-in nanoseconds since system boot).
-
-\end{quote}
-
-Note that calling {\bf set\_timer\_op} prior to {\bf sched\_op}
-allows block-with-timeout semantics.
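-
-A sketch of this idiom follows; the {\tt NOW()} and {\tt MILLISECS()}
-time helpers and the wrapper names are illustrative:
-
-\scriptsize
-\begin{verbatim}
-/* Sleep until an event arrives, or for at most 10ms. */
-(void)HYPERVISOR_set_timer_op(NOW() + MILLISECS(10));
-(void)HYPERVISOR_sched_op(SCHEDOP_block, 0);
-\end{verbatim}
-\normalsize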
-
-
-\section{Page Table Management}
-
-Since guest operating systems have read-only access to their page
-tables, Xen must be involved when making any changes. The following
-multi-purpose hypercall can be used to modify page-table entries,
-update the machine-to-physical mapping table, flush the TLB, install
-a new page-table base pointer, and more.
-
-\begin{quote}
-\hypercall{mmu\_update(mmu\_update\_t *req, int count, int *success\_count)}
-
-Update the page table for the domain; a set of {\bf count} updates is
-submitted for processing in a batch, with {\bf success\_count} being
-updated to report the number of successful updates.
-
-Each element of {\bf req[]} contains a pointer (address) and value;
-the least significant 2 bits of the pointer are used to distinguish
-the type of update requested as follows:
-\begin{description}
-
-\item[MMU\_NORMAL\_PT\_UPDATE:] update a page directory entry or
-page table entry to the associated value; Xen will check that the
-update is safe, as described in Chapter~\ref{c:memory}.
-
-\item[MMU\_MACHPHYS\_UPDATE:] update an entry in the
- machine-to-physical table. The calling domain must own the machine
- page in question (or be privileged).
-\end{description}
-
-\end{quote}
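-
-For illustration, a guest might batch a PTE write together with the
-corresponding machine-to-physical update; a wrapper matching the
-signature above is assumed, and the variable names are hypothetical:
-
-\scriptsize
-\begin{verbatim}
-mmu_update_t req[2];
-int done;
-
-/* The type is encoded in the low 2 bits of the pointer. */
-req[0].ptr = pte_machine_addr | MMU_NORMAL_PT_UPDATE;
-req[0].val = new_pte;                 /* new page table entry        */
-req[1].ptr = (mfn << PAGE_SHIFT) | MMU_MACHPHYS_UPDATE;
-req[1].val = pfn;                     /* new machine-to-physical map */
-
-(void)HYPERVISOR_mmu_update(req, 2, &done);
-\end{verbatim}
-\normalsize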
-
-Explicitly updating batches of page table entries is extremely
-efficient, but can require a number of alterations to the guest
-OS. Using the writable page table mode (Chapter~\ref{c:memory}) is
-recommended for new OS ports.
-
-Regardless of which page table update mode is being used, however,
-there are some occasions (notably handling a demand page fault) where
-a guest OS will wish to modify exactly one PTE rather than a
-batch, and where that PTE is mapped into the current address space.
-This is catered for by the following:
-
-\begin{quote}
-\hypercall{update\_va\_mapping(unsigned long va, uint64\_t val,
- unsigned long flags)}
-
-Update the currently installed PTE that maps virtual address {\bf va}
-to new value {\bf val}. As with {\bf mmu\_update}, Xen checks the
-modification is safe before applying it. The {\bf flags} determine
-which kind of TLB flush, if any, should follow the update.
-
-\end{quote}
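-
-For example, after constructing a new PTE in a page-fault handler
-(variable names hypothetical):
-
-\scriptsize
-\begin{verbatim}
-/* Install the new PTE and flush just that TLB entry. */
-(void)HYPERVISOR_update_va_mapping(fault_va & PAGE_MASK, new_pte,
-                                   UVMF_INVLPG);
-\end{verbatim}
-\normalsize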
-
-Finally, sufficiently privileged domains may occasionally wish to manipulate
-the pages of others:
-
-\begin{quote}
-\hypercall{update\_va\_mapping\_otherdomain(unsigned long va, uint64\_t val,
- unsigned long flags, domid\_t domid)}
-
-Identical to {\bf update\_va\_mapping} save that the pages being
-mapped must belong to the domain {\bf domid}.
-
-\end{quote}
-
-An additional MMU hypercall provides an ``extended command''
-interface, offering functionality beyond the basic
-table-updating commands:
-
-\begin{quote}
-
-\hypercall{mmuext\_op(struct mmuext\_op *op, int count, int *success\_count, domid\_t domid)}
-
-This hypercall is used to perform additional MMU operations. These
-include updating {\tt cr3} (or just re-installing it for a TLB flush),
-requesting various kinds of TLB flush, flushing the cache, installing
-a new LDT, or pinning \& unpinning page-table pages (to ensure their
-reference count doesn't drop to zero which would require a
-revalidation of all entries). Some of the operations available are
-restricted to domains with sufficient system privileges.
-
-It is also possible for privileged domains to reassign page ownership
-via an extended MMU operation, although grant tables are used instead
-of this where possible; see Section~\ref{s:idc}.
-
-\end{quote}
-
-Finally, a hypercall interface is exposed to activate and deactivate
-various optional facilities provided by Xen for memory management.
-
-\begin{quote}
-\hypercall{vm\_assist(unsigned int cmd, unsigned int type)}
-
-Toggle various memory management modes (in particular writable page
-tables).
-
-\end{quote}
-
-\section{Segmentation Support}
-
-Xen allows guest OSes to install a custom GDT if they require it;
-this is context switched transparently whenever a domain is
-[de]scheduled. The following hypercall is effectively a
-`safe' version of {\tt lgdt}:
-
-\begin{quote}
-\hypercall{set\_gdt(unsigned long *frame\_list, int entries)}
-
-Install a global descriptor table for a domain; {\bf frame\_list} is
-an array of up to 16 machine page frames within which the GDT resides,
-with {\bf entries} being the actual number of descriptor-entry
-slots. All page frames must be mapped read-only within the guest's
-address space, and the table must be large enough to contain Xen's
-reserved entries (see {\bf xen/include/public/arch-x86\_32.h}).
-
-\end{quote}
-
-Many guest OSes will also wish to install LDTs; this is achieved via
-the extended command interface ({\bf mmuext\_op}), passing the
-linear address of the LDT base along with the number of entries. No
-special safety checks are required; Xen needs to perform this task
-simply because {\tt lldt} requires CPL 0.
-
-
-Xen also allows guest operating systems to update just an
-individual segment descriptor in the GDT or LDT:
-
-\begin{quote}
-\hypercall{update\_descriptor(uint64\_t ma, uint64\_t desc)}
-
-Update the GDT/LDT entry at machine address {\bf ma}; the new
-8-byte descriptor is stored in {\bf desc}.
-Xen performs a number of checks to ensure the descriptor is
-valid.
-
-\end{quote}
-
-Guest OSes can use the above in place of context switching entire
-LDTs (or the GDT) when the number of changing descriptors is small.
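-
-For instance, refreshing a single descriptor on context switch
-(variable names hypothetical):
-
-\scriptsize
-\begin{verbatim}
-/* Machine address of the descriptor slot, 8 bytes per entry. */
-uint64_t ma = gdt_frame_maddr + desc_index * 8;
-
-(void)HYPERVISOR_update_descriptor(ma, new_desc);
-\end{verbatim}
-\normalsize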
-
-\section{Context Switching}
-
-When a guest OS wishes to context switch between two processes,
-it can use the page table and segmentation hypercalls described
-above to perform the bulk of the privileged work. In addition,
-however, it will need to invoke Xen to switch the kernel (ring 1)
-stack pointer:
-
-\begin{quote}
-\hypercall{stack\_switch(unsigned long ss, unsigned long esp)}
-
-Request kernel stack switch from hypervisor; {\bf ss} is the new
-stack segment and {\bf esp} is the new stack pointer.
-
-\end{quote}
-
-A useful hypercall for context switching allows ``lazy'' save and
-restore of floating point state:
-
-\begin{quote}
-\hypercall{fpu\_taskswitch(int set)}
-
-This call instructs Xen to set the {\tt TS} bit in the {\tt cr0}
-control register; this means that the next attempt to use floating
-point will cause a fault which the guest OS can catch. Typically it will
-then save/restore the FP state, and clear the {\tt TS} bit, using the
-same call.
-\end{quote}
-
-This is provided as an optimization only; guest OSes can also choose
-to save and restore FP state on all context switches for simplicity.
-
-Finally, a hypercall is provided for entering vm86 mode:
-
-\begin{quote}
-\hypercall{switch\_vm86}
-
-This allows the guest to run code in vm86 mode, which is needed for
-some legacy software.
-\end{quote}
-
-\section{Physical Memory Management}
-
-As mentioned previously, each domain has a maximum and current
-memory allocation. The maximum allocation, set at domain creation
-time, cannot be modified. However a domain can choose to reduce
-and subsequently grow its current allocation by using the
-following call:
-
-\begin{quote}
-\hypercall{memory\_op(unsigned int op, void *arg)}
-
-Increase or decrease current memory allocation (as determined by
-the value of {\bf op}). The available operations are:
-
-\begin{description}
-\item[XENMEM\_increase\_reservation] Request an increase in machine
- memory allocation; {\bf arg} must point to a {\bf
- xen\_memory\_reservation} structure.
-\item[XENMEM\_decrease\_reservation] Request a decrease in machine
- memory allocation; {\bf arg} must point to a {\bf
- xen\_memory\_reservation} structure.
-\item[XENMEM\_maximum\_ram\_page] Request the frame number of the
- highest-addressed frame of machine memory in the system. {\bf arg}
- must point to an {\bf unsigned long} where this value will be
- stored.
-\item[XENMEM\_current\_reservation] Returns current memory reservation
- of the specified domain.
-\item[XENMEM\_maximum\_reservation] Returns maximum memory reservation
- of the specified domain.
-\end{description}
-
-\end{quote}
-
-In addition to simply reducing or increasing the current memory
-allocation via a `balloon driver', this call is also useful for
-obtaining contiguous regions of machine memory when required (e.g.
-for certain PCI devices, or if using superpages).
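-
-For illustration, a balloon driver might return pages to Xen as
-follows (variable names hypothetical):
-
-\scriptsize
-\begin{verbatim}
-struct xen_memory_reservation res;
-
-res.extent_start = pfn_list;      /* frame numbers being released */
-res.nr_extents   = 16;            /* how many extents              */
-res.extent_order = 0;             /* order 0: individual pages     */
-res.domid        = DOMID_SELF;
-
-/* Returns the number of extents actually released. */
-released = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &res);
-\end{verbatim}
-\normalsize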
-
-
-\section{Inter-Domain Communication}
-\label{s:idc}
-
-Xen provides a simple asynchronous notification mechanism via
-\emph{event channels}. Each domain has a set of end-points (or
-\emph{ports}) which may be bound to an event source (e.g.\ a physical
-IRQ, a virtual IRQ, or a port in another domain). When a pair of
-end-points in two different domains are bound together, then a `send'
-operation on one will cause an event to be received by the destination
-domain.
-
-The control and use of event channels involves the following hypercall:
-
-\begin{quote}
-\hypercall{event\_channel\_op(evtchn\_op\_t *op)}
-
-Inter-domain event-channel management; {\bf op} is a discriminated
-union which allows the following 7 operations:
-
-\begin{description}
-
-\item[alloc\_unbound:] allocate a free (unbound) local
- port and prepare for connection from a specified domain.
-\item[bind\_virq:] bind a local port to a virtual
-IRQ; any particular VIRQ can be bound to at most one port per domain.
-\item[bind\_pirq:] bind a local port to a physical IRQ;
-once more, a given pIRQ can be bound to at most one port per
-domain. Furthermore the calling domain must be sufficiently
-privileged.
-\item[bind\_interdomain:] construct an interdomain event
-channel; in general, the target domain must have previously allocated
-an unbound port for this channel, although this can be bypassed by
-privileged domains during domain setup.
-\item[close:] close an interdomain event channel.
-\item[send:] send an event to the remote end of a
-interdomain event channel.
-\item[status:] determine the current status of a local port.
-\end{description}
-
-For more details see
-{\bf xen/include/public/event\_channel.h}.
-
-\end{quote}
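-
-For illustration, a frontend might allocate an unbound port for its
-backend to connect to; the wrapper name follows the XenLinux
-convention, and the local variables are hypothetical:
-
-\scriptsize
-\begin{verbatim}
-evtchn_op_t op;
-
-op.cmd                        = EVTCHNOP_alloc_unbound;
-op.u.alloc_unbound.dom        = DOMID_SELF;
-op.u.alloc_unbound.remote_dom = backend_domid;
-
-if (HYPERVISOR_event_channel_op(&op) == 0)
-    local_port = op.u.alloc_unbound.port;  /* returned by Xen */
-\end{verbatim}
-\normalsize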
-
-Event channels are the fundamental communication primitive between
-Xen domains and seamlessly support SMP. However they provide little
-bandwidth for communication {\sl per se}, and hence are typically
-married with a piece of shared memory to produce effective and
-high-performance inter-domain communication.
-
-Safe sharing of memory pages between guest OSes is carried out by
-granting access on a per page basis to individual domains. This is
-achieved by using the {\tt grant\_table\_op} hypercall.
-
-\begin{quote}
-\hypercall{grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)}
-
-Used to invoke operations on a grant reference, to set up the grant
-table and to dump the table's contents for debugging.
-
-\end{quote}
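-
-For illustration, a guest might ask Xen where its grant table lives
-(the wrapper name follows the XenLinux convention; {\tt frames} is a
-hypothetical pre-allocated array):
-
-\scriptsize
-\begin{verbatim}
-struct gnttab_setup_table setup;
-
-setup.dom        = DOMID_SELF;
-setup.nr_frames  = 1;             /* request one grant-table frame */
-setup.frame_list = frames;        /* filled in with frame numbers  */
-
-(void)HYPERVISOR_grant_table_op(GNTTABOP_setup_table, &setup, 1);
-\end{verbatim}
-\normalsize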
-
-\section{IO Configuration}
-
-Domains with physical device access (i.e.\ driver domains) receive
-limited access to certain PCI devices (bus address space and
-interrupts). However, many guest operating systems attempt to
-determine the PCI configuration by directly accessing the PCI BIOS,
-which cannot be allowed for safety reasons.
-
-Instead, Xen provides the following hypercall:
-
-\begin{quote}
-\hypercall{physdev\_op(void *physdev\_op)}
-
-Set and query IRQ configuration details, set the system IOPL, set the
-TSS IO bitmap.
-
-\end{quote}
-
-
-For examples of using {\tt physdev\_op}, see the
-Xen-specific PCI code in the Linux sparse tree.
-
-\section{Administrative Operations}
-\label{s:dom0ops}
-
-A large number of control operations are available to a sufficiently
-privileged domain (typically domain 0). These allow the creation and
-management of new domains, for example. A complete list is given
-below; for more details on any or all of these, please see
-{\tt xen/include/public/dom0\_ops.h}.
-
-
-\begin{quote}
-\hypercall{dom0\_op(dom0\_op\_t *op)}
-
-Administrative domain operations for domain management. The options are:
-
-\begin{description}
-\item [DOM0\_GETMEMLIST:] get list of pages used by the domain
-
-\item [DOM0\_SCHEDCTL:] control global scheduler parameters
-
-\item [DOM0\_ADJUSTDOM:] adjust scheduling priorities for domain
-
-\item [DOM0\_CREATEDOMAIN:] create a new domain
-
-\item [DOM0\_DESTROYDOMAIN:] deallocate all resources associated
-with a domain
-
-\item [DOM0\_PAUSEDOMAIN:] remove a domain from the scheduler run
-queue.
-
-\item [DOM0\_UNPAUSEDOMAIN:] mark a paused domain as schedulable
- once again.
-
-\item [DOM0\_GETDOMAININFO:] get statistics about the domain
-
-\item [DOM0\_SETDOMAININFO:] set VCPU-related attributes
-
-\item [DOM0\_MSR:] read or write model specific registers
-
-\item [DOM0\_DEBUG:] interactively invoke the debugger
-
-\item [DOM0\_SETTIME:] set system time
-
-\item [DOM0\_GETPAGEFRAMEINFO:] get information about a single page frame
-
-\item [DOM0\_READCONSOLE:] read console content from hypervisor buffer ring
-
-\item [DOM0\_PINCPUDOMAIN:] pin domain to a particular CPU
-
-\item [DOM0\_TBUFCONTROL:] get and set trace buffer attributes
-
-\item [DOM0\_PHYSINFO:] get information about the host machine
-
-\item [DOM0\_SCHED\_ID:] get the ID of the current Xen scheduler
-
-\item [DOM0\_SHADOW\_CONTROL:] switch between shadow page-table modes
-
-\item [DOM0\_SETDOMAINMAXMEM:] set maximum memory allocation of a domain
-
-\item [DOM0\_GETPAGEFRAMEINFO2:] batched interface for getting
-page frame info
-
-\item [DOM0\_ADD\_MEMTYPE:] set MTRRs
-
-\item [DOM0\_DEL\_MEMTYPE:] remove a memory type range
-
-\item [DOM0\_READ\_MEMTYPE:] read MTRR
-
-\item [DOM0\_PERFCCONTROL:] control Xen's software performance
-counters
-
-\item [DOM0\_MICROCODE:] update CPU microcode
-
-\item [DOM0\_IOPORT\_PERMISSION:] modify domain permissions for an
-IO port range (enable / disable a range for a particular domain)
-
-\item [DOM0\_GETVCPUCONTEXT:] get context from a VCPU
-
-\item [DOM0\_GETVCPUINFO:] get current state for a VCPU
-\item [DOM0\_GETDOMAININFOLIST:] batched interface to get domain
-info
-
-\item [DOM0\_PLATFORM\_QUIRK:] inform Xen of a platform quirk it
-needs to handle (e.g. noirqbalance)
-
-\item [DOM0\_PHYSICAL\_MEMORY\_MAP:] get info about dom0's memory
-map
-
-\item [DOM0\_MAX\_VCPUS:] change max number of VCPUs for a domain
-
-\item [DOM0\_SETDOMAINHANDLE:] set the handle for a domain
-
-\end{description}
-\end{quote}
-
-Most of the above are best understood by looking at the code
-implementing them (in {\tt xen/common/dom0\_ops.c}) and in
-the user-space tools that use them (mostly in {\tt tools/libxc}).
-
-\section{Debugging Hypercalls}
-
-A few additional hypercalls are mainly useful for debugging:
-
-\begin{quote}
-\hypercall{console\_io(int cmd, int count, char *str)}
-
-Use Xen to interact with the console; operations are:
-
-{\bf CONSOLEIO\_write}: output {\bf count} characters from buffer {\bf str}.
-
-{\bf CONSOLEIO\_read}: input at most {\bf count} characters into buffer {\bf str}.
-\end{quote}
-
-A pair of hypercalls allows access to the underlying debug registers:
-\begin{quote}
-\hypercall{set\_debugreg(int reg, unsigned long value)}
-
-Set debug register {\bf reg} to {\bf value}.
-
-\hypercall{get\_debugreg(int reg)}
-
-Return the contents of the debug register {\bf reg}.
-\end{quote}
-
-And finally:
-\begin{quote}
-\hypercall{xen\_version(int cmd)}
-
-Request Xen version number.
-\end{quote}
-
-This is useful to ensure that user-space tools are in sync
-with the underlying hypervisor.
-
-
-\end{document}