author: kaf24@freefall.cl.cam.ac.uk <kaf24@freefall.cl.cam.ac.uk>, 2004-10-26 14:12:48 +0000
committer: kaf24@freefall.cl.cam.ac.uk <kaf24@freefall.cl.cam.ac.uk>, 2004-10-26 14:12:48 +0000
commit: cadf3eafb224880e277913a011a7e358517551a6
tree: 4479b17ef7e947d4eaea752adaeaff6cdf27c10e /docs/interface.tex
parent: f1816bd031a1dd3e7927d88b3345b4fbb139ea11
bitkeeper revision 1.1159.1.282 (417e5b60oXUOPTWl81d5zD6oq1tOTQ)
Clean up docs dir layout.
Diffstat (limited to 'docs/interface.tex')
-rw-r--r-- docs/interface.tex 901
1 files changed, 0 insertions, 901 deletions
diff --git a/docs/interface.tex b/docs/interface.tex
deleted file mode 100644
index 93b6143d39..0000000000
--- a/docs/interface.tex
+++ /dev/null
@@ -1,901 +0,0 @@
-\documentclass[11pt,twoside,final,openright]{xenstyle}
-\usepackage{a4,graphicx,setspace}
-\setstretch{1.15}
-\input{style.tex}
-
-\begin{document}
-
-% TITLE PAGE
-\pagestyle{empty}
-\begin{center}
-\vspace*{\fill}
-\includegraphics{eps/xenlogo.eps}
-\vfill
-\vfill
-\vfill
-\begin{tabular}{l}
-{\Huge \bf Interface manual} \\[4mm]
-{\huge Xen v2.0 for x86} \\[80mm]
-
-{\Large Xen is Copyright (c) 2004, The Xen Team} \\[3mm]
-{\Large University of Cambridge, UK} \\[20mm]
-{\large Last updated on 11th March, 2004}
-\end{tabular}
-\vfill
-\end{center}
-\cleardoublepage
-
-% TABLE OF CONTENTS
-\pagestyle{plain}
-\pagenumbering{roman}
-{ \parskip 0pt plus 1pt
- \tableofcontents }
-\cleardoublepage
-
-% PREPARE FOR MAIN TEXT
-\pagenumbering{arabic}
-\raggedbottom
-\widowpenalty=10000
-\clubpenalty=10000
-\parindent=0pt
-\renewcommand{\topfraction}{.8}
-\renewcommand{\bottomfraction}{.8}
-\renewcommand{\textfraction}{.2}
-\renewcommand{\floatpagefraction}{.8}
-\setstretch{1.15}
-
-\chapter{Introduction}
-Xen allows the hardware resources of a machine to be virtualized and
-dynamically partitioned so that multiple different `guest'
-operating system images can run simultaneously.
-
-Virtualizing the machine in this manner provides flexibility, allowing
-different users to choose their preferred operating system (Windows,
-Linux, NetBSD, or a custom operating system). Furthermore, Xen provides
-secure partitioning between these `domains', and enables better resource
-accounting and QoS isolation than can be achieved with a conventional
-operating system.
-
-The hypervisor runs directly on server hardware and dynamically partitions
-it between a number of {\it domains}, each of which hosts an instance
-of a {\it guest operating system}. The hypervisor provides just enough
-abstraction of the machine to allow effective isolation and resource
-management between these domains.
-
-Xen essentially takes a virtual machine approach as pioneered by IBM
-VM/370. However, unlike VM/370 or more recent efforts such as VMware
-and Virtual PC, Xen does not attempt to completely virtualize the
-underlying hardware. Instead, parts of the hosted guest operating
-systems are modified to work with the hypervisor; the operating system
-is effectively ported to a new target architecture, typically
-requiring changes in just the machine-dependent code. The user-level
-API is unchanged, thus existing binaries and operating system
-distributions can work unmodified.
-
-In addition to exporting virtualized instances of CPU, memory, network and
-block devices, Xen exposes a control interface to set how these resources
-are shared between the running domains. The control interface is privileged
-and may only be accessed by one particular virtual machine: {\it domain0}.
-This domain is a required part of any Xen-based server and runs the application
-software that manages the control-plane aspects of the platform. Running the
-control software in {\it domain0}, distinct from the hypervisor itself, allows
-the Xen framework to separate the notions of {\it mechanism} and {\it policy}
-within the system.
-
-
-\chapter{CPU state}
-
-All privileged state must be handled by Xen. The guest OS has no
-direct access to CR3 and is not permitted to update privileged bits in
-EFLAGS.
-
-\chapter{Exceptions}
-The IDT is virtualised by submitting a virtual `trap
-table' to Xen. Most trap handlers are identical to native x86
-handlers. The page-fault handler is a notable exception.
-
-\chapter{Interrupts and events}
-Interrupts are virtualized by mapping them to events, which are delivered
-asynchronously to the target domain. A guest OS can map these events onto
-its standard interrupt dispatch mechanisms, such as a simple vectoring
-scheme. Each physical interrupt source controlled by the hypervisor, including
-network devices, disks, or the timer subsystem, is responsible for identifying
-the target for an incoming interrupt and sending an event to that domain.
-
-This demultiplexing mechanism also provides a device-specific mechanism for
-event coalescing or hold-off. For example, a guest OS may request to
-receive an event only after {\it n} packets are queued ready for delivery
-to it, or {\it t} nanoseconds after the first packet arrived (whichever occurs
-first). This allows latency and throughput requirements to be addressed on a
-domain-specific basis.
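-
-As a rough illustration of the hold-off rule above, the following C sketch
-decides when a coalesced event is due. The function and parameter names are
-hypothetical and are not part of the Xen interface.
-
-\begin{verbatim}
-#include <stdint.h>
-
-/* Hypothetical helper: returns non-zero once an event should be sent.
- * queued:   packets currently queued for the domain
- * n:        packet-count threshold requested by the guest
- * now_ns:   current time in nanoseconds
- * first_ns: arrival time of the first queued packet
- * t_ns:     hold-off timeout requested by the guest */
-static int event_due(uint32_t queued, uint32_t n,
-                     uint64_t now_ns, uint64_t first_ns, uint64_t t_ns)
-{
-    if (queued == 0)
-        return 0;                       /* nothing to report */
-    if (queued >= n)
-        return 1;                       /* count threshold reached */
-    return (now_ns - first_ns) >= t_ns; /* timeout reached */
-}
-\end{verbatim}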
-
-\chapter{Time}
-Guest operating systems need to be aware of the passage of real time and their
-own ``virtual time'', i.e. the time they have been executing. Furthermore, a
-notion of time is required in the hypervisor itself for scheduling and the
-activities that relate to it. To this end the hypervisor provides four notions
-of time: cycle counter time, system time, wall clock time, domain virtual
-time.
-
-
-\section{Cycle counter time}
-This provides the finest-grained, free-running time reference, with the
-approximate frequency being publicly accessible. The cycle counter time is
-used to accurately extrapolate the other time references. On SMP machines
-it is currently assumed that the cycle counter time is synchronised between
-CPUs. The current x86-based implementation achieves this within inter-CPU
-communication latencies.
-
-\section{System time}
-This is a 64-bit value containing the nanoseconds elapsed since boot
-time. Unlike cycle counter time, system time accurately reflects the
-passage of real time, i.e. it is adjusted several times a second for timer
-drift. This is done by running an NTP client in {\it domain0} on behalf of
-the machine, feeding updates to the hypervisor. Intermediate values can be
-extrapolated using the cycle counter.
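-
-As a sketch of this extrapolation (the symbols are introduced here for
-illustration only): if $T_s$ is the system time in nanoseconds at the last
-update, $C_s$ the cycle counter value recorded at that update, $C$ the
-current cycle counter value and $f$ the cycle counter frequency in Hz, then
-the current system time is approximately
-\[
-T \approx T_s + \frac{(C - C_s) \times 10^{9}}{f}.
-\]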
-
-\section{Wall clock time}
-This is the actual ``time of day'' as a Unix-style {\tt struct timeval} (i.e. seconds and
-microseconds since 1 January 1970, adjusted by leap seconds etc.). Again, an
-NTP client hosted by {\it domain0} can help maintain this value. Guest
-operating systems are given this value instead of the hardware RTC
-value, and they can use the system time and cycle counter times to start
-and remain perfectly in time.
-
-
-\section{Domain virtual time}
-This progresses at the same pace as cycle counter time, but only while a
-domain is executing. It stops while a domain is de-scheduled. Therefore the
-share of the CPU that a domain receives is indicated by the rate at which
-its domain virtual time increases, relative to the rate at which cycle
-counter time does so.
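-
-Expressed as a formula (notation introduced here for illustration), the CPU
-share received by a domain over an interval is approximately
-\[
-\mbox{share} \approx \frac{\Delta V}{\Delta C},
-\]
-where $\Delta V$ is the increase in the domain's virtual time and $\Delta C$
-the increase in cycle counter time over the same interval. For example, a
-domain whose virtual time advances by 25 units while cycle counter time
-advances by 100 units has received roughly a quarter of the CPU.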
-
-\section{Time interface}
-Xen exports some timestamps to guest operating systems through their shared
-info page. Timestamps are provided for system time and wall-clock time. Xen
-also provides the cycle counter values at the time of the last update
-allowing guests to calculate the current values. The CPU frequency and a
-scaling factor are provided for guests to convert cycle counter values to
-real time. Since all time stamps need to be updated and read
-\emph{atomically}, two version numbers are also stored in the shared info
-page.
-
-Xen will ensure that the time stamps are updated frequently enough to avoid
-an overflow of the cycle counter values. A guest can check if its notion of
-time is up-to-date by comparing the version numbers.
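-
-A guest-side read might be structured as in the following C sketch. The
-structure layout and field names are assumptions made for illustration
-(they are not taken from the Xen headers), as is the convention that the
-writer increments {\tt version1} before an update and sets {\tt version2}
-equal to it afterwards.
-
-\begin{verbatim}
-#include <stdint.h>
-
-/* Hypothetical layout of the time fields in the shared info page. */
-struct time_info {
-    volatile uint32_t version1;    /* assumed: bumped before an update */
-    volatile uint32_t version2;    /* assumed: set to version1 after it */
-    volatile uint64_t system_time; /* ns since boot at last update */
-    volatile uint64_t tsc_stamp;   /* cycle counter at last update */
-};
-
-/* Copy a consistent snapshot; retry if an update raced with the read. */
-static void read_time(const struct time_info *t,
-                      uint64_t *system_time, uint64_t *tsc_stamp)
-{
-    uint32_t v;
-    do {
-        v = t->version2;
-        /* A real guest would also need read memory barriers here. */
-        *system_time = t->system_time;
-        *tsc_stamp   = t->tsc_stamp;
-    } while (v != t->version1);
-}
-\end{verbatim}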
-
-\section{Timer events}
-
-Xen maintains a periodic timer (currently with a 10ms period) which sends a
-timer event to the currently executing domain. This allows Guest OSes to
-keep track of the passing of time when executing. The scheduler also
-arranges for a newly activated domain to receive a timer event when
-scheduled so that the Guest OS can adjust to the passage of time while it
-has been inactive.
-
-In addition, Xen exports a hypercall interface to each domain which allows
-it to request that a timer event be sent to it at a specified system
-time. Guest OSes may use this timer to implement timeout values when they
-block.
-
-\chapter{Memory}
-
-The hypervisor is responsible for providing memory to each of the
-domains running over it. However, the Xen hypervisor's duty is
-restricted to managing physical memory and to policing page table
-updates. All other memory management functions are handled
-externally. Start-of-day issues such as building initial page tables
-for a domain, loading its kernel image and so on are done by the {\it
-domain builder} running in user-space in {\it domain0}. Paging to
-disk and swapping are handled by the guest operating systems
-themselves, if they need it.
-
-On a Xen-based system, the hypervisor itself runs in {\it ring 0}. It
-has full access to the physical memory available in the system and is
-responsible for allocating portions of it to the domains. Guest
-operating systems run in and use {\it rings 1}, {\it 2} and {\it 3} as
-they see fit, aside from the fact that segmentation is used to prevent
-the guest OS from accessing a portion of the linear address space that
-is reserved for use by the hypervisor. This approach allows
-transitions between the guest OS and hypervisor without flushing the
-TLB. We expect most guest operating systems will use ring 1 for their
-own operation and place applications (if they support such a notion)
-in ring 3.
-
-\section{Physical Memory Allocation}
-The hypervisor reserves a small fixed portion of physical memory at
-system boot time. This special memory region is located at the
-beginning of physical memory and is mapped at the very top of every
-virtual address space.
-
-Any physical memory that is not used directly by the hypervisor is divided into
-pages and is available for allocation to domains. The hypervisor tracks which
-pages are free and which pages have been allocated to each domain. When a new
-domain is initialized, the hypervisor allocates it pages drawn from the free
-list. The amount of memory required by the domain is passed to the hypervisor
-as one of the parameters for new domain initialization by the domain builder.
-
-Domains can never be allocated further memory beyond that which was
-requested for them on initialization. However, a domain can return
-pages to the hypervisor if it discovers that its memory requirements
-have diminished.
-
-% put reasons for why pages might be returned here.
-\section{Page Table Updates}
-In addition to managing physical memory allocation, the hypervisor is also in
-charge of performing page table updates on behalf of the domains. This is
-necessary to prevent domains from adding arbitrary mappings to their page
-tables or introducing mappings to other domains' page tables.
-
-\section{Writable Page Tables}
-A domain can also request write access to its page tables. In this
-mode, Xen notes write attempts to page table pages and makes the page
-temporarily writable. In-use page table pages are also disconnected
-from the page directory. The domain can now update entries in these
-page table pages without the assistance of Xen. As soon as the
-writable page table pages get used as page table pages, Xen makes the
-pages read-only again and revalidates the entries in the pages.
-
-\section{Segment Descriptor Tables}
-
-On boot a guest is supplied with a default GDT, which is {\em not}
-taken from its own memory allocation. If the guest wishes to use other
-than the default `flat' ring-1 and ring-3 segments that this default
-table provides, it must register a custom GDT and/or LDT with Xen,
-allocated from its own memory.
-
-int {\bf set\_gdt}(unsigned long *{\em frame\_list}, int {\em entries})
-
-{\em frame\_list}: An array of up to 16 page frames within which the
-GDT resides. Any frame registered as a GDT frame may only be mapped
-read-only within the guest's address space (e.g., no writable
-mappings, no use as a page-table page, and so on).
-
-{\em entries}: The number of descriptor-entry slots in the GDT. Note
-that the table must be large enough to contain Xen's reserved entries;
-thus we must have `{\em entries $>$ LAST\_RESERVED\_GDT\_ENTRY}'.
-Note also that, after registering the GDT, slots {\em FIRST\_} through
-{\em LAST\_RESERVED\_GDT\_ENTRY} are no longer usable by the guest and
-may be overwritten by Xen.
-
-\section{Pseudo-Physical Memory}
-The usual problem of external fragmentation means that a domain is
-unlikely to receive a contiguous stretch of physical memory. However,
-most guest operating systems do not have built-in support for
-operating in a fragmented physical address space; e.g., Linux has to
-have a one-to-one mapping for its physical memory. Therefore a notion of
-{\it pseudo physical memory} is introduced. Xen maintains a {\it
-real physical} to {\it pseudo physical} mapping which can be consulted
-by every domain. Additionally, at its start of day, a domain is
-supplied a {\it pseudo physical} to {\it real physical} mapping which
-it needs to keep updated itself. From that moment onwards {\it pseudo
-physical} addresses are used instead of discontiguous {\it real
-physical} addresses. Thus, the rest of the guest OS code has an
-impression of operating in a contiguous address space. Guest OS page
-tables contain real physical addresses. Mapping {\it pseudo physical}
-to {\it real physical} addresses is needed on page table updates and
-also on remapping memory regions within the guest OS.
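-
-The translation step can be sketched in C as follows. The table name, the
-4~kB page size and the helper are assumptions made for illustration; only
-the idea of indexing a {\it pseudo physical} to {\it real physical} table
-comes from the description above.
-
-\begin{verbatim}
-#define PAGE_SHIFT 12   /* assumed 4 kB pages, as on x86 */
-
-/* Hypothetical guest-maintained table, indexed by pseudo-physical
- * frame number, giving the real physical frame number. */
-extern unsigned long pseudo_to_real[];
-
-/* Build the address part of a page table entry from a
- * pseudo-physical address. */
-static unsigned long pseudo_to_real_addr(unsigned long pseudo_addr)
-{
-    unsigned long pfn    = pseudo_addr >> PAGE_SHIFT;
-    unsigned long offset = pseudo_addr & ((1UL << PAGE_SHIFT) - 1);
-    return (pseudo_to_real[pfn] << PAGE_SHIFT) | offset;
-}
-\end{verbatim}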
-
-
-
-\chapter{Network I/O}
-
-Virtual network device services are provided by shared memory
-communications with a `backend' domain. From the point of view of
-other domains, the backend may be viewed as a virtual ethernet switch
-element with each domain having one or more virtual network interfaces
-connected to it.
-
-\section{Backend Packet Handling}
-The backend driver is responsible primarily for {\it data-path} operations.
-In terms of networking this means packet transmission and reception.
-
-On the transmission side, the backend needs to perform three key actions:
-\begin{itemize}
-\item {\tt Validation:} A domain may only be allowed to emit packets
-matching a certain specification; for example, ones in which the
-source IP address matches one assigned to the virtual interface over
-which it is sent. The backend would be responsible for ensuring any
-such requirements are met, either by checking or by stamping outgoing
-packets with prescribed values for certain fields.
-
-Validation functions can be configured using standard firewall rules
-(i.e. IP Tables, in the case of Linux).
-
-\item {\tt Scheduling:} Since a number of domains can share a single
-``real'' network interface, the hypervisor must mediate access when
-several domains each have packets queued for transmission. Of course,
-this general scheduling function subsumes basic shaping or
-rate-limiting schemes.
-
-\item {\tt Logging and Accounting:} The hypervisor can be configured
-with classifier rules that control how packets are accounted or
-logged. For example, {\it domain0} could request that it receives a
-log message or copy of the packet whenever another domain attempts to
-send a TCP packet containing a SYN.
-\end{itemize}
-
-On the receive side, the backend's role is relatively straightforward:
-once a packet is received, it just needs to determine the virtual interface(s)
-to which it must be delivered and deliver it via page-flipping.
-
-
-\section{Data Transfer}
-
-Each virtual interface uses two ``descriptor rings'', one for transmit,
-the other for receive. Each descriptor identifies a block of contiguous
-physical memory allocated to the domain. There are four cases:
-
-\begin{itemize}
-
-\item The transmit ring carries packets to transmit from the domain to the
-hypervisor.
-
-\item The return path of the transmit ring carries ``empty'' descriptors
-indicating that the contents have been transmitted and the memory can be
-re-used.
-
-\item The receive ring carries empty descriptors from the domain to the
-hypervisor; these provide storage space for that domain's received packets.
-
-\item The return path of the receive ring carries packets that have been
-received.
-\end{itemize}
-
-Real physical addresses are used throughout, with the domain performing
-translation from pseudo-physical addresses if that is necessary.
-
-If a domain does not keep its receive ring stocked with empty buffers then
-packets destined to it may be dropped. This provides some defense against
-receiver-livelock problems because an overloaded domain will cease to receive
-further data. Similarly, on the transmit path, it provides the application
-with feedback on the rate at which packets are able to leave the system.
-
-Synchronization between the hypervisor and the domain is achieved using
-counters held in shared memory that is accessible to both. Each ring has
-associated producer and consumer indices indicating the area in the ring
-that holds descriptors that contain data. After receiving {\it n} packets
-or {\it t} nanoseconds after receiving the first packet, the hypervisor sends
-an event to the domain.
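-
-A descriptor ring of this kind can be sketched in C as below. The ring
-size, structure layout and names are hypothetical; the indices are
-free-running counters, and the ring size is assumed to be a power of two.
-
-\begin{verbatim}
-#include <stdint.h>
-
-#define RING_SIZE 256   /* assumed; must be a power of two */
-
-struct desc { unsigned long addr; uint16_t len; };
-
-struct ring {
-    volatile uint32_t prod;      /* next slot the producer will fill */
-    volatile uint32_t cons;      /* next slot the consumer will read */
-    struct desc slots[RING_SIZE];
-};
-
-/* Producer side: returns 0 on success, -1 if the ring is full. */
-static int ring_put(struct ring *r, struct desc d)
-{
-    if (r->prod - r->cons == RING_SIZE)
-        return -1;
-    r->slots[r->prod & (RING_SIZE - 1)] = d;
-    /* A real implementation needs a write barrier before publishing. */
-    r->prod++;
-    return 0;
-}
-
-/* Consumer side: returns 0 on success, -1 if the ring is empty. */
-static int ring_get(struct ring *r, struct desc *out)
-{
-    if (r->cons == r->prod)
-        return -1;
-    *out = r->slots[r->cons & (RING_SIZE - 1)];
-    r->cons++;
-    return 0;
-}
-\end{verbatim}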
-
-\chapter{Block I/O}
-
-\section{Virtual Block Devices (VBDs)}
-
-All guest OS disk access goes through the VBD interface. The VBD
-interface provides the administrator with the ability to selectively
-grant domains access to portions of block storage devices visible to
-the block backend device (usually domain 0).
-
-VBDs can literally be backed by any block device accessible to the
-backend domain, including network-based block devices (iSCSI, *NBD,
-etc), loopback devices and LVM / MD devices.
-
-Old (Xen 1.2) virtual disks are not supported under Xen 2.0, since
-similar functionality can be achieved using the (more advanced) LVM
-system, which is already in widespread use.
-
-\subsection{Data Transfer}
-Domains which have been granted access to a logical block device are permitted
-to read and write it by shared memory communications with the backend domain.
-
-In overview, the same style of descriptor-ring that is used for
-network packets is used here. Each domain has one ring that carries
-operation requests to the hypervisor and carries the results back
-again.
-
-Rather than copying data, the backend simply maps the domain's buffers
-in order to enable direct DMA to them. The act of mapping the buffers
-also increases the reference counts of the underlying pages, so that
-the unprivileged domain cannot try to return them to the hypervisor,
-install them as page tables, or perform any other unsafe operation.
-%block API here
-
-\chapter{Privileged operations}
-{\it Domain0} is responsible for building all other domains on the server
-and providing control interfaces for managing scheduling, networking, and
-block devices.
-
-\chapter{CPU Scheduler}
-
-Xen offers a uniform API for CPU schedulers. It is possible to choose
-from a number of schedulers at boot and it should be easy to add more.
-
-\paragraph*{Note: SMP host support}
-Xen has always supported SMP host systems. Domains are statically assigned to
-CPUs, either at creation time or when manually pinning to a particular CPU.
-The current schedulers then run locally on each CPU to decide which of the
-assigned domains should be run there.
-
-\section{Standard Schedulers}
-
-The BVT, Atropos and Round Robin schedulers are part of the normal
-Xen distribution. BVT provides proportional fair shares of the CPU to
-the running domains. Atropos can be used to reserve absolute shares
-of the CPU for each domain. Round-robin is provided as an example of
-Xen's internal scheduler API.
-
-More information on the characteristics and use of these schedulers is
-available in {\tt Sched-HOWTO.txt}.
-
-\section{Scheduling API}
-
-The scheduling API is used by all of the schedulers described above and should
-also be used by any new schedulers. It provides a generic interface and also
-implements much of the ``boilerplate'' code.
-
-Schedulers conforming to this API are described by the following
-structure:
-
-\begin{verbatim}
-struct scheduler
-{
- char *name; /* full name for this scheduler */
- char *opt_name; /* option name for this scheduler */
- unsigned int sched_id; /* ID for this scheduler */
-
- int (*init_scheduler) ();
- int (*alloc_task) (struct task_struct *);
- void (*add_task) (struct task_struct *);
- void (*free_task) (struct task_struct *);
- void (*rem_task) (struct task_struct *);
- void (*wake_up) (struct task_struct *);
- void (*do_block) (struct task_struct *);
- task_slice_t (*do_schedule) (s_time_t);
- int (*control) (struct sched_ctl_cmd *);
- int (*adjdom) (struct task_struct *,
- struct sched_adjdom_cmd *);
- s32 (*reschedule) (struct task_struct *);
- void (*dump_settings) (void);
- void (*dump_cpu_state) (int);
- void (*dump_runq_el) (struct task_struct *);
-};
-\end{verbatim}
-
-The only method that {\em must} be implemented is
-{\tt do\_schedule()}. However, if the {\tt wake\_up()} method is not
-implemented then waking tasks will not be put on the runqueue!
-
-The fields of the above structure are described in more detail below.
-
-\subsubsection{name}
-
-The name field should point to a descriptive ASCII string.
-
-\subsubsection{opt\_name}
-
-This field is the value of the {\tt sched=} boot-time option that will select
-this scheduler.
-
-\subsubsection{sched\_id}
-
-This is an integer that uniquely identifies this scheduler. There should be a
-macro corresponding to this scheduler ID in {\tt <hypervisor-ifs/sched-if.h>}.
-
-\subsubsection{init\_scheduler}
-
-\paragraph*{Purpose}
-
-This is a function for performing any scheduler-specific initialisation. For
-instance, it might allocate memory for per-CPU scheduler data and initialise it
-appropriately.
-
-\paragraph*{Call environment}
-
-This function is called after the initialisation performed by the generic
-layer. The function is called exactly once, for the scheduler that has been
-selected.
-
-\paragraph*{Return values}
-
-This should return negative on failure --- this will cause an
-immediate panic and the system will fail to boot.
-
-\subsubsection{alloc\_task}
-
-\paragraph*{Purpose}
-Called when a {\tt task\_struct} is allocated by the generic scheduler
-layer. A particular scheduler implementation may use this method to
-allocate per-task data for this task. It may use the {\tt
-sched\_priv} pointer in the {\tt task\_struct} to point to this data.
-
-\paragraph*{Call environment}
-The generic layer guarantees that the {\tt sched\_priv} field will
-remain intact from the time this method is called until the task is
-deallocated (so long as the scheduler implementation does not change
-it explicitly!).
-
-\paragraph*{Return values}
-Negative on failure.
-
-\subsubsection{add\_task}
-
-\paragraph*{Purpose}
-
-Called when a task is initially added by the generic layer.
-
-\paragraph*{Call environment}
-
-The fields in the {\tt task\_struct} are now filled out and available for use.
-Schedulers should implement appropriate initialisation of any per-task private
-information in this method.
-
-\subsubsection{free\_task}
-
-\paragraph*{Purpose}
-
-Schedulers should free the space used by any associated private data
-structures.
-
-\paragraph*{Call environment}
-
-This is called when a {\tt task\_struct} is about to be deallocated.
-The generic layer will have done generic task removal operations and
-(if implemented) called the scheduler's {\tt rem\_task} method before
-this method is called.
-
-\subsubsection{rem\_task}
-
-\paragraph*{Purpose}
-
-This is called when a task is being removed from scheduling (but is
-not yet being freed).
-
-\subsubsection{wake\_up}
-
-\paragraph*{Purpose}
-
-Called when a task is woken up, this method should put the task on the runqueue
-(or do the scheduler-specific equivalent action).
-
-\paragraph*{Call environment}
-
-The task is already set to state RUNNING.
-
-\subsubsection{do\_block}
-
-\paragraph*{Purpose}
-
-This function is called when a task is blocked. This function should
-not remove the task from the runqueue.
-
-\paragraph*{Call environment}
-
-The EVENTS\_MASTER\_ENABLE\_BIT is already set and the task state changed to
-TASK\_INTERRUPTIBLE on entry to this method. A call to the {\tt
- do\_schedule} method will be made after this method returns, in
-order to select the next task to run.
-
-\subsubsection{do\_schedule}
-
-This method must be implemented.
-
-\paragraph*{Purpose}
-
-The method is called each time a new task must be chosen for scheduling on the
-current CPU. The current time is passed as the single argument (the current
-task can be found using the {\tt current} macro).
-
-This method should select the next task to run on this CPU and set its minimum
-time to run as well as returning the data described below.
-
-This method should also take the appropriate action if the previous
-task has blocked, e.g. removing it from the runqueue.
-
-\paragraph*{Call environment}
-
-The other fields in the {\tt task\_struct} are updated by the generic layer,
-which also performs all Xen-specific tasks and performs the actual task switch
-(unless the previous task has been chosen again).
-
-This method is called with the {\tt schedule\_lock} held for the current CPU
-and local interrupts disabled.
-
-\paragraph*{Return values}
-
-Must return a {\tt struct task\_slice} describing which task to run and for
-how long (at most).
-
-\subsubsection{control}
-
-\paragraph*{Purpose}
-
-This method is called for global scheduler control operations. It takes a
-pointer to a {\tt struct sched\_ctl\_cmd}, which it should either
-source data from or populate with data, depending on the value of the
-{\tt direction} field.
-
-\paragraph*{Call environment}
-
-The generic layer guarantees that when this method is called, the
-caller selected the correct scheduler ID, hence the scheduler's
-implementation does not need to sanity-check these parts of the call.
-
-\paragraph*{Return values}
-
-This function should return the value to be passed back to user space, hence it
-should either be 0 or an appropriate errno value.
-
-\subsubsection{adjdom}
-
-\paragraph*{Purpose}
-
-This method is called to adjust the scheduling parameters of a particular
-domain, or to query their current values. The function should check
-the {\tt direction} field of the {\tt sched\_adjdom\_cmd} it receives in
-order to determine which of these operations is being performed.
-
-\paragraph*{Call environment}
-
-The generic layer guarantees that the caller has specified the correct
-control interface version and scheduler ID and that the supplied {\tt
-task\_struct} will not be deallocated during the call (hence it is not
-necessary to {\tt get\_task\_struct}).
-
-\paragraph*{Return values}
-
-This function should return the value to be passed back to user space, hence it
-should either be 0 or an appropriate errno value.
-
-\subsubsection{reschedule}
-
-\paragraph*{Purpose}
-
-This method is called to determine if a reschedule is required as a result of a
-particular task.
-
-\paragraph*{Call environment}
-The generic layer will cause a reschedule if the current domain is the idle
-task or it has exceeded its minimum time slice before a reschedule. The
-generic layer guarantees that the task passed is not currently running but is
-on the runqueue.
-
-\paragraph*{Return values}
-
-Should return a mask of CPUs to cause a reschedule on.
-
-\subsubsection{dump\_settings}
-
-\paragraph*{Purpose}
-
-If implemented, this should dump any private global settings for this
-scheduler to the console.
-
-\paragraph*{Call environment}
-
-This function is called with interrupts enabled.
-
-\subsubsection{dump\_cpu\_state}
-
-\paragraph*{Purpose}
-
-This method should dump any private settings for the specified CPU.
-
-\paragraph*{Call environment}
-
-This function is called with interrupts disabled and the {\tt schedule\_lock}
-for the specified CPU held.
-
-\subsubsection{dump\_runq\_el}
-
-\paragraph*{Purpose}
-
-This method should dump any private settings for the specified task.
-
-\paragraph*{Call environment}
-
-This function is called with interrupts disabled and the {\tt schedule\_lock}
-for the task's CPU held.
-
-
-\chapter{Debugging}
-
-Xen provides tools for debugging both Xen and guest OSes. Currently, the
-Pervasive Debugger provides a GDB stub, which provides facilities for symbolic
-debugging of Xen itself and of OS kernels running on top of Xen. The Trace
-Buffer provides a lightweight means to log data about Xen's internal state and
-behaviour at runtime, for later analysis.
-
-\section{Pervasive Debugger}
-
-Information on using the pervasive debugger is available in pdb.txt.
-
-
-\section{Trace Buffer}
-
-The trace buffer provides a means to observe Xen's operation from domain 0.
-Trace events, inserted at key points in Xen's code, record data that can be
-read by the {\tt xentrace} tool. Recording these events has a low overhead
-and hence the trace buffer may be useful for debugging timing-sensitive
-behaviours.
-
-\subsection{Internal API}
-
-To use the trace buffer functionality from within Xen, you must {\tt \#include
-<xen/trace.h>}, which contains definitions related to the trace buffer. Trace
-events are inserted into the buffer using the {\tt TRACE\_xD} ({\tt x} = 0, 1,
-2, 3, 4 or 5) macros. These all take an event number, plus {\tt x} additional
-(32-bit) data as their arguments. For trace buffer-enabled builds of Xen these
-will insert the event ID and data into the trace buffer, along with the current
-value of the CPU cycle-counter. For builds without the trace buffer enabled,
-the macros expand to no-ops and thus can be left in place without incurring
-overheads.
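-
-For example, a trace point recording two values might look like the
-following fragment. The event number {\tt TRC\_EXAMPLE} and the function are
-made up for illustration; only the {\tt TRACE\_2D} macro and the header
-come from the description above.
-
-\begin{verbatim}
-#include <xen/trace.h>
-
-#define TRC_EXAMPLE 0x0f01   /* hypothetical event number */
-
-static void example_trace_point(unsigned int a, unsigned int b)
-{
-    /* event number plus two 32-bit data values */
-    TRACE_2D(TRC_EXAMPLE, a, b);
-}
-\end{verbatim}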
-
-\subsection{Trace-enabled builds}
-
-By default, the trace buffer is enabled only in debug builds (i.e. {\tt NDEBUG}
-is not defined). It can be enabled separately by defining {\tt TRACE\_BUFFER},
-either in {\tt <xen/config.h>} or on the gcc command line.
-
-The size (in pages) of the per-CPU trace buffers can be specified using the
-{\tt tbuf\_size=n} boot parameter to Xen. If the size is set to 0, the trace
-buffers will be disabled.
-
-\subsection{Dumping trace data}
-
-When running a trace buffer build of Xen, trace data are written continuously
-into the buffer data areas, with newer data overwriting older data. This data
-can be captured using the {\tt xentrace} program in Domain 0.
-
-The {\tt xentrace} tool uses {\tt /dev/mem} in domain 0 to map the trace
-buffers into its address space. It then periodically polls all the buffers for
-new data, dumping out any new records from each buffer in turn. As a result,
-for machines with multiple (logical) CPUs, the trace buffer output will not be
-in overall chronological order.
-
-The output from {\tt xentrace} can be post-processed using {\tt
-xentrace\_cpusplit} (used to split trace data out into per-cpu log files) and
-{\tt xentrace\_format} (used to pretty-print trace data). For the predefined
-trace points, there is an example format file in {\tt tools/xentrace/formats}.
-
-For more information, see the manual pages for {\tt xentrace}, {\tt
-xentrace\_format} and {\tt xentrace\_cpusplit}.
-
-
-\chapter{Hypervisor calls}
-
-\section{ set\_trap\_table(trap\_info\_t *table)}
-
-Install trap handler table.
-
-\section{ mmu\_update(mmu\_update\_t *req, int count, int *success\_count)}
-Update the page table for the domain. Updates can be batched.
-success\_count will be updated to report the number of successful
-updates. The update types are:
-
-{\it MMU\_NORMAL\_PT\_UPDATE}:
-
-{\it MMU\_MACHPHYS\_UPDATE}:
-
-{\it MMU\_EXTENDED\_COMMAND}:
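-
-The batching and success\_count behaviour can be illustrated with a mock. This
-is not the hypercall itself: the fields of {\tt mmu\_update\_t}, the
-{\tt slots} array standing in for page-table entries, the
-{\tt mock\_mmu\_update} function and the stop-at-first-failure policy are all
-assumptions made for the example.
-
```c
#include <stddef.h>

/* Hypothetical request layout, for this sketch only. */
typedef struct {
    unsigned long ptr;   /* which entry to update */
    unsigned long val;   /* new contents of that entry */
} mmu_update_t;

#define NR_SLOTS 4
static unsigned long slots[NR_SLOTS];   /* stand-in for page-table entries */

/*
 * Mock of a batched update: requests are applied in order and processing
 * stops at the first invalid one (an assumption of this sketch).
 * *success_count receives the number of requests that were applied.
 */
static int mock_mmu_update(mmu_update_t *req, int count, int *success_count)
{
    int i, rc = 0;

    for (i = 0; i < count; i++) {
        if (req[i].ptr >= NR_SLOTS) {
            rc = -1;                     /* reject out-of-range entry */
            break;
        }
        slots[req[i].ptr] = req[i].val;
    }

    if (success_count != NULL)
        *success_count = i;
    return rc;
}
```
-
-A caller that submits a batch can compare success\_count with count to find
-the first request that was not applied.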
-
-\section{ set\_gdt(unsigned long *frame\_list, int entries)}
-Set the global descriptor table; this virtualizes the {\tt lgdt} instruction.
-
-\section{ stack\_switch(unsigned long ss, unsigned long esp)}
-Request context switch from hypervisor.
-
-\section{ set\_callbacks(unsigned long event\_selector, unsigned long event\_address,
- unsigned long failsafe\_selector, unsigned
- long failsafe\_address) }
-Register OS event processing routine. In Linux both the event\_selector and
-failsafe\_selector are the kernel's CS. The value event\_address specifies the
-address for an interrupt handler dispatch routine and failsafe\_address
-specifies a handler for application faults.
-
-\section{ fpu\_taskswitch(void)}
-Notify hypervisor that FPU registers need to be saved on context switch.
-
-\section{ sched\_op(unsigned long op)}
-Request scheduling operation from hypervisor. The options are: {\it
-yield}, {\it block}, and {\it shutdown}. {\it yield} keeps the
-calling domain runnable but may cause a reschedule if other domains
-are runnable. {\it block} removes the calling domain from the run
-queue, and the domain sleeps until an event is delivered to it. {\it
-shutdown} ends the domain's execution and lets the caller specify
-whether the domain should reboot, halt or suspend.
-
-\section{ dom0\_op(dom0\_op\_t *op)}
-Administrative domain operations for domain management. The options are:
-
-{\it DOM0\_CREATEDOMAIN}: create new domain, specifying the name and memory usage
-in kilobytes.
-
-{\it DOM0\_PAUSEDOMAIN}: mark domain as unschedulable
-
-{\it DOM0\_UNPAUSEDOMAIN}: mark domain as schedulable
-
-{\it DOM0\_DESTROYDOMAIN}: deallocate resources associated with the domain
-
-{\it DOM0\_GETMEMLIST}: get list of pages used by the domain
-
-{\it DOM0\_SCHEDCTL}:
-
-{\it DOM0\_ADJUSTDOM}: adjust scheduling priorities for domain
-
-{\it DOM0\_BUILDDOMAIN}: do final guest OS setup for domain
-
-{\it DOM0\_GETDOMAINFO}: get statistics about the domain
-
-{\it DOM0\_GETPAGEFRAMEINFO}:
-
-{\it DOM0\_IOPL}: set IO privilege level
-
-{\it DOM0\_MSR}:
-
-{\it DOM0\_DEBUG}: interactively call pervasive debugger
-
-{\it DOM0\_SETTIME}: set system time
-
-{\it DOM0\_READCONSOLE}: read console content from hypervisor buffer ring
-
-{\it DOM0\_PINCPUDOMAIN}: pin domain to a particular CPU
-
-{\it DOM0\_GETTBUFS}: get information about the size and location of
- the trace buffers (only on trace-buffer enabled builds)
-
-{\it DOM0\_PHYSINFO}: get information about the host machine
-
-{\it DOM0\_PCIDEV\_ACCESS}: modify PCI device access permissions
-
-{\it DOM0\_SCHED\_ID}: get the ID of the current Xen scheduler
-
-{\it DOM0\_SHADOW\_CONTROL}:
-
-{\it DOM0\_SETDOMAINNAME}: set the name of a domain
-
-{\it DOM0\_SETDOMAININITIALMEM}: set initial memory allocation of a domain
-
-{\it DOM0\_SETDOMAINMAXMEM}: set maximum memory allocation of a domain
-
-{\it DOM0\_GETPAGEFRAMEINFO2}:
-
-{\it DOM0\_SETDOMAINVMASSIST}: set domain VM assist options
-
-
-\section{ set\_debugreg(int reg, unsigned long value)}
-Set debug register reg to value.
-
-\section{ get\_debugreg(int reg)}
-Get the value of debug register reg.
-
-\section{ update\_descriptor(unsigned long ma, unsigned long word1, unsigned long word2)}
-
-\section{ set\_fast\_trap(int idx)}
-Install traps that allow the guest OS to bypass the hypervisor.
-
-\section{ dom\_mem\_op(unsigned int op, unsigned long *extent\_list, unsigned long nr\_extents, unsigned int extent\_order)}
-Increase or decrease the memory reservation of the guest OS.
-
-\section{ multicall(void *call\_list, int nr\_calls)}
-Execute a series of hypervisor calls.
-
-\section{ update\_va\_mapping(unsigned long page\_nr, unsigned long val, unsigned long flags)}
-
-\section{ set\_timer\_op(uint64\_t timeout)}
-Request a timer event to be sent at the specified system time.
-
-\section{ event\_channel\_op(void *op)}
-Inter-domain event-channel management.
-
-\section{ xen\_version(int cmd)}
-Request Xen version number.
-
-\section{ console\_io(int cmd, int count, char *str)}
-Interact with the console. The operations are:
-
-{\it CONSOLEIO\_write}: Output count characters from buffer str.
-
-{\it CONSOLEIO\_read}: Input at most count characters into buffer str.
-
-\section{ physdev\_op(void *physdev\_op)}
-
-\section{ grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)}
-
-\section{ vm\_assist(unsigned int cmd, unsigned int type)}
-
-\section{ update\_va\_mapping\_otherdomain(unsigned long page\_nr, unsigned long val, unsigned long flags, uint16\_t domid)}
-
-\end{document}