author | kaf24@freefall.cl.cam.ac.uk <kaf24@freefall.cl.cam.ac.uk> | 2004-10-26 14:12:48 +0000 |
---|---|---|
committer | kaf24@freefall.cl.cam.ac.uk <kaf24@freefall.cl.cam.ac.uk> | 2004-10-26 14:12:48 +0000 |
commit | cadf3eafb224880e277913a011a7e358517551a6 (patch) | |
tree | 4479b17ef7e947d4eaea752adaeaff6cdf27c10e /docs/interface.tex | |
parent | f1816bd031a1dd3e7927d88b3345b4fbb139ea11 (diff) | |
bitkeeper revision 1.1159.1.282 (417e5b60oXUOPTWl81d5zD6oq1tOTQ)
Clean up docs dir layout.
Diffstat (limited to 'docs/interface.tex')
-rw-r--r-- | docs/interface.tex | 901 |
1 files changed, 0 insertions, 901 deletions
diff --git a/docs/interface.tex b/docs/interface.tex deleted file mode 100644 index 93b6143d39..0000000000 --- a/docs/interface.tex +++ /dev/null @@ -1,901 +0,0 @@ -\documentclass[11pt,twoside,final,openright]{xenstyle} -\usepackage{a4,graphicx,setspace} -\setstretch{1.15} -\input{style.tex} - -\begin{document} - -% TITLE PAGE -\pagestyle{empty} -\begin{center} -\vspace*{\fill} -\includegraphics{eps/xenlogo.eps} -\vfill -\vfill -\vfill -\begin{tabular}{l} -{\Huge \bf Interface manual} \\[4mm] -{\huge Xen v2.0 for x86} \\[80mm] - -{\Large Xen is Copyright (c) 2004, The Xen Team} \\[3mm] -{\Large University of Cambridge, UK} \\[20mm] -{\large Last updated on 11th March, 2004} -\end{tabular} -\vfill -\end{center} -\cleardoublepage - -% TABLE OF CONTENTS -\pagestyle{plain} -\pagenumbering{roman} -{ \parskip 0pt plus 1pt - \tableofcontents } -\cleardoublepage - -% PREPARE FOR MAIN TEXT -\pagenumbering{arabic} -\raggedbottom -\widowpenalty=10000 -\clubpenalty=10000 -\parindent=0pt -\renewcommand{\topfraction}{.8} -\renewcommand{\bottomfraction}{.8} -\renewcommand{\textfraction}{.2} -\renewcommand{\floatpagefraction}{.8} -\setstretch{1.15} - -\chapter{Introduction} -Xen allows the hardware resources of a machine to be virtualized and -dynamically partitioned so as to allow multiple different 'guest' -operating system images to be run simultaneously. - -Virtualizing the machine in this manner provides flexibility allowing -different users to choose their preferred operating system (Windows, -Linux, NetBSD, or a custom operating system). Furthermore, Xen provides -secure partitioning between these 'domains', and enables better resource -accounting and QoS isolation than can be achieved with a conventional -operating system. - -The hypervisor runs directly on server hardware and dynamically partitions -it between a number of {\it domains}, each of which hosts an instance -of a {\it guest operating system}. 
The hypervisor provides just enough -abstraction of the machine to allow effective isolation and resource -management between these domains. - -Xen essentially takes a virtual machine approach as pioneered by IBM -VM/370. However, unlike VM/370 or more recent efforts such as VMware -and Virtual PC, Xen does not attempt to completely virtualize the -underlying hardware. Instead, parts of the hosted guest operating -systems are modified to work with the hypervisor; the operating system -is effectively ported to a new target architecture, typically -requiring changes in just the machine-dependent code. The user-level -API is unchanged, thus existing binaries and operating system -distributions can work unmodified. - -In addition to exporting virtualized instances of CPU, memory, network and -block devices, Xen exposes a control interface to set how these resources -are shared between the running domains. The control interface is privileged -and may only be accessed by one particular virtual machine: {\it domain0}. -This domain is a required part of any Xen-based server and runs the application -software that manages the control-plane aspects of the platform. Running the -control software in {\it domain0}, distinct from the hypervisor itself, allows -the Xen framework to separate the notions of {\it mechanism} and {\it policy} -within the system. - - -\chapter{CPU state} - -All privileged state must be handled by Xen. The guest OS has no -direct access to CR3 and is not permitted to update privileged bits in -EFLAGS. - -\chapter{Exceptions} -The IDT is virtualised by submitting a virtual 'trap -table' to Xen. Most trap handlers are identical to native x86 -handlers. The page-fault handler is a notable exception. - -\chapter{Interrupts and events} -Interrupts are virtualized by mapping them to events, which are delivered -asynchronously to the target domain. 
A guest OS can map these events onto -its standard interrupt dispatch mechanisms, such as a simple vectoring -scheme. Each physical interrupt source controlled by the hypervisor, including -network devices, disks, or the timer subsystem, is responsible for identifying -the target for an incoming interrupt and sending an event to that domain. - -This demultiplexing mechanism also provides a device-specific mechanism for -event coalescing or hold-off. For example, a guest OS may request to only -actually receive an event after {\it n} packets are queued ready for delivery -to it, or {\it t} nanoseconds after the first packet arrived (whichever occurs -first). This allows latency and throughput requirements to be addressed on a -domain-specific basis. - -\chapter{Time} -Guest operating systems need to be aware of the passage of real time and their -own ``virtual time'', i.e. the time they have been executing. Furthermore, a -notion of time is required in the hypervisor itself for scheduling and the -activities that relate to it. To this end the hypervisor provides four notions -of time: cycle counter time, system time, wall clock time, domain virtual -time. - - -\section{Cycle counter time} -This provides the finest-grained, free-running time reference, with the -approximate frequency being publicly accessible. The cycle counter time is -used to accurately extrapolate the other time references. On SMP machines -it is currently assumed that the cycle counter time is synchronised between -CPUs. The current x86-based implementation achieves this within inter-CPU -communication latencies. - -\section{System time} -This is a 64-bit value containing the nanoseconds elapsed since boot -time. Unlike cycle counter time, system time accurately reflects the -passage of real time, i.e. it is adjusted several times a second for timer -drift. This is done by running an NTP client in {\it domain0} on behalf of -the machine, feeding updates to the hypervisor. 
Intermediate values can be -extrapolated using the cycle counter. - -\section{Wall clock time} -This is the actual ``time of day'' Unix style struct timeval (i.e. seconds and -microseconds since 1 January 1970, adjusted by leap seconds etc.). Again, an -NTP client hosted by {\it domain0} can help maintain this value. To guest -operating systems this value will be reported instead of the hardware RTC -clock value and they can use the system time and cycle counter times to start -and remain perfectly in time. - - -\section{Domain virtual time} -This progresses at the same pace as cycle counter time, but only while a -domain is executing. It stops while a domain is de-scheduled. Therefore the -share of the CPU that a domain receives is indicated by the rate at which -its domain virtual time increases, relative to the rate at which cycle -counter time does so. - -\section{Time interface} -Xen exports some timestamps to guest operating systems through their shared -info page. Timestamps are provided for system time and wall-clock time. Xen -also provides the cycle counter values at the time of the last update -allowing guests to calculate the current values. The cpu frequency and a -scaling factor are provided for guests to convert cycle counter values to -real time. Since all time stamps need to be updated and read -\emph{atomically} two version numbers are also stored in the shared info -page. - -Xen will ensure that the time stamps are updated frequently enough to avoid -an overflow of the cycle counter values. A guest can check if its notion of -time is up-to-date by comparing the version numbers. - -\section{Timer events} - -Xen maintains a periodic timer (currently with a 10ms period) which sends a -timer event to the currently executing domain. This allows Guest OSes to -keep track of the passing of time when executing. 
The scheduler also -arranges for a newly activated domain to receive a timer event when -scheduled so that the Guest OS can adjust to the passage of time while it -has been inactive. - -In addition, Xen exports a hypercall interface to each domain which allows -them to request a timer event sent to them at the specified system -time. Guest OSes may use this timer to implement timeout values when they -block. - -\chapter{Memory} - -The hypervisor is responsible for providing memory to each of the -domains running over it. However, the Xen hypervisor's duty is -restricted to managing physical memory and to policing page table -updates. All other memory management functions are handled -externally. Start-of-day issues such as building initial page tables -for a domain, loading its kernel image and so on are done by the {\it -domain builder} running in user-space in {\it domain0}. Paging to -disk and swapping are handled by the guest operating systems -themselves, if they need it. - -On a Xen-based system, the hypervisor itself runs in {\it ring 0}. It -has full access to the physical memory available in the system and is -responsible for allocating portions of it to the domains. Guest -operating systems run in and use {\it rings 1}, {\it 2} and {\it 3} as -they see fit, aside from the fact that segmentation is used to prevent -the guest OS from accessing a portion of the linear address space that -is reserved for use by the hypervisor. This approach allows -transitions between the guest OS and hypervisor without flushing the -TLB. We expect most guest operating systems will use ring 1 for their -own operation and place applications (if they support such a notion) -in ring 3. - -\section{Physical Memory Allocation} -The hypervisor reserves a small fixed portion of physical memory at -system boot time. This special memory region is located at the -beginning of physical memory and is mapped at the very top of every -virtual address space. 
- -Any physical memory that is not used directly by the hypervisor is divided into -pages and is available for allocation to domains. The hypervisor tracks which -pages are free and which pages have been allocated to each domain. When a new -domain is initialized, the hypervisor allocates it pages drawn from the free -list. The amount of memory required by the domain is passed to the hypervisor -as one of the parameters for new domain initialization by the domain builder. - -Domains can never be allocated further memory beyond that which was -requested for them on initialization. However, a domain can return -pages to the hypervisor if it discovers that its memory requirements -have diminished. - -% put reasons for why pages might be returned here. -\section{Page Table Updates} -In addition to managing physical memory allocation, the hypervisor is also in -charge of performing page table updates on behalf of the domains. This is -necessary to prevent domains from adding arbitrary mappings to their page -tables or introducing mappings to others' page tables. - -\section{Writable Page Tables} -A domain can also request write access to its page tables. In this -mode, Xen notes write attempts to page table pages and makes the page -temporarily writable. In-use page table pages are also disconnected -from the page directory. The domain can now update entries in these -page table pages without the assistance of Xen. As soon as the -writable page table pages get used as page table pages, Xen makes the -pages read-only again and revalidates the entries in the pages. - -\section{Segment Descriptor Tables} - -On boot a guest is supplied with a default GDT, which is {\em not} -taken from its own memory allocation. If the guest wishes to use other -than the default `flat' ring-1 and ring-3 segments that this default -table provides, it must register a custom GDT and/or LDT with Xen, -allocated from its own memory. 
- -int {\bf set\_gdt}(unsigned long *{\em frame\_list}, int {\em entries}) - -{\em frame\_list}: An array of up to 16 page frames within which the -GDT resides. Any frame registered as a GDT frame may only be mapped -read-only within the guest's address space (e.g., no writable -mappings, no use as a page-table page, and so on). - -{\em entries}: The number of descriptor-entry slots in the GDT. Note -that the table must be large enough to contain Xen's reserved entries; -thus we must have '{\em entries $>$ LAST\_RESERVED\_GDT\_ENTRY}'. -Note also that, after registering the GDT, slots {\em FIRST\_} through -{\em LAST\_RESERVED\_GDT\_ENTRY} are no longer usable by the guest and -may be overwritten by Xen. - -\section{Pseudo-Physical Memory} -The usual problem of external fragmentation means that a domain is -unlikely to receive a contiguous stretch of physical memory. However, -most guest operating systems do not have built-in support for -operating in a fragmented physical address space; e.g., Linux has to -have a one-to-one mapping for its physical memory. Therefore a notion of -{\it pseudo physical memory} is introduced. Xen maintains a {\it -real physical} to {\it pseudo physical} mapping which can be consulted -by every domain. Additionally, at its start of day, a domain is -supplied a {\it pseudo physical} to {\it real physical} mapping which -it needs to keep updated itself. From that moment onwards {\it pseudo -physical} addresses are used instead of discontiguous {\it real -physical} addresses. Thus, the rest of the guest OS code has an -impression of operating in a contiguous address space. Guest OS page -tables contain real physical addresses. Mapping {\it pseudo physical} -to {\it real physical} addresses is needed on page table updates and -also on remapping memory regions with the guest OS. - - - -\chapter{Network I/O} - -Virtual network device services are provided by shared memory -communications with a `backend' domain. 
From the point of view of -other domains, the backend may be viewed as a virtual ethernet switch -element with each domain having one or more virtual network interfaces -connected to it. - -\section{Backend Packet Handling} -The backend driver is responsible primarily for {\it data-path} operations. -In terms of networking this means packet transmission and reception. - -On the transmission side, the backend needs to perform three key actions: -\begin{itemize} -\item {\tt Validation:} A domain may only be allowed to emit packets -matching a certain specification; for example, ones in which the -source IP address matches one assigned to the virtual interface over -which it is sent. The backend would be responsible for ensuring any -such requirements are met, either by checking or by stamping outgoing -packets with prescribed values for certain fields. - -Validation functions can be configured using standard firewall rules -(i.e. IP Tables, in the case of Linux). - -\item {\tt Scheduling:} Since a number of domains can share a single -``real'' network interface, the hypervisor must mediate access when -several domains each have packets queued for transmission. Of course, -this general scheduling function subsumes basic shaping or -rate-limiting schemes. - -\item {\tt Logging and Accounting:} The hypervisor can be configured -with classifier rules that control how packets are accounted or -logged. For example, {\it domain0} could request that it receives a -log message or copy of the packet whenever another domain attempts to -send a TCP packet containing a SYN. -\end{itemize} - -On the receive side, the backend's role is relatively straightforward: -once a packet is received, it just needs to determine the virtual interface(s) -to which it must be delivered and deliver it via page-flipping. - - -\section{Data Transfer} - -Each virtual interface uses two ``descriptor rings'', one for transmit, -the other for receive. 
Each descriptor identifies a block of contiguous -physical memory allocated to the domain. There are four cases: - -\begin{itemize} - -\item The transmit ring carries packets to transmit from the domain to the -hypervisor. - -\item The return path of the transmit ring carries ``empty'' descriptors -indicating that the contents have been transmitted and the memory can be -re-used. - -\item The receive ring carries empty descriptors from the domain to the -hypervisor; these provide storage space for that domain's received packets. - -\item The return path of the receive ring carries packets that have been -received. -\end{itemize} - -Real physical addresses are used throughout, with the domain performing -translation from pseudo-physical addresses if that is necessary. - -If a domain does not keep its receive ring stocked with empty buffers then -packets destined to it may be dropped. This provides some defense against -receiver-livelock problems because an overloaded domain will cease to receive -further data. Similarly, on the transmit path, it provides the application -with feedback on the rate at which packets are able to leave the system. - -Synchronization between the hypervisor and the domain is achieved using -counters held in shared memory that is accessible to both. Each ring has -associated producer and consumer indices indicating the area in the ring -that holds descriptors that contain data. After receiving {\it n} packets -or {\it t} nanoseconds after receiving the first packet, the hypervisor sends -an event to the domain. - -\chapter{Block I/O} - -\section{Virtual Block Devices (VBDs)} - -All guest OS disk access goes through the VBD interface. The VBD -interface provides the administrator with the ability to selectively -grant domains access to portions of block storage devices visible to -the block backend device (usually domain 0). 
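The producer and consumer indices described under Data Transfer for the network rings (the block interface uses the same style of ring) can be illustrated with a small single-producer, single-consumer sketch. This is a hedged example, not Xen's actual ring layout: the names `struct desc`, `struct ring`, `ring_put`, `ring_get` and the size `RING_SIZE` are all hypothetical, and the memory barriers that real shared-memory code needs are only marked in comments.

```c
#include <stdint.h>

#define RING_SIZE 8u   /* hypothetical size; a power of two so masking works */

/* Hypothetical descriptor: a block of contiguous memory owned by the domain. */
struct desc {
    unsigned long addr;
    uint32_t len;
};

/* Hypothetical ring: free-running indices held in memory shared by both sides. */
struct ring {
    uint32_t prod;                 /* advanced only by the producer */
    uint32_t cons;                 /* advanced only by the consumer */
    struct desc slot[RING_SIZE];
};

/* Producer side: returns 0 on success, -1 if the ring is full. */
static int ring_put(struct ring *r, struct desc d)
{
    if (r->prod - r->cons == RING_SIZE)
        return -1;
    r->slot[r->prod & (RING_SIZE - 1)] = d;
    /* Real shared-memory code would need a write barrier here. */
    r->prod++;
    return 0;
}

/* Consumer side: returns 0 on success, -1 if no descriptor holds data. */
static int ring_get(struct ring *r, struct desc *out)
{
    if (r->cons == r->prod)
        return -1;
    /* Real shared-memory code would need a read barrier here. */
    *out = r->slot[r->cons & (RING_SIZE - 1)];
    r->cons++;
    return 0;
}
```

The slots between `cons` and `prod` are the area of the ring that holds descriptors containing data. A full ring makes `ring_put` fail, which mirrors the back-pressure behaviour the text describes for a domain that does not keep its rings serviced.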
- -VBDs can literally be backed by any block device accessible to the -backend domain, including network-based block devices (iSCSI, *NBD, -etc), loopback devices and LVM / MD devices. - -Old (Xen 1.2) virtual disks are not supported under Xen 2.0, since -similar functionality can be achieved using the (more advanced) LVM -system, which is already in widespread use. - -\subsection{Data Transfer} -Domains which have been granted access to a logical block device are permitted -to read and write it by shared memory communications with the backend domain. - -In overview, the same style of descriptor-ring that is used for -network packets is used here. Each domain has one ring that carries -operation requests to the hypervisor and carries the results back -again. - -Rather than copying data, the backend simply maps the domain's buffers -in order to enable direct DMA to them. The act of mapping the buffers -also increases the reference counts of the underlying pages, so that -the unprivileged domain cannot try to return them to the hypervisor, -install them as page tables, or any other unsafe behaviour. -%block API here - -\chapter{Privileged operations} -{\it Domain0} is responsible for building all other domains on the server -and providing control interfaces for managing scheduling, networking, and -blocks. - -\chapter{CPU Scheduler} - -Xen offers a uniform API for CPU schedulers. It is possible to choose -from a number of schedulers at boot and it should be easy to add more. - -\paragraph*{Note: SMP host support} -Xen has always supported SMP host systems. Domains are statically assigned to -CPUs, either at creation time or when manually pinning to a particular CPU. -The current schedulers then run locally on each CPU to decide which of the -assigned domains should be run there. - -\section{Standard Schedulers} - -These BVT, Atropos and Round Robin schedulers are part of the normal -Xen distribution. 
BVT provides proportional fair shares of the CPU to -the running domains. Atropos can be used to reserve absolute shares -of the CPU for each domain. Round-robin is provided as an example of -Xen's internal scheduler API. - -More information on the characteristics and use of these schedulers is -available in { \tt Sched-HOWTO.txt }. - -\section{Scheduling API} - -The scheduling API is used by both the schedulers described above and should -also be used by any new schedulers. It provides a generic interface and also -implements much of the ``boilerplate'' code. - -Schedulers conforming to this API are described by the following -structure: - -\begin{verbatim} -struct scheduler -{ - char *name; /* full name for this scheduler */ - char *opt_name; /* option name for this scheduler */ - unsigned int sched_id; /* ID for this scheduler */ - - int (*init_scheduler) (); - int (*alloc_task) (struct task_struct *); - void (*add_task) (struct task_struct *); - void (*free_task) (struct task_struct *); - void (*rem_task) (struct task_struct *); - void (*wake_up) (struct task_struct *); - void (*do_block) (struct task_struct *); - task_slice_t (*do_schedule) (s_time_t); - int (*control) (struct sched_ctl_cmd *); - int (*adjdom) (struct task_struct *, - struct sched_adjdom_cmd *); - s32 (*reschedule) (struct task_struct *); - void (*dump_settings) (void); - void (*dump_cpu_state) (int); - void (*dump_runq_el) (struct task_struct *); -}; -\end{verbatim} - -The only method that {\em must} be implemented is -{\tt do\_schedule()}. However, if there is not some implementation for the -{\tt wake\_up()} method then waking tasks will not get put on the runqueue! - -The fields of the above structure are described in more detail below. - -\subsubsection{name} - -The name field should point to a descriptive ASCII string. - -\subsubsection{opt\_name} - -This field is the value of the {\tt sched=} boot-time option that will select -this scheduler. 
- -\subsubsection{sched\_id} - -This is an integer that uniquely identifies this scheduler. There should be a -macro corresponding to this scheduler ID in {\tt <hypervisor-ifs/sched-if.h>}. - -\subsubsection{init\_scheduler} - -\paragraph*{Purpose} - -This is a function for performing any scheduler-specific initialisation. For -instance, it might allocate memory for per-CPU scheduler data and initialise it -appropriately. - -\paragraph*{Call environment} - -This function is called after the initialisation performed by the generic -layer. The function is called exactly once, for the scheduler that has been -selected. - -\paragraph*{Return values} - -This should return negative on failure --- this will cause an -immediate panic and the system will fail to boot. - -\subsubsection{alloc\_task} - -\paragraph*{Purpose} -Called when a {\tt task\_struct} is allocated by the generic scheduler -layer. A particular scheduler implementation may use this method to -allocate per-task data for this task. It may use the {\tt -sched\_priv} pointer in the {\tt task\_struct} to point to this data. - -\paragraph*{Call environment} -The generic layer guarantees that the {\tt sched\_priv} field will -remain intact from the time this method is called until the task is -deallocated (so long as the scheduler implementation does not change -it explicitly!). - -\paragraph*{Return values} -Negative on failure. - -\subsubsection{add\_task} - -\paragraph*{Purpose} - -Called when a task is initially added by the generic layer. - -\paragraph*{Call environment} - -The fields in the {\tt task\_struct} are now filled out and available for use. -Schedulers should implement appropriate initialisation of any per-task private -information in this method. - -\subsubsection{free\_task} - -\paragraph*{Purpose} - -Schedulers should free the space used by any associated private data -structures. - -\paragraph*{Call environment} - -This is called when a {\tt task\_struct} is about to be deallocated. 
-The generic layer will have done generic task removal operations and -(if implemented) called the scheduler's {\tt rem\_task} method before -this method is called. - -\subsubsection{rem\_task} - -\paragraph*{Purpose} - -This is called when a task is being removed from scheduling (but is -not yet being freed). - -\subsubsection{wake\_up} - -\paragraph*{Purpose} - -Called when a task is woken up, this method should put the task on the runqueue -(or do the scheduler-specific equivalent action). - -\paragraph*{Call environment} - -The task is already set to state RUNNING. - -\subsubsection{do\_block} - -\paragraph*{Purpose} - -This function is called when a task is blocked. This function should -not remove the task from the runqueue. - -\paragraph*{Call environment} - -The EVENTS\_MASTER\_ENABLE\_BIT is already set and the task state changed to -TASK\_INTERRUPTIBLE on entry to this method. A call to the {\tt - do\_schedule} method will be made after this method returns, in -order to select the next task to run. - -\subsubsection{do\_schedule} - -This method must be implemented. - -\paragraph*{Purpose} - -The method is called each time a new task must be chosen for scheduling on the -current CPU. The current time is passed as the single argument (the current -task can be found using the {\tt current} macro). - -This method should select the next task to run on this CPU and set its minimum -time to run as well as returning the data described below. - -This method should also take the appropriate action if the previous -task has blocked, e.g. removing it from the runqueue. - -\paragraph*{Call environment} - -The other fields in the {\tt task\_struct} are updated by the generic layer, -which also performs all Xen-specific tasks and performs the actual task switch -(unless the previous task has been chosen again). - -This method is called with the {\tt schedule\_lock} held for the current CPU -and local interrupts disabled. 
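To make the role of {\tt do\_schedule} more concrete, here is a minimal round-robin sketch in self-contained C. It is an illustration under stated assumptions, not code from Xen: `struct task` and `struct slice` are hypothetical stand-ins for `task_struct` and `task_slice_t`, `s_time_t` is assumed to be a signed 64-bit nanosecond count, and `rr_current`, `rr_do_schedule` and `RR_SLICE_NS` are invented names. As a simplification, blocked tasks are skipped rather than removed from the runqueue, and no locking is shown.

```c
#include <stddef.h>
#include <stdint.h>

typedef int64_t s_time_t;          /* assumption: system time in nanoseconds */

/* Hypothetical stand-in for task_struct, linked into a circular runqueue. */
struct task {
    int id;
    int blocked;                   /* non-zero if the task cannot run */
    struct task *next;
};

/* Hypothetical stand-in for task_slice_t: which task, and for how long. */
struct slice {
    struct task *task;
    s_time_t time;
};

#define RR_SLICE_NS 10000000LL     /* hypothetical 10ms maximum slice */

static struct task *rr_current;    /* task most recently chosen on this CPU */

/* Pick the next runnable task after the current one, wrapping around.
 * The current task is considered last, so it is chosen again only if
 * nothing else is runnable. Returns a NULL task if nothing can run. */
static struct slice rr_do_schedule(s_time_t now)
{
    struct slice s = { NULL, RR_SLICE_NS };
    struct task *t = rr_current;

    (void)now;                     /* a real scheduler would account time here */
    if (t == NULL)
        return s;
    do {
        t = t->next;
        if (!t->blocked) {
            s.task = t;
            rr_current = t;
            return s;
        }
    } while (t != rr_current);
    return s;
}
```

In this sketch a NULL task in the returned slice stands for "nothing runnable"; a real scheduler would presumably hand back an idle task instead.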
- -\paragraph*{Return values} - -Must return a {\tt struct task\_slice} describing what task to run and how long -for (at maximum). - -\subsubsection{control} - -\paragraph*{Purpose} - -This method is called for global scheduler control operations. It takes a -pointer to a {\tt struct sched\_ctl\_cmd}, which it should either -source data from or populate with data, depending on the value of the -{\tt direction} field. - -\paragraph*{Call environment} - -The generic layer guarantees that when this method is called, the -caller selected the correct scheduler ID, hence the scheduler's -implementation does not need to sanity-check these parts of the call. - -\paragraph*{Return values} - -This function should return the value to be passed back to user space, hence it -should either be 0 or an appropriate errno value. - -\subsubsection{sched\_adjdom} - -\paragraph*{Purpose} - -This method is called to adjust the scheduling parameters of a particular -domain, or to query their current values. The function should check -the {\tt direction} field of the {\tt sched\_adjdom\_cmd} it receives in -order to determine which of these operations is being performed. - -\paragraph*{Call environment} - -The generic layer guarantees that the caller has specified the correct -control interface version and scheduler ID and that the supplied {\tt -task\_struct} will not be deallocated during the call (hence it is not -necessary to {\tt get\_task\_struct}). - -\paragraph*{Return values} - -This function should return the value to be passed back to user space, hence it -should either be 0 or an appropriate errno value. - -\subsubsection{reschedule} - -\paragraph*{Purpose} - -This method is called to determine if a reschedule is required as a result of a -particular task. - -\paragraph*{Call environment} -The generic layer will cause a reschedule if the current domain is the idle -task or it has exceeded its minimum time slice before a reschedule. 
The -generic layer guarantees that the task passed is not currently running but is -on the runqueue. - -\paragraph*{Return values} - -Should return a mask of CPUs to cause a reschedule on. - -\subsubsection{dump\_settings} - -\paragraph*{Purpose} - -If implemented, this should dump any private global settings for this -scheduler to the console. - -\paragraph*{Call environment} - -This function is called with interrupts enabled. - -\subsubsection{dump\_cpu\_state} - -\paragraph*{Purpose} - -This method should dump any private settings for the specified CPU. - -\paragraph*{Call environment} - -This function is called with interrupts disabled and the {\tt schedule\_lock} -for the specified CPU held. - -\subsubsection{dump\_runq\_el} - -\paragraph*{Purpose} - -This method should dump any private settings for the specified task. - -\paragraph*{Call environment} - -This function is called with interrupts disabled and the {\tt schedule\_lock} -for the task's CPU held. - - -\chapter{Debugging} - -Xen provides tools for debugging both Xen and guest OSes. Currently, the -Pervasive Debugger provides a GDB stub, which provides facilities for symbolic -debugging of Xen itself and of OS kernels running on top of Xen. The Trace -Buffer provides a lightweight means to log data about Xen's internal state and -behaviour at runtime, for later analysis. - -\section{Pervasive Debugger} - -Information on using the pervasive debugger is available in pdb.txt. - - -\section{Trace Buffer} - -The trace buffer provides a means to observe Xen's operation from domain 0. -Trace events, inserted at key points in Xen's code, record data that can be -read by the {\tt xentrace} tool. Recording these events has a low overhead -and hence the trace buffer may be useful for debugging timing-sensitive -behaviours. - -\subsection{Internal API} - -To use the trace buffer functionality from within Xen, you must {\tt \#include -<xen/trace.h>}, which contains definitions related to the trace buffer. 
Trace -events are inserted into the buffer using the {\tt TRACE\_xD} ({\tt x} = 0, 1, -2, 3, 4 or 5) macros. These all take an event number, plus {\tt x} additional -(32-bit) data as their arguments. For trace buffer-enabled builds of Xen these -will insert the event ID and data into the trace buffer, along with the current -value of the CPU cycle-counter. For builds without the trace buffer enabled, -the macros expand to no-ops and thus can be left in place without incurring -overheads. - -\subsection{Trace-enabled builds} - -By default, the trace buffer is enabled only in debug builds (i.e. {\tt NDEBUG} -is not defined). It can be enabled separately by defining {\tt TRACE\_BUFFER}, -either in {\tt <xen/config.h>} or on the gcc command line. - -The size (in pages) of the per-CPU trace buffers can be specified using the -{\tt tbuf\_size=n } boot parameter to Xen. If the size is set to 0, the trace -buffers will be disabled. - -\subsection{Dumping trace data} - -When running a trace buffer build of Xen, trace data are written continuously -into the buffer data areas, with newer data overwriting older data. This data -can be captured using the {\tt xentrace} program in Domain 0. - -The {\tt xentrace} tool uses {\tt /dev/mem} in domain 0 to map the trace -buffers into its address space. It then periodically polls all the buffers for -new data, dumping out any new records from each buffer in turn. As a result, -for machines with multiple (logical) CPUs, the trace buffer output will not be -in overall chronological order. - -The output from {\tt xentrace} can be post-processed using {\tt -xentrace\_cpusplit} (used to split trace data out into per-cpu log files) and -{\tt xentrace\_format} (used to pretty-print trace data). For the predefined -trace points, there is an example format file in {\tt tools/xentrace/formats }. - -For more information, see the manual pages for {\tt xentrace}, {\tt -xentrace\_format} and {\tt xentrace\_cpusplit}. 
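The compile-out behaviour of the {\tt TRACE\_xD} macros described under Internal API can be sketched as follows. This is a hedged, self-contained illustration and does not reproduce `<xen/trace.h>`: the record layout, `trace_buf`, `trace_head`, `trace_emit`, `TRACE_BUF_RECORDS` and the stubbed `read_cycle_counter` are assumptions made for the example, and only a two-argument variant is shown.

```c
#include <stdint.h>

#define TRACE_BUFFER 1             /* assumption: defined in trace-enabled builds */
#define TRACE_BUF_RECORDS 8u       /* hypothetical buffer size, in records */

/* Hypothetical trace record: cycle stamp, event ID, up to five data words. */
struct trace_rec {
    uint64_t cycles;
    uint32_t event;
    uint32_t data[5];
};

static struct trace_rec trace_buf[TRACE_BUF_RECORDS];
static unsigned int trace_head;    /* total number of records written */
static uint64_t fake_cycles;       /* stub standing in for the CPU cycle counter */

static uint64_t read_cycle_counter(void)
{
    return ++fake_cycles;
}

/* Write one record; newer records overwrite older ones once the buffer wraps. */
static void trace_emit(uint32_t event, uint32_t d1, uint32_t d2)
{
    struct trace_rec *r = &trace_buf[trace_head % TRACE_BUF_RECORDS];

    r->cycles = read_cycle_counter();
    r->event = event;
    r->data[0] = d1;
    r->data[1] = d2;
    r->data[2] = r->data[3] = r->data[4] = 0;
    trace_head++;
}

#ifdef TRACE_BUFFER
#define TRACE_2D(e, d1, d2) trace_emit((e), (d1), (d2))
#else
#define TRACE_2D(e, d1, d2) ((void)0)   /* no-op when tracing is compiled out */
#endif
```

A call such as `TRACE_2D(0x10, 1, 2);` then costs nothing in a build where `TRACE_BUFFER` is not defined, which is why trace points can be left in place.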
- - -\chapter{Hypervisor calls} - -\section{ set\_trap\_table(trap\_info\_t *table)} - -Install trap handler table. - -\section{ mmu\_update(mmu\_update\_t *req, int count, int *success\_count)} -Update the page table for the domain. Updates can be batched. -success\_count will be updated to report the number of successful -updates. The update types are: - -{\it MMU\_NORMAL\_PT\_UPDATE}: - -{\it MMU\_MACHPHYS\_UPDATE}: - -{\it MMU\_EXTENDED\_COMMAND}: - -\section{ set\_gdt(unsigned long *frame\_list, int entries)} -Set the global descriptor table - virtualization for lgdt. - -\section{ stack\_switch(unsigned long ss, unsigned long esp)} -Request context switch from hypervisor. - -\section{ set\_callbacks(unsigned long event\_selector, unsigned long event\_address, - unsigned long failsafe\_selector, unsigned - long failsafe\_address) } Register OS event processing routine. In - Linux both the event\_selector and failsafe\_selector are the - kernel's CS. The value event\_address specifies the address for an - interrupt handler dispatch routine and failsafe\_address specifies a - handler for application faults. - -\section{ fpu\_taskswitch(void)} -Notify hypervisor that fpu registers need to be saved on context switch. - -\section{ sched\_op(unsigned long op)} -Request scheduling operation from hypervisor. The options are: {\it -yield}, {\it block}, and {\it shutdown}. {\it yield} keeps the -calling domain run-able but may cause a reschedule if other domains -are run-able. {\it block} removes the calling domain from the run -queue and the domain sleeps until an event is delivered to it. {\it -shutdown} is used to end the domain's execution and allows the caller to specify -whether the domain should reboot, halt or suspend. - -\section{ dom0\_op(dom0\_op\_t *op)} -Administrative domain operations for domain management. The options are: - -{\it DOM0\_CREATEDOMAIN}: create new domain, specifying the name and memory usage -in kilobytes. 
- -{\it DOM0\_PAUSEDOMAIN}: mark domain as unschedulable - -{\it DOM0\_UNPAUSEDOMAIN}: mark domain as schedulable - -{\it DOM0\_DESTROYDOMAIN}: deallocate resources associated with the domain - -{\it DOM0\_GETMEMLIST}: get list of pages used by the domain - -{\it DOM0\_SCHEDCTL}: - -{\it DOM0\_ADJUSTDOM}: adjust scheduling priorities for domain - -{\it DOM0\_BUILDDOMAIN}: do final guest OS setup for domain - -{\it DOM0\_GETDOMAINFO}: get statistics about the domain - -{\it DOM0\_GETPAGEFRAMEINFO}: - -{\it DOM0\_IOPL}: set IO privilege level - -{\it DOM0\_MSR}: - -{\it DOM0\_DEBUG}: interactively call pervasive debugger - -{\it DOM0\_SETTIME}: set system time - -{\it DOM0\_READCONSOLE}: read console content from hypervisor buffer ring - -{\it DOM0\_PINCPUDOMAIN}: pin domain to a particular CPU - -{\it DOM0\_GETTBUFS}: get information about the size and location of - the trace buffers (only on trace-buffer enabled builds) - -{\it DOM0\_PHYSINFO}: get information about the host machine - -{\it DOM0\_PCIDEV\_ACCESS}: modify PCI device access permissions - -{\it DOM0\_SCHED\_ID}: get the ID of the current Xen scheduler - -{\it DOM0\_SHADOW\_CONTROL}: - -{\it DOM0\_SETDOMAINNAME}: set the name of a domain - -{\it DOM0\_SETDOMAININITIALMEM}: set initial memory allocation of a domain - -{\it DOM0\_SETDOMAINMAXMEM}: set maximum memory allocation of a domain - -{\it DOM0\_GETPAGEFRAMEINFO2}: - -{\it DOM0\_SETDOMAINVMASSIST}: set domain VM assist options - - -\section{ set\_debugreg(int reg, unsigned long value)} -Set debug register reg to value. - -\section{ get\_debugreg(int reg)} - Get the value of debug register reg. - -\section{ update\_descriptor(unsigned long ma, unsigned long word1, unsigned long word2)} - -\section{ set\_fast\_trap(int idx)} - Install traps to allow guest OS to bypass hypervisor. - -\section{ dom\_mem\_op(unsigned int op, unsigned long *extent\_list, unsigned long nr\_extents, unsigned int extent\_order)} -Increase or 
decrease memory reservations for guest OS. - -\section{ multicall(void *call\_list, int nr\_calls)} -Execute a series of hypervisor calls. - -\section{ update\_va\_mapping(unsigned long page\_nr, unsigned long val, unsigned long flags)} - -\section{ set\_timer\_op(uint64\_t timeout)} -Request a timer event to be sent at the specified system time. - -\section{ event\_channel\_op(void *op)} -Inter-domain event-channel management. - -\section{ xen\_version(int cmd)} -Request Xen version number. - -\section{ console\_io(int cmd, int count, char *str)} -Interact with the console. The operations are: - -{\it CONSOLEIO\_write}: Output count characters from buffer str. - -{\it CONSOLEIO\_read}: Input at most count characters into buffer str. - -\section{ physdev\_op(void *physdev\_op)} - -\section{ grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)} - -\section{ vm\_assist(unsigned int cmd, unsigned int type)} - -\section{ update\_va\_mapping\_otherdomain(unsigned long page\_nr, unsigned long val, unsigned long flags, uint16\_t domid)} - -\end{document} |