\chapter{Virtual Architecture}

On a Xen-based system, the hypervisor itself runs in {\it ring 0}.  It
has full access to the physical memory available in the system and is
responsible for allocating portions of it to the domains.  Guest
operating systems run in and use {\it rings 1}, {\it 2} and {\it 3} as
they see fit. Segmentation is used to prevent the guest OS from
accessing the portion of the address space that is reserved for Xen.
We expect most guest operating systems will use ring 1 for their own
operation and place applications in ring 3.

In this chapter we consider the basic virtual architecture provided by
Xen: the basic CPU state, exception and interrupt handling, and time.
Other aspects such as memory and device access are discussed in later
chapters.


\section{CPU state}

All privileged state must be handled by Xen.  The guest OS has no
direct access to CR3 and is not permitted to update privileged bits in
EFLAGS. Guest OSes use \emph{hypercalls} to invoke operations in Xen;
these are analogous to system calls but occur from ring 1 to ring 0.

A list of all hypercalls is given in Appendix~\ref{a:hypercalls}.


\section{Exceptions}

A virtual IDT is provided --- a domain can submit a table of trap
handlers to Xen via the {\tt set\_trap\_table()} hypercall.  Most trap
handlers are identical to native x86 handlers, although the page-fault
handler is somewhat different.
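
As a rough illustration of what such a table might look like, the
following C sketch builds a handler table and looks up an entry by
vector.  The {\tt trap\_entry} layout, the zero-address terminator and
the helper {\tt find\_trap\_handler()} are assumptions made for this
example; the structure actually passed to {\tt set\_trap\_table()} is
defined by Xen and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one guest handler per exception vector. */
struct trap_entry {
    uint8_t   vector;   /* x86 exception vector number */
    uintptr_t address;  /* handler entry point; 0 terminates the table */
};

/* Return the handler registered for 'vector', or 0 if there is none. */
uintptr_t find_trap_handler(const struct trap_entry *table, uint8_t vector)
{
    assert(table != NULL);
    for (const struct trap_entry *e = table; e->address != 0; e++) {
        if (e->vector == vector)
            return e->address;
    }
    return 0;
}
```

A guest would fill in one entry per exception it wants to handle and
hand the whole table to Xen in a single hypercall.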


\section{Interrupts and events}

Interrupts are virtualized by mapping them to \emph{events}, which are
delivered asynchronously to the target domain using a callback
supplied via the {\tt set\_callbacks()} hypercall.  A guest OS can map
these events onto its standard interrupt dispatch mechanisms.  Xen is
responsible for determining the target domain that will handle each
physical interrupt source. For more details on the binding of event
sources to events, see Chapter~\ref{c:devices}.
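
The mapping from events to a guest's own dispatch mechanism can be
sketched as follows.  This is a simplified model, not Xen's interface:
the {\tt event\_state} structure, the 32-event limit and the function
names are all assumptions for illustration, and real code would have to
clear pending bits atomically because Xen may set them concurrently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_EVENTS 32  /* assumed limit for this sketch */

typedef void (*event_handler_t)(unsigned int event);

/* Hypothetical per-domain state: one pending bit per event. */
struct event_state {
    uint32_t        pending;
    event_handler_t handlers[NR_EVENTS];
};

/* Example handler: counts how often it has been invoked. */
unsigned int timer_ticks;
void on_timer_event(unsigned int event)
{
    (void)event;
    timer_ticks++;
}

/* Body of the guest's callback: clear every pending event, lowest
 * number first, and run its handler if one is registered.
 * Returns the number of events that were pending. */
unsigned int dispatch_events(struct event_state *s)
{
    unsigned int seen = 0;
    assert(s != NULL);
    for (unsigned int ev = 0; ev < NR_EVENTS; ev++) {
        uint32_t bit = (uint32_t)1 << ev;
        if (!(s->pending & bit))
            continue;
        s->pending &= ~bit;  /* acknowledge before handling */
        seen++;
        if (s->handlers[ev] != NULL)
            s->handlers[ev](ev);
    }
    return seen;
}
```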


\section{Time}

Guest operating systems need to be aware of the passage of both real
(or wallclock) time and their own `virtual time' (the time for which
they have been executing). Furthermore, Xen has a notion of time which
is used for scheduling. The following notions of time are provided:

\begin{description}
\item[Cycle counter time.]

  This provides a fine-grained time reference.  The cycle counter time
  is used to accurately extrapolate the other time references.  On SMP
  machines it is currently assumed that the cycle counter time is
  synchronized between CPUs.  The current x86-based implementation
  achieves this within inter-CPU communication latencies.

\item[System time.]

  This is a 64-bit counter which holds the number of nanoseconds that
  have elapsed since system boot.

\item[Wall clock time.]

  This is the time of day in a Unix-style {\tt struct timeval}
  (seconds and microseconds since 1 January 1970, adjusted by leap
  seconds).  An NTP client hosted by {\it domain 0} can keep this
  value accurate.

\item[Domain virtual time.]

  This progresses at the same pace as system time, but only while a
  domain is executing --- it stops while a domain is de-scheduled.
  Therefore the share of the CPU that a domain receives is indicated
  by the rate at which its virtual time increases.

\end{description}
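
The relationship between system time and domain virtual time can be
made concrete with a small accounting sketch.  The {\tt domain\_vtime}
structure and the {\tt vtime\_*} functions are hypothetical names
chosen for this example; they only model the rule stated above, that
virtual time advances at the pace of system time while the domain is
running and stops otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct domain_vtime {
    uint64_t virtual_ns;    /* virtual time accumulated so far */
    uint64_t run_start_ns;  /* system time when last scheduled in */
    bool     running;
};

/* The domain starts executing at system time now_ns. */
void vtime_schedule_in(struct domain_vtime *v, uint64_t now_ns)
{
    assert(v != NULL && !v->running);
    v->run_start_ns = now_ns;
    v->running = true;
}

/* The domain is de-scheduled at system time now_ns. */
void vtime_schedule_out(struct domain_vtime *v, uint64_t now_ns)
{
    assert(v != NULL && v->running && now_ns >= v->run_start_ns);
    v->virtual_ns += now_ns - v->run_start_ns;
    v->running = false;
}

/* Current virtual time; frozen while the domain is not running. */
uint64_t vtime_read(const struct domain_vtime *v, uint64_t now_ns)
{
    assert(v != NULL);
    if (!v->running)
        return v->virtual_ns;
    return v->virtual_ns + (now_ns - v->run_start_ns);
}
```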


Xen exports timestamps for system time and wall-clock time to guest
operating systems through a shared page of memory.  Xen also provides
the cycle counter time at the instant the timestamps were calculated,
and the CPU frequency in Hertz.  This allows the guest to extrapolate
system and wall-clock times accurately based on the current cycle
counter time.
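
The extrapolation amounts to adding the elapsed cycles, converted to
nanoseconds, to the exported system time.  The sketch below assumes a
hypothetical {\tt time\_stamp} structure holding the three exported
values; the field names are not taken from Xen.  The cycle delta is
split into whole seconds and a remainder so that the multiplication
does not overflow 64 bits for realistic CPU frequencies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC UINT64_C(1000000000)

/* Hypothetical snapshot of the values published in the shared page. */
struct time_stamp {
    uint64_t system_time_ns;  /* system time when the stamp was taken */
    uint64_t tsc_stamp;       /* cycle counter at that same instant */
    uint64_t cpu_hz;          /* CPU frequency in Hertz */
};

/* Estimate the current system time from the current cycle counter. */
uint64_t extrapolate_system_time(const struct time_stamp *ts,
                                 uint64_t tsc_now)
{
    assert(ts != NULL && ts->cpu_hz > 0);
    uint64_t delta = tsc_now - ts->tsc_stamp;  /* elapsed cycles */
    uint64_t secs  = delta / ts->cpu_hz;       /* whole seconds */
    uint64_t rem   = delta % ts->cpu_hz;       /* leftover cycles */
    return ts->system_time_ns
         + secs * NSEC_PER_SEC
         + rem * NSEC_PER_SEC / ts->cpu_hz;
}
```

Wall-clock time can be extrapolated in the same way by adding the same
elapsed interval to the exported wall-clock timestamp.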

Since all timestamps need to be updated and read \emph{atomically},
two version numbers are also stored in the shared info page. The first
is incremented prior to an update, while the second is incremented
only afterwards. A guest can therefore be sure that it has read a
consistent state by checking that the two version numbers are equal.
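
A guest-side read using the two version numbers might look like the
sketch below.  The {\tt shared\_time} layout and all field and function
names are assumptions for illustration, and the C11 fences stand in for
whatever read barriers a real guest would use.  The reader samples the
second (post-update) counter first and the first (pre-update) counter
last, so the two can only match if no update overlapped the read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the time fields in the shared info page. */
struct shared_time {
    volatile uint32_t version1;        /* incremented before an update */
    volatile uint64_t system_time_ns;
    volatile uint64_t wall_sec;
    volatile uint32_t version2;        /* incremented after an update */
};

struct time_snapshot {
    uint64_t system_time_ns;
    uint64_t wall_sec;
};

/* One read attempt; returns true if the snapshot is consistent. */
bool try_read_time(const struct shared_time *st, struct time_snapshot *out)
{
    assert(st != NULL && out != NULL);
    uint32_t after = st->version2;     /* post-update counter first */
    atomic_thread_fence(memory_order_acquire);
    out->system_time_ns = st->system_time_ns;
    out->wall_sec       = st->wall_sec;
    atomic_thread_fence(memory_order_acquire);
    uint32_t before = st->version1;    /* pre-update counter last */
    return before == after;
}

/* Retry until no update overlaps the read. */
void read_time(const struct shared_time *st, struct time_snapshot *out)
{
    while (!try_read_time(st, out))
        ;  /* an update was in progress: try again */
}
```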

Xen includes a periodic ticker which sends a timer event to the
currently executing domain every 10ms.  The Xen scheduler also sends a
timer event whenever a domain is scheduled; this allows the guest OS
to adjust for the time that has passed while it has been inactive.  In
addition, each domain can request a timer event at a specified system
time by using the {\tt set\_timer\_op()} hypercall.  Guest OSes may use
this timer to implement timeouts when they block.
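
Because the requested time is an absolute system time, a guest that
thinks in relative timeouts has to convert them before blocking.  The
helpers below are a hypothetical sketch of that bookkeeping and do not
call into Xen: {\tt deadline\_from\_timeout()} and
{\tt next\_timer\_deadline()} are names invented for this example, and
the value they produce is what a guest might then pass to
{\tt set\_timer\_op()}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_DEADLINE UINT64_MAX  /* sentinel: nothing to wait for */

/* Convert a relative timeout into an absolute system time,
 * saturating instead of wrapping around on overflow. */
uint64_t deadline_from_timeout(uint64_t now_ns, uint64_t timeout_ns)
{
    if (timeout_ns > UINT64_MAX - now_ns)
        return UINT64_MAX;
    return now_ns + timeout_ns;
}

/* Earliest of the guest's pending deadlines, or NO_DEADLINE. */
uint64_t next_timer_deadline(const uint64_t *deadlines_ns, size_t n)
{
    assert(n == 0 || deadlines_ns != NULL);
    uint64_t earliest = NO_DEADLINE;
    for (size_t i = 0; i < n; i++) {
        if (deadlines_ns[i] < earliest)
            earliest = deadlines_ns[i];
    }
    return earliest;
}
```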



%% % akw: demoting this to a section -- not sure if there is any point
%% % though, maybe just remove it.

\section{Xen CPU Scheduling}

Xen offers a uniform API for CPU schedulers.  One of several schedulers
can be selected at boot, and it should be easy to add more.  The BVT,
Atropos and Round Robin schedulers are part of the normal Xen
distribution.  BVT provides proportional fair shares of the CPU to the
running domains.  Atropos can be used to reserve absolute shares of
the CPU for each domain.  Round Robin is provided as an example of
Xen's internal scheduler API.
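
One common way to express a uniform scheduler interface in C is a
table of function pointers that each policy fills in.  The sketch
below shows that idea with a trivial round-robin policy.  It is not
Xen's internal scheduler API: the {\tt scheduler\_ops} structure, the
{\tt run\_queue} type and every function name here are assumptions made
for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DOMAINS 8  /* assumed queue capacity for this sketch */

struct run_queue {
    int    domains[MAX_DOMAINS];  /* runnable domain ids */
    size_t count;
    size_t next;                  /* round-robin cursor */
};

/* Hypothetical scheduler interface: each policy supplies these hooks. */
struct scheduler_ops {
    const char *name;
    void (*add_domain)(struct run_queue *rq, int domid);
    int  (*pick_next)(struct run_queue *rq);  /* -1 means idle */
};

void rr_add_domain(struct run_queue *rq, int domid)
{
    assert(rq != NULL && rq->count < MAX_DOMAINS);
    rq->domains[rq->count++] = domid;
}

/* Hand out the CPU to each runnable domain in turn. */
int rr_pick_next(struct run_queue *rq)
{
    assert(rq != NULL);
    if (rq->count == 0)
        return -1;
    int domid = rq->domains[rq->next];
    rq->next = (rq->next + 1) % rq->count;
    return domid;
}

const struct scheduler_ops round_robin_ops = {
    .name       = "round-robin",
    .add_domain = rr_add_domain,
    .pick_next  = rr_pick_next,
};
```

With this shape, selecting a scheduler at boot reduces to choosing
which {\tt scheduler\_ops} instance the rest of the system calls
through.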

\paragraph*{Note: SMP host support}
Xen has always supported SMP host systems.  Domains are statically
assigned to CPUs, either at creation time or later by manually pinning
them to a particular CPU.  The current schedulers then run locally on
each CPU to decide which of the domains assigned to it should run
there. The user-level control software can be used to perform
coarse-grain load-balancing between CPUs.
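
As one possible coarse-grain policy, control software could place each
new domain on the CPU that currently has the fewest domains assigned.
The function below is a hypothetical sketch of that choice only; the
name {\tt least\_loaded\_cpu()} and the per-CPU count array are
assumptions, and the actual pinning would be done through the control
interface.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the CPU with the fewest assigned domains.
 * Ties are resolved in favour of the lowest CPU index. */
size_t least_loaded_cpu(const unsigned int *domains_per_cpu, size_t ncpus)
{
    assert(domains_per_cpu != NULL && ncpus > 0);
    size_t best = 0;
    for (size_t cpu = 1; cpu < ncpus; cpu++) {
        if (domains_per_cpu[cpu] < domains_per_cpu[best])
            best = cpu;
    }
    return best;
}
```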


%% More information on the characteristics and use of these schedulers
%% is available in {\tt Sched-HOWTO.txt}.


\section{Privileged operations}

Xen exports an extended interface to privileged domains (viz.\ {\it
  domain 0}). This allows such domains to build and boot other domains
on the server, and provides control interfaces for managing
scheduling, memory, networking, and block devices.