author | mwilli2@equilibrium.research.intel-research.net <mwilli2@equilibrium.research.intel-research.net> | 2004-03-17 18:31:06 +0000 |
---|---|---|
committer | mwilli2@equilibrium.research.intel-research.net <mwilli2@equilibrium.research.intel-research.net> | 2004-03-17 18:31:06 +0000 |
commit | bee5b0bb130f42dabd8cbdcd035d8f737e725dbc (patch) | |
tree | 99ac0cc05ceea17ead1d618190f88dfa33ea7f86 /docs/interface.tex | |
parent | 8306baac6f817aea60eb6e7acfac96cbb007ed5a (diff) | |
bitkeeper revision 1.808 (4058996anVCLQRr3o_Adf9GqJybYSg)
Various updates related to the new generic scheduler API.
The BVT scheduler has been ported to this API and a simple Round Robin
scheduler has been added. There's a new generic control interface for
setting scheduling parameters from userspace.
Use the sched=xxx option at boot time to choose the scheduler. Default
is BVT. The possibilities are "bvt" and "rrobin".
Diffstat (limited to 'docs/interface.tex')
-rw-r--r-- | docs/interface.tex | 303 |
1 files changed, 302 insertions, 1 deletions
diff --git a/docs/interface.tex b/docs/interface.tex
index 2736a0412d..84003de1b6 100644
--- a/docs/interface.tex
+++ b/docs/interface.tex
@@ -353,7 +353,7 @@ create ``virtual disks'' on demand.
 \subsection{Virtual Disk Management}
 The VD management code consists of a set of python libraries. It can therefore
 be accessed by custom scripts as well as the convenience scripts provided. The
-VD database is a SQLite database in /var/db/xen\_vdisk.sqlite.
+VD database is a SQLite database in /var/db/xen\_vdisks.sqlite.
 The VD scripts and general VD usage are documented in the VBD-HOWTO.txt.
@@ -379,6 +379,307 @@ giving the page back to the hypervisor, or to use them for storing page tables.
 and providing control interfaces for managing scheduling, networking, and blocks.
+\chapter{CPU Scheduler}
+
+Xen offers a uniform API for CPU schedulers. It is possible to choose
+from a number of schedulers at boot and it should be easy to add more.
+
+\paragraph*{Note: SMP host support}
+Xen has always supported SMP host systems. Domains are statically assigned to
+CPUs, either at creation time or when manually pinning to a particular CPU.
+The current schedulers then run locally on each CPU to decide which of the
+assigned domains should be run there.
+
+\section{Standard Schedulers}
+
+These BVT and Round Robin schedulers are part of the normal Xen
+distribution. A port of the Atropos scheduler from the Nemesis
+operating system is almost complete and will be added shortly.
+
+\subsection{Borrowed Virtual Time (BVT)}
+
+This was the original Xen scheduler. BVT is designed for general-purpose
+environments but also provides support for latency-sensitive threads. It
+provides long-term weighted sharing but allows tasks a limited ability to
+``warp back'' in virtual time so that they are dispatched earlier.
+
+BVT can be activated by specifying {\tt sched=bvt} as a boot argument to Xen.
+
+\subsection{Round Robin}
+
+The round robin scheduler is a very simple example of some of the basic parts
+of the scheduler API.
+
+Round robin can be activated by specifying {\tt sched=rrobin} as a boot
+argument to Xen.
+
+\section{Scheduling API}
+
+The scheduling API is used by both the schedulers described above and should
+also be used by any new schedulers. It provides a generic interface and also
+implements much of the ``boilerplate'' code.
+
+\paragraph*{Note:} the scheduler API is currently undergoing active development,
+so there may be some changes to this API, although they are expected to be small.
+
+Schedulers conforming to this API are described by the following
+structure:
+
+\begin{verbatim}
+struct scheduler
+{
+    char *name;             /* full name for this scheduler      */
+    char *opt_name;         /* option name for this scheduler    */
+    unsigned int sched_id;  /* ID for this scheduler             */
+
+    int          (*init_scheduler) ();
+    int          (*alloc_task)     (struct task_struct *);
+    void         (*add_task)       (struct task_struct *);
+    void         (*free_task)      (struct task_struct *);
+    void         (*rem_task)       (struct task_struct *);
+    void         (*wake_up)        (struct task_struct *);
+    long         (*do_block)       (struct task_struct *);
+    task_slice_t (*do_schedule)    (s_time_t);
+    int          (*control)        (struct sched_ctl_cmd *);
+    int          (*adjdom)         (struct task_struct *,
+                                    struct sched_adjdom_cmd *);
+    s32          (*reschedule)     (struct task_struct *);
+    void         (*dump_settings)  (void);
+    void         (*dump_cpu_state) (int);
+    void         (*dump_runq_el)   (struct task_struct *);
+};
+\end{verbatim}
+
+The only method that {\em must} be implemented is
+{\tt do\_schedule()}. However, if there is not some implementation for the
+{\tt wake\_up()} method then waking tasks will not get put on the runqueue!
+
+The fields of the above structure are described in more detail below.
+
+\subsubsection{name}
+
+The name field is an arbitrary descriptive ASCII string.
+
+\subsubsection{opt\_name}
+
+This field is the value of the {\tt sched=} boot-time option that will select
+this scheduler.
+
+\subsubsection{sched\_id}
+
+This is an integer that uniquely identifies this scheduler. There should be a
+macro corresponding to this scheduler ID in {\tt <hypervisor-ifs/sched-if.h>}.
+
+\subsubsection{init\_scheduler}
+
+\paragraph*{Purpose}
+
+This is a function for performing any scheduler-specific initialisation. For
+instance, it might allocate memory for per-CPU scheduler data and initialise it
+appropriately.
+
+\paragraph*{Call environment}
+
+This function is called after the initialisation performed by the generic
+layer. The function is called exactly once, for the scheduler that has been
+selected.
+
+\paragraph*{Return values}
+
+This should return negative on failure --- failure to initialise the scheduler
+will cause an immediate panic.
+
+\subsubsection{alloc\_task}
+
+\paragraph*{Purpose}
+This is called when a {\tt task\_struct} is allocated by the generic scheduler
+layer. A particular scheduler implementation may use this method to allocate
+per-task data for this task. It may use the {\tt sched\_priv} pointer in the
+{\tt task\_struct} to point to this data.
+
+\paragraph*{Call environment}
+The generic layer guarantees that the {\tt sched\_priv} field will
+remain intact from the time this method is called until the task is
+deallocated (so long as the scheduler implementation does not change
+it!).
+
+\paragraph*{Return values}
+Negative on failure.
+
+\subsubsection{add\_task}
+
+\paragraph*{Purpose}
+
+Called when a task is initially added by the generic layer.
+
+\paragraph*{Call environment}
+
+The fields in the {\tt task\_struct} are now filled out and available for use.
+Schedulers should implement appropriate initialisation of any per-task private
+information in this method.
+
+\subsubsection{free\_task}
+
+\paragraph*{Purpose}
+
+Schedulers should free the space used by any associated private data
+structures.
+
+\paragraph*{Call environment}
+
+This is called when a {\tt task\_struct} is about to be deallocated.
+The generic layer will have done generic task removal operations and
+(if implemented) called the scheduler's {\tt rem\_task} method before
+this method is called.
+
+\subsubsection{rem\_task}
+
+\paragraph*{Purpose}
+
+This is called when a task is being removed from scheduling.
+
+\subsubsection{wake\_up}
+
+\paragraph*{Purpose}
+
+Called when a task is woken up, this method should put the task on the runqueue
+(or do the scheduler-specific equivalent action).
+
+\paragraph*{Call environment}
+
+The generic layer guarantees that the task is already in state
+RUNNING.
+
+\subsubsection{do\_block}
+
+\paragraph*{Purpose}
+
+This function is called when a task is blocked. This function should
+not remove the task from the runqueue.
+
+\paragraph*{Call environment}
+
+The EVENTS\_MASTER\_ENABLE\_BIT is already set and the task state changed to
+TASK\_INTERRUPTIBLE on entry to this method.
+
+\subsubsection{do\_schedule}
+
+This method must be implemented.
+
+\paragraph*{Purpose}
+
+The method is called each time a new task must be chosen for scheduling on the
+current CPU. The current time is passed as the single argument (the current
+task can be found using the {\tt current} variable).
+
+This method should select the next task to run on this CPU and set its minimum
+time to run as well as returning the data described below.
+
+This method should also take the appropriate action if the previous
+task has blocked, e.g. removing it from the runqueue.
+
+\paragraph*{Call environment}
+
+The other fields in the {\tt task\_struct} are updated by the generic layer,
+which also performs all Xen-specific tasks and performs the actual task switch
+(unless the previous task has been chosen again).
+
+This method is called with the {\tt schedule\_lock} held for the current CPU
+and with interrupts disabled.
+
+\paragraph*{Return values}
+
+Must return a {\tt struct task\_slice} describing what task to run and how long
+for (at maximum).
+
+\subsubsection{control}
+
+\paragraph*{Purpose}
+
+This method is called for global scheduler control operations. It takes a
+pointer to a {\tt struct sched\_ctl\_cmd}, from which it should select the
+appropriate command data.
+
+\paragraph*{Call environment}
+
+The generic layer guarantees that when this method is called, the caller was
+using the same control interface version and that the caller selected the
+correct scheduler ID, hence the scheduler's implementation does not need to
+sanity-check these parts of the call.
+
+\paragraph*{Return values}
+
+This function should return the value to be passed back to user space, hence it
+should either be 0 or an appropriate errno value.
+
+\subsubsection{adjdom}
+
+\paragraph*{Purpose}
+
+This method is called to adjust the scheduling parameters of a particular
+domain.
+
+\paragraph*{Call environment}
+
+The generic layer guarantees that the caller has specified the correct
+control interface version and scheduler ID and that the supplied {\tt
+task\_struct} will not be deallocated during the call (hence it is not
+necessary to {\tt get\_task\_struct}).
+
+\paragraph*{Return values}
+
+This function should return the value to be passed back to user space, hence it
+should either be 0 or an appropriate errno value.
+
+\subsubsection{reschedule}
+
+\paragraph*{Purpose}
+
+This method is called to determine if a reschedule is required as a result of a
+particular task.
+
+\paragraph*{Call environment}
+The generic layer will cause a reschedule if the current domain is the idle
+task or it has exceeded its minimum time slice before a reschedule. The
+generic layer guarantees that the task passed is not currently running but is
+on the runqueue.
+
+\paragraph*{Return values}
+
+Should return a mask of CPUs to cause a reschedule on.
+
+\subsubsection{dump\_settings}
+
+\paragraph*{Purpose}
+
+If implemented, this should dump any private global settings for this
+scheduler to the console.
+
+\paragraph*{Call environment}
+
+This function is called with interrupts enabled.
+
+\subsubsection{dump\_cpu\_state}
+
+\paragraph*{Purpose}
+
+This method should dump any private settings for the specified CPU.
+
+\paragraph*{Call environment}
+
+This function is called with interrupts disabled and the {\tt schedule\_lock}
+for the specified CPU held.
+
+\subsubsection{dump\_runq\_el}
+
+\paragraph*{Purpose}
+
+This method should dump any private settings for the specified task.
+
+\paragraph*{Call environment}
+
+This function is called with interrupts disabled and the {\tt schedule\_lock}
+for the task's CPU held.
 \chapter{Debugging}
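
Illustrative sketch (not part of this commit): the patch says that do_schedule is the only mandatory hook, that wake_up should put a task on the runqueue, and that do_block should leave it there for do_schedule to remove. The user-space C fragment below shows how a round-robin policy could be written against that contract. It is a sketch under stated assumptions, not the Xen implementation: the stand-in definitions of struct task_struct, task_slice_t and s_time_t, the reduced struct scheduler, the rr_* names, the sched_id value and the 10 ms slice are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types: the real Xen definitions are NOT shown in the patch. */
typedef long long s_time_t;

struct task_struct {
    int id;
    int blocked;   /* set by do_block, cleared by wake_up      */
    int on_runq;   /* non-zero while the task is on the runqueue */
};

typedef struct task_slice {
    struct task_struct *task;  /* task to run next              */
    s_time_t time;             /* maximum time to run it for    */
} task_slice_t;

#define RR_MAX_TASKS 8
#define RR_SLICE     10000000LL   /* hypothetical 10 ms, in ns  */

static struct task_struct *rr_runq[RR_MAX_TASKS];  /* circular FIFO */
static size_t rr_head, rr_len;
static struct task_struct rr_idle = { -1, 0, 0 };

/* wake_up: put the task on the tail of the runqueue (once). */
static void rr_wake_up(struct task_struct *p)
{
    p->blocked = 0;
    if (p->on_runq)
        return;
    assert(rr_len < RR_MAX_TASKS);
    rr_runq[(rr_head + rr_len) % RR_MAX_TASKS] = p;
    rr_len++;
    p->on_runq = 1;
}

/* do_block: only record the state; the task stays on the runqueue. */
static long rr_do_block(struct task_struct *p)
{
    p->blocked = 1;
    return 0;
}

/* do_schedule: drop blocked tasks, rotate the first runnable one. */
static task_slice_t rr_do_schedule(s_time_t now)
{
    task_slice_t ret = { &rr_idle, RR_SLICE };
    (void)now;  /* a plain round robin does not need the current time */

    while (rr_len > 0) {
        struct task_struct *p = rr_runq[rr_head];
        rr_head = (rr_head + 1) % RR_MAX_TASKS;
        rr_len--;
        p->on_runq = 0;
        if (p->blocked)
            continue;      /* blocked: removed here, not in do_block */
        rr_wake_up(p);     /* runnable: re-queue at the tail         */
        ret.task = p;
        break;
    }
    return ret;
}

/* Reduced version of the structure in the patch: only the hooks used here. */
struct scheduler {
    const char *name;
    const char *opt_name;
    unsigned int sched_id;
    void         (*wake_up)     (struct task_struct *);
    long         (*do_block)    (struct task_struct *);
    task_slice_t (*do_schedule) (s_time_t);
};

const struct scheduler sched_rr_sketch = {
    "Round-Robin sketch", "rrobin", 1 /* hypothetical ID */,
    rr_wake_up, rr_do_block, rr_do_schedule
};
```

With two tasks woken in order, successive rr_do_schedule calls alternate between them. Once one has been passed to rr_do_block, the next call drops it from the queue and returns the other, and the idle task is returned when nothing is runnable.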