bitkeeper revision 1.808 (4058996anVCLQRr3o_Adf9GqJybYSg)

Various updates related to the new generic scheduler API. The BVT scheduler has been ported to this API and a simple Round Robin scheduler has been added. There's a new generic control interface for setting scheduling parameters from userspace. Use the sched=xxx option at boot time to choose the scheduler. Default is BVT. The possibilities are "bvt" and "rrobin".
author: mwilli2@equilibrium.research.intel-research.net <mwilli2@equilibrium.research.intel-research.net> 2004-03-17 18:31:06 +0000
committer: mwilli2@equilibrium.research.intel-research.net <mwilli2@equilibrium.research.intel-research.net> 2004-03-17 18:31:06 +0000
commit: bee5b0bb130f42dabd8cbdcd035d8f737e725dbc (patch)
tree: 99ac0cc05ceea17ead1d618190f88dfa33ea7f86 /docs/interface.tex
parent: 8306baac6f817aea60eb6e7acfac96cbb007ed5a (diff)
download: xen-bee5b0bb130f42dabd8cbdcd035d8f737e725dbc.tar.gz
xen-bee5b0bb130f42dabd8cbdcd035d8f737e725dbc.tar.bz2
xen-bee5b0bb130f42dabd8cbdcd035d8f737e725dbc.zip
1 files changed, 302 insertions, 1 deletions
diff --git a/docs/interface.tex b/docs/interface.tex
index 2736a0412d..84003de1b6 100644
--- a/docs/interface.tex
+++ b/docs/interface.tex
@@ -353,7 +353,7 @@ create ``virtual disks'' on demand.
 \subsection{Virtual Disk Management}
 The VD management code consists of a set of python libraries. It can therefore
 be accessed by custom scripts as well as the convenience scripts provided. The
-VD database is a SQLite database in /var/db/xen\_vdisk.sqlite.
+VD database is a SQLite database in /var/db/xen\_vdisks.sqlite.
 
 The VD scripts and general VD usage are documented in the VBD-HOWTO.txt.
 
@@ -379,6 +379,307 @@ giving the page back to the hypervisor, or to use them for storing page tables.
 and providing control interfaces for managing scheduling, networking, and
 blocks.
 
+\chapter{CPU Scheduler}
+
+Xen offers a uniform API for CPU schedulers.  It is possible to choose
+from a number of schedulers at boot and it should be easy to add more.
+
+\paragraph*{Note: SMP host support}
+Xen has always supported SMP host systems.  Domains are statically assigned to
+CPUs, either at creation time or when manually pinned to a particular CPU.
+The current schedulers then run locally on each CPU to decide which of the
+assigned domains should be run there.
+
+\section{Standard Schedulers}
+
+The BVT and Round Robin schedulers described below are part of the standard Xen
+distribution.  A port of the Atropos scheduler from the Nemesis
+operating system is almost complete and will be added shortly.
+
+\subsection{Borrowed Virtual Time (BVT)}
+
+This was the original Xen scheduler.  BVT is designed for general-purpose
+environments but also provides support for latency-sensitive threads.  It
+provides long-term weighted sharing but allows tasks a limited ability to
+``warp back'' in virtual time so that they are dispatched earlier.
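+To illustrate the warp mechanism, here is a simplified sketch of the dispatch
+rule from the original BVT design (not Xen's implementation): each task
+accumulates an actual virtual time $A$; while a task is warped its effective
+virtual time is $E = A - W$, where $W$ is its warp value, and otherwise $E = A$.
+The runnable task with the smallest $E$ is dispatched.  The {\tt bvt\_task}
+structure and both functions are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-task BVT state; not Xen's actual data structure. */
struct bvt_task {
    const char *name;
    long avt;      /* actual virtual time A */
    long warp;     /* warp value W */
    int  warping;  /* non-zero while the task is warped back */
};

/* Effective virtual time: E = A - (warping ? W : 0). */
long bvt_evt(const struct bvt_task *t)
{
    return t->avt - (t->warping ? t->warp : 0);
}

/* Dispatch the runnable task with the smallest effective virtual time. */
struct bvt_task *bvt_pick(struct bvt_task *tasks, size_t n)
{
    struct bvt_task *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (best == NULL || bvt_evt(&tasks[i]) < bvt_evt(best))
            best = &tasks[i];
    return best;
}
```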
+
+BVT can be activated by specifying {\tt sched=bvt} as a boot argument to Xen.
+
+\subsection{Round Robin}
+
+The Round Robin scheduler is deliberately simple; it serves as an example of
+the basic parts of the scheduler API.
+
+Round robin can be activated by specifying {\tt sched=rrobin} as a boot
+argument to Xen.
+
+\section{Scheduling API}
+
+The scheduling API is used by both of the schedulers described above and should
+also be used by any new schedulers.  It provides a generic interface and also
+implements much of the ``boilerplate'' code.
+
+\paragraph*{Note:} the scheduler API is currently undergoing active development,
+so there may be some changes to this API, although they are expected to be small.
+
+Schedulers conforming to this API are described by the following
+structure:
+
+\begin{verbatim}
+struct scheduler
+{
+    char *name;             /* full name for this scheduler      */
+    char *opt_name;         /* option name for this scheduler    */
+    unsigned int sched_id;  /* ID for this scheduler             */
+
+    int          (*init_scheduler) ();
+    int          (*alloc_task)     (struct task_struct *);
+    void         (*add_task)       (struct task_struct *);
+    void         (*free_task)      (struct task_struct *);
+    void         (*rem_task)       (struct task_struct *);
+    void         (*wake_up)        (struct task_struct *);
+    long         (*do_block)       (struct task_struct *);
+    task_slice_t (*do_schedule)    (s_time_t);
+    int          (*control)        (struct sched_ctl_cmd *);
+    int          (*adjdom)         (struct task_struct *,
+                                    struct sched_adjdom_cmd *);
+    s32          (*reschedule)     (struct task_struct *);
+    void         (*dump_settings)  (void);
+    void         (*dump_cpu_state) (int);
+    void         (*dump_runq_el)   (struct task_struct *);
+};
+\end{verbatim}
+
+The only method that {\em must} be implemented is
+{\tt do\_schedule()}.  However, if {\tt wake\_up()} is not implemented,
+woken tasks will never be put on the runqueue!
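+As a rough sketch of how a scheduler plugs into this structure, the fragment
+below fills in only {\tt wake\_up()} and {\tt do\_schedule()} using C99
+designated initialisers, so every other method is left as {\tt NULL}.  The
+reduced {\tt task\_struct}, the {\tt task\_slice\_t} layout (a task plus a
+maximum run time), the single global FIFO runqueue and the name
+{\tt sched\_mini\_def} are all assumptions made for illustration; blocking is
+ignored for brevity.

```c
#include <stddef.h>
#include <stdint.h>

typedef int64_t s_time_t;   /* assumed: signed 64-bit time in ns */
typedef int32_t s32;

/* Reduced, hypothetical task_struct: just enough for this sketch. */
struct task_struct { int domain; struct task_struct *next; };
struct sched_ctl_cmd;       /* left opaque here */
struct sched_adjdom_cmd;

/* Assumed layout: which task to run and for how long (at most). */
typedef struct task_slice { struct task_struct *task; s_time_t time; } task_slice_t;

struct scheduler {
    char *name;
    char *opt_name;
    unsigned int sched_id;
    int          (*init_scheduler) (void);
    int          (*alloc_task)     (struct task_struct *);
    void         (*add_task)       (struct task_struct *);
    void         (*free_task)      (struct task_struct *);
    void         (*rem_task)       (struct task_struct *);
    void         (*wake_up)        (struct task_struct *);
    long         (*do_block)       (struct task_struct *);
    task_slice_t (*do_schedule)    (s_time_t);
    int          (*control)        (struct sched_ctl_cmd *);
    int          (*adjdom)         (struct task_struct *,
                                    struct sched_adjdom_cmd *);
    s32          (*reschedule)     (struct task_struct *);
    void         (*dump_settings)  (void);
    void         (*dump_cpu_state) (int);
    void         (*dump_runq_el)   (struct task_struct *);
};

static struct task_struct idle_task = { -1, NULL };
static struct task_struct *runq_head, *runq_tail;   /* single FIFO runqueue */

/* Append a woken task to the tail of the runqueue. */
static void mini_wake_up(struct task_struct *p)
{
    p->next = NULL;
    if (runq_tail != NULL) runq_tail->next = p; else runq_head = p;
    runq_tail = p;
}

/* Rotate the head task to the back and run it; idle if the queue is empty. */
static task_slice_t mini_do_schedule(s_time_t now)
{
    task_slice_t ret = { &idle_task, 10000000 };   /* 10 ms, arbitrary */
    (void)now;
    if (runq_head != NULL) {
        struct task_struct *p = runq_head;
        runq_head = p->next;
        if (runq_head == NULL) runq_tail = NULL;
        mini_wake_up(p);
        ret.task = p;
    }
    return ret;
}

struct scheduler sched_mini_def = {
    .name        = "Minimal Example Scheduler",
    .opt_name    = "mini",
    .sched_id    = 99,          /* hypothetical ID */
    .wake_up     = mini_wake_up,
    .do_schedule = mini_do_schedule,
};
```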
+
+The fields of the above structure are described in more detail below.
+
+\subsubsection{name}
+
+The name field is an arbitrary descriptive ASCII string.
+
+\subsubsection{opt\_name}
+
+This field is the value of the {\tt sched=} boot-time option that will select
+this scheduler.
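+The sketch below shows one plausible way a generic layer could use
+{\tt opt\_name} to pick a scheduler from the {\tt sched=} boot option.  The
+table, the helper {\tt find\_scheduler()} and the ID values are hypothetical;
+only the {\tt name}, {\tt opt\_name} and {\tt sched\_id} fields come from the
+structure above.

```c
#include <stddef.h>
#include <string.h>

/* Reduced, hypothetical view of struct scheduler: identifying fields only. */
struct scheduler {
    char *name;
    char *opt_name;
    unsigned int sched_id;
};

/* Hypothetical table of compiled-in schedulers; IDs are illustrative only. */
static struct scheduler sched_bvt_def    = { "Borrowed Virtual Time", "bvt",    0 };
static struct scheduler sched_rrobin_def = { "Round-Robin Scheduler",  "rrobin", 1 };
static struct scheduler *schedulers[] = { &sched_bvt_def, &sched_rrobin_def, NULL };

/* Return the scheduler whose opt_name matches the sched= value, or NULL. */
struct scheduler *find_scheduler(const char *opt)
{
    for (int i = 0; schedulers[i] != NULL; i++)
        if (strcmp(schedulers[i]->opt_name, opt) == 0)
            return schedulers[i];
    return NULL;
}
```

+If no entry matches, a caller could fall back to the default scheduler, BVT.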
+
+\subsubsection{sched\_id}
+
+This is an integer that uniquely identifies this scheduler.  There should be a
+macro corresponding to this scheduler ID in {\tt <hypervisor-ifs/sched-if.h>}.
+
+\subsubsection{init\_scheduler}
+
+\paragraph*{Purpose}
+
+This is a function for performing any scheduler-specific initialisation.  For
+instance, it might allocate memory for per-CPU scheduler data and initialise it
+appropriately.
+
+\paragraph*{Call environment}
+
+This function is called after the initialisation performed by the generic
+layer.  The function is called exactly once, for the scheduler that has been
+selected.
+
+\paragraph*{Return values}
+
+This should return a negative value on failure; failure to initialise the
+scheduler will cause an immediate panic.
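+A minimal sketch of an {\tt init\_scheduler()} implementation that allocates
+zeroed per-CPU data and reports failure with a negative return value.
+{\tt NR\_CPUS}, {\tt example\_cpu\_info} and the use of the C library's
+{\tt calloc()} (standing in for the hypervisor's own allocator) are
+assumptions.

```c
#include <stdlib.h>

#define NR_CPUS 4   /* assumed CPU count for this sketch */

/* Hypothetical per-CPU private state for an example scheduler. */
struct example_cpu_info {
    unsigned long nr_switches;
    void *runq;
};

struct example_cpu_info *cpu_info[NR_CPUS];

/* Allocate and zero per-CPU data; return negative on failure. */
int example_init_scheduler(void)
{
    for (int i = 0; i < NR_CPUS; i++) {
        cpu_info[i] = calloc(1, sizeof(*cpu_info[i]));
        if (cpu_info[i] == NULL) {
            /* Undo the partial allocation before reporting failure. */
            while (--i >= 0) {
                free(cpu_info[i]);
                cpu_info[i] = NULL;
            }
            return -1;
        }
    }
    return 0;
}
```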
+
+\subsubsection{alloc\_task}
+
+\paragraph*{Purpose}
+This is called when a {\tt task\_struct} is allocated by the generic scheduler
+layer.  A particular scheduler implementation may use this method to allocate
+per-task data for this task.  It may use the {\tt sched\_priv} pointer in the
+{\tt task\_struct} to point to this data.
+
+\paragraph*{Call environment}
+The generic layer guarantees that the {\tt sched\_priv} field will
+remain intact from the time this method is called until the task is
+deallocated (so long as the scheduler implementation does not change
+it!).
+
+\paragraph*{Return values}
+Negative on failure.
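+A sketch of {\tt alloc\_task()} that allocates a per-task structure and hangs
+it off {\tt sched\_priv}.  The reduced {\tt task\_struct}, the
+{\tt example\_task\_info} type and the use of {\tt malloc()} are hypothetical;
+the fields themselves are left for {\tt add\_task()} to initialise.

```c
#include <stdlib.h>

struct task_struct { int domain; void *sched_priv; };   /* reduced, hypothetical */

struct example_task_info { unsigned long weight; long long runtime_ns; };

int example_alloc_task(struct task_struct *p)
{
    struct example_task_info *inf = malloc(sizeof(*inf));
    if (inf == NULL)
        return -1;              /* negative on failure */
    p->sched_priv = inf;        /* kept intact by the generic layer */
    return 0;
}
```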
+
+\subsubsection{add\_task}
+
+\paragraph*{Purpose}
+
+Called when a task is initially added by the generic layer.
+
+\paragraph*{Call environment}
+
+The fields in the {\tt task\_struct} are now filled out and available for use.
+Schedulers should implement appropriate initialisation of any per-task private
+information in this method.
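+A sketch of {\tt add\_task()} initialising per-task private data that is
+assumed to have been allocated in {\tt alloc\_task()} and stored in
+{\tt sched\_priv}.  The structure layout and the default weight are
+hypothetical.

```c
#include <stddef.h>

struct task_struct { int domain; void *sched_priv; };   /* reduced, hypothetical */

struct example_task_info {
    struct task_struct *owner;
    unsigned long weight;
    long long runtime_ns;
    struct example_task_info *next;     /* runqueue link */
};

#define EXAMPLE_DEFAULT_WEIGHT 10UL     /* hypothetical default share */

void example_add_task(struct task_struct *p)
{
    struct example_task_info *inf = p->sched_priv;   /* set up by alloc_task() */
    inf->owner      = p;
    inf->weight     = EXAMPLE_DEFAULT_WEIGHT;
    inf->runtime_ns = 0;
    inf->next       = NULL;   /* not queued until wake_up() */
}
```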
+
+\subsubsection{free\_task}
+
+\paragraph*{Purpose}
+
+Schedulers should free the space used by any associated private data
+structures.
+
+\paragraph*{Call environment}
+
+This is called when a {\tt task\_struct} is about to be deallocated.
+The generic layer will have done generic task removal operations and
+(if implemented) called the scheduler's {\tt rem\_task} method before
+this method is called.
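+A minimal {\tt free\_task()} sketch, assuming the private data was allocated
+with {\tt malloc()} and stored in {\tt sched\_priv} (both assumptions).

```c
#include <stdlib.h>

struct task_struct { int domain; void *sched_priv; };   /* reduced, hypothetical */

void example_free_task(struct task_struct *p)
{
    free(p->sched_priv);      /* free(NULL) is a harmless no-op */
    p->sched_priv = NULL;     /* defensive: avoid leaving a dangling pointer */
}
```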
+
+\subsubsection{rem\_task}
+
+\paragraph*{Purpose}
+
+This is called when a task is being removed from scheduling.
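+One possible {\tt rem\_task()}, sketched with a hypothetical intrusive,
+circular, doubly linked runqueue (similar in spirit to Linux's
+{\tt list\_head}, but defined locally here): the task is unlinked if it is
+queued, and removing an unqueued task is a no-op.

```c
#include <stddef.h>

/* Minimal local doubly linked list; NULL links mean "not queued". */
struct list_head { struct list_head *next, *prev; };

struct example_task_info { int domain; struct list_head run_list; };
struct task_struct { void *sched_priv; };   /* reduced, hypothetical */

void list_init(struct list_head *h) { h->next = h->prev = h; }

void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

int on_runqueue(const struct list_head *n) { return n->next != NULL; }

void example_rem_task(struct task_struct *p)
{
    struct list_head *n = &((struct example_task_info *)p->sched_priv)->run_list;
    if (!on_runqueue(n))
        return;                   /* already off the runqueue */
    n->prev->next = n->next;      /* unlink */
    n->next->prev = n->prev;
    n->next = n->prev = NULL;     /* mark as not queued */
}
```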
+
+\subsubsection{wake\_up}
+
+\paragraph*{Purpose}
+
+Called when a task is woken up, this method should put the task on the runqueue
+(or do the scheduler-specific equivalent action).
+
+\paragraph*{Call environment}
+
+The generic layer guarantees that the task is already in state
+TASK\_RUNNING.
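+A sketch of {\tt wake\_up()} that appends the task to a single, hypothetical
+global runqueue unless it is already queued, so a spurious second wake-up is
+harmless.  The list type, {\tt example\_task\_info} and the {\tt runqueue}
+variable are assumptions; since schedulers run locally on each CPU, an
+SMP-aware version would likely keep one runqueue per CPU.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };
struct example_task_info { int domain; struct list_head run_list; };
struct task_struct { void *sched_priv; };   /* reduced, hypothetical */

/* Hypothetical global runqueue: an empty circular list points at itself. */
struct list_head runqueue = { &runqueue, &runqueue };

void example_wake_up(struct task_struct *p)
{
    struct list_head *n = &((struct example_task_info *)p->sched_priv)->run_list;
    if (n->next != NULL)
        return;                    /* already queued: nothing to do */
    n->prev = runqueue.prev;       /* append at the tail */
    n->next = &runqueue;
    runqueue.prev->next = n;
    runqueue.prev = n;
}
```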
+
+\subsubsection{do\_block}
+
+\paragraph*{Purpose}
+
+This function is called when a task is blocked.  It should not remove the
+task from the runqueue; that is left to {\tt do\_schedule()}.
+
+\paragraph*{Call environment}
+
+On entry to this method, the EVENTS\_MASTER\_ENABLE\_BIT is already set and
+the task state has been changed to TASK\_INTERRUPTIBLE.
+
+\subsubsection{do\_schedule}
+
+This method must be implemented.
+
+\paragraph*{Purpose}
+
+The method is called each time a new task must be chosen for scheduling on the
+current CPU.  The current time is passed as the single argument (the current
+task can be found using the {\tt current} variable).
+
+This method should select the next task to run on this CPU and set its minimum
+time to run, as well as returning the data described below.
+
+This method should also take the appropriate action if the previous
+task has blocked, e.g. removing it from the runqueue.
+
+\paragraph*{Call environment}
+
+The other fields in the {\tt task\_struct} are updated by the generic layer,
+which also performs all Xen-specific work and the actual task switch (unless
+the previous task has been chosen again).
+
+This method is called with the {\tt schedule\_lock} held for the current CPU
+and with interrupts disabled.
+
+\paragraph*{Return values}
+
+Must return a {\tt struct task\_slice} describing which task to run and for how
+long (at most).
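+The following round-robin {\tt do\_schedule()} is a sketch under several
+assumptions: the running task stays at the head of a hypothetical singly
+linked runqueue, task states use illustrative {\tt TASK\_RUNNING} and
+{\tt TASK\_INTERRUPTIBLE} values, and a global {\tt current} stands in for
+Xen's.  It rotates a still-runnable previous task to the back of the queue,
+leaves a blocked one off the queue, and falls back to the idle task when
+nothing is runnable.

```c
#include <stddef.h>
#include <stdint.h>

typedef int64_t s_time_t;

#define TASK_RUNNING        0   /* illustrative values */
#define TASK_INTERRUPTIBLE  1
#define RR_SLICE_NS 10000000LL  /* hypothetical 10 ms quantum */

struct task_struct {            /* reduced, hypothetical */
    int domain;
    int state;
    struct task_struct *next;   /* runqueue link */
};

typedef struct task_slice { struct task_struct *task; s_time_t time; } task_slice_t;

struct task_struct idle_task = { -1, TASK_RUNNING, NULL };
struct task_struct *current = &idle_task;   /* stands in for Xen's current */
struct task_struct *rq_head, *rq_tail;

void rq_append(struct task_struct *p)
{
    p->next = NULL;
    if (rq_tail != NULL) rq_tail->next = p; else rq_head = p;
    rq_tail = p;
}

task_slice_t rr_do_schedule(s_time_t now)
{
    struct task_struct *prev = current;
    task_slice_t ret;
    (void)now;  /* a pure round robin ignores the current time */

    /* In this sketch the running task sits at the head of the runqueue. */
    if (prev != &idle_task && rq_head == prev) {
        rq_head = prev->next;               /* take prev off the head */
        if (rq_head == NULL) rq_tail = NULL;
        if (prev->state == TASK_RUNNING)
            rq_append(prev);                /* still runnable: rotate to the back */
        /* otherwise prev has blocked and stays off the runqueue */
    }

    ret.task = (rq_head != NULL) ? rq_head : &idle_task;
    ret.time = RR_SLICE_NS;
    return ret;
}
```

+The sketch never updates {\tt current} itself: that, and the actual task
+switch, remain the generic layer's job.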
+
+\subsubsection{control}
+
+\paragraph*{Purpose}
+
+This method is called for global scheduler control operations.  It takes a
+pointer to a {\tt struct sched\_ctl\_cmd}, from which it should select the
+appropriate command data.
+
+\paragraph*{Call environment}
+
+The generic layer guarantees that when this method is called, the caller was
+using the same control interface version and that the caller selected the
+correct scheduler ID, hence the scheduler's implementation does not need to
+sanity-check these parts of the call.
+
+\paragraph*{Return values}
+
+This function should return the value to be passed back to user space, hence it
+should either be 0 or an appropriate errno value.
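+A sketch of a {\tt control()} method that gets or sets a single global
+parameter (a time quantum).  The layout of {\tt struct sched\_ctl\_cmd} shown
+here, including the {\tt direction} field and the {\tt EXAMPLE\_CTL\_*} values,
+is hypothetical and the real definition may differ.  Returning a negated errno
+such as {\tt -EINVAL} is also an assumption about the convention expected by
+the generic layer.

```c
#include <errno.h>

#define EXAMPLE_CTL_GET 0   /* hypothetical command directions */
#define EXAMPLE_CTL_PUT 1

struct example_ctl { unsigned long quantum_ns; };

/* Hypothetical layout of the control command. */
struct sched_ctl_cmd {
    unsigned int sched_id;
    unsigned int if_ver;
    int direction;
    union { struct example_ctl example; } u;
};

unsigned long example_quantum_ns = 10000000UL;   /* global scheduler setting */

int example_control(struct sched_ctl_cmd *cmd)
{
    /* sched_id and if_ver have already been checked by the generic layer. */
    switch (cmd->direction) {
    case EXAMPLE_CTL_GET:
        cmd->u.example.quantum_ns = example_quantum_ns;
        return 0;
    case EXAMPLE_CTL_PUT:
        if (cmd->u.example.quantum_ns == 0)
            return -EINVAL;               /* reject a zero-length quantum */
        example_quantum_ns = cmd->u.example.quantum_ns;
        return 0;
    default:
        return -EINVAL;
    }
}
```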
+
+\subsubsection{adjdom}
+
+\paragraph*{Purpose}
+
+This method is called to adjust the scheduling parameters of a particular
+domain.
+
+\paragraph*{Call environment}
+
+The generic layer guarantees that the caller has specified the correct
+control interface version and scheduler ID and that the supplied {\tt
+task\_struct} will not be deallocated during the call (hence it is not
+necessary to {\tt get\_task\_struct}).
+
+\paragraph*{Return values}
+
+This function should return the value to be passed back to user space, hence it
+should either be 0 or an appropriate errno value.
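+A sketch of an {\tt adjdom()} method that reads or updates a per-domain
+weight kept in the task's private data.  The {\tt struct sched\_adjdom\_cmd}
+layout (including {\tt direction} and {\tt weight}), the
+{\tt example\_task\_info} type and the negated-errno return convention are all
+hypothetical.

```c
#include <errno.h>
#include <stddef.h>

struct example_task_info { unsigned long weight; };
struct task_struct { int domain; void *sched_priv; };   /* reduced, hypothetical */

/* Hypothetical layout of the per-domain adjustment command. */
struct sched_adjdom_cmd {
    unsigned int sched_id;
    unsigned int if_ver;
    int domain;
    int direction;              /* 0 = get, 1 = put (hypothetical) */
    unsigned long weight;
};

int example_adjdom(struct task_struct *p, struct sched_adjdom_cmd *cmd)
{
    struct example_task_info *inf = p->sched_priv;
    /* No get_task_struct() needed: p stays valid for the whole call. */
    if (cmd->direction == 0) {
        cmd->weight = inf->weight;
        return 0;
    }
    if (cmd->weight == 0)
        return -EINVAL;         /* reject a zero weight */
    inf->weight = cmd->weight;
    return 0;
}
```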
+
+\subsubsection{reschedule}
+
+\paragraph*{Purpose}
+
+This method is called to determine if a reschedule is required as a result of a
+particular task.
+
+\paragraph*{Call environment}
+
+The generic layer will itself cause a reschedule if the current task is the
+idle task or has exceeded its minimum time slice.  The
+generic layer guarantees that the task passed is not currently running but is
+on the runqueue.
+
+\paragraph*{Return values}
+
+Should return a mask of the CPUs on which a reschedule should be triggered.
+
+\subsubsection{dump\_settings}
+
+\paragraph*{Purpose}
+
+If implemented, this should dump any private global settings for this
+scheduler to the console.
+
+\paragraph*{Call environment}
+
+This function is called with interrupts enabled.
+
+\subsubsection{dump\_cpu\_state}
+
+\paragraph*{Purpose}
+
+This method should dump any private settings for the specified CPU.
+
+\paragraph*{Call environment}
+
+This function is called with interrupts disabled and the {\tt schedule\_lock}
+for the specified CPU held.
+
+\subsubsection{dump\_runq\_el}
+
+\paragraph*{Purpose}
+
+This method should dump any private settings for the specified task.
+
+\paragraph*{Call environment}
+
+This function is called with interrupts disabled and the {\tt schedule\_lock}
+for the task's CPU held.
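+A sketch of {\tt dump\_runq\_el()}: the line is formatted by a separate helper
+so it is easy to check, and C's {\tt printf()} stands in for the hypervisor's
+{\tt printk()}.  The task fields and {\tt example\_task\_info} are
+hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

struct task_struct { int domain; int processor; void *sched_priv; };   /* hypothetical */
struct example_task_info { unsigned long weight; };

/* Format one runqueue element's private settings into buf. */
int example_fmt_runq_el(char *buf, size_t len, const struct task_struct *p)
{
    const struct example_task_info *inf = p->sched_priv;
    return snprintf(buf, len, "dom=%d cpu=%d weight=%lu",
                    p->domain, p->processor, inf->weight);
}

void example_dump_runq_el(struct task_struct *p)
{
    char line[80];
    example_fmt_runq_el(line, sizeof(line), p);
    printf("  %s\n", line);   /* printk() in the hypervisor */
}
```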
 
 \chapter{Debugging}