docs/src/interface/scheduling.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268

\chapter{Scheduling API}  

The scheduling API is used by both the schedulers described above and should
also be used by any new schedulers.  It provides a generic interface and also
implements much of the ``boilerplate'' code.

Schedulers conforming to this API are described by the following
structure:

\begin{verbatim}
struct scheduler
{
    char *name;             /* full name for this scheduler      */
    char *opt_name;         /* option name for this scheduler    */
    unsigned int sched_id;  /* ID for this scheduler             */

    int          (*init_scheduler) ();
    int          (*alloc_task)     (struct task_struct *);
    void         (*add_task)       (struct task_struct *);
    void         (*free_task)      (struct task_struct *);
    void         (*rem_task)       (struct task_struct *);
    void         (*wake_up)        (struct task_struct *);
    void         (*do_block)       (struct task_struct *);
    task_slice_t (*do_schedule)    (s_time_t);
    int          (*control)        (struct sched_ctl_cmd *);
    int          (*adjdom)         (struct task_struct *,
                                    struct sched_adjdom_cmd *);
    s32          (*reschedule)     (struct task_struct *);
    void         (*dump_settings)  (void);
    void         (*dump_cpu_state) (int);
    void         (*dump_runq_el)   (struct task_struct *);
};
\end{verbatim}

The only method that {\em must} be implemented is
{\tt do\_schedule()}.  However, if there is not some implementation for the
{\tt wake\_up()} method then waking tasks will not get put on the runqueue!

The fields of the above structure are described in more detail below.

\subsubsection{name}

The name field should point to a descriptive ASCII string.

\subsubsection{opt\_name}

This field is the value of the {\tt sched=} boot-time option that will select
this scheduler.

\subsubsection{sched\_id}

This is an integer that uniquely identifies this scheduler.  There should be a
macro corrsponding to this scheduler ID in {\tt <xen/sched-if.h>}.

\subsubsection{init\_scheduler}

\paragraph*{Purpose}

This is a function for performing any scheduler-specific initialisation.  For
instance, it might allocate memory for per-CPU scheduler data and initialise it
appropriately.

\paragraph*{Call environment}

This function is called after the initialisation performed by the generic
layer.  The function is called exactly once, for the scheduler that has been
selected.

\paragraph*{Return values}

This should return negative on failure --- this will cause an
immediate panic and the system will fail to boot.

\subsubsection{alloc\_task}

\paragraph*{Purpose}
Called when a {\tt task\_struct} is allocated by the generic scheduler
layer.  A particular scheduler implementation may use this method to
allocate per-task data for this task.  It may use the {\tt
sched\_priv} pointer in the {\tt task\_struct} to point to this data.

\paragraph*{Call environment}
The generic layer guarantees that the {\tt sched\_priv} field will
remain intact from the time this method is called until the task is
deallocated (so long as the scheduler implementation does not change
it explicitly!).

\paragraph*{Return values}
Negative on failure.

\subsubsection{add\_task}

\paragraph*{Purpose}

Called when a task is initially added by the generic layer.

\paragraph*{Call environment}

The fields in the {\tt task\_struct} are now filled out and available for use.
Schedulers should implement appropriate initialisation of any per-task private
information in this method.

\subsubsection{free\_task}

\paragraph*{Purpose}

Schedulers should free the space used by any associated private data
structures.

\paragraph*{Call environment}

This is called when a {\tt task\_struct} is about to be deallocated.
The generic layer will have done generic task removal operations and
(if implemented) called the scheduler's {\tt rem\_task} method before
this method is called.

\subsubsection{rem\_task}

\paragraph*{Purpose}

This is called when a task is being removed from scheduling (but is
not yet being freed).

\subsubsection{wake\_up}

\paragraph*{Purpose}

Called when a task is woken up, this method should put the task on the runqueue
(or do the scheduler-specific equivalent action).

\paragraph*{Call environment}

The task is already set to state RUNNING.

\subsubsection{do\_block}

\paragraph*{Purpose}

This function is called when a task is blocked.  This function should
not remove the task from the runqueue.

\paragraph*{Call environment}

The EVENTS\_MASTER\_ENABLE\_BIT is already set and the task state changed to
TASK\_INTERRUPTIBLE on entry to this method.  A call to the {\tt
  do\_schedule} method will be made after this method returns, in
order to select the next task to run.

\subsubsection{do\_schedule}

This method must be implemented.

\paragraph*{Purpose}

The method is called each time a new task must be chosen for scheduling on the
current CPU.  The current time as passed as the single argument (the current
task can be found using the {\tt current} macro).

This method should select the next task to run on this CPU and set it's minimum
time to run as well as returning the data described below.

This method should also take the appropriate action if the previous
task has blocked, e.g. removing it from the runqueue.

\paragraph*{Call environment}

The other fields in the {\tt task\_struct} are updated by the generic layer,
which also performs all Xen-specific tasks and performs the actual task switch
(unless the previous task has been chosen again).

This method is called with the {\tt schedule\_lock} held for the current CPU
and local interrupts disabled.

\paragraph*{Return values}

Must return a {\tt struct task\_slice} describing what task to run and how long
for (at maximum).

\subsubsection{control}

\paragraph*{Purpose}

This method is called for global scheduler control operations.  It takes a
pointer to a {\tt struct sched\_ctl\_cmd}, which it should either
source data from or populate with data, depending on the value of the
{\tt direction} field.

\paragraph*{Call environment}

The generic layer guarantees that when this method is called, the
caller selected the correct scheduler ID, hence the scheduler's
implementation does not need to sanity-check these parts of the call.

\paragraph*{Return values}

This function should return the value to be passed back to user space, hence it
should either be 0 or an appropriate errno value.

\subsubsection{sched\_adjdom}

\paragraph*{Purpose}

This method is called to adjust the scheduling parameters of a particular
domain, or to query their current values.  The function should check
the {\tt direction} field of the {\tt sched\_adjdom\_cmd} it receives in
order to determine which of these operations is being performed.

\paragraph*{Call environment}

The generic layer guarantees that the caller has specified the correct
control interface version and scheduler ID and that the supplied {\tt
task\_struct} will not be deallocated during the call (hence it is not
necessary to {\tt get\_task\_struct}).

\paragraph*{Return values}

This function should return the value to be passed back to user space, hence it
should either be 0 or an appropriate errno value.

\subsubsection{reschedule}

\paragraph*{Purpose}

This method is called to determine if a reschedule is required as a result of a
particular task.

\paragraph*{Call environment}
The generic layer will cause a reschedule if the current domain is the idle
task or it has exceeded its minimum time slice before a reschedule.  The
generic layer guarantees that the task passed is not currently running but is
on the runqueue.

\paragraph*{Return values}

Should return a mask of CPUs to cause a reschedule on.

\subsubsection{dump\_settings}

\paragraph*{Purpose}

If implemented, this should dump any private global settings for this
scheduler to the console.

\paragraph*{Call environment}

This function is called with interrupts enabled.

\subsubsection{dump\_cpu\_state}

\paragraph*{Purpose}

This method should dump any private settings for the specified CPU.

\paragraph*{Call environment}

This function is called with interrupts disabled and the {\tt schedule\_lock}
for the specified CPU held.

\subsubsection{dump\_runq\_el}

\paragraph*{Purpose}

This method should dump any private settings for the specified task.

\paragraph*{Call environment}

This function is called with interrupts disabled and the {\tt schedule\_lock}
for the task's CPU held.