| Commit message | Author | Age | Files | Lines |
|
Make the locking functions return the lock pointers, so they can be
passed to the unlocking functions (which in turn can check that the
lock is still actually providing the intended protection, i.e. the
parameters determining which lock is the right one didn't change).
Further use proper spin lock primitives rather than open coded
local_irq_...() constructs, so that interrupts can be re-enabled as
appropriate while spinning.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
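The lock-pointer pattern described in this message can be sketched roughly as follows. This is only an illustration under stated assumptions: `sketch_lock_t`, `schedule_lock[]`, `NR_CPUS` and all function names are hypothetical stand-ins, not the hypervisor's actual API, and interrupt handling is left out. The lock function returns the pointer it acquired, and the unlock function checks that the pointer still matches the current mapping.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 4   /* hypothetical size, for illustration only */

/* Minimal test-and-set spin lock standing in for a real spin lock type. */
typedef struct { atomic_flag held; } sketch_lock_t;

static void sketch_spin_lock(sketch_lock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void sketch_spin_unlock(sketch_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static sketch_lock_t cpu_locks[NR_CPUS];
/* Which lock currently protects each CPU's scheduling state (may be re-pointed). */
static _Atomic(sketch_lock_t *) schedule_lock[NR_CPUS];

static void sketch_init(void)
{
    for (unsigned int i = 0; i < NR_CPUS; i++) {
        atomic_flag_clear(&cpu_locks[i].held);
        atomic_store(&schedule_lock[i], &cpu_locks[i]);
    }
}

/* Acquire the lock for 'cpu' and return it; retry if the mapping changed
 * while we were spinning on the old lock. */
static sketch_lock_t *sketch_schedule_lock(unsigned int cpu)
{
    for (;;) {
        sketch_lock_t *lock = atomic_load(&schedule_lock[cpu]);

        sketch_spin_lock(lock);
        if (lock == atomic_load(&schedule_lock[cpu]))
            return lock;
        sketch_spin_unlock(lock);
    }
}

/* The caller hands back the pointer it was given, so we can check that it
 * is still the lock providing the intended protection. */
static void sketch_schedule_unlock(sketch_lock_t *lock, unsigned int cpu)
{
    assert(lock == atomic_load(&schedule_lock[cpu]));
    sketch_spin_unlock(lock);
}
```

A caller would pair the two as `lock = sketch_schedule_lock(cpu); ... sketch_schedule_unlock(lock, cpu);`.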
|
... with cpumask_any() or cpumask_cycle().
In one case this also allows elimination of a cpumask_empty() call,
and while doing this I also spotted a redundant use of
cpumask_weight(). (When running on big systems, operations on CPU masks
aren't cheap enough to use them carelessly.)
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
|
Under normal circumstances, snext->credit should never be less than
-CSCHED_MIN_TIMER. However, under some circumstances, a vcpu with low
credits may be allowed to run long enough that its credits are
actually less than -CSCHED_CREDIT_INIT.
(Instances have been observed, for example, where a vcpu with 200us of
credit was allowed to run for 11ms, giving it -10.8ms of credit. Thus
it was still negative even after the reset.)
If this is the case for snext, we simply want to keep moving everyone
up until it is in the black again. This is fair because none of the
other vcpus want to run at the moment.
Rather than loop, just detect how many times we want to add
CSCHED_CREDIT_INIT. Try to avoid integer divides and multiplies in
the common case.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
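A minimal sketch of the "detect how many times" idea, assuming credit is counted in microseconds and a hypothetical `CREDIT_INIT` of 10ms (chosen only so that the -10.8ms example above stays negative after a single reset). The function names are illustrative, not taken from the patch. The division is only reached when the credit is below `-CREDIT_INIT`, so the common case costs one comparison.

```c
#include <assert.h>

#define CREDIT_INIT 10000   /* hypothetical: 10ms expressed in microseconds */

/* How many multiples of CREDIT_INIT to add so the credit ends up non-negative. */
static int reset_multiplier(int credit)
{
    int m = 1;

    /* Uncommon case only: the vcpu overran by more than one full reset. */
    if (credit < -CREDIT_INIT)
        m += (-credit) / CREDIT_INIT;

    return m;
}

/* Credit after the reset; every vcpu would be moved up by the same amount. */
static int reset_credit(int credit)
{
    return credit + reset_multiplier(credit) * CREDIT_INIT;
}
```

With these assumed values, a credit of -10800 gets a multiplier of 2 and ends at 9200, whereas a credit of -5000 takes the common path with a multiplier of 1.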
|
In order to avoid high-frequency cpu migration, vcpus may in fact be
scheduled slightly out-of-order. Account for this situation properly.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
This should help with under-accounting of vCPUs running for extremely
short periods of time, but becoming runnable again at a high frequency.
Don't bother subtracting the residual from the runtime, as it can only ever
add up to one nanosecond, and will end up being debited during the next
reset interval anyway.
Original-patch-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
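One way to picture the residual handling, as a sketch under assumptions: suppose runtime is converted to credit by scaling with a maximum weight and dividing by the vcpu's weight, and the division remainder is carried into the next conversion. The struct, field names and formula here are hypothetical illustrations rather than the patch's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-vcpu accounting state. */
struct vcpu_acct {
    int64_t  credit;    /* remaining credit */
    uint32_t weight;    /* this vcpu's weight (must be non-zero) */
    uint32_t residual;  /* remainder left over from the previous division */
};

/* Charge 'time_ns' of runtime, carrying the division remainder forward so
 * that many very short runs are not each rounded down to zero. */
static void charge_runtime(struct vcpu_acct *v, uint64_t time_ns,
                           uint32_t max_weight)
{
    uint64_t val = time_ns * max_weight + v->residual;

    v->residual = (uint32_t)(val % v->weight);
    v->credit  -= (int64_t)(val / v->weight);
}
```

With a weight of 512 and a maximum weight of 256, a single 1ns run rounds to zero credit on its own, but a thousand such runs are charged 500 in total once the remainder is carried.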
|
csched_runtime() needs to call the c2t() function to change credits
into time. The c2t() function, however, is expensive, as it requires
an integer division.
c2t() was being called twice, once for the main vcpu's credit and once
for the difference between its credit and the next in the queue. But
this is unnecessary; by calculating in "credit" first, we can make it
so that we just do one conversion later in the algorithm.
This also adds more documentation describing the intended algorithm,
along with a relevant assertion.
The effect of the new code should be the same as the old code.
Spotted-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
So that it becomes possible to create scheduler specific trace
records, within each scheduler, without worrying about the
overlapping, and also without giving up being able to recognise them
unambiguously. The latter is deemed useful, since we can have more
than one scheduler running at the same time, thanks to cpupools.
The event ID is 12 bits, and this change uses the upper 3 of them for
the 'scheduler ID'. This means we're limited to 8 schedulers and to
512 scheduler specific tracing events. Both seem reasonable
limitations as of now.
This also converts the existing credit2 tracing (the only scheduler
generating tracing events up to now) to the new system.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
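The bit layout described above (a 12-bit event ID whose upper 3 bits carry the scheduler ID, leaving 9 bits for the per-scheduler event) can be sketched like this. The macro and function names are hypothetical and are not the identifiers used by the actual tracing headers.

```c
#include <assert.h>
#include <stdint.h>

#define EVT_BITS        12u
#define SCHED_ID_BITS    3u
#define SCHED_EVT_BITS  (EVT_BITS - SCHED_ID_BITS)       /* 9 bits */
#define SCHED_ID_MASK   ((1u << SCHED_ID_BITS) - 1u)     /* 0x7: 8 schedulers */
#define SCHED_EVT_MASK  ((1u << SCHED_EVT_BITS) - 1u)    /* 0x1ff: 512 events */

/* Pack a scheduler ID and a scheduler-specific event into one 12-bit ID. */
static uint16_t make_sched_event(unsigned int sched_id, unsigned int evt)
{
    assert(sched_id <= SCHED_ID_MASK);
    assert(evt <= SCHED_EVT_MASK);
    return (uint16_t)((sched_id << SCHED_EVT_BITS) | evt);
}

/* Recover the scheduler ID from a packed event ID. */
static unsigned int event_sched_id(uint16_t id)
{
    return (id >> SCHED_EVT_BITS) & SCHED_ID_MASK;
}

/* Recover the scheduler-specific event number from a packed event ID. */
static unsigned int event_number(uint16_t id)
{
    return id & SCHED_EVT_MASK;
}
```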
|
Mainly for consistency with credit, at least for the events that are
general enough, like vCPU initialization/destruction and calls
to the specific scheduling function.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
|
releasing cpu
This fixes a bug that happens when you remove cpus from a credit2
cpupool.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
|
There are several instances of the same construct finding the cpumask
for a cpupool. Use macros instead.
Signed-off-by: juergen.gross@ts.fujitsu.com
Committed-by: Keir Fraser <keir@xen.org>
|
- call free_xenoprof_pages only ifdef CONFIG_XENOPROF;
- define PRI_stime as PRId64 in a header file;
- respect boundaries in is_kernel_*;
- implement is_kernel_rodata;
- guest_physmap_add_page should be ((void)0);
- fix guest_physmap_add_page;
- introduce CONFIG_XENOPROF;
- define _srodata and _erodata as const char*.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
|
The main idea is to move (as much as possible) locking logic
from generic code to the various pluggable schedulers.
While at it, the following is also accomplished:
- pausing all the non-current VCPUs of a domain while changing its
scheduling parameters is not effective in avoiding races and it is
prone to deadlock, so that is removed.
- sedf needs a global lock for preventing races while adjusting
domains' scheduling parameters (as it is for credit and credit2),
so that is added.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
|
I have confirmed that the relevant pages have been transitioned.
What remains is pages which have not yet been moved over:
$ rgrep xenwiki *
tools/libxen/README:http://wiki.xensource.com/xenwiki/XenApi
tools/xenballoon/xenballoond.README:http://wiki.xensource.com/xenwiki/Open_Topics_For_Discussion?action=AttachFile&do=get&target=Memory+Overcommit.pdf
Note that "PythonInXlConfig" never existed in the old wiki and does not exist
in the new. This reference was introduced by 22735:cb94dbe20f97 and was
supposed to have been written prior to the 4.1 release. I have transitioned it
anyway but it's not clear how valuable the message actually is. Perhaps we
should just remove that aspect of it?
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
Acked-by: Ian Jackson <ian.jackson.citrix.com>
|
This includes the conversion from for_each_cpu_mask() to for_each_cpu().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
... thus reducing the per-CPU data area size back to one page even when
building for large NR_CPUS.
At the same time, eliminate the old __cpu{mask,list}_scnprintf() helpers.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
... in favor of using the new, nr_cpumask_bits-based ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
The former is the runtime equivalent of NR_CPUS (and users of NR_CPUS,
where necessary, get adjusted accordingly), while the latter is for the
sole use of determining the allocation size when dynamically allocating
CPU masks (done later in this series).
Adjust accessors to use either of the two to bound their bitmap
operations - which one gets used depends on whether accessing the bits
in the gap between nr_cpu_ids and nr_cpumask_bits is benign but more
efficient.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
Fixes the build under gcc-4.6 -Werror=unused-but-set-variable
Signed-off-by: Olaf Hering <olaf@aepfle.de>
|
In credit2, there needs to be a strong correlation between
v->processor and the runqueue to which a vcpu is assigned;
much of the code relies on this invariant. Allow credit2
to manage the actual migration itself.
This fixes the most recent credit2 bug reported on the list
(Xen BUG at sched_credit2.c:1606) in Xen 4.1, as well as
the bug at sched_credit2.c:811 in -unstable (which catches the
same condition earlier).
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
With no modular drivers, all CPU notifier setup is supposed to happen
during boot. There also is a respective comment in the function.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
|
and passing the outer scope's variables in explicitly.
This is needed to compile xen with clang.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
|
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
|
Signed-off-by: Keir Fraser <keir@xen.org>
|
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
Allow the "unbalance tolerance" -- the amount of difference between
two runqueues that will be allowed before rebalancing -- to differ
depending on how busy the runqueue is. If the load is less than 100%,
default to a tolerance of 1.0; if it's more than 100%, default to a
tolerance of 0.125.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
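A rough sketch of a load-dependent tolerance, assuming load is kept in fixed point with a hypothetical `LOAD_SHIFT` so that 100% is `LOAD_ONE`. The names, the fixed-point representation, and the choice to treat exactly 100% as "under" are all assumptions made for illustration; the message does not specify them.

```c
#include <assert.h>

#define LOAD_SHIFT 10                   /* hypothetical fixed-point precision */
#define LOAD_ONE   (1L << LOAD_SHIFT)   /* represents a load of 1.0 (100%) */

/* Tolerance of 1.0 for a runqueue at or below 100% load, 0.125 above it. */
static long unbalance_tolerance(long load)
{
    return (load > LOAD_ONE) ? (LOAD_ONE >> 3) : LOAD_ONE;
}

/* Rebalance only when the load difference exceeds the tolerance. */
static int should_rebalance(long my_load, long other_load)
{
    long diff = (my_load > other_load) ? my_load - other_load
                                       : other_load - my_load;

    return diff > unbalance_tolerance(my_load);
}
```

The intent is that lightly loaded runqueues tolerate a whole vcpu's worth of imbalance, while overloaded ones rebalance on much smaller differences.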
|
This is a first cut at load balancing. I'm first working on
getting the behavior I want correct; once I know what kind
of behavior works well, I'll work on making it efficient.
The general idea is when balancing runqueues, look for the runqueue
whose loadavg is the most different from ours (higher or lower).
Then, look for a transaction which will bring the loads closest
together: either pushing a vcpu, pulling a vcpu, or swapping them.
Use the per-vcpu load to calculate the expected load after the
exchange.
The current algorithm looks at every combination, which is O(N^2).
That's not going to be suitable for workloads with large numbers of
vcpus (such as highly consolidated VDI deployments). I'll make a more
efficient algorithm once I've experimented and determined what I think
is the best load-balancing behavior.
At the moment, balance from a runqueue every time the credit resets.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
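The exhaustive search over push, pull and swap candidates can be sketched as below. This is an illustration only: the data layout (plain arrays of per-vcpu loads), the `struct move` type and the function name are hypothetical, and the real code operates on runqueue structures rather than arrays. An index of -1 means "no vcpu", so (i, -1) is a push, (-1, j) is a pull, and (i, j) is a swap.

```c
#include <assert.h>
#include <stdlib.h>

/* Result of the search: indices of the vcpus to move, or -1 for none. */
struct move {
    int  push;   /* index into 'mine' to send to the other runqueue */
    int  pull;   /* index into 'theirs' to bring to our runqueue */
    long delta;  /* expected |load difference| after the move */
};

/* Try every push, pull and swap and keep the one that brings the two
 * runqueue loads closest together. This is the O(N^2) search. */
static struct move best_move(const long *mine, int n_mine, long my_load,
                             const long *theirs, int n_theirs, long their_load)
{
    struct move best = { -1, -1, labs(my_load - their_load) };

    for (int i = -1; i < n_mine; i++) {
        for (int j = -1; j < n_theirs; j++) {
            long push = (i >= 0) ? mine[i] : 0;
            long pull = (j >= 0) ? theirs[j] : 0;
            long d = labs((my_load - push + pull) -
                          (their_load + push - pull));

            if (d < best.delta) {
                best.push = i;
                best.pull = j;
                best.delta = d;
            }
        }
    }

    return best;
}
```

If no candidate improves on the current difference, the result keeps both indices at -1, meaning no migration is worthwhile.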
|
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
Put in infrastructure to allow a vcpu to request to migrate to a
specific runqueue. This will allow a load balancer to choose running
VMs to migrate, and know they will go where expected when the VM is
descheduled.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
As vcpus are migrated, track how we expect the load to change. This
helps smooth migrations when the balancing doesn't take immediate
effect on the load average. In theory, if vcpu activity remains
constant, then the measured avgload should converge to the balanced
avgload.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
Track the amount of load contributed by a particular vcpu, to help
us make informed decisions about what will happen if we make a move.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
Calculate a per-runqueue decaying load average.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
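The message does not give the formula, so the following is only one common way to compute a time-decaying average in fixed point, shown as an assumption rather than as the patch's method. The window length, `LOAD_SHIFT`, and the function name are hypothetical. The instantaneous load is weighted by the time elapsed since the last update, and the old average by the rest of the window.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_SHIFT 10   /* hypothetical: averaging window of 2^10 time units */
#define LOAD_SHIFT    8   /* hypothetical fixed-point precision of the average */

/* Blend the instantaneous load into the running average, weighted by how
 * much of the window 'delta' (time since the last update) covers. */
static uint64_t update_avgload(uint64_t avg, unsigned int load, uint64_t delta)
{
    const uint64_t window = 1ULL << WINDOW_SHIFT;
    uint64_t inst = (uint64_t)load << LOAD_SHIFT;

    /* The whole window has elapsed: older history no longer counts. */
    if (delta >= window)
        return inst;

    return (delta * inst + (window - delta) * avg) >> WINDOW_SHIFT;
}
```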
|
Because alloc_pdata() is called before the cpu layout information is
available, we grab a callback to the newly-created CPU_STARTING
notifier.
cpu 0 doesn't get a callback, so we simply hard-code it to runqueue 0.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
In preparation for multiple runqueues, add a simple cpu picker that
will look for the runqueue with the lowest instantaneous load to
assign the vcpu to.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
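A minimal sketch of such a picker, assuming a hypothetical array of runqueues that each record whether they are active and their current instantaneous load. The struct and function names are illustrative only.

```c
#include <assert.h>

/* Hypothetical, simplified runqueue descriptor. */
struct runq {
    int active;  /* non-zero if the runqueue has cpus assigned */
    int load;    /* instantaneous load: number of runnable vcpus */
};

/* Return the index of the active runqueue with the lowest load,
 * or -1 if no runqueue is active. Ties go to the lowest index. */
static int pick_runq(const struct runq *rqs, int n)
{
    int best = -1;

    for (int i = 0; i < n; i++) {
        if (!rqs[i].active)
            continue;
        if (best < 0 || rqs[i].load < rqs[best].load)
            best = i;
    }

    return best;
}
```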
|
Add hooks in the various places to detect vcpus becoming active or
inactive. At the moment, record only instantaneous runqueue load;
but this lays the groundwork for having a load average.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
In preparation for cross-runqueue migration, make changes to make that
more robust.
Changes include:
* An up-pointer from the svc struct to the runqueue it's assigned to
* Explicit runqueue assigns/de-assigns, with appropriate ASSERTs
* cpu_pick will de-assign a vcpu from a runqueue if it's migrating,
and wake will re-assign it
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
Several refactorings:
* Add prv->initialized cpu mask
* Replace prv->runq_count with active_queue mask
* Replace rqd->cpu_min,cpu_mask with active cpu mask
* Put locks in the runqueue structure, rather than borrowing the
existing cpu locks
* init() initializes all runqueues to NULL, inactive, and maps all
pcpus to runqueue -1
* alloc_pcpu() will add cpus to runqueues, "activating" the runqueue
if necessary. All cpus are currently assigned to runqueue 0.
End-to-end behavior of the system should remain largely the same.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
Resist moving a vcpu from one L2 to another unless its credit
is past a certain amount.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
Modern processors have one or two L3's per socket, and generally only
one core per L2. Credit2's design relies on having credit shared
across several. So, as a first step toward sharing a queue across L3 while
avoiding excessive cross-L2 migration:
Step one: If the vcpu's current cpu is acceptable, just run it there.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
Makes the traces slightly larger, but easier to analyze.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
No functional changes.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
* Add traces for credit reset and scheduling a tasklet
* Remove tsc for traces which probably don't need them
* Print domain info in the debug dump
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
|
The previous code had a bug where, if a second vcpu woke up and ran
runq_tickle before the first vcpu had actually started running on a
tickled processor, the code would choose the same cpu to tickle, and
the result would be a "missed tickle" -- only one of the vcpus would
run, despite there being idle processors.
This change:
* Keeps a mask of idle cpus
* Keeps a mask of cpus which have been tickled, but not entered
schedule yet.
The new tickle code first looks at the set of idle-but-not-tickled
cpus; if it's not empty, it tickles one.
If that doesn't work, it looks at the set of not-idle-but-not-tickled
cpus and finds the one with the lowest credit; if that's lower than the
waking vcpu's, it tickles that one.
If any cpu is tickled, its bit in the tickled mask is set, and cleared
when schedule() is called.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
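The two-mask tickle logic described above can be sketched as follows, assuming a small hypothetical runqueue state with plain 64-bit masks in place of real CPU masks. `NCPUS`, the struct layout and the function names are illustrative, not the scheduler's own.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 4   /* hypothetical number of cpus in the runqueue */

struct runq_state {
    uint64_t idle;               /* cpus currently idle */
    uint64_t tickled;            /* cpus tickled but not yet in schedule() */
    int      cur_credit[NCPUS];  /* credit of the vcpu running on each cpu */
};

/* Pick a cpu to tickle for a waking vcpu with 'new_credit'.
 * Returns the cpu number, or -1 if no cpu should be tickled. */
static int runq_tickle(struct runq_state *rq, int new_credit)
{
    const uint64_t all = (1ULL << NCPUS) - 1;
    uint64_t mask = rq->idle & ~rq->tickled & all;
    int target = -1;

    if (mask) {
        /* Some idle cpu has not been tickled yet: take the first one. */
        for (int cpu = 0; cpu < NCPUS; cpu++) {
            if ((mask >> cpu) & 1) {
                target = cpu;
                break;
            }
        }
    } else {
        /* Otherwise look for the busy, untickled cpu with the lowest
         * credit, provided it is lower than the waking vcpu's. */
        int lowest = new_credit;

        mask = ~rq->idle & ~rq->tickled & all;
        for (int cpu = 0; cpu < NCPUS; cpu++) {
            if (((mask >> cpu) & 1) && rq->cur_credit[cpu] < lowest) {
                lowest = rq->cur_credit[cpu];
                target = cpu;
            }
        }
    }

    if (target >= 0)
        rq->tickled |= 1ULL << target;

    return target;
}

/* Called when 'cpu' enters schedule(): the tickle has been consumed. */
static void enter_schedule(struct runq_state *rq, int cpu)
{
    rq->tickled &= ~(1ULL << cpu);
}
```

Because a tickled cpu is excluded until it enters `schedule()`, two vcpus waking back to back are sent to different idle cpus, which is the missed-tickle case this change addresses.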
|