| Commit message | Author | Age | Files | Lines |
| |
Make the locking functions return the lock pointers, so they can be
passed to the unlocking functions (which in turn can check that the
lock is still actually providing the intended protection, i.e. the
parameters determining which lock is the right one didn't change).
Furthermore, use proper spinlock primitives rather than open-coded
local_irq_...() constructs, so that interrupts can be re-enabled as
appropriate while spinning.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
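A minimal sketch of the lock/unlock pattern described above, in plain C11 rather than Xen code. The lock helper looks up the lock from a parameter (here the vCPU's current pCPU), re-checks that lookup after acquiring the lock, and returns the pointer. The unlock helper then asserts that the same parameter still selects that lock. The names (`pcpu_lock`, `vcpu_lock()`, `vcpu_unlock()`, `struct vcpu`) are hypothetical, busy-waiting on an `atomic_flag` stands in for Xen's IRQ-safe spinlocks, and the sketch assumes `processor` is only changed by someone holding the vCPU's current lock.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_PCPUS 4

/* Hypothetical: one spinlock per physical CPU. */
static atomic_flag pcpu_lock[NR_PCPUS] = {
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Hypothetical vCPU: 'processor' selects the lock and changes on
 * migration, so it is read atomically. */
struct vcpu {
    atomic_int processor;
};

static void spin_lock(atomic_flag *l)
{
    while ( atomic_flag_test_and_set_explicit(l, memory_order_acquire) )
        ;   /* busy-wait */
}

static void spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Take the lock currently protecting v and hand it back to the caller. */
static atomic_flag *vcpu_lock(struct vcpu *v)
{
    for ( ; ; )
    {
        atomic_flag *l = &pcpu_lock[atomic_load(&v->processor)];

        spin_lock(l);
        /* v may have moved while we waited: keep l only if still correct. */
        if ( l == &pcpu_lock[atomic_load(&v->processor)] )
            return l;
        spin_unlock(l);
    }
}

/* Release a lock obtained from vcpu_lock(), checking that it still
 * protects v (i.e. v->processor has not changed under our feet). */
static void vcpu_unlock(atomic_flag *l, struct vcpu *v)
{
    assert(l == &pcpu_lock[atomic_load(&v->processor)]);
    spin_unlock(l);
}
```

Returning the pointer means the caller releases exactly the lock it took, and the assertion turns a changed lookup into a loud failure instead of a silent loss of protection.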
| |
The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
incompatible with the SEDF scheduler: Any vCPU using
VCPUOP_stop_periodic_timer (e.g. any vCPU of a halfway modern PV Linux
guest) ends up on pCPU0 after that call. Obviously, running all PV
guests' (and notably Dom0's) vCPU-s on pCPU0 causes problems for those
guests sooner rather than later.
So the main thing that was clearly wrong (and bogus from the beginning)
was the use of cpumask_first() in sedf_pick_cpu(). It is replaced by a
construct that prefers to put the vCPU back on the pCPU it was
launched on.
However, there's one more glitch: When reducing the affinity of a vCPU
temporarily, and then widening it again to a set that includes the pCPU
that the vCPU was last running on, the generic scheduler code would not
force a migration of that vCPU, and hence it would forever stay on the
pCPU it last ran on. Since that can again create a load imbalance, the
SEDF scheduler wants a migration to happen even when one appears
unnecessary.
Of course, an alternative to checking for SEDF explicitly in
vcpu_set_affinity() would be to introduce a flags field in struct
scheduler, and have SEDF set an "always-migrate-on-affinity-change"
flag.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
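A sketch of the idea behind the sedf_pick_cpu() fix, assuming at most 64 pCPUs with masks held in a `uint64_t`. Instead of always returning the first allowed pCPU, it stays on the current one when the affinity permits, and otherwise moves to the next allowed pCPU after it. `mask_cycle()` and `pick_cpu()` are hypothetical helpers, not the construct the commit actually uses.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PCPUS 64

/* Return the first set bit at or after 'start', wrapping around;
 * -1 if the mask is empty. */
static int mask_cycle(uint64_t mask, int start)
{
    for ( int i = 0; i < NR_PCPUS; i++ )
    {
        int cpu = (start + i) % NR_PCPUS;

        if ( mask & (UINT64_C(1) << cpu) )
            return cpu;
    }
    return -1;
}

/* Hypothetical pick_cpu: stay on 'current' if allowed, otherwise take
 * the next allowed pCPU after it rather than always the lowest one. */
static int pick_cpu(uint64_t affinity, uint64_t online, int current)
{
    return mask_cycle(affinity & online, current);
}
```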
| |
The emacs variable to set the C style from a local variable block is
c-file-style, not c-set-style.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
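For reference, a file-local variables block of the kind this fix concerns, which Emacs scans near the end of a file. The particular values shown (BSD style, 4-column offset, no tabs) are an assumption based on common Xen practice, not taken from this commit.

```c
/*
 * Local variables:
 * mode: C
 * c-file-style: "BSD"
 * c-basic-offset: 4
 * indent-tabs-mode: nil
 * End:
 */
```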
| |
Namely, `short_cont', which is not updated anywhere in the code.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
| |
Mainly for consistency with credit, at least for the events that are
general enough, like vCPU initialization/destruction and calls
to the specific scheduling function.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
| |
struct task_slice.migrated is not initialised by this function, and
subsequently returned by value, leading to the error:
sched_sedf.c: In function ‘sedf_do_extra_schedule’:
sched_sedf.c:711: error: ‘ret.migrated’ may be used uninitialised in this function
for both gcc 4.1.2 and 4.4.3 (the two versions I have readily to hand)
when combined with the -fno-inline compile option.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
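One way to make such a warning impossible, shown on a hypothetical, simplified `struct task_slice`, is to initialise the return value with a designated initializer, because C guarantees that members not named in it are zero-initialised. The commit may instead assign `ret.migrated` explicitly; this is only a sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the scheduler's return type. */
struct task_slice {
    void *task;
    long  time;
    int   migrated;
};

/* Members not named in a designated initializer are zero-initialised,
 * so 'migrated' is always defined when the struct is returned by value. */
static struct task_slice do_extra_schedule(void *task, long time)
{
    struct task_slice ret = { .task = task, .time = time };

    return ret;
}
```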
| |
There are several instances of the same construct finding the cpumask
for a cpupool. Use macros instead.
Signed-off-by: juergen.gross@ts.fujitsu.com
Committed-by: Keir Fraser <keir@xen.org>
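A sketch of the kind of helper macro meant here. It uses a hypothetical 64-bit `cpumask_t`, a stripped-down `struct cpupool` and an illustrative name, `cpupool_cpumask()`. It yields the pool's CPU mask and, as an assumed convention, falls back to all online CPUs when there is no pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cpumask_t;          /* hypothetical: one bit per pCPU */

struct cpupool {
    cpumask_t cpu_valid;             /* pCPUs assigned to this pool */
};

static cpumask_t cpu_online_map = 0xFF;

/* Hypothetical macro replacing the repeated open-coded construct:
 * no pool means "all online pCPUs". */
#define cpupool_cpumask(c) \
    ((c) == NULL ? &cpu_online_map : &(c)->cpu_valid)
```

Like any function-like macro, it evaluates its argument twice, so it should not be given an expression with side effects.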
| |
sched_sedf.c used to have its own mechanism for producing tracing-like
information (domain block, wakeup, etc.). Nowadays, even with a modest
number of pCPUs/vCPUs, just trying to enable it makes the serial
console completely unusable, produces tons of logging that is very
hard to parse and interpret, and can easily livelock Dom0. Moreover,
pretty much the same result that this mechanism struggles to achieve
is better obtained by enabling the scheduler-related tracing events,
as is done for the other schedulers (say, credit or credit2).
For all these reasons, this removes that machinery completely. While
at it, check in some cosmetic changes that harmonize the comments with
each other and with the rest of the code base.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
| |
The main idea is to move (as much as possible) locking logic
from generic code to the various pluggable schedulers.
While at it, the following is also accomplished:
- pausing all the non-current VCPUs of a domain while changing its
scheduling parameters is not effective in avoiding races and it is
prone to deadlock, so that is removed.
- sedf needs a global lock to prevent races while adjusting
domains' scheduling parameters (as credit and credit2 already have),
so that is added.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
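A minimal sketch of a scheduler-private global lock guarding parameter updates, as the commit adds for sedf. `struct sedf_priv`, `struct sedf_dom`, `sedf_adjust()` and the validation rule are hypothetical, and a C11 `atomic_flag` spinlock stands in for Xen's lock primitives.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical scheduler-private data holding the global lock. */
struct sedf_priv {
    atomic_flag lock;
};

/* Hypothetical per-domain parameters. */
struct sedf_dom {
    long period;
    long slice;
};

static void priv_lock(struct sedf_priv *p)
{
    while ( atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire) )
        ;   /* spin */
}

static void priv_unlock(struct sedf_priv *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Apply new parameters under the scheduler's own lock, instead of
 * pausing the domain's other vCPUs. Returns -1 on invalid input. */
static int sedf_adjust(struct sedf_priv *p, struct sedf_dom *d,
                       long period, long slice)
{
    if ( period <= 0 || slice <= 0 || slice > period )
        return -1;   /* hypothetical validation rule */

    priv_lock(p);
    d->period = period;
    d->slice = slice;
    priv_unlock(p);
    return 0;
}
```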
| |
In the sedf scheduler, while trying to guarantee dom0 enough CPU time
to do its job, we end up allocating a 75% CPU share to each and every
one of its VCPUs. This, combined with the fact that the scheduler has
no load balancing logic at all (and thus *all* of dom0's VCPUs just
land and stay on physical CPU #0), means that, for example, on a
12-core system we're trying to use 75x12=900% of what a single PCPU
can provide, which is definitely too much!
Moreover, even if a cleverer mechanism for distributing VCPUs among
the available PCPUs (if not a proper load balancer) is put in place
some day, it will still be difficult to decide how much guaranteed
CPU bandwidth each of dom0's VCPUs should be given without putting
the system at risk of livelock or starvation, even during boot.
Therefore, since sedf is capable of some form of best-effort
scheduling, the best thing we can do is ask for this behaviour for
dom0, as is done for all other domains, right upon creation. It will
then be the sysadmin's job to modify dom0's scheduling parameters to
better fit their particular needs, perhaps after spreading the load a
bit by pinning VCPUs to PCPUs, or using cpupools, etc.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Committed-by: Keir Fraser <keir@xen.org>
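A quick check of the arithmetic against EDF's single-CPU schedulability condition: the total utilisation, i.e. the sum of slice $s_i$ over period $p_i$ for every reservation on that CPU, must not exceed 1. With 12 dom0 VCPUs each reserving 75%, all on PCPU #0:

```latex
U_{\mathrm{PCPU0}} = \sum_{i=1}^{12} \frac{s_i}{p_i} = 12 \times 0.75 = 9 > 1
```

So the guarantees requested on that one PCPU are oversubscribed ninefold, the 900% quoted above.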
| |
When using the sedf scheduler in a cpupool, the system might panic when
setting sedf scheduling parameters for a domain. Introduce a
for_each_domain_in_cpupool macro, as it is now usable in four places.
Add appropriate locking in cpupool_unassign_cpu().
Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Committed-by: Keir Fraser <keir@xen.org>
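A sketch of such an iterator over a hypothetical singly linked `domain_list`; the struct layouts are simplified assumptions. The `{} else` idiom keeps the hidden filtering `if` from capturing an `else` that follows the loop in the caller's code.

```c
#include <assert.h>
#include <stddef.h>

struct cpupool { int id; };

struct domain {
    int domain_id;
    struct cpupool *cpupool;
    struct domain *next_in_list;
};

static struct domain *domain_list;   /* hypothetical global list head */

/* Visit only the domains belonging to pool c. */
#define for_each_domain_in_cpupool(d, c)                              \
    for ( (d) = domain_list; (d) != NULL; (d) = (d)->next_in_list )   \
        if ( (d)->cpupool != (c) ) {} else
```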
| |
This includes the conversion from for_each_cpu_mask() to for_each_cpu().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
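The practical difference is that for_each_cpu() takes a pointer to the mask rather than the mask itself, as in the Linux cpumask API the new interface is modelled on. Below is a simplified sketch with a hypothetical 64-bit `cpumask_t` and helper `cpumask_next_from()`; it is not Xen's implementation.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 64
typedef struct { uint64_t bits; } cpumask_t;   /* hypothetical, 64 CPUs max */

/* Index of the first set bit at or above n, or NR_CPUS if none. */
static int cpumask_next_from(int n, const cpumask_t *mask)
{
    for ( ; n < NR_CPUS; n++ )
        if ( mask->bits & (UINT64_C(1) << n) )
            return n;
    return NR_CPUS;
}

/* Pointer-based iterator in the style of the cpumask_...() interface. */
#define for_each_cpu(cpu, mask)                         \
    for ( (cpu) = cpumask_next_from(0, (mask));         \
          (cpu) < NR_CPUS;                              \
          (cpu) = cpumask_next_from((cpu) + 1, (mask)) )
```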
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
| |
The former is the runtime equivalent of NR_CPUS (and users of NR_CPUS,
where necessary, get adjusted accordingly), while the latter is for the
sole use of determining the allocation size when dynamically allocating
CPU masks (done later in this series).
Adjust accessors to use either of the two to bound their bitmap
operations - which one gets used depends on whether accessing the bits
in the gap between nr_cpu_ids and nr_cpumask_bits is benign but more
efficient.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
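A sketch of how the two bounds might relate. It assumes (this is not stated in the commit) that `nr_cpumask_bits` is `nr_cpu_ids` rounded up to a whole number of `long`s, so word-at-a-time operations may touch the gap bits while allocations stay small. The helpers `setup_cpumask_sizes()` and `alloc_cpumask()` are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG    (CHAR_BIT * (int)sizeof(long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

static unsigned int nr_cpu_ids;       /* runtime equivalent of NR_CPUS */
static unsigned int nr_cpumask_bits;  /* size used for allocations */

/* Assumption: mask size is nr_cpu_ids rounded up to whole longs. */
static void setup_cpumask_sizes(unsigned int cpus)
{
    nr_cpu_ids = cpus;
    nr_cpumask_bits = BITS_TO_LONGS(cpus) * BITS_PER_LONG;
}

/* Allocate a zeroed mask of nr_cpumask_bits bits (caller frees). */
static unsigned long *alloc_cpumask(void)
{
    return calloc(BITS_TO_LONGS(nr_cpumask_bits), sizeof(unsigned long));
}
```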
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
| |
Update the sedf scheduler to be compatible with the most recent
generic scheduler interface changes.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
| |
The CPU masks embedded in these structures prevent NR_CPUS-independent
sizing of these structures.
Basic concept (in xen/include/cpumask.h) taken from recent Linux.
For scalability purposes, many other uses of cpumask_t should be
replaced by cpumask_var_t, particularly local variables of functions.
This implies that no functions should have by-value cpumask_t
parameters, and that the whole old cpumask interface (cpus_...())
should go away in favor of the new (cpumask_...()) one.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
| |
Instead of going with the mpc_config_processor struct: that field only
has 8 bits.
We should not change that struct, because it is shared with mptable.
MAX_APICS also needs to be increased.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Rather than using a fixed value of 512, make this scale with NR_CPUS
(which obviously still doesn't cover all theoretically possible
systems, but at least allows some build time control).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
| |
The current cpupools code interface is a bit inconsistent. This
patch addresses this by making the interaction for each
vcpu in a pool look like this:
alloc_vdata() -- allocates and sets up vcpu data
insert_vcpu() -- the vcpu is ready to run in this pool
remove_vcpu() -- take the vcpu out of the pool
free_vdata() -- delete allocated vcpu data
(Previously, remove_vcpu and free_vdata were combined into a "destroy
vcpu", and insert_vcpu was only called for idle vcpus.)
This also addresses a bug in credit2 which was caused by a
misunderstanding of the cpupools interface.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
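A sketch of how the four hooks could compose when moving a vCPU between pools. The `struct scheduler_ops` subset, the `demo_*` hooks and `move_vcpu()` are hypothetical. Allocating in the new pool before tearing down in the old one is a design choice of this sketch, so that an allocation failure leaves the vCPU where it was.

```c
#include <assert.h>
#include <stdlib.h>

struct vcpu;                       /* opaque in this sketch */

/* Hypothetical per-vCPU scheduler data. */
struct vdata { int inserted; };

/* Hypothetical subset of a scheduler's hook table. */
struct scheduler_ops {
    void *(*alloc_vdata)(struct vcpu *v);
    void  (*insert_vcpu)(struct vcpu *v, void *vdata);
    void  (*remove_vcpu)(struct vcpu *v, void *vdata);
    void  (*free_vdata)(void *vdata);
};

static void *demo_alloc(struct vcpu *v) { (void)v; return calloc(1, sizeof(struct vdata)); }
static void demo_insert(struct vcpu *v, void *d) { (void)v; ((struct vdata *)d)->inserted = 1; }
static void demo_remove(struct vcpu *v, void *d) { (void)v; ((struct vdata *)d)->inserted = 0; }
static void demo_free(void *d) { free(d); }

static const struct scheduler_ops demo_ops = {
    demo_alloc, demo_insert, demo_remove, demo_free
};

/* Move v from 'old' to 'new_': alloc -> remove -> free -> insert. */
static int move_vcpu(struct vcpu *v, void **vdata,
                     const struct scheduler_ops *old,
                     const struct scheduler_ops *new_)
{
    void *nd = new_->alloc_vdata(v);

    if ( nd == NULL )
        return -1;                 /* v stays untouched in the old pool */
    old->remove_vcpu(v, *vdata);
    old->free_vdata(*vdata);
    new_->insert_vcpu(v, nd);
    *vdata = nd;
    return 0;
}
```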
| |
With IRQs getting bound to the CPU that the binding vCPU currently runs
on, quite a bit of extra cross-CPU traffic can result as soon as that
vCPU moves to a different pCPU. Likewise, when a domain re-binds
an event channel associated with a pIRQ, that IRQ's affinity should
also be adjusted.
The open issue is how to break ties for interrupts shared by multiple
domains - currently, the last request (at any point in time) is being
honored.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
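A simplified sketch of the two adjustments described, for a single domain, using hypothetical `struct pirq_binding`, `pirqs_init()`, `vcpu_moved()` and `pirq_rebind()`. IRQs follow their binding vCPU when it migrates, and a re-bind re-targets the IRQ, with the most recent request winning.

```c
#include <assert.h>

#define NR_PIRQS 8

/* Hypothetical: which vCPU each pIRQ's event channel is bound to, and
 * the physical CPU the interrupt is currently steered to. */
struct pirq_binding {
    int bound_vcpu;    /* -1 if unbound */
    int affinity_cpu;  /* pCPU receiving the interrupt */
};

static struct pirq_binding pirqs[NR_PIRQS];

static void pirqs_init(void)
{
    for ( int i = 0; i < NR_PIRQS; i++ )
    {
        pirqs[i].bound_vcpu = -1;
        pirqs[i].affinity_cpu = 0;
    }
}

/* vCPU 'vcpu_id' now runs on 'new_cpu': steer its IRQs along with it
 * so delivery does not need a cross-CPU hop. */
static void vcpu_moved(int vcpu_id, int new_cpu)
{
    for ( int i = 0; i < NR_PIRQS; i++ )
        if ( pirqs[i].bound_vcpu == vcpu_id )
            pirqs[i].affinity_cpu = new_cpu;
}

/* Re-binding also re-targets the IRQ; the last request simply wins. */
static void pirq_rebind(int pirq, int vcpu_id, int vcpu_cpu)
{
    pirqs[pirq].bound_vcpu = vcpu_id;
    pirqs[pirq].affinity_cpu = vcpu_cpu;
}
```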
| |
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
| |
The significant remaining culprits for x86 are credit2, hpet, and
percpu-area subsystems. To be dealt with in a separate patch.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
| |
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
| |
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
| |
Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
| |
...rather than in softirq context. This is expected to avoid a lot of
subtle deadlocks relating to the fact that softirqs can interrupt a
scheduled vcpu.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
| |
Make various data items const or __read_mostly where
possible/reasonable.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
| |
Since the shared info layout is fixed, guests are required to use
VCPUOP_register_vcpu_info prior to booting any vCPU beyond the
traditional limit of 32.
MAX_VIRT_CPUS, being an implementation detail of the hypervisor, is no
longer exposed in the public headers.
The tools changes are clearly incomplete (and done only so things
would build again), and the current state of the tools (using scalar
variables all over the place to represent vCPU bitmaps) very likely
doesn't permit booting DomU-s with more than the traditional number of
vCPU-s. Testing of the extended functionality was done with Dom0 (96
vCPU-s, as well as 128 vCPU-s out of which the kernel elected - by way
of a simple kernel side patch - to use only some, resulting in a
sparse bitmap).
ia64 changes only to make things build, and build-tested only (and the
tools part only as far as the build would go without encountering
unrelated problems in the blktap code).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
| |
This mainly means removing stack variables that (should) depend on
NR_CPUS (other than cpumask_t ones) and adjusting certain array sizes.
There's at least one open tools issue: The 'xm vcpu-pin' path assumes
a maximum of 64 CPU-s in many places.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
| |
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
| |
Signed-off-by: Keir Fraser <keir@xensource.com>
| |
Signed-off-by: Jose Renato Santos <jsantos@hpl.hp.com>
Signed-off-by: Keir Fraser <keir@xensource.com>
| |
In multi-core and multi-threaded systems, not all idling "CPUs" are
equal: When there are idling execution vehicles, it's better to spread
VCPUs across sockets and cores before co-scheduling cores and threads.
Signed-off-by: Emmanuel Ackaouy <ack@xensource.com>
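A sketch of one way to express this preference on a hypothetical 8-CPU topology (2 sockets, 2 cores per socket, 2 threads per core). Among idle CPUs, it picks the one whose socket has the most other idle CPUs, breaking ties by the most idle core siblings. It illustrates the policy only and is not the credit scheduler's actual code.

```c
#include <assert.h>

#define NR_CPUS 8

/* Hypothetical topology: cpu -> core id and cpu -> socket id. */
static const int core_of[NR_CPUS]   = { 0, 0, 1, 1, 2, 2, 3, 3 };
static const int socket_of[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Count other idle CPUs that share group_of[cpu] with 'cpu'. */
static int idle_peers(const int *group_of, const int *idle, int cpu)
{
    int n = 0;

    for ( int c = 0; c < NR_CPUS; c++ )
        if ( c != cpu && idle[c] && group_of[c] == group_of[cpu] )
            n++;
    return n;
}

/* Prefer the idle CPU on the most idle socket, then the most idle core,
 * so work spreads out before sharing a core with a busy sibling.
 * Returns -1 if no CPU is idle. */
static int pick_idle_cpu(const int *idle)
{
    int best = -1, best_sock = -1, best_core = -1;

    for ( int cpu = 0; cpu < NR_CPUS; cpu++ )
    {
        if ( !idle[cpu] )
            continue;

        int s = idle_peers(socket_of, idle, cpu);
        int k = idle_peers(core_of, idle, cpu);

        if ( s > best_sock || (s == best_sock && k > best_core) )
        {
            best = cpu;
            best_sock = s;
            best_core = k;
        }
    }
    return best;
}
```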
| |
fields to already be allocated. This has led to a general cleanup of
domain and vcpu initialisation and destruction.
Signed-off-by: Keir Fraser <keir@xensource.com>
| |
Signed-off-by: Keir Fraser <keir@xensource.com>
| |
Now works for any VCPU, including the caller's VCPU.
By not synchronously pausing the affected VCPU we avoid
any risk of deadlock.
Signed-off-by: Keir Fraser <keir@xensource.com>
| |
1. platform_op -- used by dom0 kernel to perform actions on the
hardware platform (e.g., MTRR access, microcode update, platform
quirks, ...)
2. domctl -- used by management tools to control a specified domain
3. sysctl -- used by management tools for system-wide actions
Benefits include more sensible factoring of actions to
hypercalls. Also allows tool compatibility to be tracked separately
from the dom0 kernel. The assumption is that it will be easier to
replace libxenctrl, libxenguest and Xen as a matched set if the
dom0 kernel does not need to be replaced too (e.g., because that
would require vendor revalidation).
From here on we hope to maintain dom0 kernel compatibility. This
promise is not extended to tool compatibility beyond the existing
guarantee that compatibility will not be broken within a three-level
stable release [3.0.2, 3.0.3, etc.].
Signed-off-by: Keir Fraser <keir@xensource.com>
| |
with for_each_cpu() -- we want to ensure that per_cpu areas are
accessed only for cpus in cpu_possible_map.
Signed-off-by: Keir Fraser <keir@xensource.com>
| |
Signed-off-by: Tristan Gingold <tristan.gingold@bull.net>
| |
Signed-off-by: Steven Hand <steven@xensource.com>
| |
Signed-off-by: Keir Fraser <keir@xensource.com>
| |
Creating a domain no longer creates vcpu0 -- that is now
done later.
Signed-off-by: Keir Fraser <keir@xensource.com>
| |
equally-weighted share of extratime. Also, hack the short-
blocking logic as otherwise domain0 steals all CPU time
for several seconds after waking from a long sleep.
Signed-off-by: Keir Fraser <keir@xensource.com>
| |
1. Move alloc/dealloc routines to domain.[ch]
2. Merge alloc_task/add_vcpu schedops -> init_vcpu
3. Merge free_task/remove_vcpu schedops -> destroy_domain
Signed-off-by: Keir Fraser <keir@xensource.com>
| |
clean up formatting.
Signed-off-by: Keir Fraser <keir@xensource.com>
| |
Remove temporary debug tracing.
Signed-off-by: Keir Fraser <keir@xensource.com>
| |
Signed-off-by: Keir Fraser <keir@xensource.com>
| |
Signed-off-by: Keir Fraser <keir@xensource.com>
| |
Signed-off-by: Keir Fraser <keir@xensource.com>