path: root/xen/common/sched_credit2.c
Commit message / Author / Age / Files / Lines
* scheduler: adjust internal locking interfaceJan Beulich2013-10-141-9/+13
| | | | | | | | | | | | | | | Make the locking functions return the lock pointers, so they can be passed to the unlocking functions (which in turn can check that the lock is still actually providing the intended protection, i.e. the parameters determining which lock is the right one didn't change). Further use proper spin lock primitives rather than open coded local_irq_...() constructs, so that interrupts can be re-enabled as appropriate while spinning. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
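The commit message above describes the new pattern: the locking helper hands back the lock it actually took, so the unlock helper can verify that the mapping from vcpu to lock is still the same. A minimal sketch of that pattern, with made-up names (vcpu_schedule_lock/unlock, per_cpu_schedule_lock, vcpu_processor) and a toy atomic_flag spinlock rather than Xen's real primitives:

    #include <assert.h>
    #include <stdatomic.h>

    typedef struct { atomic_flag busy; } spinlock_t;

    #define NR_CPUS 4
    static spinlock_t per_cpu_schedule_lock[NR_CPUS];  /* illustrative */
    static int vcpu_processor[8];       /* the parameter selecting the lock */

    /* Lock, and hand the caller the lock that was actually taken... */
    static spinlock_t *vcpu_schedule_lock(int vcpu)
    {
        spinlock_t *lock = &per_cpu_schedule_lock[vcpu_processor[vcpu]];

        while ( atomic_flag_test_and_set(&lock->busy) )
            ;                           /* spin */
        return lock;
    }

    /* ...so that unlock can check the vcpu-to-lock mapping didn't change
     * while the lock was held. */
    static void vcpu_schedule_unlock(spinlock_t *lock, int vcpu)
    {
        assert(lock == &per_cpu_schedule_lock[vcpu_processor[vcpu]]);
        atomic_flag_clear(&lock->busy);
    }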
* credit2: replace cpumask_first() usesJan Beulich2013-08-231-8/+10
| | | | | | | | | | | | | ... with cpumask_any() or cpumask_cycle(). In one case this also allows elimination of a cpumask_empty() call, and while doing this I also spotted a redundant use of cpumask_weight(). (When running on big systems, operations on CPU masks aren't cheap enough to use them carelessly.) Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
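To illustrate why cpumask_any()/cpumask_cycle() spread work better than cpumask_first(), here is a rough sketch in which a 64-bit word stands in for a cpumask_t; the helpers are simplified stand-ins, not Xen's implementations, and the "any" picker assumes a non-empty mask:

    #include <stdint.h>
    #include <stdlib.h>

    /* Always the lowest set bit: repeated calls pile work onto low cpus. */
    static int mask_first(uint64_t m)
    {
        return m ? __builtin_ctzll(m) : 64;
    }

    /* The next set bit after 'prev', wrapping around: spreads work out. */
    static int mask_cycle(int prev, uint64_t m)
    {
        uint64_t higher = m & ~((2ULL << prev) - 1);
        return higher ? __builtin_ctzll(higher) : mask_first(m);
    }

    /* A pseudo-randomly chosen set bit (assumes m is non-empty). */
    static int mask_any(uint64_t m)
    {
        int pick = rand() % __builtin_popcountll(m);
        int cpu = mask_first(m);
        while ( pick-- > 0 )
            cpu = mask_cycle(cpu, m);
        return cpu;
    }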
* credit2: Reset until the front of the runqueue is positiveGeorge Dunlap2013-03-111-8/+40
| | | | | | | | | | | | | | | | | | | | | Under normal circumstances, snext->credit should never be less than -CSCHED_MIN_TIMER. However, under some circumstances, a vcpu with low credits may be allowed to run long enough that its credits are actually less than -CSCHED_CREDIT_INIT. (Instances have been observed, for example, where a vcpu with 200us of credit was allowed to run for 11ms, giving it -10.8ms of credit. Thus it was still negative even after the reset.) If this is the case for snext, we simply want to keep moving everyone up until it is in the black again. This is fair because none of the other vcpus want to run at the moment. Rather than loop, just detect how many times we want to add CSCHED_CREDIT_INIT. Try to avoid integer divides and multiplies in the common case. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
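A sketch of the reset logic the message describes -- add CSCHED_CREDIT_INIT as many times as needed in one step, paying for the integer division only in the rare, deeply negative case. The constant value and function name below are illustrative, not credit2's actual code:

    #include <stdint.h>

    #define CSCHED_CREDIT_INIT (10 * 1000 * 1000)    /* illustrative value */

    /* How much credit to hand out so that even a deeply negative snext
     * ends up in the black, without looping. */
    static int64_t reset_amount(int64_t snext_credit)
    {
        int64_t m = 1;                   /* common case: no divide needed */

        if ( snext_credit < -CSCHED_CREDIT_INIT )
            m += -snext_credit / CSCHED_CREDIT_INIT; /* rare: one divide */

        return m * CSCHED_CREDIT_INIT;   /* add this to every vcpu's credit */
    }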
* credit2: Fix erroneous ASSERTGeorge Dunlap2013-03-111-24/+17
| | | | | | | In order to avoid high-frequency cpu migration, vcpus may in fact be scheduled slightly out-of-order. Account for this situation properly. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: track residual from divisions done during accountingGeorge Dunlap2013-03-041-7/+15
| | | | | | | | | | | | This should help with under-accounting of vCPUs running for extremely short periods of time, but becoming runnable again at a high frequency. Don't bother subtracting the residual from the runtime, as it can only ever add up to one nanosecond, and will end up being debited during the next reset interval anyway. Original-patch-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
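The idea of carrying the division residual can be sketched as follows; the structure and names are illustrative stand-ins for credit2's accounting, not the real code:

    #include <stdint.h>

    struct vcpu_acct {
        uint64_t residual;   /* remainder left over from the last division */
    };

    /* Charge a vcpu for delta_ns of runtime; whatever the division would
     * drop is carried in 'residual' and added back on the next pass, so
     * very short, frequent runs are no longer under-accounted. */
    static uint64_t burn_credits(struct vcpu_acct *v, uint64_t delta_ns,
                                 uint64_t weight, uint64_t weight_total)
    {
        uint64_t scaled = delta_ns * weight + v->residual;

        v->residual = scaled % weight_total;
        return scaled / weight_total;
    }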
* credit2: Avoid extra c2t calculation in csched_runtimeGeorge Dunlap2013-03-041-11/+37
| | | | | | | | | | | | | | | | | | | csched_runtime() needs to call the c2t() function to change credits into time. The c2t() function, however, is expensive, as it requires an integer division. c2t() was being called twice, once for the main vcpu's credit and once for the difference between its credit and the next in the queue. But this is unnecessary; by calculating in "credit" first, we can make it so that we just do one conversion later in the algorithm. This also adds more documentation describing the intended algorithm, along with a relevant assertion. The effect of the new code should be the same as the old code. Spotted-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
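A rough sketch of the "convert once" idea, with assumed names and constants rather than the actual credit2 code: compute the permitted runtime in credit space, then pay for the expensive credit-to-time division exactly once before clamping:

    #include <stdint.h>

    #define CSCHED_MIN_TIMER    500000   /* illustrative, in ns */
    #define CSCHED_MAX_TIMER  10000000   /* illustrative, in ns */

    static int64_t c2t(int64_t credit, int64_t rate)
    {
        return credit / rate;            /* the expensive integer division */
    }

    static int64_t csched_runtime(int64_t snext_credit, int64_t next_credit,
                                  int64_t rate)
    {
        /* Run until snext exhausts its credit... */
        int64_t credit = snext_credit;

        /* ...or until it would drop to the level of the next vcpu in the
         * runqueue, whichever comes first -- all computed in credit space. */
        if ( next_credit > 0 && next_credit < credit )
            credit -= next_credit;

        /* One conversion, then clamp to the timer bounds. */
        int64_t time = c2t(credit, rate);
        if ( time < CSCHED_MIN_TIMER ) time = CSCHED_MIN_TIMER;
        if ( time > CSCHED_MAX_TIMER ) time = CSCHED_MAX_TIMER;
        return time;
    }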
* xen: tracing: introduce per-scheduler trace event IDsDario Faggioli2012-12-181-12/+16
| | | | | | | | | | | | | | | | | | | | | So that it becomes possible to create scheduler-specific trace records, within each scheduler, without worrying about overlaps, and also without giving up being able to recognise them unambiguously. The latter is deemed useful, since we can have more than one scheduler running at the same time, thanks to cpupools. The event ID is 12 bits, and this change uses the upper 3 of them for the 'scheduler ID'. This means we're limited to 8 schedulers and to 512 scheduler-specific tracing events. Both seem reasonable limitations as of now. This also converts the existing credit2 tracing (the only scheduler generating tracing events up to now) to the new system. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Committed-by: Keir Fraser <keir@xen.org>
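The layout described (12-bit event IDs, top 3 bits naming the scheduler, 9 bits left for 512 per-scheduler events) could be expressed with macros along these lines; the macro names are illustrative, not the ones actually added to Xen:

    /* 12-bit event ID: [ scheduler ID : 3 bits ][ event : 9 bits ] */
    #define TRC_SCHED_ID_BITS    3
    #define TRC_SCHED_EVT_BITS   9
    #define TRC_SCHED_ID_SHIFT   TRC_SCHED_EVT_BITS

    #define TRC_SCHED_CLASS_EVT(sched_id, evt) \
        (((sched_id) << TRC_SCHED_ID_SHIFT) |  \
         ((evt) & ((1 << TRC_SCHED_EVT_BITS) - 1)))

    /* e.g. a hypothetical scheduler ID of 2 and event 1 gives 0x401. */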
* xen: sched: introduce a couple of counters in credit2 and SEDFDario Faggioli2012-10-231-0/+5
| | | | | | | | | | Mainly for consistency with credit, at least for the events that are general enough, like vCPU initialization/destruction and calls to the specific scheduling function. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Committed-by: Keir Fraser <keir@xen.org>
* xen, credit2: Put the per-cpu schedule lock back to the default lock when releasing cpuGeorge Dunlap2012-04-101-0/+6
| | | | | | | | | | This fixes a bug that happens when you remove cpus from a credit2 cpupool. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Committed-by: Keir Fraser <keir@xen.org>
* introduce and use common macros for selecting cpupool based cpumasksJuergen Gross2012-01-241-2/+0
| | | | | | | | There are several instances of the same construct finding the cpumask for a cpupool. Use macros instead. Signed-off-by: juergen.gross@ts.fujitsu.com Committed-by: Keir Fraser <keir@xen.org>
* A collection of fixes to Xen common filesStefano Stabellini2012-01-231-6/+0
| | | | | | | | | | | | | | | | - call free_xenoprof_pages only ifdef CONFIG_XENOPROF; - define PRI_stime as PRId64 in an header file; - respect boundaries in is_kernel_*; - implement is_kernel_rodata; - guest_physmap_add_page should be ((void)0). - fix guest_physmap_add_page; - introduce CONFIG_XENOPROF; - define _srodata and _erodata as const char*. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
* Rework locking for sched_adjust.Dario Faggioli2012-01-041-10/+11
| | | | | | | | | | | | | | | | | The main idea is to move (as much as possible) locking logic from generic code to the various pluggable schedulers. While at it, the following is also accomplished: - pausing all the non-current VCPUs of a domain while changing its scheduling parameters is not effective in avoiding races and it is prone to deadlock, so that is removed. - sedf needs a global lock for preventing races while adjusting domains' scheduling parameters (as it is for credit and credit2), so that is added. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Committed-by: Keir Fraser <keir@xen.org>
* Replace references to old wiki with ones to new.Ian Campbell2011-11-291-1/+1
| | | | | | | | | | | | | | | | | | | | I have confirmed that the relevant pages have been transitioned. What remains is pages which have not yet been moved over: $ rgrep xenwiki * tools/libxen/README:http://wiki.xensource.com/xenwiki/XenApi tools/xenballoon/xenballoond.README:http://wiki.xensource.com/xenwiki/Open_Topics_For_Discussion?action=AttachFile&do=get&target=Memory+Overcommit.pdf Note that "PythonInXlConfig" never existed in the old wiki and does not exist in the new. This reference was introduced by 22735:cb94dbe20f97 and was supposed to have been written prior to the 4.1 release. I have transitioned it anyway but it's not clear how valuable the message actually is. Perhaps we should just remove that aspect of it? Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson.citrix.com> Acked-by: Ian Jackson <ian.jackson.citrix.com>
* eliminate first_cpu() etcJan Beulich2011-11-081-8/+8
| | | | | This includes the conversion from for_each_cpu_mask() to for_each_cpu(). Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
* eliminate cpu_clear()Jan Beulich2011-11-081-6/+6
| | | | | | Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
* eliminate cpu_set()Jan Beulich2011-11-081-5/+5
| | | | | | Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
* eliminate cpu_test_xyz()Jan Beulich2011-11-081-10/+10
| | | | | | Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
* eliminate cpus_xyz()Jan Beulich2011-11-081-10/+10
| | | | | | Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
* cpupools: allocate CPU masks dynamicallyJan Beulich2011-10-211-1/+1
| | | | | Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* allocate CPU sibling and core maps dynamicallyJan Beulich2011-10-211-2/+2
| | | | | | | | | | ... thus reducing the per-CPU data area size back to one page even when building for large NR_CPUS. At once eliminate the old __cpu{mask,list}_scnprintf() helpers. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* eliminate cpumask accessors referencing NR_CPUSJan Beulich2011-10-211-6/+6
| | | | | | | ... in favor of using the new, nr_cpumask_bits-based ones. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* introduce and use nr_cpu_ids and nr_cpumask_bitsJan Beulich2011-10-211-1/+1
| | | | | | | | | | | | | | | The former is the runtime equivalent of NR_CPUS (and users of NR_CPUS, where necessary, get adjusted accordingly), while the latter is for the sole use of determining the allocation size when dynamically allocating CPU masks (done later in this series). Adjust accessors to use either of the two to bound their bitmap operations - which one gets used depends on whether accessing the bits in the gap between nr_cpu_ids and nr_cpumask_bits is benign but more efficient. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* use xzalloc in common codeJan Beulich2011-10-041-6/+3
| | | | | Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
* xen: Remove some initialised but otherwise unused variables.Olaf Hering2011-05-201-3/+1
| | | | | | Fixes the build under gcc-4.6 -Werror=unused-but-set-variable Signed-off-by: Olaf Hering <olaf@aepfle.de>
* credit2: add a callback to migrate to a new cpuGeorge Dunlap2011-04-271-19/+17
| | | | | | | | | | | | | | In credit2, there needs to be a strong correlation between v->processor and the runqueue to which a vcpu is assigned; much of the code relies on this invariant. Allow credit2 to manage the actual migration itself. This fixes the most recent credit2 bug reported on the list (Xen BUG at sched_credit2.c:1606) in Xen 4.1, as well as the bug at sched_credit2.c:811 in -unstable (which catches the same condition earlier). Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* move register_cpu_notifier() into .init.textJan Beulich2011-04-021-3/+10
| | | | | | | With no modular drivers, all CPU notifier setup is supposed to happen during boot. There also is a respective comment in the function. Signed-off-by: Jan Beulich <jbeulich@novell.com>
* credit2: remove two nested functions, replacing them with static onesTim Deegan2011-03-071-98/+108
| | | | | | | | and passing the outer scope's variables in explicitly. This is needed to compile xen with clang. Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
* credit2: Add more debuggingGeorge Dunlap2011-03-031-0/+26
| | | | | Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org>
* credit2: Fix x86_32 build.Keir Fraser2010-12-241-1/+1
| | | | Signed-off-by: Keir Fraser <keir@xen.org>
* credit2: On debug keypress print load average as a fractionKeir Fraser2010-12-241-2/+8
| | | | Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Different unbalance tolerance for underloaded and overloaded queuesKeir Fraser2010-12-241-4/+30
| | | | | | | | | | Allow the "unbalance tolerance" -- the amount of difference between two runqueues that will be allowed before rebalancing -- to differ depending on how busy the runqueue is. If it's less than 100%, default to a difference of 1.0; if it's more than 100%, default to a tolerance of 0.125. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
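The load-dependent tolerance can be sketched like this, using an assumed fixed-point representation of the load average (names and precision are illustrative):

    #include <stdint.h>

    #define LOADAVG_ONE  (1u << 10)      /* 1.0 in an assumed fixed point */

    /* How much of a load difference we put up with before rebalancing. */
    static uint32_t balance_tolerance(uint32_t loadavg)
    {
        return ( loadavg < LOADAVG_ONE ) /* runqueue less than 100% busy? */
               ? LOADAVG_ONE             /* underloaded: tolerate 1.0     */
               : LOADAVG_ONE / 8;        /* overloaded:  tolerate 0.125   */
    }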
* credit2: Introduce a loadavg-based load balancerKeir Fraser2010-12-241-0/+213
| | | | | | | | | | | | | | | | | | | | | | | This is a first-cut at getting load balancing. I'm first working on looking at behavior I want to get correct; then, once I know what kind of behavior works well, then I'll work on getting it efficient. The general idea is when balancing runqueues, look for the runqueue whose loadavg is the most different from ours (higher or lower). Then, look for a transaction which will bring the loads closest together: either pushing a vcpu, pulling a vcpu, or swapping them. Use the per-vcpu load to calculate the expected load after the exchange. The current algorithm looks at every combination, which is O(N^2). That's not going to be suitable for workloads with large numbers of vcpus (such as highly consolidated VDI deployments). I'll make a more efficient algorithm once I've experimented and determined what I think is the best load-balancing behavior. At the moment, balance from a runqueue every time the credit resets. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
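A sketch of the O(N^2) "best transaction" search described above: for a pair of runqueues, consider pushing one of our vcpus, pulling one of theirs, or swapping a pair, and keep whichever choice brings the two load averages closest together. Structure layout, array bounds, and names are all illustrative, not the actual credit2 code:

    #include <stdint.h>
    #include <stdlib.h>

    struct rq {
        int64_t avgload;      /* runqueue load average           */
        int64_t vload[64];    /* per-vcpu load contributions     */
        int     nr;           /* number of vcpus on the runqueue */
    };

    /* Expected |load difference| if 'a' gives 'give' to 'b' and takes 'take'. */
    static int64_t delta_after(const struct rq *a, const struct rq *b,
                               int64_t give, int64_t take)
    {
        return llabs((a->avgload - give + take) - (b->avgload + give - take));
    }

    /* Try every push / pull / swap combination (O(N^2)) and remember the
     * one that brings the two load averages closest together. */
    static void pick_transaction(const struct rq *ours, const struct rq *theirs,
                                 int *push, int *pull)
    {
        int64_t best = delta_after(ours, theirs, 0, 0);   /* doing nothing */

        *push = *pull = -1;
        for ( int i = -1; i < ours->nr; i++ )             /* -1: push nothing */
            for ( int j = -1; j < theirs->nr; j++ )       /* -1: pull nothing */
            {
                int64_t give = (i >= 0) ? ours->vload[i] : 0;
                int64_t take = (j >= 0) ? theirs->vload[j] : 0;
                int64_t d = delta_after(ours, theirs, give, take);

                if ( d < best )
                {
                    best = d;
                    *push = i;
                    *pull = j;
                }
            }
    }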
* credit2: Use loadavg to pick cpus, instead of instantaneous loadKeir Fraser2010-12-241-10/+23
| | | | Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Migrate request infrastructureKeir Fraser2010-12-241-3/+35
| | | | | | | | | Put in infrastructure to allow a vcpu to request to migrate to a specific runqueue. This will allow a load balancer to choose running VMs to migrate, and know they will go where expected when the VM is descheduled. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Track expected loadKeir Fraser2010-12-241-1/+14
| | | | | | | | | | As vcpus are migrated, track how we expect the load to change. This helps smooth migrations when the balancing doesn't take immediate effect on the load average. In theory, if vcpu activity remains constant, then the measured avgload should converge to the balanced avgload. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Track average load contributed by a vcpuKeir Fraser2010-12-241-14/+77
| | | | | | | Track the amount of load contributed by a particular vcpu, to help us make informed decisions about what will happen if we make a move. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Calculate load averageKeir Fraser2010-12-241-2/+50
| | | | | | Calculate a per-runqueue decaying load average. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
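A decaying load average of the kind described can be sketched as an exponentially weighted moving average; the fixed-point shift and decay window below are assumptions, not credit2's actual parameters:

    #include <stdint.h>

    #define LOADAVG_SHIFT       10    /* fixed point: 1.0 == 1 << 10 */
    #define LOADAVG_WINDOW_LOG   5    /* decay by 1/32 per update    */

    /* Pull the average a fraction of the way toward the instantaneous
     * load each time it is sampled: new = old + (inst - old) / 32
     * (the signed right shift is taken to be arithmetic here). */
    static int64_t update_loadavg(int64_t avgload, unsigned int nr_runnable)
    {
        int64_t inst = (int64_t)nr_runnable << LOADAVG_SHIFT;

        return avgload + ((inst - avgload) >> LOADAVG_WINDOW_LOG);
    }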
* credit2: Detect socket layout and assign one runqueue per socketKeir Fraser2010-12-241-3/+66
| | | | | | | | | | Because alloc_pdata() is called before the cpu layout information is available, we grab a callback to the newly-created CPU_STARTING notifier. cpu 0 doesn't get a callback, so we simply hard-code it to runqueue 0. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Simple cpu picker based on instantaneous loadKeir Fraser2010-12-241-8/+69
| | | | | | | | In preparation for multiple runqueues, add a simple cpu picker that will look for the runqueue with the lowest instantaneous load to assign the vcpu to. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Calculate instantaneous runqueue loadKeir Fraser2010-12-241-2/+34
| | | | | | | | Add hooks in the various places to detect vcpus becoming active or inactive. At the moment, record only instantaneous runqueue load; but this lays the groundwork for having a load average. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Handle runqueue changesKeir Fraser2010-12-241-9/+114
| | | | | | | | | | | | In preparation for cross-runqueue migration, make changes to make that more robust. Changes include: * An up-pointer from the svc struct to the runqueue it's assigned to * Explicit runqueue assigns/de-assigns, with appropriate ASSERTs * cpu_pick will de-assign a vcpu from a runqueue if it's migrating, and wake will re-assign it Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Refactor runqueue initializationKeir Fraser2010-12-241-51/+142
| | | | | | | | | | | | | | | | | Several refactorings: * Add prv->initialized cpu mask * Replace prv->runq_count with active_queue mask * Replace rqd->cpu_min,cpu_max with active cpu mask * Put locks in the runqueue structure, rather than borrowing the existing cpu locks * init() initializes all runqueues to NULL, inactive, and maps all pcpus to runqueue -1 * alloc_pcpu() will add cpus to runqueues, "activating" the runqueue if necessary. All cpus are currently assigned to runqueue 0. End-to-end behavior of the system should remain largely the same. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Quieten some debug messagesKeir Fraser2010-12-241-2/+2
| | | | Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Implement cross-L2 migration "resistance"Keir Fraser2010-12-101-54/+96
| | | | | | | Resist moving a vcpu from one L2 to another unless its credit is past a certain amount. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Don't migrate cpus unnecessarilyKeir Fraser2010-12-101-1/+18
| | | | | | | | | | Modern processors have one or two L3's per socket, and generally only one core per L2. Credit2's design relies on having credit shared across several cores. So, as a first step toward sharing a queue across an L3 while avoiding excessive cross-L2 migration: if the vcpu's current cpu is acceptable, just run it there. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Trace rdtsc value with some more recordsKeir Fraser2010-12-101-3/+3
| | | | | | Makes the traces slightly larger, but easier to analyze. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Use vcpu processor explicitlyKeir Fraser2010-12-101-10/+7
| | | | | | No functional changes. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Putting a vcpu to sleep also removes the delayed_runq_add flagKeir Fraser2010-12-101-0/+2
| | | | Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Trace and debug key tweaksKeir Fraser2010-10-291-4/+32
| | | | | | | | * Add traces for credit reset and scheduling a tasklet * Remove tsc for traces which probably don't need them * Print domain info in the debug dump Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
* credit2: Fix runq_tickle to use idle, tickled masksKeir Fraser2010-10-291-48/+70
| | | | | | | | | | | | | | | | | | | | | | | | | The previous code had a bug where, if a second vcpu woke up and ran runq_tickle before the first vcpu had actually started running on a tickled processor, the code would choose the same cpu to tickle, and the result would be a "missed tickle" -- only one of the vcpus would run, despite having idle processors. This change: * Keeps a mask of idle cpus * Keeps a mask of cpus which have been tickled, but not entered schedule yet. The new tickle code first looks at the set of idle-but-not-tickled cpus; if it's not empty, it tickles one. If that doesn't work, it looks at the set of not-idle-but-not-tickled cpus, finds the one with the lowest credit; if that's lower than the waking vcpu's credit, it tickles that one. If any cpu is tickled, its bit in the tickled mask is set, and cleared when schedule() is called. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
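A simplified sketch of the selection order described above, with 64-bit words standing in for cpumask_t and made-up structure and field names: prefer an idle, not-yet-tickled cpu; failing that, pick the busy, not-yet-tickled cpu running the lowest-credit vcpu, and tickle it only if that credit is below the waking vcpu's:

    #include <stdint.h>

    struct rqd {
        uint64_t idle;        /* cpus currently idle                         */
        uint64_t tickled;     /* cpus tickled but not yet through schedule() */
        int64_t  credit[64];  /* credit of the vcpu running on each cpu      */
    };

    static int runq_tickle(struct rqd *rqd, int64_t new_credit)
    {
        uint64_t idle_untickled = rqd->idle & ~rqd->tickled;
        int cpu = -1;

        if ( idle_untickled )
            cpu = __builtin_ctzll(idle_untickled);
        else
        {
            int64_t lowest = new_credit;

            for ( int i = 0; i < 64; i++ )
            {
                uint64_t bit = 1ULL << i;
                if ( !(rqd->idle & bit) && !(rqd->tickled & bit) &&
                     rqd->credit[i] < lowest )
                {
                    lowest = rqd->credit[i];
                    cpu = i;
                }
            }
        }

        if ( cpu >= 0 )
            rqd->tickled |= 1ULL << cpu;  /* cleared when schedule() runs there */
        return cpu;
    }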