| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
... as being intended to be faster than MSR reads/writes.
In the case of emulate_privileged_op() also use these in favor of the
cached (but possibly stale) addresses from arch.pv_vcpu. This allows
entirely removing the code that was the subject of XSA-67.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
| |
Also clean up some cases of misused/opencoded ENTRY()
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
| |
... according to their most recent public documentation.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It using map_domain_page() was entirely wrong. Use __acpi_map_table()
instead for the time being, with locking added as the mappings it
produces get replaced with subsequent invocations. Using locking in
this way is acceptable here since the only two runtime callers are
acpi_os_{read,write}_memory(), which don't leave mappings pending upon
returning to their callers.
Also fix __acpi_map_table()'s first parameter's type - while benign for
unstable, backports to pre-4.3 trees will need this.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Dom0 uses this hypercall to pass ACPI information to Xen. It is not very
uncommon for more cpus to be listed in the ACPI tables than are present on the
system, particularly on systems with a common BIOS for a 2 and 4 socket server
varients.
As Dom0 does not control the number of entries in the ACPI tables, and is
required to pass everything it finds to Xen, change the logging.
There is now an single unconditional warning for the first unknown ID, and
further warnings if "cpuinfo" is requested by the user on the command line.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When in SYS_STATE_suspend, and going through the cpu_disable_scheduler
path, save a copy of the current cpu affinity, and mark a flag to
restore it later.
Later, in the resume process, when enabling nonboot cpus restore these
affinities.
Signed-off-by: Ben Guthro <benjamin.guthro@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the Linux kernel, these two git commits:
- f594065faf4f9067c2283a34619fc0714e79a98d
ACPI: Add fixups for AMD P-state figures
- 9855d8ce41a7801548a05d844db2f46c3e810166
ACPI: Check MSR valid bit before using P-state frequencies
Try to fix the the issue that "some AMD systems may round the
frequencies in ACPI tables to 100MHz boundaries. We can obtain the real
frequencies from MSRs, so add a quirk to fix these frequencies up
on AMD systems." (from f594065..)
In discussion (around 9855d8..) "it turned out that indeed real
HW/BIOSes may choose to not set the valid bit and thus mark the
P-state as invalid. So this could be considered a fix for broken
BIOSes." (from 9855d8..)
which is great for Linux. Unfortunatly the Linux kernel, when
it tries to do the RDMSR under Xen it fails to get the right
value (it gets zero) as Xen traps it and returns zero. Hence
when dom0 uploads the P-states they will be unmodified and
we should take care of updating the frequencies with the right
values.
I've tested it under Dell Inc. PowerEdge T105 /0RR825, BIOS 1.3.2
08/20/2008 where this quirk can be observed (x86 == 0x10, model == 2).
Also on other AMD (x86 == 0x12, A8-3850; x86 = 0x14, AMD E-350) to
make sure the quirk is not applied there.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: stefan.bader@canonical.com
Do the MSR access here (and while at it, also the one reading
MSR_PSTATE_CUR_LIMIT) on the target CPU, and bound the loop over
amd_fixup_frequency() by max_hw_pstate (matching the one in
powernow_cpufreq_cpu_init()).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note that this also fixes a broken input check in acpi_enter_sleep()
(previously validating the sleep->pm1[ab]_cnt_val relationship based
on acpi_sinfo.pm1b_cnt_val, which however gets set only subsequently).
Also adjust a few minor issues with the pre-v5 handling in
acpi_fadt_parse_sleep_info().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If HW-reduced flag is set in the FADT, do not attempt to access
or initialize any ACPI hardware, including SCI and global lock.
No FACS will be present.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Also adjust acpi_fadt_parse_sleep_info().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Xen hypervisor has two basic access control function calls:
IS_PRIV and the xsm_* functions. Most privileged operations currently
require that both checks succeed, and many times the checks are at
different locations in the code. This patch eliminates the explicit
and implicit IS_PRIV checks that are duplicated in XSM hooks.
When XSM_ENABLE is not defined or when the dummy XSM module is used,
this patch should not change any functionality. Because the locations
of privilege checks have sometimes moved below argument validation,
error returns of some functions may change from EPERM to EINVAL or
ESRCH if called with invalid arguments and from a domain without
permission to perform the operation.
Some checks are removed due to non-obvious duplicates in their
callers:
* acpi_enter_sleep is checked in XENPF_enter_acpi_sleep
* map_domain_pirq has IS_PRIV_FOR checked in its callers:
* physdev_map_pirq checks when acquiring the RCU lock
* ioapic_guest_write is checked in PHYSDEVOP_apic_write
* PHYSDEVOP_{manage_pci_add,manage_pci_add_ext,pci_device_add} are
checked by xsm_resource_plug_pci in pci_add_device
* PHYSDEVOP_manage_pci_remove is checked by xsm_resource_unplug_pci
in pci_remove_device
* PHYSDEVOP_{restore_msi,restore_msi_ext} are checked by
xsm_resource_setup_pci in pci_restore_msi_state
* do_console_io has changed to IS_PRIV from an explicit domid==0
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since these interrupts get setup before APs get brought online, their
affinities naturally could only ever point to CPU 0 alone so far.
Adjust this to include potentially multiple CPUs in the target mask
(when running in one of the cluster modes), and take into account NUMA
information (to handle the interrupts on a CPU on the node where the
respective IOMMU is).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the image pointed to may live in boot services memory (which we
add to the global memory pool long before ACPI tables get looked at),
we should prevent Dom0 from trying to retrieve the image data in that
case.
The alternatives would be to
- not add boot services memory to the global pool at all, or
- defer adding boot services memory until Dom0 indicates it is safe to
do so, or
- find and parse the BGRT table in xen/arch/x86/efi/boot.c, and avoid
adding that specific region to the E820 table.
None of these are really attractive, and as Xen commonly prints to the
video console anyway (without trying to avoid any regions on the
screen), the invalidation would need to be done conditionally anyway.
(xen/include/acpi/actbl3.h is a verbatim copy from Linux 3.7-rc4)
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
| |
It has never been used for anything, and Linux 3.7 doesn't propagate
this information anymore.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
| |
This requires some additions to the VT-d side; AMD IOMMUs use the
"normal" MSI message format even when interrupt remapping is enabled,
thus making adjustments here unnecessary.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Xiantao Zhang<xiantao.zhang@intel.com>
|
|
|
|
|
|
|
| |
And some initial Haswell ones at once.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: "Nakajima, Jun" <jun.nakajima@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Secondary CPUs, between doing their final memory writes (particularly
updating cpu_initialized) and getting a subsequent INIT, may not write
back all modified data. The INIT itself then causes those modifications
to be lost, so in the cpu_initialized case the CPU would find itself
already initialized, (intentionally) entering an infinite loop instead
of actually coming online.
Signed-off-by: Ben Guthro <ben@guthro.net>
Make acpi_dead_idle() call default_dead_idle() rather than duplicating
the logic there.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Committed-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
| |
lapic_timer_{on,off} need to get initialized in this case. This in turn
requires getting HPET broadcast setup to be carried out earlier (and
hence preventing double initialization there).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Newer VIA CPUs have both 64-bit and VMX support. Enable them to be
recognized for these purposes, at once stripping off any 32-bit CPU
only bits from the respective CPU support file, and adding 64-bit ones
found in recent Linux.
This particularly implies untying the VMX == Intel assumption in a few
places.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
| |
Performance is not an issue with printk(), so let the function do
minimally more work and instead save a byte per affected format
specifier.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
| |
This is a port of Linux'es intel-idle driver serving the same purpose.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
| |
... and code used only for initializing it.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
| |
If the legacy APIC invocation of acpi_table_parse_madt() succeeds but
the x2APIC counterpart fails, this is regarded as failure by the
function, yet its return value would indicate success.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
DOMAIN_COORD_TYPE_HW_ALL
When _PSD's CoordType is DOMAIN_COORD_TYPE_HW_ALL (i.e. shared_type is
CPUFREQ_SHARED_TYPE_HW) which most often is the case on servers, there
is no reason to go into on_selected_cpus() code, we call call
transition_pstate() directly.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
| |
First of all, when no ACPI Cx data was reported, make sure the usage
count passed back to user mode is not random.
Besides that, fold a lot of redundant code.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When command to modify turbo mode (CPB on AMD processors) comes
in the actual change happens later, when P-state transition is
requested. There is no time limit on when this transition will
occur and therefore change in CPB state may take long time from
the moment when command to toggle it is issued.
This patch makes CPB mode change happen immediately when request
is made.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
If CPU stays in the same idle state for the full duration of
xenpm sample then average residency may not be reported correctly
since usage counter will not be incremented.
In addition, in order to calculate averages correctly residence
time and usage counter should be read and written atomically.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
| |
Nor is there a need to disable bus master arbitration in that case.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Modified-by: Zhang, Yang Z <yang.z.zhang@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
|
| |
Use it to replace x86-specific early_boot boolean variable.
Also use it to detect suspend/resume case during cpu offline/online
to avoid unnecessarily breaking vcpu and cpupool affinities.
Signed-off-by: Keir Fraser <keir@xen.org>
Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Calling XENPF_set_processor_pminfo with XEN_PM_CX could cause states
array in "struct acpi_processor_power" to exceed its limit.
The array used to be reset (by function cpuidle_init_cpu()) for each
hypercall. The patch puts it back that way and adds an assertion to
make it clear in case that happens again.
Signed-off-by: Eric Chanudet <eric.chanudet@eu.citrix.com>
- convert assertion to printk() & bail
- eliminate struct acpi_processor_cx's valid member (not read anymore)
- further adjustments to one-time-only vs each-time operations in
cpuidle_init_cpu()
- don't use ACPI_STATE_Cn as array index anymore
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
| |
applied already
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
| |
Currently when a core is taken off-line it is placed in C1 state
(unless MONITOR/MWAIT is used). This patch allows a core to go to
deeper C states resulting in significantly higher power savings.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
| |
Nothing, not even Dom0, should fiddle with this.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
| |
... just to make sure it doesn't get used improperly (resulting from
the discussion around what became c/s 24968:8964c223836c).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
| |
Currently when a core is taken off-line it is placed in C1 state (unless
MONITOR/MWAIT is used). This patch allows a core to go to deeper C states
resulting in significantly higher power savings.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
| |
This is model 0x3a (decimal 58) as per the most recent SDM.
In vPMU code, also add a forgotten earlier model.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Xiantao Zhang<xiantao.zhang@intel.com>
|
|
|
|
| |
Signed-off-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
| |
This was always bogus (xen/config.h should have been used instead) and
is superfluous now that xen/config.h gets included through the compiler
command line.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
| |
Use their proper counterparts in include/acpi/actbl*.h instead.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
| |
This is in accordance with
http://software.intel.com/en-us/articles/intel-processor-identification-with-cpuid-model-and-family-numbers/
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Haitao Shan <maillists.shan@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
So far these messages got pointlessly (as the code in other places
assumes symmetric configuration) emitted once per CPU. Hide the debug
one behind opt_cpu_info, and issue the info one just once (if the code
gets adjusted to support assymtric configurations, this would need to
be revisited, but ideally without producing per-CPU messages again).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
| |
Fold redundant code and eliminate pointless casts.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
| |
This includes the conversion from for_each_cpu_mask() to for_each-cpu().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
| |
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
|
|
|
|
|
|
|
|
| |
struct cpufreq_policy, including a cpumask_t member, gets copied in
cpufreq_limit_change(), cpufreq_add_cpu(), set_cpufreq_gov(), and
set_cpufreq_para(). Make the member a cpumask_var_t, thus reducing the
amount of data needing copying (particularly with large NR_CPUS).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
| |
c/s 20361:51b031b0737e removed the writing of struct
processor_performance's shared_cpu_map member, but the powernow driver
still has code to read it (though presumably that code path can't be
taken on actual hardware supported by the powernow driver). Remove the
use of the field along with the field itself.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
|
|
|
|
|
|
|
|
|
| |
Sandy bridge introduces new MSR to get cc7/pc2 residency (core C-state
7/package C-state 2). Print the cc7/pc2 residency when on sandy bridge
platform.
Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Committed-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use cpumask_copy() instead of direct variable assignments for copying
CPU masks. While direct assignments are not a problem when both sides
are variables actually defined as cpumask_t (except for possibly
copying *much* more than would actually need to be copied), they must
not happen when the original variable is of type cpumask_var_t (which
may have lass space allocated to it than a full cpumask_t). Eliminate
as many of such assignments as possible (in several cases it's even
possible to collapse two operations [copy then clear one bit] into one
[cpumask_andnot()]), and thus set the way for reducing the allocation
size in alloc_cpumask_var().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
| |
... in favor of using the new, nr_cpumask_bits-based ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The former is the runtime equivalent of NR_CPUS (and users of NR_CPUS,
where necessary, get adjusted accordingly), while the latter is for the
sole use of determining the allocation size when dynamically allocating
CPU masks (done later in this series).
Adjust accessors to use either of the two to bound their bitmap
operations - which one gets used depends on whether accessing the bits
in the gap between nr_cpu_ids and nr_cpumask_bits is benign but more
efficient.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
|