path: root/xen/include
* spinlock: ensure the flags parameter is wide enough [staging] (Andrew Cooper, 2013-10-22; 1 file, -3/+15)
  Because of the construction of spin_lock_irq() (and variants), the
  flags parameter could be truncated. Use a BUILD_BUG_ON() to verify the
  width of the parameter.
  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
  Acked-by: Ian Campbell <ian.campbell@citrix.com>
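  A minimal sketch of the compile-time check (Xen already provides
  BUILD_BUG_ON() and _spin_lock_irqsave(); the macro bodies here are
  illustrative, not the exact committed ones):

      /* Reject a too-narrow 'flags' variable at compile time, before
       * spin_lock_irqsave() can silently truncate the saved EFLAGS. */
      #define BUILD_BUG_ON(cond) ((void)sizeof(struct { int : -!!(cond); }))

      #define spin_lock_irqsave(lock, flags) do {                       \
          BUILD_BUG_ON(sizeof(flags) != sizeof(unsigned long));         \
          (flags) = _spin_lock_irqsave(lock);                           \
      } while ( 0 )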
* x86/irq: local_irq_restore() should not blindly popf (Andrew Cooper, 2013-10-22; 1 file, -3/+8)
  local_irq_restore() should only be concerned with possibly changing
  the interrupt flag. A blind popf could corrupt other system flags.
  While playing in this area, fix up an open-coded use of X86_EFLAGS_IF.
  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
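  A sketch of the idea on x86-64 (X86_EFLAGS_IF is the real flag value;
  the committed asm may differ in detail): rebuild the flags image from
  the live EFLAGS and take only IF from the saved value, instead of
  popping the saved value wholesale.

      #define X86_EFLAGS_IF 0x00000200UL  /* interrupt enable flag */

      static inline void local_irq_restore(unsigned long flags)
      {
          /* Merge only IF from the saved flags into the current EFLAGS;
           * every other flag keeps its live value. */
          asm volatile ( "pushfq\n\t"
                         "andq %0, (%%rsp)\n\t"
                         "orq  %1, (%%rsp)\n\t"
                         "popfq"
                         : : "i" (~X86_EFLAGS_IF),
                             "r" (flags & X86_EFLAGS_IF)
                         : "memory" );
      }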
* xen: arm: Emacs style fix (Wei Liu, 2013-10-16; 1 file, -1/+1)
  Signed-off-by: Wei Liu <wei.liu2@citrix.com>
  Acked-by: Ian Campbell <ian.campbell@citrix.com>
* xen/evtchn: Fix build on ARM (Julien Grall, 2013-10-15; 1 file, -0/+1)
  The recent event channel changes introduced by commit a77eb86 and
  before... break compilation of Xen on ARM. This commit adds missing
  includes in common/event_fifo.c and include/xen/sched.h.
  Signed-off-by: Julien Grall <julien.grall@linaro.org>
  Acked-by: Ian Campbell <ian.campbell@citrix.com>
* Add DOMCTL to limit the number of event channels a domain may use (David Vrabel, 2013-10-14; 2 files, -0/+14)
  Add XEN_DOMCTL_set_max_evtchn, which may be used during domain
  creation to set the maximum event channel port a domain may use. This
  may be used to limit the amount of Xen resources (global mapping space
  and xenheap) that a domain may use for event channels.

  A domain that does not have a limit set may use all the event channels
  supported by the event channel ABI in use.

  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  Reviewed-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
  Acked-by: Keir Fraser <keir@xen.org>
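  A sketch of the new sub-op's argument as described above (the field
  name is an assumption; the authoritative definition lives in
  xen/include/public/domctl.h):

      #include <stdint.h>

      /* XEN_DOMCTL_set_max_evtchn: cap the event channel ports a
       * domain may bind. */
      struct xen_domctl_set_max_evtchn {
          uint32_t max_port;  /* IN: highest usable port number */
      };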
* evtchn: add FIFO-based event channel hypercalls and port ops (David Vrabel, 2013-10-14; 2 files, -1/+52)
  Add the implementation for the FIFO-based event channel ABI: the new
  hypercall sub-ops (EVTCHNOP_init_control, EVTCHNOP_expand_array) and
  the required evtchn_ops (set_pending, unmask, etc.).
  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  Reviewed-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* evtchn: implement EVTCHNOP_set_priority and add the set_priority hook (David Vrabel, 2013-10-14; 1 file, -0/+11)
  Implement EVTCHNOP_set_priority. A new set_priority hook added to
  struct evtchn_port_ops will do the ABI-specific validation and setup.
  If an ABI does not provide a set_priority hook (as is the case for the
  2-level ABI), the sub-op will return -ENOSYS.
  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  Reviewed-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
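  A sketch of the sub-op argument implied by the description (layout
  assumed; xen/include/public/event_channel.h is authoritative):

      #include <stdint.h>

      /* EVTCHNOP_set_priority: change an event channel's priority. */
      struct evtchn_set_priority {
          uint32_t port;      /* IN: event channel to change */
          uint32_t priority;  /* IN: new priority */
      };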
* evtchn: add FIFO-based event channel ABI (David Vrabel, 2013-10-14; 3 files, -3/+80)
  Add the event channel hypercall sub-ops and the definitions for the
  shared data structures for the FIFO-based event channel ABI.

  The design document for this new ABI is available here:
  http://xenbits.xen.org/people/dvrabel/event-channels-F.pdf

  In summary, events are reported using a per-domain shared event array
  of event words. Each event word has PENDING, LINKED and MASKED bits
  and a LINK field for pointing to the next event in the event queue.
  There are 16 event queues (with different priorities) per VCPU.

  Key advantages of this new ABI include:
  - Support for over 100,000 events (2^17).
  - 16 different event priorities.
  - Improved fairness in event latency through the use of FIFOs.

  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  Reviewed-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
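  A sketch of the event word implied by the summary: a 32-bit word whose
  top bits carry the state flags and whose low 17 bits form the LINK
  field (the bit positions here are assumptions; the public header is
  authoritative):

      #include <stdint.h>

      typedef uint32_t event_word_t;

      /* Assumed state-bit positions within a FIFO event word. */
      #define EVTCHN_FIFO_PENDING 31
      #define EVTCHN_FIFO_MASKED  30
      #define EVTCHN_FIFO_LINKED  29

      /* LINK: index of the next event in the queue; 17 bits gives the
       * 2^17 (131,072) supported event channels. */
      #define EVTCHN_FIFO_LINK_BITS 17
      #define EVTCHN_FIFO_LINK_MASK ((1u << EVTCHN_FIFO_LINK_BITS) - 1)

      static inline uint32_t evtchn_fifo_link(event_word_t w)
      {
          return w & EVTCHN_FIFO_LINK_MASK;
      }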
* evtchn: allow many more evtchn objects to be allocated per domain (David Vrabel, 2013-10-14; 2 files, -11/+48)
  Expand the number of event channels that can be supported internally
  by altering how struct evtchns are allocated.

  The objects are indexed using a two-level scheme of groups and buckets
  (instead of only buckets). Each group is a page of bucket pointers.
  Each bucket is a page-sized array of struct evtchns.

  The optimal number of evtchns per bucket is calculated at compile
  time. If XSM is not enabled, struct evtchn is 16 bytes and each bucket
  contains 256, requiring only 1 group of 512 pointers for 2^17
  (131,072) event channels. With XSM enabled, struct evtchn is 24 bytes,
  each bucket contains 128, and 2 groups are required.

  For the common case of a domain with only a few event channels, the
  first bucket is indexed directly, instead of requiring an additional
  allocation for the group page. As a consequence of this, struct domain
  shrinks by at least 232 bytes, as 32 bucket pointers are replaced with
  1 bucket pointer and (at most) 2 group pointers.

  [ Based on a patch from Wei Liu with improvements from Malcolm
  Crossley. ]

  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  Reviewed-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
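  A sketch of the two-level lookup the text describes (the structure and
  member names here are illustrative stand-ins, not the exact Xen ones):

      #include <stdint.h>

      #define PAGE_SIZE 4096u

      struct evtchn { uint8_t data[16]; };  /* 16 bytes without XSM */

      #define EVTCHNS_PER_BUCKET (PAGE_SIZE / sizeof(struct evtchn))   /* 256 */
      #define BUCKETS_PER_GROUP  (PAGE_SIZE / sizeof(struct evtchn *)) /* 512 */

      struct domain_evtchns {
          struct evtchn  *bucket0;   /* first bucket, indexed directly */
          struct evtchn **group[2];  /* group pages of bucket pointers */
      };

      static struct evtchn *evtchn_from_port(struct domain_evtchns *d,
                                             unsigned int port)
      {
          /* Common case: low ports live in the direct first bucket. */
          if ( port < EVTCHNS_PER_BUCKET )
              return &d->bucket0[port];

          /* 256 evtchns/bucket * 512 buckets/group = 2^17 per group. */
          return &d->group[port / (EVTCHNS_PER_BUCKET * BUCKETS_PER_GROUP)]
                          [(port / EVTCHNS_PER_BUCKET) % BUCKETS_PER_GROUP]
                          [port % EVTCHNS_PER_BUCKET];
      }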
* evtchn: use a per-domain variable for the max number of event channels (David Vrabel, 2013-10-14; 2 files, -2/+2)
  Use d->max_evtchns instead of the MAX_EVTCHNS(d) macro. This avoids
  having to repeatedly check the ABI type.
  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  Reviewed-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* evtchn: print ABI specific state with the 'e' debug key (David Vrabel, 2013-10-14; 1 file, -0/+7)
  In the output of the 'e' debug key, print some ABI-specific state in
  addition to the (p)ending and (m)asked bits. For the 2-level ABI,
  print the state of that event's selector bit. e.g.,

      (XEN)     port [p/m/s]
      (XEN)        1 [0/0/1]: s=3 n=0 x=0 d=0 p=74
      (XEN)        2 [0/0/1]: s=3 n=0 x=0 d=0 p=75

  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  Reviewed-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* evtchn: refactor low-level event channel port ops (David Vrabel, 2013-10-14; 2 files, -0/+49)
  Use functions for the low-level event channel port operations
  (set/clear pending, unmask, is_pending and is_masked). Group these
  functions into a struct evtchn_port_ops so they can be replaced by
  alternate implementations (for different ABIs) on a per-domain basis.
  Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  Reviewed-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
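  A sketch of the ops table the commit describes (member signatures
  assumed from the operations listed above):

      struct domain;
      struct vcpu;
      struct evtchn;

      /* Per-ABI low-level port operations, installed per domain. */
      struct evtchn_port_ops {
          void (*set_pending)(struct vcpu *v, struct evtchn *evtchn);
          void (*clear_pending)(struct domain *d, struct evtchn *evtchn);
          void (*unmask)(struct domain *d, struct evtchn *evtchn);
          int  (*is_pending)(struct domain *d, const struct evtchn *evtchn);
          int  (*is_masked)(struct domain *d, const struct evtchn *evtchn);
      };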
* x86/HVM: cache emulated instruction for retry processing (Jan Beulich, 2013-10-14; 1 file, -0/+3)
  Rather than re-reading the instruction bytes upon retry processing,
  stash away and re-use what we already read. That way we can be certain
  that the retry won't do something different from what requested the
  retry, getting once again closer to real hardware behavior (where what
  we use retries for is simply a bus operation, not involving redundant
  decoding of instructions).
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86/HVM: fix direct PCI port I/O emulation retry and error handling (Jan Beulich, 2013-10-14; 1 file, -0/+5)
  dpci_ioport_{read,write}() guest memory access failure handling should
  be modelled after process_portio_intercept()'s (and others'): upon
  encountering an error on other than the first iteration, the count
  successfully handled needs to be stored and X86EMUL_OKAY returned, in
  order for the generic instruction emulator to update register state
  correctly before reporting failure or retrying (both of which would
  only happen after re-invoking emulation).

  Further, we leverage (and slightly extend, due to the above-mentioned
  need to return X86EMUL_OKAY) the "large MMIO" retry model.

  Note that there is still a special case not explicitly taken care of
  here: while the first retry on the last iteration of a "rep ins"
  correctly recovers the already-read data, an eventual subsequent retry
  is handled by the pre-existing mmio-large logic (through
  hvmemul_do_io() storing the [recovered] data [again], also taking into
  consideration that the emulator converts a single-iteration "ins" to
  ->read_io() plus ->write()).

  Also fix an off-by-one in the mmio-large-read logic, and slightly
  simplify the copying of the data.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* sched: Correct function prototypes (Andrew Cooper, 2013-10-14; 1 file, -3/+3)
  struct vcpu pointers are traditionally named v rather than d.
  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* scheduler: adjust internal locking interface (Jan Beulich, 2013-10-14; 1 file, -82/+56)
  Make the locking functions return the lock pointers, so they can be
  passed to the unlocking functions (which in turn can check that the
  lock is still actually providing the intended protection, i.e. the
  parameters determining which lock is the right one didn't change).

  Further, use proper spin lock primitives rather than open-coded
  local_irq_...() constructs, so that interrupts can be re-enabled as
  appropriate while spinning.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
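  A sketch of the resulting calling pattern (function names assumed from
  the description; the point is that the lock routine hands back the
  lock it actually took):

      typedef struct spinlock spinlock_t;
      struct vcpu;

      /* Assumed signatures, following the description above. */
      spinlock_t *vcpu_schedule_lock_irq(struct vcpu *v);
      void vcpu_schedule_unlock_irq(spinlock_t *lock, struct vcpu *v);

      static void example(struct vcpu *v)
      {
          /* The lock routine returns the lock it acquired... */
          spinlock_t *lock = vcpu_schedule_lock_irq(v);

          /* ...critical section... */

          /* ...so the unlock routine can verify it is still the right
           * one, i.e. the parameters selecting the lock (such as the
           * vcpu's runqueue) did not change while it was held. */
          vcpu_schedule_unlock_irq(lock, v);
      }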
* x86: fix bug_line() (Jan Beulich, 2013-10-14; 1 file, -2/+4)
  Due to the packing into a bit field together with a relocated field,
  the computation can overflow when the relocated field ends up getting
  a negative value stored. Hence it isn't sufficient to correct the
  value by 1 in this case; we also need to mask the result to the width
  of the original bit field.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: use {rd,wr}{fs,gs}base when available (Jan Beulich, 2013-10-11; 2 files, -3/+59)
  ... as these are intended to be faster than MSR reads/writes.

  In the case of emulate_privileged_op(), also use these in favor of the
  cached (but possibly stale) addresses from arch.pv_vcpu. This allows
  entirely removing the code that was the subject of XSA-67.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
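  A sketch of the accessor pattern, assuming Xen's cpu_has_fsgsbase
  predicate and rdmsrl() helper (the committed code may emit the
  instruction as raw bytes for older assemblers):

      static inline unsigned long rdfsbase(void)
      {
          unsigned long base;

          if ( cpu_has_fsgsbase )  /* CPUID leaf 7 FSGSBASE feature */
              asm volatile ( "rdfsbase %0" : "=r" (base) );
          else
              rdmsrl(MSR_FS_BASE, base);  /* slower MSR fallback */

          return base;
      }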
* x86: add address validity check to guest_map_l1e() (Jan Beulich, 2013-10-11; 1 file, -1/+2)
  Just like for guest_get_eff_l1e(), this prevents accessing internal
  data inside Xen that happens to be mapped with 1Gb pages as if it were
  page tables (and with the wrong memory attribute).
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: correct LDT checks (Jan Beulich, 2013-10-11; 2 files, -2/+3)
  - MMUEXT_SET_LDT should behave as similarly to the LLDT instruction as
    possible: fail only if the base address is non-canonical.
  - Instead, LDT descriptor accesses should fault if the descriptor
    address ends up being non-canonical (by ensuring this we at once
    avoid reading an entry from the mach-to-phys table and considering
    it a page table entry).
  - Fault propagation on using LDT selectors must distinguish #PF and
    #GP (the latter must be raised for a non-canonical descriptor
    address, which also applies to several other uses of
    propagate_page_fault(), and hence the problem is being fixed there).
  - map_ldt_shadow_page() should properly wrap addresses for 32-bit VMs.

  At once remove the odd invocation of map_ldt_shadow_page() from the
  MMUEXT_SET_LDT handler: there's nothing really telling us that the
  first LDT page is going to be preferred over others.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
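  The canonical-address test referred to above can be sketched with the
  standard x86-64 definition (an address is canonical iff bits 63:47 all
  carry the same value):

      /* Shifting by 47 and by 63 agree exactly when bits 63:47 are a
       * sign-extension of bit 47, i.e. the address is canonical. */
      static inline int is_canonical_address(unsigned long addr)
      {
          return ((long)addr >> 47) == ((long)addr >> 63);
      }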
* xen/arm: Fixing clear_guest_offset macro (Jaeyong Yoo, 2013-10-10; 1 file, -2/+3)
  Fix the broken macro 'clear_guest_offset' on ARM.
  Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
  Reviewed-by: Julien Grall <julien.grall@linaro.org>
  Acked-by: Ian Campbell <ian.campbell@citrix.com>
* xen/x86: Remove GB macro in asm-x86/config.h (Julien Grall, 2013-10-08; 1 file, -1/+0)
  Commit 983843e "xen: Add macros MB and GB" introduced a generic GB
  macro. By mistake, the macro in asm-x86/config.h was not removed,
  which results in a compilation error when Xen is built for x86.
  Signed-off-by: Julien Grall <julien.grall@linaro.org>
  CC: Keir Fraser <keir@xen.org>
  CC: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
* xen: Add macro ROUNDUP (Julien Grall, 2013-10-08; 1 file, -0/+2)
  Signed-off-by: Julien Grall <julien.grall@linaro.org>
  Acked-by: Keir Fraser <keir@xen.org>
  CC: Jan Beulich <jbeulich@suse.com>
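  The usual shape of such a macro, for power-of-two alignments (a
  sketch; the committed definition may differ in detail):

      /* Round x up to the next multiple of a (a must be a power of 2). */
      #define ROUNDUP(x, a) (((x) + (a) - 1) & ~((a) - 1))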
* xen: Add macros MB and GB (Julien Grall, 2013-10-08; 2 files, -1/+3)
  Signed-off-by: Julien Grall <julien.grall@linaro.org>
  Acked-by: Keir Fraser <keir@xen.org>
  CC: Jan Beulich <jbeulich@suse.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
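  Plausible definitions, matching how size macros are usually written in
  Xen's headers (using the _AC() helper so they work from both C and
  assembly; a sketch, not the verbatim commit):

      /* Sizes in bytes; _AC() attaches the suffix only when compiling C. */
      #define MB(_mb) (_AC(_mb, ULL) << 20)
      #define GB(_gb) (_AC(_gb, ULL) << 30)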
* x86/HPET: basic cleanup (Andrew Cooper, 2013-10-08; 1 file, -4/+2)
  * Strip trailing whitespace
  * Remove redundant definitions
  * Update stale documentation links
  * Move hpet_address into __initdata

  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
* xen: add LZ4 decompression support (Kyungsik Lee, 2013-10-07; 3 files, -1/+90)
  Add support for LZ4 decompression in Xen. The LZ4 decompression APIs
  for Xen are based on the LZ4 implementation by Yann Collet.

  Benchmark results (PATCH v3); compiler: Linaro ARM gcc 4.6.2

  1. ARMv7, 1.5GHz based board; kernel: Linux 3.4; uncompressed kernel
     size: 14MB

          Compressed size   Decompression speed
     LZO  6.7MB             20.1MB/s, 25.2MB/s (UA)
     LZ4  7.3MB             29.1MB/s, 45.6MB/s (UA)

  2. ARMv7, 1.7GHz based board; kernel: Linux 3.7; uncompressed kernel
     size: 14MB

          Compressed size   Decompression speed
     LZO  6.0MB             34.1MB/s, 52.2MB/s (UA)
     LZ4  6.5MB             86.7MB/s

  - UA: unaligned memory access support
  - Latest patch set for LZO applied

  This patch set adds support for an LZ4-compressed kernel. LZ4 is a
  very fast lossless compression algorithm and it also features an
  extremely fast decoder [1]. But we already have five decompressors,
  and one question which does arise is: where do we stop adding new
  ones? This issue had been discussed and came to a conclusion [2].
  Russell King said that we should have:

  - one decompressor which is the fastest
  - one decompressor for the highest compression ratio
  - one popular decompressor (e.g. conventional gzip)

  If we have a replacement for one of these, then it should do exactly
  that: replace it. The benchmark shows an 8% increase in image size vs
  a 66% increase in decompression speed compared to LZO (which has been
  known as the fastest decompressor in the kernel). Therefore the "fast
  but may not be small" compression title has clearly been taken by
  LZ4 [3].

  [1] http://code.google.com/p/lz4/
  [2] http://thread.gmane.org/gmane.linux.kbuild.devel/9157
  [3] http://thread.gmane.org/gmane.linux.kbuild.devel/9347

  LZ4 homepage: http://fastcompression.blogspot.com/p/lz4.html
  LZ4 source repository: http://code.google.com/p/lz4/

  Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com>
  Signed-off-by: Yann Collet <yann.collet.73@gmail.com>
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86: Improve information from domain_crash_synchronous (Andrew Cooper, 2013-10-04; 2 files, -0/+11)
  As it currently stands, the string "domain_crash_sync called from
  entry.S" is not helpful at identifying why the domain was crashed, and
  a debug build of Xen doesn't help the matter.

  This patch improves the information printed, by pointing to where the
  crash decision was made. Specific improvements include:

  * Moving the ASCII string "domain_crash_sync called from entry.S\n"
    away from some semi-hot code cache lines.
  * Moving the printk into C code (especially as this_cpu() is miserable
    to use in assembly code).
  * Undoing the previous confusing situation of having
    domain_crash_synchronous() as a macro in C code, yet a global symbol
    in assembly code.

  Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* xsm: clean up unneeded current references (Daniel De Graaf, 2013-10-04; 1 file, -2/+2)
  Some XSM hooks in dummy.h used current->domain when this was also
  passed as a parameter; use the parameter in these cases. There are two
  hooks where this does not apply and which are not immediately obvious:
  xsm_set_target's parameters are the device model and HVM domains, and
  xsm_mem_sharing_op's first parameter is the source of the shared page,
  not the domain making the hypercall.
  Reported-by: Jan Beulich <jbeulich@suse.com>
  Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
* xsm: forbid PV guest console reads (Daniel De Graaf, 2013-10-04; 1 file, -3/+3)
  The CONSOLEIO_read operation was incorrectly allowed to PV guests if
  the hypervisor was compiled in debug mode (with VERBOSE defined).
  Reported-by: Jan Beulich <jbeulich@suse.com>
  Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
* Nested VMX: fix IA32_VMX_CR4_FIXED1 msr emulation (Yang Zhang, 2013-10-04; 2 files, -0/+2)
  Currently, a hardcoded value is used for IA32_VMX_CR4_FIXED1. This is
  wrong. We should check the guest's CPUID to know which bits are
  writeable in CR4 by the guest, and allow the guest to set the
  corresponding bit only when the guest has the feature.
  Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
  Cleanup.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Jun Nakajima <jun.nakajima@intel.com>
* VMX: clean up capability checks (Jan Beulich, 2013-10-04; 1 file, -2/+8)
  VMCS size validation on APs should check against the BP's size.

  There is no need for a separate cpu_has_vmx_ins_outs_instr_info
  variable anymore; use proper symbolics.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Jun Nakajima <jun.nakajima@intel.com>
* Nested VMX: check VMX capability before reading VMX related MSRs (Yang Zhang, 2013-10-04; 1 file, -0/+2)
  VMX MSRs are only available when the CPU supports the VMX feature. In
  addition, the VMX_TRUE* MSRs are only available when bit 55 of the
  VMX_BASIC MSR is set.
  Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
  Cleanup.
  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Acked-by: Jun Nakajima <jun.nakajima@intel.com>
* x86: don't blindly create L3 tables for the direct map (Jan Beulich, 2013-09-30; 1 file, -1/+1)
  Now that the direct map area can extend all the way up to almost the
  end of address space, this is wasteful.

  Also fold two almost redundant messages in SRAT parsing into one.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Tested-by: Malcolm Crossley <malcolm.crossley@citrix.com>
  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
  Acked-by: Keir Fraser <keir@xen.org>
* x86/AMD-Vi: Fix IVRS HPET special->handle override (Suravee Suthikulpanit, 2013-09-30; 1 file, -2/+5)
  The current logic does not handle the case when the HPET
  special->handle is invalid in the IVRS. On such a system, the
  following message is shown:

      (XEN) AMD-Vi: Failed to setup HPET MSI remapping: Wrong HPET

  This patch allows ivrs_hpet[<handle>]=<sbdf> to override the IVRS. It
  also removes struct hpet_sbdf.iommu, since it is not used anywhere in
  the code.

  Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
* xen: arm: move smp_init_cpus to smpboot.c (Ian Campbell, 2013-09-27; 1 file, -0/+1)
  Seems like a better home.
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Julien Grall <julien.grall@linaro.org>
* xen: arm: use symbolic names for MPIDR bits. (Ian Campbell, 2013-09-27; 1 file, -4/+6)
  arm32 already uses MPIDR_HWID_MASK; use it on arm64 too. Add
  MPIDR_{SMP,UP} (and bitwise equivalents) and use them.
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Tim Deegan <tim@xen.org>
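  A sketch of the kind of definitions involved (values follow the
  architectural MPIDR layout; the exact Xen names are assumptions):

      /* MPIDR: Multiprocessor Affinity Register. */
      #define MPIDR_HWID_MASK  0x00ffffffu  /* Aff2:Aff1:Aff0 hardware ID */
      #define MPIDR_UP         (1u << 30)   /* set: uniprocessor system */
      #define MPIDR_SMP        (1u << 31)   /* set: SMP register format */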
* xen: arm: rewrite start of day page table and cpu bring up (Ian Campbell, 2013-09-27; 4 files, -32/+2)
  This is unfortunately a rather large monolithic patch.

  Rather than bringing up all CPUs in lockstep as we set up paging and
  relocate Xen, instead create a simplified set of dedicated boot-time
  pagetables. This allows secondary CPUs to remain powered down or in
  the firmware until we actually want to enable them. The bringup is now
  done later on in C and can be driven by DT etc. I have included code
  for the vexpress platform, but other platforms will need to be added.

  The mechanism for deciding how to bring up a CPU differs between arm32
  and arm64. On arm32 it is essentially a per-platform property, with
  the exception of PSCI, which can be implemented globally (but isn't
  here). On arm64 there is a per-cpu property in the device tree.

  Secondary CPUs are brought up directly into the relocated Xen image,
  instead of relying on being able to launch on the unrelocated Xen and
  hoping that it hasn't been clobbered.

  As part of this change, drop support for switching from secure mode to
  NS HYP, as well as the early CPU kick. Xen now requires that it is
  launched in NS HYP mode and that firmware configures things such that
  secondary CPUs can be woken up by a primary CPU in HYP mode. This may
  require fixes to bootloaders or the use of a boot wrapper.

  The changes done here (re)exposed an issue with relocating Xen: the
  compiler could spill values to the stack between the copy and the
  actual switch to the relocated copy of Xen in setup_pagetables.
  Therefore switch to doing the copy and switch in a single asm function
  where we can control precisely what gets spilled to the stack etc.

  Since we now have a separate set of boot pagetables, it is much easier
  to build the real Xen pagetables in place before relocating, rather
  than the more complex approach of rewriting the pagetables in the
  relocated copy before switching. This will also enable Xen to be
  loaded above the 4GB boundary on 64-bit.

  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Tim Deegan <tim@xen.org>
  Acked-by: Julien Grall <julien.grall@linaro.org>
* xen: arm: implement arch/platform SMP and CPU initialisation framework (Ian Campbell, 2013-09-27; 2 files, -0/+18)
  Includes an implementation for vexpress using the sysflags interface,
  and support for the ARMv8 "spin-table" method.

  Unused until "rewrite start of day page table and cpu bring up";
  split out to simplify review.

  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Tim Deegan <tim@xen.org>
  Acked-by: Julien Grall <julien.grall@linaro.org>
* xen: arm: add two new device tree helpers (Ian Campbell, 2013-09-27; 1 file, -0/+17)
  - dt_property_read_u64
  - dt_find_node_by_type

  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Julien Grall <julien.grall@linaro.org>
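  Plausible prototypes, modelled on the existing dt_* API conventions (a
  sketch; the committed declarations live in
  xen/include/xen/device_tree.h):

      /* Read a 64-bit property value; returns true on success. */
      bool_t dt_property_read_u64(const struct dt_device_node *np,
                                  const char *name, u64 *out_value);

      /* Find a node by its device_type property, starting after 'from'
       * (NULL to start from the beginning of the tree). */
      struct dt_device_node *dt_find_node_by_type(struct dt_device_node *from,
                                                  const char *type);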
* x86/microcode: Scan the initramfs payload for microcode blob (Konrad Rzeszutek Wilk, 2013-09-27; 1 file, -0/+14)
  The Linux kernel is able to update the microcode during early bootup
  via inspection of the initramfs blob, to see if there is a cpio image
  with certain microcode files. Linux is able to function with two (or
  more) cpio archives in the initrd because it unpacks all of the cpio
  archives.

  The format of the early initramfs is nicely documented in Linux's
  Documentation/x86/early-microcode.txt:

      Early load microcode
      ====================
      By Fenghua Yu <fenghua.yu@intel.com>

      Kernel can update microcode in early phase of boot time. Loading
      microcode early can fix CPU issues before they are observed during
      kernel boot time.

      Microcode is stored in an initrd file. The microcode is read from
      the initrd file and loaded to CPUs during boot time.

      The format of the combined initrd image is microcode in cpio
      format followed by the initrd image (maybe compressed). Kernel
      parses the combined initrd image during boot time. The microcode
      file in cpio name space is:

          kernel/x86/microcode/GenuineIntel.bin

      During BSP boot (before SMP starts), if the kernel finds the
      microcode file in the initrd file, it parses the microcode and
      saves matching microcode in memory. If matching microcode is
      found, it will be uploaded in BSP and later on in all APs. The
      cached microcode patch is applied when CPUs resume from a sleep
      state.

      There are two legacy user space interfaces to load microcode,
      either through /dev/cpu/microcode or through
      /sys/devices/system/cpu/microcode/reload file in sysfs. In
      addition to these two legacy methods, the early loading method
      described here is the third method with which microcode can be
      uploaded to a system's CPUs.

      The following example script shows how to generate a new combined
      initrd file in /boot/initrd-3.5.0.ucode.img with original
      microcode microcode.bin and original initrd image
      /boot/initrd-3.5.0.img.

          mkdir initrd
          cd initrd
          mkdir kernel
          mkdir kernel/x86
          mkdir kernel/x86/microcode
          cp ../microcode.bin kernel/x86/microcode/GenuineIntel.bin
          find .|cpio -oc >../ucode.cpio
          cd ..
          cat ucode.cpio /boot/initrd-3.5.0.img >/boot/initrd-3.5.0.ucode.img

  As such, this code inspects the initrd to see if the microcode
  signatures are present and, if so, updates the hypervisor.

  The option to turn this scan on/off is gated by the 'ucode' parameter.
  The options are now:

      'scan'    Scan for the microcode in any multiboot payload.
      <index>   Attempt to load a microcode blob (not the cpio archive
                format) from the multiboot payload with that number.

  This alters the 'ucode' parameter slightly, by only allowing either
  parameter:

      ucode=[<index>|scan]

  Implementation-wise, the ucode_blob is defined as __initdata. That is
  OK from the viewpoint of suspend/resume, as the underlying
  architecture microcode drivers (microcode_intel or microcode_amd) end
  up saving the blob in 'struct ucode_cpu_info', which is a per-cpu data
  structure (see ucode_cpu_info). They end up saving it when doing the
  pre-SMP (for CPU0) and SMP (for the rest) microcode loading.
  Naturally, if one does a hypercall to update the microcode and it is
  newer, then the old per-cpu data is replaced.

  Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  Acked-by: Keir Fraser <keir@xen.org>
* AMD IOMMU: fix Dom0 device setup failure for host bridges (Suravee Suthikulpanit, 2013-09-27; 1 file, -0/+1)
  The host bridge device (i.e. 0x18 for AMD) does not require the IOMMU,
  and therefore is not included in the IVRS. The current logic tries to
  map all PCI devices to an IOMMU. In this case, "xl dmesg" shows the
  following messages on an AMD system:

      (XEN) setup 0000:00:18.0 for d0 failed (-19)
      (XEN) setup 0000:00:18.1 for d0 failed (-19)
      (XEN) setup 0000:00:18.2 for d0 failed (-19)
      (XEN) setup 0000:00:18.3 for d0 failed (-19)
      (XEN) setup 0000:00:18.4 for d0 failed (-19)
      (XEN) setup 0000:00:18.5 for d0 failed (-19)

  This patch adds a new device type (DEV_TYPE_PCI_HOST_BRIDGE), which
  corresponds to PCI class code 0x06 and sub-class 0x00. It then uses
  this new type to filter when trying to map devices to the IOMMU.

  Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
  Reported-by: Stefan Bader <stefan.bader@canonical.com>

  On VT-d, refuse (un)mapping host bridges for other than the hardware
  domain. Coding style cleanup.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>
  Tested-by: Stefan Bader <stefan.bader@canonical.com>
  Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
* xen/arm: rename boot misc region to boot reloc now it has a single purpose (Ian Campbell, 2013-09-26; 1 file, -5/+2)
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Julien Grall <julien.grall@linaro.org>
  Acked-by: Tim Deegan <tim@xen.org>
* xen/arm: Support dtb /memreserve/ regions (Ian Campbell, 2013-09-26; 2 files, -3/+6)
  This requires a mapping of the DTB during setup_mm. Previously this
  was in the BOOT_MISC slot, which is clobbered by setup_pagetables.
  Split it out into its own slot, which can be preserved.

  Also handle these regions as part of consider_modules() and when
  adding pages to the heaps, to ensure we do not locate any part of Xen
  or the heaps over them.

  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Tim Deegan <tim@xen.org>
* xen/arm: Reserve FDT via early module mechanism (Ian Campbell, 2013-09-26; 1 file, -6/+7)
  This will stop us putting any heaps or relocating Xen itself over the
  FDT. The devicetree will be copied to allocated memory in setup_mm and
  the original copy will be freed by discard_initial_modules.
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Tim Deegan <tim@xen.org>
* xen/arm: DOMHEAP_SECOND_PAGES is arm32 specific (Ian Campbell, 2013-09-26; 1 file, -3/+3)
  ... since 5263507b1b4a "xen: arm: Use a direct mapping of RAM on
  arm64".
  Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
  Acked-by: Julien Grall <julien.grall@linaro.org>
  Acked-by: Tim Deegan <tim@xen.org>
* xen/arm: Use the hardware ID to boot secondary cpus correctly (Julien Grall, 2013-09-26; 1 file, -0/+2)
  Secondary CPUs will spin in head.S until their MPIDR[23:0] corresponds
  to smp_up_cpu. Currently Xen sets that value to the logical CPU ID,
  which is wrong. Use cpu_logical_map to get the correct hardware CPU
  ID.
  Signed-off-by: Julien Grall <julien.grall@linaro.org>
  Acked-by: Ian Campbell <ian.campbell@citrix.com>
* xen/arm: Dissociate logical and hardware CPU ID (Julien Grall, 2013-09-26; 1 file, -0/+4)
  Introduce cpu_logical_map to associate a logical CPU ID with a
  hardware CPU ID. This map will be filled during Xen boot via the
  device tree. Each CPU node contains a "reg" property which contains
  the hardware ID (i.e. MPIDR[0:23]).

  Also move /cpus parsing later so we can use the dt_* API.

  Signed-off-by: Julien Grall <julien.grall@linaro.org>
  Acked-by: Ian Campbell <ian.campbell@citrix.com>
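  A sketch of the mapping, following the convention the name suggests
  (the exact Xen declarations are assumptions):

      #include <stdint.h>

      /* Map a logical CPU number to its hardware ID (MPIDR[23:0]),
       * sized NR_CPUS in Xen. */
      extern uint32_t __cpu_logical_map[];
      #define cpu_logical_map(cpu) __cpu_logical_map[cpu]

      /* Filled at boot from each DT CPU node's "reg" property, e.g.:
       *   cpu_logical_map(logical_id) = reg_value & MPIDR_HWID_MASK;
       */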
* xen/arm: use cpumask_t to describe cpu mask in gic_route_dt_irq (Julien Grall, 2013-09-26; 1 file, -1/+2)
  Replace the raw mask with cpumask_t to take advantage of the cpumask_*
  helpers.
  Signed-off-by: Julien Grall <julien.grall@linaro.org>
  Acked-by: Ian Campbell <ian.campbell@citrix.com>
* xen/arm: Introduce init_info structure (Julien Grall, 2013-09-26; 1 file, -0/+6)
  This structure will gather all information needed to boot a secondary
  CPU. For now it just contains the initial stack.
  Signed-off-by: Julien Grall <julien.grall@linaro.org>
  Acked-by: Ian Campbell <ian.campbell@citrix.com>
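  A sketch of the structure as described (member name assumed; further
  fields were added by later patches):

      /* Boot information handed to a waking secondary CPU. */
      struct init_info {
          unsigned char *stack;  /* initial stack the CPU starts on */
      };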
* x86: fix compat guest handling of XENPF_enter_acpi_sleep (Jan Beulich, 2013-09-26; 1 file, -0/+1)
  Rather than blindly defining the native name to the compat one, when
  we want to pass the compat structure to a native function we ought to
  verify that their layouts match. With a respective xlat.lst entry
  there's then also no need anymore for such aliasing.

  While cleaning up that file I also noticed that the Cx and Px
  interface handling here has quite a few unnecessary #define-s; delete
  them.

  Signed-off-by: Jan Beulich <jbeulich@suse.com>