aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
* libxc: Introduce xc_bitops.hIan Jackson2013-06-141-0/+63
| | | | | | | | | | | | | | | | | Copy the one file tools/libxc/xc_bitops.h from xen.git#aa1355f9. We will need this for the next patch, which calls for a bitmap in libxc. xc_bitops.h was introduced to unify various existing sets of bitmap operations. In this patch we backport only the introduction, not the replacement of the other instances. So we introduce another instance Sorry :-/. This is part of the fix to a security issue, XSA-55. This patch is unique to the Xen 4.1 version of the XSA-55 series. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
* libelf: use only unsigned integersIan Jackson2013-06-149-80/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed integers have undesirable undefined behaviours on overflow. Malicious compilers can turn apparently-correct code into code with security vulnerabilities etc. So use only unsigned integers. Exceptions are booleans (which we have already changed) and error codes. We _do_ change all the chars which aren't fixed constants from our own text segment, but not the char*s. This is because it is safe to access an arbitrary byte through a char*, but not necessarily safe to convert an arbitrary value to a char. As a consequence we need to compile libelf with -Wno-pointer-sign. It is OK to change all the signed integers to unsigned because all the inequalities in libelf are in contexts where we don't "expect" negative numbers. In libelf-dominfo.c:elf_xen_parse we rename a variable "rc" to "more_notes" as it actually contains a note count derived from the input image. The "error" return value from elf_xen_parse_notes is changed from -1 to ~0U. grepping shows only one occurrence of "PRId" or "%d" or "%ld" in libelf and xc_dom_elfloader.c (a "%d" which becomes "%u"). This is part of the fix to a security issue, XSA-55. Conflicts in 4.1 series: * xc_dom_load_elf_kernel has no rc variable to change. * elf_load_image doesn't exist. For those concerned about unintentional functional changes, the following rune produces a version of the patch which is much smaller and eliminates only non-functional changes: GIT_EXTERNAL_DIFF=.../unsigned-differ git-diff <before>..<after> where <before> and <after> are git refs for the code before and after this patch, and unsigned-differ is this shell script: #!/bin/bash set -e seddery () { perl -pe 's/\b(?:elf_errorstatus|elf_negerrnoval)\b/int/g' } path="$1" in="$2" out="$5" set +e diff -pu --label "$path~" <(seddery <"$in") --label "$path" <(seddery <"$out") rc=$? set -e if [ $rc = 1 ]; then rc=0; fi exit $rc Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
* libelf: use C99 bool for booleansIan Jackson2013-06-146-21/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to remove uses of "int" because signed integers have undesirable undefined behaviours on overflow. Malicious compilers can turn apparently-correct code into code with security vulnerabilities etc. In this patch we change all the booleans in libelf to C99 bool, from <stdbool.h>. For the one visible libelf boolean in libxc's public interface we retain the use of int to avoid changing the ABI; libxc converts it to a bool for consumption by libelf. It is OK to change all values only ever used as booleans to _Bool (bool) because conversion from any scalar type to a _Bool works the same as the boolean test in if() or ?: and is always defined (C99 6.3.1.2). But we do need to check that all these variables really are only ever used that way. (It is theoretically possible that the old code truncated some 64-bit values to 32-bit ints which might become zero depending on the value, which would mean a behavioural change in this patch, but it seems implausible that treating 0x????????00000000 as false could have been intended.) This is part of the fix to a security issue, XSA-55. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
* libelf: Make all callers call elf_check_brokenIan Jackson2013-06-144-8/+45
| | | | | | | | | | | | | | | | | | | | | | | This arranges that if the new pointer reference error checking tripped, we actually get a message about it. In this patch these messages do not change the actual return values from the various functions: so pointer reference errors do not prevent loading. This is for fear that some existing kernels might cause the code to make these wild references, which would then break, which is not a good thing in a security patch. In xen/arch/x86/domain_build.c we have to introduce an "out" label and change all of the "return rc" beyond the relevant point into "goto out". This is part of the fix to a security issue, XSA-55. Differences in 4.1 backport: * No xen/arch/arm. * There was less error handling in xen/arch/x86/domain_build.c so less need to change it. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
* libelf: Check pointer references in elf_is_elfbinaryIan Jackson2013-06-144-6/+11
| | | | | | | | | | | | | | | | | | | elf_is_elfbinary didn't take a length parameter and could potentially access out of range when provided with a very short image. We only need to check the size is enough for the actual dereference in elf_is_elfbinary; callers are just using it to check the magic number and do their own checks (usually via the new elf_ptrval system) before dereferencing other parts of the header. This is part of the fix to a security issue, XSA-55. Conflicts in 4.1 backport: * xen/arch/x86/bzimage.c in 4.1 doesn't use elf_is_elfbinary. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* libelf: check all pointer accessesIan Jackson2013-06-148-84/+303
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We change the ELF_PTRVAL and ELF_HANDLE types and associated macros: * PTRVAL becomes a uintptr_t, for which we provide a typedef elf_ptrval. This means no arithmetic done on it can overflow so the compiler cannot do any malicious invalid pointer arithmetic "optimisations". It also means that any places where we dereference one of these pointers without using the appropriate macros or functions become a compilation error. So we can be sure that we won't miss any memory accesses. All the PTRVAL variables were previously void* or char*, so the actual address calculations are unchanged. * ELF_HANDLE becomes a union, one half of which keeps the pointer value and the other half of which is just there to record the type. The new type is not a pointer type so there can be no address calculations on it whose meaning would change. Every assignment or access has to go through one of our macros. * The distinction between const and non-const pointers and char*s and void*s in libelf goes away. This was not important (and anyway libelf tended to cast away const in various places). * The fields elf->image and elf->dest are renamed. That proves that we haven't missed any unchecked uses of these actual pointer values. * The caller may fill in elf->caller_xdest_base and _size to specify another range of memory which is safe for libelf to access, besides the input and output images. * When accesses fail due to being out of range, we mark the elf "broken". This will be checked and used for diagnostics in a following patch. We do not check for write accesses to the input image. This is because libelf actually does this in a number of places. So we simply permit that. * Each caller of libelf which used to set dest now sets dest_base and dest_size. * In xc_dom_load_elf_symtab we provide a new actual-pointer value hdr_ptr which we get from mapping the guest's kernel area and use (checking carefully) as the caller_xdest area. * The STAR(h) macro in libelf-dominfo.c now uses elf_access_unsigned. * elf-init uses the new elf_uval_3264 accessor to access the 32-bit fields, rather than an unchecked field access (ie, unchecked pointer access). * elf_uval has been reworked to use elf_uval_3264. Both of these macros are essentially new in this patch (although they are derived from the old elf_uval) and need careful review. * ELF_ADVANCE_DEST is now safe in the sense that you can use it to chop parts off the front of the dest area but if you chop more than is available, the dest area is simply set to be empty, preventing future accesses. * We introduce some #defines for memcpy, memset, memmove and strcpy: - We provide elf_memcpy_safe and elf_memset_safe which take PTRVALs and do checking on the supplied pointers. - Users inside libelf must all be changed to either elf_mem*_unchecked (which are just like mem*), or elf_mem*_safe (which take PTRVALs) and are checked. Any unchanged call sites become compilation errors. * We do _not_ at this time fix elf_access_unsigned so that it doesn't make unaligned accesses. We hope that unaligned accesses are OK on every supported architecture. But it does check the supplied pointer for validity. This is part of the fix to a security issue, XSA-55. Additional change in 4.1 backport: * ELF_PRPTRVAL needs to be defined oddly on 4.1 and earlier because Xen's headers provide no definitions of uintptr_t or PRIuPTR. Conflicts: * Callers of elf_load_binary don't check its return value in 4.1. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
* libelf: check nul-terminated strings properlyIan Jackson2013-06-144-11/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is not safe to simply take pointers into the ELF and use them as C pointers. They might not be properly nul-terminated (and the pointers might be wild). So we are going to introduce a new function elf_strval for safely getting strings. This will check that the addresses are in range and that there is a proper nul-terminated string. Of course it might discover that there isn't. In that case, it will be made to fail. This means that elf_note_name might fail, too. For the benefit of call sites which are just going to pass the value to a printf-like function, we provide elf_strfmt which returns "(invalid)" on failure rather than NULL. In this patch we introduce dummy definitions of these functions. We introduce calls to elf_strval and elf_strfmt everywhere, and update all the call sites with appropriate error checking. There is not yet any semantic change, since before this patch all the places where we introduce elf_strval dereferenced the value anyway, so it mustn't have been NULL. In future patches, when elf_strval is made able return NULL, when it does so it will mark the elf "broken" so that an appropriate diagnostic can be printed. This is part of the fix to a security issue, XSA-55. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* tools/xcutils/readnotes: adjust print_l1_mfn_valid_noteIan Jackson2013-06-141-5/+6
| | | | | | | | | | | | | | | | Use the new PTRVAL macros and elf_access_unsigned in print_l1_mfn_valid_note. No functional change unless the input is wrong, or we are reading a file for a different endianness. Separated out from the previous patch because this change does produce a difference in the generated code. This is part of the fix to a security issue, XSA-55. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
* libelf: introduce macros for memory access and pointer handlingIan Jackson2013-06-146-153/+310
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We introduce a collection of macros which abstract away all the pointer arithmetic and dereferences used for accessing the input ELF and the output area(s). We use the new macros everywhere. For now, these macros are semantically identical to the code they replace, so this patch has no functional change. elf_is_elfbinary is an exception: since it doesn't take an elf*, we need to handle it differently. In a future patch we will change it to take, and check, a length parameter. For now we just mark it with a fixme. Nontrivial differences in the 4.1 backport: * We need to provide our own elf_uintptr_t since Xen doesn't. * We see some additional differences in our verification diff. * The "function-filter" needs to massage additional symbol names. Conflicts: * In xc_dom_load_elf_symtab the old code used *(Elf64_Word*)(&shdr->e64.sh_name) and the new Elf32_Word but in fact the type in the struct has changed too so the new code using elf_store_field is still correct. * loadelfimage, elf_load_image etc. don't exist and are done directly with memcpy/memset; patch adjusted appropriately. * elf_note_numeric_array doesn't exist in 4.1. That this patch has no functional change can be verified as follows: 0. Copy the scripts "comparison-generate" and "function-filter" out of this commit message. 1. Check out the tree before this patch. 2. Run the script ../comparison-generate .... ../before 3. Check out the tree after this patch. 4. Run the script ../comparison-generate .... ../after 5. diff --exclude=\*.[soi] -ruN before/ after/ |less Expect these differences: * stubdom/zlib-x86_64/ztest*.s2 The filename of this test file apparently contains the pid. * stubdom/grub/kexec.s2: Large differences following ".section .debug_info" (which the 4.1 build system erroneously fails to suppress). * tools/libxc/xc_domain_restore.s2 (64-bit build): One trivial code gen difference with no semantic import. * xen/common/version.s2 The xen build timestamp appears in two diff hunks. Verification that this is all that's needed: In a completely built xen.git, find * -name .*.d -type f | xargs grep -l libelf\.h Expect results in: xen/arch/x86: Checked above. tools/libxc: Checked above. tools/xcutils/readnotes: Checked above. tools/xenstore: Checked above. xen/common/libelf: This is the build for the hypervisor; checked in B above. stubdom: We have one stubdom which reads ELFs using our libelf, pvgrub, which is checked above. I have not done this verification for ARM. This is part of the fix to a security issue, XSA-55. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> -8<- comparison-generate -8<- #!/bin/bash # usage: # cd xen.git # .../comparison-generate OUR-CONFIG BUILD-RUNE-PREFIX ../before|../after # eg: # .../comparison-generate ~/work/.config 'schroot -pc64 --' ../before set -ex test $# = 3 || need-exactly-three-arguments our_config=$1 build_rune_prefix=$2 result_dir=$3 git clean -x -d -f cp "$our_config" . cat <<END >>.config debug_symbols=n CFLAGS += -save-temps END perl -i~ -pe 's/ -g / -g0 / if m/^CFLAGS/' xen/Rules.mk if [ -f ./configure ]; then $build_rune_prefix ./configure fi $build_rune_prefix make -C xen $build_rune_prefix make -C tools/include $build_rune_prefix make -C stubdom grub $build_rune_prefix make -C tools/libxc $build_rune_prefix make -C tools/xenstore $build_rune_prefix make -C tools/xcutils rm -rf "$result_dir" mkdir "$result_dir" set +x for f in `find xen tools stubdom -name \*.[soi]`; do mkdir -p "$result_dir"/`dirname $f` cp $f "$result_dir"/${f} case $f in *.s) ../function-filter <$f >"$result_dir"/${f}2 ;; esac done echo ok. -8<- -8<- function-filter -8<- #!/usr/bin/perl -w # function-filter # script for massaging gcc-generated labels to be consistent use strict; our @lines; my $sedderybody = "sub seddery () {\n"; while (<>) { push @lines, $_; if (m/^(__FUNCTION__|__func__|_ctx|note_desc|types|last_order|memflags|mutex|d\d_cpu_last|write_count|wall_last|__PRETTY_FUNCTION__)\.(\d+)\:/ || m/^\s+\.local\s+(_ctx|write_count|d\d_cpu_last|wall_last|mutex)\.(\d+)\s*$/) { $sedderybody .= " s/\\b$1\\.$2\\b/__XSA55MANGLED__$1.$./g;\n"; } } $sedderybody .= "}\n1;\n"; eval $sedderybody or die $@; foreach (@lines) { seddery(); print or die $!; } -8<-
* libelf/xc_dom_load_elf_symtab: Do not use "syms" uninitialisedIan Jackson2013-06-141-1/+1
| | | | | | | | | | | | | | | | | | | xc_dom_load_elf_symtab (with load==0) calls elf_round_up, but it mistakenly used the uninitialised variable "syms" when calculating dom->bsd_symtab_start. This should be a reference to "elf". This change might have the effect of rounding the value differently. Previously if the uninitialised value (a single byte on the stack) was ELFCLASS64 (ie, 2), the alignment would be to 8 bytes, otherwise to 4. However, the value is calculated from dom->kernel_seg.vend so this could only make a difference if that value wasn't already aligned to 8 bytes. This is part of the fix to a security issue, XSA-55. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
* libelf: abolish elf_sval and elf_access_signedIan Jackson2013-06-142-39/+0
| | | | | | | | | These are not used anywhere. This is part of the fix to a security issue, XSA-55. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
* libxc: Fix range checking in xc_dom_pfn_to_ptr etc.Ian Jackson2013-06-144-5/+38
| | | | | | | | | | | | | | | | | | | | * Ensure that xc_dom_pfn_to_ptr (when called with count==0) does not return a previously-allocated block which is entirely before the requested pfn (!) * Provide a version of xc_dom_pfn_to_ptr, xc_dom_pfn_to_ptr_retcount, which provides the length of the mapped region via an out parameter. * Change xc_dom_vaddr_to_ptr to always provide the length of the mapped region and change the call site in xc_dom_binloader.c to check it. The call site in xc_dom_load_elf_symtab will be corrected in a forthcoming patch, and for now ignores the returned length. This is part of the fix to a security issue, XSA-55. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> v5: This patch is new in v5 of the series.
* libxc: introduce xc_dom_seg_to_ptr_pagesIan Jackson2013-06-141-3/+16
| | | | | | | | | | | | Provide a version of xc_dom_seg_to_ptr which returns the number of guest pages it has actually mapped. This is useful for callers who want to do range checking; we will use this later in this series. This is part of the fix to a security issue, XSA-55. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* libelf: abolish libelf-relocate.cIan Jackson2013-06-142-373/+1
| | | | | | | | | | | | | | | This file is not actually used. It's not built in Xen's instance of libelf; in libxc's it's built but nothing in it is called. Do not compile it in libxc, and delete it. This reduces the amount of work we need to do in forthcoming patches to libelf (particularly since as libelf-relocate.c is not used it is probably full of bugs). This is part of the fix to a security issue, XSA-55. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
* Revert "AMD/iommu: SR56x0 Erratum 64 - Reset all head & tail pointers"Jan Beulich2013-06-051-10/+0
| | | | | | | | | | | | | | This reverts commit 22de18bb89e776f77256653901a590aad9fc0a52. The code this patch added is redundant with already present code in set_iommu_{command_buffer,event_log}_control(). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> master commit: e430510e5cbbfcdc1077739292def633e70fedea master date: 2013-06-05 10:05:49 +0200
* x86/xsave: properly check guest input to XSETBVJan Beulich2013-06-041-0/+5
| | | | | | | | | | | | Other than the HVM emulation path, the PV case so far failed to check that YMM state requires SSE state to be enabled, allowing for a #GP to occur upon passing the inputs to XSETBV inside the hypervisor. This is CVE-2013-2078 / XSA-54. Signed-off-by: Jan Beulich <jbeulich@suse.com> master commit: 365c95f7de789e1dca03f119eab7dc61fe0f77c9 master date: 2013-06-04 09:29:07 +0200
* x86/xsave: recover from faults on XRSTORJan Beulich2013-06-041-5/+20
| | | | | | | | | | | | | | | | Just like FXRSTOR, XRSTOR can raise #GP if bad content is being passed to it in the memory block (i.e. aspects not under the control of the hypervisor, other than e.g. proper alignment of the block). Also correct the comment explaining why FXRSTOR needs exception recovery code to not wrongly state that this can only be a result of the control tools passing a bad image. This is CVE-2013-2077 / XSA-53. Signed-off-by: Jan Beulich <jbeulich@suse.com> master commit: c6ae65db36b98f2866f74a9a7ae6ac5d51fedc67 master date: 2013-06-04 09:27:58 +0200
* x86/xsave: fix information leak on AMD CPUsJan Beulich2013-06-041-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | Just like for FXSAVE/FXRSTOR, XSAVE/XRSTOR also don't save/restore the last instruction and operand pointers as well as the last opcode if there's no pending unmasked exception (see CVE-2006-1056 and commit 9747:4d667a139318). While the FXSR solution sits in the save path, I prefer to have this in the restore path because there the handling is simpler (namely in the context of the pending changes to properly save the selector values for 32-bit guest code). Also this is using FFREE instead of EMMS, as it doesn't seem unlikely that in the future we may see CPUs with x87 and SSE/AVX but no MMX support. The goal here anyway is just to avoid an FPU stack overflow. I would have preferred to use FFREEP instead of FFREE (freeing two stack slots at once), but AMD doesn't document that instruction. This is CVE-2013-2076 / XSA-52. Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> master commit: 8dcf9f0113454f233089e8e5bb3970d891928410 master date: 2013-06-04 09:26:54 +0200
* libxc: limit cpu values when setting vcpu affinityPetr Matousek2013-05-311-0/+12
| | | | | | | | | | | | | | | When support for pinning more than 64 cpus was added, check for cpu out-of-range values was removed. This can lead to subsequent out-of-bounds cpumap array accesses in case the cpu number is higher than the actual count. This patch returns the check. This is CVE-2013-2072 / XSA-56 Signed-off-by: Petr Matousek <pmatouse@redhat.com> master commit: 41abbadef60e5fccdfd688579dd458f7f7887cf5 master date: 2013-05-29 15:49:22 +0100
* x86: fix boot time APIC mode detectionJan Beulich2013-05-311-1/+1
| | | | | | | | | | | | | | | | | current_cpu_data becomes valid only relatively late in the boot process, so looking there for a particular feature early in the game would generally give the appearance of the feature being unavailable. Getting this wrong means that at kexec time the system would get returned to xAPIC mode, causing disconnect_bsp_APIC() to try to access the APIC page, which on systems with x2APIC pre-enabled will never get set up. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: 234c4dde2fd4f1182fe1a6bea6bced83fe363007 master date: 2013-05-23 13:08:32 +0200
* tools: fix dependency file generationJan Beulich2013-05-231-1/+1
| | | | | | | | | | | | | | | | | There is a small set of places where files in subdirectories get compiled from the parent directory. Dependency file wise this is no problem as long as the files use names distinct without regard to the directories they sit in, and tools/console/ violates this (in having two main.c files). Hence we need to avoid losing the directory name, both to ensure the two compiler instances don't simultaneously write to the same file (happening of which is what triggered me looking into this) and to guarantee dependencies for all files will be seen by make on an incremental rebuild. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> master commit: 4d788e164d6556d931bc3e0a69e36b8cf7280794 master date: 2013-05-21 10:16:30 +0200
* AMD/iommu: SR56x0 Erratum 64 - Reset all head & tail pointersAndrew Cooper2013-05-231-0/+10
| | | | | | | | | | | | | | | | | | Reference at time of patch: http://support.amd.com/us/ChipsetMotherboard_TechDocs/46303.pdf Erratum 64 states that the head and tail pointers for the Command buffer and Event log are only reset on a cold boot, not a warm boot. While the erratum is limited to systems using SR56xx chipsets (such as Family 10h CPUs), resetting the pointers is a sensible action in all cases, including the PPR log for consistency. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> master commit: 6d243308e1d75f866679db226159c797d6c83aad master date: 2013-05-22 15:26:52 +0200
* fix XSA-46 regression with xend/xmJan Beulich2013-05-232-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hypervisor side changes for XSA-46 require the tool stack to now always map the guest pIRQ before granting access permission to the underlying host IRQ (GSI). This in particular requires that pciif.py no longer can skip this step (assuming qemu would do it) for HVM guests. This in turn exposes, however, an inconsistency between xend and qemu: The former wants to always establish 1:1 mappings between pIRQ and host IRQ (for non-MSI only of course), while the latter always wants to allocate an arbitrary mapping. Since the whole tool stack obviously should always agree on the mapping model, make libxc enforce the 1:1 mapping as the more natural one (as well as being the one that allows for easier debugging, since there no need to find out the extra mapping). Users of libxc that want to establish a particular (rather than an allocated) mapping are still free to do so, as well as tool stacks not based on libxc wanting to implement an allocation based model (which is why it's not the hypervisor that's being changed to enforce either model). Since libxl, like xend, already uses a 1:1 model, it's unaffected by the libxc change (and it being unaffected by the original hypervisor side changes is - afaict - simply due to qemu getting spawned at a later point in time compared to the xend event flow). Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Andreas Falck <falck.andreas.lists@gmail.com> (on 4.1) Tested-by: Gordan Bobic <gordan@bobich.net> (on 4.2) Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: 934a5253d932b6f67fe40fc48975a2b0117e4cce master date: 2013-05-21 11:32:34 +0200
* x86/shadow: fix off-by-one in MMIO permission checkJan Beulich2013-05-231-3/+3
| | | | | | | | | | | iomem_access_permitted() wants an inclusive range as input. Also use pfn_to_paddr() in nearby code instead of open coding it. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> master commit: afa65ddfd88184a894d9364bec587554c28c20e0 master date: 2013-05-15 14:34:05 +0200
* x86/IO-APIC: fix guest RTE write corner casesJan Beulich2013-05-233-9/+52
| | | | | | | | | | | | | | | | | | | | | | This fixes two regressions from c/s 20143:a7de5bd776ca ("x86: Make the hypercall PHYSDEVOP_alloc_irq_vector hypercall dummy"): For one, IRQs that had their vector set up by Xen internally without a handler ever having got set (e.g. via "com<n>=..." without a matching consumer option like "console=com<n>") would wrongly call add_pin_to_irq() here, triggering the BUG_ON() in that function. Second, when assign_irq_vector() fails this addition to irq_2_pin[] needs to be undone. In the context of this I'm also surprised that the irq_2_pin[] manipulations here occur without any lock, i.e. rely on Dom0 to do some sort of serialization. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> master commit: 30256a0ff17f6f3b1278b85103187341d5b0ac42 master date: 2013-05-15 10:52:02 +0200
* x86: cleanup after making various page table manipulation operations preemptibleJan Beulich2013-05-033-56/+57
| | | | | | | | | | | | | | This drops the "preemptible" parameters from various functions where now they can't (or shouldn't, validated by assertions) be run in non- preemptible mode anymore, to prove that manipulations of at least L3 and L4 page tables and page table entries are now always preemptible, i.e. the earlier patches actually fulfill their purpose of fixing the resulting security issue. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> master commit: b965b31a6bce8c37e67e525fae6da0e2f26d6b2e master date: 2013-05-02 17:04:14 +0200
* VT-d: don't permit SVT_NO_VERIFY entries for known device typesJan Beulich2013-05-021-5/+9
| | | | | | | | | | | | | Only in cases where we don't know what to do we should leave the IRTE blank (suppressing all validation), but we should always log a warning in those cases (as being insecure). This is CVE-2013-1952 / XSA-49. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: "Zhang, Xiantao" <xiantao.zhang@intel.com> master commit: 63cec00679cc65ab5d5a9447a62d5202f155b78c master date: 2013-05-02 17:08:58 +0200
* x86: make page table handling error paths preemptibleJan Beulich2013-05-021-27/+98
| | | | | | | | | | | | | | | | | | | | | ... as they may take significant amounts of time. This requires cloning the tweaked continuation logic from do_mmuext_op() to do_mmu_update(). Note that in mod_l[34]_entry() a negative "preemptible" value gets passed to put_page_from_l[34]e() now, telling the callee to store the respective page in current->arch.old_guest_table (for a hypercall continuation to pick up), rather than carrying out the put right away. This is going to be made a little more explicit by a subsequent cleanup patch. This is part of CVE-2013-1918 / XSA-45. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> master commit: b8efae696c9a2d46e91fa0eda739427efc16c250 master date: 2013-05-02 16:39:37 +0200
* x86: make page table unpinning preemptibleJan Beulich2013-05-022-7/+56
| | | | | | | | | | | | | | | | ... as it may take significant amounts of time. Since we can't re-invoke the operation in a second attempt, the continuation logic must be slightly tweaked so that we make sure do_mmuext_op() gets run one more time even when the preempted unpin operation was the last one in a batch. This is part of CVE-2013-1918 / XSA-45. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> master commit: a3e049f8e86fe18e3b87f18dc0c7be665026fd9f master date: 2013-05-02 16:39:06 +0200
* x86: make arch_set_info_guest() preemptibleJan Beulich2013-05-024-38/+83
| | | | | | | | | | | | .. as the root page table validation (and the dropping of an eventual old one) can require meaningful amounts of time. This is part of CVE-2013-1918 / XSA-45. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> master commit: 99d2b149915010e986f4d8778708c5891e7b4635 master date: 2013-05-02 16:38:30 +0200
* x86: make vcpu_reset() preemptibleJan Beulich2013-05-029-19/+37
| | | | | | | | | | | | ... as dropping the old page tables may take significant amounts of time. This is part of CVE-2013-1918 / XSA-45. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> master commit: 4939f9a6dee4280f38730fd3066e5dce353112f6 master date: 2013-05-02 16:37:24 +0200
* x86: make MMUEXT_NEW_USER_BASEPTR preemptibleJan Beulich2013-05-021-6/+32
| | | | | | | | | | | ... as it may take significant amounts of time. This is part of CVE-2013-1918 / XSA-45. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> master commit: 918a5f17b447072b40780f4d03a3adc99ff0073b master date: 2013-05-02 16:36:44 +0200
* x86: make new_guest_cr3() preemptibleJan Beulich2013-05-022-20/+65
| | | | | | | | | | | ... as it may take significant amounts of time. This is part of CVE-2013-1918 / XSA-45. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> master commit: e2e6b7b627fec0d7a769ab46441f2985ebccbf04 master date: 2013-05-02 16:35:50 +0200
* x86: make vcpu_destroy_pagetables() preemptibleJan Beulich2013-05-025-64/+94
| | | | | | | | | | | | | | | | | ... as it may take significant amounts of time. The function, being moved to mm.c as the better home for it anyway, and to avoid having to make a new helper function there non-static, is given a "preemptible" parameter temporarily (until, in a subsequent patch, its other caller is also being made capable of dealing with preemption). This is part of CVE-2013-1918 / XSA-45. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> master commit: 6cdc9be2a5f2a87b4504404fbf648d16d9503c19 master date: 2013-05-02 16:34:21 +0200
* update Xen version to 4.1.6-preJan Beulich2013-04-291-1/+1
|
* update Xen version to 4.1.5RELEASE-4.1.5Jan Beulich2013-04-232-2/+2
|
* libxl: fix build error after 21c31a81Jan Beulich2013-04-191-1/+1
| | | | | | "libxl: Fix SEGV in network-attach" dropped a necessary &. Signed-off-by: Jan Beulich <jbeulich@suse.com>
* libxl: Fix SEGV in network-attachIan Jackson2013-04-181-1/+2
| | | | | | | | | | | | | | When "device/vif" directory exists but is empty l!=NULL, but nb==0, so l[nb-1] is invalid. Add missing check. Signed-off-by: Marek Marczykowski <marmarek@invisiblethingslab.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> (cherry picked from commit 9f1a6ff38b8e7bb97a016794115de28553a6559f) Conflicts: tools/libxl/libxl.c Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
* Fix rcu domain locking for transitive grantsPaul Durrant2013-04-182-31/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | When acquiring a transitive grant for copy then the owning domain needs to be locked down as well as the granting domain. This was being done, but the unlocking was not. The acquire code now stores the struct domain * of the owning domain (rather than the domid) in the active entry in the granting domain. The release code then does the unlock on the owning domain. Note that I believe I also fixed a bug where, for non-transitive grants the active entry contained a reference to the acquiring domain rather than the granting domain. From my reading of the code this would stop the release code for transitive grants from terminating its recursion correctly. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> master commit: f544bf377ee829e1342abd818ac30478c6f3a134 master date: 2011-03-08 16:30:30 +0000 Also, for non-transitive grants we now avoid incorrectly recursing in __release_grant_for_copy. This is CVE-2013-1964 / XSA-50. Reported-by: Manuel Bouyer <bouyer@antioche.eu.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Manuel Bouyer <bouyer@antioche.eu.org>
* x86: fix various issues with handling guest IRQsJan Beulich2013-04-188-24/+82
| | | | | | | | | | | | | | | - properly revoke IRQ access in map_domain_pirq() error path - don't permit replacing an in use IRQ - don't accept inputs in the GSI range for MAP_PIRQ_TYPE_MSI - track IRQ access permission in host IRQ terms, not guest IRQ ones (and with that, also disallow Dom0 access to IRQ0) This is CVE-2013-1919 / XSA-46. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> master commit: 545607eb3cfeb2abf5742d1bb869734f317fcfe5 master date: 2013-04-18 16:11:23 +0200
* x86: clear EFLAGS.NT in SYSENTER entry pathJan Beulich2013-04-183-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... as it causes problems if we happen to exit back via IRET: In the course of trying to handle the fault, the hypervisor creates a stack frame by hand, and uses PUSHFQ to set the respective EFLAGS field, but expects to be able to IRET through that stack frame to the second portion of the fixup code (which causes a #GP due to the stored EFLAGS having NT set). And even if this worked (e.g if we cleared NT in that path), it would then (through the fail safe callback) cause a #GP in the guest with the SYSENTER handler's first instruction as the source, which in turn would allow guest user mode code to crash the guest kernel. Inject a #GP on the fake (NULL) address of the SYSENTER instruction instead, just like in the case where the guest kernel didn't register a corresponding entry point. On 32-bit we also need to make sure we clear SYSENTER_CS for all CPUs (neither #RESET nor #INIT guarantee this). This is CVE-2013-1917 / XSA-44. Reported-by: Andrew Cooper <andrew.cooper3@citirx.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: fdac9515607b757c044e7ef0d61b1453ef999b08 master date: 2013-04-18 16:00:35 +0200
* iommu/crash: Interrupt remapping is also disabled on crashAndrew Cooper2013-04-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | This fixes a regression side-effect caused by: IOMMU: properly check whether interrupt remapping is enabled git: fae0372140befb88d890a30704a8ec058c902af8 hg: 26742:e1ec14bad0cb On the crash path in nmi_shootdown_cpus(), we shut down the IOMMU, then disable the IOAPIC. On systems which support interrupt remapping, the variable iommu_intremap remains set, meaning that disable_IO_APIC() issues interrupt remapping invalidate requests. IOAPIC interrupt remapping used to be conditional on iommu_enabled, but is now conditional on iommu_intremap, following the above changeset. This behaviour can be fixed by also indicating that interrupt remapping is not enabled after shutting down the IOMMU. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: 53fd1d8458de01169dfb56feb315f02c2b521a86 master date: 2013-04-16 10:34:32 +0200
* x86: don't pass negative time to gtime_to_gtsc()Jan Beulich2013-04-181-0/+9
| | | | | | | | | | | | | | scale_delta(), which is being called by that function, doesn't cope with that. Also print a warning message, so hopefully we can eventually figure why occasionally a negative value results from the calculation in the first place. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> master commit: eb60be3dd870aecfa47bed1118069680389c15f7 master date: 2013-04-11 12:07:55 +0200
* update Xen version to 4.1.5-rc24.1.5-rc2Jan Beulich2013-04-121-1/+1
|
* VMX: Always disable SMEP when guest is in non-paging modeStefan Bader2013-04-091-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | commit e7dda8ec9fc9020e4f53345cdbb18a2e82e54a65 VMX: disable SMEP feature when guest is in non-paging mode disabled the SMEP bit if a guest VCPU was using HAP and was not in paging mode. However I could observe VCPUs getting stuck in the trampoline after the following patch in the Linux kernel changed the way CR4 gets set up: x86, realmode: read cr4 and EFER from kernel for 64-bit trampoline The change will set CR4 from already set flags which includes the SMEP bit. On bare metal this does not matter as the CPU is in non- paging mode at that time. But Xen seems to use the emulated non- paging mode regardless of HAP (I verified that on the guests I was seeing the issue, HAP was not used). Therefor it seems right to unset the SMEP bit for a VCPU that is not in paging-mode, regardless of its HAP usage. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Dongxiao Xu <dongxiao.xu@intel.com> master commit: 0d2e673a763bc7c2ddf97fed074eb691d325ecc5 master date: 2013-04-04 10:37:19 +0200
* VMX: disable SMEP feature when guest is in non-paging modeDongxiao Xu2013-04-091-0/+7
| | | | | | | | | | | | | | | | | | | SMEP is disabled if CPU is in non-paging mode in hardware. However Xen always uses paging mode to emulate guest non-paging mode with HAP. To emulate this behavior, SMEP needs to be manually disabled when guest switches to non-paging mode. We met an issue that, SMP Linux guest with recent kernel (enable SMEP support, for example, 3.5.3) would crash with triple fault if setting unrestricted_guest=0 in grub. This is because Xen uses an identity mapping page table to emulate the non-paging mode, where the page table is set with USER flag. If SMEP is still enabled in this case, guest will meet unhandlable page fault and then crash. Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com> master commit: e7dda8ec9fc9020e4f53345cdbb18a2e82e54a65 master date: 2013-01-30 09:17:30 -0800
* vmx: Simplify cr0 update handling by deferring cr4 changes to the cr4 handler.Keir Fraser2013-04-091-10/+5
| | | | | | Signed-off-by: Keir Fraser <keir@xen.org> master commit: 1453984eab1297559e016d4e907a27e55997191c master date: 2013-01-30 09:15:39 -0800
* x86/mm/shadow: spurious warning when unmapping xenheap pages.Tim Deegan2013-04-092-3/+6
| | | | | | | | | | | | Xenheap pages will always have an extra typecount, taken in share_xen_page_with_guest(), which doesn't come from a shadow PTE. Adjust the warning in sh_remove_all_mappings() to account for it. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Tim Deegan <tim@xen.org> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> master commit: cfc515dabe91e3d6c690c68c6a669d6d77fb7ac4 master date: 2013-04-04 10:14:30 +0100
* x86/S3: Restore broken vcpu affinity on resumeBen Guthro2013-04-093-3/+51
| | | | | | | | | | | | | | | When in SYS_STATE_suspend, and going through the cpu_disable_scheduler path, save a copy of the current cpu affinity, and mark a flag to restore it later. Later, in the resume process, when enabling nonboot cpus restore these affinities. Signed-off-by: Ben Guthro <benjamin.guthro@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org> master commit: 41e71c2607e036f1ac00df898b8f4acb2d4df7ee master date: 2013-04-02 09:52:32 +0200
* x86: irq_move_cleanup_interrupt() must ignore legacy vectorsJan Beulich2013-04-092-1/+8
| | | | | | | | | | | | | | | | | | | | | | | Since the main loop in the function includes legacy vectors, and since vector_irq[] gets set up for legacy vectors regardless of whether those get handled through the IO-APIC, it must not do anything on this vector range. In fact, we should never get past the move_cleanup_count check for IRQs not handled through the IO-APIC. Adding a respective assertion woulkd make those iterations more expensive (due to the lock acquire). For such an assertion to not have false positives we however ought to suppress setting up IRQ2 as an 8259A interrupt (which wasn't correct anyway), which is being done here despite the assertion not actually getting added. Furthermore, there's no point iterating over the vectors past LAST_HIPRIORITY_VECTOR, so terminate the loop accordingly. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org> master commit: af699220ad6d111ba76fc3040342184e423cc9a1 master date: 2013-04-02 08:30:03 +0200