Commits · 498be5b87d61f377b38701b31e460eed7a104c0d · Anton / libtcg

Feb 04, 2022

cpuid: use unsigned for max cpuid · 2a728de1


__get_cpuid_max returns an unsigned value.
For consistency, store the result in an unsigned variable.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

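As a rough illustration of the point made in this commit, the sketch below stores the result of `__get_cpuid_max()` (declared in the compiler's `<cpuid.h>` on x86) in an unsigned variable. The helper names `host_cpuid_max` and `host_has_cpuid_leaf7` are hypothetical and are not taken from the commit.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical helper: highest basic CPUID leaf, or 0 on non-x86 hosts. */
unsigned host_cpuid_max(void)
{
#if defined(__i386__) || defined(__x86_64__)
    /* __get_cpuid_max() returns unsigned int, so the local is unsigned too. */
    unsigned max = __get_cpuid_max(0, NULL);
    return max;
#else
    return 0;
#endif
}

/* Hypothetical helper: leaf 7 carries the extended feature bits. */
bool host_has_cpuid_leaf7(void)
{
    return host_cpuid_max() >= 7;
}
```

Keeping the variable unsigned avoids a signed/unsigned comparison when the value is later checked against a leaf number.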

Apr 01, 2020

util/bufferiszero: improve avx2 accelerator · 8f13a39d

Robert Hoo authored 5 years ago


By increasing the AVX2 length_to_accel to 128, we can simplify its logic and
remove a branch.

The authorship of this patch actually belongs to Richard Henderson
<richard.henderson@linaro.org>; I only fixed a boundary case in his
original patch.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Message-Id: <1585119021-46593-2-git-send-email-robert.hu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


util/bufferiszero: assign length_to_accel value for each accelerator case · b87c99d0

Robert Hoo authored 5 years ago


This is needed because the unit test calls init_accel() several times, each
time with a different accelerator type.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Message-Id: <1585119021-46593-1-git-send-email-robert.hu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

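The reasoning in this commit can be sketched as follows: if `init_accel()` may run more than once, every case has to set the length threshold as well as the function pointer, otherwise a value from an earlier call would linger. This is a minimal portable sketch, not the actual QEMU code. The enum, the stand-in scan routine, the accessor, and the SSE2 threshold of 64 are assumptions made for illustration. Only the value 128 for AVX2 comes from the commit above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accelerator kinds; the names are illustrative only. */
enum accel_kind { ACCEL_SSE2, ACCEL_AVX2 };

typedef bool (*accel_fn)(const void *buf, size_t len);

/* Portable stand-in for a vectorized routine. */
static bool scan_bytes(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

static accel_fn accel;
static size_t length_to_accel;

/* Each case assigns both fields, so repeated calls leave no stale state. */
void init_accel(enum accel_kind kind)
{
    switch (kind) {
    case ACCEL_AVX2:
        accel = scan_bytes;
        length_to_accel = 128;  /* value named in the AVX2 commit */
        break;
    case ACCEL_SSE2:
    default:
        accel = scan_bytes;
        length_to_accel = 64;   /* placeholder value, an assumption */
        break;
    }
}

/* Hypothetical accessor so a test can observe the threshold. */
size_t current_length_to_accel(void)
{
    return length_to_accel;
}
```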

Mar 16, 2020

util: add util function buffer_zero_avx512() · 27f08ea1

Robert Hoo authored 5 years ago


And initialize buffer_is_zero() with it when Intel AVX512F is
available on the host.

This function uses Intel AVX-512 Foundation instructions and is
faster than the AVX2 implementation (in my unit test with a 4K
buffer on Cascade Lake SP, buffer_zero_avx512() was ~36% faster
than buffer_zero_avx2()).

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


Jun 12, 2019

Include qemu-common.h exactly where needed · a8d25326

Markus Armbruster authored 5 years ago


No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]


Jul 24, 2017

util: Introduce include/qemu/cpuid.h · 5dd89908

Richard Henderson authored 7 years ago


Clang 3.9 passes the CONFIG_AVX2_OPT configure test.  However, the
supplied <cpuid.h> does not contain the bit_AVX2 define that we use
when detecting whether the routine can be enabled.

Introduce a qemu-specific header that uses the compiler's definition
of __cpuid et al, but supplies any missing bit_* definitions needed.
This avoids introducing any extra ifdefs to util/bufferiszero.c, and
allows quite a few to be removed from tcg/i386/tcg-target.inc.c.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20170719044018.18063-1-rth@twiddle.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

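The approach described in this commit, wrapping the compiler's `<cpuid.h>` and supplying any missing `bit_*` definitions, might look roughly like the fragment below. This is a sketch and not the contents of include/qemu/cpuid.h. The helper `ebx_has_avx2` is hypothetical. The bit position is the architectural one: AVX2 is reported in CPUID leaf 7, subleaf 0, EBX bit 5.

```c
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* compiler-provided __cpuid et al. */
#endif

/* Supply the define only if the compiler's header lacks it. */
#ifndef bit_AVX2
#define bit_AVX2 (1 << 5)
#endif

/* Hypothetical helper: test the AVX2 bit in a leaf-7 EBX value. */
bool ebx_has_avx2(unsigned ebx)
{
    return (ebx & bit_AVX2) != 0;
}
```

Because the fallback sits behind `#ifndef`, callers can use `bit_AVX2` unconditionally without adding their own ifdefs.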

Sep 14, 2016

cutils: Rewrite x86 buffer zero checking · d9911d14

Richard Henderson authored 8 years ago


Handle alignment of buffers, so that the vector paths
can be used more often.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1473800239-13841-1-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

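One way to picture alignment handling is a scan split into an unaligned head, an aligned body, and a tail. The sketch below is an assumption about the general technique, not the rewritten x86 code. It uses 8-byte words and `memcpy` loads to stay portable, where a real vector path would use aligned vector loads in the body. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

bool buffer_is_zero_aligned_sketch(const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    const unsigned char *p = buf;
    const unsigned char *end = p + len;

    /* Head: byte-wise until p is 8-byte aligned or the buffer ends. */
    while (p < end && ((uintptr_t)p & 7) != 0) {
        if (*p++) {
            return false;
        }
    }

    /* Body: whole 8-byte words, each starting at an aligned address. */
    while ((size_t)(end - p) >= 8) {
        uint64_t w;
        memcpy(&w, p, sizeof(w));
        if (w) {
            return false;
        }
        p += 8;
    }

    /* Tail: remaining bytes after the last full word. */
    while (p < end) {
        if (*p++) {
            return false;
        }
    }
    return true;
}
```

With the head consumed separately, a buffer that starts at an odd address can still use the wide path for most of its length.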

Sep 13, 2016

cutils: Add generic prefetch · 083d012a

Richard Henderson authored 8 years ago


There's no real knowledge of the cacheline size,
just prefetching one loop ahead.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1472496380-19706-7-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

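"Prefetching one loop ahead" can be sketched as below: while the current chunk is being checked, a prefetch hint is issued for the chunk the next iteration will read. This is an illustrative sketch only. The 64-byte chunk size and the function name are assumptions, and `__builtin_prefetch` is a GCC/Clang builtin, so other compilers would need a different hint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

bool buffer_is_zero_prefetch_sketch(const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    const unsigned char *p = buf;
    size_t i = 0;

    for (; i + 64 <= len; i += 64) {
        /* Hint for the next iteration's chunk; a prefetch never faults. */
        __builtin_prefetch(p + i + 64);

        /* OR the eight words of this chunk together, then test once. */
        uint64_t acc = 0;
        for (size_t j = 0; j < 64; j += 8) {
            uint64_t w;
            memcpy(&w, p + i + j, sizeof(w));
            acc |= w;
        }
        if (acc) {
            return false;
        }
    }

    /* Remaining bytes that do not fill a whole chunk. */
    for (; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}
```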

cutils: Add SSE4 version · 86444f08

Paolo Bonzini authored 8 years ago


Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cutils: Add test for buffer_is_zero · efad6682

Richard Henderson authored 8 years ago


Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1472496380-19706-6-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


cutils: Remove ppc buffer zero checking · 43ff5e01

Richard Henderson authored 8 years ago


For ppc64le, gcc6 does extremely poorly with the Altivec code.
Moreover, on POWER7 and POWER8, a hand-optimized Altivec version
turns out to be no faster than the revised integer version, and
therefore not worth the effort.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


cutils: Remove aarch64 buffer zero checking · 2250d3a2

Richard Henderson authored 8 years ago


The revised integer version is 4 times faster than the neon version
on an AppliedMicro Mustang.  Even with hand scheduling and additional
unrolling I cannot make any neon version run as fast as the integer.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


cutils: Rearrange buffer_is_zero acceleration · 5e33a872

Richard Henderson authored 8 years ago


Allow selection of several acceleration functions
based on the size and alignment of the buffer.
Do not require ifunc support for AVX2 acceleration.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1472496380-19706-5-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

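Selecting a routine by buffer size and alignment through an ordinary function pointer, rather than relying on ifunc, could be sketched like this. All names, the 64-byte threshold, and the 8-byte alignment rule are assumptions for illustration. The real code selects among vector implementations that are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef bool (*zero_fn)(const void *buf, size_t len);

/* Generic path: any length, any alignment. */
bool zero_bytes(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Word path: caller guarantees 8-byte alignment and len % 8 == 0. */
bool zero_words(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    assert(((uintptr_t)p & 7) == 0 && (len & 7) == 0);
    for (size_t i = 0; i < len; i += 8) {
        uint64_t w;
        memcpy(&w, p + i, sizeof(w));
        if (w) {
            return false;
        }
    }
    return true;
}

/* Pick a routine from the size and alignment of this particular buffer. */
zero_fn select_zero_fn(const void *buf, size_t len)
{
    bool aligned = ((uintptr_t)buf & 7) == 0 && (len & 7) == 0;
    return (len >= 64 && aligned) ? zero_words : zero_bytes;
}

bool buffer_is_zero_dispatch_sketch(const void *buf, size_t len)
{
    return select_zero_fn(buf, len)(buf, len);
}
```

A plain pointer chosen at run time works with any toolchain, which is the practical benefit of not depending on ifunc.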

cutils: Export only buffer_is_zero · a1febc49

Richard Henderson authored 8 years ago


Since the two users don't make use of the returned offset,
beyond ensuring that the entire buffer is zero, consider the
can_use_buffer_find_nonzero_offset and buffer_find_nonzero_offset
functions internal.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1472496380-19706-4-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

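The interface change described here can be pictured as an offset-returning helper kept file-local, with only a boolean wrapper exported. The sketch below is a simplified byte-wise stand-in. The names `find_nonzero_offset` and `buffer_is_zero_sketch` are hypothetical and do not reproduce the original functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Internal helper: index of the first nonzero byte, or len if there is none. */
static size_t find_nonzero_offset(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return i;
        }
    }
    return len;
}

/* Exported entry point: callers only learn whether the buffer is all zero. */
bool buffer_is_zero_sketch(const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return find_nonzero_offset(buf, len) == len;
}
```

Hiding the offset leaves the implementation free to stop scanning, or to change strategy, without affecting callers.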

cutils: Remove SPLAT macro · 8c70c1b0

Richard Henderson authored 8 years ago


This is unused and complicates the vector interface.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1472496380-19706-3-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


cutils: Move buffer_is_zero and subroutines to a new file · 88ca8e80

Richard Henderson authored 8 years ago


Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1472496380-19706-2-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
