  1. Sep 23, 2020
    • qemu/atomic.h: rename atomic_ to qatomic_ · d73415a3
      Stefan Hajnoczi authored
      
      clang's C11 atomic_fetch_*() functions only take a C11 atomic type
      pointer argument. QEMU uses plain types (int, etc.), which causes a
      compiler error when QEMU code calls these functions in a source file
      that also includes <stdatomic.h> via a system header file:
      
        $ CC=clang CXX=clang++ ./configure ... && make
        ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)
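
      The clash is easy to reproduce outside QEMU. A minimal standalone
      sketch (illustrative, not QEMU code): C11's atomic_fetch_add() is
      only defined for _Atomic-qualified objects, so passing the address
      of a plain integer fails under clang exactly as above.

        /* repro.c -- hypothetical standalone example, not QEMU code */
        #include <stdatomic.h>

        static unsigned int counter;   /* plain type, as QEMU uses */

        unsigned int bump(void)
        {
            /* clang: "address argument to atomic operation must be a
             * pointer to _Atomic type ('unsigned int *' invalid)" */
            return atomic_fetch_add(&counter, 1);
            /* declaring "static _Atomic unsigned int counter;" instead
             * would make this build */
        }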
      
      Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
      used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
      and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
      searched GitHub for existing "qatomic_" users but there seem to be none.
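
      With the prefix, QEMU's macros and the C11 names occupy disjoint
      namespaces. A simplified sketch of the idea (these definitions are
      stand-ins; the real qatomic_* macros in include/qemu/atomic.h add
      type checks and more barrier variants):

        #include <stdatomic.h>   /* owns the atomic_* namespace */

        /* hypothetical stand-ins for QEMU's prefixed macros */
        #define qatomic_read(ptr)      __atomic_load_n(ptr, __ATOMIC_RELAXED)
        #define qatomic_set(ptr, val)  __atomic_store_n(ptr, val, __ATOMIC_RELAXED)
        #define qatomic_fetch_inc(ptr) __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST)

        static unsigned int refcount;

        unsigned int grab(void)
        {
            return qatomic_fetch_inc(&refcount);  /* no clash with <stdatomic.h> */
        }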
      
      This patch was generated using:
      
        $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
          sort -u >/tmp/changed_identifiers
        $ for identifier in $(</tmp/changed_identifiers); do
              sed -i "s%\<$identifier\>%q$identifier%g" \
                  $(git grep -I -l "\<$identifier\>")
          done
      
      I manually fixed line-wrap issues and misaligned rST tables.
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
  2. Oct 02, 2018
    • qsp: use atomic64 accessors · ac8c7748
      Emilio G. Cota authored
      
      With the seqlock, we either have to use atomics to remain
      within defined behaviour (and note that 64-bit atomics aren't
      always guaranteed to compile, irrespective of __nocheck), or
      drop the atomics and be in undefined behaviour territory.
      
      Fix it by dropping the seqlock and using atomic64 accessors.
      This will limit scalability when !CONFIG_ATOMIC64, but those
      machines (1) don't have many users and (2) are unlikely to
      have many cores.
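
      A hedged sketch of the resulting access pattern (standalone
      stand-ins; the real accessors are QEMU's atomic_read_i64()/
      atomic_set_i64(), which fall back to a lock-based implementation
      when !CONFIG_ATOMIC64):

        #include <stdint.h>

        /* illustrative equivalents of the atomic64 accessors */
        static inline int64_t read_i64(const int64_t *ptr)
        {
            return __atomic_load_n(ptr, __ATOMIC_RELAXED);
        }

        static inline void set_i64(int64_t *ptr, int64_t val)
        {
            __atomic_store_n(ptr, val, __ATOMIC_RELAXED);
        }

        /* counters are written only by their owner thread; readers in
         * other threads now see tear-free 64-bit values without the
         * seqlock */
        static void record_wait(int64_t *ns_acc, int64_t delta)
        {
            set_i64(ns_acc, read_i64(ns_acc) + delta);
        }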
      
      - With CONFIG_ATOMIC64:
      $ tests/atomic_add-bench -n 1 -m -p
       Throughput:         13.00 Mops/s
      
      - Forcing !CONFIG_ATOMIC64:
      $ tests/atomic_add-bench -n 1 -m -p
       Throughput:         10.89 Mops/s
      
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <20180910232752.31565-5-cota@braap.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. Aug 23, 2018
    • qsp: track BQL callers explicitly · cb764d06
      Emilio G. Cota authored
      
      The BQL is acquired via qemu_mutex_lock_iothread(), which makes
      the profiler assign the associated wait time (i.e. most of
      BQL wait time) entirely to that function. This loses the original
      call site information, which does not help diagnose BQL contention.
      Fix it by tracking the callers explicitly.
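
      The fix follows the usual file/line forwarding pattern; a
      simplified sketch of the shape of the change (not the verbatim
      patch):

        /* the wrapper becomes a macro, so the profiler records the
         * caller's location instead of the wrapper's */
        void qemu_mutex_lock_iothread_impl(const char *file, int line);

        #define qemu_mutex_lock_iothread() \
            qemu_mutex_lock_iothread_impl(__FILE__, __LINE__)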
      
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • qsp: support call site coalescing · d557de4a
      Emilio G. Cota authored
      
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • qsp: add qsp_reset · 996e8d9a
      Emilio G. Cota authored
      
      I first implemented this by deleting all entries in the global
      hash table. But doing that safely slows down profiling, since
      we'd need to introduce rcu_read_lock/unlock in the fast path.
      
      What's implemented here avoids touching the thread-local data in
      the global hash table. It achieves this by taking a snapshot of the
      current state, so that subsequent reports present the delta with
      respect to that snapshot.
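
      A conceptual sketch of the snapshot approach (illustrative types,
      not the actual qsp code):

        #include <stdint.h>

        typedef struct Counter {
            int64_t total;     /* written only by its owner thread */
            int64_t snapshot;  /* captured at the last reset */
        } Counter;

        /* "reset" remembers the current value instead of zeroing it,
         * so the owner thread's fast path is never disturbed */
        static void counter_reset(Counter *c)
        {
            c->snapshot = __atomic_load_n(&c->total, __ATOMIC_RELAXED);
        }

        /* reports show only what accumulated since the last reset */
        static int64_t counter_report(const Counter *c)
        {
            return __atomic_load_n(&c->total, __ATOMIC_RELAXED) - c->snapshot;
        }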
      
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • qsp: add sort_by option to qsp_report · 0a22777c
      Emilio G. Cota authored
      
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • qsp: QEMU's Synchronization Profiler · fe9959a2
      Emilio G. Cota authored
      
      The goal of this module is to profile synchronization primitives (i.e.
      mutexes, recursive mutexes and condition variables) so that scalability
      issues can be quickly diagnosed.
      
      Sync primitives are profiled by QSP based on the vaddr of the object
      accessed as well as the call site (file:line_nr). That means the same
      object accessed from two different call sites will be tracked in
      separate entries, which might be reported together or separately (see
      the subsequent commit on call site coalescing).
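
      A sketch of the resulting profiling key (field names here are
      illustrative; the real structure lives in util/qsp.c):

        /* each entry is keyed by object address plus call site, so the
         * same mutex taken from two places yields two entries */
        typedef struct QSPKey {
            const void *obj;   /* address of the profiled mutex/cond */
            const char *file;  /* caller's __FILE__ */
            int         line;  /* caller's __LINE__ */
        } QSPKey;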
      
      Some perf numbers:
      
      Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
      Command: taskset -c 0 tests/atomic_add-bench -d 5 -m
      
      - Before: 54.80 Mops/s
      - After:  54.75 Mops/s
      
      That is, a negligible slowdown due to the now-indirect call to
      qemu_mutex_lock. Note that using a branch instead of an indirect
      call introduces a more severe slowdown (53.65 Mops/s, i.e. a 2%
      slowdown).
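
      A simplified sketch of that indirect-call hook (QEMU routes the
      lock through a function pointer that enabling the profiler swaps
      out; the details here are illustrative):

        typedef struct QemuMutex QemuMutex;
        typedef void (*QemuMutexLockFunc)(QemuMutex *m,
                                          const char *file, int line);

        /* points at the plain implementation by default; the profiler
         * installs a timing wrapper instead */
        extern QemuMutexLockFunc qemu_mutex_lock_func;

        #define qemu_mutex_lock(m) \
            qemu_mutex_lock_func(m, __FILE__, __LINE__)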
      
      Enabling the profiler (with -p, added in this series) is more interesting:
      
      - No profiling: 54.75 Mops/s
      - W/ profiling: 12.53 Mops/s
      
      That is, a 4.36X slowdown.
      
      We can break down this slowdown by removing the get_clock calls or
      the entry lookup:
      
      - No profiling:     54.75 Mops/s
      - W/o get_clock:    25.37 Mops/s
      - W/o entry lookup: 19.30 Mops/s
      - W/ profiling:     12.53 Mops/s
      
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>