- Oct 29, 2021
-
-
Chenyi Qiang authored
Because core-capability related features are model-specific and KVM will not support them, remove core-capability from the CPU model to avoid the warning message.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210827064818.4698-3-chenyi.qiang@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
-
Richard Henderson authored
Improvements to qemu/int128
Fixes for 128/64 division.
Cleanup tcg/optimize.c
Optimize redundant sign extensions

# gpg: Signature made Thu 28 Oct 2021 09:06:00 PM PDT
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [ultimate]

* remotes/rth/tags/pull-tcg-20211028: (60 commits)
  softmmu: fix for "after access" watchpoints
  softmmu: remove useless condition in watchpoint check
  softmmu: fix watchpoint processing in icount mode
  tcg/optimize: Propagate sign info for shifting
  tcg/optimize: Propagate sign info for bit counting
  tcg/optimize: Propagate sign info for setcond
  tcg/optimize: Propagate sign info for logical operations
  tcg/optimize: Optimize sign extensions
  tcg/optimize: Use fold_xx_to_i for rem
  tcg/optimize: Use fold_xi_to_x for div
  tcg/optimize: Use fold_xi_to_x for mul
  tcg/optimize: Use fold_xx_to_i for orc
  tcg/optimize: Stop forcing z_mask to "garbage" for 32-bit values
  tcg: Extend call args using the correct opcodes
  tcg/optimize: Sink commutative operand swapping into fold functions
  tcg/optimize: Expand fold_addsub2_i32 to 64-bit ops
  tcg/optimize: Expand fold_mulu2_i32 to all 4-arg multiplies
  tcg/optimize: Split out fold_masks
  tcg/optimize: Split out fold_ix_to_i
  tcg/optimize: Split out fold_xi_to_x
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Followup to replace more tcg_const_* with tcg_constant_tl*
Fix bug to delay writes to USR until packet commit

# gpg: Signature made Thu 28 Oct 2021 08:59:24 PM PDT
# gpg:                using RSA key 7B0244FB12DE4422
# gpg: Good signature from "Taylor Simpson (Rock on) <tsimpson@quicinc.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 3635 C788 CE62 B91F D4C5 9AB4 7B02 44FB 12DE 4422

* remotes/quic/tags/pull-hex-20211028:
  Hexagon (target/hexagon) put writes to USR into temp until commit
  Hexagon (target/hexagon) more tcg_constant_*

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Pavel Dovgalyuk authored
Watchpoints that should fire after the memory access break execution of the current block and retranslate the current instruction into a separate block, which then raises a debug interrupt. But cpu_interrupt can't be called in such a block when icount is enabled, because interrupts must be allowed explicitly. This patch sets the CF_LAST_IO flag for the retranslated block, allowing an interrupt request for the last instruction.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <163542169727.2127597.8141772572696627329.stgit@pasha-ThinkPad-X280>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
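A hedged sketch of the mechanism: cflags_next_tb, CF_LAST_IO, and curr_cflags exist in QEMU's accel/tcg code, but this exact line is illustrative rather than the patch itself.

    /* Retranslate the current instruction as a single-insn block and
     * allow I/O on its last (and only) instruction, so the interrupt
     * request is permitted even under icount. */
    cpu->cflags_next_tb = 1 /* one instruction */
                        | CF_LAST_IO
                        | curr_cflags(cpu);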
-
Pavel Dovgalyuk authored
The cpu_check_watchpoint function checks cpu->watchpoint_hit on entry. It then performs the same check in the middle of the function, although this field cannot have changed in between. This patch removes the useless condition.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <163542169094.2127597.8801843697434113110.stgit@pasha-ThinkPad-X280>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Pavel Dovgalyuk authored
Watchpoint processing code restores the vCPU state twice: in tb_check_watchpoint and in cpu_loop_exit_restore/cpu_restore_state. Normally this does not affect anything, but in icount mode the instruction counter is incremented twice and becomes incorrect. This patch eliminates the unneeded CPU state restore.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <163542168516.2127597.8781375223437124644.stgit@pasha-ThinkPad-X280>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
For constant shifts, we can simply shift the s_mask. For variable shifts, we know that sar does not reduce the s_mask, which helps for sequences like

    ext32s_i64  t, in
    sar_i64     t, t, v
    ext32s_i64  out, t

allowing the final extend to be eliminated.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
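To see why the final extend is redundant, a tiny self-contained C illustration (assuming the usual arithmetic behavior of `>>` on signed values, as on mainstream compilers):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        int64_t in = -5;            /* any 64-bit input */
        int64_t t, out;

        t = (int32_t)in;            /* ext32s_i64 t, in */
        t >>= 3;                    /* sar_i64 t, t, v: sign copies only spread */
        out = (int32_t)t;           /* ext32s_i64 out, t */
        assert(out == t);           /* the final extend changed nothing */
        return 0;
    }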
-
Richard Henderson authored
The results are generally 6-bit unsigned values, though count leading and count trailing bits may produce any value for a zero input.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
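A hedged sketch of the mask computation (helper names hypothetical; the real code is in tcg/optimize.c and may differ in detail). clz/ctz of a nonzero 64-bit input is 0..63, and on zero input TCG's clz/ctz return their second operand instead, so that operand's known-zero mask must be folded in; ctpop can reach the full width, hence one extra bit:

    #include <stdint.h>

    static uint64_t clz_ctz_zmask_sketch(uint64_t arg2_zmask)
    {
        return arg2_zmask | 63;    /* counts 0..63, or arg2 on zero input */
    }

    static uint64_t ctpop64_zmask_sketch(void)
    {
        return 64 | 63;            /* population counts 0..64 */
    }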
-
Richard Henderson authored
The result is either 0 or 1, which means that we have a 2-bit signed result, and thus 62 bits of sign. For clarity, use the smask_from_zmask function.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
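A hedged sketch of what an smask_from_zmask-style helper computes (the real one lives in tcg/optimize.c; the GCC/Clang builtin here is an assumption of the sketch): if the top bits are known zero, all but one of them are guaranteed copies of the (zero) sign bit.

    #include <stdint.h>

    static uint64_t smask_from_zmask_sketch(uint64_t z_mask)
    {
        if (z_mask >> 63) {
            return 0;                   /* msb unknown: no sign information */
        }
        /* One leading known-zero bit is the sign bit itself; the rest
         * are repetitions.  For setcond, z_mask == 1 gives 62 bits. */
        int rep = __builtin_clzll(z_mask | 1) - 1;
        return ~(~0ull >> rep);
    }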
-
Richard Henderson authored
Sign repetitions are perforce all identical, whether they are 1 or 0. Bitwise operations preserve the relative quantity of the repetitions.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
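A minimal sketch (not QEMU's actual code): with s_mask recording the high bits known to be copies of the sign bit, a bitwise AND/OR/XOR keeps a sign repetition only where both inputs had one.

    #include <stdint.h>

    /* Left-aligned masks intersect to the shorter run of repetitions. */
    static uint64_t smask_bitwise_sketch(uint64_t s1, uint64_t s2)
    {
        return s1 & s2;
    }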
-
Richard Henderson authored
Certain targets, like riscv, produce signed 32-bit results. This can lead to lots of redundant extensions as values are manipulated. Begin by tracking only the obvious sign-extensions, and converting them to simple copies when possible.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
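A hedged sketch of the core test, with hypothetical naming (the actual fold differs in structure): an ext32s becomes a plain copy when the source is already known to be 32-bit sign-extended, i.e. bits 63..31 all repeat the sign.

    #include <stdbool.h>
    #include <stdint.h>

    static bool ext32s_is_copy_sketch(uint64_t src_s_mask)
    {
        uint64_t need = ~0ull << 31;        /* bits 63..31 */
        return (src_s_mask & need) == need; /* then: mov out, in */
    }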
-
Richard Henderson authored
Recognize the constant function for remainder.

Suggested-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
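A shape sketch of the fold_xx_to_i pattern (details assumed, not QEMU's exact code): when both inputs are the same temp, the result is a known constant. This covers rem here and orc in the related patch.

    #include <stdbool.h>

    /* rem r, x, x  ->  movi r, 0   (x % x == 0; x == 0 is already
     *                               unspecified for TCG division)
     * orc r, x, x  ->  movi r, -1  (x | ~x is all ones)            */
    static bool fold_xx_to_i_sketch(int arg1_temp, int arg2_temp)
    {
        return arg1_temp == arg2_temp;  /* caller then emits movi r, i */
    }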
-
Richard Henderson authored
Recognize the identity function for division.

Suggested-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
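The companion shape, fold_xi_to_x, sketched with assumed details: when the second input is the known constant i, the op is an identity and becomes a plain copy. The next entry applies the same helper to low-part multiply.

    #include <stdbool.h>
    #include <stdint.h>

    /* div r, x, 1  ->  mov r, x
     * mul r, x, 1  ->  mov r, x  */
    static bool fold_xi_to_x_sketch(bool arg2_is_const, int64_t arg2_val,
                                    int64_t i)
    {
        return arg2_is_const && arg2_val == i;  /* caller emits mov r, x */
    }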
-
Richard Henderson authored
Recognize the identity function for low-part multiply.

Suggested-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Recognize the constant function for or-complement.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
This "garbage" setting pre-dates the addition of the type changing opcodes INDEX_op_ext_i32_i64, INDEX_op_extu_i32_i64, and INDEX_op_extr{l,h}_i64_i32. So now we have a definitive points at which to adjust z_mask to eliminate such bits from the 32-bit operands. Reviewed-by:
Alex Bennée <alex.bennee@linaro.org> Reviewed-by:
Luis Pires <luis.pires@eldorado.org.br> Signed-off-by:
Richard Henderson <richard.henderson@linaro.org>
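A hedged sketch of that definitive point (not the actual tcg/optimize.c code): at an explicit extu_i32_i64, the high 32 result bits are known zero, so z_mask can be normalized once instead of carrying "garbage" high bits through 32-bit operations.

    #include <stdint.h>

    static uint64_t zmask_after_extu_i32_i64(uint64_t src_z_mask)
    {
        return (uint32_t)src_z_mask;   /* bits 63..32 become known zero */
    }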
-
Richard Henderson authored
Pretending that the source is i64 when it is in fact i32 is incorrect; we have type-changing opcodes that must be used. This bug trips up the subsequent change to the optimizer.

Fixes: 4f2331e5
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Taylor Simpson authored
Change SET_USR_FIELD to write to hex_new_value[HEX_REG_USR] instead of hex_gpr[HEX_REG_USR]. Then, we need code to mark the instructions that can implicitly set USR:
- Macros added to hex_common.py
- A_FPOP added in translate.c

Test case added in tests/tcg/hexagon/overflow.c

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
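A hedged sketch of the delayed-write scheme: hex_new_value, hex_gpr, and HEX_REG_USR come from the commit message, while `val` and the specific tcg_gen_mov_tl calls are illustrative, not the patch itself.

    /* During packet execution, USR updates land in the shadow copy... */
    tcg_gen_mov_tl(hex_new_value[HEX_REG_USR], val);

    /* ...and only the packet-commit phase makes them architectural. */
    tcg_gen_mov_tl(hex_gpr[HEX_REG_USR], hex_new_value[HEX_REG_USR]);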
-
Taylor Simpson authored
Change additional tcg_const_tl to tcg_constant_tl. Note that gen_pred_cancel had slot_mask initialized with tcg_const_tl. However, it is not constant throughout, so we initialize it with tcg_temp_new and replace the first use with the constant value.

Inspired-by: Richard Henderson <richard.henderson@linaro.org>
Inspired-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
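A brief sketch of why the distinction matters (the TCGv names shift and live are hypothetical): tcg_constant_tl yields a pooled, read-only value that must never be written, while anything mutated later needs a real temp from tcg_temp_new.

    TCGv one  = tcg_constant_tl(1);     /* read-only; never written again */
    TCGv mask = tcg_temp_new();         /* slot_mask-style: mutated later */

    tcg_gen_shl_tl(mask, one, shift);   /* first use consumes the constant */
    tcg_gen_and_tl(mask, mask, live);   /* later writes need the real temp */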
-
- Oct 28, 2021
-
-
Richard Henderson authored
Most of these are handled by creating a fold_const2_commutative to handle all of the binary operators. The rest were already handled on a case-by-case basis in the switch, and have their own fold function in which to place the call. We now have only one major switch on TCGOpcode.

Introduce NO_DEST and a block comment for swap_commutative in order to make the handling of brcond and movcond opcodes cleaner.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
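A hedged sketch of the canonicalization each fold function now performs (QEMU's swap_commutative also weighs other preferences): for a commutative op, move a constant operand to the second slot so only one side needs checking.

    #include <stdbool.h>

    static void swap_commutative_sketch(int *a1, int *a2,
                                        bool a1_const, bool a2_const)
    {
        if (a1_const && !a2_const) {
            int t = *a1;
            *a1 = *a2;
            *a2 = t;
        }
    }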
-
Richard Henderson authored
Rename to fold_addsub2. Use Int128 to implement the wider operation.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
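The widened arithmetic can be expressed with QEMU's Int128 helpers from include/qemu/int128.h; the operand plumbing (al/ah, bl/bh, is_add) is assumed here, since the real fold_addsub2 extracts it from the TCGOp.

    #include <stdbool.h>
    #include <stdint.h>
    #include "qemu/int128.h"

    static void addsub2_sketch(uint64_t *rl, uint64_t *rh,
                               uint64_t al, uint64_t ah,
                               uint64_t bl, uint64_t bh, bool is_add)
    {
        Int128 a = int128_make128(al, ah);   /* (low, high) halves */
        Int128 b = int128_make128(bl, bh);
        Int128 r = is_add ? int128_add(a, b) : int128_sub(a, b);

        *rl = int128_getlo(r);
        *rh = int128_gethi(r);
    }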
-
Richard Henderson authored
Rename to fold_multiply2, and handle muls2_i32, mulu2_i64, and muls2_i64.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
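For the multiply side, QEMU's host-utils helpers already produce 128-bit products as two 64-bit halves; a sketch with the is_signed/a/b plumbing assumed:

    #include <stdbool.h>
    #include <stdint.h>
    #include "qemu/host-utils.h"

    static void mul2_sketch(uint64_t *lo, uint64_t *hi,
                            uint64_t a, uint64_t b, bool is_signed)
    {
        if (is_signed) {
            muls64(lo, hi, a, b);    /* signed 64 x 64 -> 128 */
        } else {
            mulu64(lo, hi, a, b);    /* unsigned 64 x 64 -> 128 */
        }
    }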
-
Richard Henderson authored
Move all of the known-zero optimizations into the per-opcode functions. Use fold_masks when there is a possibility of the result being determined, and simply set ctx->z_mask otherwise.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
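A hedged sketch of the fold_masks idea (the real helper does more, e.g. copy propagation via the affected-bits mask): when the known-zero mask leaves no unknown bits, the result is fully determined.

    #include <stdbool.h>
    #include <stdint.h>

    static bool result_is_determined_sketch(uint64_t z_mask)
    {
        return z_mask == 0;   /* every bit known zero => movi r, 0 */
    }                         /* otherwise, just record z_mask in ctx */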
-
Richard Henderson authored
Pull the "op r, 0, b => movi r, 0" optimization into a function, and use it in fold_shift. Reviewed-by:
Luis Pires <luis.pires@eldorado.org.br> Reviewed-by:
Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by:
Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Pull the "op r, a, i => mov r, a" optimization into a function, and use them in the outer-most logical operations. Reviewed-by:
Luis Pires <luis.pires@eldorado.org.br> Signed-off-by:
Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Even though there is only one user, place this more complex conversion into its own helper.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Split out the conditional conversion from a more complex logical operation to a simple NOT. Create a couple more helpers to make this easy for the outermost logical operations.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Compute the type of the operation early. There were at least 4 places that used a def->flags ladder to determine the type of the operation being optimized. There were two places that assumed !TCG_OPF_64BIT means TCG_TYPE_I32, and so could potentially compute incorrect results for vector operations.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Pull the "op r, a, 0 => movi r, 0" optimization into a function, and use it in the outer opcode fold functions. Reviewed-by:
Luis Pires <luis.pires@eldorado.org.br> Reviewed-by:
Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by:
Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Pull the "op r, a, a => mov r, a" optimization into a function, and use it in the outer opcode fold functions. Reviewed-by:
Luis Pires <luis.pires@eldorado.org.br> Reviewed-by:
Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by:
Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Pull the "op r, a, a => movi r, 0" optimization into a function, and use it in the outer opcode fold functions. Reviewed-by:
Luis Pires <luis.pires@eldorado.org.br> Reviewed-by:
Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by:
Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
This is the final entry in the main switch that was in a different form. After this, we have the option to convert the switch into a function dispatch table.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Add two additional helpers, fold_add2_i32 and fold_sub2_i32, which will not be simple wrappers forever.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-