  1. Sep 20, 2023
  2. Sep 18, 2023
    • net: add initial support for AF_XDP network backend · cb039ef3
      Ilya Maximets authored
      
      AF_XDP is a network socket family that allows communication directly
      with the network device driver in the kernel, bypassing most or all
      of the kernel networking stack.  In essence, the technology is
      pretty similar to netmap.  But, unlike netmap, AF_XDP is Linux-native
      and works with any network interfaces without driver modifications.
      Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't
      require access to character devices or unix sockets.  Only access to
      the network interface itself is necessary.
      
      This patch implements a network backend that communicates with the
      kernel by creating an AF_XDP socket.  A chunk of userspace memory
      is shared between QEMU and the host kernel.  4 ring buffers (Tx, Rx,
      Fill and Completion) are placed in that memory along with a pool of
      memory buffers for the packet data.  Data transmission is done by
      allocating one of the buffers, copying packet data into it and
      placing the pointer into Tx ring.  After transmission, device will
      return the buffer via Completion ring.  On Rx, device will take
      a buffer from a pre-populated Fill ring, write the packet data into
      it and place the buffer into Rx ring.
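
The buffer flow described above can be modelled in a few lines. This is a toy sketch only: the class `ToyXsk` and its method names are invented for illustration, the rings are plain Python deques holding buffer indices, and nothing here touches the real AF_XDP socket ABI or QEMU's implementation.

```python
from collections import deque


class ToyXsk:
    """Toy model of the Tx/Completion and Fill/Rx buffer hand-off (hypothetical)."""

    def __init__(self, num_buffers, buffer_size=4096):
        self.buffer_size = buffer_size
        self.buffers = [bytearray(buffer_size) for _ in range(num_buffers)]
        self.lengths = [0] * num_buffers
        self.free = deque(range(num_buffers))  # buffers owned by userspace
        self.tx = deque()
        self.completion = deque()
        self.fill = deque()
        self.rx = deque()

    def _store(self, idx, packet):
        if len(packet) > self.buffer_size:
            raise ValueError("packet larger than one buffer")
        self.buffers[idx][:len(packet)] = packet
        self.lengths[idx] = len(packet)

    def _load(self, idx):
        return bytes(self.buffers[idx][:self.lengths[idx]])

    # Userspace side
    def send(self, packet):
        """Allocate a buffer, copy the packet in, queue it on the Tx ring."""
        idx = self.free.popleft()
        self._store(idx, packet)
        self.tx.append(idx)

    def reclaim(self):
        """Take back buffers the device returned via the Completion ring."""
        while self.completion:
            self.free.append(self.completion.popleft())

    def prepopulate_fill(self, count):
        """Hand empty buffers to the device through the Fill ring."""
        for _ in range(count):
            self.fill.append(self.free.popleft())

    def recv(self):
        """Read one packet from the Rx ring and free its buffer."""
        idx = self.rx.popleft()
        data = self._load(idx)
        self.free.append(idx)
        return data

    # Device side (simulated)
    def device_transmit(self):
        """Drain the Tx ring and return each buffer via the Completion ring."""
        sent = []
        while self.tx:
            idx = self.tx.popleft()
            sent.append(self._load(idx))
            self.completion.append(idx)
        return sent

    def device_receive(self, packet):
        """Take a buffer from the Fill ring, write the packet, queue it on Rx."""
        idx = self.fill.popleft()
        self._store(idx, packet)
        self.rx.append(idx)
```

In this model a buffer index is always in exactly one place (the free pool or one of the four rings), which mirrors the ownership hand-off between userspace and the device.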
      
      AF_XDP network backend takes on the communication with the host
      kernel and the network interface and forwards packets to/from the
      peer device in QEMU.
      
      Usage example:
      
        -device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C
        -netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1
      
      XDP program bridges the socket with a network interface.  It can be
      attached to the interface in 2 different modes:
      
      1. skb - this mode should work for any interface and doesn't require
               driver support.  With a caveat of lower performance.
      
      2. native - this does require support from the driver and allows
                  bypassing skb allocation in the kernel and potentially
                  using zero-copy while getting packets in/out of userspace.
      
      By default, QEMU will try to use native mode and fall back to skb.
      Mode can be forced via 'mode' option.  To force 'copy' even in native
      mode, use 'force-copy=on' option.  This might be useful if there is
      some issue with the driver.
      
      Option 'queues=N' allows specifying how many device queues should
      be open.  Note that all the queues that are not open are still
      functional and can receive traffic, but it will not be delivered to
      QEMU.  So, the number of device queues should generally match the
      QEMU configuration, unless the device is shared with something
      else and the traffic re-direction to appropriate queues is correctly
      configured on a device level (e.g. with ethtool -N).
      'start-queue=M' option can be used to specify from which queue id
      QEMU should start configuring 'N' queues.  It might also be necessary
      to use this option with certain NICs, e.g. MLX5 NICs.  See the docs
      for examples.
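
As a rough illustration of how the options above combine, the sketch below assembles a '-netdev af-xdp' argument string. The helper name `af_xdp_netdev_arg` is hypothetical, it only covers options named in this message, and it does not check anything against what QEMU itself accepts.

```python
def af_xdp_netdev_arg(ifname, netdev_id, mode=None, queues=None,
                      start_queue=None, force_copy=False):
    """Build the value for '-netdev' from options named in the commit message.

    Hypothetical helper for illustration; QEMU performs its own validation.
    """
    parts = ["af-xdp", f"ifname={ifname}", f"id={netdev_id}"]
    if mode is not None:
        if mode not in ("native", "skb"):
            raise ValueError("mode must be 'native' or 'skb'")
        parts.append(f"mode={mode}")
    if queues is not None:
        parts.append(f"queues={queues}")
    if start_queue is not None:
        parts.append(f"start-queue={start_queue}")
    if force_copy:
        parts.append("force-copy=on")
    return ",".join(parts)


# Reproduces the usage example from the message.
print(af_xdp_netdev_arg("ens6f1np1", "guest1", mode="native", queues=1))
```

Leaving `mode` unset corresponds to the default behaviour described above, where QEMU tries native mode first and falls back to skb.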
      
      In a general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN
      or CAP_BPF capabilities in order to load default XSK/XDP programs to
      the network interface and configure BPF maps.  It is possible, however,
      to run with no capabilities.  For that to work, an external process
      with enough capabilities will need to pre-load default XSK program,
      create AF_XDP sockets and pass their file descriptors to QEMU process
      on startup via 'sock-fds' option.  Network backend will need to be
      configured with 'inhibit=on' to avoid loading of the program.
      QEMU will need 32 MB of locked memory (RLIMIT_MEMLOCK) per queue
      or CAP_IPC_LOCK.
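
The locked-memory requirement is simple arithmetic: 32 MB times the number of queues. A minimal sketch of a pre-flight check follows, assuming a Linux host; the function names are invented here, and the check ignores the CAP_IPC_LOCK alternative mentioned above.

```python
import resource

MEMLOCK_PER_QUEUE = 32 * 1024 * 1024  # 32 MB per queue, from the commit message


def required_memlock(queues):
    """Bytes of locked memory needed for the given number of queues."""
    return queues * MEMLOCK_PER_QUEUE


def memlock_limit_ok(queues, soft_limit):
    """True if the soft RLIMIT_MEMLOCK value covers the requirement."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= required_memlock(queues)


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    for queues in (1, 2, 4):
        print(queues, "queue(s):", "ok" if memlock_limit_ok(queues, soft) else "too low")
```

A process holding CAP_IPC_LOCK would not need the limit raised, so a negative result here is a hint rather than a definite failure.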
      
      There are a few performance challenges with the current network backends.
      
      First is that they do not support IO threads.  This means that data
      path is handled by the main thread in QEMU and may slow down other
      work or may be slowed down by some other work.  This also means that
      taking advantage of multi-queue is generally not possible today.
      
      Another thing is that data path is going through the device emulation
      code, which is not really optimized for performance.  The fastest
      "frontend" device is virtio-net.  But it's not optimized for heavy
      traffic either, because it expects such use-cases to be handled via
      some implementation of vhost (user, kernel, vdpa).  In practice, we
      have virtio notifications and rcu lock/unlock on a per-packet basis
      and not very efficient accesses to the guest memory.  Communication
      channels between backend and frontend devices do not allow passing
      more than one packet at a time as well.
      
      Some of these challenges can be avoided in the future by adding better
      batching into device emulation or by implementing vhost-af-xdp variant.
      
      There are also a few kernel limitations.  AF_XDP sockets do not
      support any kinds of checksum or segmentation offloading.  Buffers
      are limited to a page size (4K), i.e. MTU is limited.  Multi-buffer
      support implementation for AF_XDP is in progress, but not ready yet.
      Also, transmission in all non-zero-copy modes is synchronous, i.e.
      done in a syscall.  That doesn't allow high packet rates on virtual
      interfaces.
      
      However, keeping in mind all of these challenges, current implementation
      of the AF_XDP backend shows a decent performance while running on top
      of a physical NIC with zero-copy support.
      
      Test setup:
      
      2 VMs running on 2 physical hosts connected via ConnectX6-Dx card.
      Network backend is configured to open the NIC directly in native mode.
      The driver supports zero-copy.  NIC is configured to use 1 queue.
      
      Inside a VM - iperf3 for basic TCP performance testing and dpdk-testpmd
      for PPS testing.
      
      iperf3 result:
       TCP stream      : 19.1 Gbps
      
      dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
       Tx only         : 3.4 Mpps
       Rx only         : 2.0 Mpps
       L2 FWD Loopback : 1.5 Mpps
      
      In skb mode the same setup shows much lower performance, similar to
      the setup where pair of physical NICs is replaced with veth pair:
      
      iperf3 result:
        TCP stream      : 9 Gbps
      
      dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
        Tx only         : 1.2 Mpps
        Rx only         : 1.0 Mpps
        L2 FWD Loopback : 0.7 Mpps
      
      Results in skb mode or over the veth are close to results of a tap
      backend with vhost=on and disabled segmentation offloading bridged
      with a NIC.
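
To put the two result sets side by side, the short script below computes the native-to-skb ratio for each measurement. It uses only the figures quoted above; the dictionary keys are labels chosen for this sketch.

```python
# Figures copied from the results above (native zero-copy vs. skb mode).
native = {
    "iperf3 TCP (Gbps)": 19.1,
    "Tx only (Mpps)": 3.4,
    "Rx only (Mpps)": 2.0,
    "L2 FWD Loopback (Mpps)": 1.5,
}
skb = {
    "iperf3 TCP (Gbps)": 9.0,
    "Tx only (Mpps)": 1.2,
    "Rx only (Mpps)": 1.0,
    "L2 FWD Loopback (Mpps)": 0.7,
}


def speedups(fast, slow):
    """Ratio fast/slow for every measurement present in both dicts."""
    return {name: fast[name] / slow[name] for name in fast if name in slow}


ratios = speedups(native, skb)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

With these numbers every ratio falls between roughly two and three.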
      
      Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> (docker/lcitool)
      Signed-off-by: Jason Wang <jasowang@redhat.com>
  3. Sep 08, 2023
  4. Sep 07, 2023
    • Python: Drop support for Python 3.7 · ca056f44
      Paolo Bonzini authored
      
      Debian 10 is no longer a supported distro, since Debian 12 was
      released on June 10, 2023.  Our supported build platforms as of today
      all support at least 3.8 (and all of them except for Ubuntu 20.04
      support 3.9):
      
      openSUSE Leap 15.5: 3.6.15 (3.11.2)
      CentOS Stream 8:    3.6.8  (3.8.13, 3.9.16, 3.11.4)
      CentOS Stream 9:    3.9.17 (3.11.4)
      Fedora 37:          3.11.4
      Fedora 38:          3.11.4
      Debian 11:          3.9.2
      Debian 12:          3.11.2
      Alpine 3.14, 3.15:  3.9.16
      Alpine 3.16, 3.17:  3.10.10
      Ubuntu 20.04 LTS:   3.8.10
      Ubuntu 22.04 LTS:   3.10.12
      NetBSD 9.3:         3.9.13*
      FreeBSD 12.4:       3.9.16
      FreeBSD 13.1:       3.9.18
      OpenBSD 7.2:        3.9.17
      
      Note: NetBSD does not appear to have a default meta-package, but offers
      several options, the lowest of which is 3.7.15. However, "python39"
      appears to be a pre-requisite to one of the other packages we request
      in tests/vm/netbsd.
      
      Since it is safe under our supported platform policy, bump our
      minimum supported version of Python to 3.8.  The two most interesting
      features to have by default include:
      
      - the importlib.metadata module, whose lack is responsible for over 100
        lines of code in mkvenv.py
      
      - improvements to asyncio, for example asyncio.CancelledError
        inherits from BaseException rather than Exception
      
      In addition, code can now use the assignment operator ':='
      
      Because mypy now learns about importlib.metadata, a small change to
      mkvenv.py is needed to pass type checking.
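
For readers unfamiliar with the three features mentioned, here is a small self-contained illustration that runs on Python 3.8 or newer. The helper names are invented for this sketch and are unrelated to the actual mkvenv.py code.

```python
import asyncio
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def first_long_word(words, min_len):
    """Use the assignment operator ':=' to keep the computed length."""
    for word in words:
        if (length := len(word)) >= min_len:
            return word, length
    return None


# Since 3.8, CancelledError no longer derives from Exception, so a broad
# 'except Exception' does not swallow task cancellation by accident.
cancelled_is_base = (
    issubclass(asyncio.CancelledError, BaseException)
    and not issubclass(asyncio.CancelledError, Exception)
)

print(installed_version("pip"), first_long_word(["a", "meson", "qemu"], 4), cancelled_is_base)
```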
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • configure, meson: move --enable-plugins to meson · 2c13c574
      Paolo Bonzini authored
      
      While the option still needs to be parsed in the configure script
      (it's needed by tests/tcg, and also to decide about recursing
      into contrib/plugins), passing it to Meson can be done with -D
      instead of using config-host.mak.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. Aug 31, 2023
  6. Aug 28, 2023
    • Revert "tests: Use separate virtual environment for avocado" · c03f57fd
      Paolo Bonzini authored
      This reverts commit e8e4298f.
      
      ensuregroup allows specifying both the acceptable versions of avocado,
      and a locked version to be used when avocado is not installed as a system
      package.  This lets us install avocado in pyvenv/ using "mkvenv.py" and
      reuse the distro package on Fedora and CentOS Stream (the only distros
      where it's available).
      
      ensuregroup's usage of "(>=..., <=...)" constraints when evaluating
      the distro package, and "==" constraints when installing it from PyPI,
      makes it possible to avoid conflicts between the known-good version and
      the package plugins included in the distro.
      
      This is because package plugins have "==" constraints on the version
      that is included in the distro, and, using "pip install avocado==88.1"
      on a venv that includes system packages will result in an error:
      
         avocado-framework-plugin-varianter-yaml-to-mux 98.0 requires avocado-framework==98.0, but you have avocado-framework 88.1 which is incompatible.
         avocado-framework-plugin-result-html 98.0 requires avocado-framework==98.0, but you have avocado-framework 88.1 which is incompatible.
      
      But at the same time, if the venv does not include a system distribution
      of avocado then we can install a known-good version and stick to LTS
      releases.
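
The decision described above can be sketched as follows. This is not the mkvenv.py implementation: the function names, the return strings and the example range bounds are assumptions made for illustration, and what should happen to a system package outside the range is left open because the message does not say.

```python
def parse_version(text):
    """Turn '88.1' into (88, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve(package, system_version, minimum, maximum, locked):
    """Pick between the distro package and the locked PyPI version.

    A range constraint is applied to the distro package; an exact '=='
    pin is used only when nothing is installed system-wide.
    """
    if system_version is None:
        return f"install {package}=={locked}"
    if parse_version(minimum) <= parse_version(system_version) <= parse_version(maximum):
        return "use system package"
    return "system package outside accepted range"


# Hypothetical range; 88.1 and 98.0 are the versions quoted in the message.
print(resolve("avocado-framework", None, "88.1", "98.0", "88.1"))
print(resolve("avocado-framework", "98.0", "88.1", "98.0", "88.1"))
```

Because the pin is never applied on top of an existing system installation, the plugin conflict shown in the error output above cannot arise in this model.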
      
      Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1663
      
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. Jul 25, 2023
    • tests/decode: Suppress "error: " string for expected-failure tests · 78cc9034
      Peter Maydell authored
      
      The "expected failure" tests for decodetree result in the
      error messages from decodetree ending up in logs and in
      V=1 output:
      
      >>> MALLOC_PERTURB_=226 /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/x86/pyvenv/bin/python3 /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/scripts/decodetree.py --output-null --test-for-error /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/x86/../../tests/decode/err_argset1.decode
      ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
      /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/x86/../../tests/decode/err_argset1.decode:5: error: duplicate argument "a"
      ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
       1/44 qemu:decodetree / err_argset1                OK              0.05s
      
      This then produces false positives when scanning the
      logfiles for strings like "error: ".
      
      For the expected-failure tests, make decodetree print
      "detected:" instead of "error:".
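
A minimal sketch of the idea, assuming a boolean flag that is set when the script runs in expected-failure mode. The function name and parameter are hypothetical and this is not the code in scripts/decodetree.py.

```python
def format_diagnostic(filename, lineno, message, test_for_error=False):
    """Format a diagnostic line, avoiding 'error:' in expected-failure runs."""
    prefix = "detected:" if test_for_error else "error:"
    return f"{filename}:{lineno}: {prefix} {message}"


# Normal run: the message carries the usual 'error:' marker.
print(format_diagnostic("err_argset1.decode", 5, 'duplicate argument "a"'))
# Expected-failure run: log scanners looking for 'error: ' find nothing.
print(format_diagnostic("err_argset1.decode", 5, 'duplicate argument "a"',
                        test_for_error=True))
```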
      
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Message-id: 20230720131521.1325905-1-peter.maydell@linaro.org
    • scripts/git-submodule.sh: Don't rely on non-POSIX 'read' behaviour · f9540bb1
      Peter Maydell authored
      
      The POSIX definition of the 'read' utility requires that you
      specify the variable name to set; omitting the name and
      having it default to 'REPLY' is a bashism. If your system
      sh is dash, then it will print an error message during build:
      
      qemu/pc-bios/s390-ccw/../../scripts/git-submodule.sh: 106: read: arg count
      
      Specify the variable name explicitly.
      
      Fixes: fdb8fd8c ("git-submodule: allow partial update of .git-submodule-status")
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Message-id: 20230720153038.1587196-1-peter.maydell@linaro.org
  8. Jul 17, 2023
  9. Jul 07, 2023
  10. Jul 03, 2023
    • scripts/oss-fuzz: add a suppression for keymap · 9ea2e69f
      Alex Bennée authored
      
      When updating to the latest Fedora the sanitizer found more leaks
      inside xkbmap:
      
        FAILED: pc-bios/keymaps/ar
        /builds/stsquad/qemu/build-oss-fuzz/qemu-keymap -f pc-bios/keymaps/ar -l ara
        =================================================================
        ==3604==ERROR: LeakSanitizer: detected memory leaks
        Direct leak of 1424 byte(s) in 1 object(s) allocated from:
            #0 0x56316418ebec in __interceptor_calloc (/builds/stsquad/qemu/build-oss-fuzz/qemu-keymap+0x127bec) (BuildId: a2ad9da3190962acaa010fa8f44a9269f9081e1c)
            #1 0x7f60d4dc067e  (/lib64/libxkbcommon.so.0+0x1c67e) (BuildId: b243a34e4e58e6a30b93771c256268b114d34b80)
            #2 0x7f60d4dc2137 in xkb_keymap_new_from_names (/lib64/libxkbcommon.so.0+0x1e137) (BuildId: b243a34e4e58e6a30b93771c256268b114d34b80)
            #3 0x5631641ca50f in main /builds/stsquad/qemu/build-oss-fuzz/../qemu-keymap.c:215:11
      
      and many more. As we can't do anything about the library, add a
      suppression to keep the CI going with what it's meant to be doing.
      
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Message-Id: <20230630180423.558337-8-alex.bennee@linaro.org>
  11. Jun 27, 2023
  12. Jun 26, 2023
  13. Jun 07, 2023
  14. Jun 06, 2023
    • configure: remove --with-git-submodules= · 6f3ae23b
      Paolo Bonzini authored
      
      Reuse --enable/--disable-download to control git submodules as well.
      Adjust the error messages of git-submodule.sh to refer to the new
      option.
      
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • meson: subprojects: replace berkeley-{soft,test}float-3 with wraps · d2dfe0b5
      Paolo Bonzini authored
      
      Unlike other subprojects, these require an overlay directory to include
      meson rules to build the libraries.  The rules are basically lifted
      from tests/fp/meson.build, with a few changes to create platform.h
      and publish a dependency.
      
      The build defines are passed through a subproject option, and posted
      back to users of the library via the dependency's compile_args.
      
      The only remaining user of GIT_SUBMODULES and GIT_SUBMODULES_ACTION
      is roms/SLOF, which is used to build pc-bios/s390-ccw.  All other
      roms submodules are only present to satisfy the license on pre-built
      firmware blobs.
      
      Best reviewed with --color-moved.
      
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • meson: subprojects: replace submodules with wrap files · 2019cabf
      Paolo Bonzini authored
      
      Compared to submodules, .wrap files have several advantages:
      
      * option parsing and downloading is delegated to meson
      
      * the commit is stored in a text file instead of a magic entry in the
        git tree object
      
      * we could stop shipping external dependencies that are only used as a
        fallback, but not break compilation on platforms that lack them.
        For example it may make sense to download dtc at build time, controlled
        by --enable-download, even when building from a tarball.  Right now,
        this patch does the opposite: make-release treats dtc like libvfio-user
        (which is not stable API and therefore hasn't found its way into any
        distros) and keycodemap (which is a copylib, for better or worse).
      
      dependency() can fall back to a wrap automatically.  However, this
      is only possible for libraries that come with a .pc file, and this
      is not very common for libfdt even though the upstream project in
      principle provides it; it also removes the control that we provide with
      --enable-fdt={system,internal}.  Therefore, the logic to pick system
      vs. internal libfdt is left untouched.
      
      --enable-fdt=git is removed; it was already a synonym for
      --enable-fdt=internal.
      
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • build: log submodule update from git-submodule.sh · d120116b
      Paolo Bonzini authored
      
      Print exactly which submodules have been updated, by reusing the logic of
      "git-submodule.sh validate" after executing "git submodule update --init".
      
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • git-submodule: allow partial update of .git-submodule-status · fdb8fd8c
      Paolo Bonzini authored
      
      Allow a specific subdirectory to run git-submodule.sh with only a
      subset of submodules, without removing the others from the
      .git-submodule-status file.
      
      This also allows scripts/git-submodule.sh to be more lenient:
      validating an empty set of submodules is not a mistake.
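
The partial-update behaviour can be pictured as a keyed merge. The sketch below is in Python rather than shell, and it assumes (this is not stated in the message) that the status file holds one 'git submodule status'-style line per submodule, with the path as the second whitespace-separated field. The function names are invented for illustration.

```python
def path_of(status_line):
    """Second field of a '<sha> <path> (<describe>)' style line (assumed format)."""
    return status_line.split()[1]


def merge_status(existing_lines, updated_lines):
    """Replace entries for updated submodules and keep all the others."""
    merged = {path_of(line): line for line in existing_lines if line.strip()}
    for line in updated_lines:
        if line.strip():
            merged[path_of(line)] = line
    return [merged[path] for path in sorted(merged)]


existing = [
    " 1111111 roms/SLOF (v1)",
    " 2222222 tests/lcitool/libvirt-ci (v2)",
]
# Only roms/SLOF is refreshed; the other entry survives untouched.
print(merge_status(existing, [" 3333333 roms/SLOF (v3)"]))
# An empty subset is valid and leaves the file contents as they were.
print(merge_status(existing, []))
```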
      
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • configure: remove --with-git= option · 50cfed80
      Paolo Bonzini authored
      
      The scenario for which --with-git= was introduced was to use a SOCKS proxy
      such as tsocks.  However, this was back in 2017 when QEMU's submodules
      used the git:// protocol, and it is not as important when using the
      "smart HTTP" backend; for example, neither "meson subprojects download"
      nor scripts/checkpatch.pl obey the GIT environment variable.
      
      So remove the knob, but test for the presence of git in the configure and
      git-submodule.sh scripts, and suggest using --with-git-submodules=validate
      + a manual invocation of git-submodule.sh when git does not work.  Hopefully
      in the future the GIT environment variable will be supported by Meson.
      
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • tests: Use separate virtual environment for avocado · e8e4298f
      Paolo Bonzini authored
      
      This reverts commits eea2d141 ("Makefile: remove $(TESTS_PYTHON)",
      2023-05-26) and 9c6692db ("tests: Use configure-provided pyvenv for
      tests", 2023-05-18).
      
      Right now, there is a conflict between wanting a ">=" constraint when
      using a distro-provided package and wanting a "==" constraint when
      installing Avocado from PyPI; this would provide the best of both worlds
      in terms of resiliency for both distros that have required packages and
      distros that don't.
      
      The conflict is visible also for meson, where we would like to install
      the latest 0.63.x version but also accept a distro 1.1.x version.
      But it is worse for avocado, for two reasons:
      
      1) we cannot use an "==" constraint to install avocado if the venv
      includes a system avocado.  The distro will package plugins that have
      "==" constraints on the version that is included in the distro, and, using
      "pip install avocado==88.1" on a venv that includes system packages will
      result in this error:
      
         ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
         avocado-framework-plugin-varianter-yaml-to-mux 98.0 requires avocado-framework==98.0, but you have avocado-framework 88.1 which is incompatible.
         avocado-framework-plugin-result-html 98.0 requires avocado-framework==98.0, but you have avocado-framework 88.1 which is incompatible.
         make[1]: Leaving directory '/home/berrange/src/virt/qemu/build'
      
      2) we cannot use ">=" either if the venv does _not_ include a system
      avocado, because that would result in the installation of v101.0 which
      is the one we've just reverted.
      
      So the idea is to encode the dependencies as an (acceptable, locked)
      tuple, like this hypothetical TOML that would be committed inside
      python/ and used by mkvenv.py:
      
        [meson]
        meson = { minimum = "0.63.0", install = "0.63.3", canary = "meson" }
      
        [docs]
        # 6.0 drops support for Python 3.7
        sphinx = { minimum = "1.6", install = "<6.0", canary = "sphinx-build" }
        sphinx_rtd_theme = { minimum = "0.5" }
      
        [avocado]
        avocado-framework = { minimum = "88.1", install = "88.1", canary = "avocado" }
      
      Once this is implemented, it would also be possible to install avocado in
      pyvenv/ using "mkvenv.py ensure", thus using the distro package on Fedora
      and CentOS Stream (the only distros where it's available).  But until
      this is implemented, keep avocado in a separate venv.  There is still the
      benefit of using a single python for meson custom_targets and for sphinx.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • scripts: remove dead file · 0dec4e6f
      Paolo Bonzini authored
      
      scripts/test-driver.py was used when "make check" was already using meson
      introspection data, but it did not execute "meson test".  It is dead since
      commit 3d2f73ef ("build: use "meson test" as the test harness", 2021-12-23).
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  15. Jun 05, 2023
  16. Jun 01, 2023
  17. May 30, 2023