  1. Sep 18, 2023
    • net: add initial support for AF_XDP network backend · cb039ef3
      Ilya Maximets authored
      
      AF_XDP is a network socket family that allows communication directly
      with the network device driver in the kernel, bypassing most or all
      of the kernel networking stack.  In essence, the technology is
      pretty similar to netmap.  But, unlike netmap, AF_XDP is Linux-native
      and works with any network interface without driver modifications.
      Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't
      require access to character devices or unix sockets.  Only access to
      the network interface itself is necessary.
      
      This patch implements a network backend that communicates with the
      kernel by creating an AF_XDP socket.  A chunk of userspace memory
      is shared between QEMU and the host kernel.  Four ring buffers (Tx, Rx,
      Fill and Completion) are placed in that memory along with a pool of
      memory buffers for the packet data.  Data transmission is done by
      allocating one of the buffers, copying packet data into it and
      placing the pointer into the Tx ring.  After transmission, the device
      returns the buffer via the Completion ring.  On Rx, the device takes
      a buffer from the pre-populated Fill ring, writes the packet data into
      it and places the buffer into the Rx ring.
      
      AF_XDP network backend takes on the communication with the host
      kernel and the network interface and forwards packets to/from the
      peer device in QEMU.
      
      Usage example:
      
        -device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C
        -netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1
      
      An XDP program bridges the socket with the network interface.  It can
      be attached to the interface in two different modes:
      
      1. skb - this mode should work for any interface and doesn't require
               driver support, with the caveat of lower performance.
      
      2. native - this does require support from the driver and allows
                  bypassing skb allocation in the kernel and potentially using
                  zero-copy while getting packets in and out of userspace.
      
      By default, QEMU will try to use native mode and fall back to skb.
      The mode can be forced via the 'mode' option.  To force 'copy' even in
      native mode, use the 'force-copy=on' option.  This might be useful if
      there is some issue with the driver.
      
      The 'queues=N' option allows specifying how many device queues should
      be open.  Note that all the queues that are not open are still
      functional and can receive traffic, but it will not be delivered to
      QEMU.  So, the number of device queues should generally match the
      QEMU configuration, unless the device is shared with something
      else and the traffic re-direction to the appropriate queues is correctly
      configured on the device level (e.g. with ethtool -N).
      The 'start-queue=M' option can be used to specify from which queue id
      QEMU should start configuring 'N' queues.  It might also be necessary
      to use this option with certain NICs, e.g. MLX5 NICs.  See the docs
      for examples.
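      As an illustrative sketch (the queue numbers and the ethtool rule are
      hypothetical, not taken from this patch):

```shell
# Open two device queues for QEMU, starting from queue id 4:
-netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,start-queue=4,queues=2

# Steer the relevant traffic to the opened queues on the device level,
# e.g. a hypothetical rule sending TCP port 5201 flows to queue 4:
ethtool -N ens6f1np1 flow-type tcp4 dst-port 5201 action 4
```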
      
      In the general case, QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN
      or CAP_BPF capabilities in order to load the default XSK/XDP programs
      onto the network interface and configure BPF maps.  It is possible,
      however, to run with no capabilities.  For that to work, an external
      process with enough capabilities will need to pre-load the default XSK
      program, create the AF_XDP sockets and pass their file descriptors to
      the QEMU process on startup via the 'sock-fds' option.  The network
      backend will then need to be configured with 'inhibit=on' to avoid
      loading the program.  QEMU will need 32 MB of locked memory
      (RLIMIT_MEMLOCK) per queue or CAP_IPC_LOCK.
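      As a rough sketch (the queue count and binary path are illustrative),
      the locked-memory requirement could be satisfied either way:

```shell
# Raise RLIMIT_MEMLOCK to 32 MB per queue before starting QEMU
# (ulimit -l takes KB; 4 queues -> 128 MB):
ulimit -l $((4 * 32 * 1024))

# ...or grant CAP_IPC_LOCK (plus CAP_NET_ADMIN and CAP_BPF for loading
# the default XSK/XDP programs) to an illustrative QEMU binary path:
setcap cap_ipc_lock,cap_net_admin,cap_bpf+ep /usr/local/bin/qemu-system-x86_64
```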
      
      There are a few performance challenges with the current network backends.
      
      First is that they do not support IO threads.  This means that the data
      path is handled by the main thread in QEMU and may slow down other
      work or be slowed down by some other work.  This also means that
      taking advantage of multi-queue is generally not possible today.
      
      Another issue is that the data path goes through the device emulation
      code, which is not really optimized for performance.  The fastest
      "frontend" device is virtio-net.  But it's not optimized for heavy
      traffic either, because it expects such use-cases to be handled via
      some implementation of vhost (user, kernel, vdpa).  In practice, we
      have virtio notifications and rcu lock/unlock on a per-packet basis
      and not very efficient accesses to guest memory.  Communication
      channels between backend and frontend devices do not allow passing
      more than one packet at a time either.
      
      Some of these challenges can be avoided in the future by adding better
      batching into device emulation or by implementing vhost-af-xdp variant.
      
      There are also a few kernel limitations.  AF_XDP sockets do not
      support any kind of checksum or segmentation offloading.  Buffers
      are limited to a page size (4K), i.e. MTU is limited.  Multi-buffer
      support implementation for AF_XDP is in progress, but not ready yet.
      Also, transmission in all non-zero-copy modes is synchronous, i.e.
      done in a syscall.  That doesn't allow high packet rates on virtual
      interfaces.
      
      However, keeping in mind all of these challenges, the current
      implementation of the AF_XDP backend shows decent performance while
      running on top of a physical NIC with zero-copy support.
      
      Test setup:
      
      2 VMs running on 2 physical hosts connected via ConnectX6-Dx card.
      Network backend is configured to open the NIC directly in native mode.
      The driver supports zero-copy.  NIC is configured to use 1 queue.
      
      Inside a VM - iperf3 for basic TCP performance testing and dpdk-testpmd
      for PPS testing.
      
      iperf3 result:
       TCP stream      : 19.1 Gbps
      
      dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
       Tx only         : 3.4 Mpps
       Rx only         : 2.0 Mpps
       L2 FWD Loopback : 1.5 Mpps
      
      In skb mode the same setup shows much lower performance, similar to
      a setup where the pair of physical NICs is replaced with a veth pair:
      
      iperf3 result:
        TCP stream      : 9 Gbps
      
      dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
        Tx only         : 1.2 Mpps
        Rx only         : 1.0 Mpps
        L2 FWD Loopback : 0.7 Mpps
      
      Results in skb mode or over veth are close to the results of a tap
      backend with vhost=on and segmentation offloading disabled, bridged
      with a NIC.
      
      Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> (docker/lcitool)
      Signed-off-by: Jason Wang <jasowang@redhat.com>
    • tests: bump libvirt-ci for libasan and libxdp · a6f376e9
      Ilya Maximets authored
      
      This pulls in the fixes for libasan version as well as support for
      libxdp that will be used for af-xdp netdev in the next commits.
      
      Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
  2. Sep 08, 2023
  3. Aug 30, 2023
    • tests/docker: cleanup non-verbose output · 6445c2ca
      Alex Bennée authored
      
      Even with --quiet, docker will spam the sha256 to the console.  Avoid
      this by redirecting stdout.  While we are at it, fix the name we echo,
      which was broken in 0b1a6490 (tests/docker: use direct RUNC call
      to build containers).
      
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Message-Id: <20230829161528.2707696-3-alex.bennee@linaro.org>
    • gitlab: enable ccache for many build jobs · 2f7350cd
      Daniel P. Berrangé authored
      
      The `ccache` tool can be very effective at reducing compilation times
      when re-running pipelines with only minor changes each time. For example
      a fresh 'build-system-fedora' job will typically take 20 minutes on the
      gitlab.com shared runners. With ccache this is reduced to as little as
      6 minutes.
      
      Normally meson would auto-detect the existence of ccache in $PATH and use
      it automatically, but the way we wrap meson from configure breaks this,
      as we're passing in a config file with explicitly set compiler paths.
      Thus we need to add $CCACHE_WRAPPERSPATH to the front of $PATH.  For
      unknown reasons, when doing this in msys, gcc becomes unable to
      invoke 'cc1' when run from meson.  For msys we thus set CC='ccache gcc'
      before invoking 'configure' instead.
      
      A second problem with msys is that cache misses are incredibly
      expensive, so enabling ccache massively slows down the build when
      the cache isn't well populated. This is suspected to be a result of
      the cost of spawning processes under the msys architecture. To deal
      with this we set CCACHE_DEPEND=1 which enables ccache's 'depend_only'
      strategy. This avoids extra spawning of the pre-processor during
      cache misses, with the downside that it is less likely ccache will
      find a cache hit after semantically benign compiler flag changes.
      This is the lesser of two evils, as otherwise we can't use ccache
      at all under msys and remain inside the job time limit.
      
      If people are finding ccache to hurt their pipelines, it can be
      disabled by setting the 'CCACHE_DISABLE=1' env variable against
      their gitlab fork CI settings.
      
      Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Message-Id: <20230804111054.281802-2-berrange@redhat.com>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Message-Id: <20230829161528.2707696-2-alex.bennee@linaro.org>
  4. Aug 28, 2023
  5. Jul 17, 2023
  6. Jul 03, 2023
  7. May 26, 2023
  8. May 18, 2023
    • tests/docker: add python3-venv dependency · a22a4b29
      John Snow authored
      
      Several debian-based tests need the python3-venv dependency as a
      consequence of Debian debundling the "ensurepip" module normally
      included with Python.
      
      As mkvenv.py stands as of this commit, Debian requires EITHER:
      
      (A) setuptools and pip, or
      (B) ensurepip
      
      mkvenv is a few seconds faster if you have setuptools and pip, so
      developers should prefer the first requirement. For the purposes of CI,
      the time-save is a wash; it's only a matter of who is responsible for
      installing pip and when; the timing is about the same.
      
      Arbitrarily, I chose adding ensurepip to the test configuration because
      it is normally part of the Python stdlib, and always having it allows us
      a more consistent cross-platform environment.
      
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      Message-Id: <20230511035435.734312-12-jsnow@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. May 16, 2023
    • tests/lcitool: Add mtools and xorriso and remove genisoimage as dependencies · da900078
      Ani Sinha authored
      
      Bios bits avocado tests need mformat (provided by the mtools package) and
      xorriso tools in order to run within gitlab CI containers. Add those
      dependencies within the Dockerfiles so that containers can be built with
      those tools present and bios bits avocado tests can be run there.
      
      The xorriso package conflicts with the genisoimage package on some
      distributions.  Therefore, it is not possible to have both packages at
      the same time in the container image uniformly for all distribution
      flavors.  Further, on some distributions like RHEL, both the xorriso and
      genisoimage packages provide /usr/bin/genisoimage, while on some other
      distributions like Fedora, only the genisoimage package provides that
      utility.  Therefore, this change removes the dependency on genisoimage
      for building container images altogether, keeping only the xorriso
      package.  At the same time, cdrom-test.c is updated to use and check for
      the existence of only xorrisofs.
      
      Signed-off-by: Ani Sinha <anisinha@redhat.com>
      Message-Id: <20230504154611.85854-3-anisinha@redhat.com>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
  10. May 10, 2023
  11. Apr 20, 2023
  12. Apr 04, 2023
  13. Mar 22, 2023
  14. Mar 13, 2023
  15. Mar 01, 2023
  16. Feb 27, 2023
  17. Feb 23, 2023
    • python: drop pipenv · 6832189f
      John Snow authored
      
      The pipenv tool was nice in theory, but in practice it's just too hard
      to update selectively, and it makes using it a pain. The qemu.qmp repo
      dropped pipenv support a while back and it's been functioning just fine,
      so I'm backporting that change here to qemu.git.
      
      Signed-off-by: John Snow <jsnow@redhat.com>
      Message-id: 20230210003147.1309376-3-jsnow@redhat.com
      Signed-off-by: John Snow <jsnow@redhat.com>
  18. Feb 02, 2023