docs: refactor DPDK docs and add performance tuning section

Ticket: #5857
Ticket: #5858
Lukas Sismis 2 years ago committed by Victor Julien
parent 03319263db
commit 1c3cb1e8cc

@@ -1927,16 +1927,16 @@ Data Plane Development Kit (DPDK)
`Data Plane Development Kit <https://www.dpdk.org/>`_ is a framework for fast
packet processing in data plane applications running on a wide variety of CPU
architectures. DPDK's `Environment Abstraction Layer (EAL)
<https://doc.dpdk.org/guides/prog_guide/env_abstraction_layer.html>`_
provides a generic interface to low-level resources. It is the way
DPDK libraries access NICs. EAL creates an API for an application to access NIC
resources from the userspace level. In DPDK, packets are not retrieved via
interrupt handling. Instead, the application `polls
<https://doc.dpdk.org/guides/prog_guide/poll_mode_drv.html>`_ the NIC for newly
received packets.

DPDK allows the user space application to directly access memory where the NIC
stores the packets. As a result, neither DPDK nor the application copies the
packets for inspection. The application directly processes packets via
passed packet descriptors.
@@ -1958,7 +1958,7 @@ Support for DPDK can be enabled in configure step of the build process such as:
Suricata makes use of DPDK for packet acquisition in workers runmode.
The whole DPDK configuration resides in the `dpdk:` node. This node encapsulates
two main subnodes: `eal-params` and `interfaces`.

::
@@ -1981,43 +1981,47 @@ The whole DPDK configuration resides in the `dpdk:` node. This node encapsulates
      copy-iface: none # or PCIe address of the second interface

The `DPDK arguments
<https://doc.dpdk.org/guides/linux_gsg/linux_eal_parameters.html>`_, which
are typically provided through the command line, are contained in the node
`dpdk.eal-params`. EAL is configured and initialized using these parameters.
Arguments can be specified in either their long or short form, with the
leading dashes omitted. This node can be used to set up the memory
configuration, the NICs accessible to Suricata, and other EAL-related
parameters. Defining lcore affinity as an EAL parameter is a common DPDK
practice. However, lcore parameters such as `-l`, `-c`,
and `--lcores` are specified within the `suricata-yaml-threading`_ section
to prevent configuration overlap.
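
As a minimal sketch (the `proc-type` value follows the default configuration
layout; the `allow` entry is only an illustrative example of restricting EAL
to a single NIC via the `--allow` option available in DPDK 20.11+):

::

  dpdk:
    eal-params:
      proc-type: primary      # passed to EAL as --proc-type primary
      allow: 0000:3b:00.0     # illustrative example PCIe address
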
The node `dpdk.interfaces` wraps a list of interface configurations. Items on
the list follow the structure that can be found in other capture interfaces.
The individual items contain the usual configuration options
such as `threads`/`copy-mode`/`checksum-checks` settings. Other capture
interfaces, such as AF_PACKET, rely on the user to ensure that NICs are
appropriately configured.

Configuration through the kernel does not apply to applications running under
DPDK. The application is solely responsible for the initialization of the NICs
it is using. So, before Suricata starts, the NICs that it uses must undergo
initialization.
As a result, there are extra configuration options (how NICs can be
configured) in the items (interfaces) of the `dpdk.interfaces` list.

At the start of the configuration process, all NIC offloads are disabled to
prevent any packet modification. According to the configuration, checksum
validation offload can be enabled to drop invalid packets. Other offloads
cannot currently be enabled.

Additionally, the list items in `dpdk.interfaces` contain DPDK-specific
settings such as `mempool-size` or `rx-descriptors`. These settings adjust
individual parameters of EAL. One of the entries in `dpdk.interfaces` is
the `default` interface. When the interface configuration is loaded and some
entry is missing, the corresponding value of the `default` interface is used.
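
For example (the PCIe address and the values are illustrative, not tuned
recommendations), an interface entry can override only selected settings while
the remaining ones fall back to the `default` entry:

::

  dpdk:
    interfaces:
      - interface: default        # fallback values for the entries below
        threads: auto
        mempool-size: 65535
        mempool-cache-size: 257
        rx-descriptors: 1024
        tx-descriptors: 1024
      - interface: 0000:3b:00.0   # example PCIe address of the NIC
        threads: 4                # overrides default; the rest is inherited
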
The worker threads must be assigned to specific cores. The configuration
module `threading` must be used to set thread affinity.
Worker threads can be pinned to cores in the array configured in
`threading.cpu-affinity["worker-cpu-set"]`. Performance-oriented setups have
everything (the NIC, memory, and CPU cores interacting with the NIC) based on
one NUMA node.
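
For instance, assuming the NIC resides on NUMA node 0 and cores 2, 4, 6 and 8
belong to that node (the core IDs are placeholders for the actual layout), the
workers could be pinned as:

::

  threading:
    set-cpu-affinity: yes
    cpu-affinity:
      - worker-cpu-set:
          cpu: [ 2, 4, 6, 8 ]   # NUMA-local cores dedicated to DPDK workers
          mode: "exclusive"
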
It is therefore required to know the layout of the server architecture to get
the best results. The CPU core IDs and NUMA locations can be determined, for
example, from the output of `/proc/cpuinfo`, where `physical id` describes the
NUMA number. The NUMA node to which the NIC is connected can be determined from
@@ -2032,34 +2036,37 @@ the file `/sys/class/net/<KERNEL NAME OF THE NIC>/device/numa_node`.
  ## cat /sys/class/net/<KERNEL NAME OF THE NIC>/device/numa_node e.g.
  cat /sys/class/net/eth1/device/numa_node
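
Alternatively (assuming the `lscpu` and `numactl` utilities are installed),
the NUMA layout can be inspected with:

::

  lscpu | grep -i numa      # CPU core to NUMA node mapping
  numactl --hardware        # per-node CPU and memory overview
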
Suricata operates in workers runmode. Packet distribution relies on Receive
Side Scaling (RSS), which distributes packets across the NIC queues.
Individual Suricata workers then poll packets from the NIC queues.
Internally, DPDK runmode uses a `symmetric hash (0x6d5a)
<https://www.ran-lifshitz.com/2014/08/28/symmetric-rss-receive-side-scaling/>`_
that redirects bi-flows to specific workers.

Before Suricata can be run, it is required to allocate a sufficient number of
hugepages. For efficiency, hugepages are continuous chunks of memory (pages)
that are larger (2 MB+) than the standard page size used by operating systems
(4 KB). A lower count of pages allows faster lookup of page entries. The
hugepages need to be allocated on the NUMA node where the NIC and the affined
CPU cores reside. For example, if the hugepages are allocated only on NUMA
node 0 and the NIC is connected to NUMA node 1, then the application will fail
to start. As a result, it is advised to identify the NUMA node to which the
NIC is attached before allocating hugepages and setting CPU core affinity to
that node. If the Suricata deployment uses multiple NICs, hugepages must be
allocated on each of the NUMA nodes used by the deployment.

::

  ## To check the number of allocated hugepages:
  sudo dpdk-hugepages.py -s
  # alternative (older) way
  grep Huge /proc/meminfo

  ## Allocate 2 GB in hugepages on all available NUMA nodes:
  # (the number of hugepages depends on the default hugepage size - 2 MB / 1 GB)
  sudo dpdk-hugepages.py --setup 2G
  # alternative (older) way - allocates 1024 x 2 MB hugepages, but only on NUMA node 0
  echo 1024 | sudo tee \
   /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
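
If hugepages need to be reserved on a specific NUMA node only (for example on
node 1, matching the NIC location), the sysfs interface can be used directly;
the node number below is an example:

::

  ## Allocate 1024 x 2 MB hugepages on NUMA node 1 only:
  echo 1024 | sudo tee \
   /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
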
@@ -2067,37 +2074,49 @@ DPDK memory pools hold packets received from NICs. These memory pools are
allocated in hugepages. One memory pool is allocated per interface. The size
of each memory pool can be individual and is set with the `mempool-size`.
Memory (in bytes) for one memory pool is calculated as: `mempool-size` * `mtu`.
The sum of memory pool requirements divided by the size of one hugepage results
in the number of required hugepages. It causes no problem to allocate more
memory than required, but it is vital for Suricata to not run out of hugepages.
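
As a rough worked example (the numbers are purely illustrative): with
`mempool-size: 65535` and `mtu: 1500`, one interface needs about
65535 * 1500 B, i.e. roughly 94 MB, which corresponds to about 47 hugepages of
2 MB; two such interfaces would therefore need at least ~188 MB of
hugepage-backed memory on the relevant NUMA node(s).
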
The mempool cache is local to the individual CPU cores and holds packets that
were recently processed. As the mempool is shared among all cores, the cache
tries to minimize the required inter-process synchronization. The recommended
size of the cache is covered in the YAML file.

To be able to run DPDK on Intel cards, it is required to change the default
Intel driver to either the `vfio-pci` or the `igb_uio` driver. The process is
described in the `DPDK manual page regarding Linux drivers
<https://doc.dpdk.org/guides/linux_gsg/linux_drivers.html>`_.
DPDK is natively supported by Mellanox and thus their NICs should work
"out of the box".
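
A typical binding sequence (the PCIe address is only an example and must be
replaced with the address of the actual NIC) might look like:

::

  sudo modprobe vfio-pci
  ## show the current bindings and available drivers
  dpdk-devbind.py --status
  ## bind the NIC to the vfio-pci driver
  sudo dpdk-devbind.py --bind=vfio-pci 0000:3b:00.0
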
**Current DPDK support** involves Suricata running on:

* a physical machine with physical NICs such as:

  * mlx5 (ConnectX-4/ConnectX-5/ConnectX-6)
  * ixgbe
  * i40e
  * ice

* a virtual machine with virtual interfaces such as:

  * e1000
  * VMXNET3
  * virtio-net

Other NICs using the same drivers as those mentioned above should work as well.
Apart from the environments listed above, the DPDK capture interface has not
been tested with other virtual interfaces or virtual environments such as
Docker or similar.

The minimal supported DPDK is version 19.11, which should be available in most
repositories of major distributions.
Alternatively, it is also possible to use `meson` and `ninja` to build and
install DPDK from source files.
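
When building from source, the usual upstream DPDK sequence (the build
directory name is illustrative) is roughly:

::

  meson setup build
  ninja -C build
  sudo ninja -C build install
  sudo ldconfig
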
It is required to have the `pkg-config` tool correctly configured, as it is
used to load libraries and CFLAGS during the Suricata configuration and
compilation.
This can be tested by querying the DPDK version as:

::

  pkg-config --modversion libdpdk

Pf-ring
~~~~~~~
