There was an inversion in code resulting as all sockets being seen
as non IPS mode when doing the peering. This resulted in a crash at
first packet because it has no peer.
A crash can occurs in the following conditions:
* Suricata running in other mode than "workers"
* Kernel fill in the ring buffer
Under this conditions, it is possible that the capture thread reads
a packet that has not yet released by one of the treatment threads
because there is no modification done on the ring buffer entry when
a packet is read. Doing, this it access to memory which can be
released to the kernel and modified. This results in a kind of memory
corruption.
This bug has only been seen recently and this has to be linked with the
read speed improvement recently made in AF_PACKET support.
The patch fixes the issue by modifying the tp_status bitmask in the
ring buffer. It sets the TP_STATUS_USER_BUSY flag when it is confirmed
that the packet will be treated. And at the start of the read, it exits
from the reading loop (returning to poll) when it reaches a packet with
the flag set. As tp_status is set to 0 during packet release the flag
is destroyed when releasing the packet.
Regarding concurrency, we've got a sequence of modification. The
capture thread read the packet and set the flag, then it passes the
queue and the packet get processed by other threads. The change on
tp_status are thus made at different time.
Regarding the value of the flag, the patch uses the last bit of
tp_status to avoid be impacting by a change in kernel. I will
propose a patch to have TP_STATUS_USER_BUSY included in kernel
as this is a generic issue for multithreading application using
AF_PACKET mechanism.
If a capture loop does exit, the thread needs to start without
synchronization with the other threads. This patch fixes this
by resetting the turn count on the peerslist structure and
adding a test on this condition in the wait function.
It seems that, in some case, there is a read waiting but the
offset in the ring buffer is not correct and Suricata need to
walk the ring to find the correct place and make the read.
This patch fixes emergency mode by setting the variable even if we
have a non kernel checksum check. It also does a call to
AFPDUmpCounters() as it seems to improve thing to do it ASAP.
This patch implements "late open". On high performance system, it
is needed to create the AF_PACKET just before reading to avoid
overflow. Socket creation has to be done with respect to the order
of thread creation to respect affinity settings.
This patch adds a counter to AFPPeer to be ale to synchronize the
initial socket creation.
Suricata was not able to start cleanly in AF_PACKET with default
suricata.yaml file if there was no eth1 on the system. This patch
fixes this issue and rework the socket transition phase to fix
some serious issues (file descriptor leak) found when fixing this
problem.
Every 20 seconds it displays a message to the user to warn him about
the interface not being accessible:
[ERRCODE: SC_ERR_AFP_CREATE(196)] - Can not open iface 'eth1'
If the MTU on the reception interface and the one on the transmission
interface are different, this will result in an error at transmission
when sending packet to the wire.
Flush all waiting packets to be in sync with kernel when drop
occurs. This mode can be activated by setting use-emergency-flush
to yes in the interface configuration.
This patch moves raw socket binding at the end of init code to
avoid to have a flow of packets reaching the socket before we
start to read them.
The socket creation is now made in the loop function to avoid
any timing issue between init function and the call of the loop.
This patch adds a new feature to AF_PACKET capture mode. It is now
possible to use AF_PACKET in IPS and TAP mode: all traffic received
on a interface will be forwarded (at the Ethernet level) to an other
interface. To do so, Suricata create a raw socket and sends the receive
packets to a interface designed in the configuration file.
This patch adds two variables to the configuration of af-packet
interface:
copy-mode: ips or tap
copy-iface: eth1 #the interface where packet are copied
If copy-mode is set to ips then the packet wth action DROP are not
copied to the destination interface. If copy-mode is set to tap,
all packets are copied to the destination interface.
Any other value of copy-mode results in the feature to be unused.
There is no default interface for copy-iface and the variable has
to be set for the ids or tap mode to work.
For now, this feature depends of the release data system. This
implies you need to activate the ring mode and zero copy. Basically
use-mmap has to be set to yes.
This patch adds a peering of AF_PACKET sockets from the thread on
one interface to the threads on another interface. Peering is
necessary as if we use an other socket the capture socket receives
all emitted packets. This is made using a new AFPPeer structure to
avoid direct interaction between AFPTreadVars.
There is currently a bug in Linux kernel (prior to 3.6) and it is
not possible to use multiple threads.
You need to setup two interfaces with equality on the threads
variable. copy-mode variable must be set on the two interfaces
and use-mmap must be set to activated.
A valid configuration for an IPS using eth0 and vboxnet1 interfaces
will look like:
af-packet:
- interface: eth0
threads: 1
defrag: yes
cluster-type: cluster_flow
cluster-id: 98
copy-mode: ips
copy-iface: vboxnet1
buffer-size: 64535
use-mmap: yes
- interface: vboxnet1
threads: 1
cluster-id: 97
defrag: yes
cluster-type: cluster_flow
copy-mode: ips
copy-iface: eth0
buffer-size: 64535
use-mmap: yes
This patch adds a data release mechanism. If the capture module
has a call to indicate that userland has finished with the data,
it is possible to use this system. The data will then be released
when the treatment of the packet is finished.
To do so the Packet structure has been modified:
+ TmEcode (*ReleaseData)(ThreadVars *, struct Packet_ *);
If ReleaseData is null, the function is called when the treatment
of the Packet is finished.
Thus it is sufficient for the capture module to code a function
wrapping the data release mechanism and to assign it to ReleaseData
field.
This patch also includes an implementation of this mechanism for
AF_PACKET.
The mmaped mode was using a too small ring buffer size which was
not able to handle burst of packets coming from the network. This
may explain the important packet loss rate observed by Edward
Fjellskål.
This patch increases the default value and adds a ring-size
variable which can be used to manually tune the value.
This patch should bring some improvements by looping on the
ring when there is some data available instead of getting back
to the poll. It also fix recovery in case of drops on the ring
because the poll command will not return correctly in this case.
clang was issuing some warnings related to unused return in function.
This patch adds some needed error treatment and ignore the rest of the
warnings by adding a cast to void.
This patch adds counters for kernel drops and accepts to af-packet
capture module. This information are periodically displayed in
stats.log:
capture.kernel_packets | RxAFP1 | 1792
capture.kernel_drops | RxAFP1 | 0
The statistic is fetch via a setsockopt call every 255 packets.
This patch adds support for BPF in AF_PACKET running
mode. The command line syntax is the same as the one
used of PF_RING.
The method is the same too: The pcap_compile__nopcap()
function is used to build the BPF filter. It is then
injected into the kernel with a setsockopt() call. If
the adding of the BPF fail, suricata exit.
This patch will allow us to use the datalink when computing the filter.
It also fixes a potential issue where an interface data type change
after the interface if going down/up.
This patch adds support for zero copy to AF_PACKET running mode.
This requires to use the 'worker' mode which is the only one where
the threading architecture is simple enough to permit this without
heavy modification.
This patch adds mmap support for af-packet. Suricata now makes
use of the ring buffer feature of AF_PACKET if 'use-mmap' variable
is set to yes on an interface.
This flag adds variable to disable offloading detection. The effect
of the flag is to avoid to transmit auxiliary data at each packet.
This could result in a potential performance gain.
A devide configuration can be used by multiple threads. It is thus
necessary to wait that all threads stop using the configuration before
freeing it. This patch introduces an atomic counter and a free function
which has to be called by each thread when it will not use anymore
the structure. If the configuration is not used anymore, it is freed
by the free function.