Since commit in kernel
commit a3bcc23e890a6d49d6763d9eb073d711de2e0469
Author: Ben Greear <greearb@candelatech.com>
Date: Wed Jun 1 06:49:10 2011 +0000
af-packet: Add flag to distinguish VID 0 from no-vlan.
a flag is set to indicate VLAN has been set in packet header.
As suggested in commit message, using a test of the flag followed
by a check on vci value ensure backward compatibility of the test.
To be able to register counters from AppLayerGetCtxThread, the
ThreadVars pointer needs to be available in it and thus in it's
callers:
- AppLayerGetCtxThread
- DecodeThreadVarsAlloc
- StreamTcpReassembleInitThreadCtx
Logic of patch 98e4a14f6d was correct
but implementation is wrong because TP_STATUS_KERNEL is equal to
zero and thus can not be evaluated in a binary operation. This patch
updates the logic by doing two tests.
Reported-by: Alessandro Guido
This patch updates the logic of the packet acquisition loop. When
the reader loop function is called and when the data to read
at offset is a without data (kernel) or still used by suricata. We
try to iter for a loop on the ring to try to find kernel put by
data.
As we are entering the function because the poll said there was some
data. This allow us to jump to the data added to the ring by the
kernel.
When using suricata in autofp mode, with multiple detect threads and
packet acquisition threads attached to a dedicated CPU, the reader
loop function was looping really fast because poll call was returning
immediatly because we did read the data available.
Live device counter was in fact the number of packets seen by suricata
and not the total number of packet reported by kernel. This patch fixes
this by using counter provided by kernel instead.
The counter is Clear On Read, so by adding the value fetch at each call
and earch sockets we get the number of packets and drops for the
interface.
This patch adds and increments a invalid packet counter. It
does this by introducing PacketDecodeFinalize function
This function is incrementing the invalid counter and is also
signalling the packet to CUDA.
This commit allows handling Packets allocated by different methods.
The ReleaseData function pointer in the Packet structure is replaced
with ReleasePacket function pointer, which is then always called to
release the memory associated with a Packet.
Currently, the only usage of ReleaseData is in AF Packet. Previously
ReleaseData was only called when it was not NULL. To implement the
same functionality as before in AF Packet, a new function is defined
in AF Packet to first call the AFP specific ReleaseData function and
then releases the Packet structure.
Three new general functions are defined for releasing packets in the
default case:
1) PacketFree() - To release a packet alloced with SCMalloc()
2) PacketPoolReturnPacket() - For packets allocated from the Packet Pool.
Calls RECYCLE_PACKET(p)
3) PacketFreeOrRelease() - Calls PacketFree() or PacketPoolReturnPacket()
based on the PKT_ALLOC flag.
Having these functions removes the need to check the PKT_ALLOC flag
when releasing a packet in most cases, since the ReleasePacket
function encodes how the Packet was allocated. The PKT_ALLOC flag is
still set and is needed when AF Packet releases a packet, since it
replaces the ReleasePacket function pointer with its own function and
then calls PacketFreeOfRelease(), which uses the PKT_ALLOC flag.
Use test macro instead of direct access to action field.
This patch has been obtained by using the following
spatch file:
@@
Packet *p;
expression E;
@@
- p->action & E
+ TEST_PACKET_ACTION(p, E)
If no packet arrives to a capture thread, it is possible that the
AFPReadLoop() function goes into an infinite loop. This could cause
suricata to hang at exit on non busy system.
This patch adds a counter to detect when Suricata start looping in
the ring to stop when it reaches this point.
When handling error case on SCMallog, SCCalloc or SCStrdup
we are in an unlikely case. This patch adds the unlikely()
expression to indicate this to gcc.
This patch has been obtained via coccinelle. The transformation
is the following:
@istested@
identifier x;
statement S1;
identifier func =~ "(SCMalloc|SCStrdup|SCCalloc)";
@@
x = func(...)
... when != x
- if (x == NULL) S1
+ if (unlikely(x == NULL)) S1
This patch resets the AFPPacketVar linked to a Packet in the release
function to avoid any side effect when the packet is reused. To do
so a new AFPV_CLEANUP macro has been introduced.
This patch cleans the code were two almost identical treatment on
the packet we're made. It may be linked by a merge error I've done
or to a simple mistake on my side.
There was an inversion in code resulting as all sockets being seen
as non IPS mode when doing the peering. This resulted in a crash at
first packet because it has no peer.
A crash can occurs in the following conditions:
* Suricata running in other mode than "workers"
* Kernel fill in the ring buffer
Under this conditions, it is possible that the capture thread reads
a packet that has not yet released by one of the treatment threads
because there is no modification done on the ring buffer entry when
a packet is read. Doing, this it access to memory which can be
released to the kernel and modified. This results in a kind of memory
corruption.
This bug has only been seen recently and this has to be linked with the
read speed improvement recently made in AF_PACKET support.
The patch fixes the issue by modifying the tp_status bitmask in the
ring buffer. It sets the TP_STATUS_USER_BUSY flag when it is confirmed
that the packet will be treated. And at the start of the read, it exits
from the reading loop (returning to poll) when it reaches a packet with
the flag set. As tp_status is set to 0 during packet release the flag
is destroyed when releasing the packet.
Regarding concurrency, we've got a sequence of modification. The
capture thread read the packet and set the flag, then it passes the
queue and the packet get processed by other threads. The change on
tp_status are thus made at different time.
Regarding the value of the flag, the patch uses the last bit of
tp_status to avoid be impacting by a change in kernel. I will
propose a patch to have TP_STATUS_USER_BUSY included in kernel
as this is a generic issue for multithreading application using
AF_PACKET mechanism.
If a capture loop does exit, the thread needs to start without
synchronization with the other threads. This patch fixes this
by resetting the turn count on the peerslist structure and
adding a test on this condition in the wait function.
It seems that, in some case, there is a read waiting but the
offset in the ring buffer is not correct and Suricata need to
walk the ring to find the correct place and make the read.
This patch fixes emergency mode by setting the variable even if we
have a non kernel checksum check. It also does a call to
AFPDUmpCounters() as it seems to improve thing to do it ASAP.
This patch implements "late open". On high performance system, it
is needed to create the AF_PACKET just before reading to avoid
overflow. Socket creation has to be done with respect to the order
of thread creation to respect affinity settings.
This patch adds a counter to AFPPeer to be ale to synchronize the
initial socket creation.
Suricata was not able to start cleanly in AF_PACKET with default
suricata.yaml file if there was no eth1 on the system. This patch
fixes this issue and rework the socket transition phase to fix
some serious issues (file descriptor leak) found when fixing this
problem.
Every 20 seconds it displays a message to the user to warn him about
the interface not being accessible:
[ERRCODE: SC_ERR_AFP_CREATE(196)] - Can not open iface 'eth1'
If the MTU on the reception interface and the one on the transmission
interface are different, this will result in an error at transmission
when sending packet to the wire.
Flush all waiting packets to be in sync with kernel when drop
occurs. This mode can be activated by setting use-emergency-flush
to yes in the interface configuration.
This patch moves raw socket binding at the end of init code to
avoid to have a flow of packets reaching the socket before we
start to read them.
The socket creation is now made in the loop function to avoid
any timing issue between init function and the call of the loop.
This patch adds a new feature to AF_PACKET capture mode. It is now
possible to use AF_PACKET in IPS and TAP mode: all traffic received
on a interface will be forwarded (at the Ethernet level) to an other
interface. To do so, Suricata create a raw socket and sends the receive
packets to a interface designed in the configuration file.
This patch adds two variables to the configuration of af-packet
interface:
copy-mode: ips or tap
copy-iface: eth1 #the interface where packet are copied
If copy-mode is set to ips then the packet wth action DROP are not
copied to the destination interface. If copy-mode is set to tap,
all packets are copied to the destination interface.
Any other value of copy-mode results in the feature to be unused.
There is no default interface for copy-iface and the variable has
to be set for the ids or tap mode to work.
For now, this feature depends of the release data system. This
implies you need to activate the ring mode and zero copy. Basically
use-mmap has to be set to yes.
This patch adds a peering of AF_PACKET sockets from the thread on
one interface to the threads on another interface. Peering is
necessary as if we use an other socket the capture socket receives
all emitted packets. This is made using a new AFPPeer structure to
avoid direct interaction between AFPTreadVars.
There is currently a bug in Linux kernel (prior to 3.6) and it is
not possible to use multiple threads.
You need to setup two interfaces with equality on the threads
variable. copy-mode variable must be set on the two interfaces
and use-mmap must be set to activated.
A valid configuration for an IPS using eth0 and vboxnet1 interfaces
will look like:
af-packet:
- interface: eth0
threads: 1
defrag: yes
cluster-type: cluster_flow
cluster-id: 98
copy-mode: ips
copy-iface: vboxnet1
buffer-size: 64535
use-mmap: yes
- interface: vboxnet1
threads: 1
cluster-id: 97
defrag: yes
cluster-type: cluster_flow
copy-mode: ips
copy-iface: eth0
buffer-size: 64535
use-mmap: yes
This patch adds a data release mechanism. If the capture module
has a call to indicate that userland has finished with the data,
it is possible to use this system. The data will then be released
when the treatment of the packet is finished.
To do so the Packet structure has been modified:
+ TmEcode (*ReleaseData)(ThreadVars *, struct Packet_ *);
If ReleaseData is null, the function is called when the treatment
of the Packet is finished.
Thus it is sufficient for the capture module to code a function
wrapping the data release mechanism and to assign it to ReleaseData
field.
This patch also includes an implementation of this mechanism for
AF_PACKET.
The mmaped mode was using a too small ring buffer size which was
not able to handle burst of packets coming from the network. This
may explain the important packet loss rate observed by Edward
Fjellskål.
This patch increases the default value and adds a ring-size
variable which can be used to manually tune the value.
This patch should bring some improvements by looping on the
ring when there is some data available instead of getting back
to the poll. It also fix recovery in case of drops on the ring
because the poll command will not return correctly in this case.
clang was issuing some warnings related to unused return in function.
This patch adds some needed error treatment and ignore the rest of the
warnings by adding a cast to void.
This patch adds counters for kernel drops and accepts to af-packet
capture module. This information are periodically displayed in
stats.log:
capture.kernel_packets | RxAFP1 | 1792
capture.kernel_drops | RxAFP1 | 0
The statistic is fetch via a setsockopt call every 255 packets.
This patch adds support for BPF in AF_PACKET running
mode. The command line syntax is the same as the one
used of PF_RING.
The method is the same too: The pcap_compile__nopcap()
function is used to build the BPF filter. It is then
injected into the kernel with a setsockopt() call. If
the adding of the BPF fail, suricata exit.
This patch will allow us to use the datalink when computing the filter.
It also fixes a potential issue where an interface data type change
after the interface if going down/up.
This patch adds support for zero copy to AF_PACKET running mode.
This requires to use the 'worker' mode which is the only one where
the threading architecture is simple enough to permit this without
heavy modification.
This patch adds mmap support for af-packet. Suricata now makes
use of the ring buffer feature of AF_PACKET if 'use-mmap' variable
is set to yes on an interface.
This flag adds variable to disable offloading detection. The effect
of the flag is to avoid to transmit auxiliary data at each packet.
This could result in a potential performance gain.
A devide configuration can be used by multiple threads. It is thus
necessary to wait that all threads stop using the configuration before
freeing it. This patch introduces an atomic counter and a free function
which has to be called by each thread when it will not use anymore
the structure. If the configuration is not used anymore, it is freed
by the free function.
This patch adds multi interface support to AF_PACKET. A structure
is used at thread creation to give all needed information to the
input module. Parsing of the options is done in runmode preparation
through a dedicated function which return the configuration in a
structure usable by thread creation.
This patch handles the end of AF_PACKET socket support work. It
provides conditional compilation, autofp and single runmode.
It also adds a 'defrag' option which is used to activate defrag
support in kernel to avoid rx_hash computation in flow mode to fail
due to fragmentation.
This patch contains some fixes by Anoop Saldanha, and incorporate
change following review by Anoop Saldanha and Victor Julien.
AF_PACKET support is only build if the --enable-af-packet flag is
given to the configure command line. Detection of code availability
is also done: a check of the existence of AF_PACKET in standard
header is done. It seems this variable is Linux specific and it
should be enough to avoid compilation of AF_PACKET support on other
OSes.
Compilation does not depend on up-to-date headers on the system. If
none are present, wemake our own declaration of FANOUT variables. This
will permit compilation of the feature for system where only the kernel
has been updated to a version superior to 3.1.