It is used to set the block size in tpacket_v3. It will allow user
to tune the capture depending on his bandwidth.
Default block size value has been updated to a bigger value to
allow more efficient wlak on block.
Error handling was not done. The implementation is making the
choice to consider we must detroy the socket in case of parsing
error. The same was done for tpacket_v2.
Suppress useless fields in AFPThreadVars. This patch also get rid
of bytes counter as it was only used to display a message at exit.
Information on livedev and on packet counters are enough.
This patch adds a basic implementation of AF_PACKET tpacket v3. It
is basic in the way it is only working for 'workers' runnning mode.
If not in 'workers' mode there is a fallback to tpacket_v2. Feature
is activated via tpacket-v3 option in the af-packet section of
Suricata YAML.
Capture methods that are non blocking will still not generate packets
that go through the system if there is no traffic. Some maintenance
tasks, like rule reloads rely on packets to complete.
This patch introduces a new thread flag, THV_CAPTURE_INJECT_PKT, that
instructs the capture thread to create a fake packet.
The capture implementations can call the TmThreadsCaptureInjectPacket
utility function either with the packet they already got from the pool
or without a packet. In this case the util func will get it's own
packet.
Implementations for pcap, AF_PACKET and PF_RING.
This patch adds a new callback PktAcqBreakLoop() in TmModule to let
packet acquisition modules define "break-loop" functions to terminate
the capture loop. This is useful in case of blocking functions that
need special actions to take place in order to stop the execution.
Implement this for PF_RING
The global variable suricata_ctl_flags needs to volatile, otherwise the
compiler might not cause the variable to be read every time because it
doesn't know other threads might write the variable.
This was causing Suricata to not exit under some conditions.
Using a stack for free Packet storage causes recently freed Packets to be
reused quickly, while there is more likelihood of the data still being in
cache.
The new structure has a per-thread private stack for allocating Packets
which does not need any locking. Since Packets can be freed by any thread,
there is a second stack (return stack) for freeing packets by other threads.
The return stack is protected by a mutex. Packets are moved from the return
stack to the private stack when the private stack is empty.
Returning packets back to their "home" stack keeps the stacks from getting out
of balance.
The PacketPoolInit() function is now called by each thread that will be
allocating packets. Each thread allocates max_pending_packets, which is a
change from before, where that was the total number of packets across all
threads.
During socket creation all error cases were leading to suricata to
retry the opening of capture. This patch updates this behavior to
have fatal and recoverable error case. In case of a fatal error,
suricata is leaving cleanly.
Previously the sync code would depend on traffic to complete. This
patch adds poll support and can complete the setup if the poll timeout
is reached as well.
Part of bug #1130.
This patch is updating af-packet to discard packets that have been
sent to a socket before all socket in a fanout group have been setup.
Without this, there is no way to assure that all packets for a single
flow will be treated by the same thread.
Tests have been done on a system with an ixgbe network card. When using
'cluster_flow' load balancing and disactivating receive hash on the iface:
ethtool -K IFACE rxhash off
then suricata is behaving as expected and all packets for a single flow
are treated by the same thread.
For some unknown reason, this is not the case when using cluster_cpu. It
seems that in that case the load balancing is not perfect on the card side.
The rxhash offloading has a direct impact on the cluster_flow load balancing
because load balancing is done by using a generic hash key attached to
each skb. This hash can be computed by the network card or can be
computed by the kernel. In the xase of a ixgbe network card, it seems there
is some issue with the hash key for TCP. This explains why it is necessary to
remove the rxhash offloading to have a correct behavior. This could also
explain why cluster_cpu is currently failing because the card is using the
same hash key computation to do the RSS queues load balancing.
Some of the packets counters were using a 32bit integer. Given the
bandwidth that is often seen, this is not a good idea. This patch
switches to 64bit counter.
Packets counter is incremented in AFPDumpCounters and it was
also incremented during packet reading. The result was a value
that is twice the expected result.
Spotted-by: Victor Julien <victor@inliniac.net>
In really short Suricata runtimes the capture counters would not
be updated. This patch does a force update at the end of the
capture loops in pcap and af-packet.
Some old distribution don't ship recent enough linux header. This
result in TP_STATUS_VLAN_VALID being undefined. This patch defines
the constant and use it as it is used in backward compatible method
in the code: the flag is not set by kernel and a test on vci value
will be made.
This should fix https://redmine.openinfosecfoundation.org/issues/1106
Flow-timeout code injects pseudo packets into the decoders, leading
to various issues. For a full explanation, see:
https://redmine.openinfosecfoundation.org/issues/1107
This patch works around the issues with a hack. It adds a check to
each of the decoder entry points to bail out as soon as a pseudo
packet from the flow timeout is encountered.
Ticket #1107.
This patch uses the new function SCKernelVersionIsAtLeast to know
that we've got a old kernel that do not strip the VLAN header from
the message before sending it to userspace.
Since commit in kernel
commit a3bcc23e890a6d49d6763d9eb073d711de2e0469
Author: Ben Greear <greearb@candelatech.com>
Date: Wed Jun 1 06:49:10 2011 +0000
af-packet: Add flag to distinguish VID 0 from no-vlan.
a flag is set to indicate VLAN has been set in packet header.
As suggested in commit message, using a test of the flag followed
by a check on vci value ensure backward compatibility of the test.
To be able to register counters from AppLayerGetCtxThread, the
ThreadVars pointer needs to be available in it and thus in it's
callers:
- AppLayerGetCtxThread
- DecodeThreadVarsAlloc
- StreamTcpReassembleInitThreadCtx
Logic of patch 98e4a14f6d was correct
but implementation is wrong because TP_STATUS_KERNEL is equal to
zero and thus can not be evaluated in a binary operation. This patch
updates the logic by doing two tests.
Reported-by: Alessandro Guido
This patch updates the logic of the packet acquisition loop. When
the reader loop function is called and when the data to read
at offset is a without data (kernel) or still used by suricata. We
try to iter for a loop on the ring to try to find kernel put by
data.
As we are entering the function because the poll said there was some
data. This allow us to jump to the data added to the ring by the
kernel.
When using suricata in autofp mode, with multiple detect threads and
packet acquisition threads attached to a dedicated CPU, the reader
loop function was looping really fast because poll call was returning
immediatly because we did read the data available.
Live device counter was in fact the number of packets seen by suricata
and not the total number of packet reported by kernel. This patch fixes
this by using counter provided by kernel instead.
The counter is Clear On Read, so by adding the value fetch at each call
and earch sockets we get the number of packets and drops for the
interface.
This patch adds and increments a invalid packet counter. It
does this by introducing PacketDecodeFinalize function
This function is incrementing the invalid counter and is also
signalling the packet to CUDA.
This commit allows handling Packets allocated by different methods.
The ReleaseData function pointer in the Packet structure is replaced
with ReleasePacket function pointer, which is then always called to
release the memory associated with a Packet.
Currently, the only usage of ReleaseData is in AF Packet. Previously
ReleaseData was only called when it was not NULL. To implement the
same functionality as before in AF Packet, a new function is defined
in AF Packet to first call the AFP specific ReleaseData function and
then releases the Packet structure.
Three new general functions are defined for releasing packets in the
default case:
1) PacketFree() - To release a packet alloced with SCMalloc()
2) PacketPoolReturnPacket() - For packets allocated from the Packet Pool.
Calls RECYCLE_PACKET(p)
3) PacketFreeOrRelease() - Calls PacketFree() or PacketPoolReturnPacket()
based on the PKT_ALLOC flag.
Having these functions removes the need to check the PKT_ALLOC flag
when releasing a packet in most cases, since the ReleasePacket
function encodes how the Packet was allocated. The PKT_ALLOC flag is
still set and is needed when AF Packet releases a packet, since it
replaces the ReleasePacket function pointer with its own function and
then calls PacketFreeOfRelease(), which uses the PKT_ALLOC flag.
Use test macro instead of direct access to action field.
This patch has been obtained by using the following
spatch file:
@@
Packet *p;
expression E;
@@
- p->action & E
+ TEST_PACKET_ACTION(p, E)
If no packet arrives to a capture thread, it is possible that the
AFPReadLoop() function goes into an infinite loop. This could cause
suricata to hang at exit on non busy system.
This patch adds a counter to detect when Suricata start looping in
the ring to stop when it reaches this point.
When handling error case on SCMallog, SCCalloc or SCStrdup
we are in an unlikely case. This patch adds the unlikely()
expression to indicate this to gcc.
This patch has been obtained via coccinelle. The transformation
is the following:
@istested@
identifier x;
statement S1;
identifier func =~ "(SCMalloc|SCStrdup|SCCalloc)";
@@
x = func(...)
... when != x
- if (x == NULL) S1
+ if (unlikely(x == NULL)) S1
This patch resets the AFPPacketVar linked to a Packet in the release
function to avoid any side effect when the packet is reused. To do
so a new AFPV_CLEANUP macro has been introduced.
This patch cleans the code were two almost identical treatment on
the packet we're made. It may be linked by a merge error I've done
or to a simple mistake on my side.
There was an inversion in code resulting as all sockets being seen
as non IPS mode when doing the peering. This resulted in a crash at
first packet because it has no peer.
A crash can occurs in the following conditions:
* Suricata running in other mode than "workers"
* Kernel fill in the ring buffer
Under this conditions, it is possible that the capture thread reads
a packet that has not yet released by one of the treatment threads
because there is no modification done on the ring buffer entry when
a packet is read. Doing, this it access to memory which can be
released to the kernel and modified. This results in a kind of memory
corruption.
This bug has only been seen recently and this has to be linked with the
read speed improvement recently made in AF_PACKET support.
The patch fixes the issue by modifying the tp_status bitmask in the
ring buffer. It sets the TP_STATUS_USER_BUSY flag when it is confirmed
that the packet will be treated. And at the start of the read, it exits
from the reading loop (returning to poll) when it reaches a packet with
the flag set. As tp_status is set to 0 during packet release the flag
is destroyed when releasing the packet.
Regarding concurrency, we've got a sequence of modification. The
capture thread read the packet and set the flag, then it passes the
queue and the packet get processed by other threads. The change on
tp_status are thus made at different time.
Regarding the value of the flag, the patch uses the last bit of
tp_status to avoid be impacting by a change in kernel. I will
propose a patch to have TP_STATUS_USER_BUSY included in kernel
as this is a generic issue for multithreading application using
AF_PACKET mechanism.
If a capture loop does exit, the thread needs to start without
synchronization with the other threads. This patch fixes this
by resetting the turn count on the peerslist structure and
adding a test on this condition in the wait function.
It seems that, in some case, there is a read waiting but the
offset in the ring buffer is not correct and Suricata need to
walk the ring to find the correct place and make the read.
This patch fixes emergency mode by setting the variable even if we
have a non kernel checksum check. It also does a call to
AFPDUmpCounters() as it seems to improve thing to do it ASAP.
This patch implements "late open". On high performance system, it
is needed to create the AF_PACKET just before reading to avoid
overflow. Socket creation has to be done with respect to the order
of thread creation to respect affinity settings.
This patch adds a counter to AFPPeer to be ale to synchronize the
initial socket creation.
Suricata was not able to start cleanly in AF_PACKET with default
suricata.yaml file if there was no eth1 on the system. This patch
fixes this issue and rework the socket transition phase to fix
some serious issues (file descriptor leak) found when fixing this
problem.
Every 20 seconds it displays a message to the user to warn him about
the interface not being accessible:
[ERRCODE: SC_ERR_AFP_CREATE(196)] - Can not open iface 'eth1'
If the MTU on the reception interface and the one on the transmission
interface are different, this will result in an error at transmission
when sending packet to the wire.
Flush all waiting packets to be in sync with kernel when drop
occurs. This mode can be activated by setting use-emergency-flush
to yes in the interface configuration.
This patch moves raw socket binding at the end of init code to
avoid to have a flow of packets reaching the socket before we
start to read them.
The socket creation is now made in the loop function to avoid
any timing issue between init function and the call of the loop.
This patch adds a new feature to AF_PACKET capture mode. It is now
possible to use AF_PACKET in IPS and TAP mode: all traffic received
on a interface will be forwarded (at the Ethernet level) to an other
interface. To do so, Suricata create a raw socket and sends the receive
packets to a interface designed in the configuration file.
This patch adds two variables to the configuration of af-packet
interface:
copy-mode: ips or tap
copy-iface: eth1 #the interface where packet are copied
If copy-mode is set to ips then the packet wth action DROP are not
copied to the destination interface. If copy-mode is set to tap,
all packets are copied to the destination interface.
Any other value of copy-mode results in the feature to be unused.
There is no default interface for copy-iface and the variable has
to be set for the ids or tap mode to work.
For now, this feature depends of the release data system. This
implies you need to activate the ring mode and zero copy. Basically
use-mmap has to be set to yes.
This patch adds a peering of AF_PACKET sockets from the thread on
one interface to the threads on another interface. Peering is
necessary as if we use an other socket the capture socket receives
all emitted packets. This is made using a new AFPPeer structure to
avoid direct interaction between AFPTreadVars.
There is currently a bug in Linux kernel (prior to 3.6) and it is
not possible to use multiple threads.
You need to setup two interfaces with equality on the threads
variable. copy-mode variable must be set on the two interfaces
and use-mmap must be set to activated.
A valid configuration for an IPS using eth0 and vboxnet1 interfaces
will look like:
af-packet:
- interface: eth0
threads: 1
defrag: yes
cluster-type: cluster_flow
cluster-id: 98
copy-mode: ips
copy-iface: vboxnet1
buffer-size: 64535
use-mmap: yes
- interface: vboxnet1
threads: 1
cluster-id: 97
defrag: yes
cluster-type: cluster_flow
copy-mode: ips
copy-iface: eth0
buffer-size: 64535
use-mmap: yes
This patch adds a data release mechanism. If the capture module
has a call to indicate that userland has finished with the data,
it is possible to use this system. The data will then be released
when the treatment of the packet is finished.
To do so the Packet structure has been modified:
+ TmEcode (*ReleaseData)(ThreadVars *, struct Packet_ *);
If ReleaseData is null, the function is called when the treatment
of the Packet is finished.
Thus it is sufficient for the capture module to code a function
wrapping the data release mechanism and to assign it to ReleaseData
field.
This patch also includes an implementation of this mechanism for
AF_PACKET.
The mmaped mode was using a too small ring buffer size which was
not able to handle burst of packets coming from the network. This
may explain the important packet loss rate observed by Edward
Fjellskål.
This patch increases the default value and adds a ring-size
variable which can be used to manually tune the value.
This patch should bring some improvements by looping on the
ring when there is some data available instead of getting back
to the poll. It also fix recovery in case of drops on the ring
because the poll command will not return correctly in this case.