When there are many threads and/or the packet pool (max-pending-packets) is
small, a potential deadlock exists between the packet pool 'return pool'
logic and the capture threads. The autofp workers together can hold all the
packets in their return pools, while the capture thread(s) wait at an
empty pool. The race between the worker threads and the capture thread, in
which the former signal the latter, is lost by the capture thread. Now
everyone is waiting.
To avoid this scenario, this patch makes the previously hardcoded 'return
pool' threshold dynamic, based on the number of threads and the packet pool
size.
It sets the threshold to the max pending packets value divided by the number
of listener threads; normally, in the autofp runmode these are the
stream/detect/log worker threads. The maximum value hasn't changed.
The max_pending_return_packets value needs to stay below the packet pool size
of the 'producers' (normally the packet capture threads, but also flow timeout
injection) to avoid the deadlock.
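A rough illustration of the calculation (variable names here are illustrative only):

/* Illustrative sketch: each listener thread batches at most 'threshold'
 * packets before pushing them back, and the batch stays below the
 * producer's pool size so the scenario above cannot occur. */
uint32_t threshold = max_pending_packets / num_listener_threads;
if (threshold > MAX_PENDING_RETURN_PACKETS)
    threshold = MAX_PENDING_RETURN_PACKETS;   /* old hardcoded value stays the upper bound */
if (threshold < 1)
    threshold = 1;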
Since it's impossible at this point to know how many threads will be created
before starting the runmodes, and starting the runmodes already spawns the
threads and initializes their packet pools, this code sets a global variable
after runmode setup, but before the threads are 'unpaused'.
The flow timeout mechanism, called both from the flow manager at run time
and at shutdown, creates pseudo packets. For this it has its own packet
pool, which can be depleted if the timeout logic is faster than the packet
processing threads. In this case the flow timeout would enter a wait loop.
The problem, however, is that this wait loop would happen while a flow is
kept locked. This could lead to a race condition when the packet thread(s)
are waiting for the lock that the flow manager holds.
This patch introduces a new packet pool call, 'PacketPoolWaitForN', meant
to make sure that the thread's packet pool has at least N available
packets. The flow timeout paths use this to make sure enough packets are
available *before* grabbing the flow lock. If there aren't enough packets
available yet, the wait also happens before the lock is taken.
This still means the wait can happen while the flow hash row is locked, so
we do make sure some more packets are available before entering that path.
Perhaps in the future more precise logic is needed there as well.
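A rough sketch of how the flow timeout path uses it (simplified; building the pseudo packets is omitted):

/* Sketch: reserve enough packets *before* taking the flow lock, so the
 * wait on the packet pool never happens while the flow is locked. */
PacketPoolWaitForN(3);      /* e.g. room for a few pseudo packets */
FLOWLOCK_WRLOCK(f);
/* ... build and enqueue the flow timeout pseudo packet(s) ... */
FLOWLOCK_UNLOCK(f);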
PacketPoolWait in autofp can wait for a considerable time. Until now
it was essentially spinning, keeping the CPU 100% busy.
This patch introduces a condition variable to wait on in such cases.
Atomically flag the pool that a consumer is waiting, so that the pending
pool can be synced right away instead of waiting for the
MAX_PENDING_RETURN_PACKETS limit.
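A simplified sketch of the two sides (field names only approximate the pool structure; details omitted):

/* Consumer (sketch): flag that we are waiting, then sleep on a condition
 * variable instead of spinning. */
SC_ATOMIC_ADD(pool->return_stack.sync_now, 1);
SCMutexLock(&pool->return_stack.mutex);
while (PacketPoolIsEmpty(pool))
    SCCondWait(&pool->return_stack.cond, &pool->return_stack.mutex);
SCMutexUnlock(&pool->return_stack.mutex);

/* Producer (sketch): if sync_now is set, push the locally pending packets
 * and signal right away, without waiting for MAX_PENDING_RETURN_PACKETS. */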
The field ext_pkt was cleared before the release function was called.
As a result, IPS modes such as AF_PACKET's were no longer working,
because they could not send the data that ext_pkt initially pointed to.
This patch moves the ext_pkt cleanup into the cleanup macro. This
ensures that the cleanup is done for both allocated and pool packets.
Call PACKET_RELEASE_REFS from PacketPoolGetPacket() so that
we only access the large packet structure just before actually
using it. Should give better cache behaviour.
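Roughly (a sketch; the helper for popping from the stacks is illustrative):

Packet *PacketPoolGetPacket(void)
{
    /* pop from the private stack, refilling from the return stack if
     * needed (illustrative helper, details omitted) */
    Packet *p = PacketPoolPopStack();
    if (p != NULL)
        PACKET_RELEASE_REFS(p);   /* touch the big structure only now,
                                     right before the packet is used */
    return p;
}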
On non-TLS systems, check each time the thread-local storage is requested
and, if it has not been initialized for this thread, initialize it.
This prevents the worker threads in the autofp run mode from being left
uninitialized.
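A minimal sketch of the lazy check, assuming a pthread key is used as the fallback when __thread is not available (types and names here are stand-ins):

#include <pthread.h>
#include <stdlib.h>

typedef struct ThreadPool_ { int initialized; } ThreadPool;  /* stand-in type */

static pthread_key_t pool_key;   /* created once via pthread_key_create() */

/* Every access goes through this check: if the calling thread (e.g. an
 * autofp worker) has not set up its storage yet, do it now. */
static ThreadPool *GetThreadPool(void)
{
    ThreadPool *pool = pthread_getspecific(pool_key);
    if (pool == NULL) {
        pool = calloc(1, sizeof(*pool));
        pool->initialized = 1;
        pthread_setspecific(pool_key, pool);
    }
    return pool;
}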
At -O1 and higher, both GCC and Clang optimized the PacketPoolWait
wait loop in the wrong way. Add a compiler barrier to prevent
this optimization issue.
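The barrier itself can be a common GCC/Clang idiom, for example:

/* Emits no CPU instruction, but stops the compiler from hoisting the
 * pool-emptiness check out of the wait loop or caching it in a register. */
#define cc_barrier() __asm__ __volatile__("" ::: "memory")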
Better handle the autofp case where one thread allocates the majority
of the packets and other threads free those packets.
Add a list of locally pending packets. The first packet freed goes on the
pending list, and subsequent freed packets for the same Packet Pool are
added to this list until it hits a fixed number of packets, at which point
the entire list is pushed onto the pool's return stack. If a freed
packet is not for the pending pool, it is freed immediately to its pool's
return stack, as before.
For the autofp case, since there is only one Packet Pool doing all the
allocation, every other thread will keep a list of pending packets for
that pool.
For the worker run mode, most packets are allocated and freed locally. Where
packets are being returned to a remote pool, a pending list is kept for one
of those other threads; packets for all others are returned as before.
The remote pool for which a pending list is kept changes each time the
pending list is returned: since the pending pool is cleared when its list is
flushed, the next packet to be freed chooses the new pending pool.
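A condensed sketch of the return path described above (structure, field, and helper names are simplified):

void PacketPoolReturnPacket(Packet *p)
{
    PktPool *pool = p->pool;               /* pool the packet belongs to */
    PktPool *my_pool = GetThreadPacketPool();

    if (pool == my_pool) {                 /* local free: no locking needed */
        PacketPoolStorePacket(p);
        return;
    }

    if (my_pool->pending_pool == NULL)     /* adopt this remote pool */
        my_pool->pending_pool = pool;

    if (pool == my_pool->pending_pool) {   /* batch it on the pending list */
        p->next = my_pool->pending_head;
        my_pool->pending_head = p;
        if (++my_pool->pending_count >= max_pending_return_packets) {
            /* push the whole list onto the remote pool's return stack
             * under its mutex, then clear the pending list */
            PacketPoolFlushPendingList(my_pool);
        }
    } else {
        /* not the pending pool: return it straight to that pool's
         * return stack, as before */
        PacketPoolPushReturnStack(pool, p);
    }
}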
Using a stack for free Packet storage causes recently freed Packets to be
reused quickly, while the data is more likely to still be in cache.
The new structure has a per-thread private stack for allocating Packets,
which does not need any locking. Since Packets can be freed by any thread,
there is a second stack (the return stack) used when packets are freed by
other threads.
The return stack is protected by a mutex. Packets are moved from the return
stack to the private stack when the private stack is empty.
Returning packets back to their "home" stack keeps the stacks from getting out
of balance.
The PacketPoolInit() function is now called by each thread that will be
allocating packets. Each thread allocates max_pending_packets packets, a
change from before, when that was the total number of packets across all
threads.
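In outline, the structure looks something like this (a simplified sketch, not the exact definition):

typedef struct PktPoolLockedStack_ {
    SCMutex mutex;      /* protects the return stack */
    SCCondT cond;       /* a consumer can wait here when everything is empty */
    Packet *head;       /* packets freed by *other* threads */
} PktPoolLockedStack;

typedef struct PktPool_ {
    Packet *head;                     /* private stack: owner thread only, no locking */
    PktPoolLockedStack return_stack;  /* remote frees land here */
    /* pending-list fields (pending_pool, pending_head, ...) omitted */
} PktPool;

/* Allocation, owner thread only:
 *   1. pop from pool->head if non-empty (lock free);
 *   2. otherwise lock return_stack.mutex, move its whole list over to
 *      pool->head, unlock, and pop from there. */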
Extended data was freed before the release function was called.
As a result, in AF_PACKET IPS mode the release function was only sending
empty data, because the content of the extended data is the content of the
packet.
This patch updates the code so that the extended data is freed in the packet
cleanup function, which is called by the release function. This improves
consistency of the code and fixes the bug.
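The cleanup then looks roughly like this (a sketch of the macro; the zero-copy check matters so ring buffer data owned by the capture method is not freed here):

#define PACKET_FREE_EXTDATA(p) do {                  \
        if ((p)->ext_pkt != NULL) {                  \
            if (!((p)->flags & PKT_ZERO_COPY)) {     \
                SCFree((p)->ext_pkt);                \
            }                                        \
            (p)->ext_pkt = NULL;                     \
        }                                            \
    } while (0)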
End profiling inside the lock for a tunnel packet, as otherwise another
thread may free the packet while the profiling code still runs.
SEGVs were observed; they are now gone.
This patch adds and increments an invalid packet counter. It does this by
introducing the PacketDecodeFinalize function, which increments the invalid
counter and also signals the packet to CUDA.
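In outline (a sketch; the counter and CUDA calls are paraphrased):

/* Sketch: one spot at the end of decoding for per-packet bookkeeping. */
static inline void PacketDecodeFinalize(ThreadVars *tv, DecodeThreadVars *dtv,
                                        Packet *p)
{
    if (p->flags & PKT_IS_INVALID) {
        StatsIncr(tv, dtv->counter_invalid);   /* the new invalid counter */
    }
#ifdef __SC_CUDA_SUPPORT__
    /* hand the packet off to the CUDA batcher here (omitted) */
#endif
}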
This commit allows handling Packets allocated by different methods.
The ReleaseData function pointer in the Packet structure is replaced
with ReleasePacket function pointer, which is then always called to
release the memory associated with a Packet.
Currently, the only usage of ReleaseData is in AF Packet. Previously,
ReleaseData was only called when it was not NULL. To implement the
same functionality as before in AF Packet, a new function is defined
in AF Packet that first calls the AFP-specific ReleaseData function and
then releases the Packet structure.
Three new general functions are defined for releasing packets in the
default case:
1) PacketFree() - To release a packet alloced with SCMalloc()
2) PacketPoolReturnPacket() - For packets allocated from the Packet Pool.
Calls RECYCLE_PACKET(p)
3) PacketFreeOrRelease() - Calls PacketFree() or PacketPoolReturnPacket()
based on the PKT_ALLOC flag.
Having these functions removes the need to check the PKT_ALLOC flag
when releasing a packet in most cases, since the ReleasePacket
function encodes how the Packet was allocated. The PKT_ALLOC flag is
still set and is needed when AF Packet releases a packet, since it
replaces the ReleasePacket function pointer with its own function and
then calls PacketFreeOrRelease(), which uses the PKT_ALLOC flag.
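Roughly, the generic helpers reduce to (sketch):

void PacketPoolReturnPacket(Packet *p)
{
    RECYCLE_PACKET(p);               /* hand the packet back to the Packet Pool */
}

void PacketFreeOrRelease(Packet *p)
{
    if (p->flags & PKT_ALLOC)
        PacketFree(p);               /* was allocated with SCMalloc() */
    else
        PacketPoolReturnPacket(p);   /* came from the Packet Pool */
}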
The memset() inside PACKET_INITIALIZE() is redundant in some cases, and
it is cleaner to do it as part of the memory allocation. This simplifies
changes for integrating Tilera mPIPE support, because the size of memory
cleared in that case is different from SIZE_OF_PACKET.
For the cases where Packets are directly allocated and then call
PACKET_INITIALIZE() without memset() first, this patch adds memset() calls.
A further change would use GetPacketFromAlloc() directly.
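For a direct allocation, the pattern then becomes something like (sketch):

Packet *p = SCMalloc(SIZE_OF_PACKET);
if (unlikely(p == NULL))
    return NULL;
memset(p, 0, SIZE_OF_PACKET);   /* zero as part of allocation, since
                                   PACKET_INITIALIZE() no longer does */
PACKET_INITIALIZE(p);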
When handling the error case of SCMalloc, SCCalloc or SCStrdup,
we are in an unlikely branch. This patch adds the unlikely()
expression to indicate this to GCC.
This patch has been obtained via coccinelle. The transformation
is the following:
@istested@
identifier x;
statement S1;
identifier func =~ "(SCMalloc|SCStrdup|SCCalloc)";
@@
x = func(...)
... when != x
- if (x == NULL) S1
+ if (unlikely(x == NULL)) S1
This patch adds a data release mechanism. If the capture module
has a call to indicate that userland has finished with the data,
it is possible to use this system. The data will then be released
when processing of the packet is finished.
To do so the Packet structure has been modified:
+ TmEcode (*ReleaseData)(ThreadVars *, struct Packet_ *);
If ReleaseData is not NULL, the function is called when processing
of the Packet is finished.
Thus it is sufficient for the capture module to write a function
wrapping the data release mechanism and to assign it to the ReleaseData
field.
This patch also includes an implementation of this mechanism for
AF_PACKET.
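A capture module then only needs something along these lines (sketch; the AF_PACKET function body is omitted):

static TmEcode AFPReleaseDataFromRing(ThreadVars *t, Packet *p)
{
    /* tell the kernel the ring slot holding this packet's data can be
     * reused (details omitted) */
    return TM_ECODE_OK;
}

/* in the receive path, for each packet taken from the ring: */
p->ReleaseData = AFPReleaseDataFromRing;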
This patch adds support for zero copy to the AF_PACKET running mode.
This requires using the 'worker' runmode, which is the only one where
the threading architecture is simple enough to permit this without
heavy modification.
Per packet profiling uses tick-based accounting. It has two outputs: a
summary and a CSV file that contains per-packet stats.
Stats per packet include:
1) total ticks spent
2) ticks spent per individual thread module
3) "threading overhead" which is simply calculated by subtracting (2) of (1).
A number of changes were made to integrate the new code in a clean way:
several generic enums are now placed in tm-threads-common.h so we can
include them from any part of the engine.
Code depends on --enable-profiling just like the rule profiling code.
New yaml parameters:
profiling:
  # packet profiling
  packets:
    # Profiling can be disabled here, but it will still have a
    # performance impact if compiled in.
    enabled: yes
    filename: packet_stats.log
    append: yes

    # per packet csv output
    csv:
      # Output can be disabled here, but it will still have a
      # performance impact if compiled in.
      enabled: no
      filename: packet_stats.csv
Example output of summary stats:

IP ver Proto   cnt       min      max         avg
------ -----   ------    ------   ----------  -------
 IPv4      6    19436     11448      5404365    32993
 IPv4    256        4     11511        49968    30575

Per Thread module stats:

Thread Module              IP ver Proto   cnt       min      max         avg
------------------------   ------ -----   ------    ------   ----------  -------
TMM_DECODEPCAPFILE          IPv4      6    19434      1242        47889     1770
TMM_DETECT                  IPv4      6    19436      1107       137241     1504
TMM_ALERTFASTLOG            IPv4      6    19436        90         1323      155
TMM_ALERTUNIFIED2ALERT      IPv4      6    19436       108         1359      138
TMM_ALERTDEBUGLOG           IPv4      6    19436        90         1134      154
TMM_LOGHTTPLOG              IPv4      6    19436       414      5392089     7944
TMM_STREAMTCP               IPv4      6    19434       828      1299159    19438
The proto 256 is a counter for handling of pseudo/tunnel packets.
Example output of csv:
pcap_cnt,ipver,ipproto,total,TMM_DECODENFQ,TMM_VERDICTNFQ,TMM_RECEIVENFQ,TMM_RECEIVEPCAP,TMM_RECEIVEPCAPFILE,TMM_DECODEPCAP,TMM_DECODEPCAPFILE,TMM_RECEIVEPFRING,TMM_DECODEPFRING,TMM_DETECT,TMM_ALERTFASTLOG,TMM_ALERTFASTLOG4,TMM_ALERTFASTLOG6,TMM_ALERTUNIFIEDLOG,TMM_ALERTUNIFIEDALERT,TMM_ALERTUNIFIED2ALERT,TMM_ALERTPRELUDE,TMM_ALERTDEBUGLOG,TMM_ALERTSYSLOG,TMM_LOGDROPLOG,TMM_ALERTSYSLOG4,TMM_ALERTSYSLOG6,TMM_RESPONDREJECT,TMM_LOGHTTPLOG,TMM_LOGHTTPLOG4,TMM_LOGHTTPLOG6,TMM_PCAPLOG,TMM_STREAMTCP,TMM_DECODEIPFW,TMM_VERDICTIPFW,TMM_RECEIVEIPFW,TMM_RECEIVEERFFILE,TMM_DECODEERFFILE,TMM_RECEIVEERFDAG,TMM_DECODEERFDAG,threading
1,4,6,172008,0,0,0,0,0,0,47889,0,0,48582,1323,0,0,0,0,1359,0,1134,0,0,0,0,0,8028,0,0,0,49356,0,0,0,0,0,0,0,14337
The first line of the file contains the labels.
Two example gnuplot scripts are added to plot the data.
This patch modifies decode.c and decode.h to avoid using, by default, an
array bigger than 65535 bytes in the Packet structure.
The idea is that packets are mostly under 1514 bytes in size; bigger sizes
must be supported, but should not be the default.
If the packet length is bigger than DFLT_PACKET_SIZE, the data is
stored in a dynamically allocated part of memory.
To ease modification of the rest of the code, functions to
access and set the payload/length of a Packet have been introduced.
The default packet size can be set at runtime via the default-packet-size
configuration variable.
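A rough sketch of how the accessors are used (macro names as introduced in decode.h; exact spellings may differ):

/* Callers no longer care whether the data sits in the static pkt array
 * or in the dynamically allocated ext_pkt buffer. */
uint8_t *data = GET_PKT_DATA(p);
uint32_t len  = GET_PKT_LEN(p);

if (len > DFLT_PACKET_SIZE) {
    /* data was stored in the separately allocated buffer */
}

SET_PKT_LEN(p, len);   /* setter keeps both storage variants consistent */

In suricata.yaml this corresponds to a setting along the lines of 'default-packet-size: 1514'.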