The TILE-Gx processor includes a packet processing engine, called
mPIPE, that can deliver packets directly into user space memory. It
handles buffer allocation and load balancing (either static 5-tuple
hashing, or dynamic flow affinity hashing are used here). The new
packet source code is in source-mpipe.c and source-mpipe.h
A new Tile runmode is added that configures the Suricata pipelines in
worker mode, where each thread does the entire packet processing
pipeline. It scales across all the Gx chips sizes of 9, 16, 36 or 72
cores. The new runmode is in runmode-tile.c and runmode-tile.h
The configure script detects the TILE-Gx architecture and defines
HAVE_MPIPE, which is then used to conditionally enable the code to
support mPIPE packet processing. Suricata runs on TILE-Gx even without
mPIPE support enabled.
The Suricata Packet structures are allocated by the mPIPE hardware by
allocating the Suricata Packet structure immediatley before the mPIPE
packet buffer and then pushing the mPIPE packet buffer pointer onto
the mPIPE buffer stack. This way, mPIPE writes the packet data into
the buffer, returns the mPIPE packet buffer pointer, which is then
converted into a Suricata Packet pointer for processing inside
Suricata. When the Packet is freed, the buffer is returned to mPIPE's
buffer stack, by setting ReleasePacket to an mPIPE release specific
function.
The code checks for the largest Huge page available in Linux when
Suricata is started. TILE-Gx supports Huge pages sizes of 16MB, 64MB,
256MB, 1GB and 4GB. Suricata then divides one of those page into
packet buffers for mPIPE.
The code is not yet optimized for high performance. Performance
improvements will follow shortly.
The code was originally written by Tom Decanio and then further
modified by Tilera.
This code has been tested with Tilera's Multicore Developement
Environment (MDE) version 4.1.5. The TILEncore-Gx36 (PCIe card) and
TILEmpower-Gx (1U Rack mount).
In some cases using the vlan id(s) in flow hashing is problematic. Cases
of broken routers have been reported. So this option allows for disabling
the use of vlan id(s) while calculating the flow hash, and in the future
other hashes.
Vlan tracking for flow is enabled by default.
This commit allows handling Packets allocated by different methods.
The ReleaseData function pointer in the Packet structure is replaced
with ReleasePacket function pointer, which is then always called to
release the memory associated with a Packet.
Currently, the only usage of ReleaseData is in AF Packet. Previously
ReleaseData was only called when it was not NULL. To implement the
same functionality as before in AF Packet, a new function is defined
in AF Packet to first call the AFP specific ReleaseData function and
then releases the Packet structure.
Three new general functions are defined for releasing packets in the
default case:
1) PacketFree() - To release a packet alloced with SCMalloc()
2) PacketPoolReturnPacket() - For packets allocated from the Packet Pool.
Calls RECYCLE_PACKET(p)
3) PacketFreeOrRelease() - Calls PacketFree() or PacketPoolReturnPacket()
based on the PKT_ALLOC flag.
Having these functions removes the need to check the PKT_ALLOC flag
when releasing a packet in most cases, since the ReleasePacket
function encodes how the Packet was allocated. The PKT_ALLOC flag is
still set and is needed when AF Packet releases a packet, since it
replaces the ReleasePacket function pointer with its own function and
then calls PacketFreeOfRelease(), which uses the PKT_ALLOC flag.
The action field in Packet structure should not be accessed
directly as the tunneled packet needs to update the root packet
and not the initial packet.
This patch is fixing issue #819 where suricata was not able to
drop fragmented packets in AF_PACKET IPS mode. It also fixes
drop capability for tunneled packets.
The memset() inside PACKET_INITIALIZE() is redundant in some cases and
it is cleaner to do as part of the memory allocation. This simplifies
changes for integrating Tilera mPIPE support because the size of memory
cleared in that case is different from SIZE_OF_PACKET.
For the cases where Packets are directly allocated and then call
PACKET_INITIALIZE() without memset() first, this patch adds memset() calls.
A further change would use GetPacketFromAlloc() directly.
When nothing can be fetch from the pool, this can repeat frequently.
Thus displaying a message in the log will not help. This patch
uses a counter instead of a log message. As this is a sort of memcap
this is conformed to what is done for other issues of the same type.
This patch adds a data release mechanism. If the capture module
has a call to indicate that userland has finished with the data,
it is possible to use this system. The data will then be released
when the treatment of the packet is finished.
To do so the Packet structure has been modified:
+ TmEcode (*ReleaseData)(ThreadVars *, struct Packet_ *);
If ReleaseData is null, the function is called when the treatment
of the Packet is finished.
Thus it is sufficient for the capture module to code a function
wrapping the data release mechanism and to assign it to ReleaseData
field.
This patch also includes an implementation of this mechanism for
AF_PACKET.
Add profiling per lock location in the code. Accounts how often a
lock is requested, how often it was contended, the max number of
ticks spent waiting for it, avg number of ticks waiting for it and
the total ticks for that location.
Added a new configure flag --enable-profiling-locks to enable this
feature.
This patch adds support for zero copy to AF_PACKET running mode.
This requires to use the 'worker' mode which is the only one where
the threading architecture is simple enough to permit this without
heavy modification.
Checksum of local traffic is often offloaded to the network device.
This causes some problems on parsing of this traffic. This patch
introduces a PKT_INCOMPLETE_CHECKSUM flag which can be used to
indicate that the checksum is not computed/correct for good reason.
This patch renames DECODER_SET_EVENT, DECODER_ISSET_EVENT and some
other structures to ENGINE equivalent to take into account the fact
the event list is now related to all engines and not only to decoder.
Per packet per app layer parser profiling. Example summary output:
Per App layer parser stats:
App Layer IP ver Proto cnt min max avg
-------------------- ------ ----- ------ ------ ---------- -------
ALPROTO_HTTP IPv4 6 163394 126 38560320 42814
ALPROTO_FTP IPv4 6 644 117 26100 2566
ALPROTO_TLS IPv4 6 670 117 7137 799
ALPROTO_SMB IPv4 6 114794 126 225270 957
ALPROTO_DCERPC IPv4 6 5207 126 25596 1266
Also added to the csv out.
In the csv out there is a new column "stream (no app)" that removes the
app layer parsers from the stream tracking. So raw stream engine performance
becomes visible.
Per packet profiling uses tick based accounting. It has 2 outputs, a summary
and a csv file that contains per packet stats.
Stats per packet include:
1) total ticks spent
2) ticks spent per individual thread module
3) "threading overhead" which is simply calculated by subtracting (2) of (1).
A number of changes were made to integrate the new code in a clean way:
a number of generic enums are now placed in tm-threads-common.h so we can
include them from any part of the engine.
Code depends on --enable-profiling just like the rule profiling code.
New yaml parameters:
profiling:
# packet profiling
packets:
# Profiling can be disabled here, but it will still have a
# performance impact if compiled in.
enabled: yes
filename: packet_stats.log
append: yes
# per packet csv output
csv:
# Output can be disabled here, but it will still have a
# performance impact if compiled in.
enabled: no
filename: packet_stats.csv
Example output of summary stats:
IP ver Proto cnt min max avg
------ ----- ------ ------ ---------- -------
IPv4 6 19436 11448 5404365 32993
IPv4 256 4 11511 49968 30575
Per Thread module stats:
Thread Module IP ver Proto cnt min max avg
------------------------ ------ ----- ------ ------ ---------- -------
TMM_DECODEPCAPFILE IPv4 6 19434 1242 47889 1770
TMM_DETECT IPv4 6 19436 1107 137241 1504
TMM_ALERTFASTLOG IPv4 6 19436 90 1323 155
TMM_ALERTUNIFIED2ALERT IPv4 6 19436 108 1359 138
TMM_ALERTDEBUGLOG IPv4 6 19436 90 1134 154
TMM_LOGHTTPLOG IPv4 6 19436 414 5392089 7944
TMM_STREAMTCP IPv4 6 19434 828 1299159 19438
The proto 256 is a counter for handling of pseudo/tunnel packets.
Example output of csv:
pcap_cnt,ipver,ipproto,total,TMM_DECODENFQ,TMM_VERDICTNFQ,TMM_RECEIVENFQ,TMM_RECEIVEPCAP,TMM_RECEIVEPCAPFILE,TMM_DECODEPCAP,TMM_DECODEPCAPFILE,TMM_RECEIVEPFRING,TMM_DECODEPFRING,TMM_DETECT,TMM_ALERTFASTLOG,TMM_ALERTFASTLOG4,TMM_ALERTFASTLOG6,TMM_ALERTUNIFIEDLOG,TMM_ALERTUNIFIEDALERT,TMM_ALERTUNIFIED2ALERT,TMM_ALERTPRELUDE,TMM_ALERTDEBUGLOG,TMM_ALERTSYSLOG,TMM_LOGDROPLOG,TMM_ALERTSYSLOG4,TMM_ALERTSYSLOG6,TMM_RESPONDREJECT,TMM_LOGHTTPLOG,TMM_LOGHTTPLOG4,TMM_LOGHTTPLOG6,TMM_PCAPLOG,TMM_STREAMTCP,TMM_DECODEIPFW,TMM_VERDICTIPFW,TMM_RECEIVEIPFW,TMM_RECEIVEERFFILE,TMM_DECODEERFFILE,TMM_RECEIVEERFDAG,TMM_DECODEERFDAG,threading
1,4,6,172008,0,0,0,0,0,0,47889,0,0,48582,1323,0,0,0,0,1359,0,1134,0,0,0,0,0,8028,0,0,0,49356,0,0,0,0,0,0,0,14337
First line of the file contains labels.
2 example gnuplot scripts added to plot the data.
This patch introduces 'nfq_set_mark' which is new rules option. If a packet
matches a rule using nfq_set_mark in NFQ mode, it is marked with the mark/mask
specified in the option during the verdict.
It is thus possible to trigger different behaviour on the packet inside
Linux/Netfilter.
This patch modifies decode.c and decode.h to avoid the usage
by default of a bigger than 65535 bytes array in Packet structure.
The idea is that the packet are mainly under 1514 bytes size and
a bigger size must be supported but should not be the default.
If the packet length is bigger than DFLT_PACKET_SIZE then the
data are stored in a dynamically allocated part of the memory.
To ease the modification of the rest of the code, functions to
access and set the payload/length in a Packet have been introduced.
The default packet size can be set at runtime via the default-packet-size
configuration variable.