The field ext_pkt was cleaned before calling the release function.
The result was that IPS mode such as the one of AF_PACKET were not
working anymore because they were not able to send the data which
were initially pointed by ext_pkt.
This patch moves the ext_pkt cleaning to the cleaning macro. This
ensures that the cleaning is done for allocated and pool packets.
Most flows are marked for clean up by the flow manager, which then
passes them to the recycler. The recycler logs and cleans up. However,
under resource stress conditions, the packet threads can recycle
existing flow directly. So here the recycler has no role to play, as
the flow is immediately used.
For this reason, the packet threads need to be able to invoke the
flow logger directly.
The flow logging thread ctx will stored in the DecodeThreadVars
stucture. Therefore, this patch makes the DecodeThreadVars an argument
to FlowHandlePacket.
Using a stack for free Packet storage causes recently freed Packets to be
reused quickly, while there is more likelihood of the data still being in
cache.
The new structure has a per-thread private stack for allocating Packets
which does not need any locking. Since Packets can be freed by any thread,
there is a second stack (return stack) for freeing packets by other threads.
The return stack is protected by a mutex. Packets are moved from the return
stack to the private stack when the private stack is empty.
Returning packets back to their "home" stack keeps the stacks from getting out
of balance.
The PacketPoolInit() function is now called by each thread that will be
allocating packets. Each thread allocates max_pending_packets, which is a
change from before, where that was the total number of packets across all
threads.
For packets that were freed, not recycled, profiling memory wasn't
freed:
==15745== 13,312 bytes in 8 blocks are definitely lost in loss record 611 of 615
==15745== at 0x4C2C494: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15745== by 0xA190D5: SCProfilePacketStart (util-profiling.c:963)
==15745== by 0x4E4345: PacketGetFromAlloc (decode.c:134)
==15745== by 0x83FE75: FlowForceReassemblyPseudoPacketGet (flow-timeout.c:276)
==15745== by 0x8413BF: FlowForceReassemblyForHash (flow-timeout.c:588)
==15745== by 0x841897: FlowForceReassembly (flow-timeout.c:716)
==15745== by 0x9540F6: main (suricata.c:2296)
==15745==
==15745== 14,976 bytes in 9 blocks are definitely lost in loss record 612 of 615
==15745== at 0x4C2C494: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15745== by 0xA190D5: SCProfilePacketStart (util-profiling.c:963)
==15745== by 0x4E4345: PacketGetFromAlloc (decode.c:134)
==15745== by 0x83FE75: FlowForceReassemblyPseudoPacketGet (flow-timeout.c:276)
==15745== by 0x841508: FlowForceReassemblyForHash (flow-timeout.c:620)
==15745== by 0x841897: FlowForceReassembly (flow-timeout.c:716)
==15745== by 0x9540F6: main (suricata.c:2296)
This patch addresses that.
This patch introduces a new counter "decoder.vlan_qinq". It counts
packets that have more than two stacked vlan layers.
Packets with 2 vlan layers will both increment "decoder.vlan" and
"decoder.vlan_qinq".
Instead of a large (6k+) structure in the Packet, make the profiling
storage dynamic. To do this the Packet->profile is now a pointer.
Initial support for selective sampling, e.g. only profile every
1000th packet.
Move app layer event handling into app-layer-event.[ch].
Convert 'Set' macro's to functions.
Get rid of duplication in Set and SetRaw. Set now calls SetRaw.
Fix potentential int overflow condition in the event storage.
Update callers.
To be able to register counters from AppLayerGetCtxThread, the
ThreadVars pointer needs to be available in it and thus in it's
callers:
- AppLayerGetCtxThread
- DecodeThreadVarsAlloc
- StreamTcpReassembleInitThreadCtx
For AppLayerThreadCtx, AppLayerParserState, AppLayerParserThreadCtx
and AppLayerProtoDetectThreadCtx, use opaque pointers instead of
void pointers.
AppLayerParserState is declared in flow.h as it's part of the Flow
structure.
AppLayerThreadCtx is declared in decode.h, as it's part of the
DecodeThreadVars structure.
app-layer.[ch], app-layer-detect-proto.[ch] and app-layer-parser.[ch].
Things addressed in this commit:
- Brings out a proper separation between protocol detection phase and the
parser phase.
- The dns app layer now is registered such that we don't use "dnstcp" and
"dnsudp" in the rules. A user who previously wrote a rule like this -
"alert dnstcp....." or
"alert dnsudp....."
would now have to use,
alert dns (ipproto:tcp;) or
alert udp (app-layer-protocol:dns;) or
alert ip (ipproto:udp; app-layer-protocol:dns;)
The same rules extend to other another such protocol, dcerpc.
- The app layer parser api now takes in the ipproto while registering
callbacks.
- The app inspection/detection engine also takes an ipproto.
- All app layer parser functions now take direction as STREAM_TOSERVER or
STREAM_TOCLIENT, as opposed to 0 or 1, which was taken by some of the
functions.
- FlowInitialize() and FlowRecycle() now resets proto to 0. This is
needed by unittests, which would try to clean the flow, and that would
call the api, AppLayerParserCleanupParserState(), which would try to
clean the app state, but the app layer now needs an ipproto to figure
out which api to internally call to clean the state, and if the ipproto
is 0, it would return without trying to clean the state.
- A lot of unittests are now updated where if they are using a flow and
they need to use the app layer, we would set a flow ipproto.
- The "app-layer" section in the yaml conf has also been updated as well.
The uint8_t *pkt in the Packet structure always points to the memory
immediately following the Packet structure. It is better to simply
calculate that value every time than store the 8 byte pointer.
If we have multiple layer of tunnel, the decoding of initial
Packet will recurse in DecodeTunnel function called in
PacketTunnelPktSetup. If we are not setting the pseudo
packet root before calling DecodeTunnel (as done in previous
code), then the tunnel root will no be correct for the lower
layer packets. This result in an counter problem and a suricata
failure after some time.
This patch adds and increments a invalid packet counter. It
does this by introducing PacketDecodeFinalize function
This function is incrementing the invalid counter and is also
signalling the packet to CUDA.
This patch replaces PacketPseudoPktSetup by a better named
PacketTunnelPktSetup function which is also in charge of doing
the decoding of the tunneled packet.
This allow to clean the code. But it also fixes an issue.
Previously, if the DecodeTunnel function was failling (cause of
an invalid packet mainly), the result was that the original packet
to be considered as a tunnel packet (and not inspected by payload
detection).
In some cases, the decoding is not possible and some really invalid
packet can be created. This is in particular the case of tunnel. In
that case, it is more interesting to forget about the tunneled
packet and only consider the original packet.
DecodeTunnel function is maked as warn_unused_result because it is
meaningful for the decoder to know if the underlying data were not
correct. And in this case, only focus detection on the content.
Share memory space for IPV4Vars and (IPV6Vars, IPV6ExtHdrs), since a
packet can only be either IPv4 or IPv6, but not both.
Share memory for TCPVars, UDPVars, ICMPV4Vars and ICMPV6Vars, since a
packet can only be only of these.
Then move other structure members around to remove holes reported by pahole.
This reduces the size of the Packet structure from 2944 bytes (46 cachelines)
down to 1976 (31 cachelines), a 33% reduction.
Keep a separate checksum for IPV4, since a packet can have both an IPV4
checksum and a TCPV4 checksum, or IPV4 and UDPV4 checksum.
This will allow future sharing of more values.
Use PACKET_RESET_CHECKSUMS() in Unit Tests in place of setting the
individual checksum values.
When generating an alert and storing it in the packet, store the tx_id
as well. This way the output modules can log the tx_id and access the
proper tx for logging.
Issue #904.
The TILE-Gx processor includes a packet processing engine, called
mPIPE, that can deliver packets directly into user space memory. It
handles buffer allocation and load balancing (either static 5-tuple
hashing, or dynamic flow affinity hashing are used here). The new
packet source code is in source-mpipe.c and source-mpipe.h
A new Tile runmode is added that configures the Suricata pipelines in
worker mode, where each thread does the entire packet processing
pipeline. It scales across all the Gx chips sizes of 9, 16, 36 or 72
cores. The new runmode is in runmode-tile.c and runmode-tile.h
The configure script detects the TILE-Gx architecture and defines
HAVE_MPIPE, which is then used to conditionally enable the code to
support mPIPE packet processing. Suricata runs on TILE-Gx even without
mPIPE support enabled.
The Suricata Packet structures are allocated by the mPIPE hardware by
allocating the Suricata Packet structure immediatley before the mPIPE
packet buffer and then pushing the mPIPE packet buffer pointer onto
the mPIPE buffer stack. This way, mPIPE writes the packet data into
the buffer, returns the mPIPE packet buffer pointer, which is then
converted into a Suricata Packet pointer for processing inside
Suricata. When the Packet is freed, the buffer is returned to mPIPE's
buffer stack, by setting ReleasePacket to an mPIPE release specific
function.
The code checks for the largest Huge page available in Linux when
Suricata is started. TILE-Gx supports Huge pages sizes of 16MB, 64MB,
256MB, 1GB and 4GB. Suricata then divides one of those page into
packet buffers for mPIPE.
The code is not yet optimized for high performance. Performance
improvements will follow shortly.
The code was originally written by Tom Decanio and then further
modified by Tilera.
This code has been tested with Tilera's Multicore Developement
Environment (MDE) version 4.1.5. The TILEncore-Gx36 (PCIe card) and
TILEmpower-Gx (1U Rack mount).
In some cases using the vlan id(s) in flow hashing is problematic. Cases
of broken routers have been reported. So this option allows for disabling
the use of vlan id(s) while calculating the flow hash, and in the future
other hashes.
Vlan tracking for flow is enabled by default.
This commit allows handling Packets allocated by different methods.
The ReleaseData function pointer in the Packet structure is replaced
with ReleasePacket function pointer, which is then always called to
release the memory associated with a Packet.
Currently, the only usage of ReleaseData is in AF Packet. Previously
ReleaseData was only called when it was not NULL. To implement the
same functionality as before in AF Packet, a new function is defined
in AF Packet to first call the AFP specific ReleaseData function and
then releases the Packet structure.
Three new general functions are defined for releasing packets in the
default case:
1) PacketFree() - To release a packet alloced with SCMalloc()
2) PacketPoolReturnPacket() - For packets allocated from the Packet Pool.
Calls RECYCLE_PACKET(p)
3) PacketFreeOrRelease() - Calls PacketFree() or PacketPoolReturnPacket()
based on the PKT_ALLOC flag.
Having these functions removes the need to check the PKT_ALLOC flag
when releasing a packet in most cases, since the ReleasePacket
function encodes how the Packet was allocated. The PKT_ALLOC flag is
still set and is needed when AF Packet releases a packet, since it
replaces the ReleasePacket function pointer with its own function and
then calls PacketFreeOfRelease(), which uses the PKT_ALLOC flag.
The action field in Packet structure should not be accessed
directly as the tunneled packet needs to update the root packet
and not the initial packet.
This patch is fixing issue #819 where suricata was not able to
drop fragmented packets in AF_PACKET IPS mode. It also fixes
drop capability for tunneled packets.