Using a stack for free Packet storage causes recently freed Packets to be
reused quickly, while their data is still likely to be in the cache.
The new structure has a per-thread private stack for allocating Packets,
which needs no locking. Since Packets can be freed by any thread, there is
a second stack (the return stack) onto which other threads free Packets.
The return stack is protected by a mutex. Packets are moved from the return
stack to the private stack when the private stack is empty.
Returning Packets to their "home" stack keeps the stacks from getting out
of balance.
The PacketPoolInit() function is now called by each thread that will be
allocating packets. Each thread now allocates max_pending_packets packets,
a change from before, when that value was the total number of packets
across all threads.
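In outline, the pool works roughly like the following. This is a minimal,
self-contained sketch with hypothetical type, field, and function names;
the real PacketPoolInit()/packet pool code in Suricata differs in detail.
#include <pthread.h>
#include <stddef.h>
/* Hypothetical, simplified Packet for illustration only. */
typedef struct Packet_ {
    struct Packet_ *next;
    struct PktPool_ *pool;     /* set at creation: the "home" thread's pool */
    /* ... packet data ... */
} Packet;
typedef struct PktPool_ {
    Packet *head;              /* private stack: owner thread only, no lock */
    pthread_mutex_t return_mutex;
    Packet *return_head;       /* Packets freed back by other threads */
} PktPool;
static __thread PktPool thread_pool;
static void PoolInitThread(void)
{
    /* called by each thread before it allocates packets */
    thread_pool.head = NULL;
    thread_pool.return_head = NULL;
    pthread_mutex_init(&thread_pool.return_mutex, NULL);
}
static Packet *PoolGet(void)
{
    PktPool *pool = &thread_pool;
    if (pool->head == NULL) {
        /* Private stack empty: take the whole return stack under the lock. */
        pthread_mutex_lock(&pool->return_mutex);
        pool->head = pool->return_head;
        pool->return_head = NULL;
        pthread_mutex_unlock(&pool->return_mutex);
    }
    Packet *p = pool->head;
    if (p != NULL)
        pool->head = p->next;
    return p;
}
static void PoolReturn(Packet *p)
{
    PktPool *pool = p->pool;
    if (pool == &thread_pool) {          /* freed by the owning thread */
        p->next = pool->head;
        pool->head = p;
    } else {                             /* freed by another thread */
        pthread_mutex_lock(&pool->return_mutex);
        p->next = pool->return_head;
        pool->return_head = p;
        pthread_mutex_unlock(&pool->return_mutex);
    }
}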
It is possible to have a non-contiguous CPU set, which was not being
handled correctly on the TILE architecture. A "rank" field was added to
the ThreadVars to store the worker's rank separately from its CPU number
for this case.
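For illustration, a dense rank can be derived from a non-contiguous CPU set
like this (hypothetical helper, not the actual patch code):
#define _GNU_SOURCE
#include <sched.h>
/* Derive a dense worker rank from a non-contiguous CPU set,
 * e.g. CPUs 1, 3, 4, 7 map to ranks 0, 1, 2, 3. */
static int CpuToRank(cpu_set_t *set, int cpu)
{
    int rank = 0;
    for (int c = 0; c < cpu; c++) {
        if (CPU_ISSET(c, set))
            rank++;
    }
    return rank;   /* stored separately from the cpu number */
}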
Flow-timeout code injects pseudo packets into the decoders, leading
to various issues. For a full explanation, see:
https://redmine.openinfosecfoundation.org/issues/1107
This patch works around the issues with a hack. It adds a check to
each of the decoder entry points to bail out as soon as a pseudo
packet from the flow timeout is encountered.
Ticket #1107.
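The check at each decoder entry point looks roughly like this. The decoder
name below is a placeholder and the exact signature and pseudo-packet macro
are assumptions, not necessarily what the patch uses.
static int DecodeSomething(ThreadVars *tv, DecodeThreadVars *dtv, Packet *p,
                           const uint8_t *pkt, uint32_t len, PacketQueue *pq)
{
    /* Hack for ticket #1107: bail out on pseudo packets injected by the
     * flow-timeout code before doing anything else. */
    if (unlikely(PKT_IS_PSEUDOPKT(p)))
        return TM_ECODE_OK;
    /* ... normal decoding continues here ... */
    return TM_ECODE_OK;
}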
Add two new mPIPE load-balancing configuration options in suricata.yaml.
1) "sticky" which keep sending flows to one CPU, but if that queue is full,
don't drop the packet, move the flow to the least loaded queue.
2) Round-robin, which always picks the least full input queue for each
packet.
Allow configuring the number of packets in the input queue (iqueue) in
suricata.yaml.
For the mpipe.buckets configuration, which must be a power of 2, round
up to the next power of two rather than reporting an error.
Added mpipe.min-buckets, which defaults to 256, so if the requested number
of buckets can't be allocated, Suricata will keep dividing by 2 until either
it succeeds in allocating buckets, or reaches the minimum number of buckets
and fails.
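The sizing logic amounts to something like the following sketch, with
illustrative helper names; TryAllocateBuckets() stands in for the real
mPIPE bucket allocation.
#include <stdbool.h>
#include <stdint.h>
/* Hypothetical stand-in for the real mPIPE bucket allocation. */
static bool TryAllocateBuckets(uint32_t num_buckets);
/* mpipe.buckets must be a power of 2: round up instead of erroring out. */
static uint32_t RoundUpPow2(uint32_t n)
{
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}
/* Keep halving on allocation failure until mpipe.min-buckets is reached. */
static int AllocBuckets(uint32_t requested, uint32_t min_buckets)
{
    uint32_t num_buckets = RoundUpPow2(requested);
    while (num_buckets >= min_buckets) {
        if (TryAllocateBuckets(num_buckets))
            return (int)num_buckets;
        num_buckets /= 2;
    }
    return -1;   /* even min_buckets (default 256) could not be allocated */
}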
This patch adds and increments an invalid packet counter. It does this by
introducing the PacketDecodeFinalize() function, which increments the
invalid counter and also signals the packet to CUDA.
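In outline it looks like this; the flag name, counter helper, and CUDA
helper below are assumptions, not the actual Suricata identifiers.
/* Hypothetical stand-ins for the counter and CUDA signalling code. */
static void IncrInvalidCounter(ThreadVars *tv, DecodeThreadVars *dtv);
static void SignalPacketToCuda(Packet *p);
static inline void PacketDecodeFinalize(ThreadVars *tv, DecodeThreadVars *dtv,
                                        Packet *p)
{
    if (p->flags & PKT_IS_INVALID) {
        IncrInvalidCounter(tv, dtv);     /* count the invalid packet */
    }
#ifdef __SC_CUDA_SUPPORT__
    SignalPacketToCuda(p);               /* hand the packet to the CUDA code */
#endif
}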
Move the allocation of the mPIPE ingress queue from a loop over the number
of workers in the main init function to inside each worker thread. This
allows the memory to be allocated locally on the worker's CPU without
needing to know ahead of time where that thread will be running, and fixes
one case of static mapping of workers to CPUs.
Use __thread storage to hold the queue rather than a global table of queues.
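Roughly as follows; this is a sketch where gxio_mpipe_iqueue_t comes from
the Tilera MDE headers, and plain malloc() is only a stand-in for the real
local-memory setup and queue initialization.
#include <stdlib.h>
#include <gxio/mpipe.h>
/* The ingress queue now lives in thread-local storage instead of a
 * global table indexed by worker number. */
static __thread gxio_mpipe_iqueue_t *thread_iqueue;
static void *WorkerThreadLoop(void *arg)
{
    /* Allocating here, inside the worker, places the memory on the CPU
     * this thread actually runs on.  The real code uses mPIPE-suitable
     * memory and the gxio iqueue init call; malloc() is a stand-in. */
    thread_iqueue = malloc(sizeof(*thread_iqueue));
    if (thread_iqueue == NULL)
        return NULL;
    /* ... initialize the queue and enter the packet receive loop ... */
    return NULL;
}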
When installing the rules that tell mPIPE to send packets to Suricata,
give them a higher priority than the default used by Linux. This way, if
Linux also tells mPIPE to send it packets, Suricata will get them
instead, as long as Suricata is running.
This patch is a result of applying the following coccinelle
transformation to suricata sources:
@istested@
identifier x;
statement S1;
identifier func =~ "(SCMalloc|SCStrdup|SCCalloc|SCMallocAligned|SCRealloc)";
@@
x = func(...)
... when != x
- if (x == NULL) S1
+ if (unlikely(x == NULL)) S1
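Applied to a typical allocation site, the change looks like this
(illustrative example, not a specific hunk from the patch):
/* before */
buf = SCMalloc(size);
if (buf == NULL)
    return -1;
/* after */
buf = SCMalloc(size);
if (unlikely(buf == NULL))
    return -1;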
The TILE-Gx processor includes a packet processing engine, called
mPIPE, that can deliver packets directly into user space memory. It
handles buffer allocation and load balancing (either static 5-tuple
hashing or dynamic flow-affinity hashing is used here). The new
packet source code is in source-mpipe.c and source-mpipe.h.
A new Tile runmode is added that configures the Suricata pipelines in
worker mode, where each thread runs the entire packet processing
pipeline. It scales across all Gx chip sizes of 9, 16, 36, or 72
cores. The new runmode is in runmode-tile.c and runmode-tile.h.
The configure script detects the TILE-Gx architecture and defines
HAVE_MPIPE, which is then used to conditionally enable the code to
support mPIPE packet processing. Suricata runs on TILE-Gx even without
mPIPE support enabled.
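The guard is the usual conditional-compilation pattern (sketch; how the
non-mPIPE build is stubbed out is an assumption here):
/* source-mpipe.c is effectively wrapped like this: */
#ifdef HAVE_MPIPE
/* ... mPIPE packet source implementation ... */
#else
/* ... nothing (or a stub) when building without mPIPE ... */
#endif /* HAVE_MPIPE */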
Suricata Packet structures are allocated for the mPIPE hardware by placing
the Suricata Packet structure immediately before the mPIPE packet buffer
and then pushing the mPIPE packet buffer pointer onto the mPIPE buffer
stack. This way, mPIPE writes the packet data into the buffer and returns
the mPIPE packet buffer pointer, which is then converted into a Suricata
Packet pointer for processing inside Suricata. When the Packet is freed,
the buffer is returned to mPIPE's buffer stack by setting ReleasePacket to
an mPIPE-specific release function.
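The layout and pointer conversion are roughly as follows; the function
names are illustrative and any alignment padding used in the real code is
ignored here.
#include <stdint.h>
/* Layout of each mPIPE buffer:
 *
 *   [ Suricata Packet ][ mPIPE packet data buffer ]
 *   ^ Packet *          ^ pointer pushed onto the mPIPE buffer stack
 */
static inline Packet *MpipeBufferToPacket(void *buf)
{
    return (Packet *)((uint8_t *)buf - sizeof(Packet));
}
/* Hypothetical stand-in for returning a buffer to mPIPE's buffer stack
 * (the real code uses the gxio mPIPE API). */
static void MpipeReleaseBuffer(void *buf);
/* Set as the Packet's ReleasePacket callback so a freed Packet's buffer
 * goes back to mPIPE. */
static void MpipeReleasePacket(Packet *p)
{
    void *buf = (uint8_t *)p + sizeof(Packet);
    MpipeReleaseBuffer(buf);
}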
The code checks for the largest huge page size available in Linux when
Suricata is started. TILE-Gx supports huge page sizes of 16MB, 64MB,
256MB, 1GB and 4GB. Suricata then divides one of those pages into
packet buffers for mPIPE.
The code is not yet optimized for high performance. Performance
improvements will follow shortly.
The code was originally written by Tom Decanio and then further
modified by Tilera.
This code has been tested with Tilera's Multicore Development
Environment (MDE) version 4.1.5 on the TILEncore-Gx36 (PCIe card) and
the TILEmpower-Gx (1U rack mount).