Goals:
- reduce locking
- take advantage of 'hot' caches
- better locality
Locking reduction
New flow spare pool. The global pool is implmented as a list of blocks,
where each block has a 100 spare flows. Worker threads fetch a block at
a time, storing the block in the local thread storage.
Flow Recycler now returns flows to the pool is blocks as well.
Flow Recycler fetches all flows to be processed in one step instead of
one at a time.
Cache 'hot'ness
Worker threads now check the timeout of flows they evaluate during lookup.
The worker will have to read the flow into cache anyway, so the added
overhead of checking the timeout value is minimal. When a flow is considered
timed out, one of 2 things happens:
- if the flow is 'owned' by the thread it is handled locally. Handling means
checking if the flow needs 'timeout' work.
- otherwise, the flow is added to a special 'evicted' list in the flow
bucket where it will be picked up by the flow manager.
Flow Manager timing
By default the flow manager now tries to do passes of the flow hash in
smaller steps, where the goal is to do full pass in 8 x the lowest timeout
value it has to enforce. So if the lowest timeout value is 30s, a full pass
will take 4 minutes. The goal here is to reduce locking overhead and not
get in the way of the workers.
In emergency mode each pass is full, and lower timeouts are used.
Timing of the flow manager is also no longer relying on pthread condition
variables, as these generally cause waking up much quicker than the desired
timout. Instead a simple (u)sleep loop is used.
Both changes reduce the number of hash passes a lot.
Emergency behavior
In emergency mode there a number of changes to the workers. In this scenario
the flow memcap is fully used up and it is unavoidable that some flows won't
be tracked.
1. flow spare pool fetches are reduced to once a second. This avoids locking
overhead, while the chance of success was very low.
2. getting an active flow directly from the hash skips flows that had very
recent activity to avoid the scenario where all flows get only into the
NEW state before getting reused. Rather allow some to have a chance of
completing.
3. TCP packets that are not SYN packets will not get a used flow, unless
stream.midstream is enabled. The goal here is again to avoid evicting
active flows unnecessarily.
Better Localily
Flow Manager injects flows into the worker threads now, instead of one or
two packets. Advantage of this is that the worker threads can get packets
from their local packet pools, avoiding constant overhead of packets returning
to 'foreign' pools.
Counters
A lot of flow counters have been added and some have been renamed.
Overall the worker threads increment 'flow.wrk.*' counters, while the flow
manager increments 'flow.mgr.*'.
Additionally, none of the counters are snapshots anymore, they all increment
over time. The flow.memuse and flow.spare counters are exceptions.
Misc
FlowQueue has been split into a FlowQueuePrivate (unlocked) and FlowQueue.
Flow no longer has 'prev' pointers and used a unified 'next' pointer for
both hash and queue use.
This commit modifies the JSON loggers with changes necessary to support
multi-threaded EVE output.
Each "thread-init" function sets up the per-thread log file context for
subsequent calls to the JSON output to buffer function.
JB_SET_TRUE(jb, key), and JB_SET_STRING(string, key, val) are C macros
around jb_set_formatted to set static string and true values as a
(micro) optimization.
Convert alert Eve logging JsonBuilder. Currently
makes heavy use of JsonBuilder being able to log Jansson's json_t
which is a temporary measure until all protocols loggers can be
converted to JsonBuilder.
New functions that replace Jansson versions with JsonBuilder
variations use "Eve" instead of "JSON".
Currently the flow logger also logs app_proto information,
but not to the flow object, but instead to the root object
of the log record.
Refactor into 2 separate methods, one for the app_proto
and one for the flow, to make this more clear, as well
as make it easier to refactor for JsonBuilder as JsonBuilder
can only write to the currently open object.
This define is used to remove reference to capture bypass in case
no capture method implementing this is active.
This patch also introduces CAPTURE_OFFLOAD_MANAGER that is defined
if we need the flow bypass manager code.
This patch introduces and uses a new bypass strategy
based on a callback. EBPF bypass implementation is
updated to use this new strategy.
Once the flow manager detect that a flow should be timeouted,
it asks the capture method if it has seen packets in the interval.
If it is the case the lastts of the flow is updated and the timeout
is postponed.
There is a synchronization issue occuring when a flow is
added to the eBPF bypass maps. The flow can have packets
in the ring buffer that have already passed the eBPF stage.
By consequences, they are not accounted in the eBPF counter
but are accounted by Suricata flow engine.
This was causing counters to be completely wrong. This code
fixes the issue by avoiding the counter change in invalid
case.
To avoid adding 4 64bits integers to the Flow structure for the
bypass accounting, we use instead a FlowStorage. This limits the
memory usage to the size of a pointer.
Set event at most once per flow, for the first 'wrong' packet.
Add 'tcp.pkt_on_wrong_thread' counter. This is incremented for each
'wrong' packet. Note that the first packet for a flow determines
what thread is 'correct'.
The new OutputInitResult is a struct return type that allows
logger init functions to return a NULL context without
raising error.
Instead of returning NULL to signal error, the "ok" field will
be set to false. If ok, but the ctx is NULL, then silently
move on to the next logger.
Use case: multiple versions of a specific logger, and one
implementation decides the configuration is not for that
implemenation. It can return NULL, ok.
This patch adds a partial flow entry in the alert event
(if applayer or flow is selected) or simply app_proto if
it is not.
app_proto is useful as filter and aggregation field. And
the partial flow entry contains more information about the
proto as well as some volumetry info.
Set flags by default:
-Wmissing-prototypes
-Wmissing-declarations
-Wstrict-prototypes
-Wwrite-strings
-Wcast-align
-Wbad-function-cast
-Wformat-security
-Wno-format-nonliteral
-Wmissing-format-attribute
-funsigned-char
Fix minor compiler warnings for these new flags on gcc and clang.
A warning log is already emitted if eve-log is enabled in the
configuration but json support is not built so the logger
registration functions can be silent.
In the case of a bypassed flow we add a 'bypass' key that can
be 'local' or 'capture'. This will allow the user to know if
capture bypass method is failing by looking at the 'bypass' key.
As the logging modules are no longer threading modules, rename
them so they don't look like they are being registered as
threading modules.
Also, move the registration to the output.c which will handle
registration of the loggers.