Fix FNs in case of too many prefilter engines. A transaction was tracking
which engines have run using a u64 bit array. The engine's 'local_id' was
used to set and check this bit. However the bit checking code didn't
handle the integer types correctly: a 32-bit value was left shifted to
build the u64 bit mask, giving an incorrect result.
This commit addresses that by fixing the int handling, but also by
changing how the engines are tracked.
To avoid wasting prefilter engine tracking bit space, track what
ran by the progress value the engines are registered at, instead of the
individual engine ids. While we can have many engines, the protocols use
far fewer unique progress values. So instead of tracking dozens of
prefilter ids, we track the handful of progress values.
To allow for this the engine array is sorted by tx_min_progress, then
app_proto and finally local_id. A new field is added to "know" when
the last relevant engine for a progress value is reached, so that we
can set the prefilter bit then.
A consequence is that the progress values now have a ceiling that
needs to fit in a 64 bit bitarray. The values used by parsers currently
do not exceed 5, so that seems to be ok.
Bug: #4685.
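As a minimal sketch (illustrative only, not the actual Suricata code) of the
two changes, the 64-bit shift fix and the progress-based tracking:
    #include <stdint.h>
    #include <stdbool.h>

    /* Illustrative stand-ins for the per-tx prefilter tracking. */
    typedef struct TxPrefilterEngine_ {
        uint8_t tx_min_progress;    /* progress this engine is registered at */
        bool is_last_for_progress;  /* new field: last engine for this value */
    } TxPrefilterEngine;

    static inline bool ProgressAlreadyPrefiltered(uint64_t flags, uint8_t progress)
    {
        /* the old code shifted a 32-bit 1, so bits >= 32 were silently lost;
         * shifting a 64-bit constant makes all 64 positions usable */
        return (flags & (1ULL << progress)) != 0;
    }

    static inline uint64_t MarkProgressPrefiltered(uint64_t flags,
                                                   const TxPrefilterEngine *e)
    {
        /* only set the bit once the last engine for this progress ran */
        if (e->is_last_for_progress)
            flags |= (1ULL << e->tx_min_progress);
        return flags;
    }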
Instead of the hardcoded L4 matching in MPM that was recently introduced,
add an API similar to the AppLayer MPM and inspect engines.
Share part of the registration code with the AppLayer.
Implement for the tcp.hdr and udp.hdr keywords.
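A rough illustration of the buffer shape for such a keyword; all names and
types below are simplified stand-ins, not the real Suricata API:
    #include <stdint.h>

    typedef struct InspectionBuffer_ {
        const uint8_t *inspect;
        uint32_t inspect_len;
    } InspectionBuffer;

    typedef struct Packet_ {
        const uint8_t *tcph;    /* start of the TCP header */
        uint16_t tcph_len;      /* header length including options */
    } Packet;

    /* A tcp.hdr-style buffer is simply the raw header bytes; the shared
     * registration code then hooks the same MPM/inspect machinery used by
     * the app-layer buffers onto this callback. */
    static void GetTcpHdrBuffer(InspectionBuffer *b, const Packet *p)
    {
        b->inspect = p->tcph;
        b->inspect_len = p->tcph_len;
    }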
If the global detect.prefilter.default setting is not "auto", it is
wasteful to run each prefilter setup routine. This patch tracks which
of the engines have been explicitly enabled in the rules and only
runs those.
When prefilter is not enabled globally, it is still possible to
enable it per signature. This was broken however, as the setup
code would never be called.
This commit always calls the setup code and lets that sort out
which signatures (if any) to enable prefiltering for.
Introduce InspectionBuffer, a structure for passing data between
prefilters, transforms and inspection engines.
At rule parsing time, we'll register new unique 'DetectBufferType's
for a 'parent' buffer (e.g. pure file_data) with its transformations.
Each unique combination of buffer with transformations gets its
own buffer id.
Similarly, mpm registration and inspect engine registration will be
copied from the 'parent' (again, e.g. pure file_data) to the new ids.
The transforms are called from within the prefilter engines themselves.
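A rough sketch of that data flow with simplified stand-in types; the trim
transform below is only an example:
    #include <stdint.h>

    /* Simplified InspectionBuffer: the data handed from prefilter to
     * transforms to the inspect engines. */
    typedef struct InspectionBuffer_ {
        const uint8_t *inspect;    /* what the engines actually look at */
        uint32_t inspect_len;
    } InspectionBuffer;

    typedef void (*TransformFunc)(InspectionBuffer *b);

    /* Example transform: trim leading/trailing spaces by adjusting the
     * pointer and length only, no copy needed for this transform. */
    static void TransformTrimSpaces(InspectionBuffer *b)
    {
        const uint8_t *d = b->inspect;
        uint32_t len = b->inspect_len;
        while (len > 0 && d[0] == ' ') { d++; len--; }
        while (len > 0 && d[len - 1] == ' ') { len--; }
        b->inspect = d;
        b->inspect_len = len;
    }

    /* Each unique (parent buffer, transform chain) combination gets its own
     * buffer id at parse time; at match time the prefilter engine fills the
     * parent data and then applies the chain before MPM runs. */
    static void ApplyTransforms(InspectionBuffer *b,
                                const TransformFunc *chain, int nchain)
    {
        for (int i = 0; i < nchain; i++)
            chain[i](b);
    }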
Provide generic MPM matching and setup callbacks. Can be used by
keywords to avoid needless code duplication. Supports transformations.
Use unique name for profiling, to distinguish between pure buffers
and buffers with transformation.
Add new registration calls for mpm/prefilters and inspect engines.
Inspect engine api v2: Pass engine to itself. Add generic engine that
uses GetData callback and other registered settings.
The generic engine should be usable for every 'simple' case where
there is just a single non-streaming buffer. For example HTTP uri.
The v2 API assumes that registered MPM implements transformations.
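A hedged sketch of the generic engine idea, with simplified stand-in types
and names rather than the real signatures:
    #include <stdint.h>
    #include <stdbool.h>
    #include <string.h>

    typedef struct InspectionBuffer_ {
        const uint8_t *inspect;
        uint32_t inspect_len;
    } InspectionBuffer;

    typedef struct Tx_ {
        const uint8_t *uri;    /* e.g. the normalized HTTP uri */
        uint32_t uri_len;
    } Tx;

    /* GetData: produce the single, non-streaming buffer for this tx, with
     * the registered transforms already applied. */
    typedef InspectionBuffer *(*GetDataFunc)(const Tx *tx, InspectionBuffer *scratch);

    /* The engine is passed to itself so it can use its own registered
     * settings instead of per-keyword glue code. */
    typedef struct InspectEngine_ {
        int list_id;           /* buffer list this engine inspects */
        int progress;          /* tx progress required before inspecting */
        GetDataFunc GetData;
    } InspectEngine;

    static InspectionBuffer *GetUriData(const Tx *tx, InspectionBuffer *scratch)
    {
        scratch->inspect = tx->uri;
        scratch->inspect_len = tx->uri_len;
        return scratch;
    }

    static bool InspectGeneric(const InspectEngine *e, const Tx *tx,
                               const uint8_t *pat, uint32_t pat_len)
    {
        InspectionBuffer scratch;
        const InspectionBuffer *b = e->GetData(tx, &scratch);
        if (b == NULL || b->inspect_len < pat_len)
            return false;
        /* stand-in for the real content inspection: prefix compare only */
        return memcmp(b->inspect, pat, pat_len) == 0;
    }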
Add util func to set new transform in rule and add util funcs for rule
parsing.
Use per tx detect_flags to track prefilter. Detect flags are used for 2
things:
1. marking tx as fully inspected
2. tracking already run prefilter (incl mpm) engines
This supersedes the MpmIDs API for directionless tracking
of the prefilter engines.
When we have no SGH we have to flag the txs that are 'complete'
as inspected as well.
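Illustrative layout only (the real bit assignments may differ): one u64 per
tx serves both purposes.
    #include <stdint.h>

    #define TX_DETECT_FLAG_INSPECTED    (1ULL << 63)  /* tx fully inspected */
    #define TX_DETECT_FLAG_PREFILTER(i) (1ULL << (i)) /* prefilter/mpm engine i ran */

    static inline int TxNeedsInspection(uint64_t detect_flags)
    {
        return (detect_flags & TX_DETECT_FLAG_INSPECTED) == 0;
    }

    static inline uint64_t TxMarkPrefilterRan(uint64_t detect_flags, uint8_t i)
    {
        return detect_flags | TX_DETECT_FLAG_PREFILTER(i);
    }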
Special handling for the stream engine:
If a rule mixes TX inspection and STREAM inspection, we can encounter
the case where the rule is evaluated against multiple transactions
during a single inspection run. As the stream data is exactly the same
for each of those runs, it's wasteful to rerun inspection of the stream
portion of the rule.
This patch enables caching of the stream 'inspect engine' result in
the local 'RuleMatchCandidateTx' array. This is valid only during the
lifetime of a single inspection run.
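A small sketch of the caching idea; field names are stand-ins for the real
RuleMatchCandidateTx members:
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct RuleMatchCandidateTx_ {
        uint32_t stream_stored : 1;  /* stream result already computed this run */
        uint32_t stream_result : 1;  /* cached match/no-match of the stream part */
    } RuleMatchCandidateTx;

    static bool RuleStreamPartMatches(RuleMatchCandidateTx *c,
                                      bool (*InspectStream)(void *ctx), void *ctx)
    {
        if (!c->stream_stored) {
            c->stream_result = InspectStream(ctx) ? 1 : 0;
            c->stream_stored = 1;    /* valid only for this inspection run */
        }
        return c->stream_result;     /* reused for every tx in this run */
    }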
Remove stateful inspection from 'mask' (SignatureMask). The mask wasn't
used in most cases for those rules anyway, as there we rely on the
prefilter. Add an alproto check to catch the remaining cases.
When building the active non-mpm/non-prefilter list check not just
the mask, but also the alproto. This especially helps stateful rules
with negated mpm.
Simplify AppLayerParserHasDecoderEvents usage in detection to only
return true if protocol detection events are set. Other detection is done
in inspect engines.
Move rule group lookup and handling into its own function. Handle
'post lookup' tasks immediately, instead of after the first detect
run. The tasks were independent of the initial detection.
Many cleanups and much refactoring.
Set flags by default:
-Wmissing-prototypes
-Wmissing-declarations
-Wstrict-prototypes
-Wwrite-strings
-Wcast-align
-Wbad-function-cast
-Wformat-security
-Wno-format-nonliteral
-Wmissing-format-attribute
-funsigned-char
Fix minor compiler warnings for these new flags on gcc and clang.
In various scenarios buffers would be checked by MPM more than
once. This was because the buffers would be inspected for a
certain progress value or higher.
For example, for each packet in a file upload, the engine would
not just rerun the 'http client body' MPM on the new data, it
would also rerun the method, uri, headers, cookie, etc MPMs.
This was obviously inefficient, so this patch changes the logic.
The patch only runs an MPM engine when the progress is exactly
the intended progress. If the progress is already beyond the desired
value, the engine is run just once. A tracker is added to the app layer
API, where the completed MPMs are recorded.
Implemented for HTTP, TLS and SSH.
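A sketch of the run-once rule, with illustrative names:
    #include <stdint.h>
    #include <stdbool.h>

    typedef struct MpmEngine_ {
        int progress;    /* progress value this buffer becomes available at */
        int id;          /* bit in the per-tx 'completed MPMs' tracker */
    } MpmEngine;

    static bool MpmShouldRun(const MpmEngine *e, int tx_progress, uint64_t done)
    {
        if (tx_progress == e->progress)
            return true;                          /* exactly at the target: run
                                                     (again on new data) */
        if (tx_progress > e->progress)
            return (done & (1ULL << e->id)) == 0; /* past it: run only once */
        return false;                             /* not there yet */
    }
    /* after running, the caller records completion in the tracker:
     * done |= (1ULL << e->id); */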
Remove the 'StreamMsg' approach from the engine. In this approach the
stream engine would create a list of chunks for inspection by the
detection engine. There were several issues:
1. the messages had a fixed size, so blocks of data bigger than ~4k
would be cut into multiple messages
2. it led to lots of data copying and unnecessary memory use
3. the StreamMsgs used a central pool
The Stream engine switched over to the streaming buffer API, which
means that the reassembled data is always available. This made the
StreamMsg approach even clunkier.
The new approach exposes the streaming buffer data to the detection
engine. It has to pay attention to an important issue though: packet
loss. The data may have gaps. The streaming buffer API tracks the
blocks of continuous data.
To access the data for inspection a callback approach is used. The
'StreamReassembleRaw' function is called with a callback and its data.
The callback then runs the MPM and individual rule inspection code. At
the end of each detection run the stream engine is notified that it
can move its 'progress' forward.
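A simplified sketch of the callback flow; types and names are stand-ins, not
the exact StreamReassembleRaw signature:
    #include <stdint.h>

    typedef int (*StreamRawCallback)(void *cb_data, const uint8_t *data,
                                     uint32_t data_len, uint64_t offset);

    /* Detection supplies the callback; it is invoked once per contiguous
     * block of reassembled data. Gaps from packet loss simply end one block
     * and start the next. */
    static int DetectRunOnRawBlock(void *cb_data, const uint8_t *data,
                                   uint32_t data_len, uint64_t offset)
    {
        (void)cb_data; (void)data; (void)data_len; (void)offset;
        /* run MPM and the per-rule stream inspection on this block */
        return 1; /* non-zero: keep iterating blocks */
    }

    typedef struct RawBlock_ {
        const uint8_t *data; uint32_t len; uint64_t offset;
    } RawBlock;

    /* Stand-in for the stream engine side: walk the continuous blocks known
     * to the streaming buffer and hand each to the callback. */
    static uint64_t StreamReassembleRawSketch(const RawBlock *blocks, int nblocks,
                                              StreamRawCallback Callback, void *cb_data)
    {
        uint64_t progress = 0;
        for (int i = 0; i < nblocks; i++) {
            if (Callback(cb_data, blocks[i].data, blocks[i].len,
                         blocks[i].offset) == 0)
                break;
            progress = blocks[i].offset + blocks[i].len;
        }
        return progress;  /* detection reports this back so the stream engine
                             can slide its raw 'progress' forward */
    }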
Instead of the linked list of engines, set up an array
of engines. This should provide better locality.
Also shrink the engine structure so that we can fit
2 on a cacheline.
Remove the FreeFunc from the runtime engines. Engines
now have a 'gid' (global id) that can be used to look
up the registered Free function.
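Illustrative only, field names are stand-ins; the point is the compact
runtime entry plus a gid-indexed registry for the parts it no longer carries:
    #include <stdint.h>

    typedef struct PrefilterEngine_ {
        uint16_t local_id;   /* position in the per-group array */
        uint16_t gid;        /* global id: index into the registry below */
        uint32_t flags;
        void (*Prefilter)(void *det_ctx, const void *pectx, void *p);
        void *pectx;         /* engine context */
    } PrefilterEngine;       /* 24 bytes on 64-bit: two fit in a cacheline */

    /* Registry keyed by gid holds what the runtime entries no longer store,
     * such as the Free function for the engine context. */
    typedef struct PrefilterEngineRegistry_ {
        void (*Free)(void *pectx);
    } PrefilterEngineRegistry;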