Multi buffer matching is implemented as a way for a rule to match
on multiple buffers within the same transaction.
Before this patch a rule like:
dns.query; content:"example"; dns.query; content:".com";
would be equivalent to:
dns.query; content:"example"; content:".com";
If a DNS query would request more than one name, e.g.:
DNS: [example.net][something.com]
Each name would be inspected for both patterns. If no single name
contained both, there would be no match. So the rule above would not match, as neither
example.net nor something.com satisfies both conditions at the same time.
This patch changes this behavior. Instead of the above, each time the
sticky buffer is specified, it creates a separate detection unit. Each
"multi buffer" sticky buffer will now be evaluated against
each "instance" of that sticky buffer.
To continue with the above example:
DNS: [example.net] <- matches 'dns.query; content:"example";'
DNS: [something.com] <- matches 'dns.query; content:".com"'
So this would now be a match.
To make sure both patterns match in a single query string, the expression
'dns.query; content:"example"; content:".com";' still works for this.
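
The difference between the two behaviors can be modelled in a few lines of Python. This is only a sketch of the semantics described above, not Suricata code: the function names are hypothetical, and `content` is reduced to a plain substring test.

```python
# Toy model of multi buffer matching. Names are illustrative, not Suricata APIs.

def unit_matches(instance, patterns):
    """A detection unit matches an instance if every pattern occurs in it."""
    return all(p in instance for p in patterns)


def match_single_unit(instances, patterns):
    """Old behavior: all patterns must be found in one and the same instance."""
    return any(unit_matches(inst, patterns) for inst in instances)


def match_per_unit(instances, units):
    """New behavior: every unit must match at least one instance.

    Different units may be satisfied by different instances.
    """
    return all(
        any(unit_matches(inst, unit) for inst in instances)
        for unit in units
    )


# The DNS request from the example above, carrying two query names.
queries = ["example.net", "something.com"]
```

With `queries` as above, `match_single_unit(queries, ["example", ".com"])` is false, while `match_per_unit(queries, [["example"], [".com"]])` is true. Passing a single unit holding both patterns to `match_per_unit` gives the stricter single-instance check.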
This patch doesn't yet enable the behavior for the keywords. That is
done in a follow-up patch.
To be able to implement this the internal storage of parsed rules
is changed. Until this patch an array of lists was used, where the
index was the buffer id (e.g. http_uri, dns_query). Therefore there
was only one list of matches per buffer id. As a side effect this
array was always very sparsely populated as many buffers could not
be mixed.
This patch changes the internal representation. The new array is densely
packed:
dns.query; content:"1"; dns.query; bsize:1; content:"2";
[type: dns_query][list: content:"1";]
[type: dns_query][list: bsize:1; content:"2";]
The new scheme allows for multiple instances of the same buffer.
These lists are then translated into multiple inspection engines
during the final setup of the rule.
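
As a rough illustration of the two layouts, the sketch below splits a rule's option string into one entry per sticky buffer occurrence and, for contrast, merges the entries per buffer type the way the old storage effectively did. It is a simplified model and not the parser itself: `STICKY_BUFFERS` is an assumed, partial set, the function names are hypothetical, and splitting on `;` ignores escaped semicolons inside `content`.

```python
# Simplified model of the rule storage. Not Suricata's actual parser.
STICKY_BUFFERS = {"dns.query", "http.uri"}  # assumed, partial set


def split_into_buffers(rule_options):
    """New layout: a dense list with one (type, keyword list) entry per
    occurrence of a sticky buffer keyword."""
    buffers = []
    for keyword in (k.strip() for k in rule_options.split(";")):
        if not keyword:
            continue
        if keyword in STICKY_BUFFERS:
            buffers.append((keyword, []))
        elif buffers:
            buffers[-1][1].append(keyword)
        else:
            # This sketch only handles keywords that follow a sticky buffer.
            raise ValueError("keyword before any sticky buffer: " + keyword)
    return buffers


def merge_by_type(buffers):
    """Old layout: a single list per buffer type, so repeated buffers merge."""
    merged = {}
    for buf_type, keywords in buffers:
        merged.setdefault(buf_type, []).extend(keywords)
    return merged


rule = 'dns.query; content:"1"; dns.query; bsize:1; content:"2";'
dense = split_into_buffers(rule)
```

Here `dense` holds two `dns.query` entries, matching the two lines shown above, while `merge_by_type(dense)` collapses them into one list of three keywords.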
Ticket: #5784.
Instead of tracking ip only rules by the internal signum, track them by
a separate counter that starts at zero. This results in dense
SigNumArrays instead of sparse ones and a much smaller max_idx.
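
A small sketch of why a separate counter helps. It assumes, for illustration only, that a SigNumArray is a bit array sized by the highest index it has to hold. The function names are hypothetical.

```python
def assign_iponly_ids(signatures):
    """Give each IP-only signature a dense id from a counter starting at zero.

    `signatures` is a list of (signum, is_iponly) pairs.
    """
    ids = {}
    next_id = 0
    for signum, is_iponly in signatures:
        if is_iponly:
            ids[signum] = next_id
            next_id += 1
    return ids


def bitmap_bytes(max_idx):
    """Bytes needed for one bit per index in 0..max_idx (assumed sizing)."""
    return max_idx // 8 + 1


sigs = [(0, False), (500, True), (9000, True)]
iponly_ids = assign_iponly_ids(sigs)
```

Indexing by signum, the two IP-only rules need a bit array covering index 9000. Indexing by the dense id, the highest index is 1 and a single byte is enough.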
Issue: 4578
Instead of a shared mpm context for just "file.data" or "file.magic"
use per alproto mpms. This way http file.data rules won't affect smb
file.data performance.
Ticket: #4378.
Inspect individual chunks in lossy traffic.
Don't use the frame idx as the inspection buffer idx. Engines are running
per frame, so multi inspection can be used for stream chunks instead.
Ticket: #4977.
Use the lzma-rs crate for decompressing swf/lzma files instead of
the lzma decompressor in libhtp. This decouples suricata from libhtp
except for actual http parsing, and means libhtp no longer has to
export a lzma decompression interface.
Ticket: #5638
Update APIs to store files in transactions instead of the per flow state.
Goal is to avoid the overhead of matching up files and transactions in
cases where there are many of both.
Update all protocol implementations to support this.
Update file logging logic to account for having files in transactions. Instead
of it acting separately on file containers, it is now tied into the
transaction logging.
Update the filestore keyword to count as a match if the filestore output is
not enabled.
Rules that look like they should be IP-only but contain a negated rule
address are now marked with a LIKE_IPONLY flag. This is so they are
treated like IPONLY rules with respect to flow action, but don't
interfere with other IPONLY processing like using the radix tree.
Ticket: #5361
A lot of time was spent in `SigMatchListSMBelongsTo` for the `mpm_sm`.
Optimize this by keeping the value at hand during Signature parsing and
detection engine setup.
Instead of storing a name and description as a pointer in DetectBufferType
store them in fixed size arrays. This is in preparation of runtime registration
of buffer types, where a constant name/desc is not available.
In preparation of more dynamic logic in rule loading also doing
some registration, allow for buffers to be registered as fast_patterns
during rule parsing.
Leaves the register time registrations mostly as-is, but copies the
resulting list into the DetectEngineCtx and works with that onwards.
This list can then be extended.
Instead of a map that is constantly realloc'd, use 2 hash tables for
DetectBufferType entries: one by name (+transforms), the other by
id. Use these everywhere.
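
The two-table idea can be sketched as follows. This is an illustrative model using Python dicts as the hash tables. The class and method names are hypothetical and do not reflect the DetectBufferType API.

```python
class BufferTypeRegistry:
    """Buffer type entries reachable through two hash tables:
    one keyed by (name, transforms), one keyed by id."""

    def __init__(self):
        self._by_name = {}
        self._by_id = {}

    def register(self, name, transforms=()):
        key = (name, tuple(transforms))
        existing = self._by_name.get(key)
        if existing is not None:
            return existing["id"]  # already registered, reuse its id
        entry = {"id": len(self._by_id), "name": name, "transforms": key[1]}
        # The same entry object is stored in both tables.
        self._by_name[key] = entry
        self._by_id[entry["id"]] = entry
        return entry["id"]

    def by_name(self, name, transforms=()):
        return self._by_name.get((name, tuple(transforms)))

    def by_id(self, buffer_id):
        return self._by_id.get(buffer_id)
```

Registering the same name with a different transform list yields a distinct entry, and both lookups are constant time without reallocating a shared map.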
Fix FNs in case of too many prefilter engines. A transaction was tracking
which engines have run using a u64 bit array. The engine's 'local_id' was
used to set and check this bit. However, the bit checking code didn't
handle int types correctly, so left shifting a u32 to produce a u64 bit
value gave an incorrect result.
This commit addresses that by fixing the int handling, but also by
changing how the engines are tracked.
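
The class of bug can be illustrated as below. The exact expression from the original code is not reproduced here. Python integers are unbounded, so the sketch emulates C integer widths with masks. Shifting a 32 bit value by 32 or more is undefined behavior in C, and the emulation assumes the high bits are simply lost, which is only one possible outcome.

```python
U32_MASK = 0xFFFFFFFF
U64_MASK = 0xFFFFFFFFFFFFFFFF


def bit_narrow(local_id):
    """Emulates a shift done in 32 bits: bits 32 and up are lost (assumed)."""
    return (1 << local_id) & U32_MASK


def bit_wide(local_id):
    """Emulates a shift done on a 64 bit operand."""
    return (1 << local_id) & U64_MASK


def has_run(tracker, local_id, bit=bit_wide):
    """Check whether the bit for `local_id` is set in the 64 bit tracker."""
    return (tracker & bit(local_id)) != 0


tracker = bit_wide(40)  # engine 40 has run
```

With the narrow shift, `has_run(tracker, 40, bit=bit_narrow)` reports that engine 40 never ran, while ids below 32 behave the same in both variants. That is why the problem only shows up with many prefilter engines.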
To avoid wasting prefilter engine tracking bit space, track what
ran by the progress value the engines are registered at, instead of by
individual engine id. While we can have many engines, the protocols use far
fewer unique progress values. So instead of tracking dozens of
prefilter ids, we track a handful of progress values.
To allow for this the engine array is sorted by tx_min_progress, then
app_proto and finally local_id. A new field is added to "know" when
the last relevant engine for a progress value is reached, so that we
can set the prefilter bit then.
A consequence is that the progress values now have a ceiling, as they
need to fit in a 64 bit bitarray. The values used by parsers currently
do not exceed 5, so that seems to be ok.
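
The sorting and marking scheme described above might look roughly like this. It is a sketch under assumptions, not the engine code: `is_last_for_progress` is a hypothetical name for the new field, and "last relevant engine" is interpreted here as the last engine for a given (progress, app_proto) pair.

```python
from dataclasses import dataclass

MAX_PROGRESS = 63  # progress values index bits in a 64 bit tracker


@dataclass
class Engine:
    tx_min_progress: int
    app_proto: int
    local_id: int
    is_last_for_progress: bool = False  # hypothetical name for the new field


def sort_and_mark(engines):
    """Sort by progress, app_proto, local_id and flag the last engine of
    each (progress, app_proto) group."""
    ordered = sorted(
        engines, key=lambda e: (e.tx_min_progress, e.app_proto, e.local_id)
    )
    for cur, nxt in zip(ordered, ordered[1:] + [None]):
        if cur.tx_min_progress > MAX_PROGRESS:
            raise ValueError("progress value does not fit in 64 bits")
        cur.is_last_for_progress = nxt is None or (
            (nxt.tx_min_progress, nxt.app_proto)
            != (cur.tx_min_progress, cur.app_proto)
        )
    return ordered


def run_prefilters(ordered, app_proto, tx_progress, done_bits):
    """Run engines not yet covered by `done_bits`; return (ids run, new bits)."""
    ran = []
    for e in ordered:
        if e.tx_min_progress > tx_progress:
            break  # sorted by progress, so nothing further applies yet
        if e.app_proto != app_proto:
            continue
        if done_bits & (1 << e.tx_min_progress):
            continue  # every engine for this progress value already ran
        ran.append(e.local_id)
        if e.is_last_for_progress:
            done_bits |= 1 << e.tx_min_progress
    return ran, done_bits
```

The tracker now needs one bit per progress value rather than one per engine, so the number of engines no longer limits what can be tracked.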
Bug: #4685.
Unify handling of signature matches between various rule types and
between noalert and regular rules.
"noalert" sigs are added to the alert queue initially, but removed
from it after handling their actions. This way all actions are applied
from a single place.
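
A minimal sketch of the queue handling described above, assuming each match is a record with a signature id, an action and a noalert flag. The function and field names are hypothetical.

```python
def process_matches(matches):
    """Queue every match, apply actions in one place, then drop the
    noalert entries so they produce no alert output."""
    queue = list(matches)  # noalert sigs are queued like any other

    # Single place where actions are applied, for all rule types.
    applied_actions = [(m["sid"], m["action"]) for m in queue]

    # After their actions are handled, noalert sigs leave the queue.
    queue = [m for m in queue if not m["noalert"]]
    alert_sids = [m["sid"] for m in queue]
    return applied_actions, alert_sids


matches = [
    {"sid": 1, "action": "alert", "noalert": False},
    {"sid": 2, "action": "drop", "noalert": True},
]
```

For `matches` as above, both actions are applied but only sid 1 remains in the alert queue.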
Make sure flow drop and pass are mutually exclusive.
The above addresses issues with pass and drop actions not getting applied
correctly in various cases.
Bug: #4663
Bug: #4670
Dump all patterns to `patterns.json`, with the pattern, a total count (`cnt`),
count of how many times this pattern is the mpm (`mpm`) and some of the flags.
Patterns are listed per buffer, e.g. payload, http_uri.
If a parser exceeds 1024 buffers we stop processing them and
set a detect event instead. This guards against parser bugs as well as
crafted bad traffic leading to resource starvation due to excessive
loops.
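
The cap can be sketched as a bounded loop. The limit of 1024 comes from the text above. The event name and the function signature are hypothetical.

```python
MAX_BUFFERS = 1024  # limit stated above


def inspect_buffers(buffers, inspect):
    """Inspect at most MAX_BUFFERS buffers from `buffers`.

    Returns (number inspected, list of events). An event is only set when
    the parser offers more than MAX_BUFFERS buffers.
    """
    events = []
    inspected = 0
    for idx, buf in enumerate(buffers):
        if idx >= MAX_BUFFERS:
            events.append("too_many_buffers")  # hypothetical event name
            break
        inspect(buf)
        inspected += 1
    return inspected, events
```

Because the loop stops at the cap, even a parser that keeps yielding buffers forever cannot keep the detection loop busy indefinitely.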