Rearrange code slightly to make it more clear that `found` cannot
be NULL further down the loop.
cppcheck:
src/detect-engine-content-inspection.c:316:50: warning: Either the condition 'found!=NULL' is redundant or there is overflow in pointer subtraction. [nullPointerArithmeticRedundantCheck]
match_offset = (uint32_t)((found - buffer) + cd->content_len);
^
src/detect-engine-content-inspection.c:308:30: note: Assuming that condition 'found!=NULL' is not redundant
} else if (found != NULL && (cd->flags & DETECT_CONTENT_NEGATED)) {
^
src/detect-engine-content-inspection.c:316:50: note: Null pointer subtraction
match_offset = (uint32_t)((found - buffer) + cd->content_len);
^
Bug: #5291.
Datasets are sets/lists of data that can be accessed or added from
the rule language.
This patch implements 3 data types:
1. string (or buffer)
2. md5
3. sha256
The patch also implements 2 new rule keywords:
1. dataset
2. datarep
The dataset keyword allows matching against a list of values to see if
it exists or not. It can also add the value to the set. The set can
optionally be stored to disk on exit.
The datarep support matching/lookups only. With each item in the set a
reputation value is stored and this value can be matched against. The
reputation value is unsigned 16 bit, so values can be between 0 and 65535.
Datasets can be registered in 2 ways:
1. through the yaml
2. through the rules
The goal of this rules based approach is that rule writers can start using
this without the need for config changes.
A dataset is implemented using a thash hash table. Each dataset is its own
separate thash.
Allows matching on stickybuffers. Like dsize, it allows matching on
exact values, greater than and less than, and ranges.
For streaming buffers, such as HTTP bodies, the final size of the
body is only known at the end of the transaction.
The expression 'isdataat:!1,relative' is used to make sure a match
is at the end of a buffer quite often. This patch optimizes this case
for 'content' followed by the expression. It enforces it by setting
and 'ends with' flag on the content and then taking that flag into
account while doing the pattern match.
Content inspection optimization: when just distance is used without
within we don't need to search recursively.
E.g. content:"a"; content:"b"; distance:1; will scan the buffer for
'a' and when it finds 'a' it will scan the remainder for 'b'. Until
now, the failure to find 'b' would lead to looking for the next 'a'
and then for 'b' after that. However, we already inspected the
entire buffer for 'b', so we know this will fail.
To simplify locking, move all locking out of the individual detect
code. Instead at the start of detection lock the flow, and at the
end of detection unlock it.
The lua code can be called without a lock still (from the output
code paths), so still pass around a lock hint to take care of this.
This new API allows for different SPM implementations, using a function
pointer table like that used for MPM.
This change also switches over the paths that make use of
DetectContentData (which previously used BoyerMoore directly) to the new
API.
The Match functions don't need a pointer to the SigMatch object, just the
context pointer contained inside, so pass the Context to the Match function
rather than the SigMatch object. This allows for further optimization.
Change SigMatch->ctx to have type SigMatchCtx* rather than void* for better
type checking. This requires adding type casts when using or assigning it.
The SigMatch contex should not be changed by the Match() funciton, so pass it
as a const SigMatchCtx*.
Rather than converting the search string to lower case while searching,
convert it to lowercase during initialization.
Changes the Boyer Moore search API for take BmCtx
Change the API for BoyerMoore to take a BmCtx rather than the two parts that
are stored in the context. Which is how it is mostly used. This enforces
always calling BoyerMooreCtxToNocase() to convert to no-case.
Use CtxInit and CtxDeinit functions to create and destroy the context,
even in unit tests.
The old behaviour of returning a failure if we found a pattern while
matching on negated content is now changed to continuing searching
for other combinations where we don't find the pattern for the
negated content.
Thanks to Will Metcalf for reporting this.
Now depth is kept in mind when we inspect chunks in client/server body.
This takes care of FPs originating from inspecting subsequent chunks that
match with depth, but shouldn't.
If a pattern has matched on mpm, don't re-inspect it later, subject to certain
conditions met by the pattern - namely, not negated, right chop, no replacet
attached to it.