Until now the way to specify a var name in pcre substring capture
into pkt and flow vars was to use the pcre named substring support:
e.g. /(?P<pkt_somename>.*)/
This had 2 drawbacks:
1. limitations of the name. The name could be max 32 chars, only have
alphanumeric and the underscore characters. This imposed limitations
that are not present in flowbits/ints.
2. we didn't actually use the named substrings in pcre through the
API. We parsed the names separately. So putting the names in pcre
would actually be wasteful.
This patch introduces a new way of mapping captures with names:
pcre:"/(.*)/, pkt:somename";
pcre:"/([A-z]+) ([0-9]+)/, pkt:somename,flow:anothername";
The order of the captures and the order of the names are mapped 1 on 1.
This method is no longer limited by the pcre API's naming limits. The
'flow:' and 'pkt:' prefixes indicate what the type of variable is. It's
mandatory to specify one.
The old method is still supported as well.
Flowvars were already using a temporary store in the detect thread
ctx.
Use the same facility for pktvars. The reasons are:
1. packet is not always available, e.g. when running pcre on http
buffers.
2. setting of vars should be done post match. Until now it was also
possible that it is done on a partial match.
Until now variable names, such as flowbit names, were local to a detect
engine. This made sense as they were only ever used in that context.
For the purpose of logging these names, this needs a different approach.
The loggers live outside of the detect engine. Also, in the case of
reloads and multi-tenancy, there are even multiple detect engines, so
it would be even more tricky to access them from the outside.
This patch brings a new approach. A any time, there is a single active
hash table mapping the variable names and their id's. For multiple
tenants the table is shared between tenants.
The table is set up in a 'staging' area, where locking makes sure that
multiple loading threads don't mess things up. Then when the preparing
of a detection engine is ready, but before the detect threads are made
aware of the new detect engine, the active varname hash is swapped with
the staging instance.
For this to work, all the mappings from the 'current' or active mapping
are added to the staging table.
After the threads have reloaded and the new detection engine is active,
the old table can be freed.
For multi tenancy things are similar. The staging area is used for
setting up until the new detection engines / tenants are applied to
the system.
This patch also changes the variable 'id'/'idx' field to uint32_t. Due
to data structure padding and alignment, this should have no practical
drawback while allowing for a lot more vars.
To be able to add a transaction counter we will need a ThreadVars
in the AppLayerParserParse function.
This function is massively used in unittests
and this result in an long commit.
Convert HTTP body handling to use the Streaming Buffer API. This means
the HtpBodyChunks no longer maintain their own data segments, but
instead add their data to the StreamingBuffer instance in the HtpBody
structure.
In case the HtpBodyChunk needs to access it's data it can do so still
through the Streaming Buffer API.
Updates & simplifies the various users of the reassembled bodies:
multipart parsing and the detection engine.
This commit do a find and replace of the following:
- DETECT_SM_LIST_HSBDMATCH by DETECT_SM_LIST_FILEDATA
sed -i 's/DETECT_SM_LIST_HSBDMATCH/DETECT_SM_LIST_FILEDATA/g' src/*
- HSBD by FILEDATA:
sed -i 's/HSBDMATCH/FILEDATA/g' src/*
The Match functions don't need a pointer to the SigMatch object, just the
context pointer contained inside, so pass the Context to the Match function
rather than the SigMatch object. This allows for further optimization.
Change SigMatch->ctx to have type SigMatchCtx* rather than void* for better
type checking. This requires adding type casts when using or assigning it.
The SigMatch contex should not be changed by the Match() funciton, so pass it
as a const SigMatchCtx*.