Until now, the way to specify a variable name for pcre substring capture
into pkt and flow vars was to use pcre's named substring support,
e.g. /(?P<pkt_somename>.*)/
This had 2 drawbacks:
1. Limitations on the name: it could be at most 32 characters and contain
only alphanumeric characters and underscores. These restrictions do not
exist for flowbits/ints.
2. We didn't actually use pcre's named substrings through the API; the
names were parsed separately. Putting the names into the pcre pattern was
therefore wasteful.
This patch introduces a new way of mapping captures to names:
pcre:"/(.*)/, pkt:somename";
pcre:"/([A-z]+) ([0-9]+)/, pkt:somename,flow:anothername";
The captures and the names are mapped to each other in order, one to one.
This method is no longer limited by the pcre API's naming restrictions.
The 'pkt:' and 'flow:' prefixes indicate the variable type; specifying
one is mandatory.
The old method is still supported as well.
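For example (the surrounding rule is only an illustration), the second
form above could be used in a full rule like:
alert tcp any any -> any any (msg:"capture example"; pcre:"/([A-z]+) ([0-9]+)/, pkt:somename,flow:anothername"; sid:1;)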
Flowvars were already using a temporary store in the detect thread
ctx.
Use the same facility for pktvars. The reasons are:
1. The packet is not always available, e.g. when running pcre on HTTP
buffers.
2. Setting of vars should be done post match. Until now it could also
happen on a partial match.
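Conceptually the per-thread store works along these lines (a minimal C
sketch with hypothetical names and types, not the actual Suricata code;
error handling kept minimal):

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct Packet_ Packet;                      /* opaque here */
/* prototype only; assumed to take ownership of 'value' */
void PktVarSet(Packet *p, uint32_t id, uint8_t *value, uint16_t len);

typedef struct PendingPktVar_ {
    uint32_t id;
    uint8_t *value;
    uint16_t len;
    struct PendingPktVar_ *next;
} PendingPktVar;

typedef struct DetectThreadCtx_ {
    PendingPktVar *pkt_vars;                        /* filled during pcre capture */
} DetectThreadCtx;

/* during pcre evaluation: only remember the captured value */
static void CaptureStorePending(DetectThreadCtx *dtx, uint32_t id,
                                const uint8_t *value, uint16_t len)
{
    PendingPktVar *pv = calloc(1, sizeof(*pv));
    if (pv == NULL)
        return;
    pv->value = malloc(len);
    if (pv->value == NULL) {
        free(pv);
        return;
    }
    memcpy(pv->value, value, len);
    pv->id = id;
    pv->len = len;
    pv->next = dtx->pkt_vars;
    dtx->pkt_vars = pv;
}

/* after the signature fully matched (or was rejected): apply or discard */
static void CapturePendingApply(DetectThreadCtx *dtx, Packet *p, bool matched)
{
    PendingPktVar *pv = dtx->pkt_vars;
    while (pv != NULL) {
        PendingPktVar *next = pv->next;
        if (matched && p != NULL) {
            PktVarSet(p, pv->id, pv->value, pv->len);
        } else {
            free(pv->value);
        }
        free(pv);
        pv = next;
    }
    dtx->pkt_vars = NULL;
}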
Until now variable names, such as flowbit names, were local to a detect
engine. This made sense as they were only ever used in that context.
Logging these names, however, requires a different approach: the loggers
live outside of the detect engine. Moreover, in the case of reloads and
multi-tenancy there can be multiple detect engines, which makes accessing
the names from the outside even trickier.
This patch brings a new approach. At any time there is a single active
hash table mapping variable names to their ids. In multi-tenant setups
the table is shared between the tenants.
The table is set up in a 'staging' area, where locking makes sure that
multiple loading threads don't mess things up. Then, when the preparation
of a detection engine is finished, but before the detect threads are made
aware of the new engine, the active varname hash is swapped with the
staging instance.
For this to work, all the mappings from the 'current' (active) table are
first added to the staging table.
After the threads have reloaded and the new detection engine is active,
the old table can be freed.
For multi-tenancy things are similar: the staging area is used during
setup, until the new detection engines/tenants are applied to the system.
This patch also changes the variable 'id'/'idx' field to uint32_t. Due
to data structure padding and alignment, this should have no practical
drawback while allowing for a lot more vars.
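The staging/activation step can be sketched roughly as follows (a
simplified, hypothetical C sketch: a flat name array stands in for the
real hash table, variable types and error handling are omitted):

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct VarNameTable_ {
    char **names;              /* index == variable id */
    uint32_t cnt;
} VarNameTable;

static VarNameTable *active_table = NULL;   /* read by detect threads/loggers */
static VarNameTable *staging_table = NULL;  /* built up during (re)load */
static pthread_mutex_t staging_m = PTHREAD_MUTEX_INITIALIZER;

static uint32_t TableGetOrAdd(VarNameTable *t, const char *name)
{
    for (uint32_t i = 0; i < t->cnt; i++)
        if (strcmp(t->names[i], name) == 0)
            return i;
    t->names = realloc(t->names, (t->cnt + 1) * sizeof(char *));
    t->names[t->cnt] = strdup(name);
    return t->cnt++;
}

/* rule loading: look up/register a name in the staging table, seeded
 * with the active mappings so existing ids stay stable */
uint32_t VarNameStagingGetOrAdd(const char *name)
{
    pthread_mutex_lock(&staging_m);
    if (staging_table == NULL) {
        staging_table = calloc(1, sizeof(*staging_table));
        if (active_table != NULL)
            for (uint32_t i = 0; i < active_table->cnt; i++)
                (void)TableGetOrAdd(staging_table, active_table->names[i]);
    }
    uint32_t id = TableGetOrAdd(staging_table, name);
    pthread_mutex_unlock(&staging_m);
    return id;
}

/* after the new detect engine is prepared, before the detect threads
 * switch over: promote staging to active. The caller frees the returned
 * old table only once the threads run on the new engine. */
VarNameTable *VarNameStagingActivate(void)
{
    pthread_mutex_lock(&staging_m);
    VarNameTable *old = active_table;
    active_table = staging_table;
    staging_table = NULL;
    pthread_mutex_unlock(&staging_m);
    return old;
}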
Matches on the start of an HTTP request or response.
For the request side it uses a buffer constructed from the request line
and the normalized request headers, including the Cookie header.
For the response side it uses the response line plus the normalized
response headers, including the Set-Cookie header.
Both buffers are terminated by an extra \r\n.
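For an illustrative request the constructed buffer would look like:
GET /index.html HTTP/1.1\r\nHost: www.example.com\r\nCookie: a=b\r\n\r\n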
A sticky buffer that allows content inspection on a constructed buffer
of HTTP header names. The buffer starts with \r\n, the names are
separated by \r\n, and the end of the buffer contains an extra \r\n.
E.g. \r\nHost\r\nUser-Agent\r\n\r\n
The leading \r\n is to make sure one can match on a full name in all
cases.
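In a rule one could, for example, anchor on the full 'Host' name by
including the separators, with the \r\n bytes written as hex:
content:"|0d 0a|Host|0d 0a|";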
Some keywords need a scratch space where they can store the results
of expensive operations that remain valid for the duration of a packet's
journey through the detection engine.
An example is the reconstructed 'http_header' field, which is needed
for MPM and then again by each rule that inspects it manually. Storing
this data in the flow is wasteful, and so is reconstructing it multiple
times on demand.
This API allows registering a keyword with an init and a free function.
It is meant to be used at initialization time, when the keyword is
registered.
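A rough usage sketch (hypothetical function and type names, not the
actual Suricata API):

#include <stdlib.h>

typedef void *(*KeywordThreadInitFunc)(void *data);
typedef void (*KeywordThreadFreeFunc)(void *ptr);

/* hypothetical registration: called at keyword registration time,
 * returns an id used later to fetch the per-thread scratch space */
int DetectRegisterKeywordThreadData(const char *name,
        KeywordThreadInitFunc InitFunc, void *init_data,
        KeywordThreadFreeFunc FreeFunc);

/* hypothetical accessor: fetch (lazily creating) the scratch space
 * belonging to this detect thread */
void *DetectGetKeywordThreadData(void *det_ctx, int id);

/* example: scratch buffer for the reconstructed http_header data */
static void *HttpHeaderScratchInit(void *data)
{
    (void)data;
    return calloc(1, 4096);
}

static void HttpHeaderScratchFree(void *ptr)
{
    free(ptr);
}

static int g_http_header_scratch_id = -1;

void DetectHttpHeaderRegister(void)
{
    /* ... normal keyword setup ... */
    g_http_header_scratch_id = DetectRegisterKeywordThreadData(
            "http_header", HttpHeaderScratchInit, NULL,
            HttpHeaderScratchFree);
}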