This patch introduces new option to dataset keyword.
Where regular dataset allows match from sets, dataset with json
format allows the same but also adds JSON data to the alert
event. This data is coming from the set definition it self.
For example, an ipv4 set will look like:
[{"ip": "10.16.1.11", "test": "success","context":3}]
The syntax is a JSON array but it can also be a JSON object
with an array inside. The idea is to directly used data coming
from the API of a threat intel management software.
The syntax of the keyword is the following:
dataset:isset,src_ip,type ip,load src.lst,format json, \
enrichment_key src_ip, value_key ip;
Compare to dataset, it just have a supplementary option key
that is used to indicate in which subobject the JSON value
should be added.
The information is added in the even under the alert.extra
subobject:
"alert": {
"extra": {
"src_ip": {
"ip": "10.6.1.11",
"test": "success",
"context": 3
},
The main interest of the feature is to be able to contextualize
a match. For example, if you have an IOC source, you can do
[
{"buffer": "value1", "actor":"APT28","Country":"FR"},
{"buffer": "value2", "actor":"APT32","Country":"NL"}
]
This way, a single dataset is able to produce context to the
event where it was not possible before and multiple signatures
had to be used.
The format introduced in datajson is an evolution of the
historical datarep format. This has some limitations. For example,
if a user fetch IOCs from a threat intel server there is a large
change that the format will be JSON or XML. Suricata has no support
for the second but can support the first one.
Keeping the key value may seem redundant but it is useful to have it
directly accessible in the extra data to be able to query it
independantly of the signature (where it can be multiple metadata
or even be a transformed metadata).
In some case, when interacting with data (mostly coming from
threat intel servers), the JSON array containing the data
to use is not at the root of the object and it is ncessary
to access a subobject.
This patch implements this with support of key in level1.level2.
This is done via the `array_key` option that contains the path
to the data.
Ticket: #7372
Add a pure rust base64 decoder. This supports 3 modes of operation just
like the C decoder as follows.
1. RFC 2045
2. RFC 4648
3. Strict
One notable change is that "strict" mode is carried out by the rust
base64 crate instead of native Rust. This crate was already used for
encoding in a few places like datasets of string type. As a part of this
mode, now, only the strings that can be reliably converted back are
decoded.
The decoder fn is available to C via FFI.
Bug 6280
Ticket 7065
Ticket 7058
Datasets are sets/lists of data that can be accessed or added from
the rule language.
This patch implements 3 data types:
1. string (or buffer)
2. md5
3. sha256
The patch also implements 2 new rule keywords:
1. dataset
2. datarep
The dataset keyword allows matching against a list of values to see if
it exists or not. It can also add the value to the set. The set can
optionally be stored to disk on exit.
The datarep support matching/lookups only. With each item in the set a
reputation value is stored and this value can be matched against. The
reputation value is unsigned 16 bit, so values can be between 0 and 65535.
Datasets can be registered in 2 ways:
1. through the yaml
2. through the rules
The goal of this rules based approach is that rule writers can start using
this without the need for config changes.
A dataset is implemented using a thash hash table. Each dataset is its own
separate thash.