It is now possible to set a dataset's memcap and hashsize via
suricata.yaml and in rules.
Rule example:
alert http any any -> any any (http.user_agent; dataset:isset,ua-seen,type string,load datasets.csv,memcap 100mb,hashsize 2048; sid:1;)
suricata.yaml example:
datasets:
  ua-seen:
    type: string
    load: datasets.csv
    memcap: 20mb
    hashsize: 2048
Datasets are sets/lists of data that can be queried or added to from
the rule language.
This patch implements 3 data types:
1. string (or buffer)
2. md5
3. sha256
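As a rough illustration of how the three types differ, the sketch below
normalizes one textual entry into raw bytes. It assumes md5 and sha256
entries arrive as hex text; that input format, and the name parse_entry,
are assumptions made for this example and are not taken from the patch.

```python
# Raw digest sizes in bytes: MD5 is 128 bits, SHA-256 is 256 bits.
DIGEST_LEN = {"md5": 16, "sha256": 32}


def parse_entry(dtype: str, text: str) -> bytes:
    """Turn one textual entry into the bytes a set of type `dtype` would hold.

    Hypothetical helper: hex input for the hash types is an assumption.
    """
    if dtype == "string":
        # Strings (buffers) have no fixed length.
        return text.encode("utf-8")
    if dtype in DIGEST_LEN:
        raw = bytes.fromhex(text)  # raises ValueError on invalid hex
        if len(raw) != DIGEST_LEN[dtype]:
            raise ValueError(f"{dtype} entry must be {DIGEST_LEN[dtype]} bytes")
        return raw
    raise ValueError(f"unknown dataset type: {dtype}")
```

The fixed lengths are what separate the hash types from the string type:
a string entry can be any size, while a hash entry of the wrong size is
rejected up front.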
The patch also implements 2 new rule keywords:
1. dataset
2. datarep
The dataset keyword allows matching a value against a set to see whether
it exists or not. It can also add the value to the set. The set can
optionally be stored to disk on exit.
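The behaviour described for the dataset keyword can be modelled with a
toy in-memory set. This is only a sketch, not the code in this patch. The
operation name isset comes from the rule example; isnotset, set and save
are names assumed for this illustration, and having set() report whether
the value was newly added is likewise an assumption.

```python
class DataSet:
    """Toy model of a dataset; not Suricata's implementation."""

    def __init__(self, name: str):
        self.name = name
        self.values = set()

    def isset(self, value: str) -> bool:
        # Match if the value is present in the set.
        return value in self.values

    def isnotset(self, value: str) -> bool:
        # Match if the value is absent from the set.
        return value not in self.values

    def set(self, value: str) -> bool:
        # Add the value; report True only when it was not there before.
        if value in self.values:
            return False
        self.values.add(value)
        return True

    def save(self, path: str) -> None:
        # Optional persistence: one value per line, written on exit.
        with open(path, "w", encoding="utf-8") as f:
            for v in sorted(self.values):
                f.write(v + "\n")
```

A rule that both checks and adds would call set() once per inspected
buffer, so a repeated value no longer counts as new.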
The datarep keyword supports matching/lookups only. With each item in the
set a reputation value is stored, and this value can be matched against.
The reputation value is an unsigned 16-bit integer, so values can be
between 0 and 65535.
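A minimal sketch of a reputation lookup follows, assuming the comparison
is one of >, < or ==. The operator set and the class and method names are
assumptions made for illustration. Only the 0..65535 range comes from the
description above.

```python
import operator

REP_MAX = 0xFFFF  # largest unsigned 16-bit value: 65535

# Comparison operators assumed for this sketch.
_OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


class DataRep:
    """Toy lookup-only set that stores a reputation per item."""

    def __init__(self):
        self._rep = {}

    def load(self, value: str, reputation: int) -> None:
        # Reject values that do not fit in an unsigned 16-bit integer.
        if not 0 <= reputation <= REP_MAX:
            raise ValueError("reputation must be between 0 and 65535")
        self._rep[value] = reputation

    def match(self, value: str, op: str, threshold: int) -> bool:
        # An item that is not in the set never matches.
        rep = self._rep.get(value)
        if rep is None:
            return False
        return _OPS[op](rep, threshold)
```

Unlike the dataset sketch there is no add operation at match time: items
are only loaded up front, in line with the lookup-only behaviour.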
Datasets can be registered in 2 ways:
1. through the yaml
2. through the rules
The goal of the rules-based approach is that rule writers can start using
datasets without the need for config changes.
A dataset is implemented using a thash hash table. Each dataset is its own
separate thash.
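To make the one-table-per-dataset idea concrete, here is a simplified
chained hash table with a fixed bucket count (hashsize) and a byte budget
(memcap). It is not thash. The memory accounting, which counts key bytes
only, and all names are assumptions made for this sketch.

```python
class MemcapHashTable:
    """Toy chained hash table with a fixed bucket count and a byte budget."""

    def __init__(self, hashsize: int, memcap: int):
        self.buckets = [[] for _ in range(hashsize)]
        self.memcap = memcap  # maximum bytes this table may hold
        self.memuse = 0       # bytes currently accounted for

    def _bucket(self, key: bytes) -> list:
        return self.buckets[hash(key) % len(self.buckets)]

    def add(self, key: bytes) -> bool:
        bucket = self._bucket(key)
        if key in bucket:
            return True  # already present, nothing to account
        if self.memuse + len(key) > self.memcap:
            return False  # adding would exceed the memcap
        bucket.append(key)
        self.memuse += len(key)
        return True

    def lookup(self, key: bytes) -> bool:
        return key in self._bucket(key)


# Each dataset owns its own table, so limits apply per dataset.
tables = {
    "ua-seen": MemcapHashTable(hashsize=2048, memcap=20 * 1024 * 1024),
    "tiny": MemcapHashTable(hashsize=4, memcap=10),
}
```

Because the tables are separate, filling one dataset to its memcap does
not affect lookups or inserts in another.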