Add the modbus.function and subfunction) keywords for public function match in rules (Modbus layer).
Matching based on code function, and if necessary, sub-function code
or based on category (assigned, unassigned, public, user or reserved)
and negation is permitted.
Add the modbus.access keyword for read/write Modbus function match in rules (Modbus layer).
Matching based on access type (read or write),
and/or function type (discretes, coils, input or holding)
and, if necessary, read or write address access,
and, if necessary, value to write.
For address and value matching, "<", ">" and "<>" is permitted.
Based on TLS source code and file size source code (address and value matching).
Signed-off-by: David DIALLO <diallo@et.esia.fr>
If a live reload signal was given before the engine was fully started
up (e.g. pcap file thread waiting for a disk to spin up), a segv could
occur.
This patch only enables live reloads after the threads have been
started up completely.
Buffers in per thread HTTP header, client body and server body storage
would be freed based on the usage indicator instead of the size
indicator.
As the usage indicator (e.g. hsbd_buffers_list_len) could be reset
while leaving the memory untouched for later reuse, the free function
would not iterate over all memory blocks.
Removed DrMemory suppressions as well.
Bug #980.
app-layer.[ch], app-layer-detect-proto.[ch] and app-layer-parser.[ch].
Things addressed in this commit:
- Brings out a proper separation between protocol detection phase and the
parser phase.
- The dns app layer now is registered such that we don't use "dnstcp" and
"dnsudp" in the rules. A user who previously wrote a rule like this -
"alert dnstcp....." or
"alert dnsudp....."
would now have to use,
alert dns (ipproto:tcp;) or
alert udp (app-layer-protocol:dns;) or
alert ip (ipproto:udp; app-layer-protocol:dns;)
The same rules extend to other another such protocol, dcerpc.
- The app layer parser api now takes in the ipproto while registering
callbacks.
- The app inspection/detection engine also takes an ipproto.
- All app layer parser functions now take direction as STREAM_TOSERVER or
STREAM_TOCLIENT, as opposed to 0 or 1, which was taken by some of the
functions.
- FlowInitialize() and FlowRecycle() now resets proto to 0. This is
needed by unittests, which would try to clean the flow, and that would
call the api, AppLayerParserCleanupParserState(), which would try to
clean the app state, but the app layer now needs an ipproto to figure
out which api to internally call to clean the state, and if the ipproto
is 0, it would return without trying to clean the state.
- A lot of unittests are now updated where if they are using a flow and
they need to use the app layer, we would set a flow ipproto.
- The "app-layer" section in the yaml conf has also been updated as well.
Make sure we register the detect.alerts counter before packet runtime starts
even in delayed detect mode. The registration of new counters at packet
runtime is not supported by the counters api and might lead to crashes as there
is no proper locking to allow for this operation.
This changes how delayed detect works a bit. Now we call the ThreadInit
callback twice. The first call will only register the counter. The 2nd call
will do all the other setup. This way the counter is registered before the
counters api starts operating in the packet runtime.
Fixes the segv reported in ticket #1018.
Aho-Corasick mpm optimized for Tilera Tile-Gx architecture. Based on the
util-mpm-ac.c code base. The primary optimizations are:
1) Matching function used Tilera specific instructions.
2) Alphabet compression to reduce delta table size to increase cache
utilization and performance.
The basic observation is that not all 256 ASCII characters are used by
the set of multiple patterns in a group for which a DFA is
created. The first reason is that Suricata's pattern matching is
case-insensitive, so all uppercase characters are converted to
lowercase, leaving a hole of 26 characters in the
alphabet. Previously, this hole was simply left in the middle of the
alphabet and thus in the generated Next State (delta) tables.
A new, smaller, alphabet is created using a translation table of 256
bytes per mpm group. Previously, there was one global translation
table for converting upper case to lowercase.
Additional, unused characters are found by creating a histogram of all
the characters in all the patterns. Then all the characters with zero
counts are mapped to one character (0) in the new alphabet. Since
These characters appear in no pattern, they can all be mapped to a
single character and still result in the same matches being
found. Zero was chosen for the value in the new alphabet since this
"character" is more likely to appear in the input. The unused
character always results in the next state being state zero, but that
fact is not currently used by the code, since special casing takes
additional instructions.
The characters that do appear in some pattern are mapped to
consecutive characters in the new alphabet, starting at 1. This
results in a dense packing of next state values in the delta tables
and additionally can allow for a smaller number of columns in that
table, thus using less memory and better packing into the cache. The
size of the new alphabet is the number of used characters plus 1 for
the unused catch-all character.
The alphabet size is rounded up to the next larger power-of-2 so that
multiplication by the alphabet size can be done with a shift. It
might be possible to use a multiply instruction, so that the exact
alphabet size could be used, which would further reduce the size of
the delta tables, increase cache density and not require the
specialized search functions. The multiply would likely add 1 cycle to
the inner search loop.
Since the multiply by alphabet-size is cleverly merged with a mask
instruction (in the SINDEX macro), specialized versions of the
SCACSearch function are generated for alphabet sizes 256, 128, 64, 32
and 16. This is done by including the file util-mpm-ac-small.c
multiple times with a redefined SINDEX macro. A function pointer is
then stored in the mpm context for the search function. For alpha bit
sizes of 8 or smaller, the number of states usually small, so the DFA
is already very small, so there is little difference using the 16
state search function.
The SCACSearch function is also specialized by the size of the value
stored in the next state (delta) tables, either 16-bits or 32-bits.
This removes a conditional inside the Search function. That
conditional is only called once, but doesn't hurt to remove
it. 16-bits are used for up to 32K states, with the sign bit set for
states with matches.
Future optimization:
The state-has-match values is only needed per state, not per next
state, so checking the next-state sign bit could be replaced with
reading a different value, at the cost of an additional load, but
increasing the 16-bit next state span to 64K.
Since the order of the characters in the new alphabet doesn't matter,
the new alphabet could be sorted by the frequency of the characters in
the expected input stream for that multi-pattern matcher. This would
group more frequent characters into the same cache lines, thus
increasing the probability of reusing a cache-line.
All the next state values for each state live in their own set of
cache-lines. With power-of-two sizes alphabets, these don't overlap.
So either 32 or 16 character's next states are loaded in each cache
line load. If the alphabet size is not an exact power-of-2, then the
last cache-line is not completely full and up to 31*2 bytes of that
line could be wasted per state.
The next state table could be transposed, so that all the next states
for a specific character are stored sequentially, this could be better
if some characters, for example the unused character, are much more
frequent.
Improved accuracy, improved performance. Performance improvement
noticeable with http heavy traffic and ruleset.
A lot of other cosmetic changes carried out as well. Wrappers introduced
for a lot of app layer functions.
Failing dce unittests disabled. Will be reintroduced in the updated dce
engine.
Cross transaction matching taken care of. FPs emanating from these
matches have now disappeared. Double inspection of transactions taken
care of as well.
DetectEngineThreadCtxInit and DetectEngineThreadCtxInitForLiveRuleSwap did
pretty much the same thing, except for a counters registration. As can be
predicted with code duplication like this, things got out of sync. To make
sure this doesn't happen again, I created a helper function that does the
heavy lifting in this function.
All fp id assignment now happens in one go.
Also noticing a slight perf increase, probably emanating from improved cache
perf.
Removed irrelevant unittests as well.
Insert pseudo packet under low load conditions to complete rule swap.
This is necessary when we use autofp active packets where most packets
would be sent to the first queue under low load conditions.
When handling error case on SCMallog, SCCalloc or SCStrdup
we are in an unlikely case. This patch adds the unlikely()
expression to indicate this to gcc.
This patch has been obtained via coccinelle. The transformation
is the following:
@istested@
identifier x;
statement S1;
identifier func =~ "(SCMalloc|SCStrdup|SCCalloc)";
@@
x = func(...)
... when != x
- if (x == NULL) S1
+ if (unlikely(x == NULL)) S1
Some detection keywords need thread local ctx storage. Example is the
filemagic keyword that has a ctx that is modified with each call. That
is not thread safe. This functionality allows registration of thread
local ctxs so that each detect thread works on it's own copy.
This patch converts the series of variable to an atomic.
Furthermore, as the callbacks are now always run, it is not
necessary anymore to refuse a ruleswap if HTP parameters are
changing.
To reload ruleset during engine runtime, send the USR2 signal to the engine, and the ruleset would be reloaded from the same yaml file supplied at engine startup
Per packet profiling uses tick based accounting. It has 2 outputs, a summary
and a csv file that contains per packet stats.
Stats per packet include:
1) total ticks spent
2) ticks spent per individual thread module
3) "threading overhead" which is simply calculated by subtracting (2) of (1).
A number of changes were made to integrate the new code in a clean way:
a number of generic enums are now placed in tm-threads-common.h so we can
include them from any part of the engine.
Code depends on --enable-profiling just like the rule profiling code.
New yaml parameters:
profiling:
# packet profiling
packets:
# Profiling can be disabled here, but it will still have a
# performance impact if compiled in.
enabled: yes
filename: packet_stats.log
append: yes
# per packet csv output
csv:
# Output can be disabled here, but it will still have a
# performance impact if compiled in.
enabled: no
filename: packet_stats.csv
Example output of summary stats:
IP ver Proto cnt min max avg
------ ----- ------ ------ ---------- -------
IPv4 6 19436 11448 5404365 32993
IPv4 256 4 11511 49968 30575
Per Thread module stats:
Thread Module IP ver Proto cnt min max avg
------------------------ ------ ----- ------ ------ ---------- -------
TMM_DECODEPCAPFILE IPv4 6 19434 1242 47889 1770
TMM_DETECT IPv4 6 19436 1107 137241 1504
TMM_ALERTFASTLOG IPv4 6 19436 90 1323 155
TMM_ALERTUNIFIED2ALERT IPv4 6 19436 108 1359 138
TMM_ALERTDEBUGLOG IPv4 6 19436 90 1134 154
TMM_LOGHTTPLOG IPv4 6 19436 414 5392089 7944
TMM_STREAMTCP IPv4 6 19434 828 1299159 19438
The proto 256 is a counter for handling of pseudo/tunnel packets.
Example output of csv:
pcap_cnt,ipver,ipproto,total,TMM_DECODENFQ,TMM_VERDICTNFQ,TMM_RECEIVENFQ,TMM_RECEIVEPCAP,TMM_RECEIVEPCAPFILE,TMM_DECODEPCAP,TMM_DECODEPCAPFILE,TMM_RECEIVEPFRING,TMM_DECODEPFRING,TMM_DETECT,TMM_ALERTFASTLOG,TMM_ALERTFASTLOG4,TMM_ALERTFASTLOG6,TMM_ALERTUNIFIEDLOG,TMM_ALERTUNIFIEDALERT,TMM_ALERTUNIFIED2ALERT,TMM_ALERTPRELUDE,TMM_ALERTDEBUGLOG,TMM_ALERTSYSLOG,TMM_LOGDROPLOG,TMM_ALERTSYSLOG4,TMM_ALERTSYSLOG6,TMM_RESPONDREJECT,TMM_LOGHTTPLOG,TMM_LOGHTTPLOG4,TMM_LOGHTTPLOG6,TMM_PCAPLOG,TMM_STREAMTCP,TMM_DECODEIPFW,TMM_VERDICTIPFW,TMM_RECEIVEIPFW,TMM_RECEIVEERFFILE,TMM_DECODEERFFILE,TMM_RECEIVEERFDAG,TMM_DECODEERFDAG,threading
1,4,6,172008,0,0,0,0,0,0,47889,0,0,48582,1323,0,0,0,0,1359,0,1134,0,0,0,0,0,8028,0,0,0,49356,0,0,0,0,0,0,0,14337
First line of the file contains labels.
2 example gnuplot scripts added to plot the data.
This patch generated via coccinelle is getting rid of logging
message after a SCMalloc failure. They were useless as SCMalloc
already displays a message.