Commit Graph

4288 Commits (d76a5bedbc51a862e3a6722c3c05ae7bd8eb7c75)
 

Author SHA1 Message Date
Ken Steele e05034f5dd New Multi-pattern matcher, ac-tile, optimized for Tile architecture.
Aho-Corasick mpm optimized for Tilera Tile-Gx architecture. Based on the
util-mpm-ac.c code base. The primary optimizations are:
1) The matching function uses Tilera-specific instructions.
2) Alphabet compression reduces the delta table size, which improves
   cache utilization and performance.

The basic observation is that not all 256 possible byte values are used
by the set of patterns in a group for which a DFA is created. The first
reason is that Suricata's pattern matching is case-insensitive, so all
uppercase characters are converted to lowercase, leaving a hole of 26
characters in the alphabet. Previously, this hole was simply left in
the middle of the alphabet and thus in the generated Next State (delta)
tables.

A new, smaller, alphabet is created using a translation table of 256
bytes per mpm group. Previously, there was one global translation
table for converting upper case to lowercase.

Additional, unused characters are found by creating a histogram of all
the characters in all the patterns. Then all the characters with zero
counts are mapped to one character (0) in the new alphabet. Since
these characters appear in no pattern, they can all be mapped to a
single character and still result in the same matches being
found. Zero was chosen as the value in the new alphabet since this
"character" is more likely to appear in the input. The unused
character always results in the next state being state zero, but the
code does not currently exploit that fact, since special-casing it
would take additional instructions.

The characters that do appear in some pattern are mapped to
consecutive characters in the new alphabet, starting at 1. This
results in a dense packing of next state values in the delta tables
and additionally can allow for a smaller number of columns in that
table, thus using less memory and better packing into the cache. The
size of the new alphabet is the number of used characters plus 1 for
the unused catch-all character.
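A minimal sketch of the translation-table construction described above, written in plain C. The function name BuildTranslateTable and its signature are hypothetical, not the actual ac-tile code. It assumes patterns are passed as byte arrays with lengths and that case folding uses the C locale's tolower().

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: build a per-group 256-byte translation table.
 * Bytes that never occur in any (lowercased) pattern map to 0; bytes
 * that do occur map to consecutive codes 1..n. Returns n + 1, the
 * alphabet size including the catch-all character. */
static int BuildTranslateTable(const uint8_t *const *patterns,
                               const uint16_t *lens, int npatterns,
                               uint8_t table[256])
{
    uint32_t histogram[256] = {0};

    /* Histogram of characters after case folding (matching is
     * case-insensitive, so 'A'..'Z' never get a count of their own). */
    for (int p = 0; p < npatterns; p++) {
        for (uint16_t i = 0; i < lens[p]; i++) {
            histogram[tolower(patterns[p][i])]++;
        }
    }

    memset(table, 0, 256); /* zero-count characters -> catch-all 0 */
    int next = 1;
    for (int c = 0; c < 256; c++) {
        if (histogram[c] > 0) {
            table[c] = (uint8_t)next++;
        }
    }

    /* Uppercase input bytes share the code of their lowercase form. */
    for (int c = 'A'; c <= 'Z'; c++) {
        table[c] = table[tolower(c)];
    }
    return next;
}
```

At most 230 distinct lowercase-folded bytes can occur, so the codes always fit in a uint8_t.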

The alphabet size is rounded up to the next power of 2 so that
multiplication by the alphabet size can be done with a shift. It
might be possible to use a multiply instruction instead, so that the
exact alphabet size could be used, which would further reduce the size
of the delta tables, increase cache density and not require the
specialized search functions. The multiply would likely add 1 cycle to
the inner search loop.
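To illustrate the shift trick, here is a sketch using the hypothetical helpers AlphabetShift() and DeltaIndex(). The alphabet size is rounded up to a power of two, and the row offset for a state then becomes a shift. OR-ing in the character is equivalent to adding it, because the compressed character is always smaller than the rounded-up alphabet size.

```c
#include <stdint.h>

/* Round the alphabet size up to a power of two and return its log2,
 * so that state * rounded_size == state << shift. */
static uint32_t AlphabetShift(uint32_t alpha_size)
{
    uint32_t shift = 0;
    while ((1u << shift) < alpha_size) {
        shift++;
    }
    return shift;
}

/* Index of the next-state entry for (state, compressed character c).
 * c < (1 << shift), so OR is the same as addition here. */
static inline uint32_t DeltaIndex(uint32_t state, uint8_t c, uint32_t shift)
{
    return (state << shift) | c;
}
```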

Since the multiply by alphabet size is cleverly merged with a mask
instruction (in the SINDEX macro), specialized versions of the
SCACSearch function are generated for alphabet sizes 256, 128, 64, 32
and 16. This is done by including the file util-mpm-ac-small.c
multiple times with a redefined SINDEX macro. A function pointer to
the search function is then stored in the mpm context. For alphabet
sizes of 8 or smaller, the number of states is usually small, so the
DFA is already very small and there is little difference from using
the size-16 search function.
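The commit generates the specialized functions by re-including util-mpm-ac-small.c with a different SINDEX each time. The self-contained sketch below substitutes a single function-generating macro for that technique. It has the same effect: one search loop per alphabet size, each with a constant shift, plus a selector that returns a function pointer. All names are hypothetical. The loop assumes the 16-bit entries with a match flag in the top bit (described in the next paragraph), and it only counts entries into match states rather than reporting pattern IDs.

```c
#include <stdint.h>

typedef uint32_t (*SearchFunc)(const uint16_t *delta, const uint8_t *xlate,
                               const uint8_t *buf, uint32_t len);

/* One search loop per alphabet size. The row shift is a compile-time
 * constant in each expansion, playing the role of SINDEX.
 * Entries: low 15 bits = next state, top bit = "state has matches". */
#define DEFINE_SEARCH(BITS)                                              \
    static uint32_t Search##BITS(const uint16_t *delta,                  \
                                 const uint8_t *xlate,                   \
                                 const uint8_t *buf, uint32_t len)       \
    {                                                                    \
        uint32_t state = 0, matches = 0;                                 \
        for (uint32_t i = 0; i < len; i++) {                             \
            uint16_t next = delta[(state << (BITS)) | xlate[buf[i]]];    \
            if (next & 0x8000u)                                          \
                matches++; /* entered a state that has matches */        \
            state = next & 0x7FFFu;                                      \
        }                                                                \
        return matches;                                                  \
    }

DEFINE_SEARCH(8) /* 256 columns */
DEFINE_SEARCH(7) /* 128 */
DEFINE_SEARCH(6) /* 64 */
DEFINE_SEARCH(5) /* 32 */
DEFINE_SEARCH(4) /* 16, also used for smaller alphabets */

/* Choose the specialization once, when the DFA is built, and keep the
 * pointer in the matcher context. */
static SearchFunc SelectSearch(uint32_t alpha_size)
{
    if (alpha_size > 128) return Search8;
    if (alpha_size > 64)  return Search7;
    if (alpha_size > 32)  return Search6;
    if (alpha_size > 16)  return Search5;
    return Search4;
}
```

For example, a DFA for the single pattern "ab" (alphabet {other, a, b}, so size 3) uses the 16-column Search4, and scanning "xabab" counts two matches.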

The SCACSearch function is also specialized by the size of the value
stored in the next state (delta) tables, either 16 bits or 32 bits.
This removes a conditional inside the Search function. That
conditional is only evaluated once, but it doesn't hurt to remove
it. 16-bit entries are used for up to 32K states, with the sign bit
set for states with matches.
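A sketch of the 16-bit entry encoding, with hypothetical helper names. The low 15 bits hold the next state and the top (sign) bit flags a state with matches, which is why 16-bit tables are limited to 32K states.

```c
#include <stdint.h>

#define AC16_MATCH_FLAG 0x8000u /* sign bit: state has matches */
#define AC16_MAX_STATES 0x8000u /* 32K states fit in 15 bits */

static inline uint16_t Encode16(uint32_t state, int has_match)
{
    return (uint16_t)(state | (has_match ? AC16_MATCH_FLAG : 0u));
}

static inline uint32_t State16(uint16_t v) { return v & 0x7FFFu; }
static inline int HasMatch16(uint16_t v) { return (v & AC16_MATCH_FLAG) != 0; }

/* The table width is decided once, when the DFA is built. */
static int Use16BitTable(uint32_t state_count)
{
    return state_count <= AC16_MAX_STATES;
}
```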

Future optimization:

The state-has-match value is only needed per state, not per next
state, so checking the next-state sign bit could be replaced with
reading a separate value, at the cost of an additional load, but
increasing the 16-bit next state span to 64K.

Since the order of the characters in the new alphabet doesn't matter,
the new alphabet could be sorted by the frequency of the characters in
the expected input stream for that multi-pattern matcher. This would
group more frequent characters into the same cache lines, thus
increasing the probability of reusing a cache-line.

All the next state values for each state live in their own set of
cache lines. With power-of-two-sized alphabets, these don't overlap,
so either 32 or 16 characters' next states (16-bit or 32-bit entries)
are loaded by each cache-line load. If the alphabet size is not an
exact power of 2, then the last cache line is not completely full and
up to 31*2 bytes of that line could be wasted per state.
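The waste figure can be checked with a little arithmetic. This sketch assumes 64-byte cache lines (consistent with 32 16-bit or 16 32-bit entries per line) and assumes that each state's row starts on a cache-line boundary. WastedBytesPerState() is a hypothetical helper.

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 64u /* assumed line size */

/* Unused bytes in the last cache line of one state's row. */
static size_t WastedBytesPerState(size_t alpha_size, size_t entry_size)
{
    size_t per_line = CACHE_LINE_SIZE / entry_size;
    size_t used_in_last = alpha_size % per_line;
    if (used_in_last == 0)
        return 0; /* row ends exactly on a line boundary */
    return (per_line - used_in_last) * entry_size;
}
```

With 16-bit entries and an alphabet of 33 characters, the second line holds a single entry, leaving 31*2 = 62 bytes unused.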

The next state table could be transposed, so that all the next states
for a specific character are stored sequentially. This could be better
if some characters, for example the unused character, are much more
frequent.
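The current layout and the transposed one differ only in how the index is computed. Here is a sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Current layout: one row per state, so a state's next states are
 * contiguous. */
static inline uint32_t IndexStateMajor(uint32_t state, uint32_t c,
                                       uint32_t alpha_size)
{
    return state * alpha_size + c;
}

/* Transposed layout: one row per character, so the next states of all
 * states for a frequent character (e.g. the catch-all 0) are contiguous. */
static inline uint32_t IndexCharMajor(uint32_t state, uint32_t c,
                                      uint32_t num_states)
{
    return c * num_states + state;
}
```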
12 years ago
Victor Julien 77b429c402 xff: fix unittest crashes 12 years ago
Victor Julien 05d68ce394 xff: don't do xff check if there are no alerts anyway. 12 years ago
Duarte Silva 7dbb305255 Adds X-Forwarded-For support to the Unified2 output format
 - Added the Unified2 file format related constants
 - Added IPv6 support
 - Two modes of operation, with a fallback to "extra-data" mode if
   "overwrite" mode is not applicable
 - Changed the configuration loading code to handle the new
   configuration structure
 - When creating the packet that fakes the one that generated the alert,
   the flow direction wasn't taken into account in overwrite mode
 - Fixed BUG_ON condition
12 years ago
Duarte Silva a28ec79912 Modified suricata configuration
 - Added the settings for XFF support
 - Removed non printable characters
12 years ago
Eric Leblond daa9dcb75f Use wget or curl to download ruleset. 12 years ago
Victor Julien 900918a5d1 Bug #948: detect thread local storage support 12 years ago
Ken Steele 0861d3a2a3 Minor optimization in time caching code.
Reduced the size of the cached string buffer from 128 to 32, which is
still larger than the largest possible time string, which is 26
characters.

Added a check for the user passing in an output buffer that is smaller
than the cached string. Previously, the code would have copied past
the end of the user's buffer.
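A sketch of the added bounds check, using a hypothetical helper. The helper refuses to copy, and reports failure, when the caller's buffer cannot hold the cached string plus its terminating NUL. How the real code handles that case is not stated here.

```c
#include <stddef.h>
#include <string.h>

/* 32 bytes: larger than the longest time string (26 characters). */
#define TIME_CACHE_SIZE 32

/* Hypothetical helper: copy a NUL-terminated cached time string into
 * the caller's buffer. Returns 0 on success, or -1 (copying nothing)
 * if out_size is too small, instead of writing past the buffer. */
static int CopyCachedTimeString(const char *cached, char *out,
                                size_t out_size)
{
    size_t len = strlen(cached);
    if (out_size < len + 1)
        return -1;
    memcpy(out, cached, len + 1); /* includes the NUL */
    return 0;
}
```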
12 years ago
Anoop Saldanha 49dcb0ca84 fix for #925.
Log a sensible error message when the user doesn't supply a value for
stream.prealloc-sessions, or when the value supplied is invalid and
the engine resorts to using a default.
12 years ago
Anoop Saldanha db6ef81fb0 fix for #926.
Supply a meaningful error message when the user supplies an invalid
value for host.prealloc.
12 years ago
Anoop Saldanha b90a56b626 fix for #927.
Print an error message when the user supplies an invalid value for
detect-thread-ratio in the conf file.
12 years ago
Anoop Saldanha bed3f605fa Fix for #922.
Add a more relevant error message when the user supplies an invalid
value for defrag.trackers and defrag.hash-size.
12 years ago
Anoop Saldanha 6608e7f523 Introduce generic utility API to log message on invalid config entry. 12 years ago
Victor Julien 6d34834623 Runmode fixes and cleanups
Bug #939: thread name buffers are sized inconsistently
These buffers are now all fixed at 16 bytes.

Bug #914: Having a high number of pickup queues (216+) makes suricata crash
Fixed so that we can now have 256 pickup queues, which is the current built-in
maximum. Improved the error reporting.

Bug #928: Max number of threads
Error reporting improved. Issue was the same as #914.
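A sketch of filling a fixed 16-byte thread-name buffer safely. The helper and the name format are hypothetical. snprintf() truncates rather than overflowing, and 16 bytes is also the Linux limit for thread names (15 characters plus NUL).

```c
#include <stdio.h>
#include <string.h>

#define THREAD_NAME_LEN 16

/* Hypothetical helper: build "<prefix><index>" into a fixed 16-byte
 * buffer. Returns 1 if the name had to be truncated, 0 otherwise. */
static int BuildThreadName(char name[THREAD_NAME_LEN], const char *prefix,
                           unsigned int index)
{
    int n = snprintf(name, THREAD_NAME_LEN, "%s%u", prefix, index);
    return n < 0 || n >= THREAD_NAME_LEN;
}
```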
12 years ago
Eric Leblond 8a96296b4a prscript: add verbose option 12 years ago
Eric Leblond f23556dcdb prscript: exit when no build exists 12 years ago
Eric Leblond c151b218f1 prscript: check if branch is synced with master
The script now checks if the tested branch is in sync with
inliniac's current master.
12 years ago
Eric Leblond c390006aee script: add script to start personal builder
This script HAS to be used by developers having an account on the
Suricata buildbot. It MUST be run before doing a PR. It will trigger a
build on the branch, and this will check the validity of the proposed
branch. The workflow is simple:
 - Push branch XXX to github
 - Run 'prscript.py -u USER -p PASSWORD XXX'
 - Wait for the result
 - If successful, PR can be done
12 years ago
Anoop Saldanha 56143131da Fix unittests that use chunked encoding. 12 years ago
Nelson Escobar ef4d11aeb5 Use the Async versions of SCCudaMemcpy* to improve gpu performance. 12 years ago
Eric Leblond 867a44f378 autotools: all targets are conditional 12 years ago
Eric Leblond 77f2b9968e autotools: use builddir instead of srcdir
srcdir is supposed to be read-only when running distcheck so it is better to
create the log directory in builddir.
12 years ago
Ignacio Sanchez 1b2f251866 Various custom http logging improvements
Cookie is parsed now using uint8_t pointers (inliniac PR comments)
Changed buffer size to a power of 2 (8192) and cookie value extraction function to static (inliniac PR comments)
Added %b for request size (vinfang patch)
Writing "-" if an unknown % directive is used (vinfang patch)
Fixed bug in cookie parser
Fixed format string issue logging literal values
Improve error handling (Victor Julien comments)

(patchset rebased and reworded by Victor Julien)
12 years ago
Ignacio Sanchez 8051dc8a6a Added modifications suggested by Charles Smutz (https://redmine.openinfosecfoundation.org/issues/602) 12 years ago
Ignacio Sanchez 796bfab231 Added support for %{cookiename}C
Added support for the definition of maximum length, e.g.: %[50]{user-agent}i
Some small bugfixes
12 years ago
Eric Leblond 3dbf6c6fee solaris: fix compilation failure
This patch fixes a compilation failure on Solaris. The compiler does
not accept a function returning void being used as the return value
of another function returning void.
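For illustration, here are the rejected construct and its portable replacement. ISO C forbids a return statement with an expression in a function whose return type is void (GCC accepts it as an extension), so the fix is to call the function and then return. The function names are hypothetical.

```c
/* Rejected by strict compilers:
 *
 *     static void Outer(void) { return Inner(); }
 *
 * Portable form: call, then return without an expression. */
static int inner_calls = 0;

static void Inner(void)
{
    inner_calls++;
}

static void Outer(void)
{
    Inner();
    return;
}
```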
12 years ago
Ken Steele 1bbbcf5120 Make the missing libhtp error message more clear.
Use exact git clone command and then rerun autogen.sh and configure.
12 years ago
Ken Steele a2b502a30c Formatting change for function call.
Put open brace { for function on a new line to match coding standard.

Changed:

int foo(int x) {
}

to:

int foo(int x)
{
}
12 years ago
Ken Steele d4dd18eb85 Clean up SCLocalTime() usage
Remove cast of return type from SCLocalTime() as it is not needed.
Replace last use of localtime_r() with SCLocalTime().
12 years ago
Ken Steele 77fae5313d On OpenBSD systems, don't cache time.
OpenBSD doesn't support __thread, which is used for time caching, so
don't do time caching for BSD systems.
12 years ago
Ken Steele 2feb37c155 Cache time conversions for localtime() and CreateTimeString()
When converting a time in seconds (64-bit seconds since 1970) to
month/day/year, hours and minutes, Suricata calls localtime_r(), which
always acquires a lock and then does a complex computation based on
the current time zone. The time zone can be specified in the TZ
environment variable, which is only parsed the first time it is used,
or from a file. The default file is /etc/localtime. The file is
checked each time to see if it might have changed and is reparsed if
it has changed.

The GLIBC library has a lock inside localtime_r(), which limits
parallelism. This is a problem when the rate of generating alerts is
high, since Suricata generates a new ASCII time string for each alert
written to fast.log.

This change caches the value returned by localtime_r() and then sets
the seconds within the minute based on the cached start-of-minute
time. All of the returned values, except for the seconds, are constant
within the same minute. Moving to a new minute could change all the
other values: year, month, day and hour. The cache stores the current
and previous minute values.

The same trick is used in CreateTimeString() for generating the time
string. The string, up to the minutes, is cached and then copied into
the result string, followed by printing the new seconds into the
result string.

The seconds within a minute are calculated as the difference in
seconds from the start of the current minute.
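A minimal, single-threaded sketch of the start-of-minute cache. It assumes POSIX localtime_r(), non-negative timestamps, and a time zone whose UTC offset is a whole number of minutes. Unlike the commit, it keeps only one cached minute (not the current and previous ones) and uses plain static variables instead of per-thread storage. CachedLocalTime() is a hypothetical name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for localtime_r() */
#endif
#include <time.h>

/* Single cached minute; the commit keeps two and stores them per thread. */
static time_t cached_minute_start = (time_t)-1;
static struct tm cached_tm;

/* Like localtime_r(), but only calls it once per new minute. Assumes
 * t >= 0 and a whole-minute UTC offset, so the local time at
 * minute_start always has tm_sec == 0. */
static struct tm *CachedLocalTime(time_t t, struct tm *result)
{
    time_t minute_start = t - (t % 60);

    if (minute_start != cached_minute_start) {
        /* Slow path: the full (locking) conversion. */
        if (localtime_r(&minute_start, &cached_tm) == NULL)
            return NULL;
        cached_minute_start = minute_start;
    }

    /* Fast path: copy the cached fields and patch in the seconds,
     * computed as the offset from the start of the minute. */
    *result = cached_tm;
    result->tm_sec = (int)(t - minute_start);
    return result;
}
```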
12 years ago
Ken Steele 68d26dcec7 Merge multiple copies of CreateTimeString() to one copy.
There were 8 identical copies of CreateTimeString() in 8 files.
Most used SCLocalTime() in place of localtime_r(), but some did not.
Created one copy in util-time.c.
12 years ago
Ken Steele 5532af4621 Create SCMUTEX_INITIALIZER to abstract out PTHREAD_MUTEX_INITIALIZER
This allows replacing pthread mutexes with other types of mutex.
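Here is a sketch of the abstraction. The SCMutex-style names below are illustrative definitions following the SCMUTEX_INITIALIZER naming, not a copy of the project's headers. Code is written against the SC* names, so only these macros would change if a different mutex type were substituted.

```c
#include <pthread.h>

/* Illustrative wrappers: swap these four lines to change mutex type. */
#define SCMutex pthread_mutex_t
#define SCMUTEX_INITIALIZER PTHREAD_MUTEX_INITIALIZER
#define SCMutexLock(m) pthread_mutex_lock(m)
#define SCMutexUnlock(m) pthread_mutex_unlock(m)

/* A statically initialized lock no longer names pthreads directly. */
static SCMutex counter_lock = SCMUTEX_INITIALIZER;
static unsigned long counter = 0;

static void IncrementCounter(void)
{
    SCMutexLock(&counter_lock);
    counter++;
    SCMutexUnlock(&counter_lock);
}
```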
12 years ago
Ken Steele 784843b146 Use Tilera SIMD for Signature matching ala SSE3
Makes use of 8-wide byte compare instructions in signature matching.

For allocating aligned memory, _mm_malloc() is SSE-only, so a check
for __tile__ was added to use memalign() instead.

Shows a 13% speedup.
12 years ago
Ken Steele 22225a7e99 Tile SIMD implementation of SCMemcmp and SCMemcmpLowercase
Based on the SSE3 implementation, it checks 8 bytes at a time.
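Here is a portable stand-in for the idea. It does not use the Tilera SIMD intrinsics, which are not shown here. It compares 8 bytes per step by loading them into 64-bit words, then finishes the tail one byte at a time. Memcmp8() is a hypothetical name, and it reports only equality (0) or inequality (1), not ordering.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 0 if the first len bytes are equal, 1 otherwise. */
static int Memcmp8(const void *s1, const void *s2, size_t len)
{
    const unsigned char *a = s1, *b = s2;
    size_t i = 0;

    /* Main loop: 8 bytes per comparison. */
    for (; i + 8 <= len; i += 8) {
        uint64_t wa, wb;
        memcpy(&wa, a + i, 8); /* memcpy avoids unaligned-access UB */
        memcpy(&wb, b + i, 8);
        if (wa != wb)
            return 1;
    }

    /* Tail: remaining 0..7 bytes. */
    for (; i < len; i++) {
        if (a[i] != b[i])
            return 1;
    }
    return 0;
}
```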
12 years ago
Anoop Saldanha e68d44b051 fix for #932.
The IPv6 tunnel decoder wrongly treats tunneled IPv6 packets as IPv4
packets.
12 years ago
Anoop Saldanha e2f4144d99 fix for #920.
Cull the space before the address specified in address var variables.
12 years ago
Duarte Silva ab215c72f6 Now using the common functions 12 years ago
Duarte Silva 0a5c798729 Now using the common functions
- Removed some non printable ANSI characters
- Removed unnecessary include
12 years ago
Duarte Silva 8ce95af09c Added the new files containing the repeated functions
- Renamed the functions to something more generic
- Added the source and include files to the Makefile
12 years ago
Anoop Saldanha a44d42b124 Fixes segv inside rule swap under low mem conditions.
We now gracefully exit rule swap on any allocation or other failures.
12 years ago
Anoop Saldanha 8516ba24c9 Rearrange ac state.
Noticed a minor speed bump of around 2% on runs. More updates to follow.
12 years ago
Ken Steele 4b8bb11454 Enable using Tile cycle counter.
The Tile processors all have a cycle counter with a simple interface. Use
that for UtilCpuGetTicks.
12 years ago
Victor Julien 38aaae1fd7 IsRuleReloadSet() shouldn't return an uninitialized value 12 years ago
Eric Leblond 189327981a unittests: fix stream-tcp.c
Lock and recycle fixes for stream-tcp.c
12 years ago
Eric Leblond cd3e32ce19 unittests: some functions need a flow lock.
In debug validation mode, it is required to call application layer
parsing and other functions with a lock on flow. This patch updates
the code to do so.
12 years ago
Eric Leblond c5bd04f102 unittest: recycle packet before exit
To avoid an issue with flow validation, we need to recycle the packet
before cleaning the flow.
12 years ago
Anoop Saldanha d292f1a529 fix for #915. Fix segv when we send NULL to snprintf. 12 years ago
Eric Leblond c6e8c5bf1f pf_ring: avoid asking for extended header.
This patch updates pf_ring capture to avoid asking for the extended
header. It is only needed when rxonly checksum checks are used, and
this is only possible when the interface is not a DNA interface.
12 years ago
Victor Julien ff668c2030 Fix Tile compile 12 years ago