suricata

Commit Graph

Author	SHA1	Message	Date
Ken Steele	7a2095d851	In AC-Tile, convert from using pids for indexing to pattern index. Use an MPM specific pattern index, which is simply an index starting at zero and incremented for each pattern added to the MPM, rather than the externally provided Pattern ID (pid), since that can be much larger than the number of patterns. The Pattern ID is shared across all MPMs. For example, an MPM with one pattern with pid=8000 would result in a max_pid of 8000, so the pid_pat_list would have 8000 entries. The pid_pat_list[] is replaced by an array of pattern indexes. The PID is moved to the SCACTilePatternList as a single value. The PatternList is also indexed by the Pattern Index. max_pat_id is no longer needed and mpm_ctx->pattern_cnt is used instead. The local bit array is then also indexed by pattern index instead of PID, making it much smaller. The local bit array sets a bit for each pattern found for this MPM. It is only kept during one MPM search (stack allocated). One note: the local bit array is checked first, and if the pattern has already been found, it will stop checking but still count a match. This could result in over-counting matches of case-sensitive patterns, since following case-insensitive matches will also be counted. For example, finding "Foo" in "foo Foo foo" would report finding "Foo" 2 times, mis-counting the third word as "Foo".	11 years ago
Ken Steele	d03f124445	Implement MPM opt for b2g, b3g, wumanber. Found problems in b2gm and b2gc, so those are removed.	11 years ago
Ken Steele	edaefe5af2	Fix AC-tile for new pattern ID array.	11 years ago
Ken Steele	354a24e2ef	Fix unaligned load in AC-TILE MPM. The SLOAD define using __insn_ld2s_L2 is used to provide a compiler hint that the load will come from the L2 cache instead of the L1. It also specifies that it is a 2 byte signed load. For the Tiny MPM, that needs to be a 1-byte load, which is what is specified in util-ac-mpm-tile.c, but the #undef was removing that definition.	11 years ago
Ken Steele	92a821cdd9	Fix make distcheck for Tile. src/Makefile.am was missing util-mpm-ac-tile-small.c, which caused release tarballs to fail to build on Tile-Gx.	12 years ago
Ken Steele	326d5d3e15	Add 8-bit states to ac-tile. When running with sgh-mpm-context: full, many more MPMs are created (16K) and many are small. If they have fewer than 128 states, they only need 1 byte for the next state instead of 2 bytes, cutting the size of the next-state table in half. This reduces total memory usage. Since that makes 3 different state sizes (1, 2 and 4 bytes), rather than going from 2 copies of the code to create the MPM to 3, I factored out the code that fills the next-state table into three functions so that all the other code could be the same. The search function is now parameterized for 8-bit and 16-bit state sizes and alphabet sizes 8, 16, 32, 64, 128 and 256.	12 years ago
Ken Steele	3870def601	Split AC-Tile MPM context into Search and Initialization structures. Some of the fields in the SCACTileCtx struct are only used to create the MPM, but are not needed to search the MPM. Create a new structure to contain just the data needed by AC Search. After creating the MPM, copy the data into the new structure and then free the memory only needed during initialization. This reduces the size of the AC-Tile MPM context from 1360 bytes down to 296 bytes.	12 years ago
Ken Steele	e05034f5dd	New Multi-pattern matcher, ac-tile, optimized for Tile architecture. Aho-Corasick mpm optimized for Tilera Tile-Gx architecture. Based on the util-mpm-ac.c code base. The primary optimizations are: 1) Matching function uses Tilera-specific instructions. 2) Alphabet compression to reduce delta table size to increase cache utilization and performance. The basic observation is that not all 256 ASCII characters are used by the set of multiple patterns in a group for which a DFA is created. The first reason is that Suricata's pattern matching is case-insensitive, so all uppercase characters are converted to lowercase, leaving a hole of 26 characters in the alphabet. Previously, this hole was simply left in the middle of the alphabet and thus in the generated Next State (delta) tables. A new, smaller, alphabet is created using a translation table of 256 bytes per mpm group. Previously, there was one global translation table for converting upper case to lowercase. Additional, unused characters are found by creating a histogram of all the characters in all the patterns. Then all the characters with zero counts are mapped to one character (0) in the new alphabet. Since these characters appear in no pattern, they can all be mapped to a single character and still result in the same matches being found. Zero was chosen for the value in the new alphabet since this "character" is more likely to appear in the input. The unused character always results in the next state being state zero, but that fact is not currently used by the code, since special casing takes additional instructions. The characters that do appear in some pattern are mapped to consecutive characters in the new alphabet, starting at 1. This results in a dense packing of next state values in the delta tables and additionally can allow for a smaller number of columns in that table, thus using less memory and better packing into the cache.
The size of the new alphabet is the number of used characters plus 1 for the unused catch-all character. The alphabet size is rounded up to the next larger power-of-2 so that multiplication by the alphabet size can be done with a shift. It might be possible to use a multiply instruction, so that the exact alphabet size could be used, which would further reduce the size of the delta tables, increase cache density and not require the specialized search functions. The multiply would likely add 1 cycle to the inner search loop. Since the multiply by alphabet-size is cleverly merged with a mask instruction (in the SINDEX macro), specialized versions of the SCACSearch function are generated for alphabet sizes 256, 128, 64, 32 and 16. This is done by including the file util-mpm-ac-small.c multiple times with a redefined SINDEX macro. A function pointer is then stored in the mpm context for the search function. For alphabet sizes of 8 or smaller, the number of states is usually small, so the DFA is already very small and there is little difference using the 16 state search function. The SCACSearch function is also specialized by the size of the value stored in the next state (delta) tables, either 16-bits or 32-bits. This removes a conditional inside the Search function. That conditional is only evaluated once, but it doesn't hurt to remove it. 16-bits are used for up to 32K states, with the sign bit set for states with matches. Future optimization: The state-has-match value is only needed per state, not per next state, so checking the next-state sign bit could be replaced with reading a different value, at the cost of an additional load, but increasing the 16-bit next state span to 64K. Since the order of the characters in the new alphabet doesn't matter, the new alphabet could be sorted by the frequency of the characters in the expected input stream for that multi-pattern matcher.
This would group more frequent characters into the same cache lines, thus increasing the probability of reusing a cache-line. All the next state values for each state live in their own set of cache-lines. With power-of-two sized alphabets, these don't overlap. So either 32 or 16 characters' next states are loaded in each cache line load. If the alphabet size is not an exact power-of-2, then the last cache-line is not completely full and up to 31*2 bytes of that line could be wasted per state. The next state table could be transposed, so that all the next states for a specific character are stored sequentially; this could be better if some characters, for example the unused character, are much more frequent.	12 years ago
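
The alphabet compression described in commit e05034f5dd can be illustrated with a minimal sketch. This is not the code from util-mpm-ac-tile.c; the function names `build_translate_table` and `round_up_pow2` are hypothetical, and the sketch assumes NUL-terminated patterns. It builds the histogram, maps unused bytes to 0, maps used bytes to consecutive values starting at 1, and rounds the alphabet size up to a power of two.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: smallest power of two >= n (n >= 1). */
static int round_up_pow2(int n)
{
    int p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical helper: fill xlate[256] so that bytes appearing in no
 * pattern map to 0 and bytes that do appear map to 1, 2, 3, ...
 * Uppercase letters share the index of their lowercase form.
 * Returns the alphabet size rounded up to a power of two. */
static int build_translate_table(const char **patterns, size_t npat,
                                 uint8_t xlate[256])
{
    uint32_t hist[256] = {0};

    /* Histogram of case-folded pattern bytes. */
    for (size_t i = 0; i < npat; i++) {
        const unsigned char *p = (const unsigned char *)patterns[i];
        for (; *p != '\0'; p++)
            hist[tolower(*p)]++;
    }

    /* Unused bytes stay at 0; used bytes get consecutive indexes. */
    memset(xlate, 0, 256);
    int next = 1;
    for (int c = 0; c < 256; c++) {
        if (hist[c] > 0)
            xlate[c] = (uint8_t)next++;
    }
    assert(next <= 256);

    /* Fold uppercase input bytes onto the lowercase mapping. */
    for (int c = 'A'; c <= 'Z'; c++)
        xlate[c] = xlate[tolower(c)];

    /* Used characters plus one catch-all, rounded up for shifting. */
    return round_up_pow2(next);
}
```

With the two patterns "Foo" and "bar", five distinct characters are used, so the alphabet has six entries and is rounded up to 8. A search loop would then translate each input byte through `xlate` before indexing the next-state table, so the multiply by alphabet size becomes a shift.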

8 Commits (dad1f85edb59406a00164e6533c31ca12253b790)
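
The over-counting caveat noted in commit 7a2095d851 can be reproduced with a small sketch. This is an illustration only, not Suricata's search code: `count_matches` and `equal_nocase` are hypothetical names, and a naive scan stands in for the Aho-Corasick state machine. The point is the order of checks: the local bit array, indexed by pattern index, is consulted before the case-sensitive verification.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: case-insensitive comparison of n bytes. */
static int equal_nocase(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical sketch: count hits of one case-sensitive pattern.
 * Candidates are found case-insensitively; the per-search bit array
 * is checked first, and a set bit counts a match without verifying
 * the case again. */
static int count_matches(const char *pat, uint32_t pat_idx, const char *buf)
{
    uint8_t bitarray[32] = {0};   /* stack allocated, 256 pattern indexes */
    size_t plen = strlen(pat);
    size_t blen = strlen(buf);
    int count = 0;

    assert(pat_idx < 256);

    for (size_t i = 0; i + plen <= blen; i++) {
        if (!equal_nocase(buf + i, pat, plen))
            continue;             /* not even a case-insensitive hit */

        if (bitarray[pat_idx / 8] & (1u << (pat_idx % 8))) {
            count++;              /* already seen: counted unverified */
            continue;
        }
        if (memcmp(buf + i, pat, plen) == 0) {
            bitarray[pat_idx / 8] |= (uint8_t)(1u << (pat_idx % 8));
            count++;              /* first verified case-sensitive hit */
        }
    }
    return count;
}
```

Under these assumptions, searching for "Foo" in "foo Foo foo" rejects the first word, verifies the second, and counts the third without verification, giving 2 as the commit message describes.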