doc: fast-pattern

10 years ago · 33e96c5087
parent a464573230
commit 33e96c5087
3 changed files with 246 additions and 1 deletions
--- a/doc/sphinx/fast-pattern-explained.rst
+++ b/doc/sphinx/fast-pattern-explained.rst
@ -0,0 +1,186 @@
+Suricata Fast Pattern Determination Explained
+=============================================
+
+If the 'fast_pattern' keyword is explicitly set in a rule, Suricata
+will use that as the fast pattern match.  The 'fast_pattern' keyword
+can only be set once per rule.  If 'fast_pattern' is not set, Suricata
+automatically determines the content to use as the fast pattern match.
+
+The following explains the logic Suricata uses to automatically
+determine the fast pattern match to use.
+
+Be aware that if there are positive (i.e. non-negated) content
+matches, then negated content matches are ignored for fast pattern
+determination.  Otherwise, negated content matches are considered.
+
+Suricata 1.1.x - 1.4.x
+----------------------
+
+#. The longest (in terms of character/byte length) content match is
+   used as the fast pattern match.
+
+#. If multiple content matches qualify for the longest length, the one
+   with the highest character/byte diversity score ("Pattern
+   Strength") is used as the fast pattern match.  See :ref:`Appendix C
+   <fast-pattern-explained-appendix-c>` for details on the algorithm
+   used to determine Pattern Strength.
+
+#. If multiple content matches qualify for the longest length and have
+   the same highest Pattern Strength, the buffer that has the *lower
+   "list_id"* is used as the fast pattern match.  See :ref:`Appendix A
+   <fast-pattern-explained-appendix-a>` for the list_id of each
+   buffers/list.
+
+#. If multiple content matches qualify for the longest length and have
+   the same highest Pattern Strength, and have the same list_id
+   (i.e. are looking in the same buffer), then the one that comes
+   first (from left-to-right) in the rule is used as the fast pattern
+   match.
+
+It is worth noting that for content matches that have the same length
+and Pattern Strength, regular 'content' matches take precedence over
+matches that use the 'http_*' buffers.
+
+Suricata 2.0.x
+--------------
+
+#. Suricata first identifies all content matches that have the highest
+   "priority" that are used in the signature.  The priority is based
+   off of the buffer being matched on and generally 'http_*' buffers
+   have a higher priority (lower number is higher priority).  See
+   :ref:`Appendix B <fast-pattern-explained-appendix-b>` for details
+   on which buffers have what priority.
+#. Within the content matches identified in step 1 (the highest
+   priority content matches), the longest (in terms of character/byte
+   length) content match is used as the fast pattern match.
+#. If multiple content matches have the same highest priority and
+   qualify for the longest length, the one with the highest
+   character/byte diversity score ("Pattern Strength") is used as the
+   fast pattern match.  See :ref:`Appendix C
+   <fast-pattern-explained-appendix-c>` for details on the algorithm
+   used to determine Pattern Strength.
+#. If multiple content matches have the same highest priority, qualify
+   for the longest length, and the same highest Pattern Strength, the
+   buffer ("list_id") that was *registered last* is used as the fast
+   pattern match.  See :ref:`Appendix B
+   <fast-pattern-explained-appendix-b>` for the registration order of
+   the different buffers/lists.
+#. If multiple content matches have the same highest priority, qualify
+   for the longest length, the same highest Pattern Strength, and have
+   the same list_id (i.e. are looking in the same buffer), then the
+   one that comes first (from left-to-right) in the rule is used as
+   the fast pattern match.
+
+It is worth noting that for content matches that have the same
+priority, length, and Pattern Strength, 'http_stat_msg',
+'http_stat_code', and 'http_method' take precedence over regular
+'content' matches.
+
+Appendices
+----------
+
+.. _fast-pattern-explained-appendix-a:
+
+Appendix A - Buffers, list_id values, and Registration Order for Suricata 1.3.4
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This should be pretty much the same for Suricata 1.1.x - 1.4.x.
+
+======= ============================== ======================== ==================
+list_id Content Modifier Keyword       Buffer Name              Registration Order
+======= ============================== ======================== ==================
+1       <none> (regular content match) DETECT_SM_LIST_PMATCH    1 (first)
+2       http_uri                       DETECT_SM_LIST_UMATCH    2
+6       http_client_body               DETECT_SM_LIST_HCBDMATCH 3
+7       http_server_body               DETECT_SM_LIST_HSBDMATCH 4
+8       http_header                    DETECT_SM_LIST_HHDMATCH  5
+9       http_raw_header                DETECT_SM_LIST_HRHDMATCH 6
+10      http_method                    DETECT_SM_LIST_HMDMATCH  7
+11      http_cookie                    DETECT_SM_LIST_HCDMATCH  8
+12      http_raw_uri                   DETECT_SM_LIST_HRUDMATCH 9
+13      http_stat_msg                  DETECT_SM_LIST_HSMDMATCH 10
+14      http_stat_code                 DETECT_SM_LIST_HSCDMATCH 11
+15      http_user_agent                DETECT_SM_LIST_HUADMATCH 12 (last)
+======= ============================== ======================== ==================
+
+Note: registration order doesn't matter when it comes to determining the fast pattern match for Suricata 1.3.4 but list_id value does.
+
+.. _fast-pattern-explained-appendix-b:
+
+Appendix B - Buffers, list_id values, Priorities, and Registration Order for Suricata 2.0.7
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This should be pretty much the same for Suricata 2.0.x.
+
+========================================== ================== ============================== ============================= =======
+Priority (lower number is higher priority) Registration Order Content Modifier Keyword       Buffer Name                   list_id
+========================================== ================== ============================== ============================= =======
+3                                          11                 <none> (regular content match) DETECT_SM_LIST_PMATCH         1
+3                                          12                 http_method                    DETECT_SM_LIST_HMDMATCH       12
+3                                          13                 http_stat_code                 DETECT_SM_LIST_HSCDMATCH      9
+3                                          14                 http_stat_msg                  DETECT_SM_LIST_HSMDMATCH      8
+2                                          1 (first)          http_client_body               DETECT_SM_LIST_HCBDMATCH      4
+2                                          2                  http_server_body               DETECT_SM_LIST_HSBDMATCH      5
+2                                          3                  http_header                    DETECT_SM_LIST_HHDMATCH       6
+2                                          4                  http_raw_header                DETECT_SM_LIST_HRHDMATCH      7
+2                                          5                  http_uri                       DETECT_SM_LIST_UMATCH         2
+2                                          6                  http_raw_uri                   DETECT_SM_LIST_HRUDMATCH      3
+2                                          7                  http_host                      DETECT_SM_LIST_HHHDMATCH      10
+2                                          8                  http_raw_host                  DETECT_SM_LIST_HRHHDMATCH     11
+2                                          9                  http_cookie                    DETECT_SM_LIST_HCDMATCH       13
+2                                          10                 http_user_agent                DETECT_SM_LIST_HUADMATCH      14
+2                                          15 (last)          dns_query                      DETECT_SM_LIST_DNSQUERY_MATCH 20
+========================================== ================== ============================== ============================= =======
+
+Note: list_id value doesn't matter when it comes to determining the
+fast pattern match for Suricata 2.0.7 but registration order does.
+
+.. _fast-pattern-explained-appendix-c:
+
+Appendix C - Pattern Strength Algorithm
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+From detect-engine-mpm.c.  Basically the Pattern Strength "score"
+starts at zero and looks at each character/byte in the passed in byte
+array from left to right.  If the character/byte has not been seen
+before in the array, it adds 3 to the score if it is an alpha
+character; else it adds 4 to the score if it is a printable character,
+0x00, 0x01, or 0xFF; else it adds 6 to the score.  If the
+character/byte has been seen before it adds 1 to the score.  The final
+score is returned.
+
+.. code-block:: c
+
+   /** \brief Predict a strength value for patterns
+    *
+    *  Patterns with high character diversity score higher.
+    *  Alpha chars score not so high
+    *  Other printable + a few common codes a little higher
+    *  Everything else highest.
+    *  Longer patterns score better than short patters.
+    *
+    *  \param pat pattern
+    *  \param patlen length of the patternn
+    *
+    *  \retval s pattern score
+    */
+    uint32_t PatternStrength(uint8_t *pat, uint16_t patlen) {
+	uint8_t a[256];
+	memset(&a, 0 ,sizeof(a));
+	uint32_t s = 0;
+	uint16_t u = 0;
+	for (u = 0; u < patlen; u++) {
+	    if (a[pat[u]] == 0) {
+		if (isalpha(pat[u]))
+		    s += 3;
+		else if (isprint(pat[u]) || pat[u] == 0x00 || pat[u] == 0x01 || pat[u] == 0xFF)
+		    s += 4;
+		else
+		    s += 6;
+		a[pat[u]] = 1;
+	    } else {
+		s++;
+	    }
+	}
+	return s;
+    }
--- a/doc/sphinx/fast-pattern.rst
+++ b/doc/sphinx/fast-pattern.rst
@ -1,4 +1,63 @@
 Fast Pattern
 ============

-Just a place holder now to demontrate linking.
+.. toctree::
+
+   fast-pattern-explained
+
+Only one content of a signature will be used in the Multi Pattern
+Matcher (MPM). If there are multiple contents, then Suricata uses the
+'strongest' content. This means a combination of length, how varied a
+content is, and what buffer it is looking in. Generally, the longer
+and more varied the better. For full details on how Suricata
+determines the fast pattern match, see :doc:`fast-pattern-explained`.
+
+Sometimes a signature writer concludes he wants Suricata to use
+another content than it does by default.
+
+For instance::
+
+  User-agent: Mozilla/5.0 Badness;
+
+  content:”User-Agent|3A|”;
+  content:”Badness”; distance:0;
+
+In this example you see the first content is longer and more varied
+than the second one, so you know Suricata will use this content for
+the MPM.  Because 'User-Agent:' will be a match very often, and
+'Badness' appears less often in network traffic, you can make Suricata
+use the second content by using 'fast_pattern'.
+
+::
+
+  content:”User-Agent|3A|”;
+  content:”Badness”; distance:0; fast_pattern;
+
+The keyword fast_pattern modifies the content previous to it.
+
+.. image:: fast-pattern/fast_pattern.png
+
+Fast-pattern can also be combined with all previous mentioned
+keywords, and all mentioned HTTP-modifiers.
+
+fast_pattern:only
+-----------------
+
+Sometimes a signature contains only one content. In that case it is
+not necessary Suricata will check it any further after a match has
+been found in MPM. If there is only one content, the whole signature
+matches. Suricata notices this automatically. In some signatures this
+is still indicated with 'fast_pattern:only;'. Although Suricata does
+not need fast_pattern:only, it does support it.
+
+Fast_pattern: 'chop'
+--------------------
+
+If you do not want the MPM to use the whole content, you can use
+fast_pattern 'chop'.
+
+For example::
+
+  content: “aaaaaaaaabc”; fast_pattern:8,4;
+
+This way, MPM uses only the last four characters.
--- a/doc/sphinx/fast-pattern/fast_pattern.png
+++ b/doc/sphinx/fast-pattern/fast_pattern.png