After a gap in a file transaction, the file tracker is truncated. However
this did not clear any stored out of order chunks from memory or stop more
chunks to be stored, leading to accumulation of a large number of chunks.
This patches fixes this be clearing the stored chunks on trunc. It also
makes sure no more chunks are stored in the tracker after the trunc.
Bug: #5781.
Issue: 2497
This changeset provides subsystem and module identifiers in the log when
the log format string contains "%S". By convention, the log format
surrounds "%S" with brackets.
The subsystem name is generally the same as the thread name. The module
name is derived from the source code module name and usually consists of
the first one or 2 segments of the name using the dash character as the
segment delimiter.
Remove the distinction between the C template protocol "template" and
the Rust template protocol "template-rust" and make the Rust parser
simply template now that we no longer have support to generate a C
protocol template.
Remove the app-layer-PROTO stub for Rust based parsers. It is no longer
needed as Rust parsers now contain the registration function in Rust.
Ticket: 4939
Use the lzma-rs crate for decompressing swf/lzma files instead of
the lzma decompressor in libhtp. This decouples suricata from libhtp
except for actual http parsing, and means libhtp no longer has to
export a lzma decompression interface.
Ticket: #5638
Fuzzing highlighted an issue where a command sequence on the same file
id triggered a logging issue:
file data for id N
close id N
file data for id N
If this happened in a single blob of data passed to the parser, the
existing file tx would be reused, the file "reopened", confusing the
file logging logic. This would trigger a debug assert.
This patch makes sure a new file tx is created for the file data
coming in after the first file tx is closed.
Bug: #5567.
It was not enough to set Cursor position to 0,
also its inner Vec should be cleared.
This way, a new input gets written at the beginning of the
Cursor and its inner Vec...
Ticket: #5691
Some fields that were previously strings are not always value UTF-8
data, instead the protocol specification refers to them as strings of
bytes, so in other words byte arrays.
Currently fields converted are:
- client_version
- info_hash
- response.id
- request.id
- nodes
- token
These variants, in particular the Brotli one can be large at over 2500
bytes which is allocated no matter which decompressor is being used.
Gzip comes in at over 500 bytes. Box deflate for consistency.
Where possible mark the relevant functions unsafe. Otherwise suppress
the warning for now as this pattern is supposed to be a safe API around
an unsafe one. Might need some further investigation, but in general the
"guarantee" here is provided from the C side.
Use .is_some() and .is_none() instead of comparing against None.
Comparing against None requires a value to impl PartialEq, is_none() and
is_some() do not and are more idiomatic.
This patch updates the NT status code definition to use the status
definition used on Microsoft documentation website. A first python
script is building JSON object with code definition.
```
import json
from bs4 import BeautifulSoup
import requests
ntstatus = requests.get('https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55')
ntstatus_parsed = BeautifulSoup(ntstatus.text, 'html.parser')
ntstatus_parsed = ntstatus_parsed.find('tbody')
ntstatus_dict = {}
for item in ntstatus_parsed.find_all('tr'):
cell = item.find_all('td')
if len(cell) == 0:
continue
code = cell[0].find_all('p')
description_ps = cell[1].find_all('p')
description_list = []
if len(description_ps):
for desc in description_ps:
if not desc.string is None:
description_list.append(desc.string.replace('\n ', ''))
else:
description_list = ['Description not available']
if not code[0].string.lower() in ntstatus_dict:
ntstatus_dict[code[0].string.lower()] = {"text": code[1].string, "desc": ' '.join(description_list)}
print(json.dumps(ntstatus_dict))
```
The second one is generating the code that is ready to be inserted into the
source file:
```
import json
ntstatus_file = open('ntstatus.json', 'r')
ntstatus = json.loads(ntstatus_file.read())
declaration_format = 'pub const SMB_NT%s:%su32 = %s;\n'
resolution_format = ' SMB_NT%s%s=> "%s",\n'
declaration = ""
resolution = ""
text_max = len(max([ntstatus[x]['text'] for x in ntstatus.keys()], key=len))
for code in ntstatus.keys():
text = ntstatus[code]['text']
text_spaces = ' ' * (4 + text_max - len(text))
declaration += declaration_format % (text, text_spaces, code)
resolution += resolution_format % (text, text_spaces, text)
print(declaration)
print('\n')
print('''
pub fn smb_ntstatus_string(c: u32) -> String {
match c {
''')
print(resolution)
print('''
_ => { return (c).to_string(); },
}.to_string()
}
''')
```
Bug #5412.
Introduce AppLayerTxData::file_tx as direction(s) indicator for transactions.
When set to 0, its not a file tx and it will not be considered for file
inspection, logging and housekeeping tasks.
Various tx loop optimizations in housekeeping and output.
Update the "file capable" app-layers to set the fields based on their
directional file support as well as on the traffic.
Update APIs to store files in transactions instead of the per flow state.
Goal is to avoid the overhead of matching up files and transactions in
cases where there are many of both.
Update all protocol implementations to support this.
Update file logging logic to account for having files in transactions. Instead
of it acting separately on file containers, it is now tied into the
transaction logging.
Update the filestore keyword to consider a match if filestore output not
enabled.