Issue: 2816
This commit increase the number of VLAN layers supported by Suricata
from 2 to 3. 3-layers are dubbed "Q-in-Q-in-Q".
Note that 3 layers are not compliant with any existing standard but are
often seen in larger deployments.
Use a fixed type of max_pending_packets instead of intmax_t which can
differ based on the platform/standard library.
Should also prevent lints about possible arithmetic overflow.
When reading a loopback interface, packets are received twice: Once as
outgoing packets and once as incoming packets.
Libpcap ignores outgoing packets. With current versions of Suricata, sniffing
a single http://localhost:80 request over lo using the af-packet source
minimally shows two syn packets, two synacks and twice as many packets in
the stats entries than you'd expect when running tcpdump or Wireshark.
Issue: 5718
This commit switches the majority of time handling to a new type --
SCTime_t -- which is a 64 bit container for time:
- 44 bits -- seconds
- 20 bits -- useconds
Tested on Fedora 37 with clang 15.
app-layer.c:1055:27: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
void AppLayerSetupCounters()
^
void
app-layer.c:1176:29: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
void AppLayerDeSetupCounters()
^
void
2 errors generated.
Each module (thread) updates its status to indicate running.
Main thread awaits for all threads to be in a running state
before continuing the initialisation process
Implements feature 5384
(https://redmine.openinfosecfoundation.org/issues/5384)
Update DROP action handling in tunnel packets. DROP/REJECT action is set
to outer (root) and inner packet.
Check action flags both against outer (root) and inner packet.
Remove PACKET_SET_ACTION macro. Replace with RESET for the one reset usecase.
The reason to remove is to make the logic easier to understand.
Reduce scope of RESET macros.
Rename PacketTestAction to PacketCheckAction except in unittests. Keep
PacketTestAction as a wrapper around PacketCheckAction. This makes it
easier to trace the action handling in the real code.
Fix rate_filter setting actions directly.
General code cleanups.
Bug: #5571.
AF-Packet bypass function in some situations allocates EBPF bypass data
for an already bypassed flow and assigns it to the flow without any checks
Issue: #5368
As Suricata is not supporting pcap-ng we have to stick with one single
datalink type for the capture if ever we want to do pcap logging.
Assuming this, this patch introduces a function to set the link
type globally. This will be used with pcap conditional logging
to get the logging of TCP segments with the correct link type.
cppcheck:
src/source-af-packet.c:1762:19: warning: Size of pointer 'v2' used instead of size of its data. This is likely to lead to a buffer overflow. You probably intend to write 'sizeof(*v2)'. [pointerSize]
ptv->ring.v2 = SCMalloc(ptv->req.v2.tp_frame_nr * sizeof (union thdr *));
^
src/source-af-packet.c:1767:26: warning: Size of pointer 'v2' used instead of size of its data. This is likely to lead to a buffer overflow. You probably intend to write 'sizeof(*v2)'. [pointerSize]
memset(ptv->ring.v2, 0, ptv->req.v2.tp_frame_nr * sizeof (union thdr *));
^
scan-build:
CC source-af-packet.o
source-af-packet.c:1762:24: warning: Result of 'malloc' is converted to a pointer of type 'char', which is incompatible with sizeof operand type 'union thdr *' [unix.MallocSizeof]
ptv->ring.v2 = SCMalloc(ptv->req.v2.tp_frame_nr * sizeof (union thdr *));
^~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~
./util-mem.h:35:18: note: expanded from macro 'SCMalloc'
^~~~~~
1 warning generated.
Bug: #5291.
Update the packet payload after decode, instead of during IPS send.
This means the updates happen in the capture thread, and the VLAN header
is available to logging as well.
Ticket: #4805.
Ticket: #4796.
V2 (for IDS and IPS) and V3 (for IDS) are widely supported. V2 was introduced
in 2008, so we can safely assume that all systems can run V2+.
PacketSetData() can't fail unless the input pointer is NULL, which is
impossible from the af-packet paths calling it. Remove error check to
avoid possible branching.
The AFPSwitchState function would close the socket and free the
other resources when the interface went down _and_ the ref cnt was
0. However in autofp mode it was common to get to this point while
packets were still processed in the autofp worker threads, meaning
the ref cnt would not be 0. On the interface coming back up the
initialization code would overwrite the socket and rings, leading
to resource leaks.
Socket ref cnt is decremented from the v2 release callback. If the
callback would get to ref cnt 0, the packet would not be released
in the kernel, but it would (possibly) close the socket if the
iface was down, but not free other resources.
This patch changes the logic to first release the packet to the
kernel and then decrement the ref cnt and it makes the main receive
loop the only one responsible for opening and closing sockets. Wait
with closing the socket and rings until the ref count is 0, which can
happen after AFPSwitchState is called due to packets still being
processed by autofp worker threads.
Bug: #4803.
Tpacket v3 only supports workers mode, which means the packet that would
reference a socket won't leave the thread. Therefore keeping a ref count
on the socket is not needed.
This patch removes the per packet reference count increment. The decrement
was missing, so this fixes the ref cnt handling so that after a iface up/
down capture can recover.
It should also lead to a minor performance increase as we avoid a round
of atomic operations per packet.
Bug: #4804.
Bug: #4801.
The Suricata AF_PACKET code opens a socket per thread, then after some minor
setup enters a loop where the socket is poll()'d with a timeout. When the
poll() call returns a non zero positive value, the AF_PACKET ring will be
processed.
The ringbuffer processing logic has a pointer into the ring where we last
checked the ring. From this position we will inspect each frame until we
find a frame with tp_status == TP_STATUS_KERNEL (so essentially 0). This
means the frame is currently owned by the kernel.
There is a special case handling for starting the ring processing but
finding a TP_STATUS_KERNEL immediately. This logic then skip to the next
frame, rerun the check, etc until it either finds an initialized frame or
the last frame of the ringbuffer.
The problem was, however, that the initial uninitialized frame was possibly
(likely?) still being initialized by the kernel. A data race between the
notification through the socket (the poll()) and the updating of the
`tp_status` field in the frame could lead to a valid frame getting skipped.
Of note is that for example libpcap does not do frame scanning. Instead it
simply exits it ring processing loop. Also interesting is that libpcap uses
atomic loads and stores on the tp_status field.
This skipping of frames had 2 bad side effects:
1. in most cases, the buffer would be full enough that the frame would
be processed in the next pass of the ring, but now the frame would
out of order. This might have lead to packets belong to the same
flow getting processed in the wrong order.
2. more severe is the soft lockup case. The skipped frame sits at ring
buffer index 0. The rest of the ring has been cleared, after the
initial frame was skipped. As our pass of the ring stops at the end
of the ring (ptv->frame_offset + 1 == ptv->req.v2.tp_frame_nr) the code
exits the ring processing loop at goes back to poll(). However, poll()
will not indicate that there is more data, as the stale frame in the
ring blocks the kernel from populating more frames beyond it. This
is now a dead lock, as the kernel waits for Suricata and Suricata
never touches the ring until it hears from the kernel.
The scan logic will scan the whole ring at most once, so it won't
reconsider the stale frame either.
This patch addresses the issues in several ways:
1. the startup "discard" logic was fixed to not skip over kernel
frames. Doing so would get us in a bad state at start up.
2. Instead of scanning the ring, we now enter a busy wait loop
when encountering a kernel frame where we didn't expect one. This
means that if we got a > 0 poll() result, we'll busy wait until
we get at least one frame.
3. Error handling is unified and cleaned up. Any frame error now
returns the frame to the kernel and progresses the frame pointer.
4. If we find a frame that is owned by us (TP_STATUS_USER_BUSY) we
yield to poll() immediately, as the next expected status of that
frame is TP_STATUS_KERNEL.
5. the ring is no longer processed until the "end" of the ring (so
highest index), but instead we process at most one full ring size
per run.
6. Work with a copy of `tp_status` instead of accessing original touched
also by the kernel.
Bug: #4785.
When testing for fanout support a cluster-id of 1 was always being
used instead of the configured cluster-id. This limited fanout
support to only one Suricata instance.
Instead of hardcoding an ID of 1, use the configured cluster-id.
Also make cluster_id a uint16_t instead of an int in AFPThreadVars.
Redmine issue:
https://redmine.openinfosecfoundation.org/issues/3419
Until this point the SC_ATOMIC_ADD macro pointed to a 'add_fetch'
intrinsic. This patch changes it to a 'fetch_add'.
There are 2 reasons for this:
1. C11 stdatomics.h has only 'atomic_fetch_add' and no 'add_fetch'
So this patch prepares for adding support for C11 atomics.
2. It was not consistent with SC_ATOMIC_SUB, which did use 'fetch_sub'
and not 'sub_fetch'.
Most callers are not using the return value, so these are unaffected.
The callers that do use the return value are updated.