Kernel-Level Threat Hunting: Detecting What EDR Misses

You run a threat hunt. Your EDR shows a clean process tree. Network logs show expected connections. File events look normal. But something is still wrong.

The problem isn't your tooling or your hunting skills. It's the telemetry layer. EDR operates in user space, collecting events after the kernel has already processed them. By the time your EDR agent sees a process spawn or network connection, multiple layers of abstraction have filtered what actually happened.

Kernel-level telemetry captures syscalls at the boundary between user space and kernel space. Before any filtering. Before any obfuscation. This matters more than most threat hunters realize.

What User-Space Telemetry Actually Captures

EDR agents do most of their work as user-space processes, even when they ship a kernel driver or sensor. They hook into the operating system through documented interfaces: ETW on Windows, the audit subsystem on Linux. These interfaces provide event streams that EDR vendors process and forward.

The issue is selection bias. The OS, and the ruleset configured on it, decides what to surface through these APIs. On Linux, for example, a typical audit ruleset logs process execution through execve() but has no rules for fork() or clone(). An attacker using process injection through ptrace() or anonymous in-memory files through memfd_create() generates syscalls that reach the audit log only if someone wrote a rule for them, and most standard rulesets do not.

Windows ETW has similar gaps. ETW providers emit events for high-level operations, but low-level memory manipulation through NtWriteVirtualMemory or thread creation through NtCreateThreadEx often produces no event in the providers most agents subscribe to. The kernel processes these operations, but user-space monitoring can miss them.

Your EDR sees a curated view of system activity. It's extremely useful for detecting known patterns. But it's not the raw execution context.

Syscall-Level Threat Hunting Techniques

Kernel telemetry captures every syscall. Not just process creation or file access, but the operations that happen before an attacker establishes persistence or exfiltrates data.

Consider fileless malware that uses memfd_create() to create an anonymous, memory-backed file descriptor, writes executable code to it, then executes it through execveat(), fexecve(), or execve() on the descriptor's /proc/self/fd/<n> path. Your EDR sees a process spawn. The parent looks legitimate. The command line is clean. But you have no file path on disk because there is no file on disk.

At the syscall layer, you see the sequence: memfd_create(), write() with an executable payload, then execve() on /proc/self/fd/<n> or execveat() on the descriptor itself. Few legitimate programs do this, so the pattern is a strong signal. It is also easy to miss in user-space monitoring.
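The sequence above can be matched with a small state machine over a syscall stream. The sketch below is illustrative only: the SyscallEvent record and its field names are assumptions about what a collector might emit, not the format of any particular tool.

```python
import re
from dataclasses import dataclass

@dataclass
class SyscallEvent:
    # Hypothetical event record; field names are assumptions for this sketch.
    pid: int
    name: str        # syscall name, e.g. "memfd_create"
    ret: int = 0     # return value (the new fd for memfd_create)
    fd: int = -1     # fd argument for write() / execveat()
    path: str = ""   # path argument for execve()

# execve() on a memfd is usually done through a /proc/<pid|self>/fd/<n> path.
PROC_FD = re.compile(r"^/proc/(?:self|\d+)/fd/(\d+)$")

def find_memfd_exec(events):
    """Return (pid, fd) pairs where a memfd was written to and then executed."""
    memfds = {}  # (pid, fd) -> True once the memfd has been written to
    hits = []
    for ev in events:
        if ev.name == "memfd_create" and ev.ret >= 0:
            memfds[(ev.pid, ev.ret)] = False
        elif ev.name == "write" and (ev.pid, ev.fd) in memfds:
            memfds[(ev.pid, ev.fd)] = True
        elif ev.name in ("execve", "execveat"):
            fd = ev.fd
            match = PROC_FD.match(ev.path)
            if match:
                fd = int(match.group(1))
            if memfds.get((ev.pid, fd)):
                hits.append((ev.pid, fd))
    return hits
```

A production matcher would also need to follow descriptors across dup() and fork(), which this sketch ignores.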

Or take credential dumping through /proc filesystem reads. An attacker process opens /proc/[pid]/mem to read another process's memory. Access is gated by the kernel's ptrace-style permission checks rather than by ordinary file permissions alone, so a sufficiently privileged process can often read it. User-space EDR logs might show a generic file read. The syscall trace shows open() or openat() against a process memory file followed by lseek() and read() at specific offsets. When the target process holds secrets, that pattern points to credential theft, not log analysis.
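A first-pass filter for this case only needs the caller's PID and the path passed to open() or openat(). The following is a minimal sketch; the function name is hypothetical, and it assumes the collector reports the path exactly as the process supplied it.

```python
import re

# Matches /proc/self/mem and /proc/<pid>/mem, nothing else under /proc.
PROC_MEM = re.compile(r"^/proc/(self|\d+)/mem$")

def is_cross_process_mem_open(caller_pid: int, path: str) -> bool:
    """True when a process opens the memory file of a different process."""
    match = PROC_MEM.match(path)
    if not match:
        return False
    target = match.group(1)
    if target == "self":
        return False  # a process reading its own memory
    return int(target) != caller_pid
```

Debuggers and some profilers open these files legitimately, so a filter like this is a starting point for review rather than a verdict.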

Network evasion follows the same principle. An attacker using raw sockets or AF_PACKET sockets to bypass the TCP/IP stack generates socket(), bind(), sendto() syscalls with protocol parameters that user-space network monitoring never sees. Your EDR shows normal TCP connections. The kernel trace shows raw packet manipulation.
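The protocol parameters in question are the first two arguments to socket(). As a rough sketch, the classifier below uses the Linux numeric values for the address family and socket type constants; the function name and the three labels are made up for illustration.

```python
# Linux numeric values for the constants passed to socket(2).
AF_INET = 2
AF_INET6 = 10
AF_PACKET = 17
SOCK_RAW = 3
# The type argument may carry SOCK_NONBLOCK / SOCK_CLOEXEC flags in its
# upper bits; the low four bits hold the actual socket type.
SOCK_TYPE_MASK = 0xF

def classify_socket(family: int, sock_type: int) -> str:
    """Label a socket() call as 'packet', 'raw', or 'normal'."""
    base_type = sock_type & SOCK_TYPE_MASK
    if family == AF_PACKET:
        return "packet"  # link-layer access, below the IP stack
    if family in (AF_INET, AF_INET6) and base_type == SOCK_RAW:
        return "raw"     # caller-built IP payloads, above the link layer
    return "normal"
```

Tools such as ping, tcpdump, and DHCP clients also open these socket types, so the label is most useful when compared against what a given workload normally does.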

Baseline Deviation at the Kernel Boundary

Traditional threat hunting techniques rely on identifying anomalies in user behavior, process relationships, or network patterns. Kernel telemetry extends this to system call sequences.

Every legitimate application has a syscall profile. A web server primarily uses accept(), read(), write(), sendfile(). A database uses mmap(), fsync(), pread(), pwrite(). Deviation from these profiles indicates either a new code path or malicious activity.
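A syscall profile can be as simple as the share of calls each syscall accounts for during the baseline period. The sketch below shows that simplest form, assuming the input is a flat list of syscall names per application; both function names are hypothetical.

```python
from collections import Counter

def build_profile(syscalls):
    """Map each syscall name to its share of all calls in the baseline."""
    counts = Counter(syscalls)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

def unseen_syscalls(profile, window, min_share=0.0):
    """Return syscalls in the window whose baseline share is <= min_share.

    With the default min_share of 0.0 this reports only syscalls that never
    appeared in the baseline.
    """
    return sorted({s for s in window if profile.get(s, 0.0) <= min_share})
```

For example, a web server baseline built from accept(), read(), write(), and sendfile() would report ptrace() or memfd_create() in a later window as unseen. Raising min_share also surfaces syscalls that were present but rare.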

The value increases with baseline duration. At 7 days, syscall baselines have enough data to identify rare but legitimate operations. At 30 days, false positive rates drop to 0.42%. At 180 days, you're at 0.18%. Compare that to signature-based detection, which has no learning curve but also no environmental context.

This works across three dimensions simultaneously. User baselines capture what individual operators do. Role baselines capture what groups of users in similar functions do. Infrastructure baselines capture what specific clusters or node groups do. An anomaly in all three dimensions is almost never legitimate.

Real example: A developer account spawns a shell, runs kubectl, creates a pod, then that pod immediately makes outbound connections to an external IP. User baseline: this account has never touched production. Role baseline: developers in this group don't deploy to production. Infrastructure baseline: this cluster doesn't run customer-facing workloads. The combination flags it instantly.
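The combination logic in that example can be sketched as three independent lookups whose results are collected together. The Baseline class, the action strings, and the function name below are all hypothetical; a real baseline would hold far richer state than a set of previously seen actions.

```python
from dataclasses import dataclass, field

@dataclass
class Baseline:
    # Hypothetical baseline: just the set of actions observed so far.
    seen: set = field(default_factory=set)

    def is_anomalous(self, action: str) -> bool:
        return action not in self.seen

def anomaly_dimensions(action, user, role, infra):
    """Return the names of the baselines in which the action is new."""
    checks = (("user", user), ("role", role), ("infrastructure", infra))
    return [name for name, baseline in checks if baseline.is_anomalous(action)]

# The developer-account scenario, reduced to one action string.
user = Baseline({"kubectl:get:staging"})
role = Baseline({"kubectl:get:staging", "kubectl:apply:staging"})
infra = Baseline({"pod:egress:internal"})
flagged = anomaly_dimensions("pod:egress:external", user, role, infra)
# All three baselines report the action as new, so it gets top priority.
```

Returning the list of dimensions rather than a single boolean lets the hunter rank alerts: one dimension is worth a look, all three is worth an interruption.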

Detection at the Pre-Encryption Boundary

Application-layer encryption is everywhere now. TLS 1.3 encrypts most of the handshake, and Encrypted Client Hello (ECH) hides the SNI field where it is deployed. DNS over HTTPS encrypts queries. Even internal service meshes commonly use mTLS by default. User-space network monitoring sees encrypted payloads and has to rely on metadata analysis.

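Kernel-side telemetry can capture data before encryption, with one caveat. Most applications perform TLS in a user-space library such as OpenSSL, so the bytes passed to write() or send() on a socket are already ciphertext unless kernel TLS (kTLS) is in use. The plaintext is visible one step earlier, at the library call: eBPF tooling can attach uprobes to functions such as SSL_write() and SSL_read() and observe the buffer before it is encrypted. Either way, the syscall boundary still shows which process sent how much, on which descriptor, and when.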

This doesn't mean you're breaking encryption or intercepting credentials. It means you see the actual data size, timing, and patterns before they're obscured. A data exfiltration operation that sends 50MB through an HTTPS POST shows up as a series of write() calls with specific byte counts at specific intervals. The application might be curl or a custom script. The syscall pattern is identical either way.
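Byte counts and timing are enough to spot a bulk transfer regardless of which program performs it. As a sketch, the function below sums outbound bytes per process and descriptor in fixed time windows; the event tuple layout, the function name, and the 50 MiB default threshold are assumptions chosen to mirror the example above.

```python
from collections import defaultdict

def flag_bulk_transfers(events, threshold_bytes=50 * 1024 * 1024, window_s=60.0):
    """Flag (pid, fd, window_index) keys whose outbound bytes reach the threshold.

    events: iterable of (timestamp_s, pid, fd, nbytes), one per successful
    write()/send() on a socket, where nbytes is the syscall's return value.
    """
    totals = defaultdict(int)
    for ts, pid, fd, nbytes in events:
        if nbytes > 0:  # ignore failed or zero-length calls
            totals[(pid, fd, int(ts // window_s))] += nbytes
    return sorted(key for key, total in totals.items() if total >= threshold_bytes)
```

Fixed windows are the simplest choice and can split one burst across two buckets; a sliding window avoids that at the cost of more state.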

The same applies to file operations. An attacker encrypting files for ransomware has to read the original file, encrypt it in memory, then write it back. User-space monitoring sees file modification events. Syscall telemetry sees read() followed by computation (visible as CPU time without syscalls) followed by write() with different byte counts. That's encryption behavior.
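One coarse way to act on this is to count how many distinct files a single process reads and then writes back. The sketch below assumes the collector has already resolved each descriptor to a path and emits (pid, syscall, path) tuples in time order; the function names and the threshold of 20 files are illustrative, not tuned values.

```python
from collections import defaultdict

def rewrite_counts(events):
    """Count, per pid, the distinct paths that were read and later written.

    events: iterable of (pid, syscall_name, path) in time order.
    """
    read_seen = set()             # (pid, path) pairs that have been read
    rewritten = defaultdict(set)  # pid -> paths written after being read
    for pid, syscall, path in events:
        if syscall == "read":
            read_seen.add((pid, path))
        elif syscall == "write" and (pid, path) in read_seen:
            rewritten[pid].add(path)
    return {pid: len(paths) for pid, paths in rewritten.items()}

def flag_mass_rewrite(events, min_files=20):
    """Return pids that rewrote at least min_files distinct files."""
    return sorted(pid for pid, n in rewrite_counts(events).items() if n >= min_files)
```

Backup agents and build tools produce the same shape, so this count works best when compared against the per-application baseline rather than a fixed number.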

The 5% Coverage Problem

Most organizations have comprehensive security stacks. EDR on endpoints. SIEM aggregating logs. Network monitoring at the perimeter. Cloud security posture management scanning configurations. Together these cover most of the attack surface. Call it 95%.

The 5% is what happens at the kernel layer before user-space agents see it. Process injection. Memory manipulation. Syscall-level persistence. Pre-encryption data access. These techniques don't appear in your existing telemetry because they happen at a layer your current tools don't reach.

Kernel-level monitoring isn't a replacement for EDR or network security. It's coverage for the gap. You still need endpoint protection for malware detection. You still need network monitoring for traffic analysis. But when an attacker operates below the user-space abstraction layer, you need syscall visibility.

The practical reality is that advanced attackers already know about this gap. They specifically target it. Living-off-the-land techniques, fileless malware, and memory-only execution all exploit the fact that user-space monitoring has a blind spot at the kernel boundary.

Closing that gap requires telemetry at the syscall layer. Not as a replacement for your existing stack, but as the foundation layer that captures what everything else misses. The performance overhead is minimal (0.1% CPU at 1M QPS), the latency impact is negligible (actually reduces latency by an average of 5.26%), and the operational complexity is manageable with proper eBPF tooling.

Your threat hunting techniques become significantly more effective when you can see the raw execution context. Not filtered through APIs. Not abstracted by the OS. Just the actual syscalls that represent what's happening on the system. That's where the remaining 5% lives.
