Lateral Movement Detection: The Kernel-Level Visibility Gap

A SOC analyst sees 47 SSH connections from a developer workstation to production databases in one hour. Normal or malicious? The network logs show ports, IPs, and packet counts. They don't show that 46 of those connections came from the developer's IDE running automated schema migrations, and one came from a Python reverse shell spawned by a compromised npm package.

Network-layer tools see the connection. They miss the context. That's the fundamental problem with detecting lateral movement using traditional network monitoring.

The Credential Problem in Lateral Movement Detection

Lateral movement is rarely about exploiting vulnerabilities. More often it's about using legitimate credentials to move between systems after initial compromise. An attacker with valid SSH keys, Kerberos tickets, or service account tokens looks identical to a legitimate user at the network layer.

Consider a real attack pattern: an attacker compromises a Jenkins server, extracts AWS credentials from environment variables, uses those credentials to access S3 buckets, finds database credentials in configuration files, then pivots to RDS instances. Every single connection uses valid authentication. Network logs show authorized traffic to authorized destinations.

Traditional network detection relies on signatures, anomaly detection on traffic patterns, or geo-fencing. These work when attackers behave abnormally. They fail when attackers use stolen credentials to behave exactly like the legitimate owner of those credentials.

The detection gap exists because network tools operate at OSI Layers 3 and 4. They see packets and flows. They don't see which process initiated the connection, what executable spawned that process, what user context it runs under, or what sequence of events led to that network call.

What Kernel-Level Visibility Actually Captures

The kernel sits at the syscall boundary. Every network connection, file access, and process execution is requested through a syscall, so the kernel observes each one as it happens. Recorded in order, those events form a causal chain that network tools can't see.

When a process opens a network socket, the kernel knows the process ID, parent process ID, executable path, command-line arguments, user ID, group ID, container namespace, and every file descriptor that process has open. It knows if this process was spawned by systemd on boot, by a user's interactive shell, or by a web server handling a request.
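To make the difference concrete, here is a minimal Python sketch of a connection record that carries the process context listed above, applied to the 47-connection scenario from the introduction. The class name, field names, paths, PIDs, and IP addresses are all hypothetical and chosen for illustration; a real agent would define its own schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionEvent:
    """One outbound connection enriched with host-side process context.

    Field names are hypothetical; they mirror the attributes named in the text.
    """
    pid: int
    ppid: int
    exe: str       # executable path of the connecting process
    cmdline: str
    uid: int
    dst_ip: str
    dst_port: int


def connections_by_executable(events):
    """Count connections per initiating executable."""
    return Counter(e.exe for e in events)


def make_sample_events():
    """Illustrative data only: 46 connections from an IDE, 1 from a Python process."""
    ide = [
        ConnectionEvent(4100, 1, "/opt/ide/bin/ide", "ide --run-migrations",
                        1000, "10.0.5.20", 22)
        for _ in range(46)
    ]
    shell = [
        ConnectionEvent(9317, 9301, "/usr/bin/python3", "python3 -c <payload>",
                        1000, "10.0.5.20", 22)
    ]
    return ide + shell


if __name__ == "__main__":
    counts = connections_by_executable(make_sample_events())
    for exe, n in counts.most_common():
        print(f"{n:3d}  {exe}")
```

At the network layer all 47 records share the same source, destination, and port. Grouping by the initiating executable is what separates the one outlier from the routine traffic.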

For lateral movement detection, this context transforms noise into signal. The kernel sees that ssh wasn't launched by a user typing in a terminal. It was launched by a PHP script running under www-data after that script wrote a binary to /tmp, executed it, and the binary made network calls to a pastebin service before launching ssh.

The sequence matters. Network logs show the ssh connection. Kernel telemetry shows the entire attack chain.
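The chain described above can be sketched as a walk up the parent-process links. This is a simplified illustration, not an agent implementation: the process table, the PIDs, the `/tmp/.x` path, and the rule in `is_suspicious_ssh` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProcessRecord:
    pid: int
    ppid: int
    exe: str
    user: str


def ancestry(pid: int, table: Dict[int, ProcessRecord]) -> List[ProcessRecord]:
    """Return the chain from the given process up to its oldest known ancestor."""
    chain = []
    seen = set()  # guards against cycles in malformed data
    while pid in table and pid not in seen:
        seen.add(pid)
        rec = table[pid]
        chain.append(rec)
        pid = rec.ppid
    return chain


def is_suspicious_ssh(chain, web_users=frozenset({"www-data"}), temp_dirs=("/tmp/",)):
    """Hypothetical rule: ssh whose ancestors include a web-server user's
    process and a binary executed from a temp directory."""
    if not chain or not chain[0].exe.endswith("/ssh"):
        return False
    ancestors = chain[1:]
    from_web = any(r.user in web_users for r in ancestors)
    from_tmp = any(r.exe.startswith(temp_dirs) for r in ancestors)
    return from_web and from_tmp


# Illustrative process table: one attack chain and one interactive session.
SAMPLE = {
    1: ProcessRecord(1, 0, "/usr/lib/systemd/systemd", "root"),
    800: ProcessRecord(800, 1, "/usr/sbin/php-fpm", "www-data"),
    812: ProcessRecord(812, 800, "/tmp/.x", "www-data"),
    813: ProcessRecord(813, 812, "/usr/bin/ssh", "www-data"),
    2000: ProcessRecord(2000, 1, "/usr/bin/bash", "alice"),
    2001: ProcessRecord(2001, 2000, "/usr/bin/ssh", "alice"),
}

if __name__ == "__main__":
    for pid in (813, 2001):
        chain = ancestry(pid, SAMPLE)
        path = " <- ".join(r.exe for r in chain)
        print(pid, is_suspicious_ssh(chain), path)
```

Both ssh processes produce the same kind of network flow. Only the ancestry distinguishes the one launched from an interactive shell from the one launched by a binary dropped in /tmp under the web server's account.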

An eBPF-based agent captures these events at the syscall level with sub-millisecond capture latency. In production environments running at 1 million requests per second, overhead averages 0.1% CPU. End-to-end detection latency, from event to alert, averages 98 milliseconds. This isn't theoretical. These are measured numbers from deployments monitoring billions of events daily.

Behavioral Baselines Across Three Axes

Detecting lateral movement requires understanding normal behavior. But "normal" is different for users, for roles, and for infrastructure clusters.

A database administrator SSH'ing to production databases is normal. A developer doing the same thing might be normal or might be an incident, depending on whether that specific developer has a legitimate reason to access production. A marketing team member doing it is definitely abnormal.

Role-based baselines capture this. The system learns that users with the "sre" role regularly connect to production Kubernetes nodes at 3am during incident response. Users with the "data-analyst" role connect to analytics databases during business hours using specific query tools.

Infrastructure baselines capture cluster-specific behavior. Production environments have different normal patterns than staging. A Jenkins agent making outbound HTTPS connections to GitHub is normal. The same agent making SSH connections to production databases is not, even though both use valid credentials.

User-specific baselines catch the outliers. Alice normally connects to three specific databases from two specific workstations using the Postgres client. When her credentials connect to a fourth database and transfer tables to an external IP, that's an anomaly even if her role generally permits database access.

The power comes from correlating all three simultaneously. An event that's normal for the role but abnormal for the specific user and infrastructure context gets flagged. Network tools can't build these baselines because they don't see the user or process context.
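As a rough sketch of the three-axis correlation, the example below keeps one baseline per user, per role, and per cluster, and reports which axes an event deviates on. It assumes exact set membership in place of whatever statistical model a real system would use, and the class, the method names, and the two-axis threshold are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Baselines:
    """Toy baselines: sets of previously observed behavior on each axis."""
    user: dict = field(default_factory=dict)     # username -> {(exe, dst)}
    role: dict = field(default_factory=dict)     # role -> {dst}
    cluster: dict = field(default_factory=dict)  # source cluster -> {dst}

    def learn(self, user, role, cluster, exe, dst):
        """Record one legitimate observation on all three axes."""
        self.user.setdefault(user, set()).add((exe, dst))
        self.role.setdefault(role, set()).add(dst)
        self.cluster.setdefault(cluster, set()).add(dst)

    def deviations(self, user, role, cluster, exe, dst):
        """Return the axes on which this event was never seen during learning."""
        axes = []
        if (exe, dst) not in self.user.get(user, set()):
            axes.append("user")
        if dst not in self.role.get(role, set()):
            axes.append("role")
        if dst not in self.cluster.get(cluster, set()):
            axes.append("cluster")
        return axes


def should_flag(axes, threshold=2):
    """Assumed policy: flag when at least `threshold` axes deviate."""
    return len(axes) >= threshold


if __name__ == "__main__":
    b = Baselines()
    for db in ("db1", "db2", "db3"):
        b.learn("alice", "dba", "dev-workstations", "psql", db)
    b.learn("bob", "dba", "ops-bastion", "psql", "db4")

    axes = b.deviations("alice", "dba", "dev-workstations", "psql", "db4")
    print(axes, should_flag(axes))
```

In this toy data the connection to db4 is normal for the role, because another user with the same role made it during learning, but it is new for Alice and new for her source cluster, so two of the three axes deviate and the event is flagged.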

The Detection Timeline

False positive rates matter for production security tools. Alert fatigue kills detection programs faster than sophisticated attackers.

At seven days of baseline learning, false positive rates measure 0.69%. At 30 days, they drop to 0.42%. At 180 days, 0.18%. The system gets more accurate as it observes more normal behavior patterns.
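To put those percentages in analyst terms, the short calculation below converts each rate into an expected count of false alerts. The text does not say what the rates are measured against, so the denominator here (10,000 evaluated events per day) is purely an assumption for illustration, as is the function name.

```python
# Days of baseline learning -> false positive rate in percent (figures from the text).
FALSE_POSITIVE_RATE_PCT = {7: 0.69, 30: 0.42, 180: 0.18}


def expected_false_positives(evaluated_events: int, rate_pct: float) -> int:
    """Expected false alerts for a given volume, rounded to the nearest whole alert."""
    return round(evaluated_events * rate_pct / 100)


def relative_reduction_pct(start_pct: float, end_pct: float) -> float:
    """Percentage drop from the starting rate to the ending rate."""
    return (start_pct - end_pct) / start_pct * 100


if __name__ == "__main__":
    daily_events = 10_000  # hypothetical volume, not a figure from the text
    for days, rate in FALSE_POSITIVE_RATE_PCT.items():
        n = expected_false_positives(daily_events, rate)
        print(f"{days:3d} days of learning: ~{n} false alerts per {daily_events} events")
    drop = relative_reduction_pct(FALSE_POSITIVE_RATE_PCT[7], FALSE_POSITIVE_RATE_PCT[180])
    print(f"Reduction from day 7 to day 180: {drop:.0f}%")
```

Under that assumed volume, the move from 7 to 180 days of learning cuts the daily false alerts from about 69 to about 18, roughly a 74% reduction.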

This timeline reflects a fundamental reality about behavioral detection: you need to see enough legitimate activity to distinguish it from malicious activity. Network anomaly detection faces the same challenge, but without process and user context, the baseline is noisier.

Lateral movement attacks often happen fast. The median dwell time before detection in ransomware incidents is 11 days, but the actual lateral movement phase takes hours. An attacker who compromises a workstation at 2pm might pivot to domain controllers by 4pm.

Detection latency of 98 milliseconds means the system flags suspicious activity in near real-time. The delay isn't in capturing events. It's in the analyst investigating the alert. Kernel-level telemetry provides the full attack chain for that investigation.

Why This Complements Network Security

Kernel-level visibility doesn't replace network security tools. It covers the gaps they leave.

Network tools excel at detecting C2 beacons, data exfiltration to known-bad IPs, and traffic pattern anomalies. They can block connections before they complete. They operate at line rate on encrypted traffic without needing to decrypt it.

But they can't see that the process initiating that connection was spawned by a cron job that was modified yesterday, or that the connection is using credentials stolen from a config file that the attacker accessed by exploiting a path traversal vulnerability in a web application.

The combination matters. Network tools provide perimeter defense and traffic analysis. Kernel agents provide host-level context and attack chain reconstruction. Together, they catch lateral movement that either tool would miss alone.

In practice, this means deploying kernel agents as a DaemonSet across Kubernetes clusters or via systemd on VMs, feeding telemetry into the same SIEM that ingests network logs. When a network alert fires, the kernel telemetry shows what spawned the connection. When a kernel alert fires on unusual process behavior, network logs show where that process tried to connect.
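One way the two data sources might be joined in a SIEM is by matching the flow tuple within a short time window. The sketch below assumes both sources record the same addresses and ports, which would not hold across NAT, and that their clocks are roughly synchronized. The record types, field names, and the 5-second window are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NetworkAlert:
    ts: float  # epoch seconds
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


@dataclass(frozen=True)
class KernelConnectEvent:
    ts: float
    host_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    pid: int
    exe: str
    parent_exe: str


def find_initiator(alert: NetworkAlert,
                   events: List[KernelConnectEvent],
                   window_s: float = 5.0) -> Optional[KernelConnectEvent]:
    """Return the kernel event matching the alert's flow, closest in time
    within the window, or None if nothing matches."""
    candidates = [
        e for e in events
        if (e.host_ip, e.src_port, e.dst_ip, e.dst_port)
        == (alert.src_ip, alert.src_port, alert.dst_ip, alert.dst_port)
        and abs(e.ts - alert.ts) <= window_s
    ]
    return min(candidates, key=lambda e: abs(e.ts - alert.ts), default=None)


if __name__ == "__main__":
    events = [
        KernelConnectEvent(100.0, "10.0.1.7", 51000, "10.0.5.20", 22,
                           4100, "/opt/ide/bin/ide", "/usr/bin/bash"),
        KernelConnectEvent(160.2, "10.0.1.7", 51044, "10.0.5.20", 22,
                           9317, "/usr/bin/python3", "/usr/bin/node"),
    ]
    alert = NetworkAlert(160.5, "10.0.1.7", 51044, "10.0.5.20", 22)
    match = find_initiator(alert, events)
    print(match.exe if match else "no host telemetry for this flow")
```

The same lookup works in the other direction: starting from a kernel alert, the flow tuple identifies which network log entries belong to the suspicious process.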

The visibility layer you're missing isn't at the network edge. It's at the syscall boundary on every host, watching every process make every call before application-layer encryption obscures what's actually happening. That's where lateral movement becomes visible again.
