
gVisor, the missing piece of container security

Romain Boulanger
Infra/Cloud Architect with DevSecOps mindset

The container, beyond appearances

Working in a containerised world, especially with Kubernetes, means understanding that containerisation relies on two pillars:

  • Namespaces: to partition what the container sees;
  • Cgroups: to limit what the container consumes, notably CPU and memory.

However, this model has a structural limit: every container shares the host's operating system kernel when making system calls.

In a standard environment, typically using runc as the execution layer, containers perform system calls (syscalls) on the same kernel: the host kernel. Even with isolation, this shared dependency remains.

This is why virtual machines provide stronger isolation than containers: the hypervisor allocates resources independently, and each machine has its own operating system.

A malicious application exploiting a kernel vulnerability could compromise the entire node of a Kubernetes cluster.

gVisor enters the scene here. This open source project, licensed under Apache 2.0 and originating from Google, introduces a radically different approach: providing each container with its own virtual kernel.

Beyond its effectiveness at hardening the security of containerised environments, the project is used extensively by Google Cloud in services such as App Engine, Cloud Run, and Cloud Functions. Certain CNCF security solutions, such as Falco, can also make use of it.

System calls: the Achilles heel of containerisation

To understand why the default isolation is insufficient, we need to look deeper: at the Linux kernel level.

A containerised application is, ultimately, just a process. When it needs to perform concrete actions (writing a file, opening a socket, allocating memory), it cannot act alone: it must ask the host kernel via a well-defined interface, system calls (syscalls).

Linux exposes hundreds of system calls for managing files (open, write, close), processes (fork, exit) or networking (socket, bind, connect). The attack surface is therefore quite large.
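To get a feel for this, you can count the system calls made by a trivial command with strace (a quick illustration on any Linux host with strace installed; the exact numbers vary from one system to another):

# Trace a simple directory listing and print a per-syscall summary (written to stderr)
strace -c ls /tmp > /dev/null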

To counter this, several filtering mechanisms exist. Seccomp and AppArmor provide this functionality and are well known to CKS enthusiasts, since both feature in the certification curriculum.

Seccomp, the function firewall

Seccomp, short for Secure Computing Mode, is a Linux kernel security mechanism that restricts the system calls a process can perform. It typically functions as a whitelist.

Here is a minimalist example blocking everything ("defaultAction": "SCMP_ACT_ERRNO") except read, write, and close operations:

{
    "defaultAction": "SCMP_ACT_ERRNO",
    "architectures": [
        "SCMP_ARCH_X86_64"
    ],
    "syscalls": [
        {
            "names": [
                "read",
                "write",
                "close"
            ],
            "action": "SCMP_ACT_ALLOW"
        }
    ]
}
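As a side note, such a profile can be referenced from a Pod through the securityContext.seccompProfile field. A minimal sketch, assuming the file has been placed under the kubelet's seccomp directory as profiles/minimal.json (both the Pod name and the path are examples of mine):

apiVersion: v1
kind: Pod
metadata:
  name: seccomp-demo
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      # Path relative to the kubelet's seccomp profile directory (example value)
      localhostProfile: profiles/minimal.json
  containers:
  - name: app
    image: nginx:1.11-alpine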

AppArmor: resource access control

AppArmor is a Linux security module that protects resources (files, paths). It defines what a process is allowed to manipulate.

Below is a simplified profile denying all file write operations:

#include <tunables/global>

profile deny-write flags=(attach_disconnected) {
  #include <abstractions/base>

  # Allow file access in general...
  file,

  # ...but deny write access everywhere
  deny /** w,
}
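Such a profile must first be loaded on every node (for example with apparmor_parser -r /etc/apparmor.d/deny-write), then referenced from the Pod. A minimal sketch using the long-standing annotation form (recent Kubernetes versions also offer a securityContext.appArmorProfile field); the Pod name is an example of mine:

apiVersion: v1
kind: Pod
metadata:
  name: apparmor-demo
  annotations:
    # "deny-write" must match the profile name loaded on the node
    container.apparmor.security.beta.kubernetes.io/app: localhost/deny-write
spec:
  containers:
  - name: app
    image: nginx:1.11-alpine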

These two tools constitute the first line of defence. They reduce risk but do not eliminate it entirely. Moreover, configuring them can prove extremely complex, especially when each application needs its own profile, which makes the task time-consuming.

The solution: A kernel in user-space
#

gVisor changes the game by introducing an interception layer. Instead of letting the application talk to the host kernel directly, gVisor places itself between the two as an intermediary.

The architecture relies on two key components:

  • Sentry: An emulated Linux-compatible kernel, written in Go, running in user-space. The application believes it communicates with the real kernel, but in reality, it interacts with the Sentry;

  • Gofer: The component managing file access, preventing the Sentry from directly accessing the host disk.

These two components communicate via the 9P (Plan 9 Filesystem Protocol) network protocol. Originally created for the Plan 9 from Bell Labs operating system, often viewed as the spiritual successor to Unix, this protocol allows the Gofer to operate as an agent and interact with the file system securely.

gVisor architecture

If an attacker compromises the application and attempts an escape, they remain trapped in the Sentry. Since the Sentry runs in user-space (not kernel-space) and is written in Go (a memory-safe language), the attack surface shrinks drastically.
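On the node itself, the sandbox shows up as ordinary user-space processes. A quick way to see them from a node hosting a sandboxed Pod (the exact process names, such as runsc-sandbox and runsc-gofer, may vary between versions):

# List the gVisor processes visible from the host
ps aux | grep runsc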

Getting hands dirty

Now it’s time to use gVisor within a Kubernetes cluster. I’ve taken a very basic example, a fairly (if not very) old nginx image, which could cause issues due to its numerous vulnerabilities.

One of the main advantages of gVisor is that you do not need to update your container images or even rewrite your applications.

gVisor installs as an alternative runtime via the runsc binary, which is compatible with the OCI (Open Container Initiative) standard and must be installed on each node of the cluster.
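On containerd-based nodes, this typically means placing the runsc and containerd-shim-runsc-v1 binaries in the PATH and declaring the runtime in /etc/containerd/config.toml. A sketch for a containerd 1.x configuration (the plugin section name can differ depending on your containerd version):

# /etc/containerd/config.toml (excerpt)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"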

In Kubernetes, the RuntimeClass object configures this runtime:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
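If runsc is only available on a subset of nodes, the RuntimeClass can also carry a scheduling constraint so that sandboxed Pods land on those nodes. A minimal sketch, assuming the nodes carry a gvisor: "true" label (a label of my own choosing):

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
scheduling:
  nodeSelector:
    gvisor: "true"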

Once the RuntimeClass is created, the Pod definition references it with the runtimeClassName field:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: risky-app
spec:
  selector:
    matchLabels:
      app: risky-app
  template:
    metadata:
      labels:
        app: risky-app
    spec:
      runtimeClassName: gvisor
      containers:
      - name: nginx
        image: nginx:1.11-alpine

What are the visible differences?

Everything is working correctly: the nginx web server is not reporting any issues in its logs.

kubectl logs risky-app-656887b7b7-kcxlr
kubectl get po risky-app-656887b7b7-kcxlr
NAME                         READY   STATUS    RESTARTS   AGE
risky-app-656887b7b7-kcxlr   1/1     Running   0          63m
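To double-check that the Pod really picked up the runtime class, its spec can be queried directly:

kubectl get po risky-app-656887b7b7-kcxlr -o jsonpath='{.spec.runtimeClassName}{"\n"}'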

The first difference appears in the output of dmesg, the command that displays kernel messages:

kubectl exec -it risky-app-656887b7b7-kcxlr -- /bin/sh -c dmesg
[   0.000000] Starting gVisor...
[   0.208482] Moving files to filing cabinet...
[   0.302747] Checking naughty and nice process list...
[   0.617373] Checking naughty and nice process list...
[   0.935288] Segmenting fault lines...
[   0.966793] Singleplexing /dev/ptmx...
[   1.411706] Digging up root...
[   1.816345] Adversarially training Redcode AI...
[   1.896328] Accelerating teletypewriter to 9600 baud...
[   2.015801] Letting the watchdogs out...
[   2.234311] Searching for socket adapter...
[   2.689240] Ready!

Clearly, gVisor is running the container, as evidenced by the very first line: Starting gVisor....

Earlier, I mentioned the Sentry's user-space kernel, so now is the time to run the uname command:

With gVisor:

kubectl exec -it risky-app-656887b7b7-kcxlr -- /bin/sh -c 'uname -a'
Linux risky-app-656887b7b7-kcxlr 4.4.0 #1 SMP Sun Jan 10 15:06:54 PST 2016 x86_64 Linux

Without gVisor:

kubectl exec -it standard-app-67875cd8f-f22l2 -- /bin/sh -c 'uname -a'
Linux standard-app-67875cd8f-f22l2 6.18.2-talos #1 SMP Fri Jan  2 15:04:30 UTC 2026 x86_64 Linux

In the version without gVisor, the host kernel's information remains clearly visible, which can point an attacker towards known vulnerabilities.

No magic, compromises exist, as always…

Increased security does not come without compromise! This is particularly true when it comes to overhead.

Since the Sentry must intercept and process each system call (and sometimes call the host kernel itself), execution takes more time than a direct call.
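If you want a rough feel for this overhead, you can time a syscall-heavy operation in both Pods (a crude comparison that also includes the kubectl round trip; the figures are only indicative):

# Force many small write() syscalls inside each Pod and compare wall-clock time
time kubectl exec risky-app-656887b7b7-kcxlr -- /bin/sh -c 'dd if=/dev/zero of=/tmp/bench bs=4k count=20000'
time kubectl exec standard-app-67875cd8f-f22l2 -- /bin/sh -c 'dd if=/dev/zero of=/tmp/bench bs=4k count=20000'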

Additionally, gVisor does not implement all system calls. You can find a list of limitations at this address.

Generally, anything strongly linked to hardware (network cards, GPUs, TPUs, etc.) will be subject to potential malfunctions or require adaptations.

Nevertheless, this is rarely a problem within a microservice environment: several programming languages are tested regularly to ensure their compatibility with gVisor.

The official documentation mentions Python, Java, Node.js, PHP, and Go. In practice, as long as an application does not perform unusual system calls, it should run without issue.

If you are in doubt, you can refer to the compatibility list based on your architecture: amd64 or arm64.

gVisor is a great choice for your containerised applications, especially those whose code cannot be audited, or those running old image versions (like the example above), deprecated binaries, or several known vulnerabilities.

Time for a recap

To summarise the information presented, the following table compares runc, found in standard containerisation engines such as Docker or containerd, with runsc, the gVisor binary.

| Criteria | runc (Standard) | runsc (gVisor) |
| --- | --- | --- |
| Philosophy | Logical isolation: "everything is allowed unless forbidden" | Strong isolation: "everything is intercepted and emulated" |
| Kernel architecture | Shared: the application uses the host kernel directly | Dedicated (user-space): the application uses a virtual kernel (Sentry) written in Go |
| Defence mechanism | Filtering: nothing by default, but Seccomp/AppArmor can block calls known to be dangerous | Interception: calls are processed internally; the host kernel remains invisible to the application |
| Attack surface | Large: vulnerable to kernel flaws | Minimal: an escape compromises the Sentry, not the physical host |
| CPU performance | Native: zero overhead | Overhead: context-switch cost on every system call |
| File system | Direct: native access to mount points | Proxied (Gofer): access via the 9P protocol, adding slight latency |
| Memory footprint | None: strictly the application's consumption | Fixed: ~15 to 20 MB extra per Pod (for the Sentry and Gofer) |
| Visibility (host) | Transparent: the host sees the container's processes | Opaque: the host sees only runsc processes, not the sandbox internals |
| Compatibility | ~100%: supports all workloads | High: supports the majority of apps, but some system calls are unavailable |

Conclusion

gVisor stands out as the ideal candidate for reducing the attack surface and establishing a sandbox for container execution.

Although it cannot address every container use case, the solution supports the majority of applications, drastically increasing security within Kubernetes clusters.

Its rapid implementation and ease of use make gVisor a robust, secure alternative to runc.
