Experimental — Hotcell is under active development and should not be used in production.

Dual-Layer Isolation.

Hotcell provides two layers of isolation with pluggable VMM backends. Layer 1 (VM isolation) runs on every supported platform: libkrun on macOS and Linux, Firecracker on Linux only. Layer 2 (VMM process jail) adds defense-in-depth on Linux: even if an attacker escapes the VM, they land in a restricted process with no useful capabilities, a minimal filesystem view, and a tight syscall filter. The jailer has been verified end-to-end on Linux+KVM with seccomp in Kill mode, and TSI networking has been verified on both macOS and Linux. The test suite has 127+ tests, including 28 security boundary tests.

VMM Backends

Hotcell supports pluggable VMM backends behind a common VmmBackend trait. Everything above the backend (OCI pipeline, server, result protocol, streaming) is backend-agnostic. Select a backend per request via the backend parameter.

hotcell::backend::VmmBackend
/// Pluggable VMM backend trait.
/// Everything above the backend -- OCI pipeline, server,
/// result protocol -- is backend-agnostic.
#[async_trait]
pub trait VmmBackend: Send + Sync {
    /// Prepare the rootfs for this backend (directory or ext4 image).
    async fn prepare_rootfs(
        &self, rootfs_dir: &Path, config: &VmConfig,
    ) -> Result<RootfsHandle, HotcellError>;

    /// Run a VM and return the result.
    async fn run(
        &self, config: &VmConfig, rootfs: &RootfsHandle,
    ) -> Result<VmResult, HotcellError>;

    /// Run a VM with streaming console output.
    async fn run_streaming(
        &self, config: &VmConfig, rootfs: &RootfsHandle,
        tx: mpsc::Sender<StreamEvent>,
    ) -> Result<VmResult, HotcellError>;
}

// Two implementations in separate crates:
// - hotcell_libkrun::LibkrunBackend  (macOS + Linux)
// - hotcell_firecracker::FirecrackerBackend  (Linux only)
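
As a rough illustration of per-request selection, the sketch below resolves an optional backend parameter to one of the two backends. The enum, the function name, and the parameter spellings "libkrun" and "firecracker" are assumptions made for this example, not Hotcell's actual API.

```rust
use std::str::FromStr;

/// The two backends described in this section (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum BackendKind {
    /// libkrun is the documented default.
    #[default]
    Libkrun,
    Firecracker,
}

impl FromStr for BackendKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            // Assumed spellings of the `backend` request parameter.
            "libkrun" => Ok(BackendKind::Libkrun),
            "firecracker" => Ok(BackendKind::Firecracker),
            other => Err(format!("unknown backend: {other}")),
        }
    }
}

/// Resolve the optional per-request parameter, falling back to the default
/// and rejecting Firecracker on hosts that are not Linux.
fn select_backend(param: Option<&str>, host_is_linux: bool) -> Result<BackendKind, String> {
    let kind = match param {
        Some(name) => name.parse::<BackendKind>()?,
        None => BackendKind::default(),
    };
    if kind == BackendKind::Firecracker && !host_is_linux {
        return Err("the Firecracker backend is Linux only".to_string());
    }
    Ok(kind)
}

fn main() {
    println!("{:?}", select_backend(None, false));
    println!("{:?}", select_backend(Some("firecracker"), true));
}
```

Passing the host platform as an argument, rather than reading it inside the function, keeps the rule easy to test on any machine.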

Backend Comparison

| Property | libkrun (default) | Firecracker |
|---|---|---|
| Platforms | macOS + Linux | Linux only |
| VMM process | Embedded (FFI, takes over worker process) | Separate binary (REST API over UDS) |
| Rootfs format | Directory (virtiofs) | ext4 block device image |
| Networking | TSI (inet, full, disabled) | TAP (future) |
| Shared directories | virtiofs mount tags | Not supported |
| Guest kernel | Built into libkrunfw | Separate vmlinux image |
| Host sandboxing | hotcell-jailer (namespaces, seccomp, Landlock, cgroups) | Firecracker jailer (optional) |
| Result collection | Read from virtiofs-shared file | Read from ext4 image after VM exit |
| Best for | Development, macOS, low-latency single-tenant | Production Linux, multi-tenant, stronger isolation |
libkrun (default)

  • Embedded VMM via FFI: the worker calls krun_start_enter()
  • virtiofs for rootfs and shared directory access
  • TSI (Transparent Socket Impersonation) networking, verified on both macOS and Linux
  • hotcell-jailer sandboxes the VMM process on Linux

Firecracker (Linux only)

  • Separate VMM binary, configured via REST API over a Unix socket
  • ext4 block device images created from the OCI rootfs
  • Firecracker's own jailer, battle-tested in AWS Lambda
  • Serial console output streamed from file for real-time monitoring

Layer 1: VM Isolation

Kernel Virtualization

Each execution runs inside its own virtual machine with a separate Linux kernel. With libkrun, the kernel is compiled into libkrunfw. With Firecracker, a separate vmlinux image is used. Either way, this eliminates the shared-kernel attack surface found in traditional containerization.

Guest Properties

  • Own kernel, process table, and memory space
  • No access to the host filesystem (libkrun: rootfs + shared directories via virtio-fs; Firecracker: ext4 block device)
  • No network access by default (libkrun: TSI must be explicitly enabled; Firecracker: TAP planned)
  • Resource limits via libkrun's rlimit support or Firecracker's machine config

[Diagram: VM boundary enclosing GUEST_OS_KERNEL and USER_WORKLOAD, labeled "virtio-fs only"]

Layer 2: VMM Process Jail

On Linux, the VMM process is sandboxed before it boots the VM. With the libkrun backend, hotcell-libkrun-worker is sandboxed by hotcell-jailer before it configures libkrun. With the Firecracker backend, Firecracker's own jailer (battle-tested in AWS Lambda) handles sandboxing. The jail steps below describe hotcell-jailer for the libkrun backend.

Linux only. Defense-in-depth.

01. Close Inherited FDs

close_range(3, MAX, 0) prevents leaking host file descriptors into the jail.

02. Clear Environment

All environment variables are removed except LD_LIBRARY_PATH=/lib, which the dynamic linker needs in order to find libkrun inside the jail.
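
The jailer presumably scrubs its own environment in place. For illustration only, the sketch below shows the same end state when launching a child process with the standard library: clear everything, then add back the one required variable. The function name and the "/worker" path are assumptions made for this example.

```rust
use std::process::Command;

/// Build a command for a worker binary with a scrubbed environment.
fn jailed_command(program: &str) -> Command {
    let mut cmd = Command::new(program);
    // Drop every variable inherited from the parent process.
    cmd.env_clear();
    // Add back the single variable the dynamic linker needs.
    cmd.env("LD_LIBRARY_PATH", "/lib");
    cmd
}

fn main() {
    // The command is only constructed here, never spawned.
    let cmd = jailed_command("/worker");
    for (key, value) in cmd.get_envs() {
        println!("{key:?} = {value:?}");
    }
}
```

Starting from an empty environment and adding entries back is easier to audit than removing known-sensitive variables one by one.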

03. Join Cgroup

The worker joins a dedicated cgroup with memory.max, pids.max (256), and cpu.max limits applied.
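
A minimal sketch of what applying these limits could look like with the cgroup v2 file interface. The struct, function name, and directory layout are assumptions; only the file names (memory.max, pids.max, cpu.max, cgroup.procs) come from cgroup v2 itself. The demo writes to a scratch directory so it runs without root.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Limits for one execution. Field names are illustrative.
struct CgroupLimits {
    memory_max_bytes: u64,
    pids_max: u32,
    /// `Some((quota_us, period_us))`, or `None` for no CPU cap.
    cpu_max: Option<(u64, u64)>,
}

/// Create the cgroup directory, write its limits, then move `pid` into it.
fn setup_cgroup(dir: &Path, limits: &CgroupLimits, pid: u32) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    fs::write(dir.join("memory.max"), limits.memory_max_bytes.to_string())?;
    fs::write(dir.join("pids.max"), limits.pids_max.to_string())?;
    let cpu = match limits.cpu_max {
        Some((quota, period)) => format!("{quota} {period}"),
        // cgroup v2 spells "no limit" as the literal string "max".
        None => "max".to_string(),
    };
    fs::write(dir.join("cpu.max"), cpu)?;
    // Writing a PID to cgroup.procs moves that process into the cgroup.
    fs::write(dir.join("cgroup.procs"), pid.to_string())
}

fn main() -> io::Result<()> {
    // A scratch directory stands in for a real cgroup under /sys/fs/cgroup.
    let dir = std::env::temp_dir().join("hotcell-cgroup-demo");
    let limits = CgroupLimits { memory_max_bytes: 768 << 20, pids_max: 256, cpu_max: None };
    setup_cgroup(&dir, &limits, std::process::id())?;
    println!("pids.max = {}", fs::read_to_string(dir.join("pids.max"))?);
    fs::remove_dir_all(&dir)
}
```

The limits are written before the process joins, so the process is never inside the cgroup without them.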

04. Namespace Isolation

unshare() creates mount, PID, IPC, UTS, and network namespaces (network only when TSI is disabled).
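
The flag set passed to unshare() can be sketched as follows. The constants are the Linux values from <linux/sched.h>; the function name and the tsi_enabled parameter are assumptions made for this example, and the actual unshare() call is left out so the code runs anywhere.

```rust
// Linux namespace flags, as defined in <linux/sched.h>.
const CLONE_NEWNS: u32 = 0x0002_0000; // mount namespace
const CLONE_NEWUTS: u32 = 0x0400_0000;
const CLONE_NEWIPC: u32 = 0x0800_0000;
const CLONE_NEWPID: u32 = 0x2000_0000;
const CLONE_NEWNET: u32 = 0x4000_0000;

/// Compute the unshare() flags for the jail.
fn unshare_flags(tsi_enabled: bool) -> u32 {
    let mut flags = CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWIPC | CLONE_NEWUTS;
    // The network namespace is only unshared when TSI is disabled,
    // because TSI proxies guest sockets through the host network stack.
    if !tsi_enabled {
        flags |= CLONE_NEWNET;
    }
    flags
}

fn main() {
    println!("TSI enabled:  {:#010x}", unshare_flags(true));
    println!("TSI disabled: {:#010x}", unshare_flags(false));
}
```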

05. pivot_root

pivot_root() switches the process to a new root filesystem. The old root is then unmounted and removed, so the host filesystem is no longer visible.

06. Landlock Restrictions

Mandatory access control via Landlock ABI v3. The process can only access explicitly listed paths. A failure here is fatal: if Landlock cannot be enforced, the jail setup fails.

07. Drop Capabilities

Two-phase capability drop: bounding set cleared via PR_CAPBSET_DROP, then all remaining sets (ambient, effective, permitted, inheritable) cleared after setuid to nobody.

08. Seccomp BPF

Allowlist-based BPF filter in Kill mode. Any syscall not in the list triggers immediate process termination.

Jail Filesystem

After pivot_root(), the worker's entire filesystem view is:

/
├── dev/
│   ├── kvm          # bind-mount, for VM creation
│   ├── urandom      # bind-mount, for randomness
│   └── null         # bind-mount
├── lib/             # bind-mount read-only: libkrun.so, libkrunfw.so, libc, ld-linux
├── proc/            # procfs, mounted after pivot_root
├── rootfs/          # bind-mount read-only: the OCI root filesystem
├── shares/          # bind-mount read-write: host shared directories
├── tmp/             # writable, world-writable with sticky bit
├── result/          # writable: result file directory
├── config.json      # read-only: VM configuration
├── console.log      # writable: console output
└── worker           # bind-mount read-only: hotcell-libkrun-worker binary

Networking Trade-off

libkrun's TSI (Transparent Socket Impersonation) proxies guest socket calls through the VMM process on the host via vsock. When TSI is enabled, the worker does not unshare the network namespace (CLONE_NEWNET is skipped), socket syscalls are added to the seccomp allowlist, and Landlock network restrictions are skipped. The remaining layers (namespace isolation, seccomp allowlist, capability drop) still constrain the process.
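
The trade-off can be summarized as three switches that flip together. In the sketch below, the struct and function names are hypothetical. The TSI-disabled column is an inference from the text above, as is the rule that Landlock network rules apply only when ABI v4 is available.

```rust
/// Network-related jail settings (illustrative struct, not Hotcell's own type).
#[derive(Debug, PartialEq, Eq)]
struct NetworkJailPolicy {
    unshare_network_namespace: bool,
    allow_socket_syscalls: bool,
    landlock_network_rules: bool,
}

/// Derive the network-related jail settings from the TSI setting.
fn network_jail_policy(tsi_enabled: bool, landlock_abi: u32) -> NetworkJailPolicy {
    NetworkJailPolicy {
        // With TSI the VMM needs the host network stack, so CLONE_NEWNET is skipped.
        unshare_network_namespace: !tsi_enabled,
        // Socket syscalls join the seccomp allowlist only when TSI needs them.
        allow_socket_syscalls: tsi_enabled,
        // Assumption: network rules need Landlock ABI v4 and are skipped under TSI.
        landlock_network_rules: !tsi_enabled && landlock_abi >= 4,
    }
}

fn main() {
    println!("{:?}", network_jail_policy(true, 4));
    println!("{:?}", network_jail_policy(false, 3));
}
```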

Technical Specs

| Setting | Value | Notes |
|---|---|---|
| Cgroup memory.max | guest + 256 MiB | min 512 MiB |
| Cgroup pids.max | 256 | prevents fork bombs |
| Seccomp Mode | Kill | immediate termination |
| Landlock ABI | v3 required | Linux 6.2+ |
| Landlock Network | v4 optional | Linux 6.7+ |
| Cgroup cpu.max | unlimited | configurable per-execution |
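
One plausible reading of the memory.max entry is "guest memory plus 256 MiB of headroom for the VMM, never below 512 MiB". The helper below encodes that reading. The function name is hypothetical and the interpretation is an assumption, not a confirmed formula.

```rust
const MIB: u64 = 1024 * 1024;

/// memory.max in bytes: guest RAM plus 256 MiB of headroom, floored at 512 MiB.
fn cgroup_memory_max(guest_mib: u64) -> u64 {
    (guest_mib + 256).max(512) * MIB
}

fn main() {
    for guest in [128, 256, 1024] {
        println!("guest {guest} MiB -> memory.max {} bytes", cgroup_memory_max(guest));
    }
}
```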

OCI Pipeline Security

Path Traversal Protection

Tar extraction rejects entries containing ../ to prevent directory escape attacks.
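
A minimal sketch of such a check, assuming entry paths are inspected component by component before extraction. The function name is hypothetical and does not come from Hotcell.

```rust
use std::path::{Component, Path};

/// Returns false for any tar entry path that contains a `..` component.
fn is_safe_entry(path: &Path) -> bool {
    !path.components().any(|c| matches!(c, Component::ParentDir))
}

fn main() {
    for entry in ["usr/bin/env", "../etc/passwd", "usr/../../etc/shadow", "a..b/file"] {
        println!("{entry}: safe = {}", is_safe_entry(Path::new(entry)));
    }
}
```

Checking parsed components rather than searching for the substring "../" avoids rejecting legitimate names such as a..b.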

Symlink Escape Guards

Symlinks are resolved within the rootfs boundary using guest semantics. Absolute symlinks are rebased into the rootfs, not followed on the host.
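
The rebasing idea can be sketched with a purely lexical resolver: the target is interpreted as the guest would see it, and the result is always joined under the rootfs. The function name is hypothetical. This simplified version does not follow intermediate symlinks, which a real implementation would also need to handle.

```rust
use std::ffi::OsStr;
use std::path::{Component, Path, PathBuf};

/// Resolve `target`, read from a symlink at guest path `link`,
/// to a host path that stays under `rootfs`.
fn resolve_in_rootfs(rootfs: &Path, link: &Path, target: &Path) -> PathBuf {
    // Path components relative to the guest root.
    let mut guest: Vec<&OsStr> = Vec::new();
    if !target.is_absolute() {
        // Relative targets start from the directory that contains the link.
        if let Some(parent) = link.parent() {
            guest.extend(parent.components().filter_map(|c| match c {
                Component::Normal(name) => Some(name),
                _ => None,
            }));
        }
    }
    for c in target.components() {
        match c {
            Component::Normal(name) => guest.push(name),
            // ".." at the guest root stays at the root, as it would inside the guest.
            Component::ParentDir => {
                guest.pop();
            }
            // RootDir, CurDir, and Prefix contribute nothing.
            _ => {}
        }
    }
    let mut host = rootfs.to_path_buf();
    for name in guest {
        host.push(name);
    }
    host
}

fn main() {
    let rootfs = Path::new("/var/rootfs");
    let resolved = resolve_in_rootfs(rootfs, Path::new("/etc/evil"), Path::new("../../../etc/shadow"));
    println!("{}", resolved.display());
}
```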

Shell Injection Prevention

guest_tag values are validated to contain only [a-zA-Z0-9_-].
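
A validator for this character set might look like the sketch below. The function name is hypothetical, and rejecting the empty string is an extra assumption beyond what the text states.

```rust
/// Accept only tags made of ASCII letters, digits, '_' and '-'.
fn is_valid_guest_tag(tag: &str) -> bool {
    // Assumption: an empty tag is also rejected.
    !tag.is_empty()
        && tag
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
}

fn main() {
    for tag in ["data_1", "my-share", "a;rm -rf /", "$(id)"] {
        println!("{tag:?}: valid = {}", is_valid_guest_tag(tag));
    }
}
```

An allowlist of characters rules out shell metacharacters by construction, without having to enumerate them.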

Digest Verification

Downloaded layer blobs are verified against their SHA-256 digest before use.

TOCTOU-Safe Downloads

Blobs are downloaded to randomly named temp files, verified, and then extracted from that same file, which leaves no window for substitution.

Test Coverage

127+ tests across unit tests, integration tests (real VMs), and adversarial security boundary tests. The jailer is verified working on Linux+KVM with seccomp in Kill mode. TSI networking is verified on both macOS and Linux.

28 Security Boundary Tests

Adversarial escape attempts that must fail: filesystem escape, proc traversal, namespace breakout, seccomp bypass (unshare, setns, ptrace, personality, bpf, keyctl, clone with CLONE_NEW*, ioctl TIOCSTI, prctl), capability regain, fork bomb limits, and the full jail sequence.

8 Guest Isolation Tests

Primitive subsystem verification: cgroup creation and enforcement, capability dropping, namespace + pivot_root isolation, Landlock filesystem restrictions, seccomp filter installation (log and kill modes), and FD close_range.

End-to-End Verified

Jailed VM boot validated on Linux+KVM with seccomp Kill mode. TSI networking verified on both macOS (Hypervisor.framework) and Linux (jailed, seccomp Kill mode). Full sandbox sequence: namespaces, pivot_root, Landlock, seccomp, cgroup, capability drop.

Test Tiers

| Test Tier | Platform | VM/KVM | Root |
|---|---|---|---|
| Unit tests | Any | No | No |
| Integration tests | macOS (HVF) or Linux (KVM) | Yes | No |
| Jailed VM boot | Linux (KVM) | Yes | Yes |
| Jailer primitive tests | Linux | No | Yes |
| Security boundary tests | Linux | No | Yes |