Why Your User Space Rate Limiter Is Probably Crying

If you’ve ever tried to implement rate limiting in user space, you know the feeling. Packets arrive at the network interface, traverse several kernel layers, bounce around in syscall overhead, and by the time your beautifully crafted rate limiting logic gets a chance to inspect them, you’ve already lost the performance battle. It’s like trying to stop a tsunami with a garden hose while wearing roller skates.

eBPF (extended Berkeley Packet Filter) changes this entire equation. By moving rate limiting logic directly into the Linux kernel, we can intercept traffic at the earliest possible point—before it even gets a chance to knock on user space’s door. Combined with Go for orchestration and control, you get the best of both worlds: raw kernel performance and the developer-friendly ergonomics of modern application logic. Let me show you how to build a production-grade network rate limiter that actually makes sense.

Understanding the Architecture: Where Magic Meets Physics

graph TD
    A["Network Interface Card"] -->|Raw Packets| B["XDP Layer<br/>eBPF Program"]
    B -->|Rate Limit Check| C{Packet Decision}
    C -->|Within Limit| D["Allow to Stack<br/>XDP_PASS"]
    C -->|Exceeded Limit| E["Drop Packet<br/>XDP_DROP"]
    D --> F["User Space Application"]
    B -->|Statistics| G["eBPF Maps<br/>Shared State"]
    G -->|Go Program Updates Policy| H["Configuration Map<br/>Runtime Tuning"]
    H -->|New Limits| B

The architecture here isn’t complicated, but it’s crucial to understand. When a packet arrives at your network interface, it encounters your eBPF program before the Linux network stack even knows what hit it. This is where XDP (eXpress Data Path) comes in—it’s a hook at the kernel’s front door, running before the kernel even allocates socket buffers, so decisions happen at wire speed. Your eBPF program maintains maps (think of them as kernel-space hash tables) that track request rates per client, store configuration, and collect statistics. Your Go application acts as the maestro, updating these maps at runtime without needing to recompile or reload the kernel program. It’s hot-reloading for network policies.

The Core Building Blocks: Maps and State Management

eBPF maps are the glue holding this system together. For rate limiting, you’ll typically need three map types.

LRU Hash Maps are your primary weapon. LRU stands for “Least Recently Used,” which means the kernel automatically evicts old entries when you hit the memory ceiling. This is perfect for tracking client IP addresses and their current request counts—you don’t want memory to leak like your debugging habits.

struct rate_counter {
    __u64 requests;
    __u64 window_start;
    __u64 last_seen;
};
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 100000);
    __type(key, __u32);  // Source IP
    __type(value, struct rate_counter);
} rate_limit_map SEC(".maps");

Array Maps for configuration are your runtime control panel. Need to adjust the rate limit without recompiling? Update an array map from your Go application, and the eBPF program picks it up on the next packet.

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u32);  // Requests per second limit
} config_map SEC(".maps");

Statistics maps let you observe what’s happening inside the kernel:

struct rate_stats {
    __u64 dropped_packets;
    __u64 allowed_packets;
};
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, struct rate_stats);
} stats_map SEC(".maps");

The eBPF Program: Where the Real Work Happens

Here’s the practical implementation. I’ll walk you through the logic step by step because understanding why each line exists makes debugging infinitely easier when things inevitably go sideways at 3 AM.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>  // SEC(), bpf_map_* helpers, bpf_ktime_get_ns()
#define WINDOW_SIZE_NS (1000000000UL)  // 1 second in nanoseconds
#define MAX_REQUESTS_PER_SECOND 1000
struct rate_counter {
    __u64 requests;
    __u64 window_start;
    __u64 last_seen;
};
struct rate_stats {
    __u64 dropped_packets;
    __u64 allowed_packets;
};
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 100000);
    __type(key, __u32);
    __type(value, struct rate_counter);
} rate_limit_map SEC(".maps");
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, struct rate_stats);
} stats_map SEC(".maps");
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u32);
} config_map SEC(".maps");
// Inline helper to extract source IP
static __always_inline __u32 get_src_ip(struct xdp_md *ctx) {
    void *data_end = (void *)(unsigned long)ctx->data_end;
    void *data = (void *)(unsigned long)ctx->data;
    // Parse Ethernet header
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return 0;
    // Parse IP header
    if (eth->h_proto != __constant_htons(ETH_P_IP))
        return 0;
    struct iphdr *iph = (void *)(eth + 1);
    if ((void *)(iph + 1) > data_end)
        return 0;
    return iph->saddr;
}
SEC("xdp")
int rate_limiter_main(struct xdp_md *ctx) {
    __u64 now = bpf_ktime_get_ns();
    __u32 src_ip = get_src_ip(ctx);
    if (!src_ip)
        return XDP_PASS;
    // Lookup or create rate counter for this IP
    struct rate_counter *counter = bpf_map_lookup_elem(&rate_limit_map, &src_ip);
    __u32 zero = 0;
    // Get current limit from config
    __u32 *max_rps = bpf_map_lookup_elem(&config_map, &zero);
    __u32 limit = max_rps ? *max_rps : MAX_REQUESTS_PER_SECOND;
    // Lookup statistics
    struct rate_stats *stats = bpf_map_lookup_elem(&stats_map, &zero);
    if (!counter) {
        // First packet from this IP
        struct rate_counter new_counter = {
            .requests = 1,
            .window_start = now,
            .last_seen = now,
        };
        bpf_map_update_elem(&rate_limit_map, &src_ip, &new_counter, BPF_ANY);
        if (stats) {
            __sync_fetch_and_add(&stats->allowed_packets, 1);
        }
        return XDP_PASS;
    }
    // Check if window has expired
    if (now - counter->window_start >= WINDOW_SIZE_NS) {
        counter->requests = 1;
        counter->window_start = now;
        counter->last_seen = now;
        if (stats) {
            __sync_fetch_and_add(&stats->allowed_packets, 1);
        }
        return XDP_PASS;
    }
    // Increment counter atomically
    __sync_fetch_and_add(&counter->requests, 1);
    counter->last_seen = now;
    // Check if limit exceeded
    if (counter->requests > limit) {
        if (stats) {
            __sync_fetch_and_add(&stats->dropped_packets, 1);
        }
        return XDP_DROP;
    }
    if (stats) {
        __sync_fetch_and_add(&stats->allowed_packets, 1);
    }
    return XDP_PASS;
}
char _license[] SEC("license") = "GPL";

Let me break down what’s happening here because the details matter:

The Extraction Phase: get_src_ip() safely parses the Ethernet and IP headers. Notice all those pointer checks against data_end? The eBPF verifier is paranoid (rightfully so), and it will reject your program unless it can prove every access stays in bounds. These checks keep the verifier happy and your kernel from crashing.

The Lookup Logic: We check whether we’ve seen this IP before. If not, we create a new counter entry and allow the packet. This is important—the first packet always passes. You’re not being mean, you’re being practical.

The Window Management: Every second, we reset the counter. This is a simple fixed-window implementation—note that it can briefly admit up to twice the limit across a window boundary. If you want something smoother like token bucket (which I’ll show you next), this is where you’d implement it.

The Atomic Operations: __sync_fetch_and_add() increments counters atomically. In the kernel, on a multi-core system, you can’t just do counter++. Several CPUs can hit the same map entry simultaneously, and non-atomic read-modify-write cycles would lose updates and cause lies to emerge from the statistics.

Advanced Technique: Token Bucket Algorithm

Fixed windows work, but they’re not sophisticated. What if you want to allow occasional bursts while maintaining a long-term rate? Token bucket is the answer, and it’s elegant enough to deserve its own section. The concept is delightfully simple: imagine a bucket that holds tokens. Tokens refill at a constant rate (e.g., 100 per second). Each request costs some tokens. If the bucket is empty, you drop the packet. Let’s implement it:

struct token_bucket {
    __u64 tokens;           // Tokens available (stored with precision: tokens * 1000)
    __u64 burst_size;       // Maximum tokens the bucket can hold
    __u64 refill_rate;      // Tokens per second
    __u64 last_refill;      // Last time we added tokens
};
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 100000);
    __type(key, __u32);
    __type(value, struct token_bucket);
} token_map SEC(".maps");
#define BURST_SIZE 50
#define REFILL_RATE 100  // tokens per second
SEC("xdp")
int token_bucket_limiter(struct xdp_md *ctx) {
    __u64 now = bpf_ktime_get_ns();
    __u32 src_ip = get_src_ip(ctx);
    __u64 packet_cost = 1;  // Each packet costs 1 token
    if (!src_ip)
        return XDP_PASS;
    struct token_bucket *bucket = bpf_map_lookup_elem(&token_map, &src_ip);
    if (!bucket) {
        // New client: create bucket with full tokens
        struct token_bucket new_bucket = {
            .tokens = BURST_SIZE * 1000,
            .burst_size = BURST_SIZE,
            .refill_rate = REFILL_RATE * 1000,
            .last_refill = now,
        };
        bpf_map_update_elem(&token_map, &src_ip, &new_bucket, BPF_ANY);
        return XDP_PASS;
    }
    // Refill with nanosecond precision. (Refilling in whole seconds while
    // updating last_refill on every packet would throw away sub-second
    // elapsed time and starve the bucket under steady traffic.)
    __u64 elapsed_ns = now - bucket->last_refill;
    __u64 tokens_to_add = elapsed_ns * bucket->refill_rate / 1000000000ULL;
    // Update token count (capped at burst size)
    bucket->tokens += tokens_to_add;
    if (bucket->tokens > bucket->burst_size * 1000) {
        bucket->tokens = bucket->burst_size * 1000;
    }
    bucket->last_refill = now;
    // Check if we have enough tokens
    if (bucket->tokens >= packet_cost * 1000) {
        bucket->tokens -= packet_cost * 1000;
        return XDP_PASS;
    }
    return XDP_DROP;
}

This is where eBPF gets genuinely clever. We’re storing fractional tokens (multiplied by 1000) to avoid integer division issues in the kernel. Your algorithm now smoothly handles bursty traffic while maintaining long-term rate limits. A client can send 50 packets all at once (burning through the burst), but then has to wait for tokens to refill.
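The scaled arithmetic is easy to verify in user space. Here’s a Go sketch of the same token-bucket math with tokens stored ×1000 (the `bucket` type and `take` method are my names; the refill uses nanosecond precision so sub-second gaps still add tokens):

```go
package main

import "fmt"

// bucket mirrors struct token_bucket; tokens and refillRate are scaled by 1000.
type bucket struct {
	tokens     uint64 // milli-tokens currently available
	burstSize  uint64 // maximum whole tokens the bucket can hold
	refillRate uint64 // milli-tokens added per second
	lastRefill uint64 // nanoseconds of the last refill
}

// take refills based on elapsed nanoseconds, caps at the burst size,
// then tries to spend one token (1000 milli-tokens).
func (b *bucket) take(nowNS uint64) bool {
	elapsed := nowNS - b.lastRefill
	// Nanosecond-precision refill: elapsed_ns * rate / 1e9 milli-tokens.
	b.tokens += elapsed * b.refillRate / 1_000_000_000
	if limit := b.burstSize * 1000; b.tokens > limit {
		b.tokens = limit
	}
	b.lastRefill = nowNS
	if b.tokens >= 1000 {
		b.tokens -= 1000
		return true
	}
	return false
}

func main() {
	// BURST_SIZE 50, REFILL_RATE 100, matching the eBPF defines.
	b := &bucket{tokens: 50 * 1000, burstSize: 50, refillRate: 100 * 1000}
	// A burst of 60 packets at t=0: the first 50 drain the bucket, 10 drop.
	ok := 0
	for i := 0; i < 60; i++ {
		if b.take(0) {
			ok++
		}
	}
	fmt.Println(ok) // 50
	// After 100 ms, 10 tokens (100/sec * 0.1 s) have refilled.
	ok = 0
	for i := 0; i < 20; i++ {
		if b.take(100_000_000) {
			ok++
		}
	}
	fmt.Println(ok) // 10
}
```

The burst drains instantly, and refill then meters clients back in at exactly the configured long-term rate.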

Compiling and Deploying Your eBPF Program

Let’s get this code running. You’ll need LLVM/Clang (not regular GCC) because eBPF requires compiling to the BPF instruction set.

# Install dependencies (Ubuntu/Debian)
sudo apt-get install -y clang llvm linux-headers-$(uname -r)
# Compile the eBPF program
clang -O2 -target bpf -c rate_limiter.c -o rate_limiter.o
# Verify the compilation
llvm-objdump -S rate_limiter.o

Now comes the satisfying part—attaching it to your network interface via XDP:

# Load and attach to your interface (replace eth0 with your interface)
sudo ip link set dev eth0 xdp obj rate_limiter.o sec xdp
# Verify it's attached
sudo ip link show eth0
# You should see an "xdp" entry with a program id, something like:
# prog/xdp id 123

To unload it:

sudo ip link set dev eth0 xdp off

Go Application: Orchestrating the Rate Limiter

Now for the fun part—controlling this beast from user space. You’ll need a Go program that:

  1. Loads the eBPF program: Uses the ebpf package
  2. Updates configuration maps: Changes rate limits at runtime
  3. Reads statistics: Monitors what’s being dropped
package main
import (
    "flag"
    "io"
    "log"
    "time"

    "github.com/cilium/ebpf"
)
// RateCounter mirrors struct rate_counter in the eBPF program.
type RateCounter struct {
    Requests    uint64
    WindowStart uint64
    LastSeen    uint64
}
// RateStats mirrors struct rate_stats in the eBPF program.
type RateStats struct {
    DroppedPackets uint64
    AllowedPackets uint64
}
func main() {
    iface := flag.String("iface", "eth0", "Network interface to attach to")
    rateLimit := flag.Int("rate", 1000, "Requests per second")
    flag.Parse()
    // Load compiled eBPF program
    spec, err := ebpf.LoadCollectionSpec("rate_limiter.o")
    if err != nil {
        log.Fatalf("Failed to load eBPF spec: %v", err)
    }
    coll, err := ebpf.NewCollection(spec)
    if err != nil {
        log.Fatalf("Failed to load eBPF collection: %v", err)
    }
    defer coll.Close()
    // Attach to XDP
    l, err := attachXDP(coll.Programs["rate_limiter_main"], *iface)
    if err != nil {
        log.Fatalf("Failed to attach XDP program: %v", err)
    }
    defer l.Close()
    log.Printf("eBPF program attached to %s", *iface)
    // Update configuration
    configMap := coll.Maps["config_map"]
    rateConfig := uint32(*rateLimit)
    key := uint32(0)
    if err := configMap.Put(&key, &rateConfig); err != nil {
        log.Fatalf("Failed to update config: %v", err)
    }
    log.Printf("Rate limit set to %d requests/second", *rateLimit)
    // Monitor statistics
    statsMap := coll.Maps["stats_map"]
    ticker := time.NewTicker(5 * time.Second)
    defer ticker.Stop()
    for range ticker.C {
        var stats RateStats
        if err := statsMap.Lookup(&key, &stats); err != nil {
            log.Printf("Failed to read stats: %v", err)
            continue
        }
        total := stats.AllowedPackets + stats.DroppedPackets
        if total == 0 {
            continue
        }
        dropRate := float64(stats.DroppedPackets) / float64(total) * 100
        log.Printf("Stats: Allowed=%d, Dropped=%d, Drop Rate=%.2f%%",
            stats.AllowedPackets, stats.DroppedPackets, dropRate)
    }
}
func attachXDP(prog *ebpf.Program, iface string) (io.Closer, error) {
    // Stubbed out for brevity—in production, use the cilium/ebpf
    // link.AttachXDP API instead of hand-rolled netlink code.
    return io.NopCloser(nil), nil
}

This is a simplified example. For production, you’d want to use the proper Link API from the cilium/ebpf package. Here’s a more complete version:

package main
import (
    "log"
    "net"
    "time"

    "github.com/cilium/ebpf"
    "github.com/cilium/ebpf/link"
)
type RateStats struct {
    DroppedPackets  uint64
    AllowedPackets  uint64
}
func main() {
    // Load eBPF program
    spec, err := ebpf.LoadCollectionSpec("rate_limiter.o")
    if err != nil {
        log.Fatalf("Loading eBPF spec: %v", err)
    }
    coll, err := ebpf.NewCollection(spec)
    if err != nil {
        log.Fatalf("Loading eBPF collection: %v", err)
    }
    defer coll.Close()
    // Get the XDP program
    xdpProg := coll.Programs["rate_limiter_main"]
    if xdpProg == nil {
        log.Fatal("XDP program not found")
    }
    // Attach to XDP (requires appropriate permissions)
    iface, err := net.InterfaceByName("eth0")
    if err != nil {
        log.Fatalf("Interface lookup: %v", err)
    }
    l, err := link.AttachXDP(link.XDPOptions{
        Program:   xdpProg,
        Interface: iface.Index,
    })
    if err != nil {
        log.Fatalf("Attaching XDP: %v", err)
    }
    defer l.Close()
    log.Printf("✓ Attached XDP program to %s", iface.Name)
    // Update rate limit dynamically
    configMap := coll.Maps["config_map"]
    key := uint32(0)
    limit := uint32(5000)  // 5000 req/sec
    if err := configMap.Put(&key, &limit); err != nil {
        log.Fatalf("Updating config map: %v", err)
    }
    log.Printf("✓ Rate limit set to %d req/sec", limit)
    // Monitor statistics
    statsMap := coll.Maps["stats_map"]
    ticker := time.NewTicker(10 * time.Second)
    defer ticker.Stop()
    for range ticker.C {
        var stats RateStats
        if err := statsMap.Lookup(&key, &stats); err != nil {
            log.Printf("Reading stats: %v", err)
            continue
        }
        total := stats.AllowedPackets + stats.DroppedPackets
        if total > 0 {
            dropPct := 100.0 * float64(stats.DroppedPackets) / float64(total)
            log.Printf("Allowed: %d | Dropped: %d | Drop Rate: %.2f%%",
                stats.AllowedPackets, stats.DroppedPackets, dropPct)
        }
    }
}

Testing Your Rate Limiter in the Real World

Theory is beautiful until reality crashes the party. Here’s how to actually test this thing: Setup: Create a virtual test environment using network namespaces:

# Create two network namespaces
sudo ip netns add client
sudo ip netns add server
# Create a virtual ethernet pair
sudo ip link add veth0 type veth peer name veth1
# Move interfaces to namespaces
sudo ip link set veth1 netns server
sudo ip link set veth0 netns client
# Configure IPs
sudo ip netns exec client ip addr add 10.0.0.1/24 dev veth0
sudo ip netns exec client ip link set veth0 up
sudo ip netns exec server ip addr add 10.0.0.2/24 dev veth1
sudo ip netns exec server ip link set veth1 up

Load your eBPF program:

# In the server namespace: the Go controller loads rate_limiter.o and
# attaches it to the interface itself, so no separate ip link step is needed
sudo ip netns exec server ./rate_limiter -iface veth1 -rate 100

Generate traffic from the client:

# Terminal 1: Start a simple server
sudo ip netns exec server python3 -m http.server 8080
# Terminal 2: Generate load
sudo ip netns exec client ab -n 10000 -c 100 http://10.0.0.2:8080/
# Terminal 3: Watch the statistics map directly (the Go controller logs these too)
sudo ip netns exec server watch -n 1 bpftool map dump name stats_map

The ab (ApacheBench) tool will pound your server, and your eBPF program will calmly drop packets that exceed the rate limit. You’ll see the drop rate climb as the concurrency increases.

Production Considerations: When Theory Meets Reality

Implementing rate limiting in the kernel is powerful, but production deployment requires thinking beyond the happy path: Per-Client vs. Global Limits: The examples show per-IP tracking, but you might need different strategies—global limits on specific ports, per-subnet limits, or even machine-learning-based anomaly detection feeding into your eBPF maps. GeoIP-Aware Limiting: Want to be more lenient with domestic traffic? You can integrate GeoIP data:

struct geo_rate_config {
    __u32 domestic_limit;      // Higher
    __u32 foreign_limit;       // Lower
    __u32 suspicious_limit;    // Very low
};
static __always_inline __u8 get_country_risk(__u32 ip) {
    // Lookup against GeoIP data
    // Return risk level
    return RISK_MEDIUM;
}

Egress vs. Ingress: Should you rate limit incoming or outgoing traffic? The answer is context-dependent. Ingress protection prevents DDoS. Egress protection prevents compromised clients from misbehaving. Usually, you want both. Tail Calls for Policy Flexibility: For complex rate limiting policies, use bpf_tail_call() to dynamically switch between different rate limiting strategies without program recompilation:

struct {
    __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
    __uint(max_entries, 10);
    __type(key, __u32);
    __type(value, __u32);
} policy_map SEC(".maps");
// Determine which policy to apply
__u32 policy_idx = classify_packet(ctx);
bpf_tail_call(ctx, &policy_map, policy_idx);

Integration with Service Meshes

Modern infrastructure increasingly uses service meshes like Istio or Cilium. eBPF-based rate limiting integrates beautifully. Instead of managing rate limits at the application level (where they incur overhead), push them to the mesh’s eBPF data plane. Your application stops worrying about being a good citizen and just handles requests—the mesh ensures nobody gets too greedy.

Performance Characteristics: The Numbers That Matter

Here’s what you can expect:

  • Latency: Sub-microsecond decision making—typically tens to hundreds of nanoseconds per packet on modern hardware.
  • Throughput: eBPF can handle millions of packets per second. Real-world performance depends on your hardware, but you’re looking at 10x+ improvement over user-space rate limiting.
  • Memory: The LRU maps automatically evict old entries, so memory usage stays bounded. 100,000 entries typically uses a few MB.
  • CPU overhead: Minimal. You’re doing simple arithmetic and map lookups—exactly what CPUs are optimized for. The secret sauce is that you’ve eliminated the per-packet kernel/user boundary crossing: for a user-space limiter, every packet must be copied up the stack and delivered via syscalls before your code can even see it. With eBPF, the decision happens in kernel context, no context switch needed.

Debugging: When Things Go Wrong

The eBPF verifier will reject programs that it suspects might crash the kernel. Error messages can be cryptic:

Verifier error: invalid access to map value, off=8 size=8

This typically means you’re accessing memory beyond what you declared in your struct. Double-check your struct sizes and alignment. Use bpftool for debugging:

sudo bpftool prog list
sudo bpftool prog show id 123
sudo bpftool prog dump xlated id 123
# Read map contents
sudo bpftool map dump name rate_limit_map

For live debugging in development, add bpf_printk() calls (they write to /sys/kernel/debug/tracing/trace_pipe):

#define bpf_printk(fmt, ...)                                      \
({                                                                 \
    char ____fmt[] = fmt;                                          \
    bpf_trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);    \
})
// In your program:
// bpf_trace_printk only supports a few format specifiers, so print the IP raw
bpf_printk("Packet from IP: 0x%x, requests: %llu", src_ip, counter->requests);

Then monitor with:

sudo cat /sys/kernel/debug/tracing/trace_pipe

Conclusion: You’ve Got This

Network-level rate limiting with eBPF and Go isn’t theoretical—it’s practical, deployable, and genuinely transformative for infrastructure that needs to handle scale. You’ve eliminated a massive source of overhead, gained flexibility through runtime configuration, and moved traffic decisions to the only place they should ever happen: as close to the hardware as possible. The learning curve is real (eBPF’s verifier can be humbling), but the payoff is real too. Your network doesn’t lag anymore because user space took too long to decide. Your rate limiter never drops a connection due to GC pauses. You’ve won back the nanoseconds that matter. Start small—deploy to a test environment, watch it work, and scale up with confidence. The kernel’s got your back now.

References:

  • eBPF Maps for High-Performance Rate Limiting - ngkore.org
  • Network Traffic Rate Limiting with eBPF/XDP - iximiuz Labs
  • Better Bandwidth Management with eBPF - YouTube
  • eBPF in Service Mesh - Benisontech