The Ghost in Your Machine
You know that feeling when your Go application starts consuming memory like it’s training for an all-you-can-eat buffet? One day it’s running smoothly, the next—boom—your ops team is paging you at 3 AM because the service is using 8GB of RAM when it should be using 800MB. Welcome to the wonderful world of memory leaks.
Here’s the thing about Go: it’s got this fancy garbage collector that’s supposed to take memory management off your plate. And for the most part, it does an excellent job. But when you start mixing Go with C bindings through CGO, or when you create goroutines like you’re collecting Pokémon, or when you leave time.Ticker objects running indefinitely—suddenly, that garbage collector can’t help you. It’s like bringing a flashlight to investigate why your basement is flooded when there’s a broken pipe in the wall.
The really frustrating part? Most profiling tools only see the memory the Go runtime manages and don’t properly account for memory allocated by C libraries through CGO bindings. This means you’re flying blind on exactly the memory leaks that matter most in production environments.
This article is about taking control. We’re going to explore how to build an intelligent automation tool that detects and helps you eliminate memory leaks in Go applications. No guessing games, no midnight panic—just solid, reproducible diagnostics.
Why Memory Leaks Exist in Go (And Why You Should Care)
Before we build anything, let’s be honest about why memory leaks happen in Go despite having a garbage collector.
The CGO Problem
Go’s garbage collector is incredibly smart about tracking memory allocated by Go code. The moment you allocate something with make() or &SomeStruct{}, the GC knows about it. But when you call a C library through CGO—say, libpq for PostgreSQL or some cryptography library—the memory allocated by that C code is invisible to the garbage collector. It’s like having a bouncer who only checks IDs at the Go entrance but has no idea what’s happening in the C VIP section.
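To make that concrete, here’s a minimal sketch (readFromC is a hypothetical helper, not from any real library): anything you grab with C.malloc lives on the C heap, so the garbage collector will never reclaim it, and forgetting the matching C.free is a leak no amount of GC tuning will fix.
/*
#include <stdlib.h>
*/
import "C"

func readFromC(n int) []byte {
	buf := C.malloc(C.size_t(n))    // allocated on the C heap, invisible to Go's GC
	defer C.free(buf)               // drop this line and the memory is never reclaimed
	return C.GoBytes(buf, C.int(n)) // copy the data into GC-managed Go memory
}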
The Goroutine Leak
This one’s sneaky. Imagine a function that spawns a new goroutine every time a request comes in, and that goroutine waits indefinitely for a signal that might never come:
func runJobs(ctx context.Context) {
	for {
		go func() {
			data := make([]byte, 1000000) // 1MB allocation
			processData(data)
			<-ctx.Done() // Waiting forever if context never closes
		}()
		time.Sleep(time.Second)
	}
}
See the problem? You’re spawning one goroutine per second, each holding onto 1MB of memory. After an hour, you’ve got 3,600 goroutines and 3.6GB of memory that’s technically “in use” but not actually doing anything productive.
The Slice Reference Sneakiness
Here’s where things get particularly fun (and by fun, I mean frustrating). When you slice a large byte array to get a smaller portion, the original array stays in memory:
func extractData(data []byte) []byte {
	return data[5:10] // Keeps the entire original array in memory!
}
The garbage collector sees the slice as “in use” and won’t collect the massive underlying array. The solution is to explicitly clone the slice, but how many developers know this trick off the top of their head?
The Current State of Go Memory Profiling
pprof: The Gold Standard
Go comes with an excellent built-in profiling tool called pprof. Accessible through net/http/pprof, it gives you heap profiles that report memory allocation samples, letting you see both current and historical memory usage patterns. You can visualize this data as graphs, flame graphs, or even text output to pinpoint exactly where memory is being allocated.
Setting it up is stupidly easy:
package main

import (
	"net/http"
	_ "net/http/pprof"
)

func main() {
	go http.ListenAndServe(":6060", nil)
	// Your application code here
}
Then you can inspect memory with:
go tool pprof http://localhost:6060/debug/pprof/heap
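A few invocations worth keeping handy (these are standard pprof flags, nothing exotic): -inuse_space shows what’s currently held, -alloc_space shows everything allocated since startup, and -base diffs two saved profiles, which is exactly the trick the automated system below will lean on. The snapshot filenames here are just placeholders.
# What is currently retained on the heap
go tool pprof -inuse_space http://localhost:6060/debug/pprof/heap
# Everything allocated since the program started (useful for churn, less so for leaks)
go tool pprof -alloc_space http://localhost:6060/debug/pprof/heap
# Diff two saved snapshots: what grew in between?
go tool pprof -base heap_old.prof heap_new.prof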
The pprof Limitations
But here’s where pprof falls short: it’s reactive, not proactive. You have to know there’s a problem before you start profiling. In production environments where you can’t just restart services willy-nilly, waiting for issues to manifest before investigating is like waiting for your house to catch fire before installing smoke detectors. Also, if you’re dealing with heavy CGO usage, pprof won’t capture those allocations properly. You’ll be staring at profiles wondering why memory keeps growing when the Go allocations look perfectly reasonable.
Continuous Profiling: The Expensive Way
Companies like Datadog offer continuous profiling solutions that automatically capture profiling data from production services without manual intervention. These are fantastic—if you have the budget and don’t mind vendor lock-in. For many teams, that’s overkill or simply not an option.
Introducing cgoleak: A Specialized Solution
The open-source community recognized this gap and created cgoleak, an eBPF-based memory leak detector specifically designed for Go applications with CGO bindings. It’s a trimmed-down version of bcc’s memleak.py, but optimized for Go developer needs.
The key insight: instead of trying to track all memory allocations (like memleak.py does), cgoleak focuses exclusively on CGO allocations. This eliminates sampling issues and noise that plague generic memory profilers when applied to Go workloads.
How cgoleak Works
The tool uses eBPF (extended Berkeley Packet Filter) to hook into memory allocation functions at the kernel level. Every malloc(), calloc(), and realloc() call made by C libraries gets intercepted and tracked. When memory is freed, the tool updates its records. After a configurable interval, it reports allocations that were never freed.
Current supported allocators include malloc (but interestingly, not jemalloc yet), which covers the vast majority of C libraries you’ll encounter.
Building Your Own Automated Memory Leak Detection System
Now we get to the fun part: creating a comprehensive solution that combines multiple approaches into an automated system. Here’s what we want our system to do:
- Continuously monitor a Go application for memory growth
- Use pprof to identify Go-side allocations
- Use cgoleak or similar tools for CGO allocations
- Compare profiles over time to identify trends
- Generate alerts when memory growth exceeds thresholds
- Provide actionable insights about which code is responsible
- Be completely automated and production-friendly
Architecture Overview
At a high level, the system has four pieces: a profile collector that periodically pulls heap profiles from the target application over HTTP, a trend analyzer that compares those snapshots over time, a CGO leak detector that wraps an eBPF-based tool like cgoleak, and an alert manager that ties it all together and fires notifications when thresholds are crossed. The steps below build each piece in turn.
Step 1: Setting Up the Profile Collector
First, we need a service that periodically pulls heap profiles from our Go application and stores them for later analysis. Here’s a production-ready implementation:
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"time"
)

type ProfileCollector struct {
	targetURL string
	outputDir string
	interval  time.Duration
	client    *http.Client
	done      chan struct{}
}

func NewProfileCollector(targetURL, outputDir string, interval time.Duration) *ProfileCollector {
	return &ProfileCollector{
		targetURL: targetURL,
		outputDir: outputDir,
		interval:  interval,
		client: &http.Client{
			Timeout: 30 * time.Second,
		},
		done: make(chan struct{}),
	}
}

func (pc *ProfileCollector) Start() error {
	if err := os.MkdirAll(pc.outputDir, 0755); err != nil {
		return fmt.Errorf("failed to create output directory: %w", err)
	}
	ticker := time.NewTicker(pc.interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if err := pc.collectProfile(); err != nil {
				fmt.Printf("Error collecting profile: %v\n", err)
			}
		case <-pc.done:
			return nil
		}
	}
}

func (pc *ProfileCollector) Stop() {
	close(pc.done)
}

func (pc *ProfileCollector) collectProfile() error {
	heapURL := fmt.Sprintf("%s/debug/pprof/heap", pc.targetURL)
	resp, err := pc.client.Get(heapURL)
	if err != nil {
		return fmt.Errorf("failed to fetch heap profile: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status code: %d", resp.StatusCode)
	}
	timestamp := time.Now().Format("2006-01-02_15-04-05")
	outputFile := filepath.Join(pc.outputDir, fmt.Sprintf("heap_%s.prof", timestamp))
	out, err := os.Create(outputFile)
	if err != nil {
		return fmt.Errorf("failed to create output file: %w", err)
	}
	defer out.Close()
	if _, err := io.Copy(out, resp.Body); err != nil {
		return fmt.Errorf("failed to write profile: %w", err)
	}
	fmt.Printf("Profile collected: %s\n", outputFile)
	return nil
}
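One way to run the collector on its own (the function name is illustrative, and it assumes os and os/signal are added to the imports above): trap Ctrl+C, call Stop() to close the done channel, and Start() returns cleanly.
func runCollectorUntilInterrupted() {
	collector := NewProfileCollector("http://localhost:6060", "./profiles", time.Minute)

	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, os.Interrupt)

	go func() {
		<-sigs
		collector.Stop() // closes the done channel, so Start() exits its select loop
	}()

	if err := collector.Start(); err != nil {
		fmt.Printf("collector stopped with error: %v\n", err)
	}
}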
Step 2: Analyzing Memory Trends
With profiles being collected, we need something that compares them over time and identifies anomalies:
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strconv"
	"strings"
)

type MemoryProfile struct {
	Timestamp   string
	Allocations map[string]uint64 // function -> bytes allocated
	Goroutines  int
	TotalMem    uint64
}

type MemoryTrend struct {
	Function      string
	InitialBytes  uint64
	CurrentBytes  uint64
	GrowthPercent float64
	IsIncreasing  bool
}

// ParseProfileOutput expects text-format profile output (for example, the result
// of `go tool pprof -top` redirected to a file), not the raw protobuf heap profile.
func ParseProfileOutput(filename string) (*MemoryProfile, error) {
	file, err := os.Open(filename)
	if err != nil {
		return nil, err
	}
	defer file.Close()

	profile := &MemoryProfile{
		Timestamp:   filename,
		Allocations: make(map[string]uint64),
	}

	scanner := bufio.NewScanner(file)
	allocRegex := regexp.MustCompile(`(\d+)\s+\d+\.\d+%\s+\d+\.\d+%\s+(\d+)\s+(.+)`)
	var totalMem uint64

	for scanner.Scan() {
		line := scanner.Text()
		matches := allocRegex.FindStringSubmatch(line)
		if len(matches) >= 4 {
			bytes, err := strconv.ParseUint(matches[1], 10, 64) // flat bytes
			if err == nil {
				funcName := strings.TrimSpace(matches[3])
				profile.Allocations[funcName] = bytes
				totalMem += bytes
			}
		}
	}
	profile.TotalMem = totalMem
	return profile, scanner.Err()
}

func CompareTrends(before, after *MemoryProfile) []MemoryTrend {
	var trends []MemoryTrend
	for funcName, currentBytes := range after.Allocations {
		beforeBytes := before.Allocations[funcName]
		if beforeBytes == 0 {
			continue // no baseline for this function; skip to avoid dividing by zero
		}
		if currentBytes > beforeBytes {
			growthPercent := float64(currentBytes-beforeBytes) / float64(beforeBytes) * 100
			trends = append(trends, MemoryTrend{
				Function:      funcName,
				InitialBytes:  beforeBytes,
				CurrentBytes:  currentBytes,
				GrowthPercent: growthPercent,
				IsIncreasing:  true,
			})
		}
	}
	return trends
}

func GenerateAlert(trend MemoryTrend, threshold float64) *Alert {
	if trend.GrowthPercent > threshold {
		return &Alert{
			Severity: "WARNING",
			Message: fmt.Sprintf(
				"Function %s allocated %d more bytes (%.1f%% growth)",
				trend.Function,
				trend.CurrentBytes-trend.InitialBytes,
				trend.GrowthPercent,
			),
			AffectedFunction: trend.Function,
		}
	}
	return nil
}

type Alert struct {
	Severity         string
	Message          string
	AffectedFunction string
}
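Here’s how these pieces might be wired together, assuming you’ve converted two collected snapshots to text with go tool pprof -top and saved the output to files; the 25.0 threshold is just an example value.
func reportGrowth(beforePath, afterPath string) error {
	before, err := ParseProfileOutput(beforePath)
	if err != nil {
		return fmt.Errorf("parsing %s: %w", beforePath, err)
	}
	after, err := ParseProfileOutput(afterPath)
	if err != nil {
		return fmt.Errorf("parsing %s: %w", afterPath, err)
	}
	for _, trend := range CompareTrends(before, after) {
		if alert := GenerateAlert(trend, 25.0); alert != nil {
			fmt.Printf("[%s] %s\n", alert.Severity, alert.Message)
		}
	}
	return nil
}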
Step 3: Integrating CGO Leak Detection
For applications using CGO, we want to use specialized tools. Here’s a wrapper that can execute cgoleak or similar tools and parse their output:
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

type CGOLeakDetector struct {
	binaryPath string
	pid        int
	interval   int // seconds
}

type CGOLeak struct {
	Address       string
	Size          uint64
	AllocationAge time.Duration
	StackTrace    []string
}

func NewCGOLeakDetector(binaryPath string, pid int) *CGOLeakDetector {
	return &CGOLeakDetector{
		binaryPath: binaryPath,
		pid:        pid,
		interval:   5,
	}
}

func (detector *CGOLeakDetector) DetectLeaks(ctx context.Context) ([]CGOLeak, error) {
	cmd := exec.CommandContext(
		ctx,
		detector.binaryPath,
		"--pid", fmt.Sprintf("%d", detector.pid),
		"--interval", fmt.Sprintf("%d", detector.interval),
	)
	output, err := cmd.CombinedOutput()
	if err != nil {
		return nil, fmt.Errorf("cgoleak execution failed: %w", err)
	}
	return parseLeakOutput(string(output))
}

func parseLeakOutput(output string) ([]CGOLeak, error) {
	var leaks []CGOLeak
	lines := strings.Split(output, "\n")
	for _, line := range lines {
		if strings.Contains(line, "bytes") {
			leak := parseLeak(line)
			if leak != nil {
				leaks = append(leaks, *leak)
			}
		}
	}
	return leaks, nil
}

func parseLeak(line string) *CGOLeak {
	// Simplified parsing - adjust based on actual cgoleak output format.
	// Here we assume the allocation size is the first field on the line.
	parts := strings.Fields(line)
	if len(parts) < 2 {
		return nil
	}
	sizeStr := parts[0]
	size, err := strconv.ParseUint(strings.TrimSuffix(sizeStr, "B"), 10, 64)
	if err != nil {
		return nil
	}
	return &CGOLeak{
		Size:          size,
		AllocationAge: time.Minute, // Simplified
		StackTrace:    []string{},
	}
}
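Here’s one hypothetical way to run the detector continuously; the per-run timeout keeps a hung external process from stalling the monitor forever, and everything used here is already covered by the imports above.
func pollCGOLeaks(detector *CGOLeakDetector, every time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
			leaks, err := detector.DetectLeaks(ctx)
			cancel()
			if err != nil {
				fmt.Printf("cgo leak detection failed: %v\n", err)
				continue
			}
			for _, leak := range leaks {
				fmt.Printf("unfreed CGO allocation: %d bytes\n", leak.Size)
			}
		case <-stop:
			return
		}
	}
}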
Step 4: Automated Alerting System
Now we tie everything together with an alerting system that runs continuously:
package main

import (
	"context"
	"fmt"
	"time"
)

type AlertManager struct {
	collector   *ProfileCollector
	cgoDetector *CGOLeakDetector
	config      AlertConfig
	previousMem map[string]uint64
}

type AlertConfig struct {
	MemoryGrowthThreshold float64 // percent
	CheckInterval         time.Duration
	AlertThreshold        uint64 // bytes
	Handlers              []AlertHandler
}

type AlertHandler interface {
	Handle(alert *Alert) error
}

type SlackAlertHandler struct {
	webhookURL string
}

func (h *SlackAlertHandler) Handle(alert *Alert) error {
	// Send to Slack
	fmt.Printf("[%s] %s\n", alert.Severity, alert.Message)
	return nil
}

func NewAlertManager(collector *ProfileCollector, detector *CGOLeakDetector, config AlertConfig) *AlertManager {
	return &AlertManager{
		collector:   collector,
		cgoDetector: detector,
		config:      config,
		previousMem: make(map[string]uint64),
	}
}

func (am *AlertManager) Start() {
	ticker := time.NewTicker(am.config.CheckInterval)
	defer ticker.Stop()
	for range ticker.C {
		am.analyzeMemory()
	}
}

func (am *AlertManager) analyzeMemory() {
	// Get latest profiles
	latestProfile := am.getLatestProfile()
	if latestProfile == nil {
		return
	}
	// Check CGO leaks
	leaks, err := am.cgoDetector.DetectLeaks(context.Background())
	if err == nil {
		for _, leak := range leaks {
			if leak.Size > am.config.AlertThreshold {
				alert := &Alert{
					Severity: "CRITICAL",
					Message: fmt.Sprintf(
						"CGO leak detected: %d bytes allocated for %v",
						leak.Size,
						leak.AllocationAge,
					),
				}
				am.dispatchAlert(alert)
			}
		}
	}
	// Check Go memory trends
	for _, handler := range am.config.Handlers {
		handler.Handle(&Alert{
			Severity: "INFO",
			Message:  fmt.Sprintf("Memory analysis complete at %v", time.Now()),
		})
	}
}

func (am *AlertManager) dispatchAlert(alert *Alert) {
	for _, handler := range am.config.Handlers {
		if err := handler.Handle(alert); err != nil {
			fmt.Printf("Failed to send alert: %v\n", err)
		}
	}
}

func (am *AlertManager) getLatestProfile() *MemoryProfile {
	// Implementation would read from disk and return latest
	return nil
}
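The getLatestProfile stub above is deliberately left empty; one possible implementation, sketched as a hypothetical helper, assumes the heap_<timestamp>.prof naming scheme from Step 1 (those names sort chronologically as plain strings), that the stored files are in the text format ParseProfileOutput expects, and that path/filepath and sort are added to the imports.
func (am *AlertManager) latestProfileFromDisk(dir string) (*MemoryProfile, error) {
	paths, err := filepath.Glob(filepath.Join(dir, "heap_*.prof"))
	if err != nil {
		return nil, err
	}
	if len(paths) == 0 {
		return nil, fmt.Errorf("no profiles found in %s", dir)
	}
	sort.Strings(paths) // timestamped filenames sort oldest to newest
	return ParseProfileOutput(paths[len(paths)-1])
}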
Common Memory Leak Patterns and How to Prevent Them
Let me share some patterns we see constantly in Go applications, particularly in production environments.
Pattern 1: Unbounded Goroutine Creation
// ❌ BAD - Creates unlimited goroutines
func handleRequests(requests <-chan Request) {
	for req := range requests {
		go processRequest(req) // Goroutine per request, never cleaned up
	}
}

// ✅ GOOD - Worker pool pattern
func handleRequests(requests <-chan Request, workers int) {
	for i := 0; i < workers; i++ {
		go func() {
			for req := range requests {
				processRequest(req)
			}
		}()
	}
}
Pattern 2: Stopped Tickers That Aren’t Actually Stopped
// ❌ BAD - Ticker resources never released
func monitor(duration time.Duration) {
	ticker := time.NewTicker(time.Second)
	go func() {
		for range ticker.C {
			doWork()
		}
	}()
	time.Sleep(duration)
	// Function exits but the ticker (and the goroutine draining it) keeps running forever
}

// ✅ GOOD - Always defer Stop()
func monitor(duration time.Duration) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop() // Ensures cleanup even if the function returns early or panics
	timeout := time.After(duration)
	for {
		select {
		case <-ticker.C:
			doWork()
		case <-timeout:
			return
		}
	}
}
Pattern 3: Slice References Keeping Large Arrays
// ❌ BAD - Huge array kept in memory for small slice
func processLargeFile(data []byte) []byte {
	return data[100:200] // Original 100MB array stays in memory
}

// ✅ GOOD - Clone the data you actually need
func processLargeFile(data []byte) []byte {
	return bytes.Clone(data[100:200]) // Only 100 bytes retained
}
Putting It All Together: A Complete Example
Here’s a practical example of a monitoring service that ties everything together:
package main

import (
	"fmt"
	"time"
)

func main() {
	// Configuration
	config := AlertConfig{
		MemoryGrowthThreshold: 25.0,              // Alert if memory grows 25% between checks
		CheckInterval:         time.Minute * 5,
		AlertThreshold:        100 * 1024 * 1024, // 100MB of leaked CGO memory
		Handlers: []AlertHandler{
			&SlackAlertHandler{webhookURL: "https://hooks.slack.com/..."},
		},
	}

	// Create components
	collector := NewProfileCollector(
		"http://localhost:6060",
		"./profiles",
		time.Minute, // Collect profiles every minute
	)
	cgoDetector := NewCGOLeakDetector(
		"./cgoleak",
		12345, // PID of target process
	)
	manager := NewAlertManager(collector, cgoDetector, config)

	// Start collection in background
	go func() {
		if err := collector.Start(); err != nil {
			fmt.Printf("Collector error: %v\n", err)
		}
	}()

	// Start analysis
	manager.Start()
}
Making It Production-Ready
Handling Failures Gracefully
When building this in production, remember that profile collection might fail, network calls might time out, and processes might restart. Wrap everything in proper error handling and logging:
func (pc *ProfileCollector) collectProfile() error {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	heapURL := fmt.Sprintf("%s/debug/pprof/heap", pc.targetURL)
	req, err := http.NewRequestWithContext(ctx, "GET", heapURL, nil)
	if err != nil {
		return fmt.Errorf("failed to create request: %w", err)
	}
	resp, err := pc.client.Do(req)
	if err != nil {
		return fmt.Errorf("request failed: %w", err)
	}
	defer resp.Body.Close()
	// ... rest of implementation
}
Storage Considerations
Profiles can get large quickly. Consider:
- Compressing old profiles with gzip
- Implementing a retention policy (keep last 7 days; see the sketch below)
- Uploading to cloud storage for long-term analysis
- Aggregating profiles instead of storing raw data
Performance Impact
Profile collection itself has overhead. Collecting too frequently will affect application performance. A good starting point is every 5-10 minutes in production, with more frequent collection available as an opt-in diagnostic mode.
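As a concrete example of the retention-policy bullet above, here’s a rough sketch that deletes profile files older than a cutoff; it assumes the profiles sit as individual files in one directory, the way the Step 1 collector writes them, and needs os, path/filepath, and time imported.
func pruneOldProfiles(dir string, maxAge time.Duration) error {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return err
	}
	cutoff := time.Now().Add(-maxAge)
	for _, entry := range entries {
		if entry.IsDir() {
			continue
		}
		info, err := entry.Info()
		if err != nil {
			continue
		}
		if info.ModTime().Before(cutoff) {
			os.Remove(filepath.Join(dir, entry.Name())) // best effort; ignore removal errors
		}
	}
	return nil
}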
Conclusion: From Leak to Peak
Memory leaks in Go applications don’t have to be mysterious nighttime emergencies. By understanding the common patterns, using the right tools, and building automated systems to detect issues, you can transform memory management from a source of stress into a solved problem.
The beauty of the approach we’ve outlined is that it’s layered—you can start with simple pprof analysis, add CGO detection when needed, and scale to comprehensive automated monitoring as your requirements grow. No single tool is perfect for every situation, but combining them gives you far better visibility than any one of them alone.
The next time you get that page at 3 AM about memory usage, instead of panicking, you’ll have profiles, trends, and actionable insights ready to go. And that’s a feeling worth celebrating. Now go forth and leak-proof your applications! 🚀
