The Ghost in Your Machine
You know that feeling when your Go application starts consuming memory like it’s training for an all-you-can-eat buffet? One day it’s running smoothly, the next—boom—your ops team is paging you at 3 AM because the service is using 8GB of RAM when it should be using 800MB. Welcome to the wonderful world of memory leaks.
Here’s the thing about Go: it’s got this fancy garbage collector that’s supposed to take memory management off your plate. And for the most part, it does an excellent job. But when you start mixing Go with C bindings through CGO, or when you create goroutines like you’re collecting Pokémon, or when you leave time.Ticker objects running indefinitely—suddenly, that garbage collector can’t help you. It’s like bringing a flashlight to investigate why your basement is flooded when there’s a broken pipe in the wall.
The really frustrating part? Most profiling tools only see the memory the Go runtime manages and don’t properly account for memory allocated by C libraries through CGO bindings. This means you’re flying blind on exactly the memory leaks that matter most in production environments.
This article is about taking control. We’re going to explore how to build an intelligent automation tool that detects and helps you eliminate memory leaks in Go applications. No guessing games, no midnight panic—just solid, reproducible diagnostics.
Why Memory Leaks Exist in Go (And Why You Should Care)
Before we build anything, let’s be honest about why memory leaks happen in Go despite having a garbage collector.
The CGO Problem
Go’s garbage collector is incredibly smart about tracking memory allocated by Go code. The moment you allocate something with make() or &SomeStruct{}, the GC knows about it. But when you call a C library through CGO—say, libpq for PostgreSQL or some cryptography library—the memory allocated by that C code is invisible to the garbage collector. It’s like having a bouncer who only checks IDs at the Go entrance but has no idea what’s happening in the C VIP section.
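To make that concrete, here’s a minimal sketch (readFromC is a hypothetical helper, not from any real library): anything you grab with C.malloc lives on the C heap, so the garbage collector will never reclaim it, and forgetting the matching C.free is a leak no amount of GC tuning will fix.
/*
#include <stdlib.h>
*/
import "C"

func readFromC(n int) []byte {
	buf := C.malloc(C.size_t(n))    // allocated on the C heap, invisible to Go's GC
	defer C.free(buf)               // drop this line and the memory is never reclaimed
	return C.GoBytes(buf, C.int(n)) // copy the data into GC-managed Go memory
}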
The Goroutine Leak
This one’s sneaky. Imagine a function that spawns a new goroutine every time a request comes in, and that goroutine waits indefinitely for a signal that might never come:
func runJobs(ctx context.Context) {
	for {
		go func() {
			data := make([]byte, 1000000) // 1MB allocation
			processData(data)
			<-ctx.Done() // Waiting forever if context never closes
		}()
		time.Sleep(time.Second)
	}
}
See the problem? You’re spawning one goroutine per second, each holding onto 1MB of memory. After an hour, you’ve got 3,600 goroutines and 3.6GB of memory that’s technically “in use” but not actually doing anything productive.
The Slice Reference Sneakiness
Here’s where things get particularly fun (and by fun, I mean frustrating). When you slice a large byte array to get a smaller portion, the original array stays in memory:
func extractData(data []byte) []byte {
	return data[5:10] // Keeps the entire original array in memory!
}
The garbage collector sees the slice as “in use” and won’t collect the massive underlying array. The solution is to explicitly clone the slice, but how many developers know this trick off the top of their head?
The Current State of Go Memory Profiling
pprof: The Gold Standard
Go comes with an excellent built-in profiling tool called pprof. Accessible through net/http/pprof, it gives you heap profiles that report memory allocation samples, letting you see both current and historical memory usage patterns. You can visualize this data as graphs, flame graphs, or even text output to pinpoint exactly where memory is being allocated.
Setting it up is stupidly easy:
package main

import (
	"net/http"
	_ "net/http/pprof"
)

func main() {
	go http.ListenAndServe(":6060", nil)
	// Your application code here
}
Then you can inspect memory with:
go tool pprof http://localhost:6060/debug/pprof/heap
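A few invocations worth keeping handy (these are standard pprof flags, nothing exotic): -inuse_space shows what’s currently held, -alloc_space shows everything allocated since startup, and -base diffs two saved profiles, which is exactly the trick the automated system below will lean on. The snapshot filenames here are just placeholders.
# What is currently retained on the heap
go tool pprof -inuse_space http://localhost:6060/debug/pprof/heap
# Everything allocated since the program started (useful for churn, less so for leaks)
go tool pprof -alloc_space http://localhost:6060/debug/pprof/heap
# Diff two saved snapshots: what grew in between?
go tool pprof -base heap_old.prof heap_new.prof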
The pprof Limitations
But here’s where pprof falls short: it’s reactive, not proactive. You have to know there’s a problem before you start profiling. In production environments where you can’t just restart services willy-nilly, waiting for issues to manifest before investigating is like waiting for your house to catch fire before installing smoke detectors. Also, if you’re dealing with heavy CGO usage, pprof won’t capture those allocations properly. You’ll be staring at profiles wondering why memory keeps growing when the Go allocations look perfectly reasonable.
Continuous Profiling: The Expensive Way
Companies like Datadog offer continuous profiling solutions that automatically capture profiling data from production services without manual intervention. These are fantastic—if you have the budget and don’t mind vendor lock-in. For many teams, that’s overkill or simply not an option.
Introducing cgoleak: A Specialized Solution
The open-source community recognized this gap and created cgoleak, an eBPF-based memory leak detector specifically designed for Go applications with CGO bindings. It’s a trimmed-down version of bcc’s memleak.py, but optimized for Go developer needs.
The key insight: instead of trying to track all memory allocations (like memleak.py does), cgoleak focuses exclusively on CGO allocations. This eliminates sampling issues and noise that plague generic memory profilers when applied to Go workloads.
How cgoleak Works
The tool uses eBPF (extended Berkeley Packet Filter) to hook into memory allocation functions at the kernel level. Every malloc(), calloc(), and realloc() call made by C libraries gets intercepted and tracked. When memory is freed, the tool updates its records. After a configurable interval, it reports allocations that were never freed.
Current supported allocators include malloc (but interestingly, not jemalloc yet), which covers the vast majority of C libraries you’ll encounter.
Building Your Own Automated Memory Leak Detection System
Now we get to the fun part: creating a comprehensive solution that combines multiple approaches into an automated system. Here’s what we want our system to do:
- Continuously monitor a Go application for memory growth
- Use pprof to identify Go-side allocations
- Use cgoleak or similar tools for CGO allocations
- Compare profiles over time to identify trends
- Generate alerts when memory growth exceeds thresholds
- Provide actionable insights about which code is responsible
- Be completely automated and production-friendly
Architecture Overview
At a high level, the system has four pieces: a profile collector that periodically pulls heap profiles from the target application over HTTP, a trend analyzer that compares those snapshots over time, a CGO leak detector that wraps an eBPF-based tool like cgoleak, and an alert manager that ties it all together and fires notifications when thresholds are crossed. The steps below build each piece in turn.
Step 1: Setting Up the Profile Collector
First, we need a service that periodically pulls heap profiles from our Go application and stores them for later analysis. Here’s a production-ready implementation:
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"time"
)

type ProfileCollector struct {
	targetURL string
	outputDir string
	interval  time.Duration
	client    *http.Client
	done      chan struct{}
}

func NewProfileCollector(targetURL, outputDir string, interval time.Duration) *ProfileCollector {
	return &ProfileCollector{
		targetURL: targetURL,
		outputDir: outputDir,
		interval:  interval,
		client: &http.Client{
			Timeout: 30 * time.Second,
		},
		done: make(chan struct{}),
	}
}

func (pc *ProfileCollector) Start() error {
	if err := os.MkdirAll(pc.outputDir, 0755); err != nil {
		return fmt.Errorf("failed to create output directory: %w", err)
	}
	ticker := time.NewTicker(pc.interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if err := pc.collectProfile(); err != nil {
				fmt.Printf("Error collecting profile: %v\n", err)
			}
		case <-pc.done:
			return nil
		}
	}
}

func (pc *ProfileCollector) Stop() {
	close(pc.done)
}

func (pc *ProfileCollector) collectProfile() error {
	heapURL := fmt.Sprintf("%s/debug/pprof/heap", pc.targetURL)
	resp, err := pc.client.Get(heapURL)
	if err != nil {
		return fmt.Errorf("failed to fetch heap profile: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status code: %d", resp.StatusCode)
	}
	timestamp := time.Now().Format("2006-01-02_15-04-05")
	outputFile := filepath.Join(pc.outputDir, fmt.Sprintf("heap_%s.prof", timestamp))
	out, err := os.Create(outputFile)
	if err != nil {
		return fmt.Errorf("failed to create output file: %w", err)
	}
	defer out.Close()
	if _, err := io.Copy(out, resp.Body); err != nil {
		return fmt.Errorf("failed to write profile: %w", err)
	}
	fmt.Printf("Profile collected: %s\n", outputFile)
	return nil
}
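One way to run the collector on its own (the function name is illustrative, and it assumes os and os/signal are added to the imports above): trap Ctrl+C, call Stop() to close the done channel, and Start() returns cleanly.
func runCollectorUntilInterrupted() {
	collector := NewProfileCollector("http://localhost:6060", "./profiles", time.Minute)

	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, os.Interrupt)

	go func() {
		<-sigs
		collector.Stop() // closes the done channel, so Start() exits its select loop
	}()

	if err := collector.Start(); err != nil {
		fmt.Printf("collector stopped with error: %v\n", err)
	}
}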
Step 2: Analyzing Memory Trends
With profiles being collected, we need something that compares them over time and identifies anomalies:
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strconv"
	"strings"
)

type MemoryProfile struct {
	Timestamp   string
	Allocations map[string]uint64 // function -> bytes allocated
	Goroutines  int
	TotalMem    uint64
}

type MemoryTrend struct {
	Function      string
	InitialBytes  uint64
	CurrentBytes  uint64
	GrowthPercent float64
	IsIncreasing  bool
}

// ParseProfileOutput expects text-format profile output (for example, the result
// of `go tool pprof -top` redirected to a file), not the raw protobuf heap profile.
func ParseProfileOutput(filename string) (*MemoryProfile, error) {
	file, err := os.Open(filename)
	if err != nil {
		return nil, err
	}
	defer file.Close()

	profile := &MemoryProfile{
		Timestamp:   filename,
		Allocations: make(map[string]uint64),
	}

	scanner := bufio.NewScanner(file)
	allocRegex := regexp.MustCompile(`(\d+)\s+\d+\.\d+%\s+\d+\.\d+%\s+(\d+)\s+(.+)`)
	var totalMem uint64

	for scanner.Scan() {
		line := scanner.Text()
		matches := allocRegex.FindStringSubmatch(line)
		if len(matches) >= 4 {
			bytes, err := strconv.ParseUint(matches[1], 10, 64) // flat bytes
			if err == nil {
				funcName := strings.TrimSpace(matches[3])
				profile.Allocations[funcName] = bytes
				totalMem += bytes
			}
		}
	}
	profile.TotalMem = totalMem
	return profile, scanner.Err()
}

func CompareTrends(before, after *MemoryProfile) []MemoryTrend {
	var trends []MemoryTrend
	for funcName, currentBytes := range after.Allocations {
		beforeBytes := before.Allocations[funcName]
		if beforeBytes == 0 {
			continue // no baseline for this function; skip to avoid dividing by zero
		}
		if currentBytes > beforeBytes {
			growthPercent := float64(currentBytes-beforeBytes) / float64(beforeBytes) * 100
			trends = append(trends, MemoryTrend{
				Function:      funcName,
				InitialBytes:  beforeBytes,
				CurrentBytes:  currentBytes,
				GrowthPercent: growthPercent,
				IsIncreasing:  true,
			})
		}
	}
	return trends
}

func GenerateAlert(trend MemoryTrend, threshold float64) *Alert {
	if trend.GrowthPercent > threshold {
		return &Alert{
			Severity: "WARNING",
			Message: fmt.Sprintf(
				"Function %s allocated %d more bytes (%.1f%% growth)",
				trend.Function,
				trend.CurrentBytes-trend.InitialBytes,
				trend.GrowthPercent,
			),
			AffectedFunction: trend.Function,
		}
	}
	return nil
}

type Alert struct {
	Severity         string
	Message          string
	AffectedFunction string
}
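Here’s how these pieces might be wired together, assuming you’ve converted two collected snapshots to text with go tool pprof -top and saved the output to files; the 25.0 threshold is just an example value.
func reportGrowth(beforePath, afterPath string) error {
	before, err := ParseProfileOutput(beforePath)
	if err != nil {
		return fmt.Errorf("parsing %s: %w", beforePath, err)
	}
	after, err := ParseProfileOutput(afterPath)
	if err != nil {
		return fmt.Errorf("parsing %s: %w", afterPath, err)
	}
	for _, trend := range CompareTrends(before, after) {
		if alert := GenerateAlert(trend, 25.0); alert != nil {
			fmt.Printf("[%s] %s\n", alert.Severity, alert.Message)
		}
	}
	return nil
}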
Step 3: Integrating CGO Leak Detection
For applications using CGO, we want to use specialized tools. Here’s a wrapper that can execute cgoleak or similar tools and parse their output:
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

type CGOLeakDetector struct {
	binaryPath string
	pid        int
	interval   int // seconds
}

type CGOLeak struct {
	Address       string
	Size          uint64
	AllocationAge time.Duration
	StackTrace    []string
}

func NewCGOLeakDetector(binaryPath string, pid int) *CGOLeakDetector {
	return &CGOLeakDetector{
		binaryPath: binaryPath,
		pid:        pid,
		interval:   5,
	}
}

func (detector *CGOLeakDetector) DetectLeaks(ctx context.Context) ([]CGOLeak, error) {
	cmd := exec.CommandContext(
		ctx,
		detector.binaryPath,
		"--pid", fmt.Sprintf("%d", detector.pid),
		"--interval", fmt.Sprintf("%d", detector.interval),
	)
	output, err := cmd.CombinedOutput()
	if err != nil {
		return nil, fmt.Errorf("cgoleak execution failed: %w", err)
	}
	return parseLeakOutput(string(output))
}

func parseLeakOutput(output string) ([]CGOLeak, error) {
	var leaks []CGOLeak
	lines := strings.Split(output, "\n")
	for _, line := range lines {
		if strings.Contains(line, "bytes") {
			leak := parseLeak(line)
			if leak != nil {
				leaks = append(leaks, *leak)
			}
		}
	}
	return leaks, nil
}

func parseLeak(line string) *CGOLeak {
	// Simplified parsing - adjust based on actual cgoleak output format.
	// Here we assume the allocation size is the first field on the line.
	parts := strings.Fields(line)
	if len(parts) < 2 {
		return nil
	}
	sizeStr := parts[0]
	size, err := strconv.ParseUint(strings.TrimSuffix(sizeStr, "B"), 10, 64)
	if err != nil {
		return nil
	}
	return &CGOLeak{
		Size:          size,
		AllocationAge: time.Minute, // Simplified
		StackTrace:    []string{},
	}
}
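Here’s one hypothetical way to run the detector continuously; the per-run timeout keeps a hung external process from stalling the monitor forever, and everything used here is already covered by the imports above.
func pollCGOLeaks(detector *CGOLeakDetector, every time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
			leaks, err := detector.DetectLeaks(ctx)
			cancel()
			if err != nil {
				fmt.Printf("cgo leak detection failed: %v\n", err)
				continue
			}
			for _, leak := range leaks {
				fmt.Printf("unfreed CGO allocation: %d bytes\n", leak.Size)
			}
		case <-stop:
			return
		}
	}
}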
Step 4: Automated Alerting System
Now we tie everything together with an alerting system that runs continuously:
package main

import (
	"context"
	"fmt"
	"time"
)

type AlertManager struct {
	collector   *ProfileCollector
	cgoDetector *CGOLeakDetector
	config      AlertConfig
	previousMem map[string]uint64
}

type AlertConfig struct {
	MemoryGrowthThreshold float64 // percent
	CheckInterval         time.Duration
	AlertThreshold        uint64 // bytes
	Handlers              []AlertHandler
}

type AlertHandler interface {
	Handle(alert *Alert) error
}

type SlackAlertHandler struct {
	webhookURL string
}

func (h *SlackAlertHandler) Handle(alert *Alert) error {
	// Send to Slack
	fmt.Printf("[%s] %s\n", alert.Severity, alert.Message)
	return nil
}

func NewAlertManager(collector *ProfileCollector, detector *CGOLeakDetector, config AlertConfig) *AlertManager {
	return &AlertManager{
		collector:   collector,
		cgoDetector: detector,
		config:      config,
		previousMem: make(map[string]uint64),
	}
}

func (am *AlertManager) Start() {
	ticker := time.NewTicker(am.config.CheckInterval)
	defer ticker.Stop()
	for range ticker.C {
		am.analyzeMemory()
	}
}

func (am *AlertManager) analyzeMemory() {
	// Get latest profiles
	latestProfile := am.getLatestProfile()
	if latestProfile == nil {
		return
	}
	// Check CGO leaks
	leaks, err := am.cgoDetector.DetectLeaks(context.Background())
	if err == nil {
		for _, leak := range leaks {
			if leak.Size > am.config.AlertThreshold {
				alert := &Alert{
					Severity: "CRITICAL",
					Message: fmt.Sprintf(
						"CGO leak detected: %d bytes allocated for %v",
						leak.Size,
						leak.AllocationAge,
					),
				}
				am.dispatchAlert(alert)
			}
		}
	}
	// Check Go memory trends
	for _, handler := range am.config.Handlers {
		handler.Handle(&Alert{
			Severity: "INFO",
			Message:  fmt.Sprintf("Memory analysis complete at %v", time.Now()),
		})
	}
}

func (am *AlertManager) dispatchAlert(alert *Alert) {
	for _, handler := range am.config.Handlers {
		if err := handler.Handle(alert); err != nil {
			fmt.Printf("Failed to send alert: %v\n", err)
		}
	}
}

func (am *AlertManager) getLatestProfile() *MemoryProfile {
	// Implementation would read from disk and return latest
	return nil
}
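The getLatestProfile stub above is deliberately left empty; one possible implementation, sketched as a hypothetical helper, assumes the heap_<timestamp>.prof naming scheme from Step 1 (those names sort chronologically as plain strings), that the stored files are in the text format ParseProfileOutput expects, and that path/filepath and sort are added to the imports.
func (am *AlertManager) latestProfileFromDisk(dir string) (*MemoryProfile, error) {
	paths, err := filepath.Glob(filepath.Join(dir, "heap_*.prof"))
	if err != nil {
		return nil, err
	}
	if len(paths) == 0 {
		return nil, fmt.Errorf("no profiles found in %s", dir)
	}
	sort.Strings(paths) // timestamped filenames sort oldest to newest
	return ParseProfileOutput(paths[len(paths)-1])
}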
Common Memory Leak Patterns and How to Prevent Them
Let me share some patterns we see constantly in Go applications, particularly in production environments.
Pattern 1: Unbounded Goroutine Creation
// ❌ BAD - Creates unlimited goroutines
func handleRequests(requests <-chan Request) {
	for req := range requests {
		go processRequest(req) // Goroutine per request, never cleaned up
	}
}

// ✅ GOOD - Worker pool pattern
func handleRequests(requests <-chan Request, workers int) {
	for i := 0; i < workers; i++ {
		go func() {
			for req := range requests {
				processRequest(req)
			}
		}()
	}
}
Pattern 2: Stopped Tickers That Aren’t Actually Stopped
// ❌ BAD - Ticker resources never released
func monitor(duration time.Duration) {
	ticker := time.NewTicker(time.Second)
	go func() {
		for range ticker.C {
			doWork()
		}
	}()
	time.Sleep(duration)
	// Function exits but the ticker (and the goroutine draining it) keeps running forever
}

// ✅ GOOD - Always defer Stop()
func monitor(duration time.Duration) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop() // Ensures cleanup even if the function returns early or panics
	timeout := time.After(duration)
	for {
		select {
		case <-ticker.C:
			doWork()
		case <-timeout:
			return
		}
	}
}
Pattern 3: Slice References Keeping Large Arrays
// ❌ BAD - Huge array kept in memory for small slice
func processLargeFile(data []byte) []byte {
	return data[100:200] // Original 100MB array stays in memory
}

// ✅ GOOD - Clone the data you actually need
func processLargeFile(data []byte) []byte {
	return bytes.Clone(data[100:200]) // Only 100 bytes retained
}
Putting It All Together: A Complete Example
Here’s a practical example of a monitoring service that ties everything together:
package main

import (
	"fmt"
	"time"
)

func main() {
	// Configuration
	config := AlertConfig{
		MemoryGrowthThreshold: 25.0,              // Alert if memory grows 25% between checks
		CheckInterval:         time.Minute * 5,
		AlertThreshold:        100 * 1024 * 1024, // 100MB of leaked CGO memory
		Handlers: []AlertHandler{
			&SlackAlertHandler{webhookURL: "https://hooks.slack.com/..."},
		},
	}

	// Create components
	collector := NewProfileCollector(
		"http://localhost:6060",
		"./profiles",
		time.Minute, // Collect profiles every minute
	)
	cgoDetector := NewCGOLeakDetector(
		"./cgoleak",
		12345, // PID of target process
	)
	manager := NewAlertManager(collector, cgoDetector, config)

	// Start collection in background
	go func() {
		if err := collector.Start(); err != nil {
			fmt.Printf("Collector error: %v\n", err)
		}
	}()

	// Start analysis
	manager.Start()
}
Making It Production-Ready
Handling Failures Gracefully
When building this in production, remember that profile collection might fail, network calls might time out, and processes might restart. Wrap everything in proper error handling and logging:
func (pc *ProfileCollector) collectProfile() error {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	heapURL := fmt.Sprintf("%s/debug/pprof/heap", pc.targetURL)
	req, err := http.NewRequestWithContext(ctx, "GET", heapURL, nil)
	if err != nil {
		return fmt.Errorf("failed to create request: %w", err)
	}
	resp, err := pc.client.Do(req)
	if err != nil {
		return fmt.Errorf("request failed: %w", err)
	}
	defer resp.Body.Close()
	// ... rest of implementation
}
Storage Considerations
Profiles can get large quickly. Consider:
- Compressing old profiles with gzip
- Implementing a retention policy (keep last 7 days; see the sketch below)
- Uploading to cloud storage for long-term analysis
- Aggregating profiles instead of storing raw data
Performance Impact
Profile collection itself has overhead. Collecting too frequently will affect application performance. A good starting point is every 5-10 minutes in production, with more frequent collection available as an opt-in diagnostic mode.
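As a concrete example of the retention-policy bullet above, here’s a rough sketch that deletes profile files older than a cutoff; it assumes the profiles sit as individual files in one directory, the way the Step 1 collector writes them, and needs os, path/filepath, and time imported.
func pruneOldProfiles(dir string, maxAge time.Duration) error {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return err
	}
	cutoff := time.Now().Add(-maxAge)
	for _, entry := range entries {
		if entry.IsDir() {
			continue
		}
		info, err := entry.Info()
		if err != nil {
			continue
		}
		if info.ModTime().Before(cutoff) {
			os.Remove(filepath.Join(dir, entry.Name())) // best effort; ignore removal errors
		}
	}
	return nil
}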
Conclusion: From Leak to Peak
Memory leaks in Go applications don’t have to be mysterious nighttime emergencies. By understanding the common patterns, using the right tools, and building automated systems to detect issues, you can transform memory management from a source of stress into a solved problem.
The beauty of the approach we’ve outlined is that it’s layered—you can start with simple pprof analysis, add CGO detection when needed, and scale to comprehensive automated monitoring as your requirements grow. No single tool is perfect for every situation, but combining them gives you far better visibility than any one of them alone.
The next time you get that page at 3 AM about memory usage, instead of panicking, you’ll have profiles, trends, and actionable insights ready to go. And that’s a feeling worth celebrating. Now go forth and leak-proof your applications! 🚀
