When you’re a solo developer, your infrastructure isn’t just your business—it is your business. There’s no ops team to call at 3 AM, no heroic incident response meeting where someone else takes charge. It’s just you, your cold coffee, and the sinking realization that your production database just became nothing more than a digital ghost story. I’ve been there. And I’m guessing you have too, or you’re smart enough to know it’s coming. The good news? Disaster recovery for solo developers doesn’t require enterprise-grade complexity or a six-figure budget. It requires pragmatism, a little automation, and most importantly, a plan that actually works when everything has already gone sideways.

Why Solo Developers Actually Have It Better (Sort Of)

Before we dive into the doom and gloom, let’s acknowledge something: as a solo developer, you have a massive advantage that enterprise teams don’t. You know your entire stack. Every service, every dependency, every quirky configuration hack you implemented at 2 AM because it was the only thing that worked—it’s all in your head (and hopefully in your code). You can move fast. You can test immediately. You can make decisions without waiting for approval committees. But you’re also completely alone when things break. There’s no escalation path—you are the escalation path. The goal here isn’t to eliminate all risk (spoiler: that’s impossible). It’s to minimize downtime and ensure data safety using strategies that actually work for one person managing multiple services.

The Foundation: Know Your RTO and RPO

Let’s start with two acronyms that actually matter. Recovery Time Objective (RTO) is how long you can afford to be down before your business suffers. For a side project? Maybe it’s 24 hours. For your primary income? Probably measured in hours or minutes. Recovery Point Objective (RPO) is how much data loss you can tolerate. Losing 1 hour of transactions? Acceptable. Losing a day’s worth? Probably not. Here’s the practical exercise every solo developer should do:

For each service you run, answer these questions:
```
1. If this service goes down today, how many hours until I lose revenue/trust?
   → This is your RTO
2. If I lose all data from the last X hours, can the business survive?
   → This is your RPO
3. What's the blast radius?
   - Just this service?
   - Multiple services?
   - User data compromised?

Example:
- Blog: RTO = 7 days, RPO = 1 week (people can wait for content)
- SaaS API: RTO = 4 hours, RPO = 1 hour (users are paying)
- User database: RTO = 2 hours, RPO = 15 minutes (data loss is unacceptable)
```

These numbers directly dictate your DR strategy. High RTO/RPO? You can use cheaper, slower solutions. Low numbers? You need redundancy and automation.

The Three-Layer DR Stack

Forget everything you know about enterprise disaster recovery. Here’s what actually matters for solo developers:

Layer 1: Automated Backups (Your Safety Net)

You cannot rely on manual backups. You will forget. We all forget. I once thought I was backing up my database daily—turns out the script had been silently failing for 47 days. Automation is your friend. Here’s a practical setup for a typical solo dev stack:

```bash
#!/bin/bash
# backup-everything.sh - runs daily via cron
set -euo pipefail  # Exit on any error, including failures inside pipelines

BACKUP_ROOT="/backups"
BACKUP_DIR="$BACKUP_ROOT/$(date +%Y-%m-%d)"
RETENTION_DAYS=30
REMOTE_BACKUP_BUCKET="s3://my-backups"

mkdir -p "$BACKUP_DIR"

# Database backup with compression
# (pg_dump authenticates via ~/.pgpass or PGPASSWORD; it can't prompt under cron)
echo "Backing up PostgreSQL..."
pg_dump --verbose --no-owner \
  -h localhost \
  -U postgres \
  my_production_db | gzip > "$BACKUP_DIR/db.sql.gz"

# Application files
echo "Backing up application code..."
tar --exclude='.git' \
  --exclude='node_modules' \
  --exclude='.env' \
  -czf "$BACKUP_DIR/app.tar.gz" /app

# Configuration and secrets (encrypted!)
# BACKUP_ENCRYPTION_KEY must be present in the environment cron gives this script.
echo "Backing up encrypted config..."
tar -czf - /etc/myapp | \
  openssl enc -aes-256-cbc -salt -out "$BACKUP_DIR/config.tar.gz.enc" \
  -k "$BACKUP_ENCRYPTION_KEY"

# Upload to remote storage
echo "Uploading to S3..."
aws s3 sync "$BACKUP_DIR" "$REMOTE_BACKUP_BUCKET/$(date +%Y-%m-%d)/"

# Cleanup old local backups (keep 30 days; prune the whole /backups tree, not just today's dir)
echo "Cleaning up old backups..."
find "$BACKUP_ROOT" -mindepth 1 -mtime +"$RETENTION_DAYS" -delete

echo "Backup complete at $(date)"
```

Set this in your cron:

```bash
# Daily backup at 2 AM
0 2 * * * /usr/local/bin/backup-everything.sh >> /var/log/backups.log 2>&1
```
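
One gotcha: cron starts jobs with a nearly empty environment, so `$BACKUP_ENCRYPTION_KEY` will not be set unless you provide it yourself. A minimal sketch (the `/etc/backup.env` path is just an example, not something the script above requires): keep the key in a root-only file, source it at the top of the script, and periodically prove the encrypted archive actually decrypts with it.

```bash
# /etc/backup.env  (chmod 600, owned by root) - example location
# export BACKUP_ENCRYPTION_KEY='a-long-random-passphrase'

# Near the top of backup-everything.sh:
# . /etc/backup.env

# Sanity check: list the encrypted archive's contents without extracting it.
# The cipher options must match the ones used to encrypt, or this will fail.
openssl enc -d -aes-256-cbc \
  -in "$BACKUP_DIR/config.tar.gz.enc" \
  -k "$BACKUP_ENCRYPTION_KEY" | tar -tzf - > /dev/null \
  && echo "Encrypted config archive decrypts OK" \
  || echo "WARNING: config archive did not decrypt"
```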

The key principles:

  • Automated (not manual)
  • Verified (actually test that restore works; see the check sketched below)
  • Redundant storage (local + cloud)
  • Retention policy (don’t keep infinite backups)
  • Encrypted (especially for secrets and configs)
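
As a concrete example of "verified": the sketch below checks that the newest object in the backup bucket is recent and not suspiciously small, and complains loudly if not. It assumes the same `s3://my-backups` layout as the script above and GNU `date` (Linux); the alert URL is a placeholder for whatever ping or webhook service you already use.

```bash
#!/bin/bash
# verify-latest-backup.sh - a minimal "are my backups actually happening?" check
set -u

BUCKET="s3://my-backups"
ALERT_URL="https://example.com/backup-alert"  # placeholder: your ping/webhook endpoint
MAX_AGE_HOURS=36      # backups run daily, so anything older means something broke
MIN_SIZE_BYTES=10000  # a near-empty dump usually means a silent failure

# Newest object in the bucket, as "date time size key"
LATEST=$(aws s3 ls "$BUCKET/" --recursive | sort | tail -1)

if [ -z "$LATEST" ]; then
  echo "Backup check FAILED: no objects found in $BUCKET"
  curl -fsS -X POST "$ALERT_URL" -d "No backups found" || true
  exit 1
fi

LATEST_SIZE=$(echo "$LATEST" | awk '{print $3}')
LATEST_TIME=$(echo "$LATEST" | awk '{print $1" "$2}')
AGE_HOURS=$(( ($(date +%s) - $(date -d "$LATEST_TIME" +%s)) / 3600 ))

if [ "$AGE_HOURS" -gt "$MAX_AGE_HOURS" ] || [ "$LATEST_SIZE" -lt "$MIN_SIZE_BYTES" ]; then
  echo "Backup check FAILED: newest backup is ${AGE_HOURS}h old and ${LATEST_SIZE} bytes"
  curl -fsS -X POST "$ALERT_URL" -d "Backup verification failed" || true
  exit 1
fi

echo "Backup check OK: newest backup is ${AGE_HOURS}h old, ${LATEST_SIZE} bytes"
```

Run it from cron an hour or two after the backup job; the 47 days of silent failure mentioned above is exactly the kind of thing it catches.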

Layer 2: Infrastructure as Code (The Memory Aid)

Here’s the dirty truth: when you’re under pressure during an outage, you will forget how you configured things. You’ll forget the exact arguments for that server restart. You’ll forget why you set that weird firewall rule. Infrastructure as Code (IaC) solves this. For solo developers, Terraform is the sweet spot. It’s not too heavy, widely supported, and saves you when you need to rebuild fast:

```hcl
# main.tf - Your entire infrastructure documented
terraform {
  backend "s3" {
    bucket = "my-tf-state"
    key    = "prod/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = "us-east-1"
}

# Database
# (random_password.db_password is defined elsewhere in this configuration)
resource "aws_db_instance" "main" {
  identifier     = "production-db"
  engine         = "postgres"
  engine_version = "14.10"
  instance_class = "db.t3.micro"

  allocated_storage = 100
  storage_encrypted = true
  multi_az          = true # High availability!

  backup_retention_period = 30

  db_name  = "myapp"
  username = "dbadmin"
  password = random_password.db_password.result

  skip_final_snapshot       = false
  # Use a static name here: timestamp() changes on every plan and causes a perpetual diff
  final_snapshot_identifier = "prod-db-final-snapshot"

  tags = {
    Name        = "production-database"
    Environment = "prod"
    ManagedBy   = "terraform"
  }
}

# Application server
# (the AMI data source, key pair, and security group are defined elsewhere in this configuration)
resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.small"
  key_name      = aws_key_pair.deployer.key_name

  vpc_security_group_ids = [aws_security_group.app.id]

  # Runs on first boot; pass plain text, the provider base64-encodes it for the API
  user_data = file("${path.module}/init-app.sh")

  root_block_device {
    volume_size           = 50
    volume_type           = "gp3"
    delete_on_termination = true
  }

  tags = {
    Name = "production-app"
  }
}

# Load balancer
resource "aws_lb" "main" {
  name               = "prod-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.lb.id]
  subnets            = aws_subnet.public[*].id
}

output "app_url" {
  value = aws_lb.main.dns_name
}
```

The magic here? When disaster strikes, you don’t rebuild from memory. You run:

```bash
terraform plan   # See exactly what will be created
terraform apply  # Rebuild your entire infrastructure
```

Store your Terraform code in version control (Git). Never manually change infrastructure—always update the code first, then apply it.
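
One way to keep yourself honest about that rule is a scheduled drift check: run `terraform plan` from cron and treat any unexpected diff as a warning. A rough sketch (the repository path is a placeholder):

```bash
#!/bin/bash
# check-tf-drift.sh - weekly cron job; flags infrastructure that no longer matches the code.
# `terraform plan -detailed-exitcode` exits 0 (no changes), 1 (error), or 2 (changes pending).
cd /path/to/your/terraform/repo || exit 1

terraform plan -detailed-exitcode -input=false -no-color > /tmp/tf-drift.log 2>&1
case $? in
  0) echo "No drift: infrastructure matches the code" ;;
  2) echo "DRIFT DETECTED: review /tmp/tf-drift.log and reconcile" ;;
  *) echo "terraform plan failed: see /tmp/tf-drift.log" ;;
esac
```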

Layer 3: Runbooks (Your Survival Guide)

A runbook is a documented procedure for recovering from specific failures. Not a novel. Not something theoretical. Step-by-step instructions that work when you’re panicking. Create a RUNBOOKS.md file in your repository:

# Disaster Recovery Runbooks
## Database Corruption - Complete Recovery
**Time to recover:** ~45 minutes
**Data loss:** Up to 1 hour
### Prerequisites
- AWS CLI configured with credentials
- PostgreSQL client installed locally
- SSH access to production server
### Steps
1. **Verify the problem**
   ```bash
   ssh user@production-server
   sudo -u postgres psql -d myapp -c "SELECT COUNT(*) FROM users;"
   # If this fails or returns wrong data, proceed
   ```
2. **Stop the application**
   ```bash
   sudo systemctl stop myapp
   # Wait 10 seconds for connections to close
   ```
3. **Find the most recent good backup**
   ```bash
   aws s3 ls s3://my-backups/ --recursive | sort
   # Choose the most recent one before the corruption occurred
   ```
4. **Download the backup**
   ```bash
   aws s3 cp s3://my-backups/2026-01-23/db.sql.gz ./
   gunzip db.sql.gz
   ```
5. **Verify backup integrity before restoring**
   ```bash
   # Test restore to a temporary database first
   createdb test_restore
   psql test_restore < db.sql
   psql test_restore -c "SELECT COUNT(*) FROM users;"
   # If counts look right, proceed. If not, try an earlier backup.
   dropdb test_restore
   ```
6. **Restore to production**
   ```bash
   sudo -u postgres psql -c "DROP DATABASE myapp;"
   sudo -u postgres psql -c "CREATE DATABASE myapp;"
   sudo -u postgres psql myapp < db.sql
   ```
7. **Restart application and verify**
   ```bash
   sudo systemctl start myapp
   sleep 5
   curl http://localhost:3000/health
   # Should return 200 OK
   ```
8. **Monitor logs for errors**
   ```bash
   sudo journalctl -u myapp -f --since "5 minutes ago"
   # Watch for 10 minutes for any issues
   ```
    

### Rollback
If anything goes wrong during recovery:
1. Stop the application: `sudo systemctl stop myapp`
2. Restore from an even earlier backup
3. Call your backup provider's support (document their number)

## Complete Regional Failure - Migrate to Backup Region
**Time to recover:** ~2 hours
**Data loss:** Up to 1 hour

### Prerequisites
- Terraform and AWS CLI configured
- Backup region infrastructure code ready
- DNS access at your domain registrar
### Steps
1. **Assess the situation**
   - Check the AWS status page: https://status.aws.amazon.com/
   - Verify your primary region is actually down (ping your server)
   - Decision: is this worth failing over? (High costs!)
2. **Deploy to the backup region**
   ```bash
   # In your Terraform code, change the region
   sed -i 's/us-east-1/us-west-2/g' main.tf
   terraform init -backend-config="key=prod-backup/terraform.tfstate"
   terraform plan
   terraform apply
   # This will create new infrastructure in us-west-2
   ```
3. **Restore the latest backup to the new region**
   ```bash
   # Download the latest backup
   aws s3 cp s3://my-backups/latest/db.sql.gz ./
   # Restore it to the new database in the new region
   PGHOST=$(terraform output -raw backup_db_host)
   gunzip db.sql.gz
   psql -h "$PGHOST" -U dbadmin -d myapp < db.sql
   ```
4. **Update DNS to point to the backup region**
   ```bash
   # Log into your domain registrar and point mysite.com at the new ALB in us-west-2
   # (CNAME/ALIAS record - load balancer IPs change, so don't hard-code an A record)
   # Verify DNS propagation (may take up to 5 minutes)
   dig mysite.com
   # Should resolve to the new load balancer
   ```
5. **Verify the application is working**
   ```bash
   curl https://mysite.com/health -v
   # Should return 200 OK
   ```
6. **Post-incident**
   - Once the primary region is back online, fail back at your convenience
   - Document what happened and why
   - Update this runbook if anything was unclear

### Cost Warning
Backup region infrastructure costs about $X/month. The failover decision should factor in cost vs. revenue loss.

Keep this in your repository. Print a copy. When disaster hits, you follow these steps, not your panicked brain.

The Testing Problem (It's Real)

Here's the uncomfortable truth: a disaster recovery plan you've never tested is just fiction. You need to actually test your recovery, but as a solo developer, you can't afford a disaster for real. So you create controlled chaos:
```bash
#!/bin/bash
# test-dr-plan.sh - Monthly DR test
echo "=== DR Test: Database Recovery ==="
echo "Testing at $(date)"

# Create a test database (name it once so create, restore, and drop all agree)
TEST_DB="test_dr_$(date +%s)"
createdb "$TEST_DB"

# Restore the latest backup into the test database
LATEST_KEY=$(aws s3 ls s3://my-backups/ --recursive | sort | tail -1 | awk '{print $NF}')
aws s3 cp "s3://my-backups/$LATEST_KEY" ./test-restore.sql.gz
gunzip -f test-restore.sql.gz
psql "$TEST_DB" < test-restore.sql

# Run basic health checks (-t -A: print just the number, no table decoration)
RESULT=$(psql -t -A -c "SELECT COUNT(*) FROM users;" "$TEST_DB" 2>&1 | tr -d '[:space:]')
if [[ $RESULT =~ ^[0-9]+$ ]]; then
  echo "✓ Database restore successful: $RESULT users found"
else
  echo "✗ Database restore FAILED"
  echo "Result: $RESULT"
  exit 1
fi

# Cleanup
dropdb "$TEST_DB"
rm -f test-restore.sql

echo "=== DR Test Complete ==="
echo "Test passed on $(date)" >> /var/log/dr-tests.log
```

Run this monthly. Log the results. When a real disaster happens, you’ll know the process works. Here’s what your testing schedule should look like:

| Test Type | Frequency | Effort | What It Tests |
|---|---|---|---|
| Backup verification | Daily | 5 min (automated) | Backups aren't silently failing |
| Partial recovery test | Monthly | 30 min | Can you restore a database? |
| Full infrastructure rebuild | Quarterly | 2 hours | Can you rebuild everything? |
| Disaster simulation (chaos engineering) | Quarterly | 3+ hours | What actually breaks? |
| Communication drill | Quarterly | 1 hour | Can you notify customers? |
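
To make sure the monthly test actually happens, schedule it the same way as the backup job. A sketch for cron (the script path and reminder address are placeholders; the quarterly line assumes a working local `mail` command):

```bash
# Monthly DR test on the 1st at 3 AM
0 3 1 * * /usr/local/bin/test-dr-plan.sh >> /var/log/dr-tests.log 2>&1
# Quarterly reminder for the bigger drills
0 9 1 1,4,7,10 * echo "Quarterly DR drill due - see RUNBOOKS.md" | mail -s "DR drill reminder" you@example.com
```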

A Simple Recovery Flow

Here’s how the pieces fit together:

```mermaid
graph TD
  A["🚨 Disaster Detected"] --> B["Check Runbook"]
  B --> C{"Is it a known issue?"}
  C -->|Yes| D["Follow Documented Steps"]
  C -->|No| E["Create New Runbook While Recovering"]
  D --> F["Restore from Latest Backup"]
  E --> F
  F --> G["Verify Data Integrity"]
  G --> H{"Restore successful?"}
  H -->|Yes| I["Restart Services"]
  H -->|No| J["Try Earlier Backup"]
  J --> G
  I --> K["Run Health Checks"]
  K --> L["Notify Customers"]
  L --> M["Document Incident"]
  M --> N["Update Runbooks & Tests"]
  N --> O["Sleep 😴"]
```

The Tools You Actually Need

Don’t overthink this. You don’t need enterprise-grade software. Here’s the minimal stack:

  • Backup storage: AWS S3 (~$1-3/month) or DigitalOcean Spaces (~$5/month)
  • Infrastructure code: Terraform (free)
  • Version control: GitHub (free)
  • Monitoring: UptimeRobot (free tier, plus ~$9/month for more checks)
  • Incident tracking: GitHub Issues or Notion (free)

Total cost: under $20/month. Probably less than your coffee budget.

```
# Essential tooling checklist
- pg_dump or mysqldump ✓ (database backups)
- tar and gzip ✓ (file compression)
- AWS CLI or equivalent ✓ (remote storage)
- Terraform ✓ (infrastructure)
- curl or wget ✓ (health checks)
- Your brain ✓ (decision-making)
```

The Before/After Checklist

Before your next disaster:

  • Document RTO and RPO for each service
  • Set up automated daily backups to remote storage
  • Test backup restoration (actually do it)
  • Write runbooks for 3-5 most likely failure scenarios
  • Version control your infrastructure code
  • Set up basic uptime monitoring with alerts to your phone
  • Identify your backup tools’ support phone number
  • Create an encrypted file with all your credentials (and back it up)
  • Run a full disaster test (pretend your server is gone)
  • Document how to notify customers of an outage

After a disaster:

  • Wait until everything is stable (don’t rush)
  • Document exactly what happened (timeline)
  • Identify root cause (not “server died” but why)
  • Update runbooks based on what you actually did
  • Fix the underlying issue so it doesn’t happen again
  • Calculate recovery cost and revenue impact
  • Thank yourself for having a DR plan

The Real Benefit

The real value of disaster recovery isn’t that it prevents disasters. Disasters happen. Servers fail. Humans make mistakes. Bugs corrupt databases. That’s technology. The real value is that when disaster strikes—and it will—you’re not starting from zero at 3 AM, panicking about what’s lost forever. You have a plan. You have backups. You have documented steps. You can breathe. That’s the difference between “my side project is gone” and “I lost 1 hour of data and recovered in 45 minutes.” As a solo developer, your disaster recovery strategy is one of your most valuable assets. Not because you’ll never need it, but because you will need it, and you’ll be grateful it exists. Start small. Backup daily. Test monthly. Document everything. Sleep well knowing that when everything breaks, you’ve already prepared for it. The only disaster worse than losing your service is losing it and not being prepared. Don’t be that developer.