The Goal - What "Cheap but Works" Really Means
Most people say "cheap VMs," but the real target is:
- <strong>Low fixed cost</strong> - don't pay for capacity you don't use
- <strong>Predictable performance</strong> - latency doesn't randomly explode at 10x users
- <strong>Simple operations</strong> - you can debug it without a dedicated team
- <strong>Safe scaling</strong> - add resources when needed, not every week
- <strong>Memory discipline</strong> - most "cheap VM" failures are memory problems, not CPU problems
<strong>Assumption:</strong> you're building an app that needs more than one process, but not a full Kubernetes platform yet.
1. Start With the Architecture
A cost-efficient "more users" architecture usually has six pieces:
- <strong>Stateless app servers</strong> (horizontal scale)
- <strong>A database</strong> designed for the query pattern
- <strong>A cache</strong> to reduce database load
- <strong>A queue</strong> for background work
- <strong>An edge / load balancer</strong> to distribute requests
- <strong>Observability</strong> so you can scale intentionally
Everything below is built around this shape.
2. The Recommended Cheap-VM Stack
Edge / Load Balancing
- <strong>Nginx</strong> (or HAProxy) on the edge VM(s)
- Terminate TLS at the edge if needed
<strong>Why:</strong> simple, battle-tested, low overhead, tiny memory footprint (~10-30MB idle).
App Layer
Pick one:
- <strong>Node.js (Fastify/Express)</strong> - JS-heavy teams
- <strong>Java (Spring Boot)</strong> - strong ecosystem, tooling, predictability
- <strong>Python (FastAPI)</strong> - speed + productivity, good for async/ML-adjacent work
Recommendation for most teams: <strong>Spring Boot or FastAPI</strong>.
Async / Background Jobs
- <strong>Redis</strong> (lightweight queues) or <strong>RabbitMQ</strong> (more features, more overhead)
- If you already use Redis for cache, queueing often comes "for free"
Cache
- <strong>Redis</strong>
Cache what you're sure about:
- Sessions (if applicable)
- Hot reads
- Computed/expensive results
Avoid caching everything - cache invalidation bugs are expensive to debug and can silently serve stale data.
Database
- <strong>PostgreSQL</strong> for most products
- <strong>MySQL</strong> if your team is already deep in it
Upgrade path when needed: connection pooling → read replicas → partitioning (later, not first).
Object Storage for Files
- Use external storage (S3-compatible)
- Avoid storing large files on VM disks - it complicates backups, scaling, and disk cost planning
Observability
- <strong>OpenTelemetry</strong> for instrumentation
- <strong>Prometheus + Grafana</strong> for metrics (or managed equivalents)
- <strong>Centralized logs</strong> (Loki / ELK / managed)
<strong>Why:</strong> cheap systems fail in the dark. Observability makes scaling decisions cheap instead of guesswork.
3. Real Cost Tiers (with Actual Specs)
Prices are rough monthly estimates and vary by provider. Budget providers (Hetzner, OVH, DigitalOcean droplets) often run 40-60% cheaper than AWS/Azure/GCP for equivalent raw compute - the tradeoff is that you lose some managed-service convenience.
Tier 1 - Up to ~1,000 concurrent users
| Component | Spec | Notes | Monthly Cost |
|---|---|---|---|
| Edge (Nginx) | 1 vCPU / 1GB | Can share box with app | $4-6 |
| App server | 2 vCPU / 4GB | Combine edge+app+worker here | $10-13 |
| Redis | 1 vCPU / 2GB | Shared box is fine | $4-6 |
| PostgreSQL | 2 vCPU / 4GB | Managed if budget allows | $15-25 |
| Worker | Shares app VM | No extra VM needed yet | $0 |
| <strong>Total</strong> | | | <strong>~$35-50/mo</strong> |
At this tier, don't split into 5 VMs. Combine edge + app + worker on one box, DB on another. Premature splitting adds cost without performance benefit.
Tier 2 - ~5,000-20,000 concurrent users
| Component | Spec | Notes | Monthly Cost |
|---|---|---|---|
| Edge / LB | 2 vCPU / 2GB (x2 for HA) | | $12-20 |
| App servers | 2-3x, 4 vCPU / 8GB each | Horizontal scale | $60-90 |
| Redis | 2 vCPU / 8GB dedicated | Split from app | $30-40 |
| PostgreSQL | 4 vCPU / 16GB + 1 replica | Add read replica | $100-160 |
| Worker | 2 vCPU / 4GB, 1-2 instances | Own VM now | $20-30 |
| Object storage | Pay per GB | S3-compatible | $5-20 |
| <strong>Total</strong> | | | <strong>~$230-360/mo</strong> |
Tier 3 - ~100,000 concurrent users
At this scale, managed DB/Redis usually costs less than the engineering time to self-host reliably.
| Component | Spec | Notes | Monthly Cost |
|---|---|---|---|
| Edge / LB | Managed LB + 3-4 nodes | | $80-150 |
| App servers | 6-10x, 4-8 vCPU / 16GB | | $600-1,200 |
| Redis | Managed cluster, 16-32GB | | $150-400 |
| PostgreSQL | Managed, 8-16 vCPU / 64GB + 2 replicas | | $500-1,200 |
| Workers | 4-6x, 4 vCPU / 8GB | | $150-250 |
| Storage / CDN | Usage-based | | $50-200 |
| <strong>Total</strong> | | | <strong>~$1,500-3,500/mo</strong> |
4. Memory Management Per Component
This is where most "cheap VM" setups actually break - memory, not CPU.
App Layer Memory
<strong>Node.js</strong>
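- The default heap limit is roughly 1.5-2GB on many 64-bit builds (it varies by Node.js version and available memory), and GC pressure builds as usage approaches it. Cap it explicitly with <code>--max-old-space-size</code>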
- Rule of thumb: <code>available_RAM / (heap_size + ~300MB overhead)</code> = max instances per VM
- On a 4GB box, run 2 processes max (PM2 cluster mode), not 4
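The rule of thumb above is easy to turn into a quick calculation. This is a minimal sketch: the function name and the 1.5GB heap figure are illustrative assumptions, and <code>available_ram_mb</code> should be whatever is left after the OS and any co-located services take their share.

```python
def max_instances_per_vm(available_ram_mb: int, heap_mb: int, overhead_mb: int = 300) -> int:
    """Apply the rule of thumb: available_RAM / (heap_size + ~300MB overhead)."""
    if heap_mb <= 0 or overhead_mb < 0:
        raise ValueError("heap_mb must be positive and overhead_mb non-negative")
    # Floor division: a fractional process is not a process.
    return available_ram_mb // (heap_mb + overhead_mb)


# Assumed example: a 4GB box with a 1.5GB heap cap per Node.js process.
print(max_instances_per_vm(4096, 1536))
```

If the result is lower than your CPU count, memory is the binding constraint, not cores.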
<strong>Java / Spring Boot</strong>
- Always set <code>-Xmx</code> explicitly - never let the JVM auto-detect memory on a shared VM
- Target 60-70% of total VM memory for heap
- Example: 4GB VM → <code>-Xmx2560m -Xms1024m</code>
- Leave headroom for JIT compilation, thread stacks, and native (off-heap) memory
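As a rough illustration of the heap sizing above, the helper below derives the flags from the VM size. It is only a sketch: the function name is hypothetical, and the 62.5% and 25% defaults are assumptions chosen to reproduce the 4GB example, not tuned values.

```python
def jvm_heap_flags(vm_ram_mb: int, heap_fraction: float = 0.625,
                   initial_fraction: float = 0.25) -> str:
    """Build -Xmx/-Xms flags as fixed fractions of total VM memory."""
    if not 0 < initial_fraction <= heap_fraction < 1:
        raise ValueError("need 0 < initial_fraction <= heap_fraction < 1")
    xmx = int(vm_ram_mb * heap_fraction)      # hard heap ceiling
    xms = int(vm_ram_mb * initial_fraction)   # initial heap size
    return f"-Xmx{xmx}m -Xms{xms}m"


print(jvm_heap_flags(4096))
```

Whatever is left outside the heap has to cover thread stacks, JIT output, and off-heap memory, so treat the heap fraction as an upper bound.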
<strong>Python / FastAPI</strong>
- Memory is per-worker - Gunicorn/Uvicorn workers are separate OS processes
- 4 workers × ~150-300MB baseline adds up fast; budget accordingly
- Watch for slow memory growth from unclosed DB connections or accumulating async tasks
Redis Memory
- Always set <code>maxmemory</code> explicitly (e.g. <code>maxmemory 1536mb</code> on a 2GB box) - never let Redis assume it owns the whole VM
- Set an eviction policy: <code>allkeys-lru</code> for pure cache, <code>volatile-lru</code> if mixing cache + persistent queue data
- Rule of thumb: actual memory usage ≈ 1.2-1.5x raw data size (data structure overhead)
- Watch the <code>evicted_keys</code> metric - a fast climb means you're either underprovisioned or caching things you shouldn't
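The two checks above (does the data fit, and how fast are keys being evicted) can be sketched in a few lines. The function names are hypothetical and the 1.3 overhead factor is an assumption taken from the middle of the 1.2-1.5x range. <code>evicted_keys</code> is a cumulative counter, so the interesting number is its rate of change between two samples.

```python
def estimated_redis_usage_mb(raw_data_mb: float, overhead_factor: float = 1.3) -> int:
    """Estimate real memory use from raw data size (assumed 1.3x overhead)."""
    return round(raw_data_mb * overhead_factor)


def fits_under_maxmemory(raw_data_mb: float, maxmemory_mb: int,
                         overhead_factor: float = 1.3) -> bool:
    """True if the estimated usage stays within the configured maxmemory."""
    return estimated_redis_usage_mb(raw_data_mb, overhead_factor) <= maxmemory_mb


def evictions_per_second(previous_count: int, current_count: int,
                         interval_seconds: float) -> float:
    """Rate of change of the cumulative evicted_keys counter between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current_count - previous_count
    if delta < 0:
        # The counter went backwards, which suggests a restart; count from zero.
        delta = current_count
    return delta / interval_seconds
```

A steady trickle of evictions is normal for a pure cache; a sudden jump in the rate is the signal worth alerting on.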
PostgreSQL Memory
- <code>shared_buffers</code>: ~25% of VM RAM (the OS page cache handles the rest - don't go higher)
- <code>work_mem</code>: this is *per sort/hash operation, per connection* - too high with many concurrent connections causes OOM. Keep <code>work_mem × max_connections</code> well under available RAM
- <code>effective_cache_size</code>: 50-75% of RAM (planner hint only, not a real allocation)
- <strong>Connection pooling matters more than VM size.</strong> 200 raw Postgres connections can burn 200 × ~10MB = 2GB in overhead before a single query runs. PgBouncer in transaction mode collapses this to a handful of real backend connections regardless of app-side concurrency
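To see why pooling matters more than VM size, it helps to run the arithmetic. The sketch below is a rough budget, not a tuning tool: the function name is hypothetical, the ~10MB per-connection figure is the estimate used above, and the <code>work_mem</code> line assumes only one sort/hash per connection (a real query can use several).

```python
def postgres_memory_plan(vm_ram_mb: int, connections: int, work_mem_mb: int,
                         per_connection_mb: int = 10) -> dict:
    """Rough memory budget for a PostgreSQL VM, in MB."""
    shared_buffers_mb = vm_ram_mb // 4           # ~25% of RAM
    effective_cache_size_mb = vm_ram_mb // 2     # planner hint only, low end of 50-75%
    connection_overhead_mb = connections * per_connection_mb
    # Optimistic: one sort/hash operation per connection at a time.
    work_mem_total_mb = connections * work_mem_mb
    headroom_mb = (vm_ram_mb - shared_buffers_mb
                   - connection_overhead_mb - work_mem_total_mb)
    return {
        "shared_buffers_mb": shared_buffers_mb,
        "effective_cache_size_mb": effective_cache_size_mb,
        "connection_overhead_mb": connection_overhead_mb,
        "work_mem_total_mb": work_mem_total_mb,
        "headroom_mb": headroom_mb,
    }


# Assumed comparison on a 4GB VM: 200 raw connections vs 20 pooled ones.
print(postgres_memory_plan(4096, 200, 4)["headroom_mb"])
print(postgres_memory_plan(4096, 20, 4)["headroom_mb"])
```

The same VM goes from almost no headroom to plenty once the backend connection count drops, which is the effect PgBouncer is there to produce.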
Worker / Queue Memory
- Background workers (report generation, thumbnailing, ML inference) are the most common OOM-killer victims - bursty, large payloads
- Set explicit concurrency limits tied to memory, not CPU count (e.g. Celery <code>--concurrency=4</code>, BullMQ <code>concurrency</code> option) - for these workloads, memory usually runs out before CPU
- Give workers their own VM/container with a hard memory limit and a restart policy, so a bad job kills itself instead of the whole box
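One way to tie concurrency to memory rather than CPU count is to size it from the worker's memory limit and the peak footprint of a single job. This is a sketch under assumptions: the function name, the 256MB reserve, and the example job size are all illustrative, and the peak job footprint should come from measurement.

```python
def worker_concurrency(memory_limit_mb: int, peak_job_mb: int, cpu_count: int,
                       reserve_mb: int = 256) -> int:
    """Pick a concurrency limit bounded by memory first, CPU second."""
    if peak_job_mb <= 0:
        raise ValueError("peak_job_mb must be positive")
    usable_mb = memory_limit_mb - reserve_mb   # keep a reserve for the runtime itself
    by_memory = usable_mb // peak_job_mb
    # Never exceed the CPU count, never go below zero.
    return max(0, min(by_memory, cpu_count))


# Assumed example: 4GB limit, ~900MB peak per job, 8 cores.
print(worker_concurrency(4096, 900, 8))
```

A result of 0 means a single job does not fit in the limit, which is a sizing problem to fix before the OOM killer finds it. The resulting number is what you would pass to Celery's <code>--concurrency</code> or BullMQ's <code>concurrency</code> option.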
5. Rough Capacity-Planning Formula
Use this for rough sizing *before* load testing - not as a substitute for it:
App-tier RAM ≈ concurrent_users × avg_memory_per_request_context
DB RAM ≈ (working_set_size × 1.3) for shared_buffers + OS cache
Redis RAM ≈ (hot_data_size × 1.3) + maxmemory headroom
Worker RAM ≈ concurrency_limit × avg_job_memory_footprint
Add 20-30% headroom on top of the total for OS overhead, GC spikes, and traffic bursts.
6. Scaling Strategy: Cheap VMs Scale If You Keep Them Stateless
Scale horizontally at the app layer
Run multiple app instances behind the edge/load balancer.
- Keep the app stateless
- Store session/state in Redis or the database, not local memory
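A minimal sketch of what "state in Redis, not local memory" looks like in code. The <code>SessionStore</code> class, the key prefix, and the 30-minute TTL are assumptions for illustration. <code>FakeRedis</code> is an in-memory stand-in so the example runs without a server; in a real app you would pass a redis-py client, which offers <code>setex</code> and <code>get</code> with the same shape.

```python
import json


class SessionStore:
    """Keeps session data in a shared store so any app instance can serve any user."""

    def __init__(self, client, ttl_seconds: int = 1800):
        self._client = client
        self._ttl = ttl_seconds

    def save(self, session_id: str, data: dict) -> None:
        # setex writes the value together with an expiry.
        self._client.setex(f"session:{session_id}", self._ttl, json.dumps(data))

    def load(self, session_id: str):
        raw = self._client.get(f"session:{session_id}")
        return json.loads(raw) if raw is not None else None


class FakeRedis:
    """In-memory stand-in for a Redis client (ignores expiry); for illustration only."""

    def __init__(self):
        self._data = {}

    def setex(self, name, time, value):
        self._data[name] = value

    def get(self, name):
        return self._data.get(name)


# Two "app instances" sharing one store: either can read what the other wrote.
shared = FakeRedis()
instance_a = SessionStore(shared)
instance_b = SessionStore(shared)
instance_a.save("abc", {"user_id": 42})
print(instance_b.load("abc"))
```

Because no instance holds the session itself, you can add or remove app servers without logging anyone out.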
Use the queue for slow tasks
Move these off the request path into a background worker:
- Email sending
- Report generation
- Thumbnail/image processing
- Heavy ML inference
Result: app servers stay responsive under load.
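The pattern is the same regardless of the queue technology: the request handler enqueues and returns, and a separate consumer does the slow part. The sketch below uses Python's standard <code>queue</code> and a thread as an in-process stand-in for Redis/RabbitMQ and a worker VM; the class and function names are hypothetical.

```python
import queue
import threading


class BackgroundQueue:
    """In-process stand-in for a real job queue plus one worker."""

    def __init__(self):
        self._jobs = queue.Queue()
        self.completed = []
        self.failed = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, func, *args):
        self._jobs.put((func, args))

    def _run(self):
        while True:
            item = self._jobs.get()
            if item is None:          # sentinel: stop the worker
                return
            func, args = item
            try:
                self.completed.append(func(*args))
            except Exception as exc:  # a bad job must not kill the worker
                self.failed.append(exc)

    def drain_and_stop(self):
        """Process everything already queued, then stop."""
        self._jobs.put(None)
        self._thread.join()


def send_email(address: str) -> str:
    return f"sent to {address}"       # placeholder for the slow call


def handle_signup(background: BackgroundQueue, address: str) -> dict:
    # The request path only enqueues; it never waits for the email.
    background.enqueue(send_email, address)
    return {"status": 202, "detail": "email queued"}
```

With a real broker the worker runs on its own VM, so a slow or failing job cannot hold a request thread hostage.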
Add cache to protect the database
A very common cheap-VM bottleneck is "database CPU hits 100%." A Redis cache in front of hot reads often fixes this before you need to buy bigger machines.
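The usual way to do this is the cache-aside pattern: check the cache, fall back to the database on a miss, and store the result with a TTL. The sketch below uses a small in-memory <code>TTLCache</code> as a stand-in for Redis; all names are hypothetical and the 60-second TTL is an arbitrary assumption.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-key expiry; a stand-in for Redis."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}
        self._clock = clock

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]    # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)


def cached_read(cache, key, load_from_db, ttl_seconds=60):
    """Cache-aside read: only hit the database on a miss."""
    value = cache.get(key)
    if value is None:                 # note: None results are never cached
        value = load_from_db(key)
        cache.set(key, value, ttl_seconds)
    return value
```

The TTL bounds how stale a value can get, which is the cheapest defence against the invalidation bugs mentioned earlier.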
Use connection pooling
Every raw connection costs the database memory and scheduling overhead, so too many of them degrade performance for everyone.
- Use PgBouncer for PostgreSQL
- Keep DB connection counts stable regardless of app-tier scale
Set memory limits everywhere
- Container/VM-level memory limits with restart policies
- Explicit heap/maxmemory settings per service (see Section 4)
- Alerts before OOM, not after
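"Alerts before OOM" can be as simple as a threshold on used memory. The sketch below parses text in the Linux <code>/proc/meminfo</code> format and flags when usage crosses a limit. The function names and the 85% threshold are assumptions; in practice this check usually lives in your metrics stack rather than in a script.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value_in_kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_used_fraction(meminfo: dict) -> float:
    """Fraction of memory in use, based on MemAvailable rather than MemFree."""
    return 1 - meminfo["MemAvailable"] / meminfo["MemTotal"]


def should_alert(meminfo: dict, threshold: float = 0.85) -> bool:
    return memory_used_fraction(meminfo) >= threshold


# Assumed sample; on a Linux box you would read open("/proc/meminfo").read().
sample = "MemTotal:  4000000 kB\nMemFree:  100000 kB\nMemAvailable:  400000 kB\n"
print(should_alert(parse_meminfo(sample)))
```

Using <code>MemAvailable</code> avoids false alarms from page cache, which the kernel can reclaim.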
7. "Best Stack" Examples (Pick One)
Option A (Java/Spring) - predictable enterprise stack
- Nginx
- Spring Boot
- Redis
- PostgreSQL
- RabbitMQ (optional, or Redis queues)
- Prometheus/Grafana + centralized logs
Option B (Node/Fastify) - high throughput, minimal boilerplate
- Nginx
- Fastify/Express
- Redis
- PostgreSQL
- BullMQ (Redis-based jobs)
- OpenTelemetry + Grafana
Option C (Python/FastAPI) - productivity + async workloads
- Nginx
- FastAPI (async)
- Redis
- PostgreSQL
- Celery/RQ (queue)
- Metrics + logs
8. What to Avoid on Cheap VMs (Hidden Costs)
- <strong>No caching</strong> → the database becomes your scaling limit
- <strong>No queue</strong> → timeouts and retries destroy performance under load
- <strong>No rate limiting</strong> → one user or bot melts everything
- <strong>Running everything on one VM</strong> → upgrades and failures become terrifying
- <strong>Ignoring observability</strong> → you'll end up scaling randomly, and expensively
- <strong>Letting runtimes auto-detect memory</strong> → JVM/Node defaults assume they own the whole machine; on shared VMs this causes silent OOM kills
- <strong>Uncapped connection pools</strong> → connection overhead alone can eat gigabytes before any real work happens
- <strong>No memory limits on workers</strong> → one bad job (huge payload, memory leak) takes down the whole box instead of just itself
- <strong>Skipping load testing before scaling</strong> → you end up paying for capacity guessed from intuition, not evidence
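Of the items in the list above, rate limiting is the quickest to sketch. A token bucket is one common approach: each client gets a bucket that refills at a fixed rate and allows short bursts up to its capacity. The class below is an illustrative in-process version with hypothetical names. At the edge you would more likely use Nginx's built-in rate limiting, or a Redis-backed limiter when several app instances share the limit.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._tokens = float(capacity)   # start full
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that passed, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

A request that gets `False` should receive a fast 429 response, which costs far less than letting it reach the database.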
9. Minimal Deployment Layout for Cheap VMs
A practical layout once you outgrow the combined Tier 1 setup:
- <strong>VM1:</strong> Edge (Nginx + TLS)
- <strong>VM2:</strong> App server (2+ app instances)
- <strong>VM3:</strong> Worker (queue consumer)
- <strong>VM4:</strong> Redis
- <strong>VM5:</strong> PostgreSQL
When traffic grows:
- Scale VM2 app instances horizontally
- Move Redis and the database onto larger, dedicated VMs
- Add read replicas later, not preemptively
10. Load Testing - Proof Before Scaling
Before changing infrastructure, test with:
- Realistic request shapes
- Realistic payload sizes
- Concurrency close to expected real usage
Watch these metrics closely:
- p95 / p99 latency
- Error rate
- DB CPU + query time
- Redis hit rate
- Memory usage under sustained load (not just at idle)
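Most load-testing tools report these numbers for you, but it helps to know what they mean. The sketch below computes p95/p99 with the nearest-rank method and an error rate from raw samples. The function names are hypothetical, and it assumes <code>latencies_ms</code> holds one entry per request, failed ones included.

```python
import math


def percentile(samples, pct: float):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(latencies_ms, error_count: int) -> dict:
    """Summarize one load-test run; latencies_ms has one entry per request."""
    return {
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "error_rate": error_count / len(latencies_ms),
    }
```

Averages hide the slow tail, which is why the percentiles are the numbers to compare before and after an infrastructure change.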
11. How This Connects to Kubernetes / AKS
If you later move to AKS, the same architecture concepts map cleanly:
| Cheap-VM concept | AKS/Kubernetes equivalent |
|---|---|
| App instances | Deployments |
| Edge routing | Ingress |
| Queue workers | Worker Deployments |
| Redis/DB | Managed services or StatefulSets |
| VM memory limits | Pod resource requests/limits |