Database Optimization Techniques for Better Performance

August 15, 2024

Database performance can make or break your application. Here are essential optimization techniques every developer should know.

Indexing

PostgreSQL Indexes

-- Single column index
CREATE INDEX idx_users_email ON users(email);
 
-- Composite index (order matters!)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
 
-- Partial index (index subset of data)
CREATE INDEX idx_active_users ON users(email) WHERE active = true;
 
-- Unique index
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);
 
-- Expression index
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

MongoDB Indexes

// Single field index
db.users.createIndex({ email: 1 });
 
// Compound index
db.orders.createIndex({ userId: 1, createdAt: -1 });
 
// Text index for search
db.posts.createIndex({ title: "text", content: "text" });
 
// TTL index (auto-delete documents)
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });
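 
To confirm a MongoDB query actually hits one of these indexes, explain() plays the same role EXPLAIN does for SQL later in this post. In the execution stats, IXSCAN means an index scan and COLLSCAN means a full collection scan:

// Check the winning plan: IXSCAN (index scan) is good, COLLSCAN (full scan) is not
db.users.find({ email: "test@example.com" }).explain("executionStats");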

When to Index

✅ Index columns used in WHERE clauses
✅ Index columns used in JOIN conditions
✅ Index columns used in ORDER BY
✅ Index foreign keys

❌ Don't index low-selectivity columns (e.g., boolean flags where most rows share the same value)
❌ Don't over-index; every extra index slows down writes (see the unused-index check below)
❌ Don't index small tables; a sequential scan is usually faster
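 
To spot over-indexing on PostgreSQL, the statistics views show which indexes are never read. A minimal sketch using the node-postgres pool from the Connection Pooling section below (unique indexes can legitimately show zero scans, since they exist to enforce constraints):

// List indexes that have not been scanned since statistics were last reset
const { rows: unusedIndexes } = await pool.query(`
  SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan
  FROM pg_stat_user_indexes
  WHERE idx_scan = 0
  ORDER BY relname
`);
console.log(unusedIndexes);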

Query Optimization

Use EXPLAIN

-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';
 
-- Look for:
-- - Seq Scan (bad for large tables)
-- - Index Scan (good)
-- - cost and actual time

Avoid SELECT *

-- Bad
SELECT * FROM users WHERE id = 1;
 
-- Good
SELECT id, name, email FROM users WHERE id = 1;

Pagination

-- Bad (offset is slow for large values)
SELECT * FROM posts ORDER BY created_at DESC OFFSET 10000 LIMIT 20;
 
-- Good (cursor-based)
SELECT * FROM posts 
WHERE created_at < '2024-08-15' 
ORDER BY created_at DESC 
LIMIT 20;
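 
Timestamps can collide, so cursor-based pagination usually pairs created_at with a unique tie-breaker like id. A rough sketch with node-postgres (the posts columns and the pool are assumptions):

// Keyset pagination: the cursor is the (created_at, id) of the last row returned
async function getPostsPage(cursor?: { createdAt: string; id: number }, pageSize = 20) {
  if (!cursor) {
    // First page
    const { rows } = await pool.query(
      'SELECT * FROM posts ORDER BY created_at DESC, id DESC LIMIT $1',
      [pageSize]
    );
    return rows;
  }
  // Subsequent pages: everything strictly "before" the last row we returned
  const { rows } = await pool.query(
    `SELECT * FROM posts
     WHERE created_at < $1 OR (created_at = $1 AND id < $2)
     ORDER BY created_at DESC, id DESC
     LIMIT $3`,
    [cursor.createdAt, cursor.id, pageSize]
  );
  return rows;
}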

Batch Operations

-- Bad
INSERT INTO logs (message) VALUES ('log 1');
INSERT INTO logs (message) VALUES ('log 2');
INSERT INTO logs (message) VALUES ('log 3');
 
-- Good
INSERT INTO logs (message) VALUES 
  ('log 1'),
  ('log 2'),
  ('log 3');
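 
From application code, the multi-row form can be built dynamically instead of looping over single INSERTs. A minimal sketch with node-postgres (the pool and the logs table are assumptions):

// One round trip for the whole batch instead of one INSERT per row
async function insertLogs(messages: string[]) {
  if (messages.length === 0) return;
  // Builds ($1), ($2), ($3), ... for a single multi-row INSERT
  const placeholders = messages.map((_, i) => `($${i + 1})`).join(', ');
  await pool.query(`INSERT INTO logs (message) VALUES ${placeholders}`, messages);
}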

N+1 Query Problem

The Problem

// Bad - N+1 queries
const posts = await db.query('SELECT * FROM posts');
 
for (const post of posts) {
  // This runs N times!
  const author = await db.query('SELECT * FROM users WHERE id = ?', [post.authorId]);
}

The Solution

// Good - Single query with JOIN
const posts = await db.query(`
  SELECT posts.*, users.name as author_name
  FROM posts
  JOIN users ON posts.author_id = users.id
`);
 
// Or use IN clause
const posts = await db.query('SELECT * FROM posts');
const authorIds = posts.map(p => p.authorId);
const authors = await db.query('SELECT * FROM users WHERE id IN (?)', [authorIds]);
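 
The IN-clause version still needs the authors stitched back onto the posts (and note that expanding an array into IN (?) depends on the driver). A short sketch of that final step:

// Index authors by id, then attach each author to its post - two queries total
const authorsById = new Map(authors.map(a => [a.id, a]));
const postsWithAuthors = posts.map(post => ({
  ...post,
  author: authorsById.get(post.authorId),
}));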

With ORM (Prisma)

// Bad
const posts = await prisma.post.findMany();
// author accessed in loop triggers N queries
 
// Good - Include relation
const posts = await prisma.post.findMany({
  include: { author: true }
});
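 
Prisma can also narrow which author columns come back, combining the N+1 fix with the earlier advice to avoid SELECT * (the field names here are assumptions about the schema):

// Good - include the relation, but only the fields the page actually needs
const posts = await prisma.post.findMany({
  include: {
    author: { select: { id: true, name: true } },
  },
});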

Connection Pooling

Node.js with pg

import { Pool } from 'pg';
 
const pool = new Pool({
  host: 'localhost',
  database: 'mydb',
  max: 20,              // Max connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});
 
// Use pool for queries
const result = await pool.query('SELECT * FROM users');
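 
For transactions, check a client out of the pool and always release it, even when the transaction fails. A minimal sketch with the same pool (the accounts table is just an example):

// Run related statements on one pooled connection inside a transaction
const client = await pool.connect();
try {
  await client.query('BEGIN');
  await client.query('UPDATE accounts SET balance = balance - $1 WHERE id = $2', [100, 1]);
  await client.query('UPDATE accounts SET balance = balance + $1 WHERE id = $2', [100, 2]);
  await client.query('COMMIT');
} catch (err) {
  await client.query('ROLLBACK');
  throw err;
} finally {
  client.release(); // Always return the connection to the pool
}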

Prisma Connection Pool

DATABASE_URL="postgresql://user:pass@localhost:5432/mydb?connection_limit=20&pool_timeout=10"

Caching Strategies

Redis Cache Layer

import Redis from 'ioredis';
 
const redis = new Redis();
 
async function getUser(id: string) {
  // Check cache first
  const cached = await redis.get(`user:${id}`);
  if (cached) {
    return JSON.parse(cached);
  }
 
  // Query database
  const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
 
  // Cache for 1 hour
  await redis.setex(`user:${id}`, 3600, JSON.stringify(user));
 
  return user;
}
 
// Invalidate cache on update
async function updateUser(id: string, data: any) {
  await db.query('UPDATE users SET ... WHERE id = ?', [id]);
  await redis.del(`user:${id}`);
}

Database Design

Normalization

-- Bad (denormalized)
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_name VARCHAR(255),
  user_email VARCHAR(255),
  product_name VARCHAR(255),
  product_price DECIMAL
);
 
-- Good (normalized)
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255),
  email VARCHAR(255)
);
 
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255),
  price DECIMAL
);
 
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  product_id INTEGER REFERENCES products(id)
);

Denormalization for Read Performance

-- Sometimes denormalization is good for reads
CREATE TABLE order_summaries (
  id SERIAL PRIMARY KEY,
  order_id INTEGER,
  user_name VARCHAR(255),  -- Denormalized
  total_amount DECIMAL,
  item_count INTEGER
);
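 
The trade-off is that the denormalized copy must be kept in sync with every write. One common approach is to refresh the summary inside the same transaction that changes the order, reusing the pooled-client transaction pattern from the Connection Pooling section (price and orderId are placeholder variables):

// Inside the transaction that records the new order line,
// bump the denormalized totals so readers never see the tables disagree
await client.query(
  `UPDATE order_summaries
      SET total_amount = total_amount + $1,
          item_count   = item_count + 1
    WHERE order_id = $2`,
  [price, orderId]
);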

Monitoring Queries

PostgreSQL Slow Query Log

# In postgresql.conf
log_min_duration_statement = 1000  # Log queries taking longer than 1 second
 
-- Find slow queries (requires the pg_stat_statements extension;
-- on PostgreSQL 12 and older the columns are mean_time and total_time)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Best Practices Summary

  1. Always use indexes on frequently queried columns
  2. Avoid SELECT * - fetch only needed columns
  3. Use EXPLAIN to analyze query performance
  4. Implement connection pooling for efficiency
  5. Cache frequently accessed data with Redis
  6. Watch for N+1 queries - use JOINs or batching
  7. Use cursor-based pagination for large datasets
  8. Monitor slow queries and optimize regularly
  9. Normalize for writes, denormalize for reads when needed
  10. Keep your database statistics updated

Optimized databases lead to faster applications and happier users!