Terraform is powerful, but it’s easy to create a mess. We’ve seen 10,000-line main.tf files, state files checked into git, and teams afraid to run terraform apply.
It doesn’t have to be this way.
Structure: Start Right
Module Organization
Break your infrastructure into logical modules:
```
terraform/
├── modules/
│   ├── networking/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── database/
│   ├── compute/
│   └── monitoring/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   └── prod/
└── global/
    └── iam/
```
Each module should have a single responsibility. Networking configures VPCs and subnets. Compute manages ECS or EC2. Don’t mix them.
State Management
Never commit state files. Use remote backends:
```hcl
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "prod/networking/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
```
The DynamoDB table enables state locking, preventing two people from running applies simultaneously and corrupting state.
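The state bucket and lock table have to exist before any configuration using this backend can initialize, so they're usually bootstrapped once in a small, separate root module with local state. A minimal sketch using the names from the backend block above (versioning is optional but makes old state recoverable):

```hcl
# Bootstrap module for remote state. Bucket and table names match the
# backend example above -- substitute your own.
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled" # keeps prior state versions for recovery
  }
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the attribute name the S3 backend expects

  attribute {
    name = "LockID"
    type = "S"
  }
}
```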
Variables and Locals
Use Variables for Inputs
Anything that changes between environments should be a variable:
```hcl
variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "instance_count" {
  description = "Number of EC2 instances"
  type        = number
  default     = 2

  validation {
    condition     = var.instance_count >= 1 && var.instance_count <= 10
    error_message = "Instance count must be between 1 and 10."
  }
}
```
Notice the validation blocks: they catch bad values at plan time, before anything is applied.
Use Locals for Computed Values
Locals are for DRY (Don’t Repeat Yourself) values:
```hcl
locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
    Project     = var.project_name
  }

  name_prefix = "${var.project_name}-${var.environment}"

  # Conditional logic
  db_instance_class = var.environment == "prod" ? "db.r5.xlarge" : "db.t3.medium"
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = var.instance_type

  tags = merge(
    local.common_tags,
    {
      Name = "${local.name_prefix}-app-server"
    }
  )
}
```
Data Sources: Use Them Wisely
Data sources query existing infrastructure. Use them to reference resources you don’t manage with Terraform:
```hcl
# Look up existing VPC
data "aws_vpc" "main" {
  tags = {
    Name = "main-vpc"
  }
}

# Look up that VPC's subnets (the aws_vpc data source does not
# export subnet IDs itself)
data "aws_subnets" "main" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }
}

# Find the latest AMI
data "aws_ami" "amazon_linux_2" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.amazon_linux_2.id
  subnet_id     = data.aws_subnets.main.ids[0]
  instance_type = "t3.medium"
}
```
Warning: Data sources run on every plan/apply. If the data source queries something that changes frequently, your plans become unpredictable.
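The `most_recent = true` AMI lookup above is a classic example: every new Amazon Linux release shows up as a diff that wants to replace your instances. One way to tame it is to ignore AMI changes after creation, so new AMIs are only picked up when you deliberately replace the instance. A sketch:

```hcl
resource "aws_instance" "app" {
  ami           = data.aws_ami.amazon_linux_2.id
  instance_type = "t3.medium"

  lifecycle {
    # New AMI releases no longer appear as a diff on every plan;
    # the fresh AMI is used only when this instance is recreated.
    ignore_changes = [ami]
  }
}
```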
Modules: Think Like a Library
Good modules are reusable, well-documented, and have clear interfaces.
Example: A VPC Module
```hcl
# modules/networking/main.tf

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
}

variable "availability_zones" {
  description = "List of AZs to use"
  type        = list(string)
}

variable "environment" {
  description = "Environment name"
  type        = string
}

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "${var.environment}-vpc"
    Environment = var.environment
  }
}

resource "aws_subnet" "private" {
  count = length(var.availability_zones)

  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, count.index)
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.environment}-private-${var.availability_zones[count.index]}"
    Type = "private"
  }
}

output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "private_subnet_ids" {
  description = "IDs of private subnets"
  value       = aws_subnet.private[*].id
}
```
Using the Module
```hcl
# environments/prod/main.tf

module "networking" {
  source = "../../modules/networking"

  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  environment        = "prod"
}

module "database" {
  source = "../../modules/database"

  vpc_id     = module.networking.vpc_id
  subnet_ids = module.networking.private_subnet_ids
}
```
Secrets: Never in Code
Bad:
```hcl
resource "aws_db_instance" "main" {
  username = "admin"
  password = "super-secret-password" # ❌ NO!
}
```
Good:
```hcl
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/db/master-password"
}

resource "aws_db_instance" "main" {
  username = "admin"
  password = jsondecode(data.aws_secretsmanager_secret_version.db_password.secret_string)["password"]
}
```
Or use variables and pass secrets via environment variables:
```shell
export TF_VAR_db_password=$(aws secretsmanager get-secret-value \
  --secret-id prod/db/password --query SecretString --output text)

terraform apply
```
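For `TF_VAR_db_password` to land anywhere, a matching variable has to be declared; marking it `sensitive` keeps the value out of plan and apply output (though it is still stored in state, which is one more reason to encrypt the backend):

```hcl
variable "db_password" {
  description = "Master password for the database (supplied via TF_VAR_db_password)"
  type        = string
  sensitive   = true # redacted in plan/apply output
}

resource "aws_db_instance" "main" {
  username = "admin"
  password = var.db_password
}
```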
Count vs For_Each
Use count for simple resource duplication:
```hcl
resource "aws_instance" "web" {
  count = var.instance_count

  ami           = var.ami_id
  instance_type = "t3.medium"

  tags = {
    Name = "web-${count.index + 1}"
  }
}
```
Use for_each when resources have distinct identities:
```hcl
variable "users" {
  type = map(object({
    role = string
  }))

  default = {
    "alice" = { role = "admin" }
    "bob"   = { role = "developer" }
  }
}

resource "aws_iam_user" "users" {
  for_each = var.users

  name = each.key

  tags = {
    Role = each.value.role
  }
}
```
The key difference: for_each uses stable identifiers (keys), so removing “alice” doesn’t affect “bob”. With count, removing index 0 shifts every later instance down by one, which Terraform sees as a destroy-and-recreate of resources you never meant to touch.
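If a resource like `aws_iam_user.users` was originally written with count and you're converting it to for_each, `moved` blocks (Terraform 1.1+) record the address renames so nothing is destroyed. A sketch, assuming two existing instances:

```hcl
# Map old count indices to the new for_each keys.
moved {
  from = aws_iam_user.users[0]
  to   = aws_iam_user.users["alice"]
}

moved {
  from = aws_iam_user.users[1]
  to   = aws_iam_user.users["bob"]
}
```

The next plan shows the resources as moved, not replaced; once applied, the `moved` blocks can be deleted.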
Drift Detection
Infrastructure changes outside Terraform happen. Catch it:
```shell
# Regular drift detection in CI
terraform plan -detailed-exitcode

# Exit code 0 = no changes
# Exit code 1 = error
# Exit code 2 = successful plan with changes
```
Set up a daily job that runs terraform plan and alerts if drift is detected.
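The job itself just branches on that exit code. A minimal bash sketch; the alerting step is left as a comment, so wire in whatever your team uses:

```shell
#!/usr/bin/env bash
# Drift check sketch: relies on terraform's -detailed-exitcode contract.
check_drift() {
  terraform plan -detailed-exitcode -input=false -no-color > plan.txt 2>&1
  local code=$?
  case "$code" in
    0) echo "no drift" ;;
    2) echo "DRIFT DETECTED" # alert here: post plan.txt to Slack/PagerDuty
       return 1 ;;
    *) echo "plan failed (exit $code)" >&2
       return "$code" ;;
  esac
}
```

Run it from cron or a scheduled CI workflow; a nonzero return fails the job and pages whoever is on call.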
Testing
Use terraform validate and terraform fmt in CI:
```yaml
# .github/workflows/terraform.yml
name: Terraform

on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
      - name: Terraform Init
        run: terraform init -backend=false
      - name: Terraform Validate
        run: terraform validate
```
For integration testing, consider Terratest.
Common Anti-Patterns
❌ Massive monolithic main.tf files
✅ Break into modules
❌ Copy-paste between environments
✅ Use shared modules with environment-specific variables
❌ Hardcoded values everywhere
✅ Use variables and data sources
❌ No state locking
✅ Use DynamoDB for state locks
❌ Manual terraform apply in production
✅ Use CI/CD with approvals
The Payoff
Teams following these practices typically see:
- Dramatically fewer “state conflict” incidents
- Faster onboarding for new team members
- Environment parity (dev actually matches prod)
- Confident, routine infrastructure changes
Terraform done right is a force multiplier. Done wrong, it’s a liability.
Start with good structure, and the rest follows.