
Terraform & Terragrunt

Before You Read

This page describes how infrastructure changes are made. For step-by-step change procedures see Change Management. For module documentation see Terraform Modules Reference.

Why Terragrunt?

Terraform alone handles resource provisioning but requires duplication when configuring multiple environments. Terragrunt adds:

  1. DRY configuration: write module calls once, override only what differs per environment
  2. Remote state management: each project/environment has its own GCS bucket and prefix
  3. Dependency ordering: dependency blocks ensure modules apply in the right order
  4. Before/after hooks: run scripts before or after Terraform commands
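As an illustration of these four features, a single Terragrunt unit might look like the sketch below. It is hypothetical: the file path, the per-unit state prefix, the hook command, and the network_name input and output are assumptions, not this repository's configuration. Only the bucket name comes from the State Management table.

```hcl
# projects/orofi-dev/k8s/terragrunt.hcl (hypothetical path, for illustration only)

# Remote state: Terragrunt writes backend.tf so each unit gets its own prefix.
remote_state {
  backend = "gcs"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket = "oro-dev-infra"            # dev bucket from the State Management table
    prefix = "terraform/oro/dev/k8s"    # hypothetical per-unit prefix
  }
}

# DRY: the module is referenced once; only inputs differ per environment.
terraform {
  source = "../../../modules/k8s"

  # Hook: runs before every plan/apply of this unit.
  before_hook "announce" {
    commands = ["plan", "apply"]
    execute  = ["echo", "Running Terraform for k8s"]
  }
}

# Dependency ordering: the network unit must be applied first,
# and its outputs become readable here.
dependency "network" {
  config_path = "../network"
}

inputs = {
  network_name = dependency.network.outputs.network_name # hypothetical output
}
```

When several units are applied together, the dependency block makes Terragrunt apply network before this unit, and a missing network output fails the plan early instead of producing a half-configured cluster.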

Repository Structure

infrastructure-management/
├── modules/                    ← Reusable Terraform modules (never apply directly)
│   ├── k8s/                   ← GKE cluster
│   ├── network/               ← VPC, subnets, firewall, NAT, static IPs
│   ├── datastore/             ← Cloud SQL MySQL
│   ├── redis/                 ← Cloud Memorystore Redis
│   ├── artifacts/             ← Artifact Registry repositories
│   ├── secretmanager/         ← GCP Secret Manager secrets
│   ├── service-accounts/      ← GCP service accounts + IAM + Workload Identity
│   ├── helm/                  ← Helm releases (Istio, cert-manager, ArgoCD, app ingress)
│   ├── dns/                   ← Cloud DNS records
│   ├── buckets/               ← GCS buckets
│   ├── users-access/          ← User IAM bindings
│   ├── cert-monitor/          ← Certificate monitoring (Python script + Docker)
│   ├── kms/                   ← Cloud KMS key rings and keys
│   ├── cloudsql-root-password/         ← Auto-generate + store root password
│   ├── cloudsql-microservice-credentials/ ← Per-service DB users + secrets
│   └── secretmanager-version/ ← Secret version management
└── projects/                   ← Environment-specific assemblies
    ├── orofi-dev/              ← Dev environment
    │   ├── local.tf            ← Variables + backend config
    │   ├── network.tf          ← Calls modules/network
    │   ├── k8s.tf              ← Calls modules/k8s
    │   ├── sql.tf              ← Calls modules/datastore
    │   ├── redis.tf            ← Calls modules/redis
    │   ├── service-accounts.tf ← Calls modules/service-accounts (×8 services)
    │   ├── secrets.tf          ← Calls modules/secretmanager (×40+ secrets)
    │   ├── dns.tf              ← Calls modules/dns
    │   └── ...
    ├── orofi-staging/          ← Staging environment (same structure)
    └── orofi-prod/             ← Production environment (sparse — managed manually)

State Management

Each environment stores its Terraform state in a dedicated GCS bucket:

Environment   Bucket                 Prefix
Dev           oro-dev-infra          terraform/oro/dev
Staging       oro-infra-stag         terraform/automation/staging
Production    oro-infra-production   terraform/automation/production

State is never shared between environments, so running plan or apply in dev cannot modify staging or production state.

Backend Configuration (Dev Example)

# infrastructure-management/projects/orofi-dev/local.tf
terraform {
  backend "gcs" {
    bucket = "oro-dev-infra"
    prefix = "terraform/oro/dev"
  }
}

provider "google" {
  project = "orofi-dev-cloud"
  region  = "us-central1"
}
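For comparison, applying the same pattern to staging, with the bucket and prefix from the State Management table and the project ID and region from the staging local.tf shown later on this page, would give roughly the following. This is a sketch; the actual staging file may differ.

```hcl
# Sketch of the staging equivalent (values taken from this page, file layout assumed)
terraform {
  backend "gcs" {
    # Backend blocks cannot reference locals or variables,
    # so bucket and prefix are literal strings per environment.
    bucket = "oro-infra-stag"
    prefix = "terraform/automation/staging"
  }
}

provider "google" {
  project = "orofi-stage-cloud"
  region  = "us-central1"
}
```

Because the backend values must be literals, each environment directory carries its own copy; Terraform's partial backend configuration (terraform init -backend-config=...) is the usual alternative when that duplication becomes a problem.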

Module Architecture

Modules are pure Terraform — no Terragrunt. Each module accepts variables and creates GCP resources.
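A minimal sketch of that shape, using a hypothetical Redis module: inputs are declared as variables, resources are built from them, and values the caller needs are exported as outputs. The variable names, the naming scheme, and the BASIC tier are assumptions, not the contents of modules/redis.

```hcl
# Hypothetical minimal module (e.g. variables.tf + main.tf + outputs.tf combined)
# No provider or backend block: the calling project directory supplies both.

variable "project_id" {
  type        = string
  description = "GCP project to create the instance in"
}

variable "region" {
  type = string
}

variable "env" {
  type = string
}

variable "memory_size_gb" {
  type    = number
  default = 1
}

resource "google_redis_instance" "this" {
  project        = var.project_id
  region         = var.region
  name           = "redis-${var.env}"
  tier           = "BASIC"
  memory_size_gb = var.memory_size_gb
}

# Exposed so other module calls in the same project can consume it.
output "host" {
  value = google_redis_instance.this.host
}
```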

The modules/helm Module

This is the most complex module. It deploys Kubernetes components via Helm:

  • Istio (base + istiod + ingressgateway + egressgateway)
  • cert-manager
  • ArgoCD
  • App Ingress (Gateway resource + ArgoCD VirtualService)

The Istio versions and values are defined in modules/helm/main.tf.
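The key constraint in this module is install order: the Istio CRDs from the base chart must exist before istiod, and istiod before the gateways. The sketch below shows one way to express that with the Helm provider's helm_release resource. The chart names and repository URL are those of the public Istio Helm repository; the release names, namespace, and the istio_version variable are assumptions, and the real module pins its own versions and values in modules/helm/main.tf.

```hcl
# Hypothetical sketch of Istio install ordering with the Helm provider
variable "istio_version" {
  type        = string
  description = "Istio chart version (hypothetical variable)"
}

resource "helm_release" "istio_base" {
  name             = "istio-base"
  repository       = "https://istio-release.storage.googleapis.com/charts"
  chart            = "base" # installs the Istio CRDs
  version          = var.istio_version
  namespace        = "istio-system"
  create_namespace = true
}

resource "helm_release" "istiod" {
  name       = "istiod"
  repository = "https://istio-release.storage.googleapis.com/charts"
  chart      = "istiod"
  version    = var.istio_version
  namespace  = "istio-system"

  # CRDs from the base chart must exist first.
  depends_on = [helm_release.istio_base]
}

resource "helm_release" "istio_ingressgateway" {
  name       = "istio-ingressgateway"
  repository = "https://istio-release.storage.googleapis.com/charts"
  chart      = "gateway"
  version    = var.istio_version
  namespace  = "istio-system"

  # The gateway needs a running control plane.
  depends_on = [helm_release.istiod]
}
```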

The modules/cloudsql-microservice-credentials Module

This module creates a complete per-service database identity:

  1. A MySQL user in the Cloud SQL instance
  2. A secret in GCP Secret Manager with the connection string
  3. An IAM binding that lets the microservice's service account read the secret

This pattern ensures each microservice has its own database user and no service can access another service's database.
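The three steps map onto standard Google provider resources roughly as sketched below. All variable names, the secret ID pattern, and the connection-string format are assumptions, not the module's actual interface. Note that google_sql_user has no argument for MySQL grants, so restricting each user to its own database has to be handled separately; how this module does that is not shown on this page.

```hcl
# Hypothetical sketch of one service's database identity.
# Variable names, secret ID pattern and connection-string format are assumptions.
variable "project_id" { type = string }
variable "sql_instance" { type = string }    # Cloud SQL instance name
variable "db_host" { type = string }         # host or IP the service connects to
variable "service_name" { type = string }    # used as DB user and database name here
variable "service_account" { type = string } # microservice GSA email

# Alphanumeric only, so the password can sit in a URL without escaping.
resource "random_password" "db" {
  length  = 32
  special = false
}

# 1. MySQL user in the Cloud SQL instance
resource "google_sql_user" "svc" {
  project  = var.project_id
  instance = var.sql_instance
  name     = var.service_name
  password = random_password.db.result
}

# 2. Secret holding the connection string
resource "google_secret_manager_secret" "db" {
  project   = var.project_id
  secret_id = "${var.service_name}-db-credentials"

  replication {
    auto {} # google provider v5+ syntax
  }
}

resource "google_secret_manager_secret_version" "db" {
  secret      = google_secret_manager_secret.db.id
  secret_data = "mysql://${google_sql_user.svc.name}:${random_password.db.result}@${var.db_host}:3306/${var.service_name}"
}

# 3. Only this microservice's service account may read the secret
resource "google_secret_manager_secret_iam_member" "reader" {
  project   = var.project_id
  secret_id = google_secret_manager_secret.db.secret_id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${var.service_account}"
}
```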

The modules/service-accounts Module

For each microservice this module:

  1. Creates a GCP service account
  2. Grants IAM roles (workload identity + any storage/KMS roles)
  3. Creates the Workload Identity binding to the Kubernetes namespace/service account
  4. Grants access to the service's secrets in Secret Manager
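Steps 1 to 3 can be sketched as follows for a single service. The account naming scheme, the example storage role, and the assumption that the Kubernetes service account is named after the service are all hypothetical. Step 4 would add one google_secret_manager_secret_iam_member per secret and is omitted here.

```hcl
# Hypothetical sketch for one microservice; names and roles are assumptions.
variable "project_id" { type = string }
variable "service" { type = string }   # e.g. "orders"; also assumed to be the KSA name
variable "namespace" { type = string } # Kubernetes namespace of the workload

# 1. GCP service account
resource "google_service_account" "svc" {
  project      = var.project_id
  account_id   = "${var.service}-sa"
  display_name = "${var.service} workload"
}

# 2. Example extra role: read objects in GCS
resource "google_project_iam_member" "storage" {
  project = var.project_id
  role    = "roles/storage.objectViewer"
  member  = "serviceAccount:${google_service_account.svc.email}"
}

# 3. Allow the Kubernetes service account to act as the GCP service account
resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = google_service_account.svc.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[${var.namespace}/${var.service}]"
}
```

On the Kubernetes side, the service account in that namespace also needs the iam.gke.io/gcp-service-account annotation set to the GCP service account's email for the binding to take effect.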

Environment Variables and Inputs

Each project directory defines its variables in local.tf:

# infrastructure-management/projects/orofi-staging/local.tf
locals {
  project_id   = "orofi-stage-cloud"
  env          = "stage"
  region       = "us-central1"
  zone         = "us-central1-a"
  domain       = "*.stage.orofi.xyz"
  network_cidr = "11.0.0.0/16"
}

These locals flow into every module call in the same project directory:

# infrastructure-management/projects/orofi-staging/k8s.tf
module "k8s" {
  source     = "../../modules/k8s"
  project_id = local.project_id
  env        = local.env
  region     = local.region
  zone       = local.zone
  ...
}
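Locals are scoped to the directory that defines them, so modules/k8s cannot read local.project_id directly; each value crosses the module boundary only through a declared variable. A sketch of the receiving side, mirroring the arguments passed in the call above (the descriptions are assumptions, and the real module will declare more variables than these):

```hcl
# modules/k8s/variables.tf (hypothetical; mirrors the arguments in the call above)
variable "project_id" {
  type        = string
  description = "GCP project ID, passed from local.project_id"
}

variable "env" {
  type        = string
  description = "Environment name, such as \"stage\""
}

variable "region" {
  type = string
}

variable "zone" {
  type = string
}
```

Omitting any of these arguments in the module call, or passing one the module does not declare, makes terraform plan fail, which keeps the per-environment inputs explicit.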

Atlantis (Planned)

The atlantis-integration-plan.md file describes a planned automation in which opening a Bitbucket PR automatically runs terragrunt plan, and changes are applied only after someone comments atlantis apply on the PR.

Until Atlantis is running, all Terraform changes are applied manually by engineers with the appropriate service account credentials.

Planned Atlantis configuration:

  • Server: GCE VM e2-medium in us-east1
  • Autodiscover: All Terragrunt directories
  • Workflow: terragrunt plan on PR open, terragrunt apply on comment
  • Ignored paths: modules/, common/
  • Parallel plans, sequential applies
  • Production: Manual only

For current manual procedures see Change Management.

CI/CD for Infrastructure

The infrastructure-management/bitbucket-pipelines.yml handles:

  1. Python linting for modules/cert-monitor/scripts/monitor.py
  2. Building and pushing the cert-monitor Docker image on version tags

Infrastructure changes (Terraform apply) are not automated via CI — they require manual execution.

See Also