Terraform & Terragrunt¶
Before You Read¶
This page describes how infrastructure changes are made. For step-by-step change procedures see Change Management. For module documentation see Terraform Modules Reference.
Why Terragrunt?¶
Terraform alone handles resource provisioning but requires duplication when configuring multiple environments. Terragrunt adds:
- DRY configuration: write module calls once, override only what differs per environment
- Remote state management: each project/environment has its own GCS bucket and prefix
- Dependency ordering: dependency blocks ensure modules apply in the right order
- Before/after hooks: run scripts before or after Terraform commands
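As a rough illustration of these features, a single Terragrunt unit might look like the sketch below. The file path, the root.hcl parent file, the ../network sibling directory, the network_name output, the network input, and the hook command are all assumptions made for illustration. The repository tree on this page lists plain .tf files, so this is not a copy of the actual configuration.

```hcl
# Hypothetical file: projects/orofi-dev/k8s/terragrunt.hcl (path is an assumption)

# DRY configuration: inherit shared settings from a parent file.
# "root.hcl" is an assumed file name.
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  # Reuse the shared module instead of copying its resources.
  source = "../../../modules/k8s"

  # Before hook: run a command before plan and apply.
  before_hook "announce" {
    commands = ["plan", "apply"]
    execute  = ["echo", "Running Terraform for the k8s module"]
  }
}

# Dependency ordering: the network unit is applied before this one.
dependency "network" {
  config_path = "../network" # assumed sibling directory
}

inputs = {
  # "network" and "network_name" are hypothetical input and output names.
  network = dependency.network.outputs.network_name
}
```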
Repository Structure¶
infrastructure-management/
├── modules/ ← Reusable Terraform modules (never apply directly)
│   ├── k8s/ ← GKE cluster
│   ├── network/ ← VPC, subnets, firewall, NAT, static IPs
│   ├── datastore/ ← Cloud SQL MySQL
│   ├── redis/ ← Cloud Memorystore Redis
│   ├── artifacts/ ← Artifact Registry repositories
│   ├── secretmanager/ ← GCP Secret Manager secrets
│   ├── service-accounts/ ← GCP service accounts + IAM + Workload Identity
│   ├── helm/ ← Helm releases (Istio, cert-manager, ArgoCD, app ingress)
│   ├── dns/ ← Cloud DNS records
│   ├── buckets/ ← GCS buckets
│   ├── users-access/ ← User IAM bindings
│   ├── cert-monitor/ ← Certificate monitoring (Python script + Docker)
│   ├── kms/ ← Cloud KMS key rings and keys
│   ├── cloudsql-root-password/ ← Auto-generate + store root password
│   ├── cloudsql-microservice-credentials/ ← Per-service DB users + secrets
│   └── secretmanager-version/ ← Secret version management
│
└── projects/ ← Environment-specific assemblies
    ├── orofi-dev/ ← Dev environment
    │   ├── local.tf ← Variables + backend config
    │   ├── network.tf ← Calls modules/network
    │   ├── k8s.tf ← Calls modules/k8s
    │   ├── sql.tf ← Calls modules/datastore
    │   ├── redis.tf ← Calls modules/redis
    │   ├── service-accounts.tf ← Calls modules/service-accounts (×8 services)
    │   ├── secrets.tf ← Calls modules/secretmanager (×40+ secrets)
    │   ├── dns.tf ← Calls modules/dns
    │   └── ...
    ├── orofi-staging/ ← Staging environment (same structure)
    └── orofi-prod/ ← Production environment (sparse, managed manually)
State Management¶
Each environment stores its Terraform state in a dedicated GCS bucket:
| Environment | Bucket | Prefix |
|---|---|---|
| Dev | oro-dev-infra | terraform/oro/dev |
| Staging | oro-infra-stag | terraform/automation/staging |
| Production | oro-infra-production | terraform/automation/production |
State is never shared between environments, so applying a change in dev cannot modify the staging or production state.
Backend Configuration (Dev Example)¶
# infrastructure-management/projects/orofi-dev/local.tf
terraform {
  backend "gcs" {
    bucket = "oro-dev-infra"
    prefix = "terraform/oro/dev"
  }
}

provider "google" {
  project = "orofi-dev-cloud"
  region  = "us-central1"
}
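For comparison, the staging equivalent would presumably differ only in the bucket, prefix, and project. The sketch below combines the staging row of the state table with the staging project ID and region shown in the locals later on this page. It assumes staging uses the same local.tf layout as dev and has not been checked against the repository.

```hcl
# Assumed file: infrastructure-management/projects/orofi-staging/local.tf
terraform {
  backend "gcs" {
    bucket = "oro-infra-stag"               # from the state table
    prefix = "terraform/automation/staging" # from the state table
  }
}

provider "google" {
  project = "orofi-stage-cloud" # from the staging locals on this page
  region  = "us-central1"
}
```

Backend blocks cannot reference locals or variables, which is why the bucket and prefix are written as literals in each project directory.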
Module Architecture¶
Modules are pure Terraform — no Terragrunt. Each module accepts variables and creates GCP resources.
The modules/helm Module¶
This is the most complex module. It deploys Kubernetes components via Helm:
- Istio (base + istiod + ingressgateway + egressgateway)
- cert-manager
- ArgoCD
- App Ingress (Gateway resource + ArgoCD VirtualService)
The Istio versions and values are defined in modules/helm/main.tf.
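A pair of Istio releases in a module like this could be declared roughly as follows. This is a sketch, not the contents of modules/helm/main.tf: the istio_version variable, the resource names, and the istio-system namespace are assumptions, and the chart repository URL is the public Istio Helm repository.

```hcl
# Hypothetical excerpt; the real definitions live in modules/helm/main.tf.
variable "istio_version" {
  description = "Istio chart version to install (assumed variable name)"
  type        = string
}

resource "helm_release" "istio_base" {
  name             = "istio-base"
  repository       = "https://istio-release.storage.googleapis.com/charts"
  chart            = "base"
  version          = var.istio_version
  namespace        = "istio-system"
  create_namespace = true
}

resource "helm_release" "istiod" {
  name       = "istiod"
  repository = "https://istio-release.storage.googleapis.com/charts"
  chart      = "istiod"
  version    = var.istio_version
  namespace  = "istio-system"

  # istiod relies on the CRDs installed by the base chart.
  depends_on = [helm_release.istio_base]
}
```

Pinning every Istio chart to one version variable keeps base, istiod, and the gateways in step when the version is bumped.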
The modules/cloudsql-microservice-credentials Module¶
This module creates a complete per-service database identity:
1. A MySQL user in the Cloud SQL instance
2. A secret in GCP Secret Manager with the connection string
3. IAM binding for the microservice's service account to read the secret
This pattern ensures each microservice has its own database user and no service can access another service's database.
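The three parts of this per-service identity could be expressed along the following lines. This is a minimal sketch under stated assumptions, not the module's actual code: every variable name, the secret naming scheme, and the connection string format are hypothetical, and the replication block uses the syntax of version 5 or later of the google provider.

```hcl
# Hypothetical sketch; all names below are assumptions.
variable "instance_name" { type = string }         # Cloud SQL instance name
variable "db_host" { type = string }               # address the service connects to
variable "service_name" { type = string }          # microservice name
variable "service_account_email" { type = string } # the service's GCP service account

resource "random_password" "db" {
  length  = 32
  special = false
}

# 1. MySQL user in the Cloud SQL instance
resource "google_sql_user" "service" {
  name     = var.service_name
  instance = var.instance_name
  host     = "%"
  password = random_password.db.result
}

# 2. Secret holding the connection string (format is an assumption)
resource "google_secret_manager_secret" "db" {
  secret_id = "${var.service_name}-db-connection"

  replication {
    auto {}
  }
}

resource "google_secret_manager_secret_version" "db" {
  secret      = google_secret_manager_secret.db.id
  secret_data = "mysql://${google_sql_user.service.name}:${random_password.db.result}@${var.db_host}:3306/${var.service_name}"
}

# 3. Only this service's account may read the secret
resource "google_secret_manager_secret_iam_member" "reader" {
  secret_id = google_secret_manager_secret.db.secret_id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${var.service_account_email}"
}
```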
The modules/service-accounts Module¶
For each microservice this module:
1. Creates a GCP service account
2. Grants IAM roles (workload identity + any storage/KMS roles)
3. Creates Workload Identity binding to the Kubernetes namespace/service account
4. Grants access to the service's secrets in Secret Manager
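The four steps above can be sketched as follows. The variable names, the account ID suffix, and the example storage role are assumptions; the roles the module really grants are not listed on this page.

```hcl
# Hypothetical sketch; variable names and the example role are assumptions.
variable "project_id" { type = string }
variable "service_name" { type = string }
variable "namespace" { type = string }           # Kubernetes namespace
variable "k8s_service_account" { type = string } # Kubernetes service account name
variable "secret_ids" { type = set(string) }     # secrets this service may read

# 1. GCP service account
resource "google_service_account" "service" {
  project      = var.project_id
  account_id   = "${var.service_name}-sa"
  display_name = "Service account for ${var.service_name}"
}

# 2. Example project-level role
resource "google_project_iam_member" "storage" {
  project = var.project_id
  role    = "roles/storage.objectViewer"
  member  = "serviceAccount:${google_service_account.service.email}"
}

# 3. Workload Identity: let the Kubernetes service account act as the GCP one
resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = google_service_account.service.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[${var.namespace}/${var.k8s_service_account}]"
}

# 4. Read access to the service's own secrets
resource "google_secret_manager_secret_iam_member" "secrets" {
  for_each  = var.secret_ids
  project   = var.project_id
  secret_id = each.value
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${google_service_account.service.email}"
}
```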
Environment Variables and Inputs¶
Each project directory defines its variables in local.tf:
# infrastructure-management/projects/orofi-staging/local.tf
locals {
  project_id   = "orofi-stage-cloud"
  env          = "stage"
  region       = "us-central1"
  zone         = "us-central1-a"
  domain       = "*.stage.orofi.xyz"
  network_cidr = "11.0.0.0/16"
}
These locals flow into every module call in the same project directory:
# infrastructure-management/projects/orofi-staging/k8s.tf
module "k8s" {
  source     = "../../modules/k8s"
  project_id = local.project_id
  env        = local.env
  region     = local.region
  zone       = local.zone
  ...
}
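On the receiving side, each argument passed in a module call has to match a variable declared by the module. A declaration file for the four inputs shown above might look like this sketch; the file name, descriptions, and types are assumptions rather than the module's actual definitions.

```hcl
# Hypothetical modules/k8s/variables.tf; descriptions and types are assumptions.
variable "project_id" {
  description = "GCP project that hosts the cluster"
  type        = string
}

variable "env" {
  description = "Short environment name, for example \"stage\""
  type        = string
}

variable "region" {
  description = "GCP region for regional resources"
  type        = string
}

variable "zone" {
  description = "GCP zone for zonal resources"
  type        = string
}
```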
Atlantis (Planned)¶
The atlantis-integration-plan.md file describes planned automation in which Bitbucket PRs trigger terragrunt plan automatically, and an atlantis apply comment on the PR is required before any change is applied.
Until Atlantis is running, all Terraform changes are applied manually by engineers with the appropriate service account credentials.
Planned Atlantis configuration:
- Server: GCE VM e2-medium in us-east1
- Autodiscover: All Terragrunt directories
- Workflow: terragrunt plan on PR open, terragrunt apply on comment
- Ignored paths: modules/, common/
- Parallel plans, sequential applies
- Production: Manual only
For current manual procedures see Change Management.
CI/CD for Infrastructure¶
The infrastructure-management/bitbucket-pipelines.yml handles:
1. Python linting for modules/cert-monitor/scripts/monitor.py
2. Building and pushing the cert-monitor Docker image on version tags
Infrastructure changes (Terraform apply) are not automated via CI — they require manual execution.
See Also¶
- Terraform Modules Reference — every module with inputs, outputs, and examples
- Terragrunt Structure Reference — detailed project layout
- Change Management — safe process for applying changes
- Environments — what differs between environments