Environments¶
Environment Overview¶
Orofi runs three environments. All use the same Terraform modules and Kubernetes manifests — differences are controlled entirely through variable values, not separate code paths.
| Property | Development | Staging | Production |
|---|---|---|---|
| GCP Project | `orofi-dev-cloud` | `orofi-stage-cloud` | [NEEDS TEAM INPUT] |
| Domain | `*.dev.orofi.xyz` | `*.stage.orofi.xyz` | `*.orofi.xyz` |
| GKE Cluster | `orofi-dev-cloud-dev-k8s-cluster` | `orofi-stage-cloud-stage-k8s-cluster` | [NEEDS TEAM INPUT] |
| Region/Zone | `us-central1-a` | `us-central1-a` | [NEEDS TEAM INPUT] |
| VPC CIDR | `10.0.0.0/16` | `11.0.0.0/16` | [NEEDS TEAM INPUT] |
| Terraform State Bucket | `oro-dev-infra` | `oro-infra-stag` | `oro-infra-production` |
| Terraform State Prefix | `terraform/oro/dev` | `terraform/automation/staging` | `terraform/automation/production` |
| Terraform SA | `terraform-mnl@orofi-dev-cloud` | `orofi-mnl-sa-terraform@orofi-stage-cloud` | [NEEDS TEAM INPUT] |
| IaC Automation | Manual + Bitbucket | Manual + Bitbucket | Manual only (planned: Atlantis) |
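Since the environments share one set of Terraform modules, the differences in the table are typically expressed purely as variable values. A minimal sketch of what a staging tfvars file might look like — the variable names here are illustrative, not necessarily the repo's actual inputs:

```hcl
# stage.tfvars — illustrative variable names, values taken from the table above
project_id    = "orofi-stage-cloud"
zone          = "us-central1-a"
vpc_cidr      = "11.0.0.0/16"
cluster_name  = "orofi-stage-cloud-stage-k8s-cluster"
domain_suffix = "stage.orofi.xyz"
zero_trust    = false
```

The same module tree applied with `dev.tfvars` would produce the dev column; no environment-specific code paths are involved.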
What Changes Per Environment¶
Database Tier¶
Dev uses micro/HDD for cost savings. Staging uses a larger tier with SSD and regional HA to mirror production behavior.
| Property | Dev | Staging | Production |
|---|---|---|---|
| Instance tier | `db-f1-micro` | `db-n1-standard-1` | [NEEDS TEAM INPUT] |
| Disk type | `PD_HDD` | `PD_SSD` | [NEEDS TEAM INPUT] |
| Disk size | 20 GB | 100 GB | [NEEDS TEAM INPUT] |
| Availability | `ZONAL` | `REGIONAL` | [NEEDS TEAM INPUT] |
| Backup retention | [NEEDS TEAM INPUT] | 30 backups | [NEEDS TEAM INPUT] |
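These properties map directly onto the `settings` block of the `google_sql_database_instance` Terraform resource. A hedged sketch of the staging shape — the attribute names are the provider's real ones, but the resource name and database engine are assumptions:

```hcl
resource "google_sql_database_instance" "main" {
  name             = "orofi-stage-db"  # illustrative name
  database_version = "POSTGRES_15"     # assumed engine/version

  settings {
    tier              = "db-n1-standard-1"
    disk_type         = "PD_SSD"
    disk_size         = 100        # GB
    availability_type = "REGIONAL" # dev would use "ZONAL"

    backup_configuration {
      enabled = true
      backup_retention_settings {
        retained_backups = 30
      }
    }
  }
}
```

Dev differs only in the values (`db-f1-micro`, `PD_HDD`, 20 GB, `ZONAL`), which is what keeps both environments on the same module.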
GKE Autoscaling¶
Both clusters autoscale from 1 to 15 nodes. The autoscaler itself never drops below 1 node; the clusters only reach zero nodes (evicting all workloads) through the manual scale-down pipelines, which resize the node pools directly rather than going through the autoscaler.
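In Terraform, these bounds correspond to the `autoscaling` block of `google_container_node_pool` — a sketch with an illustrative pool name:

```hcl
resource "google_container_node_pool" "primary" {
  name     = "primary-pool" # illustrative
  cluster  = "orofi-dev-cloud-dev-k8s-cluster"
  location = "us-central1-a"

  autoscaling {
    min_node_count = 1
    max_node_count = 15
  }
}
```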
Kafka Configuration¶
| Property | Dev | Staging |
|---|---|---|
| Controller replicas | 1 | 3 |
| Broker replicas | 1 | 3 |
| Topic replication factor | 1 | 3 |
| Min ISR | N/A | 2 |
| Network threads | default | 8 |
| IO threads | default | 16 |
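The staging values correspond to standard Kafka broker settings. How they are injected depends on how Kafka is deployed (e.g. a Helm values file or a Strimzi `Kafka` CR), which this page does not specify — the fragment below only names the underlying broker properties:

```properties
# Staging broker overrides — real Kafka property names,
# delivery mechanism (Helm/Strimzi/etc.) is an assumption
default.replication.factor=3
min.insync.replicas=2
num.network.threads=8
num.io.threads=16
```

In dev, replication factor 1 with no `min.insync.replicas` means a single broker restart makes topics briefly unavailable, which is acceptable there but not in staging.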
In dev, Kafka runs as a single broker — no replication. Staging mirrors production HA configuration.
MongoDB¶
| Property | Dev | Staging |
|---|---|---|
| Replica set size | 1 | 3 |
| WiredTiger cache | 0.2 GB | [NEEDS TEAM INPUT] |
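The dev cache cap corresponds to the standard `mongod` configuration key — a minimal fragment:

```yaml
# Dev mongod.conf fragment — caps the WiredTiger cache at 0.2 GB
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 0.2
```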
Redis Backup Cadence¶
| Property | Dev | Staging |
|---|---|---|
| RDB snapshot interval | Every 12 hours | Every 6 hours |
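In `redis.conf` terms, these intervals map onto the `save <seconds> <changes>` directive, assuming snapshots are driven by the standard RDB mechanism:

```conf
# Dev: snapshot if at least 1 key changed in the last 12 h (43200 s)
save 43200 1
# Staging would use: save 21600 1  (6 h)
```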
Zero-Trust Firewall¶
Zero-trust network policy is enabled in dev and disabled in staging:
- Dev: `zero_trust = true` — the GCP firewall denies all traffic except from known IPs.
- Staging: `zero_trust = false` — the firewall is open and Istio controls access.
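One way a boolean flag like this can gate the firewall in Terraform is a conditional `count` on the deny rule — a hedged sketch, not the repo's actual resource or variable names:

```hcl
# Illustrative: deny-all ingress rule that exists only when zero_trust is set.
# Allow-rules for the known IPs would use a lower (higher-priority) number.
resource "google_compute_firewall" "deny_all_ingress" {
  count   = var.zero_trust ? 1 : 0
  name    = "deny-all-ingress"
  network = var.network # illustrative variable

  direction     = "INGRESS"
  priority      = 65000
  source_ranges = ["0.0.0.0/0"]

  deny {
    protocol = "all"
  }
}
```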
Deployment Controls¶
| Property | Dev | Staging | Production |
|---|---|---|---|
| ArgoCD auto-sync | [NEEDS TEAM INPUT] | [NEEDS TEAM INPUT] | [NEEDS TEAM INPUT] |
| Manual scale pipelines | Available | Available | N/A |
| Terraform automation | Manual CLI | Manual CLI | Manual only |
Why Three Environments?¶
Development is the integration environment where all services are deployed after a successful build. It should reflect the current state of the main (or develop) branch. It uses cheaper resources because cost matters more than reliability at this stage.
Staging is a production-mirror environment. It uses production-equivalent resource sizes (REGIONAL Cloud SQL, HA Kafka with 3 replicas) so that performance and reliability issues surface before production. Staging deployments precede production deployments.
Production is the live environment serving real users. [NEEDS TEAM INPUT: describe production deployment gate criteria — who approves, what tests must pass].
Manual Scale-Down / Scale-Up¶
To reduce costs, the dev and staging clusters can be scaled to zero nodes using Bitbucket pipeline custom triggers:
| Pipeline | Effect |
|---|---|
| `scale-down-dev` | Scales the dev GKE cluster to 0 nodes |
| `scale-up-dev` | Scales the dev GKE cluster back to normal |
| `scale-down-staging` | Scales the staging GKE cluster to 0 nodes |
| `scale-up-staging` | Scales the staging GKE cluster back to normal |
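A custom trigger like `scale-down-dev` would typically wrap a `gcloud container clusters resize` call in `bitbucket-pipelines.yml`. A sketch — the node-pool name and step layout are assumptions, only the pipeline name and cluster details come from this page:

```yaml
# bitbucket-pipelines.yml sketch (pool name "primary-pool" is illustrative)
pipelines:
  custom:
    scale-down-dev:
      - step:
          name: Scale dev cluster to zero
          script:
            - gcloud container clusters resize orofi-dev-cloud-dev-k8s-cluster
              --node-pool primary-pool --num-nodes 0
              --zone us-central1-a --quiet
```

The matching `scale-up-*` trigger would run the same command with the normal node count.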
These pipelines use the `k8s-scaler-cross` service account, which has `roles/container.admin` on the target clusters.
Scale-down impacts
Scaling to zero nodes will evict all pods. ArgoCD will re-deploy everything when the cluster scales back up. Expect 5–10 minutes for services to be fully healthy after scale-up.
Cross-Environment Access¶
The staging Private Service Connect (PSC) endpoint accepts auto-connections from orofi-devops-cloud. This allows the devops project to reach the staging Cloud SQL instance directly over the private network — used by migration tooling and Flyway.
The k8s-scaler-cross service account in staging has cross-project IAM bindings that allow scaling the dev cluster from staging automation.
Environment Variable Differences¶
Each environment has its own set of secrets in GCP Secret Manager, prefixed with `dev-` or `stage-`. The External Secrets Operator in each cluster only reads from that environment's secrets.
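With the External Secrets Operator, that prefix convention shows up in the `remoteRef.key` of each `ExternalSecret`. A hedged sketch — the store name, secret names, and keys are illustrative, only the operator and the prefix convention come from this page:

```yaml
# Illustrative ExternalSecret in the dev cluster
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: orders-service-secrets # hypothetical service
spec:
  secretStoreRef:
    name: gcp-secret-manager # illustrative store name
    kind: ClusterSecretStore
  target:
    name: orders-service-secrets
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: dev-orders-database-url # dev- prefix scopes it to this environment
```

The staging cluster would reference `stage-`-prefixed keys via its own store, so a manifest promoted between environments never reads the wrong environment's secrets.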
See Environment Variables Guide for per-service configuration details.
See Also¶
- Infrastructure Topology — resource-level comparison
- Terraform Modules — module inputs that differ per environment
- Scaling Events Runbook — how to manually scale clusters