# Cluster Configuration Reference

## Cluster Overview
| Property | Dev | Staging |
|---|---|---|
| Cluster name | orofi-dev-cloud-dev-k8s-cluster | orofi-stage-cloud-stage-k8s-cluster |
| Project | orofi-dev-cloud | orofi-stage-cloud |
| Zone | us-central1-a | us-central1-a |
| Node pool min | 0 | 0 |
| Node pool max | 15 | 15 |
| Node pool initial | 1 | 1 |
| Node machine type | [NEEDS TEAM INPUT] | [NEEDS TEAM INPUT] |
| Workload Identity | Enabled | Enabled |
| Zero-trust control plane | Enabled | Disabled |
## Node Pool Autoscaling
The cluster autoscaler adjusts node count between 0 and 15 based on pod resource requests.

- **Scale-up trigger:** a pod is Pending because no existing node has sufficient CPU/memory for its resource requests.
- **Scale-down trigger:** a node's scheduled pods can fit on other nodes, and the node has been underutilized for 10 minutes.

**Important:** autoscaling responds to pod *requests*, not actual usage. Over-provisioned resource requests (e.g., requesting 2 CPU but using 200m) prevent effective downscaling.
```shell
# See current node count and status
kubectl get nodes

# See resource requests vs capacity per node
kubectl describe nodes | grep -A5 "Allocated resources"
```
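Since downscaling hinges on requests rather than actual usage, it helps to compare the two side by side. A sketch of that check, using a placeholder namespace (`kubectl top` relies on the metrics API, which GKE provides by default):

```shell
# Actual CPU/memory usage per pod
kubectl top pods -n my-namespace

# Declared requests per pod, for comparison
kubectl get pods -n my-namespace \
  -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'
```

If usage sits far below requests across the board, lowering the requests is what lets the autoscaler pack pods onto fewer nodes.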
## Workload Identity
All application pods authenticate to GCP using Workload Identity — no static service account keys are used.
The binding chain:

```text
Pod → K8s ServiceAccount (namespace-scoped)
    → GCP ServiceAccount (project-scoped)
    → GCP IAM permissions
```
Each K8s ServiceAccount has an annotation set by the Helm chart:
```yaml
# serviceaccount.yaml (in orofi-application Helm chart)
annotations:
  iam.gke.io/gcp-service-account: microservice-identity-sa@orofi-{env}-cloud.iam.gserviceaccount.com
```
The corresponding GCP SA has a Workload Identity IAM binding:
```text
serviceAccount:orofi-{env}-cloud.svc.id.goog[microservice-identity/microservice-identity-sa]
  → roles/iam.workloadIdentityUser
  on microservice-identity-sa@orofi-{env}-cloud.iam.gserviceaccount.com
```
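For reference, the binding above corresponds to the following `gcloud` command, shown with `{env}` filled in as `dev` for illustration. In practice the binding is managed by Terraform, so this is a sketch of the equivalent imperative form, not a step to run by hand:

```shell
# Grant the K8s ServiceAccount permission to impersonate the GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  microservice-identity-sa@orofi-dev-cloud.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:orofi-dev-cloud.svc.id.goog[microservice-identity/microservice-identity-sa]"
```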
## RBAC
Kubernetes RBAC is supplemented by GCP IAM — GCP IAM roles for GKE translate to Kubernetes RBAC permissions.
| GCP Role | K8s Equivalent |
|---|---|
| `roles/container.viewer` | `view` (read pods, logs) |
| `roles/container.developer` | `edit` (deploy, scale) |
| `roles/container.admin` | `cluster-admin` |
[NEEDS TEAM INPUT: are there any custom ClusterRoles or RoleBindings beyond the GCP defaults? Document them here.]
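To sanity-check what a given role translates to in practice, `kubectl auth can-i` supports impersonation (the email below is a placeholder; impersonating requires that your own identity has impersonation rights):

```shell
# Expect "yes" for a roles/container.viewer holder
kubectl auth can-i get pods --as=someone@example.com

# Expect "no" for viewer, "yes" for developer and admin
kubectl auth can-i create deployments --as=someone@example.com
```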
## Namespace Isolation
Each application namespace is isolated via:
- **Secret isolation:** each namespace only has `ExternalSecret` objects for its own secrets
- **ServiceAccount isolation:** each namespace has one ServiceAccount with its own GCP SA binding
- **Istio PeerAuthentication:** `STRICT` mTLS mode per namespace (blocks unauthenticated connections)
- **Resource quotas:** [NEEDS TEAM INPUT: are ResourceQuota objects configured per namespace?]
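A quick way to spot-check these guarantees from kubectl (the namespace name is a placeholder):

```shell
# STRICT mTLS should be reported for each application namespace
kubectl get peerauthentication -A

# Each namespace should list only its own ExternalSecrets and one app ServiceAccount
kubectl get externalsecrets -n my-namespace
kubectl get serviceaccounts -n my-namespace
```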
## Installed Components

### Bootstrap Components (via Terraform modules/helm)
Applied once during cluster setup, managed by Terraform state:
| Component | Helm Chart | Version | Namespace |
|---|---|---|---|
| Istio base (CRDs) | `base` (Istio) | 1.24.2 | istio-system |
| Istio control plane | `istiod` (Istio) | 1.24.2 | istio-system |
| Istio ingress gateway | `gateway` (Istio) | 1.24.2 | istio-system |
| Istio egress gateway | `gateway` (Istio) | 1.24.2 | istio-system |
| cert-manager | cert-manager | latest | cert-manager |
| ArgoCD | argo-cd | [NEEDS TEAM INPUT] | argocd |
| Gateway resource | custom | N/A | istio-system |
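Because the bootstrap components are installed through Terraform's Helm provider, they should appear as ordinary Helm releases. A quick sketch for verifying them:

```shell
# Bootstrap releases (istio base, istiod, gateways, cert-manager, argo-cd)
helm list -A

# Confirm the Istio CRDs landed
kubectl get crds | grep istio
```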
### Application Components (via ArgoCD)
All other components are deployed by ArgoCD from the infrastructure-configuration repository:
| Component | Helm Chart | Namespace | Source |
|---|---|---|---|
| Prometheus | Bitnami 1.3.23 | prometheus | cluster-addons/prometheus/ |
| Grafana | [NEEDS TEAM INPUT] | grafana | cluster-addons/grafana/ |
| Loki | [NEEDS TEAM INPUT] | loki | cluster-addons/loki/ |
| kube-state-metrics | [NEEDS TEAM INPUT] | kube-state-metrics | cluster-addons/ |
| node-exporter | [NEEDS TEAM INPUT] | node-exporter | cluster-addons/ |
| KubeCost | [NEEDS TEAM INPUT] | kubecost | cluster-addons/kubecost/ |
| KEDA | [NEEDS TEAM INPUT] | keda | cluster-addons/ |
| External Secrets Operator | [NEEDS TEAM INPUT] | external-secrets | cluster-addons/ |
| K6 Operator | [NEEDS TEAM INPUT] | k6-operator | tools/k6-operator/ |
| MongoDB Operator (PSMDB) | Percona | mongo-db | tools/mongodb-operator/ |
| Kafka (Bitnami KRaft) | 32.4.3 | kafka | tools/kafka-new/ |
| Kafka UI | custom | kafka (or tools) | tools/kafka-ui/ |
| Mongo Express | 1.17.0 | tools | tools/mongo-express/ |
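ArgoCD-managed components can be checked through their Application CRs (the `argocd` CLI form assumes you are already logged in to the ArgoCD server):

```shell
# Sync/health status of every ArgoCD-managed component
kubectl get applications -n argocd

# Equivalent view via the argocd CLI
argocd app list
```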
## Resource Requests and Limits Reference

### Platform Components

**Kafka (Dev)**
| Component | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| Controller | 250m | 500m | 512Mi | 1Gi |
| Broker | 1000m | 2000m | 1Gi | 4Gi |
**K6 Operator Manager (Dev)**
| CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|
| 100m | 500m | 128Mi | 512Mi |
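The configured values can be compared against what is actually running. A sketch using jsonpath (the broker label selector is an assumption about the Bitnami chart's standard labels):

```shell
# Dump the resources block for Kafka broker pods
kubectl -n kafka get pods -l app.kubernetes.io/component=broker \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[0].resources}{"\n"}{end}'
```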
[NEEDS TEAM INPUT: document resource requests/limits for Prometheus, Grafana, ArgoCD, ESO, KEDA, and all microservices.]
## Horizontal Pod Autoscaling
Each microservice has an HPA resource (created by the shared Helm chart). KEDA extends the HPA with custom metrics.
[NEEDS TEAM INPUT: document the HPA min/max replicas and target CPU for each service.]
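While the per-service values are being collected, the live HPA state and the KEDA objects that generate them can be listed directly:

```shell
# HPAs across all namespaces (min/max replicas, current metric targets)
kubectl get hpa -A

# KEDA ScaledObjects, which create and extend the HPAs
kubectl get scaledobjects -A
```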
## See Also
- GCP & GKE
- Scaling Events Runbook
- Terraform Modules Reference — `k8s` module