# Cluster Configuration Reference

## Cluster Overview
| Property | Dev | Staging |
|---|---|---|
| Cluster name | orofi-dev-cloud-dev-k8s-cluster | orofi-stage-cloud-stage-k8s-cluster |
| Project | orofi-dev-cloud | orofi-stage-cloud |
| Zone | us-central1-a | us-central1-a |
| Node pool min | 0 | 0 |
| Node pool max | 15 | 15 |
| Node pool initial | 1 | 1 |
| Node machine type | [NEEDS TEAM INPUT] | [NEEDS TEAM INPUT] |
| Workload Identity | Enabled | Enabled |
| Zero-trust control plane | Enabled | Disabled |
## Node Pool Autoscaling
The cluster autoscaler adjusts node count between 0 and 15 based on pod resource requests.

- **Scale-up trigger:** a pod is Pending because no existing node has sufficient CPU/memory for its resource requests.
- **Scale-down trigger:** a node's scheduled pods can fit on other nodes, and the node has been underutilized for 10 minutes.

**Important:** autoscaling responds to pod *requests*, not actual usage. Over-provisioned resource requests (e.g., requesting 2 CPU but using 200m) prevent effective downscaling.
```shell
# See current node count and status
kubectl get nodes

# See resource requests vs capacity per node
kubectl describe nodes | grep -A5 "Allocated resources"
```
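Since downscaling hinges on requests rather than actual usage, it helps to compare the two side by side. A sketch of that check, using a placeholder namespace (`kubectl top` relies on the metrics API, which GKE provides by default):

```shell
# Actual CPU/memory usage per pod
kubectl top pods -n my-namespace

# Declared requests per pod, for comparison
kubectl get pods -n my-namespace \
  -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'
```

If usage sits far below requests across the board, lowering the requests is what lets the autoscaler pack pods onto fewer nodes.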
## Workload Identity
All application pods authenticate to GCP using Workload Identity — no static service account keys are used.
The binding chain:

```text
Pod → K8s ServiceAccount (namespace-scoped)
    → GCP ServiceAccount (project-scoped)
    → GCP IAM permissions
```
Each K8s ServiceAccount has an annotation set by the Helm chart:
```yaml
# serviceaccount.yaml (in orofi-application Helm chart)
annotations:
  iam.gke.io/gcp-service-account: microservice-identity-sa@orofi-{env}-cloud.iam.gserviceaccount.com
```
The corresponding GCP SA has a Workload Identity IAM binding:
```text
serviceAccount:orofi-{env}-cloud.svc.id.goog[microservice-identity/microservice-identity-sa]
  → roles/iam.workloadIdentityUser
  on microservice-identity-sa@orofi-{env}-cloud.iam.gserviceaccount.com
```
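For reference, the binding above corresponds to the following `gcloud` command, shown with `{env}` filled in as `dev` for illustration. In practice the binding is managed by Terraform, so this is a sketch of the equivalent imperative form, not a step to run by hand:

```shell
# Grant the K8s ServiceAccount permission to impersonate the GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  microservice-identity-sa@orofi-dev-cloud.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:orofi-dev-cloud.svc.id.goog[microservice-identity/microservice-identity-sa]"
```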
## RBAC
Kubernetes RBAC is supplemented by GCP IAM — GCP IAM roles for GKE translate to Kubernetes RBAC permissions.
| GCP Role | K8s Equivalent |
|---|---|
| `roles/container.viewer` | `view` (read pods, logs) |
| `roles/container.developer` | `edit` (deploy, scale) |
| `roles/container.admin` | `cluster-admin` |
[NEEDS TEAM INPUT: are there any custom ClusterRoles or RoleBindings beyond the GCP defaults? Document them here.]
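To sanity-check what a given role translates to in practice, `kubectl auth can-i` supports impersonation (the email below is a placeholder; impersonating requires that your own identity has impersonation rights):

```shell
# Expect "yes" for a roles/container.viewer holder
kubectl auth can-i get pods --as=someone@example.com

# Expect "no" for viewer, "yes" for developer and admin
kubectl auth can-i create deployments --as=someone@example.com
```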
## Namespace Isolation
Each application namespace is isolated via:
- **Secret isolation:** each namespace only has `ExternalSecret` objects for its own secrets
- **ServiceAccount isolation:** each namespace has one ServiceAccount with its own GCP SA binding
- **Istio PeerAuthentication:** `STRICT` mTLS mode per namespace (blocks unauthenticated connections)
- **Resource quotas:** [NEEDS TEAM INPUT: are ResourceQuota objects configured per namespace?]
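A quick way to spot-check these guarantees from kubectl (the namespace name is a placeholder):

```shell
# STRICT mTLS should be reported for each application namespace
kubectl get peerauthentication -A

# Each namespace should list only its own ExternalSecrets and one app ServiceAccount
kubectl get externalsecrets -n my-namespace
kubectl get serviceaccounts -n my-namespace
```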
## Installed Components

### Bootstrap Components (via Terraform modules/helm)
Applied once during cluster setup, managed by Terraform state:
| Component | Helm Chart | Version | Namespace |
|---|---|---|---|
| Istio base (CRDs) | `base` (Istio) | 1.24.2 | istio-system |
| Istio control plane | `istiod` (Istio) | 1.24.2 | istio-system |
| Istio ingress gateway | `gateway` (Istio) | 1.24.2 | istio-system |
| Istio egress gateway | `gateway` (Istio) | 1.24.2 | istio-system |
| cert-manager | cert-manager | latest | cert-manager |
| ArgoCD | argo-cd | [NEEDS TEAM INPUT] | argocd |
| Gateway resource | custom | N/A | istio-system |
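Because the bootstrap components are installed through Terraform's Helm provider, they should appear as ordinary Helm releases. A quick sketch for verifying them:

```shell
# Bootstrap releases (istio base, istiod, gateways, cert-manager, argo-cd)
helm list -A

# Confirm the Istio CRDs landed
kubectl get crds | grep istio
```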
### Application Components (via ArgoCD)
All other components are deployed by ArgoCD from the infrastructure-configuration repository:
| Component | Helm Chart | Namespace | Source |
|---|---|---|---|
| Prometheus | Bitnami 1.3.23 | prometheus | cluster-addons/prometheus/ |
| Grafana | [NEEDS TEAM INPUT] | grafana | cluster-addons/grafana/ |
| Loki | [NEEDS TEAM INPUT] | loki | cluster-addons/loki/ |
| kube-state-metrics | [NEEDS TEAM INPUT] | kube-state-metrics | cluster-addons/ |
| node-exporter | [NEEDS TEAM INPUT] | node-exporter | cluster-addons/ |
| KubeCost | [NEEDS TEAM INPUT] | kubecost | cluster-addons/kubecost/ |
| KEDA | [NEEDS TEAM INPUT] | keda | cluster-addons/ |
| External Secrets Operator | [NEEDS TEAM INPUT] | external-secrets | cluster-addons/ |
| K6 Operator | [NEEDS TEAM INPUT] | k6-operator | tools/k6-operator/ |
| MongoDB Operator (PSMDB) | Percona | mongo-db | tools/mongodb-operator/ |
| Kafka (Bitnami KRaft) | 32.4.3 | kafka | tools/kafka-new/ |
| Kafka UI | custom | kafka (or tools) | tools/kafka-ui/ |
| Mongo Express | 1.17.0 | tools | tools/mongo-express/ |
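ArgoCD-managed components can be checked through their Application CRs (the `argocd` CLI form assumes you are already logged in to the ArgoCD server):

```shell
# Sync/health status of every ArgoCD-managed component
kubectl get applications -n argocd

# Equivalent view via the argocd CLI
argocd app list
```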
## Resource Requests and Limits Reference

### Platform Components

**Kafka (Dev)**
| Component | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| Controller | 250m | 500m | 512Mi | 1Gi |
| Broker | 1000m | 2000m | 1Gi | 4Gi |
**K6 Operator Manager (Dev)**
| CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|
| 100m | 500m | 128Mi | 512Mi |
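The configured values can be compared against what is actually running. A sketch using jsonpath (the broker label selector is an assumption about the Bitnami chart's standard labels):

```shell
# Dump the resources block for Kafka broker pods
kubectl -n kafka get pods -l app.kubernetes.io/component=broker \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[0].resources}{"\n"}{end}'
```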
[NEEDS TEAM INPUT: document resource requests/limits for Prometheus, Grafana, ArgoCD, ESO, KEDA, and all microservices.]
## Horizontal Pod Autoscaling
Each microservice has an HPA resource (created by the shared Helm chart). KEDA extends the HPA with custom metrics.
[NEEDS TEAM INPUT: document the HPA min/max replicas and target CPU for each service.]
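While the per-service values are being collected, the live HPA state and the KEDA objects that generate them can be listed directly:

```shell
# HPAs across all namespaces (min/max replicas, current metric targets)
kubectl get hpa -A

# KEDA ScaledObjects, which create and extend the HPAs
kubectl get scaledobjects -A
```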
## See Also
- GCP & GKE
- Scaling Events Runbook
- Terraform Modules Reference — `k8s` module