
Tracing Requests

Before You Read

This guide helps you trace why a request failed or was slow. For log-reading basics, see Reading Logs.

Request Path

Every external request traverses this path:

Client → Cloud DNS → GCP Load Balancer → Istio IngressGateway → API Gateway → Microservice → Database

When debugging, work from the outside in:

1. Is DNS resolving correctly?
2. Is the load balancer reachable?
3. Is the IngressGateway routing correctly?
4. Is the API Gateway healthy?
5. Is the microservice healthy?
6. Is the database accessible?
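The first two checks (DNS and the load balancer) have no commands later in this guide. A minimal sketch, assuming dig and curl are available; api.example.com is a placeholder, substitute the platform's real external hostname:

```shell
#!/bin/sh
# Step 1: does DNS resolve? Prints the resolved IPs (requires dig).
check_dns() {
  dig +short "$1"
}

# Step 2: is the load balancer reachable? Prints only the HTTP status code.
check_lb() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://$1/"
}

# Pure helper: turn a status code into a coarse verdict.
classify_status() {
  case "$1" in
    2??|3??) echo "ok" ;;
    4??)     echo "client-error" ;;
    5??)     echo "server-error" ;;
    *)       echo "unreachable" ;;
  esac
}

# Usage:
#   check_dns api.example.com
#   classify_status "$(check_lb api.example.com)"
```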

Step 1: Check IngressGateway

# Is the IngressGateway healthy?
kubectl get pods -n istio-system -l app=istio-ingressgateway

# Check for routing errors in ingress logs
kubectl logs -n istio-system -l app=istio-ingressgateway --tail=50 \
  | grep -v "200\|204\|health"

# Verify the gateway resource is configured
kubectl get gateway -n istio-system oro-gateway -o yaml
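When filtering out healthy responses isn't enough, a status-code histogram shows at a glance whether errors dominate. This sketch assumes the default Envoy access-log layout, where the status code is the 5th whitespace-separated field; adjust $5 if this cluster uses a custom log format:

```shell
# Count ingress responses per HTTP status. Assumes default Envoy access logs:
#   [ts] "METHOD path proto" status ...  (status = 5th field)
status_histogram() {
  awk '{ counts[$5]++ } END { for (s in counts) print s, counts[s] }'
}

# Usage:
#   kubectl logs -n istio-system -l app=istio-ingressgateway --tail=2000 \
#     | status_histogram | sort -k2 -rn
```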

Step 2: Check VirtualServices

# List all VirtualServices (shows all routes)
kubectl get virtualservices -A

# Inspect the VirtualServices in a specific service's namespace
kubectl get virtualservice -n microservice-identity -o yaml

# Verify the VirtualService host matches the incoming request hostname
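Checking host matches by eye is error-prone with wildcards. A sketch of the comparison: the jsonpath listing is standard kubectl, and the helper models Istio's host matching, where a leading "*." wildcard matches one or more subdomain labels but not the bare domain:

```shell
# List every VirtualService host in the cluster (run against the cluster):
#   kubectl get virtualservices -A \
#     -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.hosts}{"\n"}{end}'

# Pure helper: does a request hostname match a VirtualService host entry?
host_matches() {
  req="$1"; vs="$2"
  case "$vs" in
    \*.*)
      suffix="${vs#\*}"                 # "*.example.com" -> ".example.com"
      case "$req" in
        *"$suffix") [ "$req" != "$suffix" ] && return 0 ;;
      esac
      return 1 ;;
    \*) return 0 ;;                     # bare "*" matches everything
    *)  [ "$req" = "$vs" ] ;;           # otherwise exact match
  esac
}

# Usage:
#   host_matches api.example.com '*.example.com' && echo "routed"
```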

Step 3: Check the API Gateway

# Is the gateway pod running?
kubectl get pods -n api-gateway-public

# Check gateway logs for the request
kubectl logs -n api-gateway-public \
  -l app=api-gateway-public \
  --tail=100 | grep "your-request-path"

Step 4: Check the Microservice

# Is the microservice healthy?
kubectl get pods -n microservice-identity

# Check for recent errors
kubectl logs -n microservice-identity \
  -l app=microservice-identity \
  --tail=100 | grep -i "error\|exception\|fatal"

# Check events for crash loops or scheduling issues
kubectl get events -n microservice-identity \
  --sort-by='.lastTimestamp' | tail -20
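Restart counts surface crash loops faster than scrolling events. A sketch using standard kubectl jsonpath; it only reads the first container's status, so extend it if pods here run multiple containers:

```shell
# List container restart counts per pod, highest first.
restarts_by_pod() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.status.containerStatuses[0].restartCount}{"\t"}{.metadata.name}{"\n"}{end}' \
    | sort_restarts
}

# Pure helper: numeric sort, highest count first, top N (default 10).
sort_restarts() {
  sort -rn | head -n "${1:-10}"
}

# Usage:
#   restarts_by_pod microservice-identity
#   kubectl logs -n microservice-identity <pod> --previous   # logs from the crashed run
```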

Step 5: Check Istio Proxy Status

If the application seems healthy but requests aren't routing:

# Check Envoy proxy config for a pod
istioctl proxy-config routes \
  "$(kubectl get pod -n microservice-identity -l app=microservice-identity -o name | head -n1 | cut -d/ -f2).microservice-identity"

# Check listener config
istioctl proxy-config listeners \
  "$(kubectl get pod -n microservice-identity -l app=microservice-identity -o name | head -n1 | cut -d/ -f2).microservice-identity"

# Check for proxy sync errors
istioctl proxy-status
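istioctl proxy-config expects a "<pod>.<namespace>" reference, but kubectl's -o name prints "pod/<name>", so the prefix has to be stripped each time. A small convenience helper:

```shell
# Pure helper: "pod/foo-abc" + "ns" -> "foo-abc.ns"
pod_ref_to_istio() {
  echo "${1#pod/}.$2"
}

# Resolve the first pod matching a label selector into an istioctl reference.
istio_pod_ref() {
  ns="$1"; selector="$2"
  pod_ref_to_istio "$(kubectl get pod -n "$ns" -l "$selector" -o name | head -n1)" "$ns"
}

# Usage:
#   istioctl proxy-config routes "$(istio_pod_ref microservice-identity app=microservice-identity)"
```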

Distributed Tracing

[NEEDS TEAM INPUT: is distributed tracing (Jaeger, Zipkin, or Tempo) deployed? If yes, document how to find a trace ID in logs and open it in the tracing UI. If not, note this as a gap.]

The Istio Telemetry resource is configured per namespace (via telemetry.yaml in the Helm chart). If tracing is enabled, trace IDs appear in log output.

Grafana: Correlation Across Services

In Grafana Explore, you can correlate logs across services using a common request ID or correlation ID:

# Find all log lines with a specific request ID across all namespaces
{namespace=~"microservice-.*|api-gateway-.*"} |= "req-id-12345"

[NEEDS TEAM INPUT: what correlation/trace ID header does the platform use? (e.g., X-Request-ID, X-Trace-ID, X-B3-TraceId). Is it propagated through all services?]

Checking mTLS Issues

If services can't connect to each other and you see TLS errors:

# Check if mTLS is working between services
kubectl exec -it \
  -n api-gateway-public \
  $(kubectl get pod -n api-gateway-public -l app=api-gateway-public -o name | head -1) \
  -- curl -v http://microservice-identity.microservice-identity.svc.cluster.local/health

# Check PeerAuthentication
kubectl get peerauthentication -A

# Verify cert is issued and valid
istioctl proxy-config secret \
  "$(kubectl get pod -n microservice-identity -l app=microservice-identity -o name | head -n1 | cut -d/ -f2).microservice-identity"
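When the in-cluster curl probe fails, its exit code narrows the cause. A lookup helper; the mappings are taken from curl's documented exit-code list:

```shell
# Map common curl exit codes to a short diagnosis.
curl_exit_hint() {
  case "$1" in
    0)  echo "connected" ;;
    6)  echo "DNS lookup failed (check the service FQDN)" ;;
    7)  echo "connection refused (no listener, or a NetworkPolicy?)" ;;
    28) echo "timed out" ;;
    35) echo "TLS handshake failed (possible mTLS mismatch)" ;;
    56) echo "connection reset mid-transfer (possible mTLS mismatch)" ;;
    *)  echo "unmapped -- see the EXIT CODES section of 'man curl'" ;;
  esac
}

# Usage:
#   kubectl exec -n api-gateway-public <pod> -- curl -s http://<svc>/health
#   curl_exit_hint $?
```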

Health Check Endpoints

# Check service health directly (port-forward the service to your machine)
kubectl port-forward -n microservice-identity svc/microservice-identity 8080:80

# In another terminal
curl -v http://localhost:8080/health

[NEEDS TEAM INPUT: confirm the health check path for each service. The Helm chart template uses /health but confirm this is correct for all services.]
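A single curl gives one data point; after a rollout it is often more useful to poll until the service comes up. A sketch assuming curl is available and the health endpoint returns HTTP 200 when ready:

```shell
# Poll a health endpoint until it returns HTTP 200 or the retry budget runs out.
wait_healthy() {
  url="$1"; tries="${2:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url") \
      && [ "$code" = "200" ] && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage (with a port-forward to the service running locally):
#   wait_healthy http://localhost:8080/health 30 && echo "service is up"
```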

Common Failure Signatures

| Symptom | Likely Cause | Where to Look |
| --- | --- | --- |
| 503 from IngressGateway | No healthy pods for the route | Pods in the target namespace |
| 504 from IngressGateway | Request timeout | Microservice logs for slow queries |
| 401/403 | Auth failure | API key secrets, JWT validation |
| Pod CrashLoopBackOff | App crash on startup | kubectl logs --previous |
| Pod Pending | No nodes with capacity | kubectl describe pod for resource constraints |
| RBAC: access denied in ArgoCD | Service account permissions | K8s RBAC, ArgoCD project config |

See Also