Tracing Requests
Before You Read
This guide helps you trace why a request failed or was slow. For log-reading basics, see Reading Logs.
Request Path
Every external request traverses this path:
Client → Cloud DNS → GCP Load Balancer → Istio IngressGateway → API Gateway → Microservice → Database
When debugging, work from the outside in:

1. Is DNS resolving correctly?
2. Is the load balancer reachable?
3. Is the IngressGateway routing correctly?
4. Is the API Gateway healthy?
5. Is the microservice healthy?
6. Is the database accessible?

Step 1: Check IngressGateway
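The first two checks happen outside the cluster entirely. A minimal sketch, assuming the external hostname is `api.example.com` (an illustrative placeholder, not the platform's real domain):

```shell
# Hypothetical hostname -- substitute the platform's real external domain.
HOST="api.example.com"

# Small helper: does the name resolve at all? getent uses the system
# resolver, so it sees the same answers a client on this machine would.
dns_ok() {
  getent hosts "$1" > /dev/null
}

if dns_ok "$HOST"; then
  echo "DNS ok: $(getent hosts "$HOST" | awk '{print $1}' | head -1)"
else
  echo "DNS FAILED for $HOST"
fi

# Load balancer reachability: discard the body, print only status and timing.
curl -s -o /dev/null -w 'status=%{http_code} total=%{time_total}s\n' \
  "https://$HOST/" || true
```

A `status=000` from curl with a working DNS answer usually means the load balancer itself (or its TLS listener) is unreachable, which narrows the problem to the GCP layer before any Istio debugging.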
# Is the IngressGateway healthy?
kubectl get pods -n istio-system -l app=istio-ingressgateway
# Check for routing errors in ingress logs
kubectl logs -n istio-system -l app=istio-ingressgateway --tail=50 \
| grep -v "200\|204\|health"
# Verify the gateway resource is configured
kubectl get gateway -n istio-system oro-gateway -o yaml
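When the raw ingress logs are noisy, a quick way to see the error mix is to count status-code/response-flag pairs. A sketch assuming Envoy's default text access-log format, where the status code and the response flag (e.g. UF, NR, UH) are the first two fields after the quoted request line:

```shell
# Summarize "STATUS FLAG" pairs from recent ingress access logs.
# Assumes Envoy's default format: [TIME] "METHOD PATH PROTO" STATUS FLAGS ...
kubectl logs -n istio-system -l app=istio-ingressgateway --tail=1000 \
  | awk -F'" ' 'NF > 1 { split($2, f, " "); print f[1], f[2] }' \
  | sort | uniq -c | sort -rn
```

A pile of `503 UH` (no healthy upstream) points at the target workload; `404 NR` (no route) points back at the Gateway/VirtualService config checked in the next step.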
Step 2: Check VirtualServices
# List all VirtualServices (shows all routes)
kubectl get virtualservices -A
# Check a specific service's VirtualService (substitute the resource name;
# omitting it lists every VirtualService in the namespace)
kubectl get virtualservice <virtualservice-name> -n microservice-identity -o yaml
# Verify the VirtualService host matches the incoming request hostname
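For reference, a minimal VirtualService of the shape this step is checking. The hostname, route prefix, and service names below are illustrative, not the platform's actual values; only the `istio-system/oro-gateway` reference comes from the Gateway checked in Step 1:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: microservice-identity
  namespace: microservice-identity
spec:
  # Must match the Host header of the incoming request (and a host the
  # Gateway resource accepts), or this route is never selected.
  hosts:
    - api.example.com            # illustrative hostname
  gateways:
    - istio-system/oro-gateway
  http:
    - match:
        - uri:
            prefix: /identity    # illustrative path prefix
      route:
        - destination:
            host: microservice-identity.microservice-identity.svc.cluster.local
            port:
              number: 80
```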
Step 3: Check the API Gateway
# Is the gateway pod running?
kubectl get pods -n api-gateway-public
# Check gateway logs for the request
kubectl logs -n api-gateway-public \
-l app=api-gateway-public \
--tail=100 | grep "your-request-path"
Step 4: Check the Microservice
# Is the microservice healthy?
kubectl get pods -n microservice-identity
# Check for recent errors
kubectl logs -n microservice-identity \
-l app=microservice-identity \
--tail=100 | grep -i "error\|exception\|fatal"
# Check events for crash loops or scheduling issues
kubectl get events -n microservice-identity \
--sort-by='.lastTimestamp' | tail -20
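Restart counts are an early signal of crash loops even before a pod flips to CrashLoopBackOff. A sketch that filters `kubectl get pods` output down to pods that have restarted, assuming the default column layout (NAME READY STATUS RESTARTS AGE):

```shell
# Print only pods with a non-zero RESTARTS column.
# "$4 + 0" coerces values like "7 (2m ago)" (newer kubectl) to the number 7.
kubectl get pods -n microservice-identity --no-headers \
  | awk '$4 + 0 > 0 { print $1, $3, "restarts=" $4 }'
```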
Step 5: Check Istio Proxy Status
If the application seems healthy but requests aren't routing:
# Check Envoy proxy config for a pod (istioctl needs the bare pod name,
# so use jsonpath rather than -o name, which prefixes "pod/")
istioctl proxy-config routes \
  "$(kubectl get pod -n microservice-identity -l app=microservice-identity -o jsonpath='{.items[0].metadata.name}').microservice-identity"
# Check listener config
istioctl proxy-config listeners \
  "$(kubectl get pod -n microservice-identity -l app=microservice-identity -o jsonpath='{.items[0].metadata.name}').microservice-identity"
# Check for proxy sync errors
istioctl proxy-status
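`istioctl proxy-status` prints one row per sidecar with a sync state per config type (CDS/LDS/EDS/RDS); anything not SYNCED means the proxy may be routing on stale config. A sketch that filters the output down to the rows worth investigating:

```shell
# Show only proxies with at least one non-SYNCED config type.
# NR > 1 skips the header row; STALE and "NOT SENT" are the problem states.
istioctl proxy-status | awk 'NR > 1 && /STALE|NOT SENT/'
```

An empty result means every sidecar is in sync with istiod, and the problem is more likely in the routing config itself than in config distribution.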
Distributed Tracing
[NEEDS TEAM INPUT: is distributed tracing (Jaeger, Zipkin, or Tempo) deployed? If yes, document how to find a trace ID in logs and open it in the tracing UI. If not, note this as a gap.]
The Istio Telemetry resource is configured per namespace (via telemetry.yaml in the Helm chart). If tracing is enabled, trace IDs appear in log output.
Grafana: Correlation Across Services
In Grafana Explore, you can correlate logs across services using a common request ID or correlation ID:
# Find all log lines with a specific request ID across all namespaces
{namespace=~"microservice-.*|api-gateway-.*"} |= "req-id-12345"
[NEEDS TEAM INPUT: what correlation/trace ID header does the platform use? (e.g., X-Request-ID, X-Trace-ID, X-B3-TraceId). Is it propagated through all services?]
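Until the correlation header is confirmed, pattern-based extraction can still surface candidate request IDs from the logs. A sketch assuming IDs look like `req-id-<digits>`, matching the example query above (adjust the pattern once the real header and format are known):

```shell
# Pull candidate request IDs (hypothetical req-id-<digits> format) out of
# recent gateway logs, counted per ID -- busiest IDs first.
kubectl logs -n api-gateway-public -l app=api-gateway-public --tail=1000 \
  | grep -oE 'req-id-[0-9]+' \
  | sort | uniq -c | sort -rn | head
```

An ID that appears far more often than its siblings is frequently a retry storm, which is a useful lead on its own.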
Checking mTLS Issues
If services can't connect to each other and you see TLS errors:
# Check if mTLS is working between services
kubectl exec -it \
-n api-gateway-public \
$(kubectl get pod -n api-gateway-public -l app=api-gateway-public -o name | head -1) \
-- curl -v http://microservice-identity.microservice-identity.svc.cluster.local/health
# Check PeerAuthentication
kubectl get peerauthentication -A
# Verify cert is issued and valid (istioctl needs the bare pod name,
# so use jsonpath rather than -o name, which prefixes "pod/")
istioctl proxy-config secret \
  "$(kubectl get pod -n microservice-identity -l app=microservice-identity -o jsonpath='{.items[0].metadata.name}').microservice-identity"
Health Check Endpoints
# Check service health directly (from within the cluster via port-forward)
kubectl port-forward -n microservice-identity svc/microservice-identity 8080:80
# In another terminal
curl -v http://localhost:8080/health
[NEEDS TEAM INPUT: confirm the health check path for each service. The Helm chart template uses /health but confirm this is correct for all services.]
Common Failure Signatures
| Symptom | Likely Cause | Where to Look |
|---|---|---|
| 503 from IngressGateway | No healthy pods for the route | Check pods in target namespace |
| 504 from IngressGateway | Request timeout | Check microservice logs for slow queries |
| 401/403 | Auth failure | Check API key secrets, JWT validation |
| Pod CrashLoopBackOff | App crash on startup | kubectl logs --previous |
| Pod Pending | No nodes with capacity | kubectl describe pod for resource constraints |
| RBAC: access denied in ArgoCD | Service account permissions | Check K8s RBAC, ArgoCD project config |