RCA Agent
The RCA (Root Cause Analysis) Agent is an AI-powered component that analyzes logs, metrics, and traces from your OpenChoreo deployments to generate reports with likely root causes of issues. It integrates with Large Language Models (LLMs) to provide intelligent analysis and actionable insights.
Prerequisites
Before enabling the RCA Agent, ensure the following:
- OpenChoreo Observability Plane installed (optionally with the Prometheus Metrics Module for richer analysis).
- An LLM API key from OpenAI (support for other providers is coming soon).
- Alerting configured for your components, with enableAiRootCauseAnalysis enabled.
Enable automatic RCA only for critical alerts to manage LLM costs.
For best compatibility, we recommend using OpenAI models.
Enabling the RCA Agent
Step 1: Store secrets in OpenBao
Store your LLM API key:
kubectl exec -n openbao openbao-0 -- \
env BAO_ADDR=http://127.0.0.1:8200 BAO_TOKEN=root \
bao kv put secret/rca-llm-api-key value="<YOUR_LLM_API_KEY>"
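To confirm the write succeeded without echoing the key to the terminal, one option is to read the value back and report only its length. The sketch below assumes the same pod, address, and root token as the command above; check_llm_key_stored is a hypothetical helper name, not something OpenChoreo ships.

```shell
# Hypothetical helper: confirms secret/rca-llm-api-key holds a non-empty value.
# Assumes the OpenBao pod, address, and root token used in the command above.
check_llm_key_stored() {
  value=$(kubectl exec -n openbao openbao-0 -- \
    env BAO_ADDR=http://127.0.0.1:8200 BAO_TOKEN=root \
    bao kv get -field=value secret/rca-llm-api-key) || return 1
  if [ -n "$value" ]; then
    # Print only the length so the key itself never reaches the terminal.
    echo "secret/rca-llm-api-key is set (${#value} characters)"
  else
    echo "secret/rca-llm-api-key is empty"
    return 1
  fi
}
```

Run check_llm_key_stored after the bao kv put command; a non-zero exit status suggests the secret was not written or could not be read.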
Step 2: Create the ExternalSecret
Create an ExternalSecret to pull all required values into a single Kubernetes Secret. This secret is referenced by rca.secretName and all its keys are injected as environment variables via envFrom.
kubectl apply -f - <<EOF
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: rca-agent-secret
  namespace: openchoreo-observability-plane
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: default
  target:
    name: rca-agent-secret
  data:
    - secretKey: RCA_LLM_API_KEY
      remoteRef:
        key: rca-llm-api-key
        property: value
    - secretKey: OAUTH_CLIENT_SECRET
      remoteRef:
        key: rca-oauth-client-secret
        property: value
EOF
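Because every key in this Secret becomes an environment variable, it can help to check that the sync produced the expected key names before upgrading the chart. The following is a minimal sketch, assuming the resource names from the manifest above; missing_rca_secret_keys is a hypothetical helper, and it lists key names only, never values.

```shell
# Hypothetical helper: prints each expected key that is absent from the
# synced Kubernetes Secret. Prints nothing when both keys are present.
missing_rca_secret_keys() {
  # The go-template emits one key name per line and no secret values.
  present=$(kubectl get secret rca-agent-secret \
    -n openchoreo-observability-plane \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}')
  for key in RCA_LLM_API_KEY OAUTH_CLIENT_SECRET; do
    printf '%s\n' "$present" | grep -qx "$key" || echo "missing: $key"
  done
}
```

If a key is reported missing, inspecting the ExternalSecret with kubectl describe externalsecret rca-agent-secret -n openchoreo-observability-plane usually shows which remote reference failed to resolve.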
Step 3: Upgrade the Observability Plane
Enable the RCA Agent and configure the LLM model. The --reuse-values flag preserves your existing configuration.
helm upgrade --install openchoreo-observability-plane oci://ghcr.io/openchoreo/helm-charts/openchoreo-observability-plane \
--version 1.0.0-rc.1 \
--namespace openchoreo-observability-plane \
--reuse-values \
--set rca.enabled=true \
--set rca.llm.modelName=<model-name>
If the observability plane and control plane are in separate clusters, set rca.controlPlaneUrl to the control plane API URL (defaults to http://api.openchoreo.localhost:8080):
--set rca.controlPlaneUrl=<control-plane-api-url>
Step 4: Register with the control plane
Configure rcaAgentURL in the ClusterObservabilityPlane resource so the control plane knows where to reach the agent:
kubectl patch clusterobservabilityplane default --type=merge -p '{"spec":{"rcaAgentURL":"http://rca-agent.openchoreo.localhost:11080"}}'
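To check that the patch was recorded, the field can be read back with a JSONPath query and compared with the intended URL. This is a sketch that assumes the resource is named default, as in the command above; check_rca_agent_url is a hypothetical helper name.

```shell
# Hypothetical helper: compares spec.rcaAgentURL with the expected value.
# Argument: the URL that was passed in the patch.
check_rca_agent_url() {
  expected="$1"
  actual=$(kubectl get clusterobservabilityplane default \
    -o jsonpath='{.spec.rcaAgentURL}') || return 1
  if [ "$actual" = "$expected" ]; then
    echo "rcaAgentURL is set to $actual"
  else
    echo "rcaAgentURL mismatch: expected '$expected', got '$actual'"
    return 1
  fi
}
```

For the default setup this would be called as check_rca_agent_url http://rca-agent.openchoreo.localhost:11080.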
Step 5: Verify the installation
Check that the RCA Agent pod is running:
kubectl get pods -n openchoreo-observability-plane -l app.kubernetes.io/component=ai-rca-agent
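In scripts, it may be more convenient to block until the pod reports Ready and to print recent logs if it does not. The sketch below reuses the namespace and label selector from the command above; wait_for_rca_agent and the 120s default timeout are illustrative choices, not OpenChoreo defaults.

```shell
# Hypothetical helper: waits for the RCA Agent pod to become Ready.
# Optional argument: timeout in kubectl duration format (default 120s).
wait_for_rca_agent() {
  ns=openchoreo-observability-plane
  selector=app.kubernetes.io/component=ai-rca-agent
  if kubectl wait pod -n "$ns" -l "$selector" \
    --for=condition=Ready --timeout="${1:-120s}"; then
    echo "rca-agent ready"
  else
    # Show the tail of the logs to help diagnose a failed start.
    echo "rca-agent not ready; recent logs:"
    kubectl logs -n "$ns" -l "$selector" --tail=50
    return 1
  fi
}
```

A missing or empty secret key is one plausible cause of a pod that never becomes Ready, so the log output is a reasonable first place to look.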
If you are using the default identity provider (Thunder) and the default SQLite report storage, your setup is complete. The sections below are only needed if you are configuring an external identity provider or PostgreSQL for report storage.
Authentication and Authorization
By default, OpenChoreo configures Thunder as the identity provider for the RCA Agent with a pre-configured OAuth client for testing purposes. If you are using an external identity provider, follow the steps below to configure authentication and authorization.
Authentication
Create an OAuth 2.0 client that supports the client_credentials grant type for service-to-service authentication.
Store your OAuth client secret in OpenBao:
kubectl exec -n openbao openbao-0 -- \
env BAO_ADDR=http://127.0.0.1:8200 BAO_TOKEN=root \
bao kv put secret/rca-oauth-client-secret value="<YOUR_OAUTH_CLIENT_SECRET>"
Then configure the Observability Plane Helm values with your client credentials:
security:
  oidc:
    tokenUrl: "<your-idp-token-url>"
rca:
  secretName: "rca-agent-secret"
  oauth:
    clientId: "<your-client-id>"
See Identity Provider Configuration for detailed setup instructions.
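Before upgrading the chart, it can be useful to confirm that the new client can actually obtain a token with the client_credentials grant. The following is a minimal sketch using curl; request_rca_token is a hypothetical helper, and it assumes the identity provider accepts HTTP Basic client authentication, which not every provider does.

```shell
# Hypothetical helper: requests a token with the client_credentials grant.
# Arguments: token URL, client ID, client secret.
# Assumes HTTP Basic client authentication; some providers expect
# client_id and client_secret in the form body instead.
request_rca_token() {
  curl -sS -X POST "$1" \
    -u "$2:$3" \
    -d "grant_type=client_credentials"
}
```

A successful response is a JSON document containing an access_token field; an error response typically names the reason, such as an unsupported grant type or invalid client credentials.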
Authorization
The RCA Agent uses the client_credentials grant to authenticate with the OpenChoreo API as a service account. The API matches the sub claim in the issued JWT to identify the caller, so the new client must be granted the rca-agent role via a bootstrap authorization mapping.
Add the following to your Control Plane values override, replacing <your-client-id> with the same client ID used above:
openchoreoApi:
  config:
    security:
      authorization:
        bootstrap:
          mappings:
            - name: rca-agent-binding
              roleRef:
                name: rca-agent
              entitlement:
                claim: sub
                value: "<your-client-id>"
              effect: allow
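Since the mapping matches on the sub claim, it is worth confirming what your identity provider actually puts in that claim for client_credentials tokens. The sketch below decodes the payload segment of a JWT for inspection; jwt_payload is a hypothetical helper, it assumes a base64 command that accepts -d (as in GNU coreutils), and it does not verify the token signature.

```shell
# Hypothetical helper: prints the decoded JSON payload of a JWT.
# Argument: the raw token (header.payload.signature).
# For inspection only; the signature is not verified.
jwt_payload() {
  # Take the second dot-separated segment and convert base64url to base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that base64url encoding strips.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}
```

If the sub value in the output differs from the client ID, the value field in the mapping above would need to match what the token actually contains.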
Report Storage
By default, RCA reports are stored in SQLite with a persistent volume, so no external database is required.
For production deployments that need horizontal scaling or shared storage, you can use PostgreSQL instead.
Using PostgreSQL
Store the PostgreSQL connection URI in OpenBao:
kubectl exec -n openbao openbao-0 -- \
env BAO_ADDR=http://127.0.0.1:8200 BAO_TOKEN=root \
bao kv put secret/rca-sql-backend-uri value="postgresql+asyncpg://<USER>:<PASSWORD>@<HOST>:<PORT>/<DBNAME>"
Add the SQL_BACKEND_URI key to the ExternalSecret from Step 2:
kubectl patch externalsecret rca-agent-secret -n openchoreo-observability-plane --type=json \
-p '[{"op":"add","path":"/spec/data/-","value":{"secretKey":"SQL_BACKEND_URI","remoteRef":{"key":"rca-sql-backend-uri","property":"value"}}}]'
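The JSON Patch path /spec/data/- appends to the array, so running the patch a second time would add a duplicate entry. One way to make the step safe to repeat is to check for the key first. This is a sketch under the same resource names as above; add_sql_backend_key is a hypothetical helper.

```shell
# Hypothetical helper: adds SQL_BACKEND_URI to the ExternalSecret only if
# no entry with that secretKey exists yet.
add_sql_backend_key() {
  ns=openchoreo-observability-plane
  # JSONPath prints the existing secretKey names separated by spaces.
  existing=$(kubectl get externalsecret rca-agent-secret -n "$ns" \
    -o jsonpath='{.spec.data[*].secretKey}') || return 1
  case " $existing " in
    *" SQL_BACKEND_URI "*)
      echo "SQL_BACKEND_URI already present; skipping patch"
      ;;
    *)
      kubectl patch externalsecret rca-agent-secret -n "$ns" --type=json \
        -p '[{"op":"add","path":"/spec/data/-","value":{"secretKey":"SQL_BACKEND_URI","remoteRef":{"key":"rca-sql-backend-uri","property":"value"}}}]'
      ;;
  esac
}
```

Calling add_sql_backend_key in place of the raw patch command has the same effect on the first run and does nothing on later runs.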
Then set the report backend in your Helm values:
rca:
  reportBackend: postgresql