# Multi-Cluster Connectivity
When deploying OpenChoreo across multiple Kubernetes clusters (e.g., separate Control Plane and Data Plane clusters), you must explicitly establish trust between them. This guide covers the step-by-step process of exchanging certificates to secure the WebSocket connection between planes.
## Overview
The OpenChoreo Control Plane runs a Cluster Gateway that listens for incoming WebSocket connections from remote planes (Data Plane, Workflow Plane, Observability Plane). This connection is secured using Mutual TLS (mTLS).
- Server Trust: The remote plane must trust the Control Plane's CA certificate to verify the Cluster Gateway's identity.
- Client Authentication: The remote plane's Cluster Agent generates its own client certificate using a local self-signed issuer. This certificate must be registered with the Control Plane to allow the connection.
In multi-cluster deployments, the agent's client certificate is self-signed because the Control Plane's CA private key cannot be shared across clusters. The Control Plane trusts the agent by explicitly registering the agent's certificate in the DataPlane/WorkflowPlane/ObservabilityPlane CRD.
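This trust model can be reproduced locally with `openssl` to build intuition (all file and subject names below are illustrative, not OpenChoreo paths):

```bash
# 1. A self-signed CA, standing in for the agent's local self-signed issuer
openssl req -x509 -newkey rsa:2048 -nodes -keyout agent-ca.key -out agent-ca.crt \
  -subj "/CN=agent-ca" -days 1

# 2. A client certificate issued by that CA, standing in for the agent's client cert
openssl req -newkey rsa:2048 -nodes -keyout agent.key -out agent.csr -subj "/CN=cluster-agent"
openssl x509 -req -in agent.csr -CA agent-ca.crt -CAkey agent-ca.key -CAcreateserial \
  -out agent.crt -days 1

# 3. A verifier can validate the client cert only if it holds the issuing CA --
#    which is exactly what registering clientCA in the plane CRD gives the gateway
openssl verify -CAfile agent-ca.crt agent.crt
# → agent.crt: OK
```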
## Prerequisites
- Control Plane installed in a primary cluster
- Remote Cluster (Data, Build, or Observability) where the remote plane will be installed
- `kubectl` context configured for both clusters
Each remote cluster must have certain prerequisites installed before deploying a plane. The exact set depends on the plane type:
| Component | Data Plane | Workflow Plane | Observability Plane |
|---|---|---|---|
| Gateway API CRDs | Yes | No | Yes |
| cert-manager | Yes | Yes | Yes |
| External Secrets Operator | Yes | Yes | Yes |
| kgateway | Yes | No | Yes |
See the On Your Environment guide for install commands for each prerequisite.
The plane CRD secretStoreRef is resolved on the Control Plane cluster, so remote clusters don't need their own secret backend for that purpose. Individual workloads or modules on remote clusters may still need local secrets depending on your setup. See Secret Management for details.
## Step 0: Configure Cluster Gateway Certificate SANs
Before extracting the CA, ensure the Control Plane's Cluster Gateway certificate includes the hostname or IP that remote agents will connect to. By default, the certificate only includes internal cluster DNS names.
When installing or upgrading the Control Plane, add your public DNS name:
```bash
helm upgrade --install openchoreo-control-plane oci://ghcr.io/openchoreo/helm-charts/openchoreo-control-plane \
  --namespace openchoreo-control-plane \
  # ... other values ...
  --set "clusterGateway.tls.dnsNames[0]=cluster-gateway.openchoreo-control-plane.svc" \
  --set "clusterGateway.tls.dnsNames[1]=cluster-gateway.openchoreo-control-plane.svc.cluster.local" \
  --set "clusterGateway.tls.dnsNames[2]=cluster-gateway.openchoreo.${DOMAIN}"
```
Or in a values file:
```yaml
clusterGateway:
  tls:
    dnsNames:
      - cluster-gateway.openchoreo-control-plane.svc
      - cluster-gateway.openchoreo-control-plane.svc.cluster.local
      - cluster-gateway.openchoreo.example.com # Your public DNS name
```
The `serverUrl` hostname in Step 2 must match one of the DNS SANs (or IP SANs) in the Cluster Gateway certificate. If it does not, TLS verification will fail with a certificate error.
If remote agents connect to the Cluster Gateway by IP address rather than a DNS name (common in local or lab setups), the IP must appear in the certificate's `ipAddresses` field, not `dnsNames`. X.509 validation rejects IP addresses in DNS SANs.
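You can see the distinction locally with `openssl` (the hostname and IP below are placeholders): an IP must be emitted as an `IP:` entry in `subjectAltName`; a `DNS:` entry holding an IP string will not satisfy the verifier.

```bash
# Self-signed cert with one DNS SAN and one IP SAN (requires OpenSSL 1.1.1+)
openssl req -x509 -newkey rsa:2048 -nodes -keyout gw.key -out gw.crt -days 1 \
  -subj "/CN=cluster-gateway" \
  -addext "subjectAltName=DNS:cluster-gateway.openchoreo-control-plane.svc,IP:203.0.113.10"

# Inspect the SANs; the IP shows up as "IP Address:203.0.113.10"
openssl x509 -in gw.crt -noout -ext subjectAltName
```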
The Helm chart currently only exposes `clusterGateway.tls.dnsNames`. To add an IP SAN, patch the Certificate resource after installation:
```bash
kubectl patch certificate cluster-gateway-tls -n openchoreo-control-plane --type merge -p "{
  \"spec\": {
    \"ipAddresses\": [\"${GATEWAY_IP}\"]
  }
}"

# Force reissuance
kubectl delete secret cluster-gateway-tls -n openchoreo-control-plane
kubectl wait --for=condition=Ready certificate/cluster-gateway-tls \
  -n openchoreo-control-plane --timeout=60s

# Restart the gateway to pick up the new certificate
kubectl rollout restart deployment/cluster-gateway -n openchoreo-control-plane
```
### Expose the Cluster Gateway
By default, the Cluster Gateway Service uses `ClusterIP` and is only reachable from within the Control Plane cluster. For remote planes to connect, you need to expose it externally.
**Option A: LoadBalancer service (simplest for dedicated IPs)**

Set the service type to `LoadBalancer` in the Control Plane Helm values:

```yaml
clusterGateway:
  service:
    type: LoadBalancer
```
After upgrading the Helm release, get the external IP:

```bash
kubectl get svc cluster-gateway -n openchoreo-control-plane -w

export GATEWAY_IP=$(kubectl get svc cluster-gateway -n openchoreo-control-plane \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "Cluster Gateway: wss://${GATEWAY_IP}:8443/ws"
```
**Option B: TLS passthrough via kgateway (reuses the existing LoadBalancer)**

Route agent connections through the existing kgateway LoadBalancer using TLS passthrough:

```yaml
clusterGateway:
  tlsRoute:
    enabled: true
    hosts:
      - host: cluster-gateway.openchoreo.example.com
```
This creates a TLSRoute that passes WebSocket connections through to the Cluster Gateway based on the SNI hostname. The hostname must match one of the `clusterGateway.tls.dnsNames` configured above.
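The chart renders the route for you, but for orientation, a Gateway API TLSRoute for passthrough looks roughly like the sketch below. The parent Gateway name, namespace, and backend port here are illustrative assumptions, not the chart's actual rendered values:

```yaml
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: cluster-gateway            # illustrative
  namespace: openchoreo-control-plane
spec:
  parentRefs:
    - name: gateway-external       # assumed parent Gateway name
  hostnames:
    - cluster-gateway.openchoreo.example.com   # must match an SNI/DNS SAN
  rules:
    - backendRefs:
        - name: cluster-gateway
          port: 8443               # assumed backend port
```

Because this is TLS passthrough, kgateway routes on the SNI hostname without terminating TLS, so the mTLS handshake still happens end-to-end between agent and Cluster Gateway.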
See the Control Plane Helm Reference for all clusterGateway.* parameters.
## Step 1: Extract Control Plane CA
The Control Plane generates a Certificate Authority (CA) used to sign the Cluster Gateway's serving certificate. Remote planes need this CA to verify they are connecting to the authentic Control Plane.
Run this command against your Control Plane cluster:
```bash
# Set your Control Plane context and namespace
export CP_CONTEXT="my-control-plane-cluster"
export CP_NAMESPACE="openchoreo-control-plane"

# Extract the CA certificate from the Secret
export CP_CA_CERT=$(kubectl --context $CP_CONTEXT get secret cluster-gateway-ca \
  -n $CP_NAMESPACE -o jsonpath='{.data.ca\.crt}' | base64 -d)

# Verify the output (should start with -----BEGIN CERTIFICATE-----)
echo "$CP_CA_CERT" | head -n 5
```
Extract the CA from the `cluster-gateway-ca` Secret (not the ConfigMap of the same name). The ConfigMap is populated by a one-shot Job (`cluster-gateway-ca-extractor`) that runs at install time. If the Job runs before the CA is ready, the ConfigMap may contain a placeholder instead of the real certificate. The Secret always contains the actual CA.
## Step 2: Install Remote Plane with CA
When installing a remote plane (e.g., Data Plane), create a ConfigMap with the extracted CA certificate and point the Cluster Agent to that ConfigMap. This configures the agent to trust your Control Plane and use a locally-generated client certificate.
**Example: Data Plane Installation**
First, save the extracted CA certificate to a file:
```bash
# Save the CA certificate to a file
echo "$CP_CA_CERT" > ./server-ca.crt
```
Then create the CA ConfigMap and install the Data Plane:
```bash
# Set your Data Plane context
export DP_CONTEXT="my-data-plane-cluster"
export DOMAIN="example.com"

kubectl --context $DP_CONTEXT create namespace openchoreo-data-plane --dry-run=client -o yaml | kubectl --context $DP_CONTEXT apply -f -

kubectl --context $DP_CONTEXT create configmap cluster-gateway-ca \
  --from-file=ca.crt=./server-ca.crt \
  -n openchoreo-data-plane \
  --dry-run=client -o yaml | kubectl --context $DP_CONTEXT apply -f -

helm upgrade --install openchoreo-data-plane oci://ghcr.io/openchoreo/helm-charts/openchoreo-data-plane \
  --version <version> \
  --kube-context $DP_CONTEXT \
  --namespace openchoreo-data-plane \
  --create-namespace \
  --set clusterAgent.enabled=true \
  --set clusterAgent.serverUrl="wss://cluster-gateway.openchoreo.${DOMAIN}/ws" \
  --set clusterAgent.tls.enabled=true \
  --set clusterAgent.tls.generateCerts=true \
  --set clusterAgent.tls.serverCAConfigMap=cluster-gateway-ca \
  --set clusterAgent.tls.caSecretName=""
```
### Key Parameters
| Parameter | Description |
|---|---|
| `clusterAgent.serverUrl` | The public WebSocket URL of your Control Plane's Cluster Gateway (e.g., `wss://cluster-gateway.openchoreo.example.com/ws`) |
| `clusterAgent.tls.generateCerts` | Set to `true` to generate client certificates locally instead of copying them from the Control Plane |
| `clusterAgent.tls.serverCAConfigMap` | Name of the ConfigMap containing the Control Plane CA certificate (`ca.crt`) used to verify the Cluster Gateway's identity |
| `clusterAgent.tls.caSecretName` | Set to empty (`""`) to use a self-signed issuer for generating the agent's client certificate |
In multi-cluster deployments, the remote plane cannot access the Control Plane's CA private key (only the public certificate is available). Therefore, the agent generates its own client certificate using a local self-signed issuer. This certificate is then registered with the Control Plane in Step 4, establishing mutual trust.
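If you prefer a values file over `--set` flags, the same agent settings translate to (domain is a placeholder):

```yaml
clusterAgent:
  enabled: true
  serverUrl: wss://cluster-gateway.openchoreo.example.com/ws
  tls:
    enabled: true
    generateCerts: true
    serverCAConfigMap: cluster-gateway-ca
    caSecretName: ""
```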
## Step 3: Extract Agent Client CA
After the remote plane is installed, cert-manager generates a self-signed CA and uses it to issue a client certificate for the Cluster Agent. You need to extract the CA certificate (not the leaf certificate) to register the plane with the Control Plane.
Run this command against your Remote Cluster (Data/Build/Observability):
```bash
# Set your Remote Plane context and namespace
export REMOTE_CONTEXT="my-data-plane-cluster"
export REMOTE_NAMESPACE="openchoreo-data-plane"

# Wait for the certificate to be ready
kubectl --context $REMOTE_CONTEXT wait --for=condition=Ready \
  certificate/cluster-agent-dataplane-tls -n $REMOTE_NAMESPACE --timeout=120s

# Extract the agent's CA certificate (used by the gateway to verify this agent)
export AGENT_CA=$(kubectl --context $REMOTE_CONTEXT get secret cluster-agent-tls \
  -n $REMOTE_NAMESPACE -o jsonpath='{.data.ca\.crt}' | base64 -d)

# Verify the output (should start with -----BEGIN CERTIFICATE-----)
echo "$AGENT_CA" | head -n 5
```
Extract `ca.crt` from the `cluster-agent-tls` Secret, not `tls.crt`. The `ca.crt` field contains the self-signed CA that issued the agent's client certificate, and the Cluster Gateway uses this CA to verify the agent's identity. Using `tls.crt` (the leaf certificate) will cause `websocket: bad handshake` errors because the gateway cannot build a trust chain.
If the certificate is not ready, check the cert-manager logs and the Certificate resource status:
```bash
kubectl --context $REMOTE_CONTEXT describe certificate cluster-agent-dataplane-tls -n $REMOTE_NAMESPACE
```
## Step 4: Register Plane in Control Plane
Finally, register the remote plane by creating the appropriate CRD in the Control Plane cluster. You must embed the `AGENT_CA` extracted in Step 3.
OpenChoreo supports both namespace-scoped (DataPlane, WorkflowPlane, ObservabilityPlane) and cluster-scoped (ClusterDataPlane, ClusterWorkflowPlane, ClusterObservabilityPlane) variants. Cluster-scoped resources are visible to all namespaces and are simpler for single-tenant setups. The On Your Environment guide uses the cluster-scoped variants. Use namespace-scoped variants when you need per-namespace isolation (e.g., different teams using different data planes).
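The structural difference between the two variants is only the metadata scope; a sketch (field values illustrative):

```yaml
# Namespace-scoped: visible only within its namespace
apiVersion: openchoreo.dev/v1alpha1
kind: DataPlane
metadata:
  name: production-us-east
  namespace: team-a        # scoping namespace
---
# Cluster-scoped: no namespace field; visible to all namespaces
apiVersion: openchoreo.dev/v1alpha1
kind: ClusterDataPlane
metadata:
  name: production-us-east
```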
**Example: Registering a Data Plane**
```bash
# Create the DataPlane resource in the Control Plane
cat <<EOF | kubectl --context $CP_CONTEXT apply -f -
apiVersion: openchoreo.dev/v1alpha1
kind: DataPlane
metadata:
  name: production-us-east
  namespace: default # Or your organization's namespace
spec:
  planeID: production-us-east
  clusterAgent:
    clientCA:
      value: |
$(echo "$AGENT_CA" | sed 's/^/        /')
  gateway:
    ingress:
      external:
        name: default-gateway
        namespace: openchoreo-system
        https:
          host: "apps.openchoreo.${DOMAIN}"
          port: 443
  secretStoreRef:
    name: default
EOF
```
The `gateway.ingress.external.https.port` must match the port your Data Plane gateway is externally reachable on. If your LoadBalancer maps to standard ports (e.g., 443), set it explicitly as shown above. The Control Plane uses the host and port to generate correct URLs for deployed workloads.
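A note on the heredoc trick: the `sed` substitution prefixes every line of the multi-line CA with spaces so it nests under `value: |` as a YAML block scalar (block content must be indented deeper than the `value:` key). A minimal local illustration, using a stand-in for the certificate:

```bash
# Stand-in for a multi-line PEM certificate (contents illustrative)
CERT=$'-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----'

# Prefix each line with eight spaces, matching the indentation under "value: |"
echo "$CERT" | sed 's/^/        /'
```

If the indentation is missing or too shallow, `kubectl apply` rejects the manifest with a YAML parse error.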
## Verification
Once registered, the Control Plane will accept connections from the agent. You can verify the connection by checking the agent logs:
```bash
kubectl --context $REMOTE_CONTEXT logs -n $REMOTE_NAMESPACE -l app=cluster-agent --tail=20
```
You should see a message indicating a successful connection, e.g. `connected to control plane`.
## Other Plane Types
The same process applies to Workflow Plane and Observability Plane. The key differences are the namespace and plane-specific configuration.
### Workflow Plane
```bash
kubectl --context $REMOTE_CONTEXT create namespace openchoreo-workflow-plane --dry-run=client -o yaml | kubectl --context $REMOTE_CONTEXT apply -f -

kubectl --context $REMOTE_CONTEXT create configmap cluster-gateway-ca \
  --from-file=ca.crt=./server-ca.crt \
  -n openchoreo-workflow-plane \
  --dry-run=client -o yaml | kubectl --context $REMOTE_CONTEXT apply -f -

helm upgrade --install openchoreo-workflow-plane oci://ghcr.io/openchoreo/helm-charts/openchoreo-workflow-plane \
  --version <version> \
  --kube-context $REMOTE_CONTEXT \
  --namespace openchoreo-workflow-plane \
  --create-namespace \
  --set clusterAgent.enabled=true \
  --set clusterAgent.serverUrl="wss://cluster-gateway.openchoreo.${DOMAIN}/ws" \
  --set clusterAgent.tls.enabled=true \
  --set clusterAgent.tls.generateCerts=true \
  --set clusterAgent.tls.serverCAConfigMap=cluster-gateway-ca \
  --set clusterAgent.tls.caSecretName=""
```
### Observability Plane
```bash
kubectl --context $REMOTE_CONTEXT create namespace openchoreo-observability-plane --dry-run=client -o yaml | kubectl --context $REMOTE_CONTEXT apply -f -

kubectl --context $REMOTE_CONTEXT create configmap cluster-gateway-ca \
  --from-file=ca.crt=./server-ca.crt \
  -n openchoreo-observability-plane \
  --dry-run=client -o yaml | kubectl --context $REMOTE_CONTEXT apply -f -

helm upgrade --install openchoreo-observability-plane oci://ghcr.io/openchoreo/helm-charts/openchoreo-observability-plane \
  --version <version> \
  --kube-context $REMOTE_CONTEXT \
  --namespace openchoreo-observability-plane \
  --create-namespace \
  --set clusterAgent.enabled=true \
  --set clusterAgent.serverUrl="wss://cluster-gateway.openchoreo.${DOMAIN}/ws" \
  --set clusterAgent.tls.enabled=true \
  --set clusterAgent.tls.generateCerts=true \
  --set clusterAgent.tls.serverCAConfigMap=cluster-gateway-ca \
  --set clusterAgent.tls.caSecretName=""
```
## Step 5: Register Planes
After installing each remote plane (Steps 2-3), register it with the Control Plane by creating the corresponding CRD. See Step 4 for the DataPlane example.
### Register a Workflow Plane
```bash
AGENT_CA=$(kubectl --context $REMOTE_CONTEXT get secret cluster-agent-tls \
  -n openchoreo-workflow-plane -o jsonpath='{.data.ca\.crt}' | base64 -d)

cat <<EOF | kubectl --context $CP_CONTEXT apply -f -
apiVersion: openchoreo.dev/v1alpha1
kind: WorkflowPlane
metadata:
  name: default
  namespace: default
spec:
  planeID: default
  clusterAgent:
    clientCA:
      value: |
$(echo "$AGENT_CA" | sed 's/^/        /')
  secretStoreRef:
    name: default
EOF
```
### Register an Observability Plane
The `observerURL` must be reachable from the Control Plane cluster. In multi-cluster setups it cannot be an in-cluster `svc.cluster.local` address; use the Observability Plane's external gateway URL or another routable DNS name instead.
```bash
AGENT_CA=$(kubectl --context $REMOTE_CONTEXT get secret cluster-agent-tls \
  -n openchoreo-observability-plane -o jsonpath='{.data.ca\.crt}' | base64 -d)

cat <<EOF | kubectl --context $CP_CONTEXT apply -f -
apiVersion: openchoreo.dev/v1alpha1
kind: ClusterObservabilityPlane
metadata:
  name: default
spec:
  planeID: default
  clusterAgent:
    clientCA:
      value: |
$(echo "$AGENT_CA" | sed 's/^/        /')
  observerURL: https://observer.openchoreo.${DOMAIN}
EOF
```
### Link Planes to the Observability Plane
After registering the Observability Plane, tell the Data Plane and Workflow Plane where to send telemetry. Adjust the resource kind to match how you registered the planes (namespace-scoped `dataplane` or cluster-scoped `clusterdataplane`):
```bash
# If using namespace-scoped DataPlane:
kubectl --context $CP_CONTEXT patch dataplane default -n default --type merge \
  -p '{"spec":{"observabilityPlaneRef":{"kind":"ClusterObservabilityPlane","name":"default"}}}'

# If using cluster-scoped ClusterDataPlane:
kubectl --context $CP_CONTEXT patch clusterdataplane default --type merge \
  -p '{"spec":{"observabilityPlaneRef":{"kind":"ClusterObservabilityPlane","name":"default"}}}'

# Same for the workflow plane if installed:
kubectl --context $CP_CONTEXT patch workflowplane default -n default --type merge \
  -p '{"spec":{"observabilityPlaneRef":{"kind":"ClusterObservabilityPlane","name":"default"}}}'
```
## Cross-Cluster Telemetry
In multi-cluster deployments, telemetry collectors on the Observability Plane can't scrape pods on remote clusters directly. You need to install observability modules on each remote Data Plane cluster and configure them to push telemetry (logs, metrics, traces) to the Observability Plane's ingestion endpoints.
The specific setup depends on which observability modules you choose. Each module's documentation covers cross-cluster configuration. See Community Modules for available modules and their installation guides.
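As one concrete illustration of the push pattern, a module based on the OpenTelemetry Collector would ship logs with a pipeline of roughly this shape. This is a sketch only: the ingestion endpoint, receiver choice, and any auth are assumptions that depend entirely on the modules you install.

```yaml
# Illustrative OpenTelemetry Collector config on a remote Data Plane cluster.
receivers:
  filelog:
    include: [/var/log/pods/*/*/*.log]   # container logs on the node
exporters:
  otlphttp:
    endpoint: https://otel-ingest.openchoreo.example.com   # placeholder ingestion URL
service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [otlphttp]
```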
## Troubleshooting
### Certificate Not Ready
If the certificate fails to become ready:
```bash
# Check certificate status
kubectl describe certificate cluster-agent-dataplane-tls -n openchoreo-data-plane

# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager
```
Common issues:
- **Issuer not found**: Ensure the Helm chart completed successfully and the self-signed issuer was created
- **Permission denied**: Check that cert-manager has permission to create Secrets in the namespace
### Agent Cannot Connect
If the agent fails to connect to the Control Plane:
```bash
# Check agent logs for connection errors
kubectl logs -n openchoreo-data-plane -l app=cluster-agent

# Verify the server CA ConfigMap exists
kubectl get configmap cluster-gateway-ca -n openchoreo-data-plane
```
Common issues:
- **Certificate verification failed**: Ensure ConfigMap `cluster-gateway-ca` contains the correct `ca.crt` from the Control Plane and `clusterAgent.tls.serverCAConfigMap` points to it
- **Connection refused**: Verify the `serverUrl` is accessible from the remote cluster and the Cluster Gateway ingress is properly configured
### TLS Certificate Error (x509: certificate is valid for X, not Y)
If you see an error like `x509: certificate is valid for cluster-gateway.openchoreo-control-plane.svc, not cluster-gateway.openchoreo.example.com`:
The Cluster Gateway's server certificate doesn't include your public DNS name. Update the Control Plane with the correct DNS names (see Step 0):
```bash
# Check current certificate DNS names and IP SANs
kubectl get certificate cluster-gateway-tls -n openchoreo-control-plane \
  -o jsonpath='{.spec.dnsNames}{"\n"}{.spec.ipAddresses}'

# After updating the Helm values, the certificate will be re-issued
# You may need to delete the old secret to force regeneration
kubectl delete secret cluster-gateway-tls -n openchoreo-control-plane
kubectl wait --for=condition=Ready certificate/cluster-gateway-tls \
  -n openchoreo-control-plane --timeout=60s
```
After updating, restart the cluster gateway and re-extract the CA certificate (Step 1) if the CA was regenerated.
### TLS Certificate Error (x509: cannot validate certificate for IP because it doesn't contain any IP SANs)
This error means the agent is connecting to the Cluster Gateway by IP address, but the certificate only has DNS SANs. X.509 requires IP addresses to appear in the certificate's `ipAddresses` field, not in `dnsNames`.
See the "Connecting by IP address" tip in Step 0 for the fix.
### Agent Certificate Not Trusted (websocket: bad handshake)
If the agent logs show `websocket: bad handshake`, the Cluster Gateway cannot verify the agent's client certificate. This usually means the wrong certificate was registered in the plane CRD.
```bash
# Check cluster gateway logs for TLS errors
kubectl logs -n openchoreo-control-plane deployment/cluster-gateway

# Check Control Plane controller-manager logs
kubectl logs -n openchoreo-control-plane deployment/controller-manager
```
Common causes:
- The `clientCA.value` in the DataPlane/WorkflowPlane/ObservabilityPlane CRD contains `tls.crt` (the leaf cert) instead of `ca.crt` (the CA that signed the leaf). Re-extract using `ca.crt` as shown in Step 3.
- The `clientCA.value` contains the Cluster Gateway's server CA instead of the agent's self-signed client CA. These are different CAs when `clusterAgent.tls.generateCerts=true` and `clusterAgent.tls.caSecretName=""`.
- After updating the plane CRD, restart the Cluster Gateway:

```bash
kubectl rollout restart deployment/cluster-gateway -n openchoreo-control-plane
```
## Related References
- Control Plane Helm Reference (clusterGateway, gateway, TLS configuration)
- Data Plane Helm Reference (clusterAgent, gateway ports, telemetry)
- Workflow Plane Helm Reference (clusterAgent, workflow configuration)
- Observability Plane Helm Reference (observer, gateway, telemetry ingestion)
- Deployment Topology (single-cluster vs multi-cluster architecture)
- Secret Management (configuring secret stores across clusters)