
Multi-Cluster Connectivity

When deploying OpenChoreo across multiple Kubernetes clusters (e.g., separate Control Plane and Data Plane clusters), you must explicitly establish trust between them. This guide covers the step-by-step process of exchanging certificates to secure the WebSocket connection between planes.

Overview​

The OpenChoreo Control Plane runs a Cluster Gateway that listens for incoming WebSocket connections from remote planes (Data Plane, Build Plane, Observability Plane). This connection is secured using Mutual TLS (mTLS).

  1. Server Trust: The remote plane must trust the Control Plane's CA certificate to verify the Cluster Gateway's identity.
  2. Client Authentication: The remote plane's Cluster Agent generates its own client certificate using a local self-signed issuer. This certificate must be registered with the Control Plane to allow the connection.
note

In multi-cluster deployments, the agent's client certificate is self-signed because the Control Plane's CA private key cannot be shared across clusters. The Control Plane trusts the agent by explicitly registering the agent's certificate in the DataPlane/BuildPlane/ObservabilityPlane CRD.
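The two trust directions can be sketched locally with openssl. The file names and subjects below are illustrative only, not artifacts from a real deployment:

```shell
cd "$(mktemp -d)"

# 1. The Control Plane's CA signs the gateway's serving certificate
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key \
  -subj "/CN=demo-control-plane-ca" -days 1 -out ca.crt 2>/dev/null
openssl req -newkey rsa:2048 -nodes -keyout gw.key \
  -subj "/CN=cluster-gateway" -out gw.csr 2>/dev/null
openssl x509 -req -in gw.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -days 1 -out gw.crt 2>/dev/null

# Server trust: the agent can verify the gateway certificate against the CA
openssl verify -CAfile ca.crt gw.crt

# 2. The agent's certificate is self-signed, so it does NOT chain to that CA;
#    the Control Plane trusts it only because it is registered in the plane CRD
openssl req -x509 -newkey rsa:2048 -nodes -keyout agent.key \
  -subj "/CN=cluster-agent" -days 1 -out agent.crt 2>/dev/null
openssl verify -CAfile ca.crt agent.crt 2>/dev/null || echo "not trusted until registered"
```

The first verify succeeds because the gateway certificate chains to the CA; the second fails, which is exactly why Step 4 registers the agent certificate explicitly.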

Prerequisites​

  • Control Plane installed in a primary cluster
  • Remote Cluster (Data, Build, or Observability) where the remote plane will be installed
  • kubectl context configured for both clusters
  • cert-manager installed on both clusters (Control Plane for gateway certificate, remote cluster for agent certificate)

Step 0: Configure Cluster Gateway DNS Names​

Before extracting the CA, ensure the Control Plane's Cluster Gateway certificate includes your public DNS name. By default, the certificate only includes internal cluster DNS names.

When installing or upgrading the Control Plane, add your public DNS name:

# ... include your other installation values alongside these settings ...
helm upgrade --install openchoreo-control-plane oci://ghcr.io/openchoreo/helm-charts/openchoreo-control-plane \
  --namespace openchoreo-control-plane \
  --set "clusterGateway.tls.dnsNames[0]=cluster-gateway.openchoreo-control-plane.svc" \
  --set "clusterGateway.tls.dnsNames[1]=cluster-gateway.openchoreo-control-plane.svc.cluster.local" \
  --set "clusterGateway.tls.dnsNames[2]=cluster-gateway.openchoreo.${DOMAIN}"

Or in a values file:

clusterGateway:
  tls:
    dnsNames:
      - cluster-gateway.openchoreo-control-plane.svc
      - cluster-gateway.openchoreo-control-plane.svc.cluster.local
      - cluster-gateway.openchoreo.example.com # Your public DNS name
warning

The serverUrl hostname in Step 2 must match one of the DNS names in the Cluster Gateway certificate. If not, TLS verification will fail with a certificate error.
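To see which DNS names a certificate actually covers, inspect its Subject Alternative Name extension with openssl. The certificate below is a generated stand-in so the check can be run anywhere; against a real deployment you would inspect the gateway's actual serving certificate instead:

```shell
# Hypothetical stand-in: generate a cert carrying two SANs
SAN_CERT=$(mktemp)
openssl req -x509 -newkey rsa:2048 -nodes -keyout /dev/null \
  -subj "/CN=cluster-gateway" \
  -addext "subjectAltName=DNS:cluster-gateway.openchoreo-control-plane.svc,DNS:cluster-gateway.openchoreo.example.com" \
  -days 1 -out "$SAN_CERT" 2>/dev/null

# List the DNS names the certificate is valid for;
# the serverUrl hostname must appear in this list
openssl x509 -noout -ext subjectAltName -in "$SAN_CERT"
```

Note that `-addext` and `-ext` require OpenSSL 1.1.1 or newer.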

Step 1: Extract Control Plane CA​

The Control Plane generates a Certificate Authority (CA) used to sign the Cluster Gateway's serving certificate. Remote planes need this CA to verify they are connecting to the authentic Control Plane.

Run this command against your Control Plane cluster:

# Set your Control Plane context and namespace
export CP_CONTEXT="my-control-plane-cluster"
export CP_NAMESPACE="openchoreo-control-plane"

# Extract the CA certificate from the ConfigMap
export CP_CA_CERT=$(kubectl --context $CP_CONTEXT get configmap cluster-gateway-ca \
  -n $CP_NAMESPACE -o jsonpath='{.data.ca\.crt}')

# Verify the output (should start with -----BEGIN CERTIFICATE-----)
echo "$CP_CA_CERT" | head -n 5
warning

Ensure you extract the CA from the cluster-gateway-ca ConfigMap, not the TLS Secret. The ConfigMap contains the public CA certificate in plain text, which is what the Helm chart expects.
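As an optional sanity check, you can confirm the extracted value parses as an X.509 certificate. The stand-in assignment below only runs when CP_CA_CERT is unset, so the snippet also works outside a real cluster:

```shell
# Stand-in for the real extraction: only used when CP_CA_CERT is not already set
CP_CA_CERT=${CP_CA_CERT:-$(openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /dev/null -subj "/CN=demo-ca" -days 1 2>/dev/null)}

# Parses as X.509 and prints subject and expiry; a garbled or truncated
# PEM fails here rather than at connection time
echo "$CP_CA_CERT" | openssl x509 -noout -subject -enddate
```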

Step 2: Install Remote Plane with CA​

When installing a remote plane (e.g., Data Plane), pass the extracted CA certificate to the Helm chart. This configures the Cluster Agent to trust your Control Plane and use a locally-generated client certificate.

Example: Data Plane Installation

First, save the extracted CA certificate to a file:

# Save the CA certificate to a file
echo "$CP_CA_CERT" > ./server-ca.crt

Then install the Data Plane:

# Set your Data Plane context
export DP_CONTEXT="my-data-plane-cluster"
export DOMAIN="example.com"

helm upgrade --install openchoreo-data-plane oci://ghcr.io/openchoreo/helm-charts/openchoreo-data-plane \
  --version <version> \
  --kube-context $DP_CONTEXT \
  --namespace openchoreo-data-plane \
  --create-namespace \
  --set clusterAgent.enabled=true \
  --set clusterAgent.serverUrl="wss://cluster-gateway.openchoreo.${DOMAIN}/ws" \
  --set clusterAgent.tls.enabled=true \
  --set clusterAgent.tls.generateCerts=true \
  --set-file clusterAgent.tls.serverCAValue=./server-ca.crt \
  --set clusterAgent.tls.caSecretName="" \
  --set clusterAgent.tls.caSecretNamespace=""

Key Parameters​

  • clusterAgent.serverUrl: The public WebSocket URL of your Control Plane's Cluster Gateway (e.g., wss://cluster-gateway.openchoreo.example.com/ws)
  • clusterAgent.tls.generateCerts: Set to true to generate client certificates locally instead of copying them from the Control Plane
  • clusterAgent.tls.serverCAValue: The CA certificate file from Step 1, used to verify the Cluster Gateway's identity. Use --set-file for proper PEM formatting.
  • clusterAgent.tls.caSecretName: Set to empty ("") to use a self-signed issuer for generating the agent's client certificate
  • clusterAgent.tls.caSecretNamespace: Set to empty ("") along with caSecretName
Why self-signed client certificates?

In multi-cluster deployments, the remote plane cannot access the Control Plane's CA private key (only the public certificate is available). Therefore, the agent generates its own client certificate using a local self-signed issuer. This certificate is then registered with the Control Plane in Step 4, establishing mutual trust.

Step 3: Extract Agent Client Certificate​

After the remote plane is installed, its cert-manager will generate a client certificate for the Cluster Agent. You need to extract this certificate to register the plane.

Run this command against your Remote Cluster (Data/Build/Observability):

# Set your Remote Plane context and namespace
export REMOTE_CONTEXT="my-data-plane-cluster"
export REMOTE_NAMESPACE="openchoreo-data-plane"

# Wait for the certificate to be ready
kubectl --context $REMOTE_CONTEXT wait --for=condition=Ready \
  certificate/cluster-agent-dataplane-tls -n $REMOTE_NAMESPACE --timeout=120s

# Extract the agent's client certificate
export AGENT_CERT=$(kubectl --context $REMOTE_CONTEXT get secret cluster-agent-tls \
  -n $REMOTE_NAMESPACE -o jsonpath='{.data.tls\.crt}' | base64 -d)

# Verify the output (should start with -----BEGIN CERTIFICATE-----)
echo "$AGENT_CERT" | head -n 5
tip

If the certificate is not ready, check the cert-manager logs and the Certificate resource status:

kubectl --context $REMOTE_CONTEXT describe certificate cluster-agent-dataplane-tls -n $REMOTE_NAMESPACE

Step 4: Register Plane in Control Plane​

Finally, register the remote plane by creating the appropriate CRD (DataPlane, BuildPlane, or ObservabilityPlane) in the Control Plane cluster. You must embed the AGENT_CERT extracted in Step 3.

Example: Registering a Data Plane

# Create the DataPlane resource in the Control Plane
cat <<EOF | kubectl --context $CP_CONTEXT apply -f -
apiVersion: openchoreo.dev/v1alpha1
kind: DataPlane
metadata:
  name: production-us-east
  namespace: default # Or your organization's namespace
spec:
  agent:
    enabled: true
    clientCA:
      value: |
$(echo "$AGENT_CERT" | sed 's/^/        /')
  gateway:
    publicVirtualHost: "apps.openchoreo.${DOMAIN}"
    organizationVirtualHost: "openchoreoapis.internal"
  secretStoreRef:
    name: default
EOF
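The sed call in the manifest above exists purely to indent the PEM: a YAML block scalar requires every line to be indented past the `value:` key, and the width must match the nesting depth in your manifest. A minimal sketch with placeholder content:

```shell
# Placeholder PEM content; a real certificate has the same line structure
AGENT_CERT='-----BEGIN CERTIFICATE-----
MIIBdemoONLY
-----END CERTIFICATE-----'

# Push every line right so it nests inside the YAML block scalar under `value: |`
echo "$AGENT_CERT" | sed 's/^/        /'
```

If the indentation is too shallow, kubectl rejects the manifest with a YAML parse error rather than a certificate error.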

Verification​

Once registered, the Control Plane will accept connections from the agent. You can verify the connection by checking the agent logs:

kubectl --context $REMOTE_CONTEXT logs -n $REMOTE_NAMESPACE -l app.kubernetes.io/name=cluster-agent --tail=20

You should see a message indicating successful connection: Connected to control plane at wss://...
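Since the agent may take a few seconds to reconnect after registration, a small polling helper can make this check scriptable. The helper below is a hypothetical convenience, not part of OpenChoreo; the commented usage line assumes the context, namespace, label, and log message shown above:

```shell
# Hypothetical helper: retry a check until it passes or the attempts run out
wait_for() {
  local attempts=$1; shift
  local i
  for i in $(seq "$attempts"); do
    if "$@"; then return 0; fi
    sleep 1
  done
  return 1
}

# Example usage against the agent logs:
# wait_for 30 sh -c \
#   'kubectl --context "$REMOTE_CONTEXT" logs -n "$REMOTE_NAMESPACE" \
#      -l app.kubernetes.io/name=cluster-agent --tail=20 \
#      | grep -q "Connected to control plane"'

# Self-contained demonstration
wait_for 3 true && echo "connected"
```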

Other Plane Types​

The same process applies to Build Plane and Observability Plane. The key differences are the namespace and plane-specific configuration.

Build Plane​

helm upgrade --install openchoreo-build-plane oci://ghcr.io/openchoreo/helm-charts/openchoreo-build-plane \
  --version <version> \
  --kube-context $REMOTE_CONTEXT \
  --namespace openchoreo-build-plane \
  --create-namespace \
  --set clusterAgent.enabled=true \
  --set clusterAgent.serverUrl="wss://cluster-gateway.openchoreo.${DOMAIN}/ws" \
  --set clusterAgent.tls.enabled=true \
  --set clusterAgent.tls.generateCerts=true \
  --set-file clusterAgent.tls.serverCAValue=./server-ca.crt \
  --set clusterAgent.tls.caSecretName="" \
  --set clusterAgent.tls.caSecretNamespace=""

Observability Plane​

helm upgrade --install openchoreo-observability-plane oci://ghcr.io/openchoreo/helm-charts/openchoreo-observability-plane \
  --version <version> \
  --kube-context $REMOTE_CONTEXT \
  --namespace openchoreo-observability-plane \
  --create-namespace \
  --set clusterAgent.enabled=true \
  --set clusterAgent.serverUrl="wss://cluster-gateway.openchoreo.${DOMAIN}/ws" \
  --set clusterAgent.tls.enabled=true \
  --set clusterAgent.tls.generateCerts=true \
  --set-file clusterAgent.tls.serverCAValue=./server-ca.crt \
  --set clusterAgent.tls.caSecretName="" \
  --set clusterAgent.tls.caSecretNamespace=""

Troubleshooting​

Certificate Not Ready​

If the certificate fails to become ready:

# Check certificate status
kubectl describe certificate cluster-agent-dataplane-tls -n openchoreo-data-plane

# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager

Common issues:

  • Issuer not found: Ensure the Helm chart completed successfully and the self-signed issuer was created
  • Permission denied: Check that cert-manager has permissions to create secrets in the namespace

Agent Cannot Connect​

If the agent fails to connect to the Control Plane:

# Check agent logs for connection errors
kubectl logs -n openchoreo-data-plane -l app=cluster-agent

# Verify the server CA ConfigMap exists
kubectl get configmap cluster-gateway-ca -n openchoreo-data-plane

Common issues:

  • Certificate verification failed: Ensure serverCAValue contains the correct CA certificate from the Control Plane
  • Connection refused: Verify the serverUrl is accessible from the remote cluster and the Cluster Gateway ingress is properly configured

TLS Certificate Error (x509: certificate is valid for X, not Y)​

If you see an error like x509: certificate is valid for cluster-gateway.openchoreo-control-plane.svc, not cluster-gateway.openchoreo.example.com:

The Cluster Gateway's server certificate doesn't include your public DNS name. Update the Control Plane with the correct DNS names (see Step 0):

# Check current certificate DNS names
kubectl get certificate cluster-gateway-tls -n openchoreo-control-plane -o jsonpath='{.spec.dnsNames}'

# After updating the Helm values, the certificate will be re-issued
# You may need to delete the old certificate to force regeneration
kubectl delete certificate cluster-gateway-tls -n openchoreo-control-plane

After updating, re-extract the CA certificate (Step 1) as the certificate may have been regenerated.

Agent Certificate Not Trusted​

If the Control Plane rejects the agent's connection:

# Check Control Plane controller-manager logs
kubectl logs -n openchoreo-control-plane deployment/controller-manager

Ensure the DataPlane/BuildPlane/ObservabilityPlane CRD has the correct agent certificate in spec.agent.clientCA.value.
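A mismatch between the certificate in the remote cluster's Secret and the one registered in the CRD is a common cause. One way to check is to save both PEMs to files and compare SHA-256 fingerprints. The file names below are hypothetical, and a stand-in certificate is generated so the snippet runs anywhere:

```shell
# Hypothetical layout: save the cert from the cluster-agent-tls Secret and
# from spec.agent.clientCA.value into these two files, then compare
workdir=$(mktemp -d)
fingerprint() { openssl x509 -noout -fingerprint -sha256 -in "$1"; }

# Stand-in: the same generated cert in both files, so the comparison passes here
openssl req -x509 -newkey rsa:2048 -nodes -keyout /dev/null \
  -subj "/CN=cluster-agent" -days 1 -out "$workdir/secret.crt" 2>/dev/null
cp "$workdir/secret.crt" "$workdir/crd.crt"

if [ "$(fingerprint "$workdir/secret.crt")" = "$(fingerprint "$workdir/crd.crt")" ]; then
  echo "certificates match"
else
  echo "mismatch: re-run Step 3 and update the plane CRD"
fi
```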