Version: v1.0.x

OpenChoreo Architecture

Overview

OpenChoreo is architected as a modular, multi-plane Kubernetes-native system that integrates deeply with other open-source projects to provide a complete, extensible Internal Developer Platform (IDP). It uses a domain-and-abstraction-driven, API-first approach as its core design philosophy, which sets it apart from platforms that are primarily built from disparate tools stitched together with scripts.

OpenChoreo is designed to provide platform builders a strong foundation to stand up their IDP with minimal effort, while also offering the flexibility to customize and extend every aspect of the platform as needed. It achieves this through a clear separation of concerns across multiple planes, each responsible for specific aspects of the platform's functionality. It also uses a modular framework that allows external tools to be integrated as first-class experiences in the platform, rather than just being bolted on.

The Control Plane acts as the central orchestrator, transforming and reconciling the desired state of the platform and developer resources with the other planes, as declared in its Developer API and Platform API.

The Experience Plane provides a uniform, access-controlled interface for both platform and development teams via the CLI (called occ) and a Backstage-based Internal Developer Portal (UI). Both the control plane and the observability plane expose their own OpenAPI-v3-based API and MCP server, which together power these experience surfaces.

The declarative nature of OpenChoreo's APIs allows it to be operated imperatively via the UI, CLI, and MCPs, or declaratively with Git as the source of truth for both platform and application state, if desired (native GitOps).

In summary,

  • Data Plane(s) provide isolated, observable runtime environments and a gateway topology for running developer API resources such as projects and components. Optional modules can be installed to provide additional capabilities such as API management, scale-to-zero and zero-trust networking.
  • Workflow Plane(s) execute workflows such as CI workflows for building and testing components and GitOps workflows for declaratively managing platform and application state. They also run other generic workflows defined by platform teams, such as resource provisioning tasks.
  • Observability Plane(s) collect and aggregate distributed container logs, metrics, and OpenTelemetry-based tracing data across workflow and data planes, providing rich, domain-centric querying and alerting capabilities.

Figure 1 (below) illustrates the components and interactions of the platform at a high level.

Each plane in OpenChoreo operates as a distinct functional unit, with its own deployment and upgrade lifecycle, scaling behavior, and security boundaries. The control plane and the data planes together form the core of the platform, while the workflow and observability planes remain optional but highly recommended for a complete IDP experience.


Control Plane

The control plane is a Kubernetes cluster that acts as the brain of OpenChoreo. It runs a central control loop that continuously monitors the state of the platform and developer resources. It takes actions to ensure that the desired state (as declared via the Developer and Platform APIs) is reflected in the actual state across all planes.
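The control loop described above can be illustrated with a minimal sketch. It is not OpenChoreo's controller code: the `Action` type, the `reconcile` and `apply` functions, and the in-memory dictionaries standing in for desired and actual state are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One corrective step produced by a reconcile pass."""
    verb: str   # "create", "update", or "delete"
    name: str   # resource name the action applies to


def reconcile(desired: dict[str, dict], actual: dict[str, dict]) -> list[Action]:
    """Return the actions needed to move `actual` towards `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(Action("create", name))
        elif actual[name] != spec:
            actions.append(Action("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(Action("delete", name))
    return actions


def apply(actions: list[Action], desired: dict[str, dict], actual: dict[str, dict]) -> None:
    """Apply actions to the in-memory `actual` state (a stand-in for a real plane)."""
    for action in actions:
        if action.verb == "delete":
            del actual[action.name]
        else:
            actual[action.name] = dict(desired[action.name])


# Hypothetical resources: names and fields are illustrative only.
desired = {"checkout": {"replicas": 2}, "cart": {"replicas": 1}}
actual = {"checkout": {"replicas": 1}, "legacy": {"replicas": 1}}

first_pass = reconcile(desired, actual)
apply(first_pass, desired, actual)
second_pass = reconcile(desired, actual)  # empty once the states converge
```

Running the loop a second time yields no actions, which is the property a level-triggered reconciler relies on: each pass compares full state rather than replaying events.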

The control plane consists of the following key components:

  • API Server: Exposes the OpenChoreo API, which both developers and platform teams use to interact with the system. It is the main entry point for all API requests, handling authentication, authorization, and request validation. The API server is exposed via the Kubernetes Gateway API, and powers both the OpenAPI-v3-based API and the MCP servers. It also hosts OpenChoreo's authorization engine, which provides fine-grained RBAC (Role-Based Access Control), ABAC (Attribute-Based Access Control) and hierarchical instance-level access control for all resources created in OpenChoreo.
  • Controller Manager: A set of Kubernetes controllers that implement the core reconciliation logic of the platform. These controllers watch for changes to the CRD instances defined in the Developer and Platform APIs, and take appropriate actions to ensure that the desired state is achieved across all planes. For example, when a new Component is created, the controllers will validate the request, resolve any references (e.g., dependencies of components), and trigger the necessary workflows to build, deploy, and expose the component in the data plane(s) with the required network policies and observability configurations.
  • Cluster Gateway: All other planes (data, observability, workflow) establish outbound connections to the control plane. This system component acts as the hub that allows the API server and Controller Manager to communicate with the other planes (a hub-and-spoke model). It exposes a Secure WebSocket (wss) API that allows bidirectional communication between the control plane and the other planes via long-lived connections, authenticated with mTLS (using Cert-Manager). This means the Kubernetes API servers of the data, workflow and observability planes do not need to be exposed to the internet.
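The hub-and-spoke model of the Cluster Gateway can be sketched in memory as follows. This is only an illustration of the connection direction: `ClusterGatewayHub`, its methods, and the plane names are hypothetical, and a queue stands in for each long-lived wss connection (no networking or mTLS is modelled).

```python
from collections import deque


class ClusterGatewayHub:
    """Hypothetical in-memory hub; a real one would hold wss connections."""

    def __init__(self) -> None:
        self._inboxes: dict[str, deque] = {}

    def register(self, plane: str) -> deque:
        """Called by a plane's agent when it dials out to the hub."""
        inbox = deque()
        self._inboxes[plane] = inbox
        return inbox

    def send(self, plane: str, message: dict) -> bool:
        """Queue a message for a plane; False if that plane is not connected."""
        inbox = self._inboxes.get(plane)
        if inbox is None:
            return False
        inbox.append(message)
        return True


hub = ClusterGatewayHub()
dp_inbox = hub.register("dataplane-1")  # outbound connection initiated by the spoke

delivered = hub.send("dataplane-1", {"op": "apply", "kind": "Deployment"})
undelivered = hub.send("dataplane-2", {"op": "apply", "kind": "Deployment"})
received = dp_inbox.popleft()
```

The hub never opens a connection to a plane. It can only reach planes that have already registered, which is why the remote Kubernetes API servers can stay private.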

Experience Plane

The experience plane is the user-facing layer of OpenChoreo, built for platform teams, development teams and agents to interact with the IDP based on their respective roles and permissions.

It includes the following components:

  • OpenAPI-v3-based APIs exposed by the control plane and observability plane
  • The CLI (called occ)
  • The Backstage-based Internal Developer Portal

    OpenChoreo uses an extended fork of Backstage for its UI that supports native Backstage plugins and custom plugins built specifically for OpenChoreo's APIs and concepts.

  • MCP servers for AI-assisted/driven development and operations (exposed by the control plane and observability plane)

Authentication and Authorization

OpenChoreo integrates with any OAuth2/OIDC-compatible Identity Provider (IdP) for authentication (who you are). By default, OpenChoreo ships with WSO2 Thunder, an open-source identity server, to help you get started.

Learn more about configuring your Identity Provider →

OpenChoreo has a built-in, flexible, declarative authorization engine (which defines what you can do) that covers all interactions. It is based on fine-grained RBAC (Role-Based Access Control), ABAC (Attribute-Based Access Control) and instance-level (namespace, project, component-level) access controls that can allow or deny actions on resources. It works by mapping the groups provided by your Identity Provider when a user logs in (via the UI, CLI, API and MCP servers) to extensible roles and authorization policies defined in OpenChoreo. This authorization engine is powered by Apache Casbin.
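The group-to-role mapping with hierarchical scopes can be sketched in plain Python. This is not Casbin and not OpenChoreo's policy model: the role names, action strings, `Binding` fields, path-style scopes, and the deny-overrides rule are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass

# Hypothetical roles and the actions they grant.
ROLES = {
    "developer": {"component:view", "component:deploy"},
    "viewer": {"component:view"},
}


@dataclass(frozen=True)
class Binding:
    group: str    # group claim from the IdP token
    role: str     # key into ROLES
    scope: str    # "namespace", "namespace/project", or "namespace/project/component"
    effect: str   # "allow" or "deny"


def in_scope(scope: str, resource: str) -> bool:
    """A binding covers the scoped resource and everything beneath it."""
    return resource == scope or resource.startswith(scope + "/")


def is_allowed(groups: list[str], action: str, resource: str, bindings: list[Binding]) -> bool:
    """Deny wins over allow (assumed); no matching binding means no access."""
    matched = [
        b for b in bindings
        if b.group in groups and in_scope(b.scope, resource) and action in ROLES[b.role]
    ]
    if any(b.effect == "deny" for b in matched):
        return False
    return any(b.effect == "allow" for b in matched)


bindings = [
    Binding("team-shop", "developer", "acme/shop", "allow"),
    Binding("team-shop", "developer", "acme/shop/payments", "deny"),
]

can_deploy_cart = is_allowed(["team-shop"], "component:deploy", "acme/shop/cart", bindings)
can_deploy_payments = is_allowed(["team-shop"], "component:deploy", "acme/shop/payments", bindings)
can_deploy_other = is_allowed(["team-shop"], "component:deploy", "acme/billing/api", bindings)
```

Here the project-level allow reaches the `cart` component, the component-level deny blocks `payments`, and a resource in another project has no matching binding at all.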

Learn more about configuring authorization policies →

AI as a First-Class Platform Construct

OpenChoreo was designed from the ground up to support AI-assisted/driven development and operations, and this is reflected in its architecture. Whether your team currently uses AI or not, OpenChoreo provides a future-proof foundation for integrating AI tools and agents that can interact with the platform following the same golden paths, guardrails and authorization policies as human users. It also ships with a set of optional platform agents that can assist with site reliability engineering, cost control and architecture governance use cases. These agents also serve as references for how you can build your own.

Learn more about working with AI in OpenChoreo →

Platform API

The Platform API is a set of Kubernetes CRDs that allow platform builders to define the structure and behavior of the platform itself. It provides abstractions for defining organizational boundaries (Namespaces), Environments, Data Planes, Workflow Planes, Observability Planes, and Deployment Pipelines that can be used as building blocks to define the overall topology of the platform. By using the Platform API, platform teams can declaratively configure how the platform should be structured and how it should operate, without having to write custom code or scripts.

The Platform API also includes programmable abstractions such as ComponentTypes, Traits and Workflows that allow platform teams to define reusable templates and golden paths for their development teams. These abstractions enable platform teams to encapsulate best practices, enforce organizational standards, and provide a consistent developer experience across the platform. These concepts together enable OpenChoreo's approach to policy, security and governance by design.

Figure 2 (below) illustrates some of the core concepts of the Platform API and how they relate to each other. It is only a high-level representation of one possible platform topology, and does not impose strict limits on what your platform topology should look like.

Learn more about the Platform API abstractions and concepts →

Developer API

The Developer API is a set of Kubernetes CRDs designed to simplify, streamline and reduce the cognitive burden of application development on Kubernetes for development teams. Instead of exposing the entire configuration surface of the Kubernetes API, these abstractions provide a more intuitive and domain-driven way to define projects, their components, and their interactions via endpoints and dependencies.

OpenChoreo avoids "black-box" abstractions that completely obscure Kubernetes. Instead, its abstractions give platform teams a way to create opinionated, reusable templates that express organizational best practices and standards as intent-driven interfaces for their development teams. This shift-down approach is enabled by programmable ComponentTypes, Traits and Workflows, which are part of the Platform API.

Shift-down reduces developer cognitive load by offloading complexity to the platform, whereas shift-left often increases it.

Creating abstractions for developer intent means enabling self-service for day-to-day tasks as golden paths in your IDP, shifting away from forcing developers to figure out the "how" of Kubernetes and ticket-driven operations.

For example, the following are all expressions of developer intent that can be abstracted away from the underlying infrastructure operations:

  • "creating a component"
  • "defining a grouping of components as a project"
  • "building, testing and deploying a component"
  • "exposing a port of a component with a network visibility level (external/internal/namespace/project)"
  • "defining a dependency on another component or an external system"
  • "debugging a component in an environment with logs, metrics and traces"

Figure 3 (below) illustrates the relationships between these abstractions. Note that Workflows and WorkflowRuns have been omitted from the diagram for simplicity, but they are also important abstractions in the Developer API that allow platform teams to define reusable CI/CD, resource provisioning and other workflows as golden paths for developers.

Figure 4 (below) illustrates how a developer-created Component is transformed into Kubernetes resources in the data plane at runtime, based on the definitions provided in the platform API.

With OpenChoreo, platform teams retain complete control of what gets deployed on their infrastructure, while developers get a simplified, intent-driven experience that abstracts away the underlying complexity.

These abstractions also align with domain-driven design principles, where projects represent bounded contexts and components represent the individual services or workloads within a domain.

Learn more about the Developer API abstractions and concepts →


Data Plane

A data plane is a Kubernetes cluster that is responsible for running component workloads, enforcing network policies, exposing component endpoints via a structured gateway topology, and wiring up dependencies as instructed by the control plane. By definition, a data plane may span multiple federated clusters, but in common practice a data plane maps 1:1 to a Kubernetes cluster.

Each data plane is registered with the control plane and is authenticated via mutual TLS. OpenChoreo uses a cluster agent (not an AI agent) that establishes a long-lived, outbound secure websocket (wss) connection to the control plane's Cluster Gateway, and listens for instructions or queries.

An OpenChoreo deployment can have one or more data planes, and these can span clusters in different geographies and infrastructure providers, depending on the needs of the organization. Environments (and ClusterEnvironments) define logically isolated runtime environments on a data plane, and a DeploymentPipeline (or a ClusterDeploymentPipeline) can be used to promote components across these logically (or physically) separated environments. For example, a component can be promoted from a dev environment on data plane 1 → staging on data plane 1 → production on data plane 2, while applying environment-specific configurations and secrets.
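The promotion path in that example can be sketched as an ordered list of environments with per-environment overrides. This does not show the DeploymentPipeline schema: the `Environment` type, the `PIPELINE` and `OVERRIDES` tables, and the `promote` function are hypothetical names used only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Environment:
    name: str
    data_plane: str


# Ordered promotion path, mirroring the dev -> staging -> production example above.
PIPELINE = [
    Environment("dev", "dataplane-1"),
    Environment("staging", "dataplane-1"),
    Environment("production", "dataplane-2"),
]

# Environment-specific overrides layered over the component's base configuration.
OVERRIDES = {
    "dev": {"LOG_LEVEL": "debug"},
    "production": {"LOG_LEVEL": "warn", "REPLICAS": "3"},
}


def next_environment(current: str) -> Optional[Environment]:
    """Return the environment after `current`, or None at the end of the pipeline."""
    names = [env.name for env in PIPELINE]
    index = names.index(current)
    return PIPELINE[index + 1] if index + 1 < len(PIPELINE) else None


def promote(release: dict, current: str) -> Optional[dict]:
    """Produce the release as it would look in the next environment."""
    target = next_environment(current)
    if target is None:
        return None
    config = {**release["base_config"], **OVERRIDES.get(target.name, {})}
    return {"image": release["image"], "environment": target.name,
            "data_plane": target.data_plane, "config": config}


release = {"image": "registry.example/cart:1.0", "base_config": {"LOG_LEVEL": "info"}}
in_staging = promote(release, "dev")
in_production = promote(release, "staging")
past_the_end = promote(release, "production")
```

The same image moves through every stage. Only the target data plane and the layered configuration change, which is what makes a promotion repeatable.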

The control plane creates logically isolated namespaces, projects and environments via Kubernetes namespaces and network policies in the data planes. It transpiles (converts at the same abstraction level) resources created by the Developer API, using the definitions provided in ComponentTypes and Traits, into Kubernetes API resources (including any custom resources), and offloads their reconciliation to the data plane cluster's Kubernetes API server. However, this is not a fire-and-forget model. The control plane continuously compares the state of resources deployed to the data plane against the desired state and takes corrective actions as necessary. A data plane can keep functioning even if disconnected from the control plane, but changes to the desired state will not be reflected until the connection is restored.
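The transpile step can be pictured as filling a platform-owned template from the developer's intent and then letting each trait adjust the result. The sketch below is an assumption-heavy illustration: `SERVICE_TYPE`, `autoscale_floor`, `render`, and every field name are hypothetical, the real ComponentType and Trait schemas are not shown here, and the rendered Deployment is deliberately incomplete (for example, it omits the selector).

```python
import copy

# Hypothetical ComponentType: a resource template with placeholders to fill.
SERVICE_TYPE = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": None, "namespace": None, "labels": {}},
    "spec": {"replicas": 1, "template": {"spec": {"containers": []}}},
}


def autoscale_floor(resource: dict, params: dict) -> dict:
    """Hypothetical Trait: raise the replica count to a platform-defined minimum."""
    resource["spec"]["replicas"] = max(resource["spec"]["replicas"], params["min"])
    return resource


def render(component: dict, component_type: dict, traits: list, namespace: str) -> dict:
    """Fill the template from the developer's intent, then apply each trait in order."""
    resource = copy.deepcopy(component_type)  # never mutate the shared template
    resource["metadata"]["name"] = component["name"]
    resource["metadata"]["namespace"] = namespace
    resource["metadata"]["labels"] = {"project": component["project"]}
    resource["spec"]["template"]["spec"]["containers"] = [
        {"name": component["name"], "image": component["image"]}
    ]
    for trait, params in traits:
        resource = trait(resource, params)
    return resource


# Developer intent: only the fields a developer is asked to care about.
component = {"name": "cart", "project": "shop", "image": "registry.example/cart:1.0"}
deployment = render(component, SERVICE_TYPE, [(autoscale_floor, {"min": 2})], "acme-shop-dev")
```

The developer supplies three fields, and the platform team decides everything else. That split is the practical meaning of platform teams retaining control over what gets deployed.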

Modules

Optional modules can be installed in the data plane to provide additional platform capabilities. For example:

  • API management module: provides API management capabilities such as rate limiting, authentication, and observability for component endpoints, using Traits that plug into the Components.
  • Elastic (scale-to-zero) module: automatically scales down component workloads to zero when they are not in use, and scales them back up when needed based on traffic.
  • Guard module: uses the Cilium CNI and eBPF to enforce zero-trust network policies and tap into kernel-level observability.

Runtime Model

The runtime model describes how OpenChoreo's abstractions transform into running systems. When development teams declare projects and components, the platform orchestrates a sophisticated runtime environment that provides isolation, security, and observability on the data planes.

At runtime, resources of a project are isolated through Cells: secure, isolated, and observable boundaries for all components belonging to a given namespace-project-environment combination. A Cell becomes the runtime boundary for a group of components, with policy enforcement and observability, aligning with the ideas of Cell-Based Architecture: a model where individual teams or domains operate independently within well-defined boundaries, while still benefiting from shared infrastructure capabilities.

Learn more about the runtime model →

Gateway Topology

Another advantage of using the Cell as the runtime boundary is that it allows OpenChoreo to implement a structured gateway topology for all northbound, southbound, eastbound and westbound traffic entering or leaving the cell. It simplifies how developers expose their components via a set of abstractions for direction (ingress, egress) and visibility levels (external, internal, namespace, project), while giving platform teams control over how those visibilities are mapped to Kubernetes Gateway instances, standardizing the gateway topology across the platform.
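The split of responsibilities described above, where developers pick a direction and visibility and platform teams decide which gateway serves it, can be sketched as a lookup table. The `Endpoint` type, the `GATEWAYS` table, and the gateway names are hypothetical. The actual configuration format is covered in the gateway topology guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    component: str
    port: int
    direction: str    # "ingress" or "egress"
    visibility: str   # "external", "internal", "namespace", or "project"


# Hypothetical platform-team mapping; gateway names are made up for this sketch.
GATEWAYS = {
    ("ingress", "external"): "gateway-external",
    ("ingress", "internal"): "gateway-internal",
    ("egress", "external"): "gateway-egress",
}


def resolve_gateway(endpoint: Endpoint, gateways: dict) -> str:
    """Look up the Gateway instance for an endpoint, rejecting unmapped combinations."""
    key = (endpoint.direction, endpoint.visibility)
    if key not in gateways:
        raise ValueError(f"no gateway mapped for {key[0]}/{key[1]}")
    return gateways[key]


public_api = Endpoint("cart", 8080, "ingress", "external")
gateway = resolve_gateway(public_api, GATEWAYS)
```

Because developers never name a gateway directly, a platform team could repoint a visibility level to a different Gateway instance without touching any component definition.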

Learn more about configuring the gateway topology →

Figure 5 (above): A screenshot of a Cell diagram as shown in the OpenChoreo Portal, depicting the gateways and component interactions.


Workflow Plane

A workflow plane is a Kubernetes cluster that is responsible for executing platform-defined workflows. At a high level, OpenChoreo has two categories of workflows: CI workflows, which provide developer self-service for building, testing and deploying components, and generic workflows, which cover all other automation use cases in the platform, including GitOps workflows, resource provisioning workflows, and any custom workflows defined by platform teams.

Similar to data planes, workflow planes are registered with the control plane and establish an outbound secure websocket (wss) connection to the Cluster Gateway. The control plane offloads workflow execution to the workflow plane by sending workflow definitions and execution instructions via this channel, and continuously monitors the state of workflow executions to take corrective actions as necessary.

Modules

The default workflow module for the OpenChoreo Workflow Plane is powered by Argo Workflows, a Kubernetes-native workflow engine. However, OpenChoreo's workflow concepts are designed to work with any CRD-based workflow/pipeline engine, so you can customize the Workflow Plane to use an alternative Kubernetes-native (CRD-based) module such as Tekton.

Using External CI Systems

The Workflow Plane is an optional plane. If you already have an existing CI system, such as GitHub Actions, GitLab CI, or Jenkins, you can continue to use it. When using an external CI system, OpenChoreo does not control the execution of CI workflows. Instead, builds must be executed outside the platform, and integrated so that when a new build artifact is created, the control plane is notified with the relevant metadata (e.g., image and workload CR).

OpenChoreo provides a set of curated Backstage plugins for its Internal Developer Portal for a few select external CI systems to make this easier, but you can also choose to build your own UI integrations using APIs.

In practice, you could use both external CI and OpenChoreo's workflow plane. A common pattern is using the Git provider's native system for pre-pull-request-merge checks, and the workflow plane to create and deploy the final build when the PR is merged. Generic workflows can be used for GitOps, and also for running post-deployment/pre-promotion workflows such as integration tests.


Observability Plane

An observability plane is a Kubernetes cluster responsible for providing centralized logs, metrics, traces, and alerts. It acts as a central data sink, collecting and aggregating observability data from workflow and data planes, as defined by the Platform API. By design, workflow and data planes can use different observability planes if required, and the control plane and the experience plane can query the correct observability plane for each resource based on the relationships declared in the platform topology.
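Resolving the correct observability plane for a resource amounts to following the relationships declared in the topology. A minimal sketch, assuming two hypothetical lookup tables (`DATA_PLANE_OF` and `OBSERVABILITY_PLANE_OF`) in place of the real Platform API resources:

```python
# Hypothetical topology: which data plane hosts each environment, and which
# observability plane each data or workflow plane reports to.
DATA_PLANE_OF = {
    "dev": "dataplane-1",
    "staging": "dataplane-1",
    "production": "dataplane-2",
}
OBSERVABILITY_PLANE_OF = {
    "dataplane-1": "obs-eu",
    "dataplane-2": "obs-us",
    "workflowplane-1": "obs-eu",
}


def observability_plane_for(environment: str) -> str:
    """Follow environment -> data plane -> observability plane."""
    return OBSERVABILITY_PLANE_OF[DATA_PLANE_OF[environment]]


dev_logs_from = observability_plane_for("dev")
prod_logs_from = observability_plane_for("production")
```

A query for the same component is therefore routed to a different observability plane depending on the environment being inspected.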

Similar to data and workflow planes, the observability plane establishes an outbound secure websocket (wss) connection to the control plane when registered. Unlike the other planes, it also exposes its own MCP server and an OpenAPI-v3-based API (the Observer API) for querying logs, metrics and traces. The control plane uses the websocket channel for reconciling resources in the observability plane (e.g., ObservabilityAlertRules), while the experience plane and external tools (including AI assistants/agents) can query observability data directly via the Observer API and MCP server, which are secured with the same authentication and authorization policies as the rest of the platform.

This design prevents observability data from being proxied through the control plane to end-users, which can be a concern in larger multi-regional, multi-tenant OpenChoreo deployments where regional data privacy regulations may apply.

Observability data collection from other planes is carried out by collection agents configured via the observability modules. These agents run on each target plane, enrich logs with domain metadata (such as plane, namespace, project, and component), and forward them to the observability modules in the observability plane. The observability modules are responsible for receiving this data, and providing the backends/adapters for OpenChoreo's Observer API that provide a rich, domain-centric querying experience.
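The enrichment step can be sketched as wrapping each raw log line in a structured record that carries the domain metadata. The `enrich` function and the record layout below are assumptions for illustration, not the format used by any particular collection agent.

```python
import json


def enrich(raw_line: str, metadata: dict[str, str]) -> str:
    """Wrap a raw container log line in a JSON record carrying domain metadata."""
    record = {"message": raw_line.rstrip("\n")}
    record.update(metadata)
    return json.dumps(record, sort_keys=True)


# Hypothetical metadata values; the keys follow the ones named in the text above.
metadata = {
    "plane": "dataplane-1",
    "namespace": "acme",
    "project": "shop",
    "component": "cart",
}
enriched = enrich("order 42 accepted\n", metadata)
parsed = json.loads(enriched)
```

With the metadata attached at the source, a query such as "all logs for component cart in project shop" becomes a plain field filter in the backend.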

Modules

By default, the observability plane ships with three modules for logs, metrics, and tracing, but you can choose to swap these out for your own preferred tools, or even use a single module that provides all these capabilities. The default modules are:

  • Logs powered by OpenSearch (observability-logs-opensearch)
  • Metrics powered by Prometheus (observability-metrics-prometheus)
  • Tracing powered by an OpenTelemetry (OTEL) collector with an OpenSearch backend (observability-tracing-opensearch)

Alerting is not a separate module: alerting capabilities are built into the logs and metrics modules.

These modules together support full-text search, structured/unstructured log storage, metrics storage, tracing storage, configurable retention, and complex queries.

Using External Observability Systems

The observability plane is optional. If you have an existing observability system, you can choose to integrate it with OpenChoreo instead of using the default observability plane and modules. This is a common pattern for organizations that already have a significant investment in an observability stack such as Datadog, Splunk, New Relic, Grafana Cloud, or a cloud-provider-specific solution. OpenChoreo's observability plane uses an adapter pattern that allows a minimal observability plane to plug into an external system's API, providing the same domain-centric Observer API and MCPs for querying across the unified experience plane.
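The adapter pattern mentioned above can be sketched as a small backend interface with two implementations. Everything here is hypothetical: `LogBackend`, `InMemoryBackend`, `ExternalSystemAdapter`, the tag-style query string, and the injected `fake_vendor` function are stand-ins, and no real vendor API is modelled.

```python
from typing import Callable, Protocol


class LogBackend(Protocol):
    """What the domain-centric query layer needs from any backend."""
    def search(self, filters: dict[str, str]) -> list[dict]: ...


class InMemoryBackend:
    """Stand-in for a default module's store."""
    def __init__(self, records: list[dict]) -> None:
        self._records = records

    def search(self, filters: dict[str, str]) -> list[dict]:
        return [r for r in self._records
                if all(r.get(k) == v for k, v in filters.items())]


class ExternalSystemAdapter:
    """Translates domain filters into a hypothetical vendor tag query."""
    def __init__(self, run_query: Callable[[str], list[dict]]) -> None:
        self._run_query = run_query  # injected client call; no real vendor API here

    def search(self, filters: dict[str, str]) -> list[dict]:
        query = " AND ".join(f"{k}:{v}" for k, v in sorted(filters.items()))
        return self._run_query(query)


def component_logs(backend: LogBackend, project: str, component: str) -> list[dict]:
    """Domain-centric query that stays the same whichever backend is plugged in."""
    return backend.search({"project": project, "component": component})


records = [
    {"project": "shop", "component": "cart", "message": "started"},
    {"project": "shop", "component": "payments", "message": "started"},
]
seen_queries = []


def fake_vendor(query: str) -> list[dict]:
    """Records the translated query and returns a canned result."""
    seen_queries.append(query)
    return [{"message": "from vendor"}]


local = component_logs(InMemoryBackend(records), "shop", "cart")
remote = component_logs(ExternalSystemAdapter(fake_vendor), "shop", "cart")
```

Callers of `component_logs` do not change when the backend does, which is the property that lets the experience plane keep a single query surface.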


Deployment Topologies

OpenChoreo supports multiple deployment patterns to suit different organizational needs, from local development to large-scale, multi-cluster production setups.

  • In development or testing setups, all planes can be deployed into a single Kubernetes cluster using namespace isolation.
  • In production environments, each plane is typically deployed in a separate cluster for scalability, fault tolerance, and security.
  • Hybrid topologies are also supported, allowing teams to co-locate certain planes (e.g., Control + Workflow) for cost or operational efficiency.

For detailed topology configurations, see the Deployment Topology guide.

What's Next

  1. Quick Start Guide - Experience OpenChoreo locally in a few minutes with just Docker
  2. Try It Out - Run Locally or On Your Environment
  3. Concepts - Learn more about OpenChoreo's core concepts and abstractions
  4. Platform Engineer Guide - Learn how to set up in production, configure developer workflows and govern your platform
  5. Developer Guide - For end-users of the platform: learn how to build, deploy and observe your applications with self-service
  6. Working with AI - Learn how to use your AI assistants (Claude Code, Codex, etc.) with OpenChoreo and set up the built-in platform agents