Version: v1.0.0-rc.1 (pre-release)

ObservabilityAlertRule

An ObservabilityAlertRule defines a rule for monitoring runtime observability data (metrics or logs) and triggering alerts when specific conditions are met.

Generated Resources

ObservabilityAlertRule resources are generated automatically by the OpenChoreo control plane during component releases. They are derived from the alert definitions specified in a component's traits, with environment-specific parameters applied via the ReleaseBinding CR.

Usage Recommendation​

Do not create ObservabilityAlertRule resources manually. Instead, define alert rules using a Trait (either the default observability-alert-rule trait or a custom trait) within your component definition. This ensures that the alert rules are properly scoped to your component and managed as part of its lifecycle across environments.

Example: Defining Alerts as Traits​

In your Component CR, add the alert rule as a trait (using the default observability-alert-rule trait). The trait is responsible for generating an ObservabilityAlertRule CR with the appropriate spec.source, spec.condition, and spec.actions fields:

apiVersion: openchoreo.dev/v1alpha1
kind: Component
metadata:
  name: my-service
spec:
  # ... other component fields ...
  traits:
    - name: observability-alert-rule
      kind: Trait
      instanceName: high-error-rate-log-alert
      parameters:
        description: "Triggered when error logs count exceeds 50 in 5 minutes."
        severity: "critical"
        source:
          type: "log"
          query: "status:error"
        condition:
          window: 5m
          interval: 1m
          operator: gt
          threshold: 50

Override the environment-specific parameters for the alert rule (enablement, notification channels, and incident/AI RCA behavior) in the ReleaseBinding CR via traitEnvironmentConfigs:

apiVersion: openchoreo.dev/v1alpha1
kind: ReleaseBinding
metadata:
  name: my-service-production
  namespace: default
spec:
  owner:
    projectName: default
    componentName: my-service
  environment: production

  traitEnvironmentConfigs:
    high-error-rate-log-alert:
      enabled: true
      actions:
        notifications:
          channels:
            - devops-email-notifications
        incident:
          enabled: true
          triggerAiRca: false

The control plane will then generate the corresponding ObservabilityAlertRule resource for each environment where this component is released.
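Because enablement is one of the environment-specific parameters, the same trait instance can presumably be switched off in an environment where the alert is not wanted. The following is a minimal sketch, assuming a hypothetical development environment and a hypothetical ReleaseBinding named my-service-development; neither name comes from this page.

```yaml
apiVersion: openchoreo.dev/v1alpha1
kind: ReleaseBinding
metadata:
  name: my-service-development   # hypothetical binding name
  namespace: default
spec:
  owner:
    projectName: default
    componentName: my-service
  environment: development        # hypothetical environment name

  traitEnvironmentConfigs:
    # Key matches the trait's instanceName in the Component CR
    high-error-rate-log-alert:
      enabled: false              # the rule should not be evaluated in this environment
```

The Component CR stays unchanged; only the binding for that environment differs.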

API Version​

openchoreo.dev/v1alpha1

Resource Definition​

Metadata​

ObservabilityAlertRule resources are namespace-scoped and are typically created in the project-environment namespace, in the same way that other resources are created in data planes.

apiVersion: openchoreo.dev/v1alpha1
kind: ObservabilityAlertRule
metadata:
  name: <rule-name>
  namespace: <project-environment-namespace>

Spec Fields​

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| name | string | Yes | Unique identifier for the alert rule |
| description | string | No | A human-friendly summary of the alert rule |
| severity | AlertSeverity | No | Describes how urgent the alert is (info, warning, critical) |
| enabled | boolean | No | Toggles whether this alert rule should be evaluated. Defaults to true |
| source | AlertSource | Yes | Specifies the observability source type (log or metric) and query/metric that drives the rule |
| condition | AlertCondition | Yes | Controls when an alert should be triggered based on the source data |
| actions | AlertActions | Yes | Defines where alerts are sent and whether incidents/AI RCA are triggered |

AlertSeverity​

| Value | Description |
|-------|-------------|
| info | Informational alerts |
| warning | Warning-level alerts |
| critical | Critical alerts |

AlertSource​

Specifies where and how events are pulled for evaluation.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| type | AlertSourceType | Yes | The telemetry source type (log, metric) |
| query | string | No | The query for log-based alerting (for example, status:error). Required when type=log. |
| metric | string | No | The metric type for metrics-based alerting. Required when type=metric. Must be one of the supported metrics (cpu_usage, memory_usage). |
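The conditional requirements on query and metric can be sketched as two alternative source stanzas. The values are illustrative and reuse only the query and metric names mentioned on this page.

```yaml
# Log-based source: query is required when type is log
source:
  type: log
  query: "status:error"
---
# Metric-based source: metric is required when type is metric
source:
  type: metric
  metric: memory_usage   # must be a supported metric: cpu_usage or memory_usage
```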

AlertSourceType​

| Value | Description |
|-------|-------------|
| log | Log-based alerting (powered by observability logs module) |
| metric | Usage metrics-based alerting (powered by observability metrics module) |

AlertCondition​

Represents the conditions under which an alert should be triggered.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| window | duration | Yes | The time window aggregated before comparison (e.g., 5m) |
| interval | duration | Yes | How often the alert rule is evaluated (e.g., 1m) |
| operator | AlertConditionOperator | Yes | Comparison operator used for evaluation |
| threshold | integer | Yes | Trigger value for the configured operator (percentage or count, depending on source) |

AlertConditionOperator​

| Value | Description |
|-------|-------------|
| gt | Greater than threshold |
| lt | Less than threshold |
| gte | Greater than or equal to threshold |
| lte | Less than or equal to threshold |
| eq | Equals the threshold |
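To illustrate how window, interval, operator, and threshold combine, the fragment below sketches a condition that fires when the aggregated value over a ten-minute window drops below 1, checked every two minutes. The values are illustrative, and the count-versus-percentage reading of threshold is inferred from the examples on this page rather than stated as a rule.

```yaml
condition:
  window: 10m      # aggregate the last 10 minutes of source data
  interval: 2m     # re-evaluate the rule every 2 minutes
  operator: lt     # fire when the aggregated value is less than the threshold
  threshold: 1     # integer; a count in the log example, a percentage in the CPU example
```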

AlertActions​

Defines what happens when an alert rule is triggered.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| notifications | AlertNotifications | Yes | Notification channels to send alerts to |
| incident | AlertIncident | No | Optional incident and AI RCA behavior |

AlertNotifications​

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| channels | string[] | Yes | List of ObservabilityAlertsNotificationChannel names to notify |

At least one notification channel must be configured. If the originating trait or ReleaseBinding override omits actions.notifications.channels, the control plane resolves the environment’s default notification channel and populates actions.notifications.channels in the generated ObservabilityAlertRule.
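The default-channel behavior described above can be sketched as a before-and-after pair. This is an illustration only: default-notification-channel is a hypothetical name standing in for whatever default channel the environment defines.

```yaml
# ReleaseBinding override that omits actions.notifications.channels
traitEnvironmentConfigs:
  high-error-rate-log-alert:
    enabled: true
---
# Corresponding fragment of the generated ObservabilityAlertRule spec
actions:
  notifications:
    channels:
      - default-notification-channel   # hypothetical name of the environment's default channel
```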

AlertIncident​

Represents incident behavior when an alert fires.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| enabled | boolean | No | Enables incident creation when this alert fires. Defaults to false. |
| triggerAiRca | boolean | No | Enables AI-powered root cause analysis when an incident is created. Requires enabled to be true. |
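As a brief sketch of the dependency between the two fields, an actions stanza that requests AI RCA would also need incident creation turned on. The channel name is reused from the examples on this page.

```yaml
actions:
  notifications:
    channels:
      - devops-email-notifications
  incident:
    enabled: true        # create an incident when the alert fires
    triggerAiRca: true   # only meaningful because enabled is true
```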

Examples​

Log-based Alert Rule (Generated from trait)​

apiVersion: openchoreo.dev/v1alpha1
kind: ObservabilityAlertRule
metadata:
  name: error-logs-alert
  namespace: my-project-production
spec:
  name: Error Logs Detected
  description: Triggered when more than 10 error logs are detected in 1 minute.
  severity: critical
  enabled: true
  source:
    type: log
    query: 'status: "error"'
  condition:
    window: 1m
    interval: 1m
    operator: gt
    threshold: 10
  actions:
    notifications:
      channels:
        - devops-email-notifications
    incident:
      enabled: true
      triggerAiRca: false

Metric-based Alert Rule for CPU usage (Generated from trait)​

apiVersion: openchoreo.dev/v1alpha1
kind: ObservabilityAlertRule
metadata:
  name: high-cpu-usage
  namespace: my-project-production
spec:
  name: High CPU Usage
  description: Triggered when average container CPU usage is at or above 80% of limits for 5 minutes.
  severity: warning
  enabled: true
  source:
    type: metric
    metric: cpu_usage
  condition:
    window: 5m
    interval: 1m
    operator: gte
    threshold: 80
  actions:
    notifications:
      channels:
        - devops-slack-notifications
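For completeness, the other supported metric, memory_usage, would presumably follow the same shape. The sketch below is an assumption-based variant of the CPU example: the rule name, description, and the 90 threshold are hypothetical, and reading the threshold as a percentage is inferred from the CPU example.

```yaml
apiVersion: openchoreo.dev/v1alpha1
kind: ObservabilityAlertRule
metadata:
  name: high-memory-usage            # hypothetical rule name
  namespace: my-project-production
spec:
  name: High Memory Usage
  description: Triggered when memory usage is above 90 for 5 minutes.
  severity: warning
  enabled: true
  source:
    type: metric
    metric: memory_usage             # the second supported metric
  condition:
    window: 5m
    interval: 1m
    operator: gt
    threshold: 90                    # hypothetical value
  actions:
    notifications:
      channels:
        - devops-slack-notifications
```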