Error Handling & Retry Patterns
Build robust state machines with automatic retry logic, error recovery, and graceful degradation. This example demonstrates production-ready patterns for handling failures in state machines.
What you'll learn:
- Error event handling with error.* wildcards
- Automatic retry with configurable limits
- Exponential backoff patterns
- Graceful failure states and recovery options
- Conditional transitions for error routing
Try It
Click Start, then process to begin. Use success, failure, or timeout to simulate different outcomes.
How to explore:
- Click process to start the operation
- Wait for operation_start (500ms delay), then send:
  - success - Operation completes successfully
  - failure - Triggers retry logic (up to 3 attempts)
  - timeout - Same as failure, triggers retry
- During retry wait, you can send cancel to abort
- After max retries, send force_retry to try again
- reset returns to idle from any terminal state
The Retry Pattern
A robust retry mechanism handles transient failures while preventing infinite loops:
         ┌─────────────┐
         │    idle     │
         └──────┬──────┘
                │ process
                ▼
         ┌─────────────┐  failure/timeout/error.*
  ┌─────▶│ processing  │─────────────────────┐
  │      └──────┬──────┘                     │
  │             │ success                    ▼
  │             ▼                     ┌─────────────┐
  │      ┌─────────────┐              │error_handler│
  │      │   success   │              └──────┬──────┘
  │      └─────────────┘                     │
  │                                  retryCount < max?
  │                                  ┌───────┴───────┐
  │                                 YES             NO
  │                                  │               │
  │                                  ▼               ▼
  │                           ┌────────────┐    ┌─────────┐
  │                           │ retry_wait │    │ failed  │
  │                           └──────┬─────┘    └─────────┘
  │                                  │
  │         retry (1s delay)         │
  └──────────────────────────────────┘
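To see how the counter bounds the loop, the diagram can be traced with a small hand-written model. This is only a sketch and is not generated from the SCXML: the function name runWithFailures and its return shape are hypothetical, and it assumes every attempt fails until the given number of failures is used up.

```javascript
// Hand-written model of the retry loop in the diagram (not scxml-gen output).
// `failures` is how many attempts fail before one would succeed.
function runWithFailures(failures, maxRetries = 3) {
  let retryCount = 0;
  const trace = ['idle', 'processing'];
  for (let i = 0; i < failures; i++) {
    // error_handler: count the failed attempt, then pick a branch
    retryCount += 1;
    trace.push('error_handler');
    if (retryCount < maxRetries) {
      // retry_wait, then the delayed "retry" event re-enters processing
      trace.push('retry_wait', 'processing');
    } else {
      trace.push('failed');
      return { finalState: 'failed', retryCount, trace };
    }
  }
  trace.push('success'); // the attempt after the last failure succeeds
  return { finalState: 'success', retryCount, trace };
}

console.log(runWithFailures(2).finalState); // "success"
console.log(runWithFailures(3).finalState); // "failed"
```

With maxRetries set to 3, the third failure ends in failed, so the limit counts total attempts rather than retries after the first attempt.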
The SCXML Implementation
<?xml version="1.0" encoding="UTF-8"?>
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0"
datamodel="ecmascript" initial="idle" name="ErrorHandling">
<datamodel>
<data id="retryCount" expr="0"/>
<data id="maxRetries" expr="3"/>
<data id="lastError" expr="''"/>
<data id="result" expr="null"/>
</datamodel>
<state id="idle">
<onentry>
<log label="Status" expr="'Ready. Send &quot;process&quot; to begin.'"/>
<assign location="retryCount" expr="0"/>
<assign location="lastError" expr="''"/>
</onentry>
<transition event="process" target="processing"/>
</state>
<state id="processing">
<onentry>
<log label="Status" expr="'Processing... (attempt ' + (retryCount + 1) + '/' + maxRetries + ')'"/>
<send event="operation_start" delay="500ms"/>
</onentry>
<!-- Success path -->
<transition event="success" target="success">
<assign location="result" expr="_event.data ? _event.data.result : 'completed'"/>
<log label="Success" expr="'Operation succeeded: ' + result"/>
</transition>
<!-- Multiple error paths - all route to handler -->
<transition event="failure" target="error_handler">
<assign location="lastError" expr="_event.data ? _event.data.message : 'Unknown error'"/>
</transition>
<!-- Wildcard catches all error.* events -->
<transition event="error.*" target="error_handler">
<assign location="lastError" expr="'System error: ' + _event.name"/>
</transition>
<transition event="timeout" target="error_handler">
<assign location="lastError" expr="'Operation timed out'"/>
</transition>
</state>
<state id="error_handler">
<onentry>
<log label="Error" expr="lastError"/>
<assign location="retryCount" expr="retryCount + 1"/>
</onentry>
<!-- Conditional: retry if under limit -->
<transition cond="retryCount &lt; maxRetries" target="retry_wait">
<log label="Retry" expr="'Will retry (' + retryCount + '/' + maxRetries + ')'"/>
</transition>
<!-- Otherwise: fail permanently -->
<transition cond="retryCount &gt;= maxRetries" target="failed">
<log label="Failed" expr="'Max retries exceeded'"/>
</transition>
</state>
<state id="retry_wait">
<onentry>
<log label="Status" expr="'Waiting before retry...'"/>
<send event="retry" delay="1s"/>
</onentry>
<transition event="retry" target="processing"/>
<transition event="cancel" target="cancelled"/>
</state>
<state id="success">
<onentry>
<log label="Complete" expr="'Operation completed successfully'"/>
</onentry>
<transition event="reset" target="idle"/>
</state>
<state id="failed">
<onentry>
<log label="Failed" expr="'Operation failed after ' + retryCount + ' attempts. Last error: ' + lastError"/>
</onentry>
<transition event="reset" target="idle"/>
<transition event="force_retry" target="processing">
<assign location="retryCount" expr="0"/>
</transition>
</state>
<state id="cancelled">
<onentry>
<log label="Cancelled" expr="'Operation cancelled by user'"/>
</onentry>
<transition event="reset" target="idle"/>
</state>
</scxml>
Key Concepts
Error Event Wildcards
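The error.* pattern matches the event named "error" and any event whose name starts with "error." (for example error.execution). Matching works on whole dot-separated tokens, so a name like "errors.custom" does not match: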
<transition event="error.*" target="error_handler">
<assign location="lastError" expr="'System error: ' + _event.name"/>
</transition>
This catches:
- error.communication
- error.platform
- error.execution
- Any other error.* event
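The matching rule can be sketched in a few lines of JavaScript. This is an illustration of prefix matching on dot-separated tokens as described in the W3C SCXML specification, not the code of any particular runtime; the function names matchesDescriptor and matchesEventAttribute are hypothetical.

```javascript
// Returns true when a single event descriptor matches an event name.
// A trailing ".*" (or ".") is optional, so "error.*" behaves like "error".
function matchesDescriptor(descriptor, eventName) {
  if (descriptor === '*') return true; // bare wildcard matches everything
  const base = descriptor.replace(/\.\*$/, '').replace(/\.$/, '');
  // Exact match, or the descriptor is a prefix ending on a token boundary
  return eventName === base || eventName.startsWith(base + '.');
}

// The event attribute may hold several space-separated descriptors.
function matchesEventAttribute(attribute, eventName) {
  return attribute.trim().split(/\s+/).some((d) => matchesDescriptor(d, eventName));
}

console.log(matchesDescriptor('error.*', 'error.communication')); // true
console.log(matchesDescriptor('error.*', 'errors.custom'));       // false
```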
Conditional Transitions
SCXML evaluates transitions in document order. The first matching transition wins:
<!-- Check retry limit -->
<transition cond="retryCount &lt; maxRetries" target="retry_wait"/>
<transition cond="retryCount &gt;= maxRetries" target="failed"/>
Note: Write &lt; and &gt; in place of < and > inside XML attribute values. A literal < is not allowed there.
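The first-match rule can be modelled as a loop over an ordered list of guards. The sketch below is a simplified illustration, not an SCXML interpreter; selectTransition and the object shape used for transitions are hypothetical.

```javascript
// Pick the target of the first transition whose guard passes.
// A transition without a cond is always enabled.
function selectTransition(transitions, data) {
  for (const t of transitions) {
    if (!t.cond || t.cond(data)) return t.target;
  }
  return null; // nothing enabled: the machine stays where it is
}

// Same order as the two transitions in error_handler
const errorHandlerTransitions = [
  { cond: (d) => d.retryCount < d.maxRetries, target: 'retry_wait' },
  { cond: (d) => d.retryCount >= d.maxRetries, target: 'failed' },
];

console.log(selectTransition(errorHandlerTransitions, { retryCount: 1, maxRetries: 3 })); // "retry_wait"
console.log(selectTransition(errorHandlerTransitions, { retryCount: 3, maxRetries: 3 })); // "failed"
```

Because the first match wins, the second guard could be dropped and the transition would act as a fallback. Keeping it explicit makes the intent visible and protects against reordering.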
Event Data Access
Extract information from event payloads:
<transition event="failure" target="error_handler">
<assign location="lastError" expr="_event.data ? _event.data.message : 'Unknown error'"/>
</transition>
The _event object contains:
- _event.name - Event name (e.g., "failure")
- _event.data - Event payload data
- _event.type - Event type (platform, internal, or external)
- _event.origin - Source of the event
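One detail of the ternary shown above: it falls back to 'Unknown error' only when _event.data is missing. If a payload arrives without a message field, the expression yields undefined. A stricter fallback could look like the sketch below, where errorMessageFrom is a hypothetical helper and the event object is a plain stand-in for _event.

```javascript
// Extract a usable error message from an event-like object.
// Falls back when data is missing OR when data has no non-empty message string.
function errorMessageFrom(event) {
  const data = event && event.data;
  if (data && typeof data.message === 'string' && data.message.length > 0) {
    return data.message;
  }
  return 'Unknown error';
}

console.log(errorMessageFrom({ name: 'failure', data: { message: 'Connection refused' } })); // "Connection refused"
console.log(errorMessageFrom({ name: 'failure', data: {} })); // "Unknown error"
```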
Advanced Patterns
Exponential Backoff
Increase delay between retries to reduce load:
<datamodel>
<data id="retryCount" expr="0"/>
<data id="baseDelay" expr="1000"/> <!-- 1 second -->
<data id="maxDelay" expr="30000"/> <!-- Cap at 30 seconds -->
<data id="delay" expr="0"/> <!-- Recomputed on each entry to retry_wait -->
</datamodel>
<state id="retry_wait">
<onentry>
<!-- Exponential delay: 1s, 2s, 4s, 8s... capped at maxDelay.
retryCount is already 1 on the first retry, hence the "- 1".
delay is declared in the datamodel so that later expressions can read it. -->
<assign location="delay" expr="Math.min(baseDelay * Math.pow(2, retryCount - 1), maxDelay)"/>
<log label="Retry" expr="'Waiting ' + delay + 'ms before retry...'"/>
<send event="retry" delayexpr="delay + 'ms'"/>
</onentry>
<transition event="retry" target="processing"/>
</state>
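The delay formula is easy to check outside the state machine. The following sketch reproduces the same arithmetic in plain JavaScript; backoffDelay is a hypothetical helper, and the defaults mirror the 1 second base and 30 second cap used above.

```javascript
// Delay in milliseconds before retry number `retryCount` (1-based).
function backoffDelay(retryCount, baseDelay = 1000, maxDelay = 30000) {
  const exponent = Math.max(retryCount - 1, 0); // first retry uses 2^0
  return Math.min(baseDelay * Math.pow(2, exponent), maxDelay);
}

const schedule = [1, 2, 3, 4, 5, 6, 7].map((n) => backoffDelay(n));
console.log(schedule.join(', ')); // 1000, 2000, 4000, 8000, 16000, 30000, 30000
```

The cap only takes effect from the sixth retry onward, so with a small retry limit such as 3 the delays are 1s and 2s.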
Circuit Breaker Pattern
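After repeated failures, "open" the circuit so that new attempts fail fast instead of running the failing operation again. The snippet below is partial: the half_open target state and the reset of failureCount after a success are not shown: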
<datamodel>
<data id="failureCount" expr="0"/>
<data id="circuitOpen" expr="false"/>
<data id="circuitOpenTime" expr="0"/>
</datamodel>
<state id="processing">
<!-- Check if circuit is open -->
<transition cond="circuitOpen" target="circuit_open">
<log label="Circuit" expr="'Circuit breaker is OPEN - fast fail'"/>
</transition>
<!-- Normal processing... -->
</state>
<state id="error_handler">
<onentry>
<assign location="failureCount" expr="failureCount + 1"/>
<!-- Open circuit after 5 consecutive failures -->
<if cond="failureCount &gt;= 5">
<assign location="circuitOpen" expr="true"/>
<assign location="circuitOpenTime" expr="Date.now()"/>
<log label="Circuit" expr="'Circuit breaker OPENED'"/>
</if>
</onentry>
</state>
<state id="circuit_open">
<onentry>
<!-- Try to close circuit after 30 seconds -->
<send event="circuit_check" delay="30s"/>
</onentry>
<transition event="circuit_check" target="half_open">
<assign location="circuitOpen" expr="false"/>
</transition>
</state>
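The SCXML above leaves the half-open behaviour undefined. One common convention is sketched below in plain JavaScript so the full closed / open / half-open cycle can be followed. The class name, method names, and the rule that any failure while half-open reopens the circuit are assumptions made for illustration; the clock is injectable so the timing can be tested.

```javascript
// Minimal circuit breaker model (illustrative, not generated code).
class CircuitBreaker {
  constructor({ threshold = 5, cooldownMs = 30000, now = () => Date.now() } = {}) {
    this.threshold = threshold;   // consecutive failures before opening
    this.cooldownMs = cooldownMs; // how long the circuit stays open
    this.now = now;               // injectable clock
    this.failureCount = 0;
    this.state = 'closed';
    this.openedAt = 0;
  }

  // True when an attempt may run; moves open -> half_open after the cooldown.
  canAttempt() {
    if (this.state === 'open' && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = 'half_open';
    }
    return this.state !== 'open';
  }

  // A success closes the circuit and clears the consecutive-failure count.
  recordSuccess() {
    this.failureCount = 0;
    this.state = 'closed';
  }

  // A failure opens the circuit at the threshold, or at once when half-open.
  recordFailure() {
    this.failureCount += 1;
    if (this.state === 'half_open' || this.failureCount >= this.threshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }
}
```

Resetting failureCount in recordSuccess is what makes the threshold count consecutive failures rather than failures over the machine's lifetime.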
Retry with Different Strategies
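Route to different handlers based on error type. List the specific descriptors before the error.* wildcard, because the first matching transition in document order wins:
<state id="processing">
<!-- Network errors: go straight to retry_wait, skipping the retry-count check -->
<transition event="error.network" target="retry_wait"/>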
<!-- Auth errors: re-authenticate first -->
<transition event="error.auth" target="authenticate"/>
<!-- Validation errors: don't retry, user must fix -->
<transition event="error.validation" target="validation_failed"/>
<!-- Unknown errors: use standard retry logic -->
<transition event="error.*" target="error_handler"/>
</state>
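The effect of ordering can be checked with a small routing table. The sketch below is an illustration only: route and matches are hypothetical helpers, and matches implements the same token-prefix rule that SCXML uses for event descriptors.

```javascript
// Ordered routing table: specific descriptors first, wildcard last.
const routes = [
  ['error.network', 'retry_wait'],
  ['error.auth', 'authenticate'],
  ['error.validation', 'validation_failed'],
  ['error.*', 'error_handler'],
];

// Token-prefix match; a trailing ".*" on the descriptor is optional.
function matches(descriptor, eventName) {
  const base = descriptor.replace(/\.\*$/, '');
  return eventName === base || eventName.startsWith(base + '.');
}

// Return the target of the first matching row, or null when nothing matches.
function route(eventName, table = routes) {
  const hit = table.find(([descriptor]) => matches(descriptor, eventName));
  return hit ? hit[1] : null;
}

console.log(route('error.auth'));      // "authenticate"
console.log(route('error.execution')); // "error_handler"
```

If the wildcard row were moved to the top, every error event would go to error_handler and the specific rows would never be reached.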
Code Generation
Java with ExecutorService
scxml-gen error-handling.scxml -t java -o ErrorHandling.java --package com.example
import java.util.Map;
import com.scxmlgen.runtime.executor.ContinuousStateMachineExecutor;
ErrorHandling sm = new ErrorHandling();
try (var executor = new ContinuousStateMachineExecutor(sm)) {
executor.start();
sm.send("process");
Thread.sleep(600); // Wait for operation_start
// Simulate failure
sm.send("failure", Map.of("message", "Connection refused"));
// Let the retry timer fire (1s delay), which re-enters processing
Thread.sleep(5000);
// The machine reaches "failed" only after maxRetries (3) failures
if (sm.isInState("failed")) {
System.out.println("Operation failed after retries");
}
}
JavaScript with Async/Await
import { ErrorHandling } from './ErrorHandling.js';
const sm = new ErrorHandling();
sm.onStateChange(() => console.log('State:', [...sm.getActiveStateIds()]));
sm.start();
// Process with simulated external operation
sm.send('process');
// Simulate async operation result
setTimeout(() => {
const success = Math.random() > 0.7; // 30% success rate
if (success) {
sm.send('success', { result: 'Data processed' });
} else {
sm.send('failure', { message: 'Server unavailable' });
}
}, 600);
C with Error Codes
#include <errno.h> /* ETIMEDOUT */
#include "error_handling.h"
ErrorHandling sm;
ErrorHandling_init(&sm);
ErrorHandling_start(&sm);
// Start processing
ErrorHandling_send(&sm, EVT_PROCESS, NULL);
// In your main loop, handle external operation results
void handle_operation_result(int error_code) {
if (error_code == 0) {
ErrorHandling_send(&sm, EVT_SUCCESS, NULL);
} else if (error_code == ETIMEDOUT) {
ErrorHandling_send(&sm, EVT_TIMEOUT, NULL);
} else {
ErrorHandling_send(&sm, EVT_FAILURE, NULL);
}
}
Testing Error Handling
Unit Test Strategy
@Test
void shouldRetryOnFailure() throws InterruptedException {
ErrorHandling sm = new ErrorHandling();
RunToCompletionStateMachineExecutor exec =
new RunToCompletionStateMachineExecutor(sm);
exec.start();
sm.send("process");
// First failure
sm.send("failure");
assertTrue(sm.isInState("error_handler") || sm.isInState("retry_wait"));
// Process delayed events (retry timer)
while (sm.processDelayedEvents()) {
Thread.sleep(100);
}
// Should be back in processing for retry
assertTrue(sm.isInState("processing"));
}
@Test
void shouldFailAfterMaxRetries() {
ErrorHandling sm = new ErrorHandling();
// ... setup ...
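// Start once; "process" is only handled in idle
sm.send("process");
// Fail 3 times; the machine must be back in processing before each failure
for (int i = 0; i < 3; i++) {
sm.send("failure");
// Process retry timers...
}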
assertTrue(sm.isInState("failed"));
}
Best Practices
1. Always Have an Exit Path
Every state should eventually lead to a terminal or recovery state:
<!-- ✅ Good: timeout prevents infinite wait -->
<state id="waiting_for_response">
<onentry>
<send event="timeout" delay="30s"/>
</onentry>
<transition event="response" target="process_response"/>
<transition event="timeout" target="error_handler"/>
</state>
2. Log State Transitions for Debugging
<state id="error_handler">
<onentry>
<log label="ERROR" expr="'Handler entered. Count: ' + retryCount + ', Error: ' + lastError"/>
</onentry>
</state>
3. Preserve Error Context
<datamodel>
<data id="errorHistory" expr="[]"/>
</datamodel>
<state id="error_handler">
<onentry>
<script>
errorHistory.push({
time: Date.now(),
error: lastError,
attempt: retryCount
});
</script>
</onentry>
</state>
4. Allow Manual Override
<state id="failed">
<!-- Normal reset -->
<transition event="reset" target="idle"/>
<!-- Manual override for operators -->
<transition event="force_retry" target="processing">
<assign location="retryCount" expr="0"/>
<log label="Override" expr="'Manual retry triggered'"/>
</transition>
</state>
Summary
| Pattern | Use Case | Key Elements |
|---|---|---|
| Basic Retry | Transient failures | Counter + conditional transition |
| Exponential Backoff | Rate limiting, API calls | Dynamic delay calculation |
| Circuit Breaker | Prevent cascade failures | Failure counter + timeout |
| Error Routing | Different error types | Event wildcards + specific handlers |
Files
| File | Description |
|---|---|
| error-handling.scxml | SCXML source file |
| error-handling-player.html | Interactive demo |
Next Steps
- Done Events - Compound state completion patterns
- Deep History - State restoration after errors
- Invoke Feature - Child machine error handling
- W3C Compliance - Error event specifications