Error Handling & Retry Patterns

Build robust state machines with automatic retry logic, error recovery, and graceful degradation. This example demonstrates production-ready patterns for handling failures in state machines.

What you'll learn:

  • Error event handling with error.* wildcards
  • Automatic retry with configurable limits
  • Exponential backoff patterns
  • Graceful failure states and recovery options
  • Conditional transitions for error routing

Try It

In the interactive demo (error-handling-player.html), click Start, then process to begin. Use success, failure, or timeout to simulate different outcomes.

How to explore:

  1. Click process to start the operation
  2. Wait for operation_start (500ms delay), then send:
    • success - Operation completes successfully
    • failure - Triggers retry logic (up to 3 attempts)
    • timeout - Same as failure, triggers retry
  3. During retry wait, you can send cancel to abort
  4. After the retry limit is reached, send force_retry to reset the counter and try again
  5. reset returns to idle from any terminal state

The Retry Pattern

A robust retry mechanism handles transient failures while preventing infinite loops:

                    ┌─────────────┐
                    │    idle     │
                    └──────┬──────┘
                           │ process
                           ▼
                    ┌─────────────┐
              ┌────▶│  processing │◀────────────────────┐
              │     └──────┬──────┘                     │
              │            │                            │
              │    success │    failure/timeout/error.* │
              │            │                            │
              │            ▼                            │
              │     ┌─────────────┐              ┌──────┴──────┐
              │     │   success   │              │error_handler│
              │     └─────────────┘              └──────┬──────┘
              │                                        │
              │                          retryCount < max?
              │                          ┌─────┴─────┐
              │                         YES          NO
              │                          │            │
              │                          ▼            ▼
              │                   ┌────────────┐ ┌─────────┐
              │                   │ retry_wait │ │ failed  │
              │                   └──────┬─────┘ └─────────┘
              │                          │
              │                    retry (1s delay)
              └──────────────────────────┘
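
The diagram can also be read as a small transition function. The following is a minimal plain-JavaScript sketch of that flow, not the code that scxml-gen generates; the names createContext and step are hypothetical and exist only for illustration. It assumes error_handler is a transient state that increments the counter and routes immediately.

```javascript
// Hypothetical model of the retry diagram; not scxml-gen output.
function createContext(maxRetries = 3) {
  return { state: 'idle', retryCount: 0, maxRetries, lastError: '' };
}

// Apply one event to the context and return the resulting state name.
function step(ctx, event) {
  switch (ctx.state) {
    case 'idle':
      if (event === 'process') ctx.state = 'processing';
      break;
    case 'processing':
      if (event === 'success') {
        ctx.state = 'success';
      } else if (event === 'failure' || event === 'timeout' || event.startsWith('error.')) {
        ctx.lastError = event;
        ctx.retryCount += 1; // error_handler increments the counter on entry
        // error_handler then routes on the counter without waiting for an event
        ctx.state = ctx.retryCount < ctx.maxRetries ? 'retry_wait' : 'failed';
      }
      break;
    case 'retry_wait':
      if (event === 'retry') ctx.state = 'processing';
      break;
    default:
      // success and failed ignore further events in this sketch
      break;
  }
  return ctx.state;
}
```

Because the counter is incremented before it is compared, maxRetries = 3 allows three attempts in total: the first try plus two retries.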

The SCXML Implementation

xml
<?xml version="1.0" encoding="UTF-8"?>
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0"
       datamodel="ecmascript" initial="idle" name="ErrorHandling">

  <datamodel>
    <data id="retryCount" expr="0"/>
    <data id="maxRetries" expr="3"/>
    <data id="lastError" expr="''"/>
    <data id="result" expr="null"/>
  </datamodel>

  <state id="idle">
    <onentry>
      <log label="Status" expr="'Ready. Send &quot;process&quot; to begin.'"/>
      <assign location="retryCount" expr="0"/>
      <assign location="lastError" expr="''"/>
    </onentry>
    <transition event="process" target="processing"/>
  </state>

  <state id="processing">
    <onentry>
      <log label="Status" expr="'Processing... (attempt ' + (retryCount + 1) + '/' + maxRetries + ')'"/>
      <send event="operation_start" delay="500ms"/>
    </onentry>

    <!-- Success path -->
    <transition event="success" target="success">
      <assign location="result" expr="_event.data ? _event.data.result : 'completed'"/>
      <log label="Success" expr="'Operation succeeded: ' + result"/>
    </transition>

    <!-- Multiple error paths - all route to handler -->
    <transition event="failure" target="error_handler">
      <assign location="lastError" expr="_event.data ? _event.data.message : 'Unknown error'"/>
    </transition>

    <!-- Wildcard catches all error.* events -->
    <transition event="error.*" target="error_handler">
      <assign location="lastError" expr="'System error: ' + _event.name"/>
    </transition>

    <transition event="timeout" target="error_handler">
      <assign location="lastError" expr="'Operation timed out'"/>
    </transition>
  </state>

  <state id="error_handler">
    <onentry>
      <log label="Error" expr="lastError"/>
      <assign location="retryCount" expr="retryCount + 1"/>
    </onentry>

    <!-- Conditional: retry if under limit -->
    <transition cond="retryCount &lt; maxRetries" target="retry_wait">
      <log label="Retry" expr="'Will retry (' + retryCount + '/' + maxRetries + ')'"/>
    </transition>

    <!-- Otherwise: fail permanently -->
    <transition cond="retryCount &gt;= maxRetries" target="failed">
      <log label="Failed" expr="'Max retries exceeded'"/>
    </transition>
  </state>

  <state id="retry_wait">
    <onentry>
      <log label="Status" expr="'Waiting before retry...'"/>
      <send event="retry" delay="1s"/>
    </onentry>
    <transition event="retry" target="processing"/>
    <transition event="cancel" target="cancelled"/>
  </state>

  <state id="success">
    <onentry>
      <log label="Complete" expr="'Operation completed successfully'"/>
    </onentry>
    <transition event="reset" target="idle"/>
  </state>

  <state id="failed">
    <onentry>
      <log label="Failed" expr="'Operation failed after ' + retryCount + ' attempts. Last error: ' + lastError"/>
    </onentry>
    <transition event="reset" target="idle"/>
    <transition event="force_retry" target="processing">
      <assign location="retryCount" expr="0"/>
    </transition>
  </state>

  <state id="cancelled">
    <onentry>
      <log label="Cancelled" expr="'Operation cancelled by user'"/>
    </onentry>
    <transition event="reset" target="idle"/>
  </state>
</scxml>

Key Concepts

Error Event Wildcards

The error.* pattern matches the event named "error" and any event whose name begins with "error.":

xml
<transition event="error.*" target="error_handler">
  <assign location="lastError" expr="'System error: ' + _event.name"/>
</transition>

This catches:

  • error.communication
  • error.platform
  • error.execution
  • Any other error.* event
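
To make the matching rule concrete, here is a minimal sketch of token-based descriptor matching as the W3C SCXML specification describes it: names are split on dots and the descriptor must be a token-wise prefix of the event name. The function name matchesDescriptor is hypothetical, and a particular runtime may implement matching differently.

```javascript
// Returns true when an SCXML-style event descriptor matches an event name.
function matchesDescriptor(descriptor, eventName) {
  if (descriptor === '*') return true; // bare wildcard matches every event
  // A trailing ".*" is optional: "error.*" and "error" are equivalent.
  const base = descriptor.endsWith('.*') ? descriptor.slice(0, -2) : descriptor;
  const want = base.split('.');
  const have = eventName.split('.');
  if (want.length > have.length) return false;
  // Compare whole tokens, so "error" never matches "errors.fatal".
  return want.every((token, i) => token === have[i]);
}
```

Matching on whole tokens rather than raw string prefixes is what keeps an unrelated event such as errors.fatal out of the error handler.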

Conditional Transitions

SCXML evaluates a state's transitions in document order and takes the first one whose event and cond both match:

xml
<!-- Check retry limit -->
<transition cond="retryCount &lt; maxRetries" target="retry_wait"/>
<transition cond="retryCount &gt;= maxRetries" target="failed"/>

Note: In XML attribute values, < must be written as &lt;. Escaping > as &gt; is optional, but using both keeps conditions consistent.

Event Data Access

Extract information from event payloads:

xml
<transition event="failure" target="error_handler">
  <assign location="lastError" expr="_event.data ? _event.data.message : 'Unknown error'"/>
</transition>

The _event object contains:

  • _event.name - Event name (e.g., "failure")
  • _event.data - Event payload data
  • _event.type - Event type (platform, internal, or external)
  • _event.origin - URI of the event's sender (set for external events)

Advanced Patterns

Exponential Backoff

Increase delay between retries to reduce load:

xml
<datamodel>
  <data id="retryCount" expr="0"/>
  <data id="baseDelay" expr="1000"/>  <!-- 1 second -->
  <!-- Declared here so <log> and <send> can read it however the runtime scopes <script> variables -->
  <data id="retryDelay" expr="0"/>
</datamodel>

<state id="retry_wait">
  <onentry>
    <!-- Exponential delay: 1s, 2s, 4s, 8s..., capped at 30 seconds.
         retryCount is already 1 or more here because error_handler increments it. -->
    <assign location="retryDelay" expr="Math.min(baseDelay * Math.pow(2, retryCount - 1), 30000)"/>
    <log label="Retry" expr="'Waiting ' + retryDelay + 'ms before retry...'"/>
    <send event="retry" delayexpr="retryDelay + 'ms'"/>
  </onentry>
  <transition event="retry" target="processing"/>
</state>
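
The delay formula is easy to check outside the state machine. This sketch reproduces it in plain JavaScript; the function name backoffDelay and its default arguments are illustrative assumptions that mirror the baseDelay and 30-second cap used above.

```javascript
// Delay in milliseconds before retry number `retryCount` (1-based).
function backoffDelay(retryCount, baseDelay = 1000, maxDelay = 30000) {
  const exponent = Math.max(0, retryCount - 1); // guard against a count of 0
  return Math.min(baseDelay * Math.pow(2, exponent), maxDelay);
}

// Delays for the first seven retries.
const schedule = [1, 2, 3, 4, 5, 6, 7].map((n) => backoffDelay(n));
console.log(schedule.join(', '));
// prints "1000, 2000, 4000, 8000, 16000, 30000, 30000"
```

The cap takes effect at the sixth retry, where the uncapped value would be 32000 ms.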

Circuit Breaker Pattern

Prevent repeated failures by "opening" the circuit:

xml
<datamodel>
  <data id="failureCount" expr="0"/>
  <data id="circuitOpen" expr="false"/>
  <data id="circuitOpenTime" expr="0"/>
</datamodel>

<state id="processing">
  <!-- Check if circuit is open -->
  <transition cond="circuitOpen" target="circuit_open">
    <log label="Circuit" expr="'Circuit breaker is OPEN - fast fail'"/>
  </transition>
  <!-- Normal processing... -->
</state>

<state id="error_handler">
  <onentry>
    <assign location="failureCount" expr="failureCount + 1"/>
    <!-- Open circuit after 5 consecutive failures.
         Reset failureCount to 0 on the success path so only consecutive failures count. -->
    <if cond="failureCount &gt;= 5">
      <assign location="circuitOpen" expr="true"/>
      <assign location="circuitOpenTime" expr="Date.now()"/>
      <log label="Circuit" expr="'Circuit breaker OPENED'"/>
    </if>
  </onentry>
</state>

<state id="circuit_open">
  <onentry>
    <!-- Try to close circuit after 30 seconds -->
    <send event="circuit_check" delay="30s"/>
  </onentry>
  <!-- half_open is not shown here: it should allow one trial request,
       close the circuit on success, and reopen it on failure -->
  <transition event="circuit_check" target="half_open">
    <assign location="circuitOpen" expr="false"/>
  </transition>
</state>
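
The same closed, open, and half-open cycle can be sketched outside SCXML. The class below is a hypothetical illustration, not part of any generated runtime; its names (CircuitBreaker, allowRequest, recordSuccess, recordFailure) and the injectable now clock are assumptions made so the behaviour can be tested without waiting 30 seconds.

```javascript
// Hypothetical circuit breaker mirroring the SCXML fragment above.
class CircuitBreaker {
  constructor({ threshold = 5, cooldownMs = 30000, now = Date.now } = {}) {
    this.threshold = threshold;   // consecutive failures before opening
    this.cooldownMs = cooldownMs; // how long the circuit stays open
    this.now = now;               // injectable clock, useful in tests
    this.failureCount = 0;
    this.state = 'closed';
    this.openedAt = 0;
  }

  // True if a request may be attempted right now.
  allowRequest() {
    if (this.state === 'open' && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = 'half_open'; // cooldown elapsed: permit a trial request
    }
    return this.state !== 'open';
  }

  // A success closes the circuit and clears the failure streak.
  recordSuccess() {
    this.failureCount = 0;
    this.state = 'closed';
  }

  // A failure opens the circuit at the threshold, or at once when half-open.
  recordFailure() {
    this.failureCount += 1;
    if (this.state === 'half_open' || this.failureCount >= this.threshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }
}
```

A caller would check allowRequest() before each operation and report the outcome with recordSuccess() or recordFailure(). Resetting the counter on success is what makes the threshold apply to consecutive failures only.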

Retry with Different Strategies

Route to different handlers based on error type:

xml
<state id="processing">
  <!-- Network errors: skip error_handler and go straight to the retry wait -->
  <transition event="error.network" target="retry_wait"/>

  <!-- Auth errors: re-authenticate first -->
  <transition event="error.auth" target="authenticate"/>

  <!-- Validation errors: don't retry, user must fix -->
  <transition event="error.validation" target="validation_failed"/>

  <!-- Unknown errors: use standard retry logic.
       Keep this wildcard last; in document order it would otherwise shadow the specific handlers above. -->
  <transition event="error.*" target="error_handler"/>
</state>
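
The order of these transitions matters because the first match wins. The sketch below models the routing as an ordered table in plain JavaScript to show what happens when the wildcard is moved to the top; the names routes, descriptorMatches, and route are hypothetical.

```javascript
// Routes are checked top to bottom, like transitions in document order.
const routes = [
  ['error.network', 'retry_wait'],
  ['error.auth', 'authenticate'],
  ['error.validation', 'validation_failed'],
  ['error.*', 'error_handler'],
];

// Descriptor matches its own name or any dotted extension of it.
function descriptorMatches(descriptor, eventName) {
  const base = descriptor.replace(/\.\*$/, ''); // trailing ".*" is optional
  return eventName === base || eventName.startsWith(base + '.');
}

// Return the target of the first matching route, or null if none match.
function route(eventName, table = routes) {
  const hit = table.find(([descriptor]) => descriptorMatches(descriptor, eventName));
  return hit ? hit[1] : null;
}

// With the wildcard first, every error lands in error_handler.
const wildcardFirst = [routes[3], ...routes.slice(0, 3)];
console.log(route('error.auth'));                // prints "authenticate"
console.log(route('error.auth', wildcardFirst)); // prints "error_handler"
```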

Code Generation

Java with ExecutorService

bash
scxml-gen error-handling.scxml -t java -o ErrorHandling.java --package com.example

java
import java.util.Map;

import com.scxmlgen.runtime.executor.ContinuousStateMachineExecutor;

ErrorHandling sm = new ErrorHandling();

try (var executor = new ContinuousStateMachineExecutor(sm)) {
    executor.start();

    sm.send("process");

    // Fail every attempt: with maxRetries = 3, the third failure ends in "failed"
    for (int attempt = 1; attempt <= 3; attempt++) {
        Thread.sleep(600);   // Wait for operation_start
        sm.send("failure", Map.of("message", "Connection refused"));
        Thread.sleep(1200);  // Let the 1s retry timer fire
    }

    // Check final state
    if (sm.isInState("failed")) {
        System.out.println("Operation failed after retries");
    }
}

JavaScript with Async/Await

javascript
import { ErrorHandling } from './ErrorHandling.js';

const sm = new ErrorHandling();
sm.onStateChange(() => console.log('State:', [...sm.getActiveStateIds()]));
sm.start();

// Process with simulated external operation
sm.send('process');

// Simulate async operation result
setTimeout(() => {
    const success = Math.random() > 0.7;  // 30% success rate
    if (success) {
        sm.send('success', { result: 'Data processed' });
    } else {
        sm.send('failure', { message: 'Server unavailable' });
    }
}, 600);

C with Error Codes

c
#include <errno.h>  // ETIMEDOUT
#include "error_handling.h"

static ErrorHandling sm;

// Call once at startup (function name is illustrative)
void start_processing(void) {
    ErrorHandling_init(&sm);
    ErrorHandling_start(&sm);

    // Start processing
    ErrorHandling_send(&sm, EVT_PROCESS, NULL);
}

// In your main loop, handle external operation results
void handle_operation_result(int error_code) {
    if (error_code == 0) {
        ErrorHandling_send(&sm, EVT_SUCCESS, NULL);
    } else if (error_code == ETIMEDOUT) {
        ErrorHandling_send(&sm, EVT_TIMEOUT, NULL);
    } else {
        ErrorHandling_send(&sm, EVT_FAILURE, NULL);
    }
}

Testing Error Handling

Unit Test Strategy

java
@Test
void shouldRetryOnFailure() throws InterruptedException {
    ErrorHandling sm = new ErrorHandling();
    RunToCompletionStateMachineExecutor exec =
        new RunToCompletionStateMachineExecutor(sm);
    exec.start();

    sm.send("process");

    // First failure
    sm.send("failure");
    assertTrue(sm.isInState("error_handler") || sm.isInState("retry_wait"));

    // Process delayed events (retry timer)
    while (sm.processDelayedEvents()) {
        Thread.sleep(100);
    }

    // Should be back in processing for retry
    assertTrue(sm.isInState("processing"));
}

@Test
void shouldFailAfterMaxRetries() {
    ErrorHandling sm = new ErrorHandling();
    // ... setup ...

    // Send "process" once; each retry re-enters processing on its own
    sm.send("process");

    // Fail 3 times
    for (int i = 0; i < 3; i++) {
        sm.send("failure");
        // Process retry timers...
    }

    assertTrue(sm.isInState("failed"));
}

Best Practices

1. Always Have an Exit Path

Every state should eventually lead to a terminal or recovery state:

xml
<!-- ✅ Good: timeout prevents infinite wait -->
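<state id="waiting_for_response">
  <onentry>
    <send id="response_timeout" event="timeout" delay="30s"/>
  </onentry>
  <!-- Cancel the pending timer on exit so a stale timeout cannot fire later -->
  <onexit>
    <cancel sendid="response_timeout"/>
  </onexit>
  <transition event="response" target="process_response"/>
  <transition event="timeout" target="error_handler"/>
</state>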

2. Log State Transitions for Debugging

xml
<state id="error_handler">
  <onentry>
    <log label="ERROR" expr="'Handler entered. Count: ' + retryCount + ', Error: ' + lastError"/>
  </onentry>
</state>

3. Preserve Error Context

xml
<datamodel>
  <data id="errorHistory" expr="[]"/>
</datamodel>

<state id="error_handler">
  <onentry>
    <script>
      errorHistory.push({
        time: Date.now(),
        error: lastError,
        attempt: retryCount
      });
    </script>
  </onentry>
</state>

4. Allow Manual Override

xml
<state id="failed">
  <!-- Normal reset -->
  <transition event="reset" target="idle"/>

  <!-- Manual override for operators -->
  <transition event="force_retry" target="processing">
    <assign location="retryCount" expr="0"/>
    <log label="Override" expr="'Manual retry triggered'"/>
  </transition>
</state>

Summary

| Pattern | Use Case | Key Elements |
|---|---|---|
| Basic Retry | Transient failures | Counter + conditional transition |
| Exponential Backoff | Rate limiting, API calls | Dynamic delay calculation |
| Circuit Breaker | Prevent cascade failures | Failure counter + timeout |
| Error Routing | Different error types | Event wildcards + specific handlers |

Files

| File | Description |
|---|---|
| error-handling.scxml | SCXML source file |
| error-handling-player.html | Interactive demo |

Next Steps