Wyoming Protocol Specification
Wyoming is a binary-safe event protocol optimized for streaming. Each message consists of:
- A single-line JSON header
- Optional data segment: additional JSON metadata (e.g. structured config)
- Optional binary payload (e.g. audio stream)
Message Format
{ "type": "...", "data_length": ..., "payload_length": ... }\n
<data bytes> (optional)
<payload bytes> (optional)
- type: The event type (e.g. audio-start, transcribe)
- data_length: Length in bytes of the optional metadata (JSON)
- payload_length: Length in bytes of the binary payload
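The layout above can be sketched as a small serializer. This is a minimal illustration using only the Python standard library, not the reference implementation; the function name `encode_message` and its parameters are assumptions made for this example.

```python
import json


def encode_message(event_type, data=None, payload=b""):
    """Serialize one message: header line, then data bytes, then payload bytes.

    `encode_message` is a hypothetical helper written for this example.
    """
    # The optional metadata is sent as UTF-8 encoded JSON.
    data_bytes = json.dumps(data).encode("utf-8") if data else b""
    header = {
        "type": event_type,
        "data_length": len(data_bytes),
        "payload_length": len(payload),
    }
    # json.dumps escapes newlines inside strings, so the header stays on one line.
    header_line = json.dumps(header).encode("utf-8") + b"\n"
    return header_line + data_bytes + payload
```

Because both lengths are declared in the header, the payload can contain any byte values, including newlines, without confusing the receiver.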
Example: Audio Stream
{ "type": "audio-chunk", "data_length": 0, "payload_length": 3200 }
Followed by 3200 bytes of PCM audio.
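On the receiving side, a reader could parse the header line first and then consume exactly the number of bytes it declares. The sketch below assumes a blocking binary stream (for example a file object wrapping a socket); `read_exact` and `read_message` are hypothetical names, and only the separate data segment is handled here.

```python
import io
import json


def read_exact(stream, count):
    """Read exactly `count` bytes, or raise EOFError if the stream ends early."""
    buffer = b""
    while len(buffer) < count:
        chunk = stream.read(count - len(buffer))
        if not chunk:
            raise EOFError("stream closed in the middle of a message")
        buffer += chunk
    return buffer


def read_message(stream):
    """Return (type, data, payload) for the next message, or None at end of stream."""
    header_line = stream.readline()
    if not header_line:
        return None
    header = json.loads(header_line)
    data_length = header.get("data_length") or 0
    payload_length = header.get("payload_length") or 0
    # The data segment comes first, then the payload.
    data = json.loads(read_exact(stream, data_length)) if data_length else {}
    payload = read_exact(stream, payload_length) if payload_length else b""
    return header["type"], data, payload


# The example message above, with silence standing in for real PCM audio.
raw = b'{ "type": "audio-chunk", "data_length": 0, "payload_length": 3200 }\n' + bytes(3200)
event_type, data, payload = read_message(io.BytesIO(raw))
```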
Streaming Audio Flow (Mermaid)
sequenceDiagram
participant Mic
participant Wake
participant ASR
Mic->>Wake: AudioStart
Mic-->>Wake: AudioChunk (stream)
Wake-->>ASR: Detection (wakeword)
Mic->>ASR: AudioStart
Mic-->>ASR: AudioChunk (stream)
Mic->>ASR: AudioStop
ASR-->>Mic: Transcript
Message Lifecycle
Each Wyoming component speaks the same protocol. Typical flows include:
ASR:
Transcribe → AudioStart + AudioChunk + AudioStop → Transcript
TTS:
Synthesize → AudioStart + AudioChunk + AudioStop
Wakeword:
Detect → AudioChunk* → Detection or NotDetected
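One compact way to check that a recorded sequence of events follows one of these flows is to write each flow as a regular expression over event names. This is only a sketch: the `FLOWS` table and `matches_flow` are hypothetical, and it assumes that the ASR and TTS flows carry one or more AudioChunk events, which the notation above does not state explicitly.

```python
import re

# Each flow written as a regular expression over space-separated event names.
# Assumption: ASR and TTS send at least one AudioChunk; wakeword allows zero.
FLOWS = {
    "asr": r"Transcribe AudioStart( AudioChunk)+ AudioStop Transcript",
    "tts": r"Synthesize AudioStart( AudioChunk)+ AudioStop",
    "wakeword": r"Detect( AudioChunk)* (Detection|NotDetected)",
}


def matches_flow(flow, events):
    """Return True if the list of event names follows the named flow."""
    return re.fullmatch(FLOWS[flow], " ".join(events)) is not None


asr_ok = matches_flow(
    "asr",
    ["Transcribe", "AudioStart", "AudioChunk", "AudioChunk", "AudioStop", "Transcript"],
)
```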
Streaming vs One-Shot
Unlike one-shot HTTP request/response APIs, Wyoming supports true streaming:
- Send audio while it's being recorded
- Receive results as they become available
- Interact by voice in real time
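Sending audio while it is being recorded amounts to writing each chunk as its own message as soon as it arrives, rather than buffering the whole recording. The sketch below assumes any iterable of byte chunks as the audio source and any writable binary stream as the destination; `stream_audio` and `fake_microphone` are hypothetical names.

```python
import io
import json


def stream_audio(chunks, out):
    """Write each audio chunk as its own audio-chunk message as soon as it arrives."""
    sent = 0
    for chunk in chunks:
        header = {"type": "audio-chunk", "data_length": 0, "payload_length": len(chunk)}
        out.write(json.dumps(header).encode("utf-8") + b"\n" + chunk)
        out.flush()  # push the chunk out now instead of waiting for the recording to end
        sent += 1
    return sent


def fake_microphone(count=3, chunk_size=3200):
    """Stand-in for a live recording: yields silent chunks one at a time."""
    for _ in range(count):
        yield bytes(chunk_size)


out = io.BytesIO()
sent = stream_audio(fake_microphone(), out)
```

With a real connection, `out` would be the write side of a socket and the generator would block until the next chunk has been captured.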
Error Handling
Errors are returned as Error events:
{ "type": "error", "data": { "message": "Invalid audio format" } }
Clients should ignore or log unknown event types rather than fail, and should be able to reconnect to a service after a dropped connection.
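A client-side dispatcher might treat these two cases as follows. This is a sketch under assumptions: `RemoteError`, `dispatch_header`, and the `handlers` mapping are hypothetical names, and the event data is read from the inline "data" field as in the example above.

```python
import json


class RemoteError(Exception):
    """Raised when the service answers with an error event (hypothetical class)."""


def dispatch_header(header_line, handlers):
    """Route one parsed header to a handler; unknown types are skipped, not fatal."""
    header = json.loads(header_line)
    event_type = header.get("type")
    data = header.get("data") or {}
    if event_type == "error":
        raise RemoteError(data.get("message", "unknown error"))
    handler = handlers.get(event_type)
    if handler is None:
        return None  # unknown event type: ignore it and keep the connection alive
    return handler(data)


try:
    dispatch_header(b'{ "type": "error", "data": { "message": "Invalid audio format" } }', {})
    error_text = None
except RemoteError as error:
    error_text = str(error)
```

When skipping an unknown event, a full reader would still need to consume the data and payload bytes declared in its header so that the next header line is read from the right position.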
Compatibility
- Every message header is a single line of JSON terminated by a newline
- Binary segments follow directly after the header
- Works across languages and platforms