
Voice API

Build interactive voice applications with text-to-speech, DTMF input, call recording, and real-time transcription.

Text-to-Speech

Natural-sounding voices in 40+ languages and accents.

Call Recording

Record calls with automatic transcription and storage.

IVR Builder

Create interactive voice menus with DTMF and speech input.

Quick Start

Make your first outbound call with text-to-speech. The API responds immediately with a call object in the queued status, and the call itself is placed asynchronously.

POST /v1/voice/calls
curl -X POST https://api.canarymsg.dev/v1/voice/calls \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '
  {
    "to": "+15551234567",
    "from": "+15559876543",
    "tts": {
      "text": "Hello! This is a call from Canary. Press 1 to confirm.",
      "voice": "en-US-Neural2-F"
    }
  }
'

Response

{
  "id": "call_abc123xyz",
  "to": "+15551234567",
  "from": "+15559876543",
  "status": "queued",
  "direction": "outbound",
  "created_at": "2024-01-01T12:00:00Z"
}
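If you are not using an SDK, you can send the same request with Python's standard library. This is a minimal sketch that assumes the endpoint and headers are exactly as shown in the curl example. The helper names build_call_request and place_call are hypothetical and are not part of a Canary SDK.

```python
import json
import urllib.request

# Base URL and headers copied from the curl example above.
API_BASE = "https://api.canarymsg.dev/v1"


def build_call_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/voice/calls request."""
    return urllib.request.Request(
        f"{API_BASE}/voice/calls",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def place_call(api_key: str, payload: dict) -> dict:
    """Send the request and return the decoded JSON body (a call object, initially "queued")."""
    request = build_call_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Same body as the curl example.
payload = {
    "to": "+15551234567",
    "from": "+15559876543",
    "tts": {
        "text": "Hello! This is a call from Canary. Press 1 to confirm.",
        "voice": "en-US-Neural2-F",
    },
}
```

Calling place_call("YOUR_API_KEY", payload) sends the request. For non-2xx responses, urlopen raises urllib.error.HTTPError, so catch it if you need to inspect the error body.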

Request Parameters

Parameter | Type | Required | Description
--- | --- | --- | ---
to | string | Yes | Recipient phone number in E.164 format
from | string | Yes | Caller ID (must be a verified Canary number)
tts | object | * | Text-to-speech configuration
audio_url | string | * | URL of an audio file to play (MP3 or WAV)
record | boolean | Optional | Enable call recording
transcribe | boolean | Optional | Enable real-time transcription
gather | object | Optional | Gather DTMF or speech input
webhook_url | string | Optional | URL for call status webhooks
timeout | integer | Optional | Ring timeout in seconds (default: 30)

* Either tts or audio_url is required
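You can check some of these rules before sending the request. The sketch below is a hypothetical client-side helper; validate_call_params is not part of the API. It checks E.164 formatting with a regular expression and enforces the tts/audio_url requirement. It also assumes that timeout must be a positive integer. Only the API can confirm that from is a verified Canary number.

```python
import re

# E.164: "+", a non-zero first digit, 2 to 15 digits in total.
E164 = re.compile(r"\+[1-9][0-9]{1,14}")


def validate_call_params(params: dict) -> list:
    """Return a list of problems in a /v1/voice/calls body; an empty list means none found."""
    errors = []
    for field in ("to", "from"):
        value = params.get(field)
        if not isinstance(value, str) or not E164.fullmatch(value):
            errors.append(f"{field} must be an E.164 number such as +15551234567")
    if "tts" not in params and "audio_url" not in params:
        errors.append("either tts or audio_url is required")
    timeout = params.get("timeout", 30)  # documented default: 30 seconds
    # Assumption: the API wants a positive whole number of seconds (bool is excluded).
    if isinstance(timeout, bool) or not isinstance(timeout, int) or timeout <= 0:
        errors.append("timeout must be a positive integer number of seconds")
    return errors
```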

Text-to-Speech

Convert text to natural-sounding speech with support for multiple voices, languages, and SSML markup.

TTS Configuration

{
  "tts": {
    "text": "Hello! Your verification code is <say-as interpret-as='digits'>123456</say-as>",
    "voice": "en-US-Neural2-F",
    "speed": 1.0,
    "pitch": 0
  }
}
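If the text includes user-supplied content, escape it before combining it with SSML tags. This sketch assumes the tts fields shown above. verification_tts is a hypothetical helper, and this page does not document the accepted ranges for speed and pitch.

```python
from xml.sax.saxutils import escape


def verification_tts(code: str, intro: str = "Hello! Your verification code is",
                     voice: str = "en-US-Neural2-F",
                     speed: float = 1.0, pitch: float = 0) -> dict:
    """Build a tts object that reads `code` digit by digit with SSML <say-as>."""
    if not (code.isascii() and code.isdigit()):
        raise ValueError("code must contain only the digits 0-9")
    # escape() turns &, < and > into entities so free text cannot break the markup.
    text = f"{escape(intro)} <say-as interpret-as='digits'>{code}</say-as>"
    return {"tts": {"text": text, "voice": voice, "speed": speed, "pitch": pitch}}
```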

Available Voices

Voice ID | Language | Gender | Type
--- | --- | --- | ---
en-US-Neural2-F | English (US) | Female | Neural
en-US-Neural2-D | English (US) | Male | Neural
en-GB-Neural2-A | English (UK) | Female | Neural
es-ES-Neural2-A | Spanish (Spain) | Female | Neural
fr-FR-Neural2-A | French | Female | Neural
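The voice IDs in the table begin with a locale prefix (en-US, en-GB, es-ES, fr-FR). The sketch below uses that pattern to pick a voice by locale and optional gender. The pattern is observed in the table, not a documented guarantee. pick_voice is hypothetical and only knows the five voices listed.

```python
from typing import Optional

# Voice IDs copied from the table above; the full catalogue is larger.
VOICES = {
    "en-US-Neural2-F": ("English (US)", "Female"),
    "en-US-Neural2-D": ("English (US)", "Male"),
    "en-GB-Neural2-A": ("English (UK)", "Female"),
    "es-ES-Neural2-A": ("Spanish (Spain)", "Female"),
    "fr-FR-Neural2-A": ("French", "Female"),
}


def pick_voice(locale: str, gender: Optional[str] = None) -> str:
    """Return the first listed voice whose ID starts with `locale` ("en" or "en-GB")."""
    for voice_id, (_language, voice_gender) in VOICES.items():
        if voice_id.startswith(locale + "-") and gender in (None, voice_gender):
            return voice_id
    raise LookupError(f"no listed voice for locale={locale!r}, gender={gender!r}")
```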

Gather DTMF Input

Collect keypad input from callers to build interactive voice menus. You can gather a specific number of digits or wait for a terminating key.

{
  "to": "+15551234567",
  "from": "+15559876543",
  "tts": {
    "text": "Please enter your 4-digit PIN followed by the pound key."
  },
  "gather": {
    "type": "dtmf",
    "num_digits": 4,
    "finish_on_key": "#",
    "timeout": 10,
    "webhook_url": "https://yourapp.com/webhooks/dtmf"
  }
}

Webhook Payload

{
  "call_id": "call_abc123xyz",
  "type": "gather.completed",
  "digits": "1234",
  "finished_on_key": "#"
}
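A receiver for this payload needs to parse the JSON body and act on gather.completed events. The sketch below uses Python's http.server. Several parts are assumptions: the handler names, the fixed EXPECTED_PIN, and the 204 acknowledgement. This page does not say what response Canary expects from your webhook or how webhook requests are authenticated, and the sketch performs no authentication.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler

# Hypothetical: a real app would look up the expected PIN per caller.
EXPECTED_PIN = "1234"


def handle_gather_event(body: bytes) -> dict:
    """Interpret a gather webhook body; ignore event types other than gather.completed."""
    event = json.loads(body)
    if event.get("type") != "gather.completed":
        return {"ignored": True}
    digits = str(event.get("digits", ""))
    # Constant-time comparison so response timing does not leak the PIN.
    pin_ok = hmac.compare_digest(digits.encode("utf-8"), EXPECTED_PIN.encode("utf-8"))
    return {"call_id": event["call_id"], "pin_ok": pin_ok}


class DtmfWebhook(BaseHTTPRequestHandler):
    """Accepts POSTs at any path and acknowledges them with 204 No Content."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        result = handle_gather_event(self.rfile.read(length))
        self.log_message("gather result: %s", result)
        self.send_response(204)
        self.end_headers()
```

To try it locally, run HTTPServer(("", 8000), DtmfWebhook).serve_forever() (HTTPServer also comes from http.server). Then expose that port at the URL you pass as gather.webhook_url.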

Call Recording

Record calls for quality assurance, compliance, or training. Recordings are stored securely and can be transcribed automatically.

{
  "to": "+15551234567",
  "from": "+15559876543",
  "tts": {
    "text": "This call may be recorded for quality purposes."
  },
  "record": true,
  "transcribe": true,
  "recording_channels": "dual"
}

Get Recording

GET /v1/voice/calls/:id/recording
{
  "id": "rec_xyz789",
  "call_id": "call_abc123xyz",
  "duration": 45,
  "url": "https://recordings.canarymsg.dev/rec_xyz789.mp3",
  "transcription": {
    "text": "Hello, this is a recorded message...",
    "confidence": 0.95
  }
}
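You can request and interpret the recording endpoint in a similar way. This sketch makes two assumptions: the Bearer header from the Quick Start also applies to this GET, and transcription may be missing when transcribe was not enabled. recording_request and summarize_recording are illustrative helpers, and the 0.9 confidence threshold is an arbitrary choice, not API behavior.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.canarymsg.dev/v1"


def recording_request(api_key: str, call_id: str) -> urllib.request.Request:
    """Build a GET /v1/voice/calls/:id/recording request (no body, so urllib uses GET)."""
    path = f"/voice/calls/{urllib.parse.quote(call_id, safe='')}/recording"
    return urllib.request.Request(API_BASE + path,
                                  headers={"Authorization": f"Bearer {api_key}"})


def summarize_recording(body: bytes, min_confidence: float = 0.9) -> dict:
    """Keep the transcript only when its confidence clears the threshold."""
    rec = json.loads(body)
    transcription = rec.get("transcription") or {}
    confident = transcription.get("confidence", 0.0) >= min_confidence
    return {
        "audio_url": rec["url"],
        "duration": rec["duration"],
        "transcript": transcription.get("text") if confident else None,
        "needs_review": not confident,
    }
```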

Call Status

Status | Description
--- | ---
queued | Call is queued to be placed
ringing | Call is ringing at the destination
in-progress | Call is connected and in progress
completed | Call ended normally
busy | Destination was busy
no-answer | No answer before the ring timeout elapsed
failed | Call failed to connect
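The queued, ringing, and in-progress statuses are transitional, and the other four statuses mean the call has ended. Using webhook_url avoids polling entirely. If you do poll, a loop like the one below stops at the first final status. fetch_status is a hypothetical callable you supply, because this page does not document an endpoint for reading call status. The interval and max_polls values are arbitrary defaults.

```python
import time
from typing import Callable

# Statuses from the table above.
ACTIVE = {"queued", "ringing", "in-progress"}          # may still change
TERMINAL = {"completed", "busy", "no-answer", "failed"}  # the call is over


def wait_for_final_status(fetch_status: Callable[[], str], interval: float = 2.0,
                          max_polls: int = 60,
                          sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll fetch_status until it returns a terminal status or max_polls is reached."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL:
            return status
        if status not in ACTIVE:
            raise ValueError(f"unexpected call status: {status!r}")
        sleep(interval)
    raise TimeoutError(f"call still active after {max_polls} polls")
```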

SDK Examples

Node.js

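// ES module syntax: the top-level await below needs a .mjs file or "type": "module" in package.json.
import Canary from '@canary/node';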

const canary = new Canary('YOUR_API_KEY');

const call = await canary.voice.call({
  to: '+15551234567',
  from: '+15559876543',
  tts: {
    text: 'Hello from Canary!',
    voice: 'en-US-Neural2-F'
  },
  record: true
});

console.log('Call initiated:', call.id);

Python

from canary import Canary

canary = Canary("YOUR_API_KEY")

call = canary.voice.call(
    to="+15551234567",
    from_="+15559876543",
    tts={
        "text": "Hello from Canary!",
        "voice": "en-US-Neural2-F"
    },
    record=True
)

print(f"Call initiated: {call.id}")

Next Steps