Voice agents overview

Learn how to build voice-enabled applications with the Speechmatics Voice SDK

The Voice SDK builds on our Realtime API to provide features optimized for conversational AI:

  • Intelligent segmentation: groups words into meaningful speech segments per speaker.
  • Turn detection: automatically detects when speakers finish talking.
  • Speaker management: focus on or ignore specific speakers in multi-speaker scenarios.
  • Preset configurations: offers ready-to-use settings for conversations, note-taking, and captions.
  • Simplified event handling: delivers clean, structured segments instead of raw word-level events.
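As a rough illustration of what intelligent segmentation means, the sketch below groups word-level events into per-speaker segments in plain Python. This is a conceptual example only, not the SDK's actual algorithm; the Voice SDK performs this work (plus turn detection) for you and emits structured segments.

```python
def group_into_segments(words):
    """Group word-level events into per-speaker segments.

    Conceptual sketch: consecutive words from the same speaker are
    merged into one segment; a speaker change starts a new segment.
    """
    segments = []
    for word in words:
        if segments and segments[-1]["speaker_id"] == word["speaker_id"]:
            segments[-1]["text"] += " " + word["text"]
        else:
            segments.append(
                {"speaker_id": word["speaker_id"], "text": word["text"]}
            )
    return segments


words = [
    {"speaker_id": "S1", "text": "Hello"},
    {"speaker_id": "S1", "text": "there"},
    {"speaker_id": "S2", "text": "Hi"},
]
print(group_into_segments(words))
# → [{'speaker_id': 'S1', 'text': 'Hello there'}, {'speaker_id': 'S2', 'text': 'Hi'}]
```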

Voice SDK vs Realtime SDK

Use the Voice SDK when:

  • Building conversational AI or voice agents
  • You need automatic turn detection
  • You want speaker-focused transcription
  • You need ready-to-use presets for common scenarios

Use the Realtime SDK when:

  • You need the raw stream of word-by-word transcription data
  • Building custom segmentation logic
  • You want fine-grained control over every event
  • Processing audio files or custom workflows

Getting started

1. Create an API key

Create an API key in the portal to access the Voice SDK. Store your key securely as a managed secret.
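For example, you can export the key as an environment variable before running your application. The variable name below matches the one read by the code example later on this page; adjust it if your code reads a different name:

```shell
export YOUR_API_KEY="paste-your-api-key-here"
```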

2. Install dependencies

# Standard installation
pip install speechmatics-voice

# With SMART_TURN (ML-based turn detection)
pip install speechmatics-voice[smart]

3. Configure

The example below reads your key from the YOUR_API_KEY environment variable. Export your API key from the portal under that name (or change the variable name to match your setup) before running:

import asyncio
import os

from speechmatics.rt import Microphone
from speechmatics.voice import VoiceAgentClient, AgentServerMessageType


async def main():
    # Create client with preset
    client = VoiceAgentClient(
        api_key=os.getenv("YOUR_API_KEY"),
        preset="scribe",
    )

    # Handle final segments
    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message["segments"]:
            speaker = segment["speaker_id"]
            text = segment["text"]
            print(f"{speaker}: {text}")

    # Set up the microphone
    mic = Microphone(sample_rate=16000, chunk_size=320)
    if not mic.start():
        print("Error: Microphone not available")
        return

    # Connect and stream
    await client.connect()

    try:
        while True:
            audio_chunk = await mic.read(320)
            await client.send_audio(audio_chunk)
    except KeyboardInterrupt:
        pass
    finally:
        await client.disconnect()


if __name__ == "__main__":
    asyncio.run(main())
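The segment handler above can be exercised in isolation without a live connection. Assuming an ADD_SEGMENT message carries a "segments" list with "speaker_id" and "text" fields as shown in the example (the shape is inferred from that example, not an official schema), a standalone formatter looks like:

```python
def format_segments(message):
    """Render each segment in an ADD_SEGMENT-style payload as 'speaker: text'.

    The message shape here mirrors the fields used in the quickstart
    example; it is an illustration, not the SDK's documented schema.
    """
    return [
        f'{segment["speaker_id"]}: {segment["text"]}'
        for segment in message["segments"]
    ]


message = {"segments": [{"speaker_id": "S1", "text": "Hello world"}]}
print(format_segments(message))
# → ['S1: Hello world']
```

Keeping the formatting logic in a plain function like this also makes the event handler itself trivial to unit-test.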

FAQ

Implementation and deployment

Can I deploy this in my own environment?

Yes! The Voice SDK can be consumed via our managed service or deployed in your own environment. To learn more about on-premises deployment options, speak to sales.

Support

Where can I provide feedback or get help?

You can submit feedback, bug reports, or feature requests through the Speechmatics GitHub discussions.

Next steps

For more information, see the Voice SDK on GitHub.

To learn more, check out the Speechmatics Academy.

Building something amazing?

We'd love to hear about your project and help you succeed.

Get in touch with us:

  • Share your feedback and feature requests
  • Ask questions about implementation
  • Discuss enterprise pricing and custom voices
  • Report any issues or bugs you encounter

Contact our team or join our developer community to connect with other builders in voice AI.