Product overview

Real-Time Speech to Text

Agora's Real-Time Speech to Text (STT) transcribes live voice streams to deliver closed captions and transcripts for enhanced accessibility. Advanced features such as silent audio removal help optimize performance and reduce costs.

Transcribed text can be translated into multiple languages in real time or used as input for AI models like GPT, connecting real-time engagement with AI-powered applications.

Start building with

SDK quickstart

Customize your experience from the start with our flexible Video SDK.

API reference

Samples

Product Features

Screen sharing and collaboration

Enable screen sharing or interactive whiteboards that allow users to draw, annotate, and share content from multiple devices simultaneously.

Call Recording

Record video calls in the cloud or on premises, with control over the format, storage path, and quality.

Multiple audio and video tracks

Publish multiple audio and video tracks to one or more channels from a single instance, with support for multi-channel capture cameras and microphones.

High-quality video at scale

Consistent high-quality video from 1:1 calls to thousands of concurrent users, even under challenging network conditions.

AI-powered audio enhancement

High-quality audio with 3D spatial audio, AI noise suppression, and gain control for an immersive audio experience.

Global coverage

Agora’s software-defined, real-time network (SD-RTN) supports video users in over 200 countries and regions.
