Real-time Transcription

Stream configuration template ids

production: 670ba9e0efa59fe6aecd56f1

Input data

multimedia content: wav, mp3, mp4, webm, flv, etc.

Egress responses

Egress Response
- Transcription Response
- LLM Response

Export response

Audio Intelligence Response

Parameters

Transcription

language

string

default:"en"

The language to transcribe the audio in, given as an ISO language code. When not specified, the language is detected automatically.

multichannel

bool

default:"false"

Enable or disable multichannel audio processing. Possible values: true or false. When enabled, each channel is processed separately.

utteranceEnd

int

default:"1000"

The minimum waiting time, in milliseconds, after the last detected speech before an utterance transcription message is emitted.
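
As a rough illustration, the three transcription parameters above could be collected into a flat map of strings before being attached to a stream request. This page does not say how parameters are transmitted, so the `transcription_params` helper and its string serialization are assumptions made for the sketch, not a documented wire format.

```python
def transcription_params(language=None, multichannel=False, utterance_end=1000):
    """Collect the transcription parameters into a flat dict of strings.

    Hypothetical helper: the parameter names come from this page, but the
    serialization (lowercase booleans, decimal integers) is an assumption.
    """
    if utterance_end < 0:
        raise ValueError("utteranceEnd is a duration in milliseconds and cannot be negative")
    params = {
        "multichannel": "true" if multichannel else "false",
        "utteranceEnd": str(utterance_end),
    }
    # Leave the language out entirely when the caller does not provide one.
    if language is not None:
        params["language"] = language
    return params
```

For example, `transcription_params("en", multichannel=True, utterance_end=500)` yields a map with `language`, `multichannel` and `utteranceEnd` entries.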

Raw data format parameters

Use these parameters only when sending raw data.

encoding

string

default:"none"

The encoding of the audio data. Accepted values:

linear16: 16-bit signed little-endian samples (Linear PCM)
flac: Free Lossless Audio Codec
mulaw: 8-bit samples (G.711 mu-law)
opus: OGG Opus

sampleRate

int

default:"none"

The sample rate of the audio data in Hertz (Hz).

channels

int

default:"none"

The number of audio channels in the audio data.
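
When the raw samples originate from a WAV file, the values for `sampleRate` and `channels` can be read from the file header before the headerless payload is sent. The sketch below uses Python's standard `wave` module; the `raw_format_params` and `make_silent_wav` helpers are hypothetical names introduced for illustration, and the sketch only covers the `linear16` case.

```python
import io
import wave


def raw_format_params(wav_bytes):
    """Derive encoding, sampleRate and channels from an in-memory PCM WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        # linear16 means 16-bit samples, i.e. 2 bytes per sample.
        if wav.getsampwidth() != 2:
            raise ValueError("linear16 requires 16-bit samples")
        return {
            "encoding": "linear16",
            "sampleRate": wav.getframerate(),
            "channels": wav.getnchannels(),
        }


def make_silent_wav(sample_rate=16000, channels=1, frames=160):
    """Build a short silent 16-bit PCM WAV file, used here only as sample input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * frames * channels)
    return buffer.getvalue()
```

Calling `raw_format_params(make_silent_wav(8000, 2))` reports a sample rate of 8000 Hz and two channels.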

Audio Intelligence

summary

bool

default:"false"

Create a summary based on the upstream content.

summaryLength

string

default:"brief"

Summary length. Options are: brief or detailed.

summaryTone

string

default:"conversational"

Summary tone. Options are: conversational or informative.

summaryStructure

string

default:"paragraphs"

Summary structure. Options are: paragraphs or bullet_points.
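
Because the three summary options each accept a fixed set of values, a client may want to check them before opening a stream. The following is a minimal sketch under that assumption; `SUMMARY_OPTIONS` and `summary_params` are hypothetical names, and the allowed values are copied from the descriptions above.

```python
SUMMARY_OPTIONS = {
    "summaryLength": ("brief", "detailed"),
    "summaryTone": ("conversational", "informative"),
    "summaryStructure": ("paragraphs", "bullet_points"),
}


def summary_params(**overrides):
    """Return summary parameters, rejecting values this page does not list."""
    # Enabling the summary is implied by asking for summary options.
    params = {"summary": "true"}
    for name, value in overrides.items():
        allowed = SUMMARY_OPTIONS.get(name)
        if allowed is None:
            raise KeyError(f"unknown summary parameter: {name}")
        if value not in allowed:
            raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
        params[name] = value
    return params
```

For instance, `summary_params(summaryLength="detailed", summaryStructure="bullet_points")` passes the check, while `summary_params(summaryTone="formal")` raises a `ValueError`.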

sentimentAnalysis

bool

default:"false"

Perform sentiment analysis on the upstream content.

ask[0-N]

string

default:"none"

Specifies the custom prompt content for one of the ask0, ask1, …, askN ask anything slots. When the content is specified, the prompt is considered activated. Otherwise, it is deactivated.

ask[0-N]System

string

default:"none"

Specifies the custom system prompt for one of the ask0System, ask1System, …, askNSystem ask anything slots.

ask[0-N]Id

string

default:"none"

Specifies the prompt id for one of the ask0Id, ask1Id, …, askNId ask anything slots. The prompt id identifies the prompt in the responses. When unspecified, it falls back to the index of the slot (e.g. "0", "1").

ask[0-N]Format

string

default:"none"

Specifies the prompt response JSON Schema encoded as a string for one of the ask0Format, ask1Format, …, askNFormat ask anything slots.

When unspecified, the response is not structured in any particular way. When specified, the response is a JSON object encoded as a string.
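
The four per-slot parameters can be assembled for a given slot index as sketched below, with the JSON Schema serialized to a string via `json.dumps`. The `ask_slot_params` and `response_id` helpers are hypothetical, and how the resulting map is sent to the service is not covered here.

```python
import json


def ask_slot_params(index, prompt, system=None, prompt_id=None, schema=None):
    """Build the askN, askNSystem, askNId and askNFormat entries for one slot."""
    params = {f"ask{index}": prompt}  # specifying the content activates the slot
    if system is not None:
        params[f"ask{index}System"] = system
    if prompt_id is not None:
        params[f"ask{index}Id"] = prompt_id
    if schema is not None:
        # The format parameter is a JSON Schema encoded as a string.
        params[f"ask{index}Format"] = json.dumps(schema)
    return params


def response_id(index, prompt_id=None):
    """Id expected in responses: the explicit id, else the slot index as a string."""
    return prompt_id if prompt_id is not None else str(index)
```

A structured response for such a slot arrives as a JSON object encoded as a string, so a client would typically pass it through `json.loads` before use.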
