
Stream configuration template ids

Input data

  • multimedia content: wav, mp3, mp4, webm, flv, etc.
  • a public HTTP or HTTPS link to a multimedia file, encoded in UTF-8

Egress responses

Export response

Parameters

language
string | list
default:"none"
The language, as an ISO language code, to transcribe the audio in. When not specified, it is automatically detected. When multiple languages are specified, the model will auto-detect one of the specified languages and perform multilingual code switching between them. If none is specified, the model will switch between all supported languages. To provide multiple languages in the URL, simply add multiple language parameters. For example: language=en&language=es.
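The repeated-parameter form can be produced with `urlencode(..., doseq=True)`, which expands a list value into one parameter per element. A minimal sketch; the base URL below is a placeholder, not the actual endpoint:

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute your actual stream URL.
BASE = "https://api.example.com/v1/stream"

# doseq=True expands the list into repeated parameters:
# language=en&language=es
params = urlencode({"language": ["en", "es"]}, doseq=True)
url = f"{BASE}?{params}"
print(url)  # https://api.example.com/v1/stream?language=en&language=es
```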
noSpeechThreshold
float
default:"0.5"
The threshold value to consider a segment as a speech segment. The value ranges from 0 to 1.
vad
bool
default:"true"
Enable or disable voice activity detection. Possible values: true or false.
splitStereo
bool
default:"false"
Enable or disable splitting stereo audio into two mono audio streams. Possible values: true or false.
The splitStereo parameter should not be used in this configuration
speakersNumber
int
default:"none"
The exact number of speakers to use for diarization. When not specified, it is automatically detected.
minSpeakersNumber
int
default:"none"
The minimum number of speakers to use for diarization.
maxSpeakersNumber
int
default:"none"
The maximum number of speakers to use for diarization.
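When the speaker count is uncertain, the two bound parameters can be combined instead of fixing speakersNumber. A sketch of the resulting query string, using the parameter names from the reference above:

```python
from urllib.parse import urlencode

# Let the model detect the speaker count within a bounded range
# rather than pinning it with speakersNumber.
params = urlencode({
    "minSpeakersNumber": 2,
    "maxSpeakersNumber": 4,
})
print(params)  # minSpeakersNumber=2&maxSpeakersNumber=4
```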
The audio intelligence layer operates only over the transcription responses, excluding the diarization information.
summary
bool
default:"false"
Create a summary based on the upstream content.
summaryLength
string
default:"brief"
Summary length. Options are: brief or detailed.
summaryTone
string
default:"conversational"
Summary tone. Options are: conversational or informative.
summaryStructure
string
default:"paragraphs"
Summary structure. Options are: paragraphs or bullet_points.
sentimentAnalysis
bool
default:"false"
Perform sentiment analysis on the upstream content.
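The summary and sentiment options compose freely in a single request. A sketch, passing booleans as lowercase strings to match the defaults shown above:

```python
from urllib.parse import urlencode

# Enable a detailed, bullet-point summary plus sentiment analysis
# in one configuration. Values mirror the documented options.
params = urlencode({
    "summary": "true",
    "summaryLength": "detailed",
    "summaryTone": "informative",
    "summaryStructure": "bullet_points",
    "sentimentAnalysis": "true",
})
print(params)
```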
ask[0-N]
string
default:"none"
Specifies the custom prompt content for one of the ask0, ask1, …, askN ask anything slots. When the content is specified, the prompt is considered activated. Otherwise, it is deactivated.
ask[0-N]System
string
default:"none"
Specifies the custom system prompt for one of the ask0System, ask1System, …, askNSystem ask anything slots.
ask[0-N]Id
string
default:"none"
Specifies the prompt id for one of the ask0Id, ask1Id, …, askNId ask anything slots. The role of the prompt id is to identify the prompt in the responses. When unspecified, it falls back to the index of the slot (e.g. "0", "1").
ask[0-N]Format
string
default:"none"
Specifies the prompt response JSON Schema, encoded as a string, for one of the ask0Format, ask1Format, …, askNFormat ask anything slots. When unspecified, the response will not be structured in any particular way. When specified, the response will be a JSON object encoded as a string.
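A minimal sketch of filling one ask anything slot with a structured response format. The prompt text, id, and schema below are illustrative, not part of the API; the round trip shows the schema surviving URL encoding intact:

```python
import json
from urllib.parse import urlencode, parse_qs

# Illustrative JSON Schema for the slot's structured response.
schema = {
    "type": "object",
    "properties": {"topic": {"type": "string"}},
    "required": ["topic"],
}

# Fill slot 0: prompt, response id, and schema encoded as a string.
params = urlencode({
    "ask0": "What is the main topic of the recording?",
    "ask0Id": "main_topic",
    "ask0Format": json.dumps(schema),
})

# Round-trip check: decode the query string and recover the schema.
decoded = json.loads(parse_qs(params)["ask0Format"][0])
```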
enhancedTranscription
bool
default:"false"
Enable transcription enhancement.
etModel
[standard|enhanced]
default:"standard"
Balance the speed-accuracy ratio of the transcription enhancement model.
etVocabulary
string[]
default:"null"
Provide a list of words to enhance the transcription with. These can be domain-specific terms that the model should pay special attention to.
etSystemPrompt
string
default:"null"
Provide a system prompt to guide the transcription enhancement model. This can help the model understand the context or specific requirements for the transcription.
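The enhancement options above can be sketched as one query string. The vocabulary terms and system prompt are illustrative, and passing the list as repeated parameters (doseq=True) is an assumption, mirroring how multiple language values are provided:

```python
from urllib.parse import urlencode

# Illustrative enhanced-transcription configuration; the vocabulary
# list is expanded into repeated etVocabulary parameters (assumed).
params = urlencode({
    "enhancedTranscription": "true",
    "etModel": "enhanced",
    "etVocabulary": ["tachycardia", "stent"],
    "etSystemPrompt": "Medical consultation transcript.",
}, doseq=True)
print(params)
```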