Text-to-speech

curl --request POST \
  --url https://api.infery.ai/v1/audio/speech \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "input": "<string>",
  "voice": "<string>",
  "response_format": "mp3",
  "speed": 1
}
'

curl https://api.infery.ai/v1/audio/speech \
  -H "Authorization: Bearer $INFERY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello from Infery",
    "voice": "alloy"
  }' \
  --output out.mp3

Parameters

voice: model-dependent (for example alloy, echo, onyx, nova, shimmer)

response_format: mp3 (default), wav, opus, flac, pcm

speed: 0.25 to 4.0 (default 1)

Authorizations

Authorization (string, header, required)

API key in format: Bearer inf_***
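
As a minimal sketch of the header format above, the helper below prints the Authorization header for a key and refuses keys that lack the inf_ prefix. The function name auth_header is hypothetical, and the prefix check assumes every key starts with inf_, as the format string suggests; drop the check if that turns out not to hold.

```shell
# Hypothetical helper: print the Authorization header for the API key in $1.
# Assumes keys always begin with "inf_", as the documented format suggests.
auth_header() {
  case "$1" in
    inf_?*) printf 'Authorization: Bearer %s\n' "$1" ;;
    *) echo "API key must start with inf_" >&2; return 1 ;;
  esac
}
```

It can be passed straight to curl, for example: curl ... -H "$(auth_header "$INFERY_API_KEY")".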

Body

application/json

model (string, required)

Model ID to use for TTS

input (string, required)

Text to synthesize into speech

voice (string, required)

Voice to use for synthesis

response_format (enum<string>, default: mp3)

Available options: mp3, opus, aac, flac

speed (number, default: 1)

Speed of the generated audio (0.25 to 4.0)
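
The body fields can be checked locally before a request is sent. The sketch below is one possible pre-flight check, not part of the API: the function names valid_speed, valid_format, and tts_body are hypothetical. It accepts only the four response_format options listed in the schema above; the Parameters and Response sections also mention wav and pcm, so extend the list if the server accepts them. The server remains the authority on which values are valid.

```shell
# Hypothetical pre-flight checks for the request body.

# Succeeds when $1 is a plain decimal number between 0.25 and 4.0 inclusive.
valid_speed() {
  printf '%s\n' "$1" | awk '
    /^[0-9]+(\.[0-9]+)?$/ && $1 >= 0.25 && $1 <= 4.0 { ok = 1 }
    END { exit !ok }'
}

# Succeeds when $1 is one of the response_format options in the Body schema.
valid_format() {
  case "$1" in
    mp3|opus|aac|flac) return 0 ;;
    *) return 1 ;;
  esac
}

# Print a JSON body: tts_body MODEL INPUT VOICE [FORMAT] [SPEED].
# FORMAT and SPEED fall back to the documented defaults (mp3 and 1).
# Assumes MODEL, INPUT, and VOICE contain no characters that need JSON escaping.
tts_body() {
  fmt="${4:-mp3}"
  spd="${5:-1}"
  valid_format "$fmt" || { echo "unsupported response_format: $fmt" >&2; return 1; }
  valid_speed "$spd" || { echo "speed out of range: $spd" >&2; return 1; }
  printf '{"model":"%s","input":"%s","voice":"%s","response_format":"%s","speed":%s}\n' \
    "$1" "$2" "$3" "$fmt" "$spd"
}
```

The output of tts_body can be handed to curl with -d "$(tts_body tts-1 "Hello from Infery" alloy)". For text that may contain quotes or newlines, a real JSON encoder is a safer choice than printf.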

Response

Binary audio stream. Content-Type reflects the requested response_format: audio/mpeg (mp3, default), audio/wav, audio/ogg (opus), audio/flac, audio/aac, or audio/pcm. Credits deducted are returned in the x-credits-used response header.

The response is of type file.
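
To illustrate the response handling described above, the sketch below maps a response_format value to the Content-Type listed in this section and reads x-credits-used from a header dump. The function names content_type_for and credits_used are hypothetical; the header file is assumed to have been written by curl's -D (--dump-header) option.

```shell
# Map a response_format value to the Content-Type listed in the Response section.
# With no argument, the documented default (mp3) is used.
content_type_for() {
  case "${1:-mp3}" in
    mp3)  echo "audio/mpeg" ;;
    wav)  echo "audio/wav" ;;
    opus) echo "audio/ogg" ;;
    flac) echo "audio/flac" ;;
    aac)  echo "audio/aac" ;;
    pcm)  echo "audio/pcm" ;;
    *)    return 1 ;;
  esac
}

# Print the x-credits-used value from a header dump file ($1).
# Header names are matched case-insensitively; carriage returns are stripped first.
credits_used() {
  tr -d '\r' < "$1" |
    awk -F': *' 'tolower($1) == "x-credits-used" { print $2; exit }'
}
```

A request that saves both the audio and the headers could look like: curl ... -D headers.txt --output out.mp3, followed by credits_used headers.txt.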
