POST /v1/audio/speech
Text-to-speech
curl --request POST \
  --url https://api.infery.ai/v1/audio/speech \
  --header 'Authorization: Bearer <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "input": "<string>",
  "voice": "<string>",
  "response_format": "mp3",
  "speed": 1
}
'
curl https://api.infery.ai/v1/audio/speech \
  -H "Authorization: Bearer $INFERY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello from Infery",
    "voice": "alloy"
  }' --output out.mp3
The response body is the binary audio. Its Content-Type (for example audio/mpeg or audio/wav) depends on response_format.
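The same call can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_speech_request` and `synthesize_to_file` are illustrative and not part of any Infery SDK, and the default model and voice are simply the values from the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.infery.ai/v1/audio/speech"


def build_speech_request(api_key, text, model="tts-1", voice="alloy",
                         response_format="mp3", speed=1.0):
    """Build (but do not send) the POST request for the speech endpoint."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key, text, path, **options):
    """Send the request and write the returned audio bytes to `path`."""
    request = build_speech_request(api_key, text, **options)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
    return path
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and JSON body without a network call. A typical call would look like `synthesize_to_file(os.environ["INFERY_API_KEY"], "Hello from Infery", "out.mp3")`.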

Sample output

Generated with gemini-2.5-flash-preview-tts, voice Kore. Download tts.wav.

Parameters

  • voice: model-dependent (alloy, echo, onyx, nova, shimmer, etc.)
  • response_format: mp3, wav, opus, flac, pcm
  • speed: 0.25–4.0
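A client may want to check these values before sending a request. The sketch below is one hedged way to do that: `check_speech_params` is a hypothetical helper, and the set of formats is an assumption that merges every format named on this page. Whether a given model accepts each format is not confirmed here. The voice is not checked because it is model-dependent.

```python
# Assumption: union of the formats mentioned on this page; the server may accept fewer.
KNOWN_FORMATS = {"mp3", "wav", "opus", "flac", "pcm", "aac"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0  # documented speed range


def check_speech_params(response_format="mp3", speed=1.0):
    """Raise ValueError if a value falls outside the documented ranges."""
    if response_format not in KNOWN_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(
            f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}"
        )
    return {"response_format": response_format, "speed": speed}
```

The returned dictionary can be merged into the JSON request body once both checks pass.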

Authorizations

Authorization (string, header, required)

API key in format: Bearer inf_***
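Because the header value must carry the `Bearer` scheme, a small helper can normalize a key before use. This is only a sketch: `authorization_header` is a hypothetical name, and it does not verify the `inf_` prefix, since the exact key format beyond what is shown above is not documented here.

```python
def authorization_header(api_key):
    """Return the Authorization header dict, adding the Bearer scheme if missing."""
    key = api_key.strip()
    if key.lower().startswith("bearer "):
        # Strip an existing scheme so it is not duplicated.
        key = key[len("bearer "):].strip()
    if not key:
        raise ValueError("API key is empty")
    return {"Authorization": f"Bearer {key}"}
```

Accepting both a bare key and a value that already starts with `Bearer` avoids sending a doubled scheme such as `Bearer Bearer inf_...`.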

Body

application/json
model (string, required)

Model ID to use for TTS

input (string, required)

Text to synthesize into speech

voice (string, required)

Voice to use for synthesis

response_format (enum<string>, default: mp3)

Available options: mp3, opus, aac, flac

speed (number, default: 1)

Speed of the generated audio (0.25 to 4.0)

Response

Binary audio stream. Content-Type reflects the requested response_format: audio/mpeg (mp3, default), audio/wav, audio/ogg (opus), audio/flac, audio/aac, or audio/pcm. Credits deducted are returned in the x-credits-used response header.

The response is of type file.
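On the client side, the response headers can be used to pick a file extension and to read the credit usage. The sketch below assumes the Content-Type values listed above; `describe_response` and the extension chosen for each type (for example `opus` for audio/ogg) are illustrative choices, and the value of x-credits-used is returned as a raw string because its format is not specified here.

```python
CONTENT_TYPE_TO_EXTENSION = {
    "audio/mpeg": "mp3",
    "audio/wav": "wav",
    "audio/ogg": "opus",  # assumption: Ogg container carrying Opus audio
    "audio/flac": "flac",
    "audio/aac": "aac",
    "audio/pcm": "pcm",
}


def describe_response(headers):
    """Return (file_extension, credits_used) from a mapping of response headers."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Drop any parameters after ";" before the lookup.
    media_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    extension = CONTENT_TYPE_TO_EXTENSION.get(media_type)  # None if unrecognized
    credits_used = lowered.get("x-credits-used")  # raw string, or None if absent
    return extension, credits_used
```

With `urllib.request`, the mapping can be obtained as `dict(response.headers.items())` inside the `with urlopen(...)` block.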