For the endpoint contract see Music API. This guide covers the practical side: picking a model, prompting, operation modes, costs.

Sample output

30-second clip generated with lyria-3-clip-preview (Google Lyria). Prompt: “Uplifting modern electronic track with warm synths, gentle beat, optimistic mood.”

Pick a model

| Model | Strong for | Duration | Notes |
|---|---|---|---|
| lyria-3-clip (Google) | Instrumentals, atmosphere, demos | Fixed 30 s | English-only prompts; fast; no vocals |
| lyria-3-pro (Google) | Longer, higher-fidelity instrumentals | Up to ~2 min | English-only; richer mix |
| suno-v4 | Full songs with vocals, any genre | 30 s – 5 min | Lyrics + singing supported |
| suno-v5 | Latest Suno: better vocals, sound effects | 30 s – 5 min | Supports sounds operation |
Filter the catalog: GET /v1/models?modality=music. Each model exposes allowedParams.response_formats, allowedParams.operations, and allowedParams.default_params. Rule of thumb:
  • Background / UI / ad beds → Lyria (instrumental, cheap, fast).
  • Songs with lyrics, creator content → Suno.
  • Sound effects, short stingers → Suno V5 sounds operation.
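The catalog filter above can be turned into a small selection helper. This is a minimal sketch, assuming the endpoint returns a list of model objects that each carry an `id` plus the `allowedParams` fields named above. The `id` key, the list shape, and the `models_supporting` name are assumptions for illustration, not confirmed by this guide.

```python
from typing import Any


def models_supporting(catalog: list[dict[str, Any]], operation: str) -> list[str]:
    """Return ids of catalog entries whose allowedParams.operations includes `operation`."""
    matches = []
    for model in catalog:
        allowed = model.get("allowedParams", {})
        if operation in allowed.get("operations", []):
            matches.append(model["id"])
    return matches


# Hypothetical sample shaped like a GET /v1/models?modality=music response.
sample_catalog = [
    {"id": "lyria-3-clip", "allowedParams": {"operations": ["generate"]}},
    {"id": "suno-v5", "allowedParams": {"operations": ["generate", "sounds"]}},
]

print(models_supporting(sample_catalog, "sounds"))
```

Checking the catalog at runtime avoids hard-coding which model supports which operation.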

Prompting that renders

Music models follow genre, instrument, and mood cues literally, but they are imprecise about tempo and key unless you state them explicitly. A prompt structure that works:
[mood/energy], [genre/style], [key instruments], [production notes], [tempo/key if needed]

Example that works:
Bright, optimistic indie-pop,
ukulele and warm acoustic guitar,
hand-claps and soft kick drum, whistled melody,
modern clean mix,
around 110 BPM, major key, no vocals
Tips:
  • Say “no vocals” explicitly when you want an instrumental — Suno defaults to vocals.
  • Reference era + region for style anchoring (“90s British trip-hop”, “early-2000s West Coast hip-hop”).
  • Describe production, not just genre — “lo-fi tape warmth, sidechained pad, shuffled hi-hat” lands better than “chill beat”.
  • Specify BPM when it matters (ads, workouts). Models don’t always honour it, but without it the tempo is unpredictable.
  • Avoid copyrighted artist names. Use stylistic descriptors instead (“operatic rock anthem in the vein of stadium classics”, not “in the style of Queen”).
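The prompt template and tips above can be assembled programmatically. The helper below is a sketch only: `build_prompt` and its argument names are hypothetical, and it simply joins the template slots in order and appends “no vocals” when an instrumental is requested.

```python
def build_prompt(
    mood: str,
    genre: str,
    instruments: list[str],
    production: str = "",
    tempo_key: str = "",
    instrumental: bool = False,
) -> str:
    """Join the template slots in order, skipping any that are empty."""
    parts = [mood, genre, " and ".join(instruments), production, tempo_key]
    if instrumental:
        # Stated explicitly, since Suno defaults to vocals.
        parts.append("no vocals")
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    mood="Bright, optimistic",
    genre="indie-pop",
    instruments=["ukulele", "warm acoustic guitar"],
    production="modern clean mix",
    tempo_key="around 110 BPM, major key",
    instrumental=True,
)
print(prompt)
```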

Async like video

Music generation is asynchronous, same submit → poll pattern as video:
python
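import time

# `client` is assumed to be an authenticated HTTP client whose base URL
# points at the API (for example an httpx.Client with an auth header).
def generate_music(prompt: str, model: str = "suno-v4") -> dict:
    # Submit the job; the response carries the job_id used for polling.
    job = client.post("/v1/music/generations", json={
        "model": model,
        "prompt": prompt,
        "duration_seconds": 60,
    }).json()

    # Poll every 5 seconds until the job completes or fails.
    while True:
        status = client.get(f"/v1/music/generations/{job['job_id']}").json()
        if status["status"] == "completed":
            return status["result"]
        if status["status"] == "failed":
            raise RuntimeError(status.get("error"))
        time.sleep(5)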
Typical wall time:
  • Lyria clip (30 s): ~20–40 s
  • Lyria pro (2 min): ~60–120 s
  • Suno full song (3 min): ~60–90 s
Hard timeout is 1 hour. As with video, never block an HTTP request handler waiting for music — push to a queue.
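A polling loop that runs in a background worker usually wants its own deadline, so it does not spin forever if a job never reaches a terminal state. The sketch below assumes the same `status` / `result` / `error` fields shown above. `wait_for_job`, `fetch_status`, `sleep`, and `clock` are hypothetical names, and the 3600-second default simply mirrors the 1-hour hard timeout.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    *,
    interval: float = 5.0,
    deadline: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch_status() until the job completes, fails, or the deadline passes."""
    start = clock()
    while True:
        status = fetch_status()
        if status["status"] == "completed":
            return status["result"]
        if status["status"] == "failed":
            raise RuntimeError(status.get("error"))
        # Any other status is treated as "still running".
        if clock() - start >= deadline:
            raise TimeoutError(f"job still pending after {deadline} s")
        sleep(interval)
```

In a worker this could be called as `wait_for_job(lambda: client.get(f"/v1/music/generations/{job_id}").json())`. Injecting `sleep` and `clock` keeps the loop testable without real waiting.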

Suno custom mode

By default Suno writes the lyrics for you from your prompt. For full control, enable custom_mode:
python
job = client.post("/v1/music/generations", json={
    "model": "suno-v5",
    "prompt": "Upbeat product launch anthem",
    "custom_mode": True,
    "title": "We Ship",
    "style": "Stadium pop-rock, anthemic",
    "lyrics": "[Verse 1]\nToday we push it live ...\n[Chorus]\nWe ship, we ship ...",
    "instrumental": False,
    "vocal_gender": "f",
    "negative_tags": "Heavy Metal, Screaming",
    "duration_seconds": 120,
}).json()
Relevant Suno-only params:
  • custom_mode: true — unlocks title, style, lyrics
  • style — genre/style string (max 1000 chars)
  • lyrics — bring your own lyrics (max ~5000 chars), use [Verse] / [Chorus] / [Bridge] section tags for structure
  • instrumental: true — generate without vocals even with lyrics supplied
  • vocal_gender: m or f
  • negative_tags — styles to steer away from
  • style_weight, weirdness_constraint, audio_weight — 0.0–1.0 dials
  • persona_id + persona_model — reuse a stylistic or vocal persona
See Music API → Parameters for the full list.
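The limits listed above can be checked client-side before a job is submitted. The following is a sketch under assumptions: `check_custom_payload` is a hypothetical helper, the 5000-character lyrics cap is treated as exact even though the guide gives it as approximate, and the server remains the authority on validation.

```python
def check_custom_payload(payload: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []

    # title, style and lyrics are only unlocked by custom_mode.
    if not payload.get("custom_mode"):
        used = [k for k in ("title", "style", "lyrics") if k in payload]
        if used:
            problems.append(f"{', '.join(used)} require custom_mode: true")

    if len(payload.get("style", "")) > 1000:
        problems.append("style exceeds 1000 characters")
    if len(payload.get("lyrics", "")) > 5000:  # approximate limit per the guide
        problems.append("lyrics exceed ~5000 characters")
    if payload.get("vocal_gender") not in (None, "m", "f"):
        problems.append("vocal_gender must be 'm' or 'f'")

    # The three dials are documented as 0.0 to 1.0.
    for dial in ("style_weight", "weirdness_constraint", "audio_weight"):
        value = payload.get(dial)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{dial} must be between 0.0 and 1.0")

    if payload.get("instrumental") and payload.get("lyrics"):
        problems.append("instrumental: true means the supplied lyrics are ignored")
    return problems
```

Running this before the POST catches the cheap mistakes locally instead of spending credits or a round trip on them.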

Operation modes (Suno)

Suno supports several operations beyond plain generation, dispatched via the operation field:
| Operation | What it does | Required fields |
|---|---|---|
| generate (default) | Text-to-music | prompt |
| extend | Continue an existing track | audio_id, optional continue_at |
| upload_cover | Cover song from uploaded audio | upload_url |
| upload_extend | Extend uploaded audio | upload_url |
| add_instrumental | Make instrumental version | upload_url, tags |
| add_vocals | Add vocals to instrumental | upload_url |
| vocal_removal | Separate vocals / stems | audio_id, task_id, separation_type |
| sounds (V5) | Sound effect with BPM/key/loop | prompt, sound_loop, sound_tempo, sound_key |
| lyrics | Generate lyrics only (no audio) | prompt |
Stems (from vocal_removal) are returned as additional items in result.data.
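The required-fields column of the operations table can be encoded as a lookup for a pre-flight check. This is an illustrative sketch: `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and the mapping only restates the table rather than the full endpoint contract.

```python
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "generate": ("prompt",),
    "extend": ("audio_id",),  # continue_at is optional
    "upload_cover": ("upload_url",),
    "upload_extend": ("upload_url",),
    "add_instrumental": ("upload_url", "tags"),
    "add_vocals": ("upload_url",),
    "vocal_removal": ("audio_id", "task_id", "separation_type"),
    "sounds": ("prompt", "sound_loop", "sound_tempo", "sound_key"),
    "lyrics": ("prompt",),
}


def missing_fields(payload: dict) -> list[str]:
    """List required fields absent from the payload for its operation."""
    operation = payload.get("operation", "generate")  # generate is the default
    if operation not in REQUIRED_FIELDS:
        raise ValueError(f"unknown operation: {operation}")
    # Test key presence, not truthiness, so sound_loop=False still counts as set.
    return [field for field in REQUIRED_FIELDS[operation] if field not in payload]


print(missing_fields({"operation": "extend"}))
```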

Persistence

result.data[].url is a signed URL into our private storage with a 7-day expiry. For long-term keeping:
python
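import httpx

# `result` is the dict returned by generate_music(); download before the URL expires.
resp = httpx.get(result["data"][0]["url"])
resp.raise_for_status()
# Re-upload through the files endpoint to get a handle that does not expire.
f = client.files.create(file=("track.mp3", resp.content), purpose="user_data")
# f.id is now a stable, long-lived handle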
If GCS upload fails on our side (very rare), the response falls back to inline b64_audio for that track — decode and save yourself:
python
import base64
# `track` is one item from result["data"]
if "b64_audio" in track:
    with open("track.mp3", "wb") as fh:
        fh.write(base64.b64decode(track["b64_audio"]))
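Both delivery paths can be handled by one helper that prefers `url` and falls back to `b64_audio`. This is a minimal sketch using only the standard library; `save_track` is a hypothetical name, and the track shape (`url` and/or `b64_audio` keys) is taken from the description above.

```python
import base64
import urllib.request
from pathlib import Path


def save_track(track: dict, dest: str) -> Path:
    """Write one result item to disk, preferring the signed URL over inline base64."""
    path = Path(dest)
    if track.get("url"):
        # Signed URL path: download the bytes while the link is still valid.
        with urllib.request.urlopen(track["url"]) as resp:
            path.write_bytes(resp.read())
    elif track.get("b64_audio"):
        # Fallback path: the audio arrived inline in the JSON response.
        path.write_bytes(base64.b64decode(track["b64_audio"]))
    else:
        raise ValueError("track has neither url nor b64_audio")
    return path
```

A call such as `save_track(result["data"][0], "track.mp3")` then works regardless of which field the response carried.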

Costs at a glance

Per 30-second clip / per 1-minute song:
  • Lyria 3 clip: ~10 credits per 30 s
  • Lyria 3 pro: ~25 credits per minute
  • Suno v4: ~30 credits per minute
  • Suno v5: ~40 credits per minute
  • Vocal removal (separate_vocal): ~10 credits; split_stem (up to 12 stems): ~30 credits
Your invoice line shows model + operation + duration. Prototype on Lyria clip; promote to Suno for deliverables that need vocals.
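For budgeting, the approximate rates above can be turned into a rough estimator. This sketch assumes per-minute rates are prorated linearly by duration and that a Lyria clip is a flat charge; actual rounding and billing rules are not stated in this guide, and `estimate_credits` is a hypothetical helper.

```python
# Approximate rates from the list above.
CREDITS_PER_MINUTE = {
    "lyria-3-pro": 25,
    "suno-v4": 30,
    "suno-v5": 40,
}
LYRIA_CLIP_CREDITS = 10  # flat, since the clip length is fixed at 30 s


def estimate_credits(model: str, duration_seconds: float) -> float:
    """Rough credit estimate; assumes linear proration of the per-minute rate."""
    if model == "lyria-3-clip":
        return float(LYRIA_CLIP_CREDITS)
    return CREDITS_PER_MINUTE[model] * duration_seconds / 60


print(estimate_credits("suno-v4", 120))
```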

Formats

  • Default: mp3 (~128–192 kbps, universally supported, small enough to stream).
  • wav is supported on Lyria 3 Pro and all Suno models. Use it only for post-production editing; files are ~10× larger.

Pitfalls

  • Non-English prompts to Lyria → 400 with a clear message. Translate or use Suno instead.
  • Copyrighted artists/songs — refusal from upstream; not retryable by fallback. Describe the sound, not the artist.
  • Suno instrumental: true + lyrics provided — lyrics are ignored, the track is instrumental. Don’t pay for generation that ignores a key input; set one or the other.
  • Polling too fast — stick to 5-second intervals. Faster polling won’t speed up generation, it’ll just eat your RPM.
  • Storing 5-minute tracks as base64 in JSON — the gateway already offloads to GCS and returns url; always prefer url over b64_audio when both are present.