Voices & Music

The Voices page is the standalone audio studio in AI Suite. It bundles three modes (Voice Cloning, Text-to-Speech, and Music) into a tabbed interface, so you can produce dialogue, narration, and soundtracks without leaving the page.

Tabs

Voice Cloning

Clone a voice from a short audio sample. Once cloned, the voice can be used for text-to-speech generation or assigned to dialogue lines in the Video Generator Node for automatic lip-sync.

Use Case | Cost
TTS voice clone | 1,500 credits (one-time)
Video lip-sync clone | 7 credits (one-time)

Cloned voices are stored at the workspace level and can be reused across all your projects.

Text-to-Speech

Generate spoken audio from a text script using a system voice or one of your cloned voices.

Setting | Description
Voice | Choose from 27 system voices or any voice you've cloned
Script | The text to be spoken (free-form)
Model | Speech-02-HD

Cost: 10 credits per 100 characters.
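To budget a script before generating it, the rate above can be turned into a quick estimate. The following is a minimal sketch, not the billing logic itself: the function name `estimate_tts_credits` is hypothetical, and both rounding modes are assumptions, since the page does not say whether partial blocks of 100 characters are prorated or billed in full. It also assumes that Python's `len()` counts characters the same way the service does.

```python
CREDITS_PER_BLOCK = 10  # rate stated on this page
BLOCK_SIZE = 100        # characters per pricing block


def estimate_tts_credits(script: str, bill_whole_blocks: bool = False) -> int:
    """Estimate the credits needed to speak `script`.

    Assumption: rounding behavior is not documented, so two modes are offered.
    - bill_whole_blocks=False: prorate per character, round up to a whole credit.
    - bill_whole_blocks=True: every started block of 100 characters costs 10 credits.
    """
    chars = len(script)
    if chars == 0:
        return 0
    if bill_whole_blocks:
        blocks = (chars + BLOCK_SIZE - 1) // BLOCK_SIZE  # integer ceiling
        return blocks * CREDITS_PER_BLOCK
    # Integer ceiling of chars * 10 / 100, avoiding floating-point error.
    return (chars * CREDITS_PER_BLOCK + BLOCK_SIZE - 1) // BLOCK_SIZE


if __name__ == "__main__":
    script = "a" * 250
    print(estimate_tts_credits(script))                          # 25
    print(estimate_tts_credits(script, bill_whole_blocks=True))  # 30
```

Treat the two results as a lower and upper bound for a 250-character script; the actual charge shown in the Voices page is authoritative.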

Music

Generate full music tracks from a text prompt. Two tiers are available.

Tier | Model | Cost
Standard | ACE-Step | ~0.2 credits/second (minimum 1)
Premium | ElevenLabs | 800 credits/minute
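The two tiers are priced in different units, so a small calculation helps when comparing them for a given track length. This is a rough sketch under stated assumptions: the function name `estimate_music_credits` is hypothetical, the Standard rate is only approximate (the table marks it with "~"), results are rounded up to whole credits, and the Premium per-minute rate is assumed to be prorated by the second. None of these rounding details are confirmed by this page.

```python
import math

STANDARD_SECONDS_PER_CREDIT = 5   # ~0.2 credits/second, per the table
STANDARD_MINIMUM_CREDITS = 1      # "minimum 1", per the table
PREMIUM_CREDITS_PER_MINUTE = 800  # per the table


def estimate_music_credits(duration_seconds: float, tier: str) -> int:
    """Estimate credits for a track of the given length.

    Assumptions (not documented): costs round up to whole credits and the
    Premium tier is prorated per second rather than billed per full minute.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    if tier == "standard":
        credits = math.ceil(duration_seconds / STANDARD_SECONDS_PER_CREDIT)
        return max(STANDARD_MINIMUM_CREDITS, credits)
    if tier == "premium":
        return math.ceil(PREMIUM_CREDITS_PER_MINUTE * duration_seconds / 60)
    raise ValueError(f"unknown tier: {tier!r}")


if __name__ == "__main__":
    print(estimate_music_credits(30, "standard"))  # 6
    print(estimate_music_credits(30, "premium"))   # 400
```

Under these assumptions a 30-second track costs roughly 6 credits on Standard and 400 on Premium, which makes Standard the natural choice for drafts and Premium for final takes.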

The Music tab uses the same engine as the Audio Generator Node in Flow Studio. Choose the AI Suite version for one-off generations and the node version when you need music inside a workflow.

Note: Music generation occasionally needs a cold start on the underlying provider. The Voices page automatically retries up to 3 times with a short backoff, so an initial wait of a few seconds is normal.
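The retry behavior described here follows a common pattern that can be sketched in a few lines. This is an illustration of the general technique, not the Voices page's actual code: `ColdStartError`, `call_with_retries`, the 1-second base delay, and the doubling schedule are all hypothetical, and "up to 3 times" is read as one initial attempt followed by at most three retries.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ColdStartError(Exception):
    """Hypothetical error standing in for a provider that is still warming up."""


def call_with_retries(
    task: Callable[[], T],
    max_retries: int = 3,       # matches "retries up to 3 times"
    base_delay: float = 1.0,    # assumed; the page only says "short backoff"
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying on ColdStartError with exponential backoff."""
    attempt = 0
    while True:
        try:
            return task()
        except ColdStartError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s by default
            attempt += 1


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_generate() -> str:
        # Fails twice to imitate a cold start, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ColdStartError("provider warming up")
        return "track-ready"

    print(call_with_retries(flaky_generate, base_delay=0.01))  # track-ready
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.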

When to Use Voices vs the Audio Generator Node

Use Case | Recommended Surface
One-off voice clone for a single video | Voices page
Quick TTS take to test a script | Voices page
Background track for a single export | Voices page
Repeated audio generation as part of a pipeline | Audio Generator Node
Pairing audio with generated video via Video Combiner | Audio Generator Node