XTTS-v2 Text-to-Speech Integration Skill
Overview
Voice synthesis with XTTS-v2 on Ubuntu Server. Supports 16+ languages including Arabic with voice cloning.
Model: XTTS-v2 (Coqui TTS)
Arabic Support: ✅ Excellent
Voice Cloning: ✅ Yes
Speed: 1-3 seconds per sentence
TypeScript Implementation
import { JobClient } from './job-management-skill';

interface TTSParams {
  text: string;
  language?: string;           // language code, defaults to 'ar'
  speed?: number;              // 0.5 to 2.0, defaults to 1.0
  speaker_wav?: string | null; // reference audio for voice cloning
}

class TTSClient {
  private jobClient: JobClient;
  // Kept locally so the audio URL can be built without reaching into JobClient internals.
  private baseURL: string;

  constructor(token: string, baseURL = 'https://api.proyaro.com') {
    this.baseURL = baseURL;
    this.jobClient = new JobClient(baseURL, token);
  }

  // Submits a text_to_speech job, waits for it to finish, and returns an absolute audio URL.
  async synthesize(params: TTSParams): Promise<{ audio_url: string; duration: number }> {
    const job = await this.jobClient.createJob({
      job_type: 'text_to_speech',
      parameters: {
        text: params.text,
        language: params.language || 'ar',
        speed: params.speed || 1.0,
        speaker_wav: params.speaker_wav || null,
      },
    });
    const result = await this.jobClient.waitForJob(job.id);
    return {
      // result_data.audio_url is a path relative to the API base URL.
      audio_url: `${this.baseURL}${result.result_data.audio_url}`,
      duration: result.result_data.duration,
    };
  }
}
// Usage (browser context: `Audio` is a DOM API, and top-level `await` requires an ES module)
const tts = new TTSClient('your-token');
const { audio_url } = await tts.synthesize({
  text: 'مرحبا، كيف حالك؟',
  language: 'ar',
  speed: 1.0,
});
const audio = new Audio(audio_url);
audio.play();
React Hook
import { useCallback, useState } from 'react';

export function useTTS(token: string) {
  // True while a synthesis request is in flight.
  const [synthesizing, setSynthesizing] = useState(false);

  const synthesize = useCallback(async (text: string, language = 'ar') => {
    setSynthesizing(true);
    try {
      const client = new TTSClient(token);
      return await client.synthesize({ text, language });
    } finally {
      // Reset the flag whether the request succeeded or threw.
      setSynthesizing(false);
    }
  }, [token]);

  return { synthesize, synthesizing };
}
Best Practices
Text Limits:
- Max: 1000 characters
- Recommended: 100-300 characters per request
- Split long text into sentences
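The text limits above can be enforced on the client before any job is created. The following is a minimal sketch, not part of the original skill: `splitForTTS`, `MAX_CHARS`, and `TARGET_CHARS` are hypothetical names, and the sentence detection is a simple punctuation heuristic that assumes whitespace after each sentence (so it will not split Chinese or Japanese text that lacks spaces).

```typescript
// Hypothetical helper (an assumption, not part of the documented API).
const MAX_CHARS = 1000;   // hard limit stated in the text limits
const TARGET_CHARS = 300; // upper end of the recommended range

export function splitForTTS(
  text: string,
  target: number = TARGET_CHARS,
  max: number = MAX_CHARS,
): string[] {
  // Break after sentence-ending punctuation (Latin and Arabic) followed by whitespace.
  const sentences = text
    .replace(/([.!?؟])\s+/g, '$1\n')
    .split('\n')
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    // A single sentence longer than `max` is cut into fixed-size pieces.
    for (let i = 0; i < sentence.length; i += max) {
      const piece = sentence.slice(i, i + max);
      const candidate = current ? `${current} ${piece}` : piece;
      if (candidate.length <= target) {
        // Still within the recommended size: keep accumulating.
        current = candidate;
      } else {
        // Adding this piece would overshoot: flush what we have and start over.
        if (current) chunks.push(current);
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each returned chunk can then be passed to `synthesize` in order and the resulting audio clips played back to back. Grouping whole sentences up to the recommended size, rather than cutting at a fixed character count, avoids breaks in the middle of a sentence except when one sentence alone exceeds the maximum.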
Languages: ar, en, es, fr, de, it, pt, pl, tr, ru, nl, cs, zh-cn, ja, ko, hu
Speed Range: 0.5 (slow) to 2.0 (fast), 1.0 = normal
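The language list and speed range above can also be checked before a request is sent. This is a minimal sketch under stated assumptions: `isSupportedLanguage` and `clampSpeed` are hypothetical helpers, the language codes are copied from the list above, and whether the server itself rejects or clamps out-of-range values is not stated in this document.

```typescript
// Hypothetical helpers (assumptions, not part of the documented API).
// Language codes copied from the "Languages" line above.
const SUPPORTED_LANGUAGES = [
  'ar', 'en', 'es', 'fr', 'de', 'it', 'pt', 'pl',
  'tr', 'ru', 'nl', 'cs', 'zh-cn', 'ja', 'ko', 'hu',
] as const;

type Language = (typeof SUPPORTED_LANGUAGES)[number];

const MIN_SPEED = 0.5;
const MAX_SPEED = 2.0;
const DEFAULT_SPEED = 1.0;

// Type guard: narrows an arbitrary string to one of the listed codes.
export function isSupportedLanguage(code: string): code is Language {
  return (SUPPORTED_LANGUAGES as readonly string[]).indexOf(code) !== -1;
}

// Pulls a requested speed into the documented 0.5 to 2.0 range.
// Non-finite input (NaN, Infinity) falls back to the normal speed.
export function clampSpeed(speed: number): number {
  if (!isFinite(speed)) return DEFAULT_SPEED;
  return Math.min(MAX_SPEED, Math.max(MIN_SPEED, speed));
}
```

A caller might run `isSupportedLanguage(language)` and `clampSpeed(speed)` just before `synthesize`, so that an unsupported code fails immediately instead of after a job has been queued.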
Version: 1.0
ProYaro AI Infrastructure Documentation • Version 1.2