TTS Server (Percy)

Percy TTS server: Qwen3-TTS, voice management, adding custom voices, and Moneypenny integration

Overview

Percy is the text-to-speech server used by SanMarcSoft projects. It runs Qwen3-TTS for high-quality voice synthesis and is deployed as a Docker container on the NAS (ai.matthewstevens.org).

Architecture

  • Model: Qwen3-TTS (large TTS model)
  • Runtime: Python inference service
  • Deployment: Docker container on ai.matthewstevens.org
  • Access: Internal network only (not exposed to the internet); the API is served on port 8086

Voice Management

Available Voices

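Percy supports multiple voice profiles. Each voice consists of a reference audio sample and a configuration entry. To see which voices are currently available, query GET /voices (see API Endpoints).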

Adding a New Voice

  1. Prepare reference audio: Record or obtain a clean audio sample (10-30 seconds, WAV format, 16 kHz or higher sample rate).

  2. Copy it into the voice directory on the ai host (create the voice's directory first):

ssh ai
mkdir -p /path/to/percy/voices/<voice-name>
cp reference-audio.wav /path/to/percy/voices/<voice-name>/reference.wav

  3. Update the voice configuration: Add the voice profile to the configuration file with:

    • Voice name
    • Reference audio path
    • Speaker characteristics (optional)
    • Default parameters (speed, pitch)

  4. Restart the service:

docker restart percy-tts

  5. Test the new voice (the response is audio, so save it to a file and play it back):

curl -X POST http://ai:8086/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, this is a test.", "voice": "<voice-name>"}' \
  -o test.wav
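Steps 1 and 2 can be combined into a check-then-copy helper so that a wrong sample rate or length is caught before the restart. This is a minimal sketch, not part of Percy: check_reference_wav, install_voice, and read_le are hypothetical helper names, and the header parsing assumes a canonical PCM WAV with a 44-byte header (fmt chunk right after "WAVE", data chunk right after fmt). Files with extra chunks will be misread, and the duration is an integer approximation.

```sh
#!/bin/sh
# Sketch: validate a reference WAV and copy it into the voice directory.
# ASSUMPTION: canonical 44-byte PCM WAV header; extra chunks are not handled.

# read_le FILE OFFSET COUNT: print COUNT bytes at OFFSET as a little-endian integer
read_le() {
  local file=$1 offset=$2 count=$3 value=0 bits=0 byte
  for byte in $(od -An -v -t u1 -j "$offset" -N "$count" "$file"); do
    value=$((value + (byte << bits)))
    bits=$((bits + 8))
  done
  echo "$value"
}

# check_reference_wav FILE: require RIFF/WAVE, >= 16 kHz, roughly 10-30 s of audio
check_reference_wav() {
  local file=$1 rate byte_rate size seconds
  if [ "$(head -c 4 "$file" 2>/dev/null)" != "RIFF" ] ||
     [ "$(dd if="$file" bs=1 skip=8 count=4 2>/dev/null)" != "WAVE" ]; then
    echo "error: $file is not a WAV file" >&2
    return 1
  fi
  rate=$(read_le "$file" 24 4)        # sample rate in Hz
  byte_rate=$(read_le "$file" 28 4)   # bytes of audio per second
  if [ "$rate" -lt 16000 ]; then
    echo "error: sample rate ${rate} Hz (need 16 kHz or higher)" >&2
    return 1
  fi
  if [ "$byte_rate" -le 0 ]; then
    echo "error: invalid WAV header in $file" >&2
    return 1
  fi
  size=$(wc -c < "$file" | tr -d ' ')
  seconds=$(( (size - 44) / byte_rate ))   # approximate duration, whole seconds
  if [ "$seconds" -lt 10 ] || [ "$seconds" -gt 30 ]; then
    echo "error: about ${seconds}s of audio (aim for 10-30 s)" >&2
    return 1
  fi
  echo "ok: ${rate} Hz, ~${seconds}s"
}

# install_voice NAME FILE VOICES_DIR: validate, then copy to VOICES_DIR/NAME/reference.wav
install_voice() {
  local name=$1 wav=$2 voices_dir=$3
  check_reference_wav "$wav" || return 1
  mkdir -p "$voices_dir/$name" && cp "$wav" "$voices_dir/$name/reference.wav"
}
```

On the ai host this would be run as install_voice <voice-name> reference-audio.wav /path/to/percy/voices, followed by steps 3 to 5.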

Moneypenny Integration

The Moneypenny agent uses Percy for voice synthesis; its image is published on Docker Hub under applepublicdotcom. Moneypenny reaches Percy through the PERCY_URL setting:

# Moneypenny references Percy at the internal network address
PERCY_URL="http://ai.matthewstevens.org:8086"
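Scripts on the Moneypenny side can build endpoint URLs from PERCY_URL instead of hard-coding them. A minimal sketch: percy_endpoint is a hypothetical helper, and both the fallback to the address above when PERCY_URL is unset and the slash handling are assumptions, not documented Moneypenny behavior.

```sh
#!/bin/sh
# Sketch: build Percy endpoint URLs from PERCY_URL.
# ASSUMPTION: fall back to the documented address when PERCY_URL is unset.
percy_endpoint() {
  local base="${PERCY_URL:-http://ai.matthewstevens.org:8086}"
  base=${base%/}                       # drop a trailing slash from the base URL
  printf '%s/%s\n' "$base" "${1#/}"    # join with the path, minus any leading slash
}
```

For example, curl -fsS "$(percy_endpoint health)" checks the same server Moneypenny is configured to use.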

Deployment

Percy runs on ai.matthewstevens.org (the build/inference server with GPU):

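ssh ai
# Add common binary locations to PATH so the docker CLI is found in the SSH session
export PATH=/usr/local/bin:/opt/homebrew/bin:$PATH

# Host port 8086 maps to the API on container port 8080;
# the host voice directory is mounted at /app/voices inside the container.
docker run -d \
  --name percy-tts \
  --restart unless-stopped \
  -p 8086:8080 \
  --gpus all \
  -v /path/to/voices:/app/voices \
  applepublicdotcom/percy-tts:latest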
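A large model may need some time to load after docker run or docker restart, so scripts that start Percy may want to wait for GET /health before sending requests. A minimal sketch, assuming /health answers with an HTTP 2xx status once the service is ready; wait_for_percy and its 5-second poll interval are illustrative choices, not part of Percy.

```sh
#!/bin/sh
# Sketch: poll GET /health until it succeeds or roughly TIMEOUT seconds of polling pass.
# ASSUMPTION: /health returns HTTP 2xx once Percy is ready.
# Usage: wait_for_percy [BASE_URL] [TIMEOUT_SECONDS]
wait_for_percy() {
  local url=${1:-http://ai:8086} timeout=${2:-120} waited=0
  until curl -fsS --max-time 5 "$url/health" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Percy at $url not healthy after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "Percy at $url is healthy"
}
```

For example, on the ai host right after docker run: wait_for_percy http://localhost:8086 300.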

GPU Requirements

Qwen3-TTS requires GPU acceleration for reasonable inference speed. The ai server has a suitable GPU, which the container accesses through the --gpus all flag.
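Before starting or restarting percy-tts, it can help to confirm how much VRAM is free. A minimal sketch, assuming nvidia-smi is available on the ai host; gpu_free_mib is a hypothetical helper that parses nvidia-smi's CSV query output (values in MiB).

```sh
#!/bin/sh
# Sketch: summarize free VRAM per GPU.
# Expects lines of "index, memory.used, memory.total" in MiB, as produced by:
#   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
gpu_free_mib() {
  awk -F', *' '{ printf "GPU %s: %d MiB free of %d MiB\n", $1, $3 - $2, $3 }'
}
```

On the ai host, pipe the nvidia-smi command from the comment into gpu_free_mib.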

API Endpoints

Method  Path         Description
POST    /synthesize  Text-to-speech synthesis
GET     /voices      List available voices
GET     /health      Health check

Synthesis Request

curl -X POST http://ai:8086/synthesize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world",
    "voice": "default",
    "speed": 1.0,
    "format": "wav"
  }' \
  -o output.wav
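The request above writes whatever the server returns into output.wav, so an HTTP error body would end up in that file too. A minimal sketch of a safer wrapper, assuming POST /synthesize returns a non-2xx status on failure; json_escape, percy_payload, and percy_synthesize are hypothetical helper names. json_escape only handles backslashes and double quotes, so for multi-line text, building the body with jq -n --arg is more robust.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical; request fields match the example above.

# Escape backslashes and double quotes so the text can sit inside a JSON string.
# Newlines and other control characters are NOT escaped.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# percy_payload TEXT [VOICE] [SPEED] [FORMAT]: build the /synthesize request body
percy_payload() {
  printf '{"text": "%s", "voice": "%s", "speed": %s, "format": "%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "${2:-default}")" "${3:-1.0}" "${4:-wav}"
}

# percy_synthesize BASE_URL OUTPUT TEXT [VOICE] [SPEED] [FORMAT]
# Downloads to a temp file and renames it to OUTPUT only if curl succeeds
# (-f makes curl fail on HTTP 4xx/5xx instead of saving the error body).
percy_synthesize() {
  local url=$1 out=$2 tmp
  shift 2
  tmp=$(mktemp "$out.partial.XXXXXX") || return 1
  if curl -fsS --connect-timeout 10 -X POST "$url/synthesize" \
       -H "Content-Type: application/json" \
       -d "$(percy_payload "$@")" \
       -o "$tmp"; then
    mv "$tmp" "$out"
  else
    rm -f "$tmp"
    echo "synthesis request to $url failed" >&2
    return 1
  fi
}
```

For example: percy_synthesize http://ai:8086 output.wav "Hello world" default 1.0 wav.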

Troubleshooting

  • Slow synthesis: Check GPU utilization with nvidia-smi. If the GPU is at 100% while Percy is not synthesizing, another process is probably using it.
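  • Voice not found: Verify that the voice directory exists, contains reference.wav, and that the voice is listed in the configuration file. Restart the container (docker restart percy-tts) after adding a voice.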
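  • Container won’t start: Check docker logs percy-tts for the error, then check GPU driver compatibility. nvidia-smi should work both on the host and inside the container; --gpus all also requires the NVIDIA Container Toolkit on the host.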
  • Out of memory: Qwen3-TTS is a large model. Check free GPU VRAM with nvidia-smi and stop other GPU workloads if there is not enough.
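The "Voice not found" check can be scripted. A minimal sketch: check_voices is a hypothetical helper that walks the host voice directory (the one mounted at /app/voices) and reports any voice folder without a reference.wav. It does not inspect the configuration file, whose format is not described here.

```sh
#!/bin/sh
# Sketch: report voice folders that lack reference.wav.
# Usage: check_voices VOICES_DIR
# Returns non-zero if any voice folder is missing its reference audio.
check_voices() {
  local voices_dir=$1 missing=0 dir
  for dir in "$voices_dir"/*/; do
    [ -d "$dir" ] || continue          # no subdirectories: the glob stays literal
    if [ -f "${dir%/}/reference.wav" ]; then
      echo "ok:      $(basename "$dir")"
    else
      echo "missing: $(basename "$dir") has no reference.wav"
      missing=1
    fi
  done
  return "$missing"
}
```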