TTS Server (Percy) | SanMarcSoft SOP Runbook

Overview

Percy is the text-to-speech server used by SanMarcSoft projects. It runs Qwen3-TTS for high-quality voice synthesis. It is deployed as a Docker container on the NAS (ai.matthewstevens.org).

Architecture

Model: Qwen3-TTS (large TTS model)
Runtime: Python with model inference
Deployment: Docker container on ai.matthewstevens.org
Access: Internal network only (not exposed to the internet)

Voice Management

Available Voices

Percy supports multiple voice profiles. Each voice has a reference audio sample and configuration.

Adding a New Voice

Prepare reference audio: Record or obtain a clean audio sample (10-30 seconds, WAV format, sample rate of 16 kHz or higher)
Add to voice directory:

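ssh ai
# Create the voice directory first; cp fails if it does not exist
mkdir -p /path/to/percy/voices/<voice-name>
cp reference-audio.wav /path/to/percy/voices/<voice-name>/reference.wav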

Update voice configuration: Add the voice profile to the configuration file with:
- Voice name
- Reference audio path
- Speaker characteristics (optional)
- Default parameters (speed, pitch)
Restart the service:

docker restart percy-tts

Test the new voice:

curl -X POST http://ai:8086/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, this is a test.", "voice": "<voice-name>"}'
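
Before sending synthesis requests, it can help to confirm that the service actually registered the new voice. The following is a minimal sketch that queries the GET /voices endpoint; it assumes the response body contains each voice name as a quoted JSON string, which is not documented here. The names voice_is_listed and PERCY_BASE are illustrative, not part of Percy.

```shell
# Hypothetical helper: succeeds if the voice name appears in GET /voices.
# Assumes the response body contains the name as a quoted JSON string.
PERCY_BASE="${PERCY_BASE:-http://ai:8086}"

voice_is_listed() {
  voice_name="$1"
  # -f: fail on HTTP errors; -sS: no progress meter, but still print errors
  curl -fsS "$PERCY_BASE/voices" | grep -qF "\"$voice_name\""
}

# Example: voice_is_listed "narrator" && echo "voice registered"
```

If the check fails right after a restart, the service may still be loading; retry after a short wait before assuming the configuration is wrong.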

Moneypenny Integration

The Moneypenny agent uses Percy for voice synthesis. The agent image is published on Docker Hub under the applepublicdotcom account. The agent reaches Percy through the following setting:

# Moneypenny references Percy at the internal network address
PERCY_URL="http://ai.matthewstevens.org:8086"
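
How Moneypenny consumes PERCY_URL internally is not shown in this runbook. As a hedged sketch, a client script could build endpoint URLs from that variable as follows; the function name percy_endpoint and the fallback to the address above are assumptions made for illustration.

```shell
# Hypothetical helper: build a Percy endpoint URL from PERCY_URL.
# Falls back to the internal address when PERCY_URL is unset or empty.
percy_endpoint() {
  base="${PERCY_URL:-http://ai.matthewstevens.org:8086}"
  base="${base%/}"      # drop one trailing slash, if present
  path="${1#/}"         # drop one leading slash, if present
  printf '%s/%s\n' "$base" "$path"
}

# Example: curl -fsS "$(percy_endpoint /health)"
```

Normalizing the slashes avoids double-slash URLs when the variable is set with a trailing slash.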

Deployment

Percy runs on ai.matthewstevens.org (the build/inference server with GPU):

ssh ai
export PATH=/usr/local/bin:/opt/homebrew/bin:$PATH

docker run -d \
  --name percy-tts \
  --restart unless-stopped \
  -p 8086:8080 \
  --gpus all \
  -v /path/to/voices:/app/voices \
  applepublicdotcom/percy-tts:latest
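
A large model may need some time to load after docker run, so a deployment script may want to wait until the service answers before continuing. The sketch below polls the /health endpoint listed under API Endpoints. The retry count, delay, and function name wait_for_percy are assumptions, and any successful HTTP response is treated as healthy because the response body is not documented here.

```shell
# Hypothetical helper: poll /health until it responds or attempts run out.
# Usage: wait_for_percy [base_url] [max_attempts] [delay_seconds]
wait_for_percy() {
  base_url="${1:-http://ai:8086}"
  max_attempts="${2:-30}"
  delay="${3:-2}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx responses
    if curl -fsS -o /dev/null "$base_url/health"; then
      echo "Percy is healthy after $attempt attempt(s)"
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "Percy did not become healthy after $max_attempts attempts" >&2
  return 1
}

# Example: wait_for_percy http://ai:8086 30 2 || docker logs percy-tts
```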

GPU Requirements

Qwen3-TTS requires GPU acceleration for reasonable inference speed. The ai server has the necessary GPU.

API Endpoints

Method	Path	Description
POST	`/synthesize`	Text-to-speech synthesis
GET	`/voices`	List available voices
GET	`/health`	Health check

Synthesis Request

curl -X POST http://ai:8086/synthesize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world",
    "voice": "default",
    "speed": 1.0,
    "format": "wav"
  }' \
  -o output.wav
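
Hand-written JSON breaks as soon as the text contains a double quote or a backslash. A minimal sketch of building the request body from shell variables follows. It escapes only backslashes and double quotes (not newlines or control characters), and the function names json_escape and percy_payload are assumptions; the field names mirror the request above.

```shell
# Hypothetical helpers: build the JSON body for POST /synthesize.
# Only backslashes and double quotes are escaped; multi-line text is not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Usage: percy_payload <text> [voice] [speed]
percy_payload() {
  text="$(json_escape "$1")"
  voice="$(json_escape "${2:-default}")"
  speed="${3:-1.0}"
  printf '{"text": "%s", "voice": "%s", "speed": %s, "format": "wav"}\n' \
    "$text" "$voice" "$speed"
}

# Example:
#   curl -X POST http://ai:8086/synthesize \
#     -H "Content-Type: application/json" \
#     -d "$(percy_payload 'She said "hi"' default)" \
#     -o output.wav
```

For arbitrary input, a dedicated JSON tool is a safer choice than sed-based escaping.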

Troubleshooting

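Slow synthesis: Check GPU utilization with nvidia-smi. If the GPU is already near 100% when Percy is idle, another process is likely competing for it.
Voice not found: Verify that the voice directory exists and contains reference.wav, and that the voice is listed in the configuration file. Restart the container (docker restart percy-tts) after adding a voice.
Container won’t start: Check GPU driver compatibility. nvidia-smi should work on the host and inside a container started with --gpus all. The output of docker logs percy-tts usually shows which step failed.
Out of memory: Qwen3-TTS is a large model. Ensure sufficient GPU VRAM is available; nvidia-smi shows current memory use per process.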
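
The GPU checks above can be scripted. The sketch below reads one line of nvidia-smi CSV query output from standard input and prints a warning when free VRAM falls below a threshold. The 4096 MiB default is an arbitrary placeholder, not a measured requirement for Qwen3-TTS, and gpu_report is a hypothetical name. Only the first GPU line is inspected.

```shell
# Hypothetical helper: read one "util, used, total" CSV line (as printed by
# nvidia-smi with --format=csv,noheader,nounits) and warn on low free VRAM.
# Usage: gpu_report [min_free_mib] < csv_line
gpu_report() {
  min_free_mib="${1:-4096}"   # placeholder threshold, not a Qwen3-TTS figure
  # Split on commas and spaces: utilization %, used MiB, total MiB
  IFS=', ' read -r util used total || return 2
  [ -n "$used" ] && [ -n "$total" ] || return 2
  free=$((total - used))
  echo "GPU utilization: ${util}%, free VRAM: ${free} MiB of ${total} MiB"
  if [ "$free" -lt "$min_free_mib" ]; then
    echo "WARNING: less than ${min_free_mib} MiB of VRAM free" >&2
    return 1
  fi
}

# Example (run on the GPU host):
#   nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total \
#     --format=csv,noheader,nounits | gpu_report 4096
```

Running the same query through docker exec percy-tts checks that the GPU is also visible from inside the container.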