Overview
Percy is the text-to-speech (TTS) server used by SanMarcSoft projects. It runs Qwen3-TTS for high-quality voice synthesis and is deployed as a Docker container on the NAS (ai.matthewstevens.org).
Architecture
- Model: Qwen3-TTS (large TTS model)
- Runtime: Python with model inference
- Deployment: Docker container on ai.matthewstevens.org
- Access: Internal network only (not exposed to internet)
Voice Management
Available Voices
Percy supports multiple voice profiles. Each voice consists of a reference audio sample and a configuration entry. The GET /voices endpoint lists the voices currently available.
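To make the shape of a voice profile concrete, here is a minimal sketch of one as a Python dataclass. It uses the fields the configuration step asks for (name, reference audio path, optional speaker characteristics, default speed and pitch). The class name, the config key names, and the meaning of `speed` (a rate multiplier) and `pitch` (an offset) are all assumptions, not Percy's actual configuration schema.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class VoiceProfile:
    """Hypothetical in-memory model of one Percy voice profile."""

    name: str
    reference_audio: Path
    # Optional speaker notes, e.g. {"accent": "British"}; free-form by assumption.
    speaker: dict[str, str] = field(default_factory=dict)
    # Assumed defaults: speed is a rate multiplier (1.0 = normal) and
    # pitch is an offset (0.0 = unchanged). Percy's real units are not documented here.
    speed: float = 1.0
    pitch: float = 0.0

    @classmethod
    def from_dict(cls, entry: dict) -> "VoiceProfile":
        """Build a profile from one parsed config entry (key names are assumptions)."""
        return cls(
            name=entry["name"],
            reference_audio=Path(entry["reference_audio"]),
            speaker=dict(entry.get("speaker", {})),
            speed=float(entry.get("speed", 1.0)),
            pitch=float(entry.get("pitch", 0.0)),
        )

    def problems(self) -> list[str]:
        """Return human-readable problems with this profile (empty if none found)."""
        issues = []
        if not self.name.strip():
            issues.append("voice name is empty")
        if self.reference_audio.suffix.lower() != ".wav":
            issues.append(f"reference audio {self.reference_audio} is not a .wav file")
        if self.speed <= 0:
            issues.append(f"speed must be positive, got {self.speed}")
        return issues
```

How the configuration file is parsed depends on its format, which is not stated here; once it is loaded into a list of dicts, each entry could go through `from_dict` and `problems()` before the service is restarted.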
Adding a New Voice
- Prepare reference audio: Record or obtain a clean audio sample (10-30 seconds, WAV format, 16 kHz or higher sample rate).
- Add to voice directory: Place the sample in the voice's directory as reference.wav.
- Update voice configuration: Add the voice profile to the configuration file with:
  - Voice name
  - Reference audio path
  - Speaker characteristics (optional)
  - Default parameters (speed, pitch)
- Restart the service so Percy loads the new voice.
- Test the new voice by sending a synthesis request that uses it.
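The duration and sample-rate guidelines from the first step can be checked before the sample goes into the voice directory. This is a minimal sketch using only Python's standard `wave` module, which assumes the sample is uncompressed PCM WAV (the only kind `wave` can read). The function names are hypothetical, and no script can judge whether a recording is "clean", so listening to it is still necessary.

```python
import sys
import wave
from pathlib import Path

# Limits taken from the "Prepare reference audio" step: 10-30 s, WAV, 16 kHz or higher.
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0
MIN_SAMPLE_RATE = 16_000


def check_reference_params(sample_rate: int, n_frames: int) -> list[str]:
    """Return problems with the given audio parameters (empty list if they fit the guidelines)."""
    problems = []
    if sample_rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {sample_rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    seconds = n_frames / sample_rate if sample_rate else 0.0
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(
            f"duration {seconds:.1f} s is outside {MIN_SECONDS:.0f}-{MAX_SECONDS:.0f} s"
        )
    return problems


def check_reference_wav(path: Path) -> list[str]:
    """Open a WAV file and check its sample rate and duration."""
    try:
        with wave.open(str(path), "rb") as wav:
            return check_reference_params(wav.getframerate(), wav.getnframes())
    except (wave.Error, EOFError, OSError) as exc:
        # wave.Error: not a PCM WAV file; EOFError: truncated header; OSError: unreadable path.
        return [f"cannot read {path} as WAV: {exc}"]


if __name__ == "__main__":
    # Check every file path given on the command line.
    for arg in sys.argv[1:]:
        issues = check_reference_wav(Path(arg))
        print(f"{arg}: {'OK' if not issues else '; '.join(issues)}")
```

Splitting the pure parameter check from the file handling keeps the rules easy to adjust if the guidelines change.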
Moneypenny Integration
The Moneypenny agent uses Percy for voice synthesis. Its container image is published on Docker Hub under the applepublicdotcom namespace.
Deployment
Percy runs on ai.matthewstevens.org, the build/inference server with the GPU.
GPU Requirements
Qwen3-TTS requires GPU acceleration for reasonable inference speed. ai.matthewstevens.org has the necessary GPU.
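GPU utilization and free VRAM matter both here and when diagnosing slow synthesis or out-of-memory errors. As a sketch, they can be read from `nvidia-smi` using its CSV query mode (`--query-gpu=... --format=csv,noheader,nounits`, which reports memory in MiB and utilization in percent). The helper names are hypothetical, and since the document does not state how much VRAM Qwen3-TTS needs, the sketch reports numbers rather than enforcing a threshold.

```python
import subprocess
from dataclasses import dataclass

QUERY = "name,utilization.gpu,memory.used,memory.total"


@dataclass
class GpuStatus:
    name: str
    utilization_pct: int
    memory_used_mib: int
    memory_total_mib: int

    @property
    def memory_free_mib(self) -> int:
        return self.memory_total_mib - self.memory_used_mib


def parse_nvidia_smi(output: str) -> list[GpuStatus]:
    """Parse CSV output (no header, no units), one GPU per line."""
    gpus = []
    for line in output.strip().splitlines():
        # Split from the right so a GPU name containing commas stays intact.
        # Fields reported as "[N/A]" would make int() fail; not handled in this sketch.
        name, util, used, total = (part.strip() for part in line.rsplit(",", 3))
        gpus.append(GpuStatus(name, int(util), int(used), int(total)))
    return gpus


def query_gpus() -> list[GpuStatus]:
    """Run nvidia-smi (must be on PATH) and return the status of each GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_nvidia_smi(result.stdout)
```

On the server, or inside the container, calling `query_gpus()` and printing each entry's `utilization_pct` and `memory_free_mib` gives a quick view of whether another process is occupying the GPU.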
API Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /synthesize | Text-to-speech synthesis |
| GET | /voices | List available voices |
| GET | /health | Health check |
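The two GET endpoints in the table can be called with nothing beyond the standard library. This is a minimal sketch, not an official Percy client: the class name is hypothetical, the base URL (scheme, host, port) must be supplied because Percy's port is not documented here, and the `/voices` response is assumed to be JSON.

```python
import json
import urllib.request


class PercyClient:
    """Hypothetical minimal client for Percy's GET endpoints (stdlib only)."""

    def __init__(self, base_url: str, timeout: float = 10.0):
        # base_url must include scheme, host and port; Percy's port is not documented here.
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def url(self, path: str) -> str:
        """Join the base URL and an endpoint path such as "/voices"."""
        return f"{self.base_url}/{path.lstrip('/')}"

    def _get(self, path: str) -> bytes:
        with urllib.request.urlopen(self.url(path), timeout=self.timeout) as resp:
            return resp.read()

    def healthy(self) -> bool:
        """Return True if GET /health completes without an HTTP or network error."""
        try:
            self._get("/health")
            return True
        except OSError:
            # urllib's URLError/HTTPError and socket timeouts all derive from OSError.
            return False

    def voices(self):
        """Return the parsed JSON body of GET /voices (assumed to be JSON)."""
        return json.loads(self._get("/voices"))
```

Because Percy is reachable only from the internal network, this would be run from a machine on that network, for example `PercyClient("http://ai.matthewstevens.org:PORT").healthy()` with the real port substituted.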
Synthesis Request
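The request schema for POST /synthesize is not given here. As a hedged sketch, assuming the endpoint accepts a JSON body with `text` and `voice` fields and returns the audio as the response body, a standard-library call could look like this. The field names, the extra parameters, the timeout, and both function names are assumptions to be checked against the server code.

```python
import json
import urllib.request


def build_synthesis_request(base_url: str, text: str, voice: str, **params) -> urllib.request.Request:
    """Build a POST /synthesize request carrying a JSON body.

    The field names "text" and "voice", and any extra params such as speed,
    are assumptions rather than Percy's documented schema.
    """
    body = {"text": text, "voice": voice, **params}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/synthesize",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(base_url: str, text: str, voice: str, out_path: str, **params) -> int:
    """Send the request, write the response body (assumed audio) to out_path, return its size."""
    request = build_synthesis_request(base_url, text, voice, **params)
    # Generous timeout: synthesis on a busy GPU can be slow.
    with urllib.request.urlopen(request, timeout=120) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Keeping request construction separate from sending makes it possible to inspect or log the exact payload before it reaches the server.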
Troubleshooting
- Slow synthesis: Check GPU utilization. If the GPU is at 100%, another process may be using it.
- Voice not found: Verify that the voice directory exists and contains reference.wav. Restart the service after adding a voice.
- Container won’t start: Check GPU driver compatibility. Running nvidia-smi inside the container should succeed.
- Out of memory: Qwen3-TTS is a large model. Ensure sufficient GPU VRAM is available.
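For the "voice not found" case, a quick scan can list voices whose directory lacks reference.wav. This sketch assumes a layout with one subdirectory per voice under a common root. That layout and the function name are hypothetical, since the document does not give the voice directory's path or structure.

```python
import sys
from pathlib import Path


def voices_missing_reference(voice_root: Path) -> list[str]:
    """Return the names of voice subdirectories that have no reference.wav file."""
    if not voice_root.is_dir():
        raise FileNotFoundError(f"voice directory not found: {voice_root}")
    return sorted(
        entry.name
        for entry in voice_root.iterdir()
        if entry.is_dir() and not (entry / "reference.wav").is_file()
    )


if __name__ == "__main__":
    # Usage: python check_voices.py /path/to/voices   (the path is a placeholder)
    for arg in sys.argv[1:]:
        missing = voices_missing_reference(Path(arg))
        print("all voices have reference.wav" if not missing
              else f"missing reference.wav: {', '.join(missing)}")
```

After fixing any directory the scan reports, restart the service so Percy picks up the change.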