# Directory Structure
```
├── .gitignore
├── .python-version
├── main.py
├── pyproject.toml
├── README.md
├── tts-mcp.py
└── uv.lock
```
# Files
--------------------------------------------------------------------------------
/.python-version:
--------------------------------------------------------------------------------
```
3.10
```
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info
# Virtual environments
.venv
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
# Kokoro TTS MCP Server
A Model Context Protocol (MCP) server that provides text-to-speech capabilities using the Kokoro TTS engine. This server exposes TTS functionality through MCP tools, making it easy to integrate speech synthesis into your applications.
## Prerequisites
- Python 3.10 or higher
- `uv` package manager
## Installation
1. First, install the `uv` package manager:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
2. Clone this repository and install dependencies:
```bash
uv venv
source .venv/bin/activate # On Windows, use: .venv\Scripts\activate
uv pip install .
```
## Features
- Text-to-speech synthesis with customizable voices
- Adjustable speech speed
- Support for saving audio to files or direct playback
- Cross-platform audio playback support (Windows, macOS, Linux)
## Usage
The server provides a single MCP tool `generate_speech` with the following parameters:
- `text` (required): The text to convert to speech
- `voice` (optional): Voice to use for synthesis, or a path to a saved voice tensor file (default: "af_heart")
- `speed` (optional): Speech speed multiplier (default: 1.0)
- `save_path` (optional): Directory in which to save the generated audio segments as 24 kHz WAV files
- `play_audio` (optional): Whether to play the audio immediately (default: False)
### Example Usage
```python
import asyncio

from mcp.client import Client


async def main():
    async with Client() as client:
        await client.connect("kokoro-tts")

        # Generate and play speech
        result = await client.call_tool(
            "generate_speech",
            {
                "text": "Hello, world!",
                "voice": "af_heart",
                "speed": 1.0,
                "play_audio": True
            }
        )


asyncio.run(main())
```
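To save the generated audio instead of playing it, pass `save_path`. The sketch below reuses the illustrative client API from the example above (check your MCP client library for its exact connection calls) and assumes a writable `./tts-output` directory; the server splits the text on newlines and writes one `segment_<i>.wav` file (24 kHz) per segment, creating the directory if it does not exist.
```python
import asyncio

from mcp.client import Client  # same illustrative client API as the example above


async def main():
    async with Client() as client:
        await client.connect("kokoro-tts")

        # Save each line of text as ./tts-output/segment_<i>.wav (24 kHz)
        # without playing it back.
        result = await client.call_tool(
            "generate_speech",
            {
                "text": "First line.\nSecond line.",
                "speed": 0.9,
                "save_path": "./tts-output",
                "play_audio": False
            }
        )
        # Each returned item holds the text of the corresponding segment.
        print(result)


asyncio.run(main())
```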
## Dependencies
- kokoro >= 0.8.4
- mcp[cli] >= 1.3.0
- soundfile >= 0.13.1
## Platform Support
Audio playback is supported on:
- Windows (using `start`)
- macOS (using `afplay`)
- Linux (using `aplay`)
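Playback is dispatched on `sys.platform`, mirroring the `_play_audio` helper in `tts-mcp.py`; the following is a simplified sketch of that logic.
```python
import subprocess
import sys
from pathlib import Path


def play(path: Path) -> None:
    """Play a WAV file with the platform's command-line player."""
    if sys.platform == "win32":
        # Empty "" is the window title; without it a quoted path would be
        # misread as the title by `start`.
        subprocess.call(f'start "" "{path}"', shell=True)
    elif sys.platform == "darwin":
        subprocess.call(["afplay", str(path)])
    else:
        subprocess.call(["aplay", str(path)])  # requires ALSA's aplay on PATH
```
On Linux this requires `aplay` (from `alsa-utils`) to be available on the `PATH`.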
## MCP Configuration
Add the following configuration to your MCP settings file, adjusting `command` and `--directory` to the locations of `uv` and this repository on your machine:
```json
{
  "mcpServers": {
    "kokoro-tts": {
      "command": "/Users/giannisan/pinokio/bin/miniconda/bin/uv",
      "args": [
        "--directory",
        "/Users/giannisan/Documents/Cline/MCP/kokoro-tts-mcp",
        "run",
        "tts-mcp.py"
      ]
    }
  }
}
```
## License
[Add your license information here]
```
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
```python
def main():
    print("Hello from kokoro-tts-mcp!")


if __name__ == "__main__":
    main()
```
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
```toml
[project]
name = "kokoro-tts-mcp"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
"kokoro>=0.8.4",
"mcp[cli]>=1.3.0",
"soundfile>=0.13.1",
]
```
--------------------------------------------------------------------------------
/tts-mcp.py:
--------------------------------------------------------------------------------
```python
import sys
import os
import logging
import subprocess
import tempfile
from typing import List

import torch
import soundfile as sf
from kokoro import KPipeline
from mcp.server.fastmcp import FastMCP
from pathlib import Path

# Disable ALL logging so nothing is written to stdout/stderr,
# which would corrupt the stdio MCP transport.
logging.disable(logging.CRITICAL)
logging.getLogger().setLevel(logging.CRITICAL)
logging.captureWarnings(True)

# Initialize components
mcp = FastMCP("kokoro-tts")
pipeline = KPipeline(lang_code='a')  # 'a' = American English


def _play_audio(path: Path) -> None:
    """Play an audio file with the platform's default player; ignore failures."""
    try:
        if sys.platform == "win32":
            # Empty "" is the window title; without it a quoted path would be
            # misread as the title by `start`.
            subprocess.call(f'start "" "{path}"', shell=True)
        elif sys.platform == "darwin":
            subprocess.call(["afplay", str(path)])
        else:
            subprocess.call(["aplay", str(path)])
    except Exception:
        pass


@mcp.tool()
async def generate_speech(
    text: str,
    voice: str = "af_heart",
    speed: float = 1.0,
    save_path: str | None = None,
    play_audio: bool = False
) -> List[dict]:
    """Synthesize `text` with Kokoro TTS.

    `voice` is a voice name or a path to a saved voice tensor; `save_path` is a
    directory for the generated WAV segments; `play_audio` plays each segment
    as it is generated.
    """
    results = []

    # If `voice` points to an existing file, treat it as a saved voice tensor.
    voice_tensor = None
    if isinstance(voice, str) and Path(voice).exists():
        try:
            voice_tensor = torch.load(voice, weights_only=True)
        except Exception as exc:
            raise ValueError("Invalid voice tensor") from exc

    if save_path:
        save_path = Path(save_path)
        save_path.mkdir(parents=True, exist_ok=True)

    try:
        # Split the input on newlines so each line becomes one audio segment.
        generator = pipeline(
            text,
            voice=voice_tensor if voice_tensor is not None else voice,
            speed=speed,
            split_pattern=r'\n+'
        )
    except Exception as exc:
        raise RuntimeError("TTS failed") from exc

    with tempfile.TemporaryDirectory() as tmp_dir:
        for i, (graphemes, _, audio) in enumerate(generator):
            audio_numpy = audio.cpu().numpy()
            if save_path:
                # Kokoro outputs 24 kHz audio.
                sf.write(save_path / f'segment_{i}.wav', audio_numpy, 24000)
            if play_audio:
                temp_path = Path(tmp_dir) / f'segment_{i}.wav'
                sf.write(temp_path, audio_numpy, 24000)
                _play_audio(temp_path)
            results.append({'text': graphemes})

    return results


if __name__ == "__main__":
    try:
        mcp.run(transport=os.getenv("MCP_TRANSPORT", "stdio"))
    except Exception:
        sys.exit(1)
```