# Directory Structure
```
├── .gitignore
├── .idea
│   ├── .gitignore
│   ├── inspectionProfiles
│   │   └── profiles_settings.xml
│   ├── misc.xml
│   ├── modules.xml
│   ├── vcs.xml
│   └── voice-recorder-mcp.iml
├── environment.yml
├── LICENSE
├── pyproject.toml
├── README.md
├── src
│   ├── voice_recorder
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   │   ├── __init__.cpython-311.pyc
│   │   │   ├── __init__.cpython-312.pyc
│   │   │   ├── audio_service.cpython-311.pyc
│   │   │   ├── audio_service.cpython-312.pyc
│   │   │   ├── config.cpython-311.pyc
│   │   │   ├── config.cpython-312.pyc
│   │   │   ├── server.cpython-311.pyc
│   │   │   └── server.cpython-312.pyc
│   │   ├── audio_service.py
│   │   ├── config.py
│   │   └── server.py
│   └── voice_recorder_mcp.egg-info
│       ├── dependency_links.txt
│       ├── entry_points.txt
│       ├── PKG-INFO
│       ├── SOURCES.txt
│       └── top_level.txt
└── uv.lock
```
# Files
--------------------------------------------------------------------------------
/.idea/.gitignore:
--------------------------------------------------------------------------------
```
1 | # Default ignored files
2 | /shelf/
3 | /workspace.xml
4 |
```
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
1 | .venv
2 | build
3 | src/voice_recorder_mcp.egg-info
4 | .idea
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
1 | # Voice Recorder MCP Server
2 |
3 | An MCP server for recording audio and transcribing it using OpenAI's Whisper model. Designed to work as a Goose custom extension or standalone MCP server.
4 |
5 | ## Features
6 |
7 | - Record audio from the default microphone
8 | - Transcribe recordings using Whisper
9 | - Integrates with Goose AI agent as a custom extension
10 | - Includes prompts for common recording scenarios
11 |
12 | ## Installation
13 |
14 | ```bash
15 | # Install from source
16 | git clone https://github.com/DefiBax/voice-recorder-mcp.git
17 | cd voice-recorder-mcp
18 | pip install -e .
19 | ```
20 |
21 | ## Usage
22 |
23 | ### As a Standalone MCP Server
24 |
25 | ```bash
26 | # Run with default settings (base.en model)
27 | voice-recorder-mcp
28 |
29 | # Use a specific Whisper model
30 | voice-recorder-mcp --model medium.en
31 |
32 | # Adjust sample rate
33 | voice-recorder-mcp --sample-rate 44100
34 | ```
35 |
36 | ### Testing with MCP Inspector
37 |
38 | The MCP Inspector provides an interactive interface to test your server:
39 |
40 | ```bash
41 | # Install the MCP Inspector
42 | npm install -g @modelcontextprotocol/inspector
43 |
44 | # Run your server with the inspector
45 | npx @modelcontextprotocol/inspector voice-recorder-mcp
46 | ```
47 |
48 | ### With Goose AI Agent
49 |
50 | 1. Open Goose and go to Settings > Extensions > Add > Command Line Extension
51 | 2. Set the name to `voice-recorder`
52 | 3. In the Command field, enter the full path to the voice-recorder-mcp executable:
53 | ```
54 | /full/path/to/voice-recorder-mcp
55 | ```
56 |
57 | Or for a specific model:
58 | ```
59 | /full/path/to/voice-recorder-mcp --model medium.en
60 | ```
61 |
62 | To find the path, run:
63 | ```bash
64 | which voice-recorder-mcp
65 | ```
66 |
67 | 4. No environment variables are needed for basic functionality
68 | 5. Start a conversation with Goose and introduce the recorder with:
69 | "I want you to take action from transcriptions returned by voice-recorder. For example, if I dictate a calculation like 1+1, please return the result."
70 |
71 | ## Available Tools
72 |
73 | - `start_recording`: Start recording audio from the default microphone
74 | - `stop_and_transcribe`: Stop recording and transcribe the audio to text
75 | - `record_and_transcribe`: Record audio for a specified duration and transcribe it
76 |
77 | ## Whisper Models
78 |
79 | This extension supports various Whisper model sizes:
80 |
81 | | Model | Speed | Accuracy | Memory Usage | Use Case |
82 | |-------|-------|----------|--------------|----------|
83 | | `tiny.en` | Fastest | Lowest | Minimal | Testing, quick transcriptions |
84 | | `base.en` | Fast | Good | Low | Everyday use (default) |
85 | | `small.en` | Medium | Better | Moderate | Good balance |
86 | | `medium.en` | Slow | High | High | Important recordings |
87 | | `large` | Slowest | Highest | Very High | Critical transcriptions |
88 |
89 | The `.en` suffix indicates models specialized for English, which are faster and more accurate for English content.
90 |
91 | ## Requirements
92 |
93 | - Python 3.10 through 3.12 (matching `requires-python` in `pyproject.toml`)
94 | - An audio input device (microphone)
95 |
96 | ## Configuration
97 |
98 | You can also configure the server using environment variables, which take precedence over the command-line flags:
99 |
100 | ```bash
101 | # Set Whisper model
102 | export WHISPER_MODEL=small.en
103 |
104 | # Set audio sample rate
105 | export SAMPLE_RATE=44100
106 |
107 | # Set maximum recording duration (seconds)
108 | export MAX_DURATION=120
109 |
110 | # Then run the server
111 | voice-recorder-mcp
112 | ```
113 |
114 | ## Troubleshooting
115 |
116 | ### Common Issues
117 |
118 | - **No audio being recorded**: Check your microphone permissions and settings
119 | - **Model download errors**: Ensure you have a stable internet connection for the initial model download
120 | - **Integration with Goose**: Make sure the command path is correct
121 | - **Audio quality issues**: Try adjusting the sample rate (default: 16000)
122 |
123 | ## Contributing
124 |
125 | Contributions are welcome! Please feel free to submit a Pull Request.
126 |
127 | 1. Fork the repository
128 | 2. Create your feature branch (`git checkout -b feature/amazing-feature`)
129 | 3. Commit your changes (`git commit -m 'Add some amazing feature'`)
130 | 4. Push to the branch (`git push origin feature/amazing-feature`)
131 | 5. Open a Pull Request
132 |
133 | ## License
134 |
135 | This project is licensed under the MIT License - see the LICENSE file for details.
136 |
```
--------------------------------------------------------------------------------
/src/voice_recorder_mcp.egg-info/dependency_links.txt:
--------------------------------------------------------------------------------
```
1 |
2 |
```
--------------------------------------------------------------------------------
/src/voice_recorder_mcp.egg-info/top_level.txt:
--------------------------------------------------------------------------------
```
1 | voice_recorder
2 |
```
--------------------------------------------------------------------------------
/src/voice_recorder_mcp.egg-info/entry_points.txt:
--------------------------------------------------------------------------------
```
1 | [console_scripts]
2 | voice-recorder-mcp = voice_recorder:main
3 |
```
--------------------------------------------------------------------------------
/.idea/vcs.xml:
--------------------------------------------------------------------------------
```
1 | <?xml version="1.0" encoding="UTF-8"?>
2 | <project version="4">
3 | <component name="VcsDirectoryMappings">
4 | <mapping directory="" vcs="Git" />
5 | </component>
6 | </project>
```
--------------------------------------------------------------------------------
/.idea/inspectionProfiles/profiles_settings.xml:
--------------------------------------------------------------------------------
```
1 | <component name="InspectionProjectProfileManager">
2 | <settings>
3 | <option name="USE_PROJECT_PROFILE" value="false" />
4 | <version value="1.0" />
5 | </settings>
6 | </component>
```
--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------
```yaml
1 | #name: voice-recorder-mcp
2 | #channels:
3 | # - conda-forge
4 | # - defaults
5 | #dependencies:
6 | # - python>=3.10,<3.12
7 | # - pip
8 | # - numpy>=1.26.0
9 | # - pip:
10 | # - "mcp[cli]>=1.2.0"
11 | # - "git+https://github.com/openai/whisper.git"
12 | # - "sounddevice>=0.4.6"
13 | # - "nltk>=3.8.1"
14 |
```
--------------------------------------------------------------------------------
/src/voice_recorder/__init__.py:
--------------------------------------------------------------------------------
```python
from .config import get_config
from .server import mcp, audio_service


def main():
    """Voice Recorder MCP: Record audio and transcribe using Whisper."""
    # Config is automatically loaded when server is imported
    mcp.run()


__all__ = ["mcp", "audio_service", "main"]
```
--------------------------------------------------------------------------------
/.idea/modules.xml:
--------------------------------------------------------------------------------
```
1 | <?xml version="1.0" encoding="UTF-8"?>
2 | <project version="4">
3 | <component name="ProjectModuleManager">
4 | <modules>
5 | <module fileurl="file://$PROJECT_DIR$/.idea/voice-recorder-mcp.iml" filepath="$PROJECT_DIR$/.idea/voice-recorder-mcp.iml" />
6 | </modules>
7 | </component>
8 | </project>
```
--------------------------------------------------------------------------------
/.idea/misc.xml:
--------------------------------------------------------------------------------
```
1 | <?xml version="1.0" encoding="UTF-8"?>
2 | <project version="4">
3 | <component name="Black">
4 | <option name="sdkName" value="voice-recorder-mcp" />
5 | </component>
6 | <component name="ProjectRootManager" version="2" project-jdk-name="Python 3.12 virtualenv at ~/PycharmProjects/voice-recorder-mcp-pip/.venv" project-jdk-type="Python SDK" />
7 | </project>
```
--------------------------------------------------------------------------------
/src/voice_recorder_mcp.egg-info/SOURCES.txt:
--------------------------------------------------------------------------------
```
1 | LICENSE
2 | README.md
3 | pyproject.toml
4 | src/voice_recorder/__init__.py
5 | src/voice_recorder/audio_service.py
6 | src/voice_recorder/config.py
7 | src/voice_recorder/server.py
8 | src/voice_recorder_mcp.egg-info/PKG-INFO
9 | src/voice_recorder_mcp.egg-info/SOURCES.txt
10 | src/voice_recorder_mcp.egg-info/dependency_links.txt
11 | src/voice_recorder_mcp.egg-info/entry_points.txt
12 | src/voice_recorder_mcp.egg-info/requires.txt
13 | src/voice_recorder_mcp.egg-info/top_level.txt
```
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
```toml
1 | [project]
2 | name = "voice-recorder-mcp"
3 | version = "0.1.0"
4 | description = "MCP server for voice recording and transcription"
5 | readme = "README.md"
6 | license = {text = "MIT"}
7 | requires-python = ">=3.10,<3.13" # Allow Python 3.10 through 3.12
8 | dependencies = [
9 | "mcp[cli]>=1.2.0",
10 | "sounddevice>=0.4.6",
11 | "numpy>=1.20.0,<2.0.0",
12 | "openai-whisper @ git+https://github.com/openai/whisper.git",
13 | ]
14 |
15 | [project.optional-dependencies]
16 | dev = [
17 | "pytest>=7.0.0",
18 | "black>=23.0.0",
19 | "isort>=5.12.0",
20 | ]
21 |
22 | [project.scripts]
23 | voice-recorder-mcp = "voice_recorder:main"
24 |
25 | [tool.setuptools]
26 | package-dir = {"" = "src"}
27 |
28 | [tool.setuptools.packages.find]
29 | where = ["src"]
30 |
31 | [tool.uv]
32 | package = true
```
--------------------------------------------------------------------------------
/src/voice_recorder/config.py:
--------------------------------------------------------------------------------
```python
import os
import argparse
from dataclasses import dataclass


@dataclass
class Config:
    whisper_model: str = "base.en"
    sample_rate: int = 16000
    max_duration: int = 60


def parse_args():
    parser = argparse.ArgumentParser(
        description="MCP server for voice recording and transcription using Whisper."
    )
    parser.add_argument('--model', default='base.en', help='Whisper model to use')
    parser.add_argument('--sample-rate', type=int, default=16000, help='Audio sample rate')
    return parser.parse_args()


def get_config():
    """Load configuration from environment variables or command line arguments"""
    args = parse_args()

    # Environment variables take precedence over command line arguments
    config = Config(
        whisper_model=os.environ.get("WHISPER_MODEL", args.model),
        sample_rate=int(os.environ.get("SAMPLE_RATE", args.sample_rate)),
        max_duration=int(os.environ.get("MAX_DURATION", 60))
    )

    return config
```
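The precedence used by `get_config` (environment variable first, then command-line flag, then built-in default) is easier to exercise when the argument list and environment are passed in explicitly. The following is a minimal sketch under that assumption; `Settings` and `resolve_settings` are hypothetical names used only for illustration and are not part of the package.

```python
import argparse
from dataclasses import dataclass


@dataclass
class Settings:
    """Hypothetical stand-in for Config, with the same fields and defaults."""
    whisper_model: str = "base.en"
    sample_rate: int = 16000
    max_duration: int = 60


def resolve_settings(argv, env):
    """Resolve settings from an explicit argument list and environment mapping.

    Mirrors the precedence in get_config: an environment variable wins over
    the matching command-line flag, which wins over the default.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default=Settings.whisper_model)
    parser.add_argument("--sample-rate", type=int, default=Settings.sample_rate)
    args = parser.parse_args(argv)

    return Settings(
        whisper_model=env.get("WHISPER_MODEL", args.model),
        sample_rate=int(env.get("SAMPLE_RATE", args.sample_rate)),
        # MAX_DURATION has no command-line flag, so only the environment can override it
        max_duration=int(env.get("MAX_DURATION", Settings.max_duration)),
    )


if __name__ == "__main__":
    # The flag asks for medium.en, but the environment variable takes precedence
    print(resolve_settings(["--model", "medium.en"], {"WHISPER_MODEL": "small.en"}))
```

Passing `sys.argv[1:]` and `os.environ` to `resolve_settings` would reproduce the behaviour of `get_config`, while tests can pass plain lists and dicts.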
--------------------------------------------------------------------------------
/src/voice_recorder/server.py:
--------------------------------------------------------------------------------
```python
import logging
import time

from mcp.server.fastmcp import FastMCP, Context
from mcp.shared.exceptions import McpError
from mcp.types import ErrorData, INTERNAL_ERROR, INVALID_PARAMS

from .audio_service import AudioService
from .config import get_config

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Create an MCP server
mcp = FastMCP("VoiceRecorder")

# Initialize the audio service.
# Note: config.sample_rate is not forwarded here, so AudioService records at its
# 16000 Hz default, which is the rate Whisper expects for raw sample arrays.
config = get_config()
audio_service = AudioService(model_name=config.whisper_model)


@mcp.tool()
def start_recording() -> str:
    """Start recording audio from the default microphone"""
    try:
        return audio_service.start_recording()
    except Exception as e:
        logger.error(f"Error starting recording: {str(e)}")
        # ErrorData is a pydantic model, so its fields must be passed by keyword
        raise McpError(ErrorData(code=INTERNAL_ERROR, message=f"Recording error: {str(e)}"))


@mcp.tool()
def stop_and_transcribe() -> str:
    """Stop recording and transcribe the audio to text"""
    try:
        audio_data, msg = audio_service.stop_recording()
        if audio_data is None:
            return msg

        return audio_service.transcribe(audio_data)
    except Exception as e:
        logger.error(f"Error during transcription: {str(e)}")
        raise McpError(ErrorData(code=INTERNAL_ERROR, message=f"Transcription error: {str(e)}"))


@mcp.tool()
def record_and_transcribe(duration_seconds: int) -> str:
    """
    Record audio for the specified duration and transcribe it

    Args:
        duration_seconds: Number of seconds to record (1 up to the configured
            maximum, 60 by default)
    """
    try:
        # Validate input against the configured maximum (MAX_DURATION)
        max_duration = config.max_duration
        if (
            not isinstance(duration_seconds, int)
            or duration_seconds < 1
            or duration_seconds > max_duration
        ):
            raise McpError(
                ErrorData(
                    code=INVALID_PARAMS,
                    message=f"Duration must be between 1 and {max_duration} seconds",
                )
            )

        # Start recording
        start_recording()
        logger.info(f"Recording for {duration_seconds} seconds")

        # Wait for specified duration
        time.sleep(duration_seconds)

        # Stop and transcribe
        return stop_and_transcribe()
    except Exception as e:
        logger.error(f"Error in record_and_transcribe: {str(e)}")
        # Make sure recording is stopped in case of error
        if audio_service.is_recording:
            try:
                audio_service.stop_recording()
            except Exception:
                pass
        # Preserve errors that already carry an MCP code (e.g. INVALID_PARAMS)
        if isinstance(e, McpError):
            raise
        raise McpError(ErrorData(code=INTERNAL_ERROR, message=f"Error: {str(e)}"))
```
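The duration check in `record_and_transcribe` can be factored into a small helper that is testable without a microphone or an MCP runtime. The sketch below is only an illustration: `validate_duration` and its `max_duration` parameter are hypothetical, and it raises a plain `ValueError` rather than `McpError` to stay dependency-free. It also rejects `bool` values explicitly, since `isinstance(True, int)` is `True` in Python.

```python
def validate_duration(duration_seconds, max_duration=60):
    """Return duration_seconds if it is a valid recording length, else raise ValueError.

    Hypothetical helper: accepts only integers in the range 1..max_duration.
    """
    # bool is a subclass of int, so it has to be excluded explicitly
    if isinstance(duration_seconds, bool) or not isinstance(duration_seconds, int):
        raise ValueError("Duration must be an integer number of seconds")
    if not 1 <= duration_seconds <= max_duration:
        raise ValueError(f"Duration must be between 1 and {max_duration} seconds")
    return duration_seconds


if __name__ == "__main__":
    print(validate_duration(30))
    try:
        validate_duration(90)
    except ValueError as exc:
        print(f"rejected: {exc}")
```

A caller inside a tool function could catch the `ValueError` and translate it into an `INVALID_PARAMS` error.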
--------------------------------------------------------------------------------
/src/voice_recorder/audio_service.py:
--------------------------------------------------------------------------------
```python
import time
import threading
import numpy as np
import sounddevice as sd
from queue import Queue
import whisper
import logging

logger = logging.getLogger(__name__)


class AudioService:
    def __init__(self, model_name="base.en", sample_rate=16000):
        """Initialize the audio service with recording and transcription capabilities"""
        self.is_recording = False
        self.sample_rate = sample_rate
        self.stop_event = None
        self.data_queue = None
        self.recording_thread = None

        # Initialize transcriber
        logger.info(f"Loading Whisper model: {model_name}")
        try:
            self.transcriber = whisper.load_model(model_name)
            logger.info(f"Whisper model '{model_name}' loaded successfully")
        except Exception as e:
            logger.error(f"Failed to load Whisper model: {str(e)}")
            raise

    def start_recording(self):
        """Start recording audio from the default microphone"""
        if self.is_recording:
            return "Already recording"

        self.data_queue = Queue()
        self.stop_event = threading.Event()

        # Named time_info so the parameter does not shadow the time module
        def callback(indata, frames, time_info, status):
            if status:
                logger.warning(f"Recording status: {status}")
            # Copy the raw buffer, since it is only valid during the callback
            self.data_queue.put(bytes(indata))

        def record_thread():
            try:
                with sd.RawInputStream(
                    samplerate=self.sample_rate,
                    dtype="int16",
                    channels=1,
                    callback=callback
                ):
                    logger.info("Recording started")
                    while not self.stop_event.is_set():
                        time.sleep(0.1)
            except Exception as e:
                logger.error(f"Error in recording thread: {str(e)}")

        self.recording_thread = threading.Thread(target=record_thread)
        self.recording_thread.daemon = True  # Make thread exit when main program exits
        self.recording_thread.start()
        self.is_recording = True

        return "Recording started"

    def stop_recording(self):
        """Stop recording and return the audio data"""
        if not self.is_recording:
            return None, "Not recording"

        self.stop_event.set()
        self.recording_thread.join()
        self.is_recording = False

        logger.info("Processing audio data...")
        audio_data = b"".join(list(self.data_queue.queue))
        # Convert 16-bit PCM to float32 samples in the range [-1.0, 1.0)
        audio_np = np.frombuffer(audio_data, dtype=np.int16).astype(np.float32) / 32768.0

        return audio_np, "Recording stopped"

    def transcribe(self, audio_np):
        """Transcribe audio data to text"""
        if audio_np is None or audio_np.size == 0:
            return "No audio recorded"

        logger.info("Transcribing audio...")
        try:
            result = self.transcriber.transcribe(audio_np, fp16=False)
            transcription = result["text"].strip()
            logger.info(f"Transcription completed: {transcription[:30]}...")
            return transcription
        except Exception as e:
            logger.error(f"Error during transcription: {str(e)}")
            raise
```
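`stop_recording` joins the queued 16-bit PCM chunks and scales them by 1/32768 so that samples fall in the range [-1.0, 1.0). The same conversion can be sketched with the standard library alone, which may help when checking the arithmetic without NumPy or an audio device. The name `pcm16_to_float` is hypothetical, and the sketch assumes mono, native-endian, signed 16-bit input.

```python
from array import array


def pcm16_to_float(chunks):
    """Convert an iterable of raw int16 PCM byte chunks to floats in [-1.0, 1.0).

    Hypothetical helper: assumes native-endian signed 16-bit mono samples.
    """
    samples = array("h")  # "h" is a signed short, 2 bytes on mainstream platforms
    if samples.itemsize != 2:
        raise RuntimeError("expected 2-byte signed shorts on this platform")
    samples.frombytes(b"".join(chunks))
    # Dividing by 32768 maps -32768 to -1.0 and 32767 to just under 1.0
    return [s / 32768.0 for s in samples]


if __name__ == "__main__":
    # Build two chunks the same way a recording callback would queue them
    first = array("h", [0, 16384]).tobytes()
    second = array("h", [-32768]).tobytes()
    print(pcm16_to_float([first, second]))
```

An empty list of chunks yields an empty list, which corresponds to the "No audio recorded" case in `transcribe`.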