# Directory Structure

```
├── .gitignore
├── .idea
│   ├── .gitignore
│   ├── inspectionProfiles
│   │   └── profiles_settings.xml
│   ├── misc.xml
│   ├── modules.xml
│   ├── vcs.xml
│   └── voice-recorder-mcp.iml
├── environment.yml
├── LICENSE
├── pyproject.toml
├── README.md
├── src
│   ├── voice_recorder
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   │   ├── __init__.cpython-311.pyc
│   │   │   ├── __init__.cpython-312.pyc
│   │   │   ├── audio_service.cpython-311.pyc
│   │   │   ├── audio_service.cpython-312.pyc
│   │   │   ├── config.cpython-311.pyc
│   │   │   ├── config.cpython-312.pyc
│   │   │   ├── server.cpython-311.pyc
│   │   │   └── server.cpython-312.pyc
│   │   ├── audio_service.py
│   │   ├── config.py
│   │   └── server.py
│   └── voice_recorder_mcp.egg-info
│       ├── dependency_links.txt
│       ├── entry_points.txt
│       ├── PKG-INFO
│       ├── SOURCES.txt
│       └── top_level.txt
└── uv.lock
```

# Files

--------------------------------------------------------------------------------
/.idea/.gitignore:
--------------------------------------------------------------------------------

```
# Default ignored files
/shelf/
/workspace.xml

```

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------

```
.venv
build
src/voice_recorder_mcp.egg-info
.idea
```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

```markdown
# Voice Recorder MCP Server

An MCP server for recording audio and transcribing it using OpenAI's Whisper model. Designed to work as a Goose custom extension or standalone MCP server.

## Features

- Record audio from the default microphone
- Transcribe recordings using Whisper
- Integrates with Goose AI agent as a custom extension
- Includes prompts for common recording scenarios

## Installation

```bash
# Install from source
git clone https://github.com/DefiBax/voice-recorder-mcp.git
cd voice-recorder-mcp
pip install -e .
```

## Usage

### As a Standalone MCP Server

```bash
# Run with default settings (base.en model)
voice-recorder-mcp

# Use a specific Whisper model
voice-recorder-mcp --model medium.en

# Adjust sample rate
voice-recorder-mcp --sample-rate 44100
```

### Testing with MCP Inspector

The MCP Inspector provides an interactive interface to test your server:

```bash
# Install the MCP Inspector
npm install -g @modelcontextprotocol/inspector

# Run your server with the inspector
npx @modelcontextprotocol/inspector voice-recorder-mcp
```

### With Goose AI Agent

1. Open Goose and go to Settings > Extensions > Add > Command Line Extension
2. Set the name to `voice-recorder`
3. In the Command field, enter the full path to the voice-recorder-mcp executable:
   ```
   /full/path/to/voice-recorder-mcp
   ```
   
   Or for a specific model:
   ```
   /full/path/to/voice-recorder-mcp --model medium.en
   ```
   
   To find the path, run:
   ```bash
   which voice-recorder-mcp
   ```

4. No environment variables are needed for basic functionality
5. Start a conversation with Goose and introduce the recorder with:
   "I want you to take action from transcriptions returned by voice-recorder. For example, if I dictate a calculation like 1+1, please return the result."

## Available Tools

- `start_recording`: Start recording audio from the default microphone
- `stop_and_transcribe`: Stop recording and transcribe the audio to text
- `record_and_transcribe`: Record audio for a specified duration and transcribe it

## Whisper Models

This extension supports various Whisper model sizes:

| Model | Speed | Accuracy | Memory Usage | Use Case |
|-------|-------|----------|--------------|----------|
| `tiny.en` | Fastest | Lowest | Minimal | Testing, quick transcriptions |
| `base.en` | Fast | Good | Low | Everyday use (default) |
| `small.en` | Medium | Better | Moderate | Good balance |
| `medium.en` | Slow | High | High | Important recordings |
| `large` | Slowest | Highest | Very High | Critical transcriptions |

The `.en` suffix marks English-only models. For English audio these tend to be more accurate than the multilingual model of the same size, especially at the `tiny` and `base` sizes.

## Requirements

- Python 3.10, 3.11, or 3.12 (`pyproject.toml` declares `>=3.10,<3.13`)
- An audio input device (microphone)

## Configuration

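You can configure the server using environment variables. If both are given, an environment variable overrides the matching command-line flag (`WHISPER_MODEL` over `--model`, `SAMPLE_RATE` over `--sample-rate`):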

```bash
# Set Whisper model
export WHISPER_MODEL=small.en

# Set audio sample rate
export SAMPLE_RATE=44100

# Set maximum recording duration (seconds)
export MAX_DURATION=120

# Then run the server
voice-recorder-mcp
```

## Troubleshooting

### Common Issues

- **No audio being recorded**: Check your microphone permissions and settings
- **Model download errors**: Ensure you have a stable internet connection for the initial model download
- **Integration with Goose**: Make sure the command path is correct
- **Audio quality issues**: Try adjusting the sample rate (default: 16000)

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the LICENSE file for details.

```
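
The path lookup from the Goose setup steps in the README above (`which voice-recorder-mcp`) can also be sketched in Python with the standard library's `shutil.which`. The helper name `goose_command` is hypothetical and not part of this project; it only assembles a string that could be pasted into the Command field, using the `--model` flag the README documents.

```python
import shlex
import shutil
from typing import Optional


def goose_command(executable: str = "voice-recorder-mcp", model: Optional[str] = None) -> str:
    """Build the command line for the Goose Command field (hypothetical helper)."""
    path = shutil.which(executable)  # same PATH lookup that `which` performs
    if path is None:
        raise FileNotFoundError(f"{executable!r} was not found on PATH")
    parts = [path]
    if model is not None:
        parts += ["--model", model]  # --model is the flag documented in the README
    # Quote each part so a path containing spaces survives copy and paste
    return " ".join(shlex.quote(part) for part in parts)
```

Calling `goose_command(model="medium.en")` in an environment where the package is installed should return the full path followed by `--model medium.en`; if the executable is not on `PATH`, the sketch raises `FileNotFoundError` instead of returning a guess.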

--------------------------------------------------------------------------------
/src/voice_recorder_mcp.egg-info/dependency_links.txt:
--------------------------------------------------------------------------------

```


```

--------------------------------------------------------------------------------
/src/voice_recorder_mcp.egg-info/top_level.txt:
--------------------------------------------------------------------------------

```
voice_recorder

```

--------------------------------------------------------------------------------
/src/voice_recorder_mcp.egg-info/entry_points.txt:
--------------------------------------------------------------------------------

```
[console_scripts]
voice-recorder-mcp = voice_recorder:main

```

--------------------------------------------------------------------------------
/.idea/vcs.xml:
--------------------------------------------------------------------------------

```
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
  <component name="VcsDirectoryMappings">
    <mapping directory="" vcs="Git" />
  </component>
</project>
```

--------------------------------------------------------------------------------
/.idea/inspectionProfiles/profiles_settings.xml:
--------------------------------------------------------------------------------

```
<component name="InspectionProjectProfileManager">
  <settings>
    <option name="USE_PROJECT_PROFILE" value="false" />
    <version value="1.0" />
  </settings>
</component>
```

--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------

```yaml
#name: voice-recorder-mcp
#channels:
#  - conda-forge
#  - defaults
#dependencies:
#  - python>=3.10,<3.12
#  - pip
#  - numpy>=1.26.0
#  - pip:
#    - "mcp[cli]>=1.2.0"
#    - "git+https://github.com/openai/whisper.git"
#    - "sounddevice>=0.4.6"
#    - "nltk>=3.8.1"

```

--------------------------------------------------------------------------------
/src/voice_recorder/__init__.py:
--------------------------------------------------------------------------------

```python
from .config import get_config
from .server import mcp, audio_service

def main():
    """Voice Recorder MCP: Record audio and transcribe using Whisper."""
    # Config is automatically loaded when server is imported
    mcp.run()

__all__ = ["mcp", "audio_service", "main"]
```

--------------------------------------------------------------------------------
/.idea/modules.xml:
--------------------------------------------------------------------------------

```
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
  <component name="ProjectModuleManager">
    <modules>
      <module fileurl="file://$PROJECT_DIR$/.idea/voice-recorder-mcp.iml" filepath="$PROJECT_DIR$/.idea/voice-recorder-mcp.iml" />
    </modules>
  </component>
</project>
```

--------------------------------------------------------------------------------
/.idea/misc.xml:
--------------------------------------------------------------------------------

```
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
  <component name="Black">
    <option name="sdkName" value="voice-recorder-mcp" />
  </component>
  <component name="ProjectRootManager" version="2" project-jdk-name="Python 3.12 virtualenv at ~/PycharmProjects/voice-recorder-mcp-pip/.venv" project-jdk-type="Python SDK" />
</project>
```

--------------------------------------------------------------------------------
/src/voice_recorder_mcp.egg-info/SOURCES.txt:
--------------------------------------------------------------------------------

```
LICENSE
README.md
pyproject.toml
src/voice_recorder/__init__.py
src/voice_recorder/audio_service.py
src/voice_recorder/config.py
src/voice_recorder/server.py
src/voice_recorder_mcp.egg-info/PKG-INFO
src/voice_recorder_mcp.egg-info/SOURCES.txt
src/voice_recorder_mcp.egg-info/dependency_links.txt
src/voice_recorder_mcp.egg-info/entry_points.txt
src/voice_recorder_mcp.egg-info/requires.txt
src/voice_recorder_mcp.egg-info/top_level.txt
```

--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------

```toml
[project]
name = "voice-recorder-mcp"
version = "0.1.0"
description = "MCP server for voice recording and transcription"
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.10,<3.13"  # Allows Python 3.10, 3.11, and 3.12
dependencies = [
    "mcp[cli]>=1.2.0",
    "sounddevice>=0.4.6",
    "numpy>=1.20.0,<2.0.0",
    "openai-whisper @ git+https://github.com/openai/whisper.git",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0.0",
    "black>=23.0.0",
    "isort>=5.12.0",
]

[project.scripts]
voice-recorder-mcp = "voice_recorder:main"

[tool.setuptools]
package-dir = {"" = "src"}

[tool.setuptools.packages.find]
where = ["src"]

[tool.uv]
package = true
```
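
To make the `requires-python` range concrete, here is a minimal sketch of the same bounds as a runtime check. The function name `python_supported` and the two constants are hypothetical; the project itself relies on the packaging tools to enforce the range.

```python
import sys

# Bounds copied from requires-python = ">=3.10,<3.13" in pyproject.toml
MIN_VERSION = (3, 10)
MAX_EXCLUSIVE = (3, 13)


def python_supported(version=None) -> bool:
    """Return True if a (major, minor, ...) tuple falls inside the declared range."""
    major_minor = tuple((version or sys.version_info)[:2])
    # Tuple comparison is lexicographic: (3, 10) <= (3, 12) < (3, 13)
    return MIN_VERSION <= major_minor < MAX_EXCLUSIVE
```

Under these bounds, 3.10 through 3.12 pass while 3.9 and 3.13 are rejected.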

--------------------------------------------------------------------------------
/src/voice_recorder/config.py:
--------------------------------------------------------------------------------

```python
import os
import argparse
from dataclasses import dataclass


@dataclass
class Config:
    whisper_model: str = "base.en"
    sample_rate: int = 16000
    max_duration: int = 60


def parse_args():
    parser = argparse.ArgumentParser(
        description="MCP server for voice recording and transcription using Whisper."
    )
    parser.add_argument('--model', default='base.en', help='Whisper model to use')
    parser.add_argument('--sample-rate', type=int, default=16000, help='Audio sample rate')
    return parser.parse_args()


def get_config():
    """Load configuration from environment variables or command line arguments"""
    args = parse_args()

    # Environment variables take precedence over command line arguments
    config = Config(
        whisper_model=os.environ.get("WHISPER_MODEL", args.model),
        sample_rate=int(os.environ.get("SAMPLE_RATE", args.sample_rate)),
        max_duration=int(os.environ.get("MAX_DURATION", 60))
    )

    return config
```
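
The precedence implemented in `get_config()` (environment variable first, then command-line flag, then default) is easier to see when the inputs are passed in explicitly. The sketch below is a hypothetical variant, `resolve_config`, that takes `argv` and `env` as parameters instead of reading `sys.argv` and `os.environ`; it is an illustration, not the project's code.

```python
import argparse
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass
class Config:
    whisper_model: str = "base.en"
    sample_rate: int = 16000
    max_duration: int = 60


def resolve_config(argv: Sequence[str], env: Mapping[str, str]) -> Config:
    """Hypothetical variant of get_config() with explicit inputs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default="base.en")
    parser.add_argument("--sample-rate", type=int, default=16000)
    # parse_known_args ignores flags this parser does not define
    args, _unknown = parser.parse_known_args(list(argv))

    # Same precedence as config.py: environment variable, then CLI flag or default
    return Config(
        whisper_model=env.get("WHISPER_MODEL", args.model),
        sample_rate=int(env.get("SAMPLE_RATE", args.sample_rate)),
        max_duration=int(env.get("MAX_DURATION", 60)),  # no CLI flag exists for this one
    )
```

For example, `resolve_config(["--model", "medium.en"], {"WHISPER_MODEL": "small.en"})` yields `whisper_model="small.en"`, because the environment value wins over the flag.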

--------------------------------------------------------------------------------
/src/voice_recorder/server.py:
--------------------------------------------------------------------------------

```python
from mcp.server.fastmcp import FastMCP, Context
from mcp.shared.exceptions import McpError
from mcp.types import ErrorData, INTERNAL_ERROR, INVALID_PARAMS
import logging
import time
from .audio_service import AudioService
from .config import get_config

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Create an MCP server
mcp = FastMCP("VoiceRecorder")

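# Initialize the audio service with the configured model and sample rate.
# Note: Whisper treats array input as 16 kHz audio, so a sample rate other than
# 16000 needs resampling before transcription.
config = get_config()
audio_service = AudioService(
    model_name=config.whisper_model,
    sample_rate=config.sample_rate,
)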


@mcp.tool()
def start_recording() -> str:
    """Start recording audio from the default microphone"""
    try:
        return audio_service.start_recording()
    except Exception as e:
        logger.error(f"Error starting recording: {str(e)}")
        # ErrorData is a pydantic model, so its fields are passed by keyword
        raise McpError(ErrorData(code=INTERNAL_ERROR, message=f"Recording error: {str(e)}"))


@mcp.tool()
def stop_and_transcribe() -> str:
    """Stop recording and transcribe the audio to text"""
    try:
        audio_data, msg = audio_service.stop_recording()
        if audio_data is None:
            return msg

        return audio_service.transcribe(audio_data)
    except Exception as e:
        logger.error(f"Error during transcription: {str(e)}")
        # ErrorData is a pydantic model, so its fields are passed by keyword
        raise McpError(ErrorData(code=INTERNAL_ERROR, message=f"Transcription error: {str(e)}"))


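@mcp.tool()
def record_and_transcribe(duration_seconds: int) -> str:
    """
    Record audio for the specified duration and transcribe it

    Args:
        duration_seconds: Number of seconds to record (1 up to MAX_DURATION, 60 by default)
    """
    # Validate input before touching the microphone, so an invalid duration is
    # reported as INVALID_PARAMS rather than being rewrapped as INTERNAL_ERROR.
    # bool is excluded explicitly because it is a subclass of int.
    max_duration = config.max_duration
    if (
        not isinstance(duration_seconds, int)
        or isinstance(duration_seconds, bool)
        or not 1 <= duration_seconds <= max_duration
    ):
        raise McpError(
            ErrorData(
                code=INVALID_PARAMS,
                message=f"Duration must be between 1 and {max_duration} seconds",
            )
        )

    try:
        # Start recording
        start_recording()
        logger.info(f"Recording for {duration_seconds} seconds")

        # Wait for specified duration
        time.sleep(duration_seconds)

        # Stop and transcribe
        return stop_and_transcribe()
    except Exception as e:
        logger.error(f"Error in record_and_transcribe: {str(e)}")
        # Make sure recording is stopped in case of error
        if audio_service.is_recording:
            try:
                audio_service.stop_recording()
            except Exception:
                pass
        # Errors raised by the tools above are already MCP errors; pass them through
        if isinstance(e, McpError):
            raise
        raise McpError(ErrorData(code=INTERNAL_ERROR, message=f"Error: {str(e)}"))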
```
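
The control flow of `record_and_transcribe` (validate, record for a fixed time, always stop) can be sketched without the MCP or audio dependencies. `FakeRecorder`, `record_for`, and the injected `sleep` parameter are hypothetical names used only for illustration. The sketch uses `try`/`finally` for the cleanup, which is one alternative to the `except` block in `server.py`, not the project's actual approach.

```python
import time
from typing import Callable


class FakeRecorder:
    """Stand-in for AudioService that only tracks state (hypothetical)."""

    def __init__(self) -> None:
        self.is_recording = False

    def start(self) -> None:
        self.is_recording = True

    def stop(self) -> str:
        self.is_recording = False
        return "transcript placeholder"


def record_for(recorder, duration_seconds: int, max_duration: int = 60,
               sleep: Callable[[float], None] = time.sleep) -> str:
    # Reject bad input before the recorder is touched; bool is a subclass of int
    if isinstance(duration_seconds, bool) or not isinstance(duration_seconds, int):
        raise ValueError("Duration must be an integer number of seconds")
    if not 1 <= duration_seconds <= max_duration:
        raise ValueError(f"Duration must be between 1 and {max_duration} seconds")

    recorder.start()
    try:
        sleep(duration_seconds)  # injected so tests can skip the real wait
        return recorder.stop()
    finally:
        # Runs on success and on error; stop only if something is still recording
        if recorder.is_recording:
            recorder.stop()
```

Because validation happens before `recorder.start()`, an out-of-range duration never leaves a recording running, and the `finally` clause covers failures during the wait.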

--------------------------------------------------------------------------------
/src/voice_recorder/audio_service.py:
--------------------------------------------------------------------------------

```python
import time
import threading
import numpy as np
import sounddevice as sd
from queue import Queue
import whisper
import logging

logger = logging.getLogger(__name__)


class AudioService:
    def __init__(self, model_name="base.en", sample_rate=16000):
        """Initialize the audio service with recording and transcription capabilities"""
        self.is_recording = False
        self.sample_rate = sample_rate
        self.stop_event = None
        self.data_queue = None
        self.recording_thread = None

        # Initialize transcriber
        logger.info(f"Loading Whisper model: {model_name}")
        try:
            self.transcriber = whisper.load_model(model_name)
            logger.info(f"Whisper model '{model_name}' loaded successfully")
        except Exception as e:
            logger.error(f"Failed to load Whisper model: {str(e)}")
            raise

    def start_recording(self):
        """Start recording audio from the default microphone"""
        if self.is_recording:
            return "Already recording"

        self.data_queue = Queue()
        self.stop_event = threading.Event()

        def callback(indata, frames, time_info, status):
            # Named time_info so the parameter does not shadow the time module
            if status:
                logger.warning(f"Recording status: {status}")
            self.data_queue.put(bytes(indata))

        def record_thread():
            try:
                with sd.RawInputStream(
                        samplerate=self.sample_rate,
                        dtype="int16",
                        channels=1,
                        callback=callback
                ):
                    logger.info("Recording started")
                    while not self.stop_event.is_set():
                        time.sleep(0.1)
            except Exception as e:
                logger.error(f"Error in recording thread: {str(e)}")

        self.recording_thread = threading.Thread(target=record_thread)
        self.recording_thread.daemon = True  # Make thread exit when main program exits
        self.recording_thread.start()
        self.is_recording = True

        return "Recording started"

    def stop_recording(self):
        """Stop recording and return the audio data"""
        if not self.is_recording:
            return None, "Not recording"

        self.stop_event.set()
        self.recording_thread.join()
        self.is_recording = False

        logger.info("Processing audio data...")
        audio_data = b"".join(list(self.data_queue.queue))
        audio_np = np.frombuffer(audio_data, dtype=np.int16).astype(np.float32) / 32768.0

        return audio_np, "Recording stopped"

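    def transcribe(self, audio_np):
        """Transcribe audio data to text"""
        if audio_np is None or audio_np.size == 0:
            return "No audio recorded"

        # Whisper treats array input as 16 kHz audio, so recordings made at another
        # rate are resampled first. Linear interpolation is a simple stand-in for a
        # proper resampler and does no anti-alias filtering.
        whisper_rate = 16000
        if self.sample_rate != whisper_rate:
            target_len = max(1, int(round(audio_np.size * whisper_rate / self.sample_rate)))
            audio_np = np.interp(
                np.linspace(0, audio_np.size - 1, num=target_len),
                np.arange(audio_np.size),
                audio_np,
            ).astype(np.float32)

        logger.info("Transcribing audio...")
        try:
            result = self.transcriber.transcribe(audio_np, fp16=False)
            transcription = result["text"].strip()
            logger.info(f"Transcription completed: {transcription[:30]}...")
            return transcription
        except Exception as e:
            logger.error(f"Error during transcription: {str(e)}")
            raise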
```
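
The recording pattern in `AudioService` (a background thread fills a `Queue` until an `Event` is set, then the bytes are joined and scaled from int16 to floats) can be tried without a microphone. The sketch below is an assumption-laden stand-in: `SyntheticRecorder` and `CHUNK` are hypothetical names, the audio source is a fixed fake chunk rather than a `sounddevice` stream, and the standard library's `array` module replaces NumPy for the conversion.

```python
import threading
from array import array
from queue import Queue

# Four fake int16 samples standing in for one callback's worth of audio
CHUNK = array("h", [0, 16384, -16384, 32767]).tobytes()


class SyntheticRecorder:
    """Mimics the start/stop flow of AudioService with a synthetic source."""

    def __init__(self, interval: float = 0.01) -> None:
        self.interval = interval
        self.chunks: Queue = Queue()
        self.stop_event = threading.Event()
        self.thread = None

    def start(self) -> None:
        def produce() -> None:
            while True:
                self.chunks.put(CHUNK)  # the real callback enqueues microphone bytes
                if self.stop_event.wait(self.interval):  # True once stop() sets the event
                    break

        self.thread = threading.Thread(target=produce, daemon=True)
        self.thread.start()

    def stop(self) -> list:
        self.stop_event.set()
        self.thread.join()  # after join, no more chunks can arrive
        raw = b""
        while not self.chunks.empty():
            raw += self.chunks.get()
        samples = array("h")
        samples.frombytes(raw)  # native-endian int16, matching the "int16" stream dtype
        return [s / 32768.0 for s in samples]  # scale to the range [-1.0, 1.0)
```

Dividing by 32768 maps the int16 range onto [-1.0, 1.0), which is the same normalization `stop_recording` applies with NumPy before handing the audio to Whisper.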