sparfenyuk/mcp-youtube # codebase.md

# Directory Structure

```
├── .gitignore
├── CHANGELOG.md
├── cog.toml
├── LICENSE
├── pyproject.toml
├── README.md
├── ruff.toml
├── src
│   └── mcp_youtube
│       ├── __init__.py
│       ├── py.typed
│       ├── server.py
│       └── tools.py
└── uv.lock
```

# Files

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------

```
.env
.venv
*.egg-info
__pycache__
*.pyc

*.session

```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

```markdown
# Youtube MCP server

- [Youtube MCP server](#youtube-mcp-server)
  - [About](#about)
  - [What is MCP?](#what-is-mcp)
  - [What does this server do?](#what-does-this-server-do)
  - [Practical use cases](#practical-use-cases)
  - [Prerequisites](#prerequisites)
  - [Installation](#installation)
  - [Configuration](#configuration)
    - [Claude Desktop Configuration](#claude-desktop-configuration)
  - [Development](#development)
    - [Getting started](#getting-started)
    - [Debugging the server in the Inspector](#debugging-the-server-in-the-inspector)
  - [Troubleshooting](#troubleshooting)
    - [Message 'Could not connect to MCP server mcp-youtube'](#message-could-not-connect-to-mcp-server-mcp-youtube)

## About

The server is a bridge between the YouTube API and AI assistants, built on the [Model Context Protocol](https://modelcontextprotocol.io).

<a href="https://glama.ai/mcp/servers/gzrh7914k6">
  <img width="380" height="200" src="https://glama.ai/mcp/servers/gzrh7914k6/badge" alt="Youtube Server MCP server" />
</a>

## What is MCP?

The Model Context Protocol (MCP) is a system that lets AI apps, like Claude Desktop, connect to external tools and data sources. It gives a clear and safe way for AI assistants to work with local services and APIs while keeping the user in control.

## What does this server do?

- [x] Download closed captions for the given video

## Practical use cases

- [x] Create a summary of the video

## Prerequisites

- [`uv` tool](https://docs.astral.sh/uv/getting-started/installation/)

## Installation

```bash
uv tool install git+https://github.com/sparfenyuk/mcp-youtube
```

> [!NOTE]
> If you have already installed the server, you can update it using the `uv tool upgrade --reinstall mcp-youtube` command.

> [!NOTE]
> If you want to delete the server, use the `uv tool uninstall mcp-youtube` command.

## Configuration

### Claude Desktop Configuration

Configure Claude Desktop to recognize the Youtube MCP server.

1. Open the Claude Desktop configuration file:
   - on macOS, the configuration file is located at `~/Library/Application Support/Claude/claude_desktop_config.json`
   - on Windows, the configuration file is located at `%APPDATA%\Claude\claude_desktop_config.json`

   > __Note:__
   > You can also open `claude_desktop_config.json` from the settings of the Claude Desktop app

2. Add the server configuration

    ```json
    {
      "mcpServers": {
        "mcp-youtube": {
          "command": "mcp-youtube"
        }
      }
    }
    ```

## Development

### Getting started

1. Clone the repository
2. Install the dependencies

   ```bash
   uv sync
   ```

3. Run the server

   ```bash
   uv run mcp-youtube --help
   ```

Tools can be added to the `src/mcp_youtube/tools.py` file.

How to add a new tool:

1. Create a new class that inherits from ToolArgs

   ```python
   class NewTool(ToolArgs):
       """Description of the new tool."""
       pass
   ```

   Attributes of the class will be used as arguments for the tool.
   The class docstring will be used as the tool description.

2. Implement the tool_runner function for the new class

   ```python
   @tool_runner.register
   async def new_tool(args: NewTool) -> t.Sequence[TextContent | ImageContent | EmbeddedResource]:
       pass
   ```

   The function should return a sequence of TextContent, ImageContent or EmbeddedResource.
   The function should be async and accept a single argument of the new class.

3. Done! Restart the client and the new tool should be available.

Validation can be accomplished either through Claude Desktop or by running the tool directly.

### Debugging the server in the Inspector

The MCP Inspector is a tool that helps debug the server through a browser-based UI. To run it, use the following command:

```bash
npx @modelcontextprotocol/inspector uv run mcp-youtube
```

## Troubleshooting

### Message 'Could not connect to MCP server mcp-youtube'

If you see the message 'Could not connect to MCP server mcp-youtube' in Claude Desktop, it means that the server configuration is incorrect.

Try the following:

- Use the full path to the `mcp-youtube` binary in the configuration file
```
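
The "How to add a new tool" steps in the README above route an arguments object to its runner through `functools.singledispatch`. The following is a minimal, standard-library-only sketch of that pattern, not the project's code: the `ToolArgs` dataclass stands in for the pydantic model, `EchoText` is a hypothetical tool, and plain dicts replace `TextContent`.

```python
import asyncio
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class ToolArgs:
    """Stand-in for the project's pydantic-based ToolArgs (assumption)."""


@dataclass
class EchoText(ToolArgs):
    """Return the given text unchanged (hypothetical example tool)."""

    text: str


@singledispatch
async def tool_runner(args):
    # Fallback: reached when no runner is registered for the argument type.
    raise NotImplementedError(f"Unsupported type: {type(args)}")


@tool_runner.register(EchoText)
async def echo_text(args):
    # The real server returns TextContent objects; a dict stands in here.
    return [{"type": "text", "text": args.text}]


# Dispatch picks echo_text because the argument is an EchoText instance.
result = asyncio.run(tool_runner(EchoText(text="hello")))
print(result)
```

Passing the class explicitly to `register` avoids relying on annotation resolution; the README's annotation-based form is equivalent when the class is importable at module level.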

--------------------------------------------------------------------------------
/src/mcp_youtube/__init__.py:
--------------------------------------------------------------------------------

```python
import asyncio

from typer import Context, Typer

app = Typer()


@app.callback(invoke_without_command=True)
def _run(ctx: Context) -> None:
    if ctx.invoked_subcommand is None:
        # This will run if no subcommand is specified
        run()


@app.command()
def run() -> None:
    """Run the mcp-youtube server."""
    from .server import run_mcp_server

    asyncio.run(run_mcp_server())

```

--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------

```toml
[project]
name = "mcp-youtube"
version = "0.1.0"
description = "MCP server to work with YouTube"
requires-python = ">=3.11"
dependencies = [
  "mcp>=1.1.0",
  "pydantic>=2.0.0",
  "pydantic-settings>=2.6.0",
  "typer>=0.15.0",
  "xdg-base-dirs>=6.0.0",
  "youtube-transcript-api",
]

[build-system]
requires = ["setuptools>=70"]
build-backend = "setuptools.build_meta"

[dependency-groups]
dev = ["mypy>=1.13.0"]

[project.scripts]
mcp-youtube = "mcp_youtube:app"

[tool.mypy]
plugins = ["pydantic.mypy"]

[tool.setuptools.package-data]
"*" = ["py.typed"]

```

--------------------------------------------------------------------------------
/cog.toml:
--------------------------------------------------------------------------------

```toml
from_latest_tag = false
ignore_merge_commits = false
disable_changelog = false
disable_bump_commit = false
generate_mono_repository_global_tag = true
generate_mono_repository_package_tags = true
branch_whitelist = []
skip_ci = "[skip ci]"
skip_untracked = false
pre_bump_hooks = []
post_bump_hooks = ["uv build"]
pre_package_bump_hooks = []
post_package_bump_hooks = []
tag_prefix = "v"

[git_hooks.commit-msg]
script = """#!/bin/sh
set -e
cog verify --file $1
cog check
"""


[commit_types]

[changelog]
path = "CHANGELOG.md"
authors = [{ signature = "Sergey Parfenyuk", username = "sparfenyuk" }]
template = "remote"
remote = "github.com"
owner = "sparfenyuk"
repository = "mcp-youtube"

[bump_profiles]

```

--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------

```markdown
# Changelog
All notable changes to this project will be documented in this file. See [conventional commits](https://www.conventionalcommits.org/) for commit guidelines.

- - -
## [v0.1.0](https://github.com/sparfenyuk/mcp-youtube/compare/0afe9c0beeaef4a80b2fe10fe90ab9374878b006..v0.1.0) - 2024-12-14
#### Bug Fixes
- rename argument in prompt - ([0ac0972](https://github.com/sparfenyuk/mcp-youtube/commit/0ac09729fb4c0ceaefd8865257744c3216b4c689)) - [@sparfenyuk](https://github.com/sparfenyuk)
#### Features
- download cc for a given youtube link - ([0afe9c0](https://github.com/sparfenyuk/mcp-youtube/commit/0afe9c0beeaef4a80b2fe10fe90ab9374878b006)) - [@sparfenyuk](https://github.com/sparfenyuk)

- - -

Changelog generated by [cocogitto](https://github.com/cocogitto/cocogitto).
```

--------------------------------------------------------------------------------
/ruff.toml:
--------------------------------------------------------------------------------

```toml
# Exclude a variety of commonly ignored directories.
exclude = [
  ".bzr",
  ".direnv",
  ".eggs",
  ".git",
  ".git-rewrite",
  ".hg",
  ".ipynb_checkpoints",
  ".mypy_cache",
  ".nox",
  ".pants.d",
  ".pyenv",
  ".pytest_cache",
  ".pytype",
  ".ruff_cache",
  ".svn",
  ".tox",
  ".venv",
  ".vscode",
  "__pypackages__",
  "_build",
  "buck-out",
  "build",
  "dist",
  "node_modules",
  "site-packages",
  "venv",
]

# Black-compatible formatting, but with a longer line length than Black's default of 88.
line-length = 120
indent-width = 4

[lint]
# Enable every rule, then opt out of specific ones via `ignore`.
# `D` (pydocstyle) and `TCH` (type-checking imports) are disabled as whole groups.
select = ["ALL"]
ignore = ["D", "TRY003", "EM101", "EM102", "TCH"]

# Allow fixes for all enabled rules (when `--fix` is provided).
fixable = ["ALL"]
unfixable = []

# Allow unused variables when underscore-prefixed.
dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"

[format]
# Like Black, use double quotes for strings.
quote-style = "double"

# Like Black, indent with spaces, rather than tabs.
indent-style = "space"

# Like Black, respect magic trailing commas.
skip-magic-trailing-comma = false

# Like Black, automatically detect the appropriate line ending.
line-ending = "auto"

# Enable auto-formatting of code examples in docstrings. Markdown,
# reStructuredText code/literal blocks and doctests are all supported.
#
# This is currently disabled by default, but it is planned for this
# to be opt-out in the future.
docstring-code-format = false

# Set the line length limit used when formatting code snippets in
# docstrings.
#
# This only has an effect when the `docstring-code-format` setting is
# enabled.
docstring-code-line-length = "dynamic"

```

--------------------------------------------------------------------------------
/src/mcp_youtube/server.py:
--------------------------------------------------------------------------------

```python
from __future__ import annotations

import inspect
import logging
import typing as t
from collections.abc import Sequence
from functools import cache

from mcp.server import Server
from mcp.types import (
    EmbeddedResource,
    GetPromptResult,
    ImageContent,
    Prompt,
    PromptArgument,
    PromptMessage,
    Resource,
    ResourceTemplate,
    TextContent,
    Tool,
)
from pydantic.networks import AnyUrl

from . import tools

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
app = Server("mcp-youtube")


@cache
def enumerate_available_tools() -> dict[str, Tool]:
    """Collect every ToolArgs subclass defined in the tools module, keyed by tool name."""
    # Return a dict rather than a generator: a cached generator would be exhausted after its first use.
    found: dict[str, Tool] = {}
    for _, tool_args in inspect.getmembers(tools, inspect.isclass):
        if issubclass(tool_args, tools.ToolArgs) and tool_args != tools.ToolArgs:
            logger.debug("Found tool: %s", tool_args)
            description = tools.tool_description(tool_args)
            found[description.name] = description
    return found


mapping: dict[str, Tool] = enumerate_available_tools()


@app.list_prompts()
async def list_prompts() -> list[Prompt]:
    """List available prompts."""
    return [
        Prompt(
            name="YoutubeVideoSummary",
            description="Create a summary of the given Youtube video.",
            arguments=[PromptArgument(name="video_url", description="URL of the Youtube video", required=True)],
        ),
    ]


@app.get_prompt()
async def get_prompt(name: str, args: dict[str, str] | None = None) -> GetPromptResult:
    """Get a prompt by name."""
    if name == "YoutubeVideoSummary":
        url = args.get("video_url") if args else None
        if not url:
            raise ValueError("video_url is required")
        return GetPromptResult(
            messages=[
                PromptMessage(
                    role="user",
                    content=TextContent(
                        type="text",
                        text=f"Create a summary of the video {url} using closed captions. Define key takeaways, "
                        "interesting facts, and the main topic of the video.",
                    ),
                ),
            ],
        )

    raise ValueError(f"Unknown prompt: {name}")


@app.list_resources()
async def list_resources() -> list[Resource]:
    """List available resources."""
    return []


@app.read_resource()
async def get_resource(uri: AnyUrl) -> str | bytes:
    """Get a resource by URI."""
    # No resources are advertised by list_resources, so every URI is unknown.
    raise ValueError(f"Unknown resource: {uri}")


@app.list_tools()
async def list_tools() -> list[Tool]:
    """List available tools."""
    return list(mapping.values())


@app.list_resource_templates()
async def list_resource_templates() -> list[ResourceTemplate]:
    """List available resource templates."""
    return []


@app.progress_notification()
async def progress_notification(progress_token: str | int, progress: float, total: float | None) -> None:
    """Progress notification."""


@app.call_tool()
async def call_tool(name: str, arguments: t.Any) -> Sequence[TextContent | ImageContent | EmbeddedResource]:  # noqa: ANN401
    """Handle tool calls for command line run."""

    if not isinstance(arguments, dict):
        raise TypeError("arguments must be a dictionary")

    tool = mapping.get(name)
    if not tool:
        raise ValueError(f"Unknown tool: {name}")

    try:
        args = tools.tool_args(tool, **arguments)
        return await tools.tool_runner(args)
    except Exception as e:
        logger.exception("Error running tool: %s", name)
        raise RuntimeError(f"Caught Exception. Error: {e}") from e


async def run_mcp_server() -> None:
    # Import here to avoid issues with event loops
    from mcp.server.stdio import stdio_server

    async with stdio_server() as (read_stream, write_stream):
        await app.run(read_stream, write_stream, app.create_initialization_options())

```
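
`server.py` discovers tools by scanning the `tools` module for subclasses of `ToolArgs`. The sketch below illustrates that discovery step using only the standard library. It is an illustration under assumptions: `fake_tools` is a throwaway module object standing in for `mcp_youtube.tools`, `ToolArgs` is a plain class rather than a pydantic model, and `discover_tools` is a hypothetical helper that maps names to docstrings instead of building `Tool` objects.

```python
import inspect
import types


class ToolArgs:
    """Stand-in base class (the project uses a pydantic model)."""


class DownloadClosedCaptions(ToolArgs):
    """Download closed captions from YouTube video."""


class Unrelated:
    """Not a tool: does not inherit from ToolArgs."""


# A throwaway module object plays the role of mcp_youtube.tools here.
fake_tools = types.ModuleType("fake_tools")
fake_tools.ToolArgs = ToolArgs
fake_tools.DownloadClosedCaptions = DownloadClosedCaptions
fake_tools.Unrelated = Unrelated


def discover_tools(module):
    """Map each ToolArgs subclass name in `module` to its docstring."""
    found = {}
    for name, cls in inspect.getmembers(module, inspect.isclass):
        # Skip the base class itself and anything outside the hierarchy.
        if issubclass(cls, module.ToolArgs) and cls is not module.ToolArgs:
            found[name] = inspect.getdoc(cls) or ""
    return found


discovered = discover_tools(fake_tools)
print(discovered)
```

Because the class name doubles as the tool name, a tool becomes visible to clients as soon as its class is defined in the scanned module.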

--------------------------------------------------------------------------------
/src/mcp_youtube/tools.py:
--------------------------------------------------------------------------------

```python
from __future__ import annotations

import json
import logging
import sys
import typing as t
from functools import singledispatch
from urllib.parse import parse_qs, urlparse

from mcp.server.session import ServerSession
from mcp.types import (
    EmbeddedResource,
    ImageContent,
    TextContent,
    Tool,
)
from pydantic import BaseModel, ConfigDict
from xdg_base_dirs import xdg_cache_home
from youtube_transcript_api import YouTubeTranscriptApi  # type: ignore[import-untyped]

logger = logging.getLogger(__name__)


# How to add a new tool:
#
# 1. Create a new class that inherits from ToolArgs
#    ```python
#    class NewTool(ToolArgs):
#        """Description of the new tool."""
#        pass
#    ```
#    Attributes of the class will be used as arguments for the tool.
#    The class docstring will be used as the tool description.
#
# 2. Implement the tool_runner function for the new class
#    ```python
#    @tool_runner.register
#    async def new_tool(args: NewTool) -> t.Sequence[TextContent | ImageContent | EmbeddedResource]:
#        pass
#    ```
#    The function should return a sequence of TextContent, ImageContent or EmbeddedResource.
#    The function should be async and accept a single argument of the new class.
#
# 3. Done! Restart the client and the new tool should be available.


class ToolArgs(BaseModel):
    model_config = ConfigDict()


@singledispatch
async def tool_runner(
    args,  # noqa: ANN001
) -> t.Sequence[TextContent | ImageContent | EmbeddedResource]:
    raise NotImplementedError(f"Unsupported type: {type(args)}")


def tool_description(args: type[ToolArgs]) -> Tool:
    return Tool(
        name=args.__name__,
        description=args.__doc__,
        inputSchema=args.model_json_schema(),
    )


def tool_args(tool: Tool, *args, **kwargs) -> ToolArgs:  # noqa: ANN002, ANN003
    return sys.modules[__name__].__dict__[tool.name](*args, **kwargs)


## Tools ##

### Download closed captions from YouTube video ###


class DownloadClosedCaptions(ToolArgs):
    """Download closed captions from YouTube video."""

    video_url: str


def _parse_youtube_url(url: str) -> str | None:
    """
    Parse a YouTube URL and extract the video ID.

    Supports short `youtu.be/<id>` links and `youtube.com` URLs that carry the ID in the `v=` parameter.

    Args:
        url (str): YouTube URL in one of the supported formats

    Returns:
        str | None: Video ID if found, None otherwise

    Examples:
        >>> _parse_youtube_url("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
        'dQw4w9WgXcQ'
        >>> _parse_youtube_url("https://youtu.be/dQw4w9WgXcQ")
        'dQw4w9WgXcQ'
        >>> _parse_youtube_url("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=123")
        'dQw4w9WgXcQ'
    """

    # Handle youtu.be format
    if "youtu.be" in url:
        return url.split("/")[-1].split("?")[0]

    # Handle regular youtube.com format
    try:
        parsed_url = urlparse(url)
        if "youtube.com" in parsed_url.netloc:
            params = parse_qs(parsed_url.query)
            if "v" in params:
                return params["v"][0]
    except ValueError:  # noqa: S110
        # urlparse raises ValueError for malformed URLs; treat them as unrecognized.
        pass

    return None


@tool_runner.register
async def download_closed_captions(
    args: DownloadClosedCaptions,
) -> t.Sequence[TextContent | ImageContent | EmbeddedResource]:
    transcripts_dir = xdg_cache_home() / "mcp-youtube" / "transcripts"
    transcripts_dir.mkdir(parents=True, exist_ok=True)

    video_id = _parse_youtube_url(args.video_url)
    if not video_id:
        raise ValueError(f"Unrecognized YouTube URL: {args.video_url}")

    if not transcripts_dir.joinpath(f"{video_id}.json").exists():
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        if not transcript or not isinstance(transcript, list):
            raise ValueError("No transcript found for the video.")

        json_data = json.dumps(transcript, indent=None)
        transcripts_dir.joinpath(f"{video_id}.json").write_text(json_data)

    else:
        json_data = transcripts_dir.joinpath(f"{video_id}.json").read_text()
        transcript = json.loads(json_data)

    content = " ".join([line["text"] for line in transcript])

    return [
        TextContent(
            type="text",
            text=content,
        ),
    ]

```
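
`download_closed_captions` fetches a transcript once, stores it as JSON under the cache directory, and serves later requests from that file. The sketch below isolates that fetch-or-read flow using only the standard library. It rests on stated assumptions: `load_transcript` and `fake_fetch` are hypothetical names, `fake_fetch` replaces the network call to `YouTubeTranscriptApi`, and a temporary directory replaces `xdg_cache_home()`.

```python
import json
import tempfile
from pathlib import Path


def load_transcript(video_id, cache_dir, fetch):
    """Return transcript entries, reading from the cache when possible.

    `fetch` is a hypothetical callable standing in for the network call.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{video_id}.json"

    if cache_file.exists():
        # Cache hit: no network access needed.
        return json.loads(cache_file.read_text(encoding="utf-8"))

    transcript = fetch(video_id)
    if not transcript or not isinstance(transcript, list):
        raise ValueError("No transcript found for the video.")

    cache_file.write_text(json.dumps(transcript), encoding="utf-8")
    return transcript


calls = []


def fake_fetch(video_id):
    # Record each call so the cache hit can be observed.
    calls.append(video_id)
    return [{"text": "never gonna"}, {"text": "give you up"}]


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "mcp-youtube" / "transcripts"
    first = load_transcript("dQw4w9WgXcQ", cache_dir, fake_fetch)
    second = load_transcript("dQw4w9WgXcQ", cache_dir, fake_fetch)

# Join the caption fragments into one string, as the tool does.
text = " ".join(line["text"] for line in second)
print(text, calls)
```

The second call never reaches `fake_fetch`, which is the behavior the on-disk cache is meant to provide. Note that the cache has no expiry, so updated captions are not picked up until the cached file is removed.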