# Directory Structure
```
├── .github
│   ├── actions
│   │   └── uv_setup
│   │       └── action.yml
│   └── workflows
│       ├── _lint.yml
│       ├── _test.yml
│       ├── ci.yml
│       └── release.yml
├── .gitignore
├── LICENSE
├── Makefile
├── mcpdoc
│   ├── __init__.py
│   ├── _version.py
│   ├── cli.py
│   ├── langgraph.py
│   ├── main.py
│   └── splash.py
├── pyproject.toml
├── README.md
├── sample_config.json
├── sample_config.yaml
├── tests
│   └── unit_tests
│       ├── __init__.py
│       ├── test_imports.py
│       └── test_main.py
└── uv.lock
```
# Files
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
.vs/
.vscode/
.idea/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
docs/docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
notebooks/
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.envrc
.venv
.venvs
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# macOS display setting files
.DS_Store
# Wandb directory
wandb/
# asdf tool versions
.tool-versions
/.ruff_cache/
*.pkl
*.bin
# integration test artifacts
data_map*
\[('_type', 'fake'), ('stop', None)]
# Replit files
*replit*
node_modules
docs/.yarn/
docs/node_modules/
docs/.docusaurus/
docs/.cache-loader/
docs/_dist
docs/api_reference/api_reference.rst
docs/api_reference/experimental_api_reference.rst
docs/api_reference/_build
docs/api_reference/*/
!docs/api_reference/_static/
!docs/api_reference/templates/
!docs/api_reference/themes/
docs/docs_skeleton/build
docs/docs_skeleton/node_modules
docs/docs_skeleton/yarn.lock
# Any new jupyter notebooks
# not intended for the repo
Untitled*.ipynb
Chinook.db
.vercel
.turbo
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
# MCP LLMS-TXT Documentation Server
## Overview
[llms.txt](https://llmstxt.org/) is a website index for LLMs, providing background information, guidance, and links to detailed markdown files. IDEs like Cursor and Windsurf or apps like Claude Code/Desktop can use `llms.txt` to retrieve context for tasks. However, these apps use different built-in tools to read and process files like `llms.txt`. The retrieval process can be opaque, and there is not always a way to audit the tool calls or the context returned.
[MCP](https://github.com/modelcontextprotocol) offers a way for developers to have *full control* over tools used by these applications. Here, we create [an open source MCP server](https://github.com/modelcontextprotocol) to provide MCP host applications (e.g., Cursor, Windsurf, Claude Code/Desktop) with (1) a user-defined list of `llms.txt` files and (2) a simple `fetch_docs` tool to read URLs within any of the provided `llms.txt` files. This allows the user to audit each tool call as well as the context returned.
<img src="https://github.com/user-attachments/assets/736f8f55-833d-4200-b833-5fca01a09e1b" width="60%">
## llms-txt
You can find llms.txt files for LangGraph and LangChain here:
| Library | llms.txt |
|------------------|------------------------------------------------------------------------------------------------------------|
| LangGraph Python | [https://langchain-ai.github.io/langgraph/llms.txt](https://langchain-ai.github.io/langgraph/llms.txt) |
| LangGraph JS | [https://langchain-ai.github.io/langgraphjs/llms.txt](https://langchain-ai.github.io/langgraphjs/llms.txt) |
| LangChain Python | [https://python.langchain.com/llms.txt](https://python.langchain.com/llms.txt) |
| LangChain JS | [https://js.langchain.com/llms.txt](https://js.langchain.com/llms.txt) |
## Quickstart
#### Install uv
* Please see [official uv docs](https://docs.astral.sh/uv/getting-started/installation/#installation-methods) for other ways to install `uv`.
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
#### Choose an `llms.txt` file to use.
* For example, [here's](https://langchain-ai.github.io/langgraph/llms.txt) the LangGraph `llms.txt` file.
> **Note: Security and Domain Access Control**
>
> For security reasons, mcpdoc implements strict domain access controls:
>
> 1. **Remote llms.txt files**: When you specify a remote llms.txt URL (e.g., `https://langchain-ai.github.io/langgraph/llms.txt`), mcpdoc automatically adds only that specific domain (`langchain-ai.github.io`) to the allowed domains list. This means the tool can only fetch documentation from URLs on that domain.
>
> 2. **Local llms.txt files**: When using a local file, NO domains are automatically added to the allowed list. You MUST explicitly specify which domains to allow using the `--allowed-domains` parameter.
>
> 3. **Adding additional domains**: To allow fetching from domains beyond those automatically included:
> - Use `--allowed-domains domain1.com domain2.com` to add specific domains
> - Use `--allowed-domains '*'` to allow all domains (use with caution)
>
> This security measure prevents unauthorized access to domains not explicitly approved by the user, ensuring that documentation can only be retrieved from trusted sources.
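The same rules apply when the server is created from Python. Here is a minimal sketch using the `create_server` API shown under Programmatic Usage below (the extra allowed domain is purely illustrative):
```python
from mcpdoc.main import create_server

# The domain hosting the remote llms.txt (langchain-ai.github.io) is allowed automatically.
# allowed_domains adds further domains; ["*"] would allow all domains (use with caution).
server = create_server(
    [{"name": "LangGraph", "llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt"}],
    allowed_domains=["https://python.langchain.com/"],  # illustrative extra domain
)
```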
#### (Optional) Test the MCP server locally with your `llms.txt` file(s) of choice:
```bash
uvx --from mcpdoc mcpdoc \
--urls "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt" "LangChain:https://python.langchain.com/llms.txt" \
--transport sse \
--port 8082 \
--host localhost
```
* This should run at: http://localhost:8082

* Run [MCP inspector](https://modelcontextprotocol.io/docs/tools/inspector) and connect to the running server:
```bash
npx @modelcontextprotocol/inspector
```

* Here, you can test the `tool` calls.
#### Connect to Cursor
* Open `Cursor Settings` and `MCP` tab.
* This will open the `~/.cursor/mcp.json` file.

* Paste the following into the file (we use the `langgraph-docs-mcp` name and link to the LangGraph `llms.txt`).
```
{
"mcpServers": {
"langgraph-docs-mcp": {
"command": "uvx",
"args": [
"--from",
"mcpdoc",
"mcpdoc",
"--urls",
"LangGraph:https://langchain-ai.github.io/langgraph/llms.txt LangChain:https://python.langchain.com/llms.txt",
"--transport",
"stdio"
]
}
}
}
```
* Confirm that the server is running in your `Cursor Settings/MCP` tab.
* Best practice is to then update Cursor Global (User) rules.
* Open Cursor `Settings/Rules` and update `User Rules` with the following (or similar):
```
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
+ use this to answer the question
```
* `CMD+L` (on Mac) to open chat.
* Ensure `agent` is selected.

Then, try an example prompt, such as:
```
what are types of memory in LangGraph?
```

### Connect to Windsurf
* Open Cascade with `CMD+L` (on Mac).
* Click `Configure MCP` to open the config file, `~/.codeium/windsurf/mcp_config.json`.
* Update with `langgraph-docs-mcp` as noted above.

* Update `Windsurf Rules/Global rules` with the following (or similar):
```
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
```

Then, try the example prompt:
* It will perform your tool calls.

### Connect to Claude Desktop
* Open `Settings/Developer` to update `~/Library/Application\ Support/Claude/claude_desktop_config.json`.
* Update with `langgraph-docs-mcp` as noted above.
* Restart Claude Desktop app.
> [!Note]
> If you run into issues with Python version incompatibility when trying to add MCPDoc tools to Claude Desktop, you can explicitly specify the filepath to the `python` executable in the `uvx` command.
>
> <details>
> <summary>Example configuration</summary>
>
> ```
> {
> "mcpServers": {
> "langgraph-docs-mcp": {
> "command": "uvx",
> "args": [
> "--python",
> "/path/to/python",
> "--from",
> "mcpdoc",
> "mcpdoc",
> "--urls",
> "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt",
> "--transport",
> "stdio"
> ]
> }
> }
> }
> ```
> </details>
> [!Note]
> Currently (3/21/25) it appears that Claude Desktop does not support `rules` for global rules, so append the following to your prompt.
```
<rules>
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
</rules>
```

* You will see your tools visible in the bottom right of your chat input.

Then, try the example prompt:
* It will ask to approve tool calls as it processes your request.

### Connect to Claude Code
* In a terminal after installing [Claude Code](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview), run this command to add the MCP server to your project:
```
claude mcp add-json langgraph-docs '{"type":"stdio","command":"uvx","args":["--from", "mcpdoc", "mcpdoc", "--urls", "langgraph:https://langchain-ai.github.io/langgraph/llms.txt", "LangChain:https://python.langchain.com/llms.txt"]}' -s local
```
* You will see `~/.claude.json` updated.
* Test by launching Claude Code and running `/mcp` to view your tools:
```
$ claude
> /mcp
```

> [!Note]
> Currently (3/21/25) it appears that Claude Code does not support `rules` for global rules, so append the following to your prompt.
```
<rules>
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
</rules>
```
Then, try the example prompt:
* It will ask to approve tool calls.

## Command-line Interface
The `mcpdoc` command provides a simple CLI for launching the documentation server.
You can specify documentation sources in three ways, and these can be combined:
1. Using a YAML config file:
* This will load the LangGraph Python documentation from the `sample_config.yaml` file in this repo.
```bash
mcpdoc --yaml sample_config.yaml
```
2. Using a JSON config file:
* This will load the LangGraph Python documentation from the `sample_config.json` file in this repo.
```bash
mcpdoc --json sample_config.json
```
3. Directly specifying llms.txt URLs with optional names:
* URLs can be specified either as plain URLs or with optional names using the format `name:url`.
* You can specify multiple URLs by using the `--urls` parameter multiple times.
* This is how we loaded `llms.txt` for the MCP server above.
```bash
mcpdoc --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt --urls LangChain:https://python.langchain.com/llms.txt
```
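For reference, the CLI parses each `--urls` entry with the `create_doc_sources_from_urls` helper (defined in `mcpdoc/cli.py`, shown later in this repo). A minimal sketch of both accepted formats:
```python
from mcpdoc.cli import create_doc_sources_from_urls

# "name:url" entries are split on the first colon; plain URLs get no name.
sources = create_doc_sources_from_urls(
    [
        "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt",
        "https://python.langchain.com/llms.txt",
    ]
)
# [{'name': 'LangGraph', 'llms_txt': 'https://...'}, {'llms_txt': 'https://...'}]
```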
You can also combine these methods to merge documentation sources:
```bash
mcpdoc --yaml sample_config.yaml --json sample_config.json --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt --urls LangChain:https://python.langchain.com/llms.txt
```
## Additional Options
- `--follow-redirects`: Follow HTTP redirects (defaults to False)
- `--timeout SECONDS`: HTTP request timeout in seconds (defaults to 10.0)
Example with additional options:
```bash
mcpdoc --yaml sample_config.yaml --follow-redirects --timeout 15
```
This will load the LangGraph Python documentation with a 15-second timeout and follow any HTTP redirects if necessary.
## Configuration Format
Both YAML and JSON configuration files should contain a list of documentation sources.
Each source must include an `llms_txt` URL and can optionally include a `name`:
### YAML Configuration Example (sample_config.yaml)
```yaml
# Sample configuration for mcp-llms-txt server
# Each entry must have an llms_txt URL and optionally a name
- name: LangGraph Python
llms_txt: https://langchain-ai.github.io/langgraph/llms.txt
```
### JSON Configuration Example (sample_config.json)
```json
[
{
"name": "LangGraph Python",
"llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt"
}
]
```
## Programmatic Usage
```python
from mcpdoc.main import create_server
# Create a server with documentation sources
server = create_server(
[
{
"name": "LangGraph Python",
"llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt",
},
# You can add multiple documentation sources
# {
# "name": "Another Documentation",
# "llms_txt": "https://example.com/llms.txt",
# },
],
follow_redirects=True,
timeout=15.0,
)
# Run the server
server.run(transport="stdio")
```
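To serve over SSE instead of stdio, the CLI passes `host`, `port`, and `log_level` through to `FastMCP` as settings; a minimal sketch of the equivalent programmatic call:
```python
from mcpdoc.main import create_server

server = create_server(
    [
        {
            "name": "LangGraph Python",
            "llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt",
        }
    ],
    settings={"host": "127.0.0.1", "port": 8082, "log_level": "INFO"},
)
server.run(transport="sse")
```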
```
--------------------------------------------------------------------------------
/tests/unit_tests/__init__.py:
--------------------------------------------------------------------------------
```python
```
--------------------------------------------------------------------------------
/mcpdoc/__init__.py:
--------------------------------------------------------------------------------
```python
from mcpdoc._version import __version__
__all__ = ["__version__"]
```
--------------------------------------------------------------------------------
/sample_config.json:
--------------------------------------------------------------------------------
```json
[
{
"name": "LangGraph Python",
"llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt"
}
]
```
--------------------------------------------------------------------------------
/sample_config.yaml:
--------------------------------------------------------------------------------
```yaml
# Sample configuration for mcp-llms-txt server
# Each entry must have an llms_txt URL and optionally a name
- name: LangGraph Python
llms_txt: https://langchain-ai.github.io/langgraph/llms.txt
```
--------------------------------------------------------------------------------
/mcpdoc/_version.py:
--------------------------------------------------------------------------------
```python
from importlib import metadata
try:
__version__ = metadata.version(__package__)
except metadata.PackageNotFoundError:
# Case where package metadata is not available.
__version__ = ""
```
--------------------------------------------------------------------------------
/tests/unit_tests/test_imports.py:
--------------------------------------------------------------------------------
```python
def test_imports():
"""Test that main modules can be imported."""
from mcpdoc import main # noqa
from mcpdoc import cli # noqa
from mcpdoc import langgraph # noqa
assert True
```
--------------------------------------------------------------------------------
/.github/actions/uv_setup/action.yml:
--------------------------------------------------------------------------------
```yaml
# TODO: https://docs.astral.sh/uv/guides/integration/github/#caching
name: uv-install
description: Set up Python and uv
inputs:
python-version:
description: Python version, supporting MAJOR.MINOR only
required: true
env:
UV_VERSION: "0.5.25"
runs:
using: composite
steps:
- name: Install uv and set the python version
uses: astral-sh/setup-uv@v5
with:
version: ${{ env.UV_VERSION }}
python-version: ${{ inputs.python-version }}
```
--------------------------------------------------------------------------------
/.github/workflows/_test.yml:
--------------------------------------------------------------------------------
```yaml
name: test
on:
workflow_call:
inputs:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
python-version:
required: true
type: string
description: "Python version to use"
env:
UV_FROZEN: "true"
UV_NO_SYNC: "true"
jobs:
build:
defaults:
run:
working-directory: ${{ inputs.working-directory }}
runs-on: ubuntu-latest
timeout-minutes: 20
name: "make test #${{ inputs.python-version }}"
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ inputs.python-version }} + uv
uses: "./.github/actions/uv_setup"
id: setup-python
with:
python-version: ${{ inputs.python-version }}
- name: Install dependencies
shell: bash
run: uv sync --group test
- name: Run core tests
shell: bash
run: |
make test
```
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
```toml
[project]
name = "mcpdoc"
version = "0.0.10"
description = "Server llms-txt documentation over MCP"
readme = "README.md"
license = "MIT"
requires-python = ">=3.10"
dependencies = [
"httpx>=0.28.1",
"markdownify>=1.1.0",
"mcp[cli]>=1.4.1",
"pyyaml>=6.0.1",
]
[project.scripts]
mcpdoc = "mcpdoc.cli:main"
[dependency-groups]
test = [
"pytest>=8.3.4",
"pytest-asyncio>=0.25.3",
"pytest-cov>=6.0.0",
"pytest-mock>=3.14.0",
"pytest-socket>=0.7.0",
"pytest-timeout>=2.3.1",
"ruff>=0.9.7",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.pytest.ini_options]
minversion = "8.0"
# -ra: Show a short summary of all non-passing test outcomes (skipped, failed, errors, etc.)
# -q: Enable quiet mode for less cluttered output
# -v: Enable verbose output to display detailed test names and statuses
# --durations=5: Show the 5 slowest tests after the run (useful for performance tuning)
addopts = "-ra -q -v --durations=5"
testpaths = [
"tests",
]
python_files = ["test_*.py"]
python_functions = ["test_*"]
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"
```
--------------------------------------------------------------------------------
/.github/workflows/_lint.yml:
--------------------------------------------------------------------------------
```yaml
name: lint
on:
workflow_call:
inputs:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
python-version:
required: true
type: string
description: "Python version to use"
env:
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
# This env var allows us to get inline annotations when ruff has complaints.
RUFF_OUTPUT_FORMAT: github
UV_FROZEN: "true"
jobs:
build:
name: "make lint #${{ inputs.python-version }}"
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ inputs.python-version }} + uv
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ inputs.python-version }}
- name: Install dependencies
working-directory: ${{ inputs.working-directory }}
run: |
uv sync --group test
- name: Analysing the code with our lint
working-directory: ${{ inputs.working-directory }}
run: |
make lint
```
--------------------------------------------------------------------------------
/mcpdoc/langgraph.py:
--------------------------------------------------------------------------------
```python
"""A server for just langgraph docs from langchain-ai.github.io.
This is used as a way to test the doc functionality via MCP.
"""
#!/usr/bin/env python3
import httpx
from markdownify import markdownify
from mcp.server.fastmcp import FastMCP
server = FastMCP(name="llms-txt")
ALLOWED_PREFIX = "https://langchain-ai.github.io/"
HTTPX_CLIENT = httpx.AsyncClient(follow_redirects=False)
@server.tool()
async def get_docs(url: str = "overview") -> str:
"""Get langgraph docs.
Always fetch the `overview` prior to fetching any other URLs as it will provide a
list of available URLs.
Args:
url: The URL to fetch. Must start with https://langchain-ai.github.io/
or be "overview".
"""
if url == "overview":
url = "https://langchain-ai.github.io/langgraph/llms.txt"
if not url.startswith(ALLOWED_PREFIX):
return (
"Error: Invalid url. Must start with https://langchain-ai.github.io/ "
'or be "overview"'
)
response = await HTTPX_CLIENT.get(url)
response.raise_for_status()
if response.status_code == 200:
# Convert HTML to markdown
markdown_content = markdownify(response.text)
return markdown_content
else:
return "Encountered an error while fetching the URL."
if __name__ == "__main__":
server.run(transport="stdio")
```
--------------------------------------------------------------------------------
/.github/workflows/ci.yml:
--------------------------------------------------------------------------------
```yaml
---
name: Run CI Tests
on:
push:
branches: [ main ]
pull_request:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
lint:
strategy:
matrix:
# Only lint on the min and max supported Python versions.
# It's extremely unlikely that there's a lint issue on any version in between
# that doesn't show up on the min or max versions.
#
# GitHub rate-limits how many jobs can be running at any one time.
# Starting new jobs is also relatively slow,
# so linting on fewer versions makes CI faster.
python-version:
- "3.12"
uses:
./.github/workflows/_lint.yml
with:
working-directory: .
python-version: ${{ matrix.python-version }}
secrets: inherit
test:
strategy:
matrix:
        # Test on the min and max supported Python versions.
        # A failure on an intermediate version is very unlikely
        # to not show up on the min or max versions.
        #
        # GitHub rate-limits how many jobs can be running at any one time.
        # Starting new jobs is also relatively slow,
        # so testing on fewer versions makes CI faster.
python-version:
- "3.10"
- "3.12"
uses:
./.github/workflows/_test.yml
with:
working-directory: .
python-version: ${{ matrix.python-version }}
secrets: inherit
```
--------------------------------------------------------------------------------
/tests/unit_tests/test_main.py:
--------------------------------------------------------------------------------
```python
"""Tests for mcpdoc.main module."""
import pytest
from mcpdoc.main import (
_get_fetch_description,
_is_http_or_https,
extract_domain,
)
def test_extract_domain() -> None:
"""Test extract_domain function."""
# Test with https URL
assert extract_domain("https://example.com/page") == "https://example.com/"
# Test with http URL
assert extract_domain("http://test.org/docs/index.html") == "http://test.org/"
# Test with URL that has port
assert extract_domain("https://localhost:8080/api") == "https://localhost:8080/"
# Check trailing slash
assert extract_domain("https://localhost:8080") == "https://localhost:8080/"
# Test with URL that has subdomain
assert extract_domain("https://docs.python.org/3/") == "https://docs.python.org/"
@pytest.mark.parametrize(
"url,expected",
[
("http://example.com", True),
("https://example.com", True),
("/path/to/file.txt", False),
("file:///path/to/file.txt", False),
(
"ftp://example.com",
False,
), # Not HTTP or HTTPS, even though it's not a local file
],
)
def test_is_http_or_https(url, expected):
"""Test _is_http_or_https function."""
assert _is_http_or_https(url) is expected
@pytest.mark.parametrize(
"has_local_sources,expected_substrings",
[
(True, ["local file path", "file://"]),
(False, ["URL to fetch"]),
],
)
def test_get_fetch_description(has_local_sources, expected_substrings):
"""Test _get_fetch_description function."""
description = _get_fetch_description(has_local_sources)
# Common assertions for both cases
assert "Fetch and parse documentation" in description
assert "Returns:" in description
    # Substrings expected for this case must all be present
    for substring in expected_substrings:
        assert substring in description
    # The URL-only description must not mention local-file handling
    if not has_local_sources:
        for substring in ["local file path", "file://"]:
            assert substring not in description
```
--------------------------------------------------------------------------------
/.github/workflows/release.yml:
--------------------------------------------------------------------------------
```yaml
name: release
run-name: Release ${{ inputs.working-directory }} by @${{ github.actor }}
on:
workflow_call:
inputs:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
workflow_dispatch:
inputs:
working-directory:
description: "From which folder this pipeline executes"
default: "."
dangerous-nonmain-release:
required: false
type: boolean
default: false
description: "Release from a non-main branch (danger!)"
env:
PYTHON_VERSION: "3.11"
UV_FROZEN: "true"
UV_NO_SYNC: "true"
jobs:
build:
if: github.ref == 'refs/heads/main' || inputs.dangerous-nonmain-release
environment: Scheduled testing
runs-on: ubuntu-latest
outputs:
pkg-name: ${{ steps.check-version.outputs.pkg-name }}
version: ${{ steps.check-version.outputs.version }}
steps:
- uses: actions/checkout@v4
- name: Set up Python + uv
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
# We want to keep this build stage *separate* from the release stage,
# so that there's no sharing of permissions between them.
# The release stage has trusted publishing and GitHub repo contents write access,
# and we want to keep the scope of that access limited just to the release job.
# Otherwise, a malicious `build` step (e.g. via a compromised dependency)
# could get access to our GitHub or PyPI credentials.
#
# Per the trusted publishing GitHub Action:
# > It is strongly advised to separate jobs for building [...]
# > from the publish job.
# https://github.com/pypa/gh-action-pypi-publish#non-goals
- name: Build project for distribution
run: uv build
- name: Upload build
uses: actions/upload-artifact@v4
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Check Version
id: check-version
shell: python
working-directory: ${{ inputs.working-directory }}
run: |
import os
import tomllib
with open("pyproject.toml", "rb") as f:
data = tomllib.load(f)
pkg_name = data["project"]["name"]
version = data["project"]["version"]
with open(os.environ["GITHUB_OUTPUT"], "a") as f:
f.write(f"pkg-name={pkg_name}\n")
f.write(f"version={version}\n")
publish:
needs:
- build
runs-on: ubuntu-latest
permissions:
# This permission is used for trusted publishing:
# https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/
#
# Trusted publishing has to also be configured on PyPI for each package:
# https://docs.pypi.org/trusted-publishers/adding-a-publisher/
id-token: write
defaults:
run:
working-directory: ${{ inputs.working-directory }}
steps:
- uses: actions/checkout@v4
- name: Set up Python + uv
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
- uses: actions/download-artifact@v4
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Publish package distributions to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
packages-dir: ${{ inputs.working-directory }}/dist/
verbose: true
print-hash: true
# Temp workaround since attestations are on by default as of gh-action-pypi-publish v1.11.0
attestations: false
mark-release:
needs:
- build
- publish
runs-on: ubuntu-latest
permissions:
# This permission is needed by `ncipollo/release-action` to
# create the GitHub release.
contents: write
defaults:
run:
working-directory: ${{ inputs.working-directory }}
steps:
- uses: actions/checkout@v4
- name: Set up Python + uv
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
- uses: actions/download-artifact@v4
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Create Tag
uses: ncipollo/release-action@v1
with:
artifacts: "dist/*"
token: ${{ secrets.GITHUB_TOKEN }}
generateReleaseNotes: true
tag: ${{needs.build.outputs.pkg-name}}==${{ needs.build.outputs.version }}
body: ${{ needs.release-notes.outputs.release-body }}
commit: main
makeLatest: true
```
--------------------------------------------------------------------------------
/mcpdoc/cli.py:
--------------------------------------------------------------------------------
```python
#!/usr/bin/env python3
"""Command-line interface for mcp-llms-txt server."""
import argparse
import json
import sys
from typing import List, Dict
import yaml
from mcpdoc._version import __version__
from mcpdoc.main import create_server, DocSource
from mcpdoc.splash import SPLASH
class CustomFormatter(
argparse.RawDescriptionHelpFormatter, argparse.ArgumentDefaultsHelpFormatter
):
# Custom formatter to preserve epilog formatting while showing default values
pass
EPILOG = """
Examples:
# Directly specifying llms.txt URLs with optional names
mcpdoc --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt
# Using a local file (absolute or relative path)
mcpdoc --urls LocalDocs:/path/to/llms.txt --allowed-domains '*'
# Using a YAML config file
mcpdoc --yaml sample_config.yaml
# Using a JSON config file
mcpdoc --json sample_config.json
# Combining multiple documentation sources
mcpdoc --yaml sample_config.yaml --json sample_config.json --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt
# Using SSE transport with default host (127.0.0.1) and port (8000)
mcpdoc --yaml sample_config.yaml --transport sse
# Using SSE transport with custom host and port
mcpdoc --yaml sample_config.yaml --transport sse --host 0.0.0.0 --port 9000
# Using SSE transport with additional HTTP options
mcpdoc --yaml sample_config.yaml --follow-redirects --timeout 15 --transport sse --host localhost --port 8080
# Allow fetching from additional domains. The domains hosting the llms.txt files are always allowed.
mcpdoc --yaml sample_config.yaml --allowed-domains https://example.com/ https://another-example.com/
# Allow fetching from any domain
mcpdoc --yaml sample_config.yaml --allowed-domains '*'
"""
def parse_args() -> argparse.Namespace:
"""Parse command-line arguments."""
# Custom formatter to preserve epilog formatting
parser = argparse.ArgumentParser(
description="MCP LLMS-TXT Documentation Server",
formatter_class=CustomFormatter,
epilog=EPILOG,
)
# Allow combining multiple doc source methods
parser.add_argument(
"--yaml", "-y", type=str, help="Path to YAML config file with doc sources"
)
parser.add_argument(
"--json", "-j", type=str, help="Path to JSON config file with doc sources"
)
parser.add_argument(
"--urls",
"-u",
type=str,
nargs="+",
help="List of llms.txt URLs or file paths with optional names (format: 'url_or_path' or 'name:url_or_path')",
)
parser.add_argument(
"--follow-redirects",
action="store_true",
help="Whether to follow HTTP redirects",
)
parser.add_argument(
"--allowed-domains",
type=str,
nargs="*",
help="Additional allowed domains to fetch documentation from. Use '*' to allow all domains.",
)
parser.add_argument(
"--timeout", type=float, default=10.0, help="HTTP request timeout in seconds"
)
parser.add_argument(
"--transport",
type=str,
default="stdio",
choices=["stdio", "sse"],
help="Transport protocol for MCP server",
)
parser.add_argument(
"--log-level",
type=str,
default="INFO",
help=(
"Log level for the server. Use one on the following: DEBUG, INFO, "
"WARNING, ERROR."
" (only used with --transport sse)"
),
)
# SSE-specific options
parser.add_argument(
"--host",
type=str,
default="127.0.0.1",
help="Host to bind the server to (only used with --transport sse)",
)
parser.add_argument(
"--port",
type=int,
default=8000,
help="Port to bind the server to (only used with --transport sse)",
)
# Version information
parser.add_argument(
"--version",
"-V",
action="version",
version=f"mcpdoc {__version__}",
help="Show version information and exit",
)
return parser.parse_args()
def load_config_file(file_path: str, file_format: str) -> List[Dict[str, str]]:
"""Load configuration from a file.
Args:
file_path: Path to the config file
file_format: Format of the config file ("yaml" or "json")
Returns:
List of doc source configurations
"""
try:
with open(file_path, "r", encoding="utf-8") as file:
if file_format.lower() == "yaml":
config = yaml.safe_load(file)
elif file_format.lower() == "json":
config = json.load(file)
else:
raise ValueError(f"Unsupported file format: {file_format}")
if not isinstance(config, list):
raise ValueError("Config file must contain a list of doc sources")
return config
except (FileNotFoundError, yaml.YAMLError, json.JSONDecodeError) as e:
print(f"Error loading config file: {e}", file=sys.stderr)
sys.exit(1)
def create_doc_sources_from_urls(urls: List[str]) -> List[DocSource]:
"""Create doc sources from a list of URLs or file paths with optional names.
Args:
urls: List of llms.txt URLs or file paths with optional names
(format: 'url_or_path' or 'name:url_or_path')
Returns:
List of DocSource objects
"""
doc_sources = []
for entry in urls:
if not entry.strip():
continue
if ":" in entry and not entry.startswith(("http:", "https:")):
# Format is name:url
name, url = entry.split(":", 1)
doc_sources.append({"name": name, "llms_txt": url})
else:
# Format is just url
doc_sources.append({"llms_txt": entry})
return doc_sources
def main() -> None:
"""Main entry point for the CLI."""
# Check if any arguments were provided
if len(sys.argv) == 1:
# No arguments, print help
# Use the same custom formatter as parse_args()
help_parser = argparse.ArgumentParser(
description="MCP LLMS-TXT Documentation Server",
formatter_class=CustomFormatter,
epilog=EPILOG,
)
# Add version to help parser too
help_parser.add_argument(
"--version",
"-V",
action="version",
version=f"mcpdoc {__version__}",
help="Show version information and exit",
)
help_parser.print_help()
sys.exit(0)
args = parse_args()
# Load doc sources based on command-line arguments
doc_sources: List[DocSource] = []
# Check if any source options were provided
if not (args.yaml or args.json or args.urls):
print(
"Error: At least one source option (--yaml, --json, or --urls) is required",
file=sys.stderr,
)
sys.exit(1)
# Merge doc sources from all provided methods
if args.yaml:
doc_sources.extend(load_config_file(args.yaml, "yaml"))
if args.json:
doc_sources.extend(load_config_file(args.json, "json"))
if args.urls:
doc_sources.extend(create_doc_sources_from_urls(args.urls))
# Only used with SSE transport
settings = {
"host": args.host,
"port": args.port,
"log_level": "INFO",
}
# Create and run the server
server = create_server(
doc_sources,
follow_redirects=args.follow_redirects,
timeout=args.timeout,
settings=settings,
allowed_domains=args.allowed_domains,
)
if args.transport == "sse":
print()
print(SPLASH)
print()
print(
f"Launching MCPDOC server with {len(doc_sources)} doc sources",
)
# Pass transport-specific options
server.run(transport=args.transport)
if __name__ == "__main__":
main()
```
--------------------------------------------------------------------------------
/mcpdoc/main.py:
--------------------------------------------------------------------------------
```python
"""MCP Llms-txt server for docs."""
import os
import re
from urllib.parse import urlparse, urljoin
import httpx
from markdownify import markdownify
from mcp.server.fastmcp import FastMCP
from typing_extensions import NotRequired, TypedDict
class DocSource(TypedDict):
"""A source of documentation for a library or a package."""
name: NotRequired[str]
"""Name of the documentation source (optional)."""
llms_txt: str
"""URL to the llms.txt file or documentation source."""
description: NotRequired[str]
"""Description of the documentation source (optional)."""
def extract_domain(url: str) -> str:
"""Extract domain from URL.
Args:
url: Full URL
Returns:
Domain with scheme and trailing slash (e.g., https://example.com/)
"""
parsed = urlparse(url)
return f"{parsed.scheme}://{parsed.netloc}/"
def _is_http_or_https(url: str) -> bool:
"""Check if the URL is an HTTP or HTTPS URL."""
return url.startswith(("http:", "https:"))
def _get_fetch_description(has_local_sources: bool) -> str:
"""Get fetch docs tool description."""
description = [
"Fetch and parse documentation from a given URL or local file.",
"",
"Use this tool after list_doc_sources to:",
"1. First fetch the llms.txt file from a documentation source",
"2. Analyze the URLs listed in the llms.txt file",
"3. Then fetch specific documentation pages relevant to the user's question",
"",
]
if has_local_sources:
description.extend(
[
"Args:",
" url: The URL or file path to fetch documentation from. Can be:",
" - URL from an allowed domain",
" - A local file path (absolute or relative)",
" - A file:// URL (e.g., file:///path/to/llms.txt)",
]
)
else:
description.extend(
[
"Args:",
" url: The URL to fetch documentation from.",
]
)
description.extend(
[
"",
"Returns:",
" The fetched documentation content converted to markdown, or an error message", # noqa: E501
" if the request fails or the URL is not from an allowed domain.",
]
)
return "\n".join(description)
def _normalize_path(path: str) -> str:
"""Accept paths in file:/// or relative format and map to absolute paths."""
return (
os.path.abspath(path[7:])
if path.startswith("file://")
else os.path.abspath(path)
)
def _get_server_instructions(doc_sources: list[DocSource]) -> str:
"""Generate server instructions with available documentation source names."""
# Extract source names from doc_sources
source_names = []
for entry in doc_sources:
if "name" in entry:
source_names.append(entry["name"])
elif _is_http_or_https(entry["llms_txt"]):
# Use domain name as fallback for HTTP sources
domain = extract_domain(entry["llms_txt"])
source_names.append(domain.rstrip("/").split("//")[-1])
else:
# Use filename as fallback for local sources
source_names.append(os.path.basename(entry["llms_txt"]))
instructions = [
"Use the list_doc_sources tool to see available documentation sources.",
"This tool will return a URL for each documentation source.",
]
if source_names:
if len(source_names) == 1:
instructions.append(
f"Documentation URLs are available from this tool "
f"for {source_names[0]}."
)
else:
names_str = ", ".join(source_names[:-1]) + f", and {source_names[-1]}"
instructions.append(
f"Documentation URLs are available from this tool for {names_str}."
)
instructions.extend(
[
"",
"Once you have a source documentation URL, use the fetch_docs tool "
"to get the documentation contents. ",
"If the documentation contents contains a URL for additional documentation "
"that is relevant to your task, you can use the fetch_docs tool to "
"fetch documentation from that URL next.",
]
)
return "\n".join(instructions)
def create_server(
doc_sources: list[DocSource],
*,
follow_redirects: bool = False,
timeout: float = 10,
settings: dict | None = None,
allowed_domains: list[str] | None = None,
) -> FastMCP:
"""Create the server and generate documentation retrieval tools.
Args:
doc_sources: List of documentation sources to make available
follow_redirects: Whether to follow HTTP redirects when fetching docs
timeout: HTTP request timeout in seconds
settings: Additional settings to pass to FastMCP
allowed_domains: Additional domains to allow fetching from.
Use ['*'] to allow all domains
The domain hosting the llms.txt file is always appended to the list
of allowed domains.
Returns:
A FastMCP server instance configured with documentation tools
"""
settings = settings or {}
server = FastMCP(
name="llms-txt",
instructions=_get_server_instructions(doc_sources),
**settings,
)
httpx_client = httpx.AsyncClient(follow_redirects=follow_redirects, timeout=timeout)
local_sources = []
remote_sources = []
for entry in doc_sources:
url = entry["llms_txt"]
if _is_http_or_https(url):
remote_sources.append(entry)
else:
local_sources.append(entry)
# Let's verify that all local sources exist
for entry in local_sources:
path = entry["llms_txt"]
abs_path = _normalize_path(path)
if not os.path.exists(abs_path):
raise FileNotFoundError(f"Local file not found: {abs_path}")
    # Collect the domains of the remote llms.txt URLs; these are always allowed
domains = set(extract_domain(entry["llms_txt"]) for entry in remote_sources)
    # Add any user-specified domains; "*" is a special marker that allows all domains
if allowed_domains:
if "*" in allowed_domains:
domains = {"*"} # Special marker for allowing all domains
else:
domains.update(allowed_domains)
allowed_local_files = set(
_normalize_path(entry["llms_txt"]) for entry in local_sources
)
@server.tool()
def list_doc_sources() -> str:
"""List all available documentation sources.
This is the first tool you should call in the documentation workflow.
It provides URLs to llms.txt files or local file paths that the user has made available.
Returns:
A string containing a formatted list of documentation sources with their URLs or file paths
"""
content = ""
for entry_ in doc_sources:
url_or_path = entry_["llms_txt"]
if _is_http_or_https(url_or_path):
name = entry_.get("name", extract_domain(url_or_path))
content += f"{name}\nURL: {url_or_path}\n\n"
else:
path = _normalize_path(url_or_path)
name = entry_.get("name", path)
content += f"{name}\nPath: {path}\n\n"
return content
fetch_docs_description = _get_fetch_description(
has_local_sources=bool(local_sources)
)
@server.tool(description=fetch_docs_description)
async def fetch_docs(url: str) -> str:
nonlocal domains, follow_redirects
url = url.strip()
# Handle local file paths (either as file:// URLs or direct filesystem paths)
if not _is_http_or_https(url):
abs_path = _normalize_path(url)
if abs_path not in allowed_local_files:
raise ValueError(
f"Local file not allowed: {abs_path}. Allowed files: {allowed_local_files}"
)
try:
with open(abs_path, "r", encoding="utf-8") as f:
content = f.read()
return markdownify(content)
except Exception as e:
return f"Error reading local file: {str(e)}"
else:
# Otherwise treat as URL
if "*" not in domains and not any(
url.startswith(domain) for domain in domains
):
return (
"Error: URL not allowed. Must start with one of the following domains: "
+ ", ".join(domains)
)
try:
response = await httpx_client.get(url, timeout=timeout)
response.raise_for_status()
content = response.text
if follow_redirects:
# Check for meta refresh tag which indicates a client-side redirect
match = re.search(
r'<meta http-equiv="refresh" content="[^;]+;\s*url=([^"]+)"',
content,
re.IGNORECASE,
)
if match:
redirect_url = match.group(1)
new_url = urljoin(str(response.url), redirect_url)
if "*" not in domains and not any(
new_url.startswith(domain) for domain in domains
):
return (
"Error: Redirect URL not allowed. Must start with one of the following domains: "
+ ", ".join(domains)
)
response = await httpx_client.get(new_url, timeout=timeout)
response.raise_for_status()
content = response.text
return markdownify(content)
except (httpx.HTTPStatusError, httpx.RequestError) as e:
return f"Encountered an HTTP error: {str(e)}"
return server
```