# Directory Structure
```
├── .github
│   ├── actions
│   │   └── uv_setup
│   │       └── action.yml
│   └── workflows
│       ├── _lint.yml
│       ├── _test.yml
│       ├── ci.yml
│       └── release.yml
├── .gitignore
├── LICENSE
├── Makefile
├── mcpdoc
│   ├── __init__.py
│   ├── _version.py
│   ├── cli.py
│   ├── langgraph.py
│   ├── main.py
│   └── splash.py
├── pyproject.toml
├── README.md
├── sample_config.json
├── sample_config.yaml
├── tests
│   └── unit_tests
│       ├── __init__.py
│       ├── test_imports.py
│       └── test_main.py
└── uv.lock
```
# Files
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
.vs/
.vscode/
.idea/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
docs/docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
notebooks/
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.envrc
.venv
.venvs
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# macOS display setting files
.DS_Store
# Wandb directory
wandb/
# asdf tool versions
.tool-versions
/.ruff_cache/
*.pkl
*.bin
# integration test artifacts
data_map*
\[('_type', 'fake'), ('stop', None)]
# Replit files
*replit*
node_modules
docs/.yarn/
docs/node_modules/
docs/.docusaurus/
docs/.cache-loader/
docs/_dist
docs/api_reference/api_reference.rst
docs/api_reference/experimental_api_reference.rst
docs/api_reference/_build
docs/api_reference/*/
!docs/api_reference/_static/
!docs/api_reference/templates/
!docs/api_reference/themes/
docs/docs_skeleton/build
docs/docs_skeleton/node_modules
docs/docs_skeleton/yarn.lock
# Any new jupyter notebooks
# not intended for the repo
Untitled*.ipynb
Chinook.db
.vercel
.turbo
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
# MCP LLMS-TXT Documentation Server
## Overview
[llms.txt](https://llmstxt.org/) is a website index for LLMs, providing background information, guidance, and links to detailed markdown files. IDEs like Cursor and Windsurf or apps like Claude Code/Desktop can use `llms.txt` to retrieve context for tasks. However, these apps use different built-in tools to read and process files like `llms.txt`. The retrieval process can be opaque, and there is not always a way to audit the tool calls or the context returned.
[MCP](https://github.com/modelcontextprotocol) offers a way for developers to have *full control* over tools used by these applications. Here, we create [an open source MCP server](https://github.com/modelcontextprotocol) to provide MCP host applications (e.g., Cursor, Windsurf, Claude Code/Desktop) with (1) a user-defined list of `llms.txt` files and (2) a simple `fetch_docs` tool to read URLs within any of the provided `llms.txt` files. This allows the user to audit each tool call as well as the context returned.
<img src="https://github.com/user-attachments/assets/736f8f55-833d-4200-b833-5fca01a09e1b" width="60%">
## llms-txt
You can find llms.txt files for LangGraph and LangChain here:
| Library | llms.txt |
|------------------|------------------------------------------------------------------------------------------------------------|
| LangGraph Python | [https://langchain-ai.github.io/langgraph/llms.txt](https://langchain-ai.github.io/langgraph/llms.txt) |
| LangGraph JS | [https://langchain-ai.github.io/langgraphjs/llms.txt](https://langchain-ai.github.io/langgraphjs/llms.txt) |
| LangChain Python | [https://python.langchain.com/llms.txt](https://python.langchain.com/llms.txt) |
| LangChain JS | [https://js.langchain.com/llms.txt](https://js.langchain.com/llms.txt) |
## Quickstart
#### Install uv
* Please see [official uv docs](https://docs.astral.sh/uv/getting-started/installation/#installation-methods) for other ways to install `uv`.
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
#### Choose an `llms.txt` file to use.
* For example, [here's](https://langchain-ai.github.io/langgraph/llms.txt) the LangGraph `llms.txt` file.
> **Note: Security and Domain Access Control**
>
> For security reasons, mcpdoc implements strict domain access controls:
>
> 1. **Remote llms.txt files**: When you specify a remote llms.txt URL (e.g., `https://langchain-ai.github.io/langgraph/llms.txt`), mcpdoc automatically adds only that specific domain (`langchain-ai.github.io`) to the allowed domains list. This means the tool can only fetch documentation from URLs on that domain.
>
> 2. **Local llms.txt files**: When using a local file, NO domains are automatically added to the allowed list. You MUST explicitly specify which domains to allow using the `--allowed-domains` parameter.
>
> 3. **Adding additional domains**: To allow fetching from domains beyond those automatically included:
> - Use `--allowed-domains domain1.com domain2.com` to add specific domains
> - Use `--allowed-domains '*'` to allow all domains (use with caution)
>
> This security measure prevents unauthorized access to domains not explicitly approved by the user, ensuring that documentation can only be retrieved from trusted sources.
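The same rules apply when the server is created from Python. Here is a minimal sketch using the `create_server` API shown under Programmatic Usage below (the extra allowed domain is purely illustrative):
```python
from mcpdoc.main import create_server

# The domain hosting the remote llms.txt (langchain-ai.github.io) is allowed automatically.
# allowed_domains adds further domains; ["*"] would allow all domains (use with caution).
server = create_server(
    [{"name": "LangGraph", "llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt"}],
    allowed_domains=["https://python.langchain.com/"],  # illustrative extra domain
)
```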
#### (Optional) Test the MCP server locally with your `llms.txt` file(s) of choice:
```bash
uvx --from mcpdoc mcpdoc \
--urls "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt" "LangChain:https://python.langchain.com/llms.txt" \
--transport sse \
--port 8082 \
--host localhost
```
* This should run at: http://localhost:8082

* Run [MCP inspector](https://modelcontextprotocol.io/docs/tools/inspector) and connect to the running server:
```bash
npx @modelcontextprotocol/inspector
```

* Here, you can test the `tool` calls.
#### Connect to Cursor
* Open `Cursor Settings` and `MCP` tab.
* This will open the `~/.cursor/mcp.json` file.

* Paste the following into the file (we use the `langgraph-docs-mcp` name and link to the LangGraph `llms.txt`).
```
{
"mcpServers": {
"langgraph-docs-mcp": {
"command": "uvx",
"args": [
"--from",
"mcpdoc",
"mcpdoc",
"--urls",
"LangGraph:https://langchain-ai.github.io/langgraph/llms.txt LangChain:https://python.langchain.com/llms.txt",
"--transport",
"stdio"
]
}
}
}
```
* Confirm that the server is running in your `Cursor Settings/MCP` tab.
* Best practice is to then update Cursor Global (User) rules.
* Open Cursor `Settings/Rules` and update `User Rules` with the following (or similar):
```
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
+ use this to answer the question
```
* `CMD+L` (on Mac) to open chat.
* Ensure `agent` is selected.

Then, try an example prompt, such as:
```
what are types of memory in LangGraph?
```

### Connect to Windsurf
* Open Cascade with `CMD+L` (on Mac).
* Click `Configure MCP` to open the config file, `~/.codeium/windsurf/mcp_config.json`.
* Update with `langgraph-docs-mcp` as noted above.

* Update `Windsurf Rules/Global rules` with the following (or similar):
```
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
```

Then, try the example prompt:
* It will perform your tool calls.

### Connect to Claude Desktop
* Open `Settings/Developer` to update `~/Library/Application\ Support/Claude/claude_desktop_config.json`.
* Update with `langgraph-docs-mcp` as noted above.
* Restart Claude Desktop app.
> [!Note]
> If you run into issues with Python version incompatibility when trying to add MCPDoc tools to Claude Desktop, you can explicitly specify the filepath to the `python` executable in the `uvx` command.
>
> <details>
> <summary>Example configuration</summary>
>
> ```
> {
> "mcpServers": {
> "langgraph-docs-mcp": {
> "command": "uvx",
> "args": [
> "--python",
> "/path/to/python",
> "--from",
> "mcpdoc",
> "mcpdoc",
> "--urls",
> "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt",
> "--transport",
> "stdio"
> ]
> }
> }
> }
> ```
> </details>
> [!Note]
> Currently (3/21/25) it appears that Claude Desktop does not support `rules` for global rules, so append the following to your prompt.
```
<rules>
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
</rules>
```

* You will see your tools visible in the bottom right of your chat input.

Then, try the example prompt:
* It will ask to approve tool calls as it processes your request.

### Connect to Claude Code
* In a terminal after installing [Claude Code](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview), run this command to add the MCP server to your project:
```
claude mcp add-json langgraph-docs '{"type":"stdio","command":"uvx","args":["--from", "mcpdoc", "mcpdoc", "--urls", "langgraph:https://langchain-ai.github.io/langgraph/llms.txt", "LangChain:https://python.langchain.com/llms.txt"]}' -s local
```
* You will see `~/.claude.json` updated.
* Test by launching Claude Code and running `/mcp` to view your tools:
```
$ claude
> /mcp
```

> [!Note]
> Currently (3/21/25) it appears that Claude Code does not support `rules` for global rules, so append the following to your prompt.
```
<rules>
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
</rules>
```
Then, try the example prompt:
* It will ask to approve tool calls.

## Command-line Interface
The `mcpdoc` command provides a simple CLI for launching the documentation server.
You can specify documentation sources in three ways, and these can be combined:
1. Using a YAML config file:
* This will load the LangGraph Python documentation from the `sample_config.yaml` file in this repo.
```bash
mcpdoc --yaml sample_config.yaml
```
2. Using a JSON config file:
* This will load the LangGraph Python documentation from the `sample_config.json` file in this repo.
```bash
mcpdoc --json sample_config.json
```
3. Directly specifying llms.txt URLs with optional names:
* URLs can be specified either as plain URLs or with optional names using the format `name:url`.
* You can specify multiple URLs by using the `--urls` parameter multiple times.
* This is how we loaded `llms.txt` for the MCP server above.
```bash
mcpdoc --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt --urls LangChain:https://python.langchain.com/llms.txt
```
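For reference, the CLI parses each `--urls` entry with the `create_doc_sources_from_urls` helper (defined in `mcpdoc/cli.py`, shown later in this repo). A minimal sketch of both accepted formats:
```python
from mcpdoc.cli import create_doc_sources_from_urls

# "name:url" entries are split on the first colon; plain URLs get no name.
sources = create_doc_sources_from_urls(
    [
        "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt",
        "https://python.langchain.com/llms.txt",
    ]
)
# [{'name': 'LangGraph', 'llms_txt': 'https://...'}, {'llms_txt': 'https://...'}]
```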
You can also combine these methods to merge documentation sources:
```bash
mcpdoc --yaml sample_config.yaml --json sample_config.json --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt --urls LangChain:https://python.langchain.com/llms.txt
```
## Additional Options
- `--follow-redirects`: Follow HTTP redirects (defaults to False)
- `--timeout SECONDS`: HTTP request timeout in seconds (defaults to 10.0)
Example with additional options:
```bash
mcpdoc --yaml sample_config.yaml --follow-redirects --timeout 15
```
This will load the LangGraph Python documentation with a 15-second timeout and follow any HTTP redirects if necessary.
## Configuration Format
Both YAML and JSON configuration files should contain a list of documentation sources.
Each source must include an `llms_txt` URL and can optionally include a `name`:
### YAML Configuration Example (sample_config.yaml)
```yaml
# Sample configuration for mcp-llms-txt server
# Each entry must have an llms_txt URL and optionally a name
- name: LangGraph Python
llms_txt: https://langchain-ai.github.io/langgraph/llms.txt
```
### JSON Configuration Example (sample_config.json)
```json
[
{
"name": "LangGraph Python",
"llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt"
}
]
```
## Programmatic Usage
```python
from mcpdoc.main import create_server
# Create a server with documentation sources
server = create_server(
[
{
"name": "LangGraph Python",
"llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt",
},
# You can add multiple documentation sources
# {
# "name": "Another Documentation",
# "llms_txt": "https://example.com/llms.txt",
# },
],
follow_redirects=True,
timeout=15.0,
)
# Run the server
server.run(transport="stdio")
```
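To serve over SSE instead of stdio, the CLI passes `host`, `port`, and `log_level` through to `FastMCP` as settings; a minimal sketch of the equivalent programmatic call:
```python
from mcpdoc.main import create_server

server = create_server(
    [
        {
            "name": "LangGraph Python",
            "llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt",
        }
    ],
    settings={"host": "127.0.0.1", "port": 8082, "log_level": "INFO"},
)
server.run(transport="sse")
```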
```
--------------------------------------------------------------------------------
/tests/unit_tests/__init__.py:
--------------------------------------------------------------------------------
```python
```
--------------------------------------------------------------------------------
/mcpdoc/__init__.py:
--------------------------------------------------------------------------------
```python
from mcpdoc._version import __version__
__all__ = ["__version__"]
```
--------------------------------------------------------------------------------
/sample_config.json:
--------------------------------------------------------------------------------
```json
[
{
"name": "LangGraph Python",
"llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt"
}
]
```
--------------------------------------------------------------------------------
/sample_config.yaml:
--------------------------------------------------------------------------------
```yaml
# Sample configuration for mcp-llms-txt server
# Each entry must have an llms_txt URL and optionally a name
- name: LangGraph Python
llms_txt: https://langchain-ai.github.io/langgraph/llms.txt
```
--------------------------------------------------------------------------------
/mcpdoc/_version.py:
--------------------------------------------------------------------------------
```python
from importlib import metadata
try:
__version__ = metadata.version(__package__)
except metadata.PackageNotFoundError:
# Case where package metadata is not available.
__version__ = ""
```
--------------------------------------------------------------------------------
/tests/unit_tests/test_imports.py:
--------------------------------------------------------------------------------
```python
def test_imports():
"""Test that main modules can be imported."""
from mcpdoc import main # noqa
from mcpdoc import cli # noqa
from mcpdoc import langgraph # noqa
assert True
```
--------------------------------------------------------------------------------
/.github/actions/uv_setup/action.yml:
--------------------------------------------------------------------------------
```yaml
# TODO: https://docs.astral.sh/uv/guides/integration/github/#caching
name: uv-install
description: Set up Python and uv
inputs:
python-version:
description: Python version, supporting MAJOR.MINOR only
required: true
env:
UV_VERSION: "0.5.25"
runs:
using: composite
steps:
- name: Install uv and set the python version
uses: astral-sh/setup-uv@v5
with:
version: ${{ env.UV_VERSION }}
python-version: ${{ inputs.python-version }}
```
--------------------------------------------------------------------------------
/.github/workflows/_test.yml:
--------------------------------------------------------------------------------
```yaml
name: test
on:
workflow_call:
inputs:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
python-version:
required: true
type: string
description: "Python version to use"
env:
UV_FROZEN: "true"
UV_NO_SYNC: "true"
jobs:
build:
defaults:
run:
working-directory: ${{ inputs.working-directory }}
runs-on: ubuntu-latest
timeout-minutes: 20
name: "make test #${{ inputs.python-version }}"
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ inputs.python-version }} + uv
uses: "./.github/actions/uv_setup"
id: setup-python
with:
python-version: ${{ inputs.python-version }}
- name: Install dependencies
shell: bash
run: uv sync --group test
- name: Run core tests
shell: bash
run: |
make test
```
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
```toml
[project]
name = "mcpdoc"
version = "0.0.10"
description = "Server llms-txt documentation over MCP"
readme = "README.md"
license = "MIT"
requires-python = ">=3.10"
dependencies = [
"httpx>=0.28.1",
"markdownify>=1.1.0",
"mcp[cli]>=1.4.1",
"pyyaml>=6.0.1",
]
[project.scripts]
mcpdoc = "mcpdoc.cli:main"
[dependency-groups]
test = [
"pytest>=8.3.4",
"pytest-asyncio>=0.25.3",
"pytest-cov>=6.0.0",
"pytest-mock>=3.14.0",
"pytest-socket>=0.7.0",
"pytest-timeout>=2.3.1",
"ruff>=0.9.7",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.pytest.ini_options]
minversion = "8.0"
# -ra: Show a short summary of all non-passing test outcomes (skipped, failed, errors, etc.)
# -q: Enable quiet mode for less cluttered output
# -v: Enable verbose output to display detailed test names and statuses
# --durations=5: Show the 5 slowest tests after the run (useful for performance tuning)
addopts = "-ra -q -v --durations=5"
testpaths = [
"tests",
]
python_files = ["test_*.py"]
python_functions = ["test_*"]
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"
```
--------------------------------------------------------------------------------
/.github/workflows/_lint.yml:
--------------------------------------------------------------------------------
```yaml
name: lint
on:
workflow_call:
inputs:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
python-version:
required: true
type: string
description: "Python version to use"
env:
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
# This env var allows us to get inline annotations when ruff has complaints.
RUFF_OUTPUT_FORMAT: github
UV_FROZEN: "true"
jobs:
build:
name: "make lint #${{ inputs.python-version }}"
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ inputs.python-version }} + uv
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ inputs.python-version }}
- name: Install dependencies
working-directory: ${{ inputs.working-directory }}
run: |
uv sync --group test
- name: Analysing the code with our lint
working-directory: ${{ inputs.working-directory }}
run: |
make lint
```
--------------------------------------------------------------------------------
/mcpdoc/langgraph.py:
--------------------------------------------------------------------------------
```python
"""A server for just langgraph docs from langchain-ai.github.io.
This is used as a way to test the doc functionality via MCP.
"""
#!/usr/bin/env python3
import httpx
from markdownify import markdownify
from mcp.server.fastmcp import FastMCP
server = FastMCP(name="llms-txt")
ALLOWED_PREFIX = "https://langchain-ai.github.io/"
HTTPX_CLIENT = httpx.AsyncClient(follow_redirects=False)
@server.tool()
async def get_docs(url: str = "overview") -> str:
"""Get langgraph docs.
Always fetch the `overview` prior to fetching any other URLs as it will provide a
list of available URLs.
Args:
url: The URL to fetch. Must start with https://langchain-ai.github.io/
or be "overview".
"""
if url == "overview":
url = "https://langchain-ai.github.io/langgraph/llms.txt"
if not url.startswith(ALLOWED_PREFIX):
return (
"Error: Invalid url. Must start with https://langchain-ai.github.io/ "
'or be "overview"'
)
response = await HTTPX_CLIENT.get(url)
response.raise_for_status()
if response.status_code == 200:
# Convert HTML to markdown
markdown_content = markdownify(response.text)
return markdown_content
else:
return "Encountered an error while fetching the URL."
if __name__ == "__main__":
server.run(transport="stdio")
```
--------------------------------------------------------------------------------
/.github/workflows/ci.yml:
--------------------------------------------------------------------------------
```yaml
---
name: Run CI Tests
on:
push:
branches: [ main ]
pull_request:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
lint:
strategy:
matrix:
# Only lint on the min and max supported Python versions.
# It's extremely unlikely that there's a lint issue on any version in between
# that doesn't show up on the min or max versions.
#
# GitHub rate-limits how many jobs can be running at any one time.
# Starting new jobs is also relatively slow,
# so linting on fewer versions makes CI faster.
python-version:
- "3.12"
uses:
./.github/workflows/_lint.yml
with:
working-directory: .
python-version: ${{ matrix.python-version }}
secrets: inherit
test:
strategy:
matrix:
        # Test on the min and max supported Python versions.
        # A failure on an intermediate version is very unlikely
        # to not show up on the min or max versions.
        #
        # GitHub rate-limits how many jobs can be running at any one time.
        # Starting new jobs is also relatively slow,
        # so testing on fewer versions makes CI faster.
python-version:
- "3.10"
- "3.12"
uses:
./.github/workflows/_test.yml
with:
working-directory: .
python-version: ${{ matrix.python-version }}
secrets: inherit
```
--------------------------------------------------------------------------------
/tests/unit_tests/test_main.py:
--------------------------------------------------------------------------------
```python
"""Tests for mcpdoc.main module."""
import pytest
from mcpdoc.main import (
_get_fetch_description,
_is_http_or_https,
extract_domain,
)
def test_extract_domain() -> None:
"""Test extract_domain function."""
# Test with https URL
assert extract_domain("https://example.com/page") == "https://example.com/"
# Test with http URL
assert extract_domain("http://test.org/docs/index.html") == "http://test.org/"
# Test with URL that has port
assert extract_domain("https://localhost:8080/api") == "https://localhost:8080/"
# Check trailing slash
assert extract_domain("https://localhost:8080") == "https://localhost:8080/"
# Test with URL that has subdomain
assert extract_domain("https://docs.python.org/3/") == "https://docs.python.org/"
@pytest.mark.parametrize(
"url,expected",
[
("http://example.com", True),
("https://example.com", True),
("/path/to/file.txt", False),
("file:///path/to/file.txt", False),
(
"ftp://example.com",
False,
), # Not HTTP or HTTPS, even though it's not a local file
],
)
def test_is_http_or_https(url, expected):
"""Test _is_http_or_https function."""
assert _is_http_or_https(url) is expected
@pytest.mark.parametrize(
"has_local_sources,expected_substrings",
[
(True, ["local file path", "file://"]),
(False, ["URL to fetch"]),
],
)
def test_get_fetch_description(has_local_sources, expected_substrings):
"""Test _get_fetch_description function."""
description = _get_fetch_description(has_local_sources)
# Common assertions for both cases
assert "Fetch and parse documentation" in description
assert "Returns:" in description
    # Substrings expected for this case must all be present
    for substring in expected_substrings:
        assert substring in description
    # The URL-only description must not mention local-file handling
    if not has_local_sources:
        for substring in ["local file path", "file://"]:
            assert substring not in description
```
--------------------------------------------------------------------------------
/.github/workflows/release.yml:
--------------------------------------------------------------------------------
```yaml
name: release
run-name: Release ${{ inputs.working-directory }} by @${{ github.actor }}
on:
workflow_call:
inputs:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
workflow_dispatch:
inputs:
working-directory:
description: "From which folder this pipeline executes"
default: "."
dangerous-nonmain-release:
required: false
type: boolean
default: false
description: "Release from a non-main branch (danger!)"
env:
PYTHON_VERSION: "3.11"
UV_FROZEN: "true"
UV_NO_SYNC: "true"
jobs:
build:
if: github.ref == 'refs/heads/main' || inputs.dangerous-nonmain-release
environment: Scheduled testing
runs-on: ubuntu-latest
outputs:
pkg-name: ${{ steps.check-version.outputs.pkg-name }}
version: ${{ steps.check-version.outputs.version }}
steps:
- uses: actions/checkout@v4
- name: Set up Python + uv
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
# We want to keep this build stage *separate* from the release stage,
# so that there's no sharing of permissions between them.
# The release stage has trusted publishing and GitHub repo contents write access,
# and we want to keep the scope of that access limited just to the release job.
# Otherwise, a malicious `build` step (e.g. via a compromised dependency)
# could get access to our GitHub or PyPI credentials.
#
# Per the trusted publishing GitHub Action:
# > It is strongly advised to separate jobs for building [...]
# > from the publish job.
# https://github.com/pypa/gh-action-pypi-publish#non-goals
- name: Build project for distribution
run: uv build
- name: Upload build
uses: actions/upload-artifact@v4
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Check Version
id: check-version
shell: python
working-directory: ${{ inputs.working-directory }}
run: |
import os
import tomllib
with open("pyproject.toml", "rb") as f:
data = tomllib.load(f)
pkg_name = data["project"]["name"]
version = data["project"]["version"]
with open(os.environ["GITHUB_OUTPUT"], "a") as f:
f.write(f"pkg-name={pkg_name}\n")
f.write(f"version={version}\n")
publish:
needs:
- build
runs-on: ubuntu-latest
permissions:
# This permission is used for trusted publishing:
# https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/
#
# Trusted publishing has to also be configured on PyPI for each package:
# https://docs.pypi.org/trusted-publishers/adding-a-publisher/
id-token: write
defaults:
run:
working-directory: ${{ inputs.working-directory }}
steps:
- uses: actions/checkout@v4
- name: Set up Python + uv
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
- uses: actions/download-artifact@v4
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Publish package distributions to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
packages-dir: ${{ inputs.working-directory }}/dist/
verbose: true
print-hash: true
# Temp workaround since attestations are on by default as of gh-action-pypi-publish v1.11.0
attestations: false
mark-release:
needs:
- build
- publish
runs-on: ubuntu-latest
permissions:
# This permission is needed by `ncipollo/release-action` to
# create the GitHub release.
contents: write
defaults:
run:
working-directory: ${{ inputs.working-directory }}
steps:
- uses: actions/checkout@v4
- name: Set up Python + uv
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
- uses: actions/download-artifact@v4
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Create Tag
uses: ncipollo/release-action@v1
with:
artifacts: "dist/*"
token: ${{ secrets.GITHUB_TOKEN }}
generateReleaseNotes: true
tag: ${{needs.build.outputs.pkg-name}}==${{ needs.build.outputs.version }}
body: ${{ needs.release-notes.outputs.release-body }}
commit: main
makeLatest: true
```
--------------------------------------------------------------------------------
/mcpdoc/cli.py:
--------------------------------------------------------------------------------
```python
#!/usr/bin/env python3
"""Command-line interface for mcp-llms-txt server."""
import argparse
import json
import sys
from typing import List, Dict
import yaml
from mcpdoc._version import __version__
from mcpdoc.main import create_server, DocSource
from mcpdoc.splash import SPLASH
class CustomFormatter(
argparse.RawDescriptionHelpFormatter, argparse.ArgumentDefaultsHelpFormatter
):
# Custom formatter to preserve epilog formatting while showing default values
pass
EPILOG = """
Examples:
# Directly specifying llms.txt URLs with optional names
mcpdoc --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt
# Using a local file (absolute or relative path)
mcpdoc --urls LocalDocs:/path/to/llms.txt --allowed-domains '*'
# Using a YAML config file
mcpdoc --yaml sample_config.yaml
# Using a JSON config file
mcpdoc --json sample_config.json
# Combining multiple documentation sources
mcpdoc --yaml sample_config.yaml --json sample_config.json --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt
# Using SSE transport with default host (127.0.0.1) and port (8000)
mcpdoc --yaml sample_config.yaml --transport sse
# Using SSE transport with custom host and port
mcpdoc --yaml sample_config.yaml --transport sse --host 0.0.0.0 --port 9000
# Using SSE transport with additional HTTP options
mcpdoc --yaml sample_config.yaml --follow-redirects --timeout 15 --transport sse --host localhost --port 8080
# Allow fetching from additional domains. The domains hosting the llms.txt files are always allowed.
mcpdoc --yaml sample_config.yaml --allowed-domains https://example.com/ https://another-example.com/
# Allow fetching from any domain
mcpdoc --yaml sample_config.yaml --allowed-domains '*'
"""
def parse_args() -> argparse.Namespace:
"""Parse command-line arguments."""
# Custom formatter to preserve epilog formatting
parser = argparse.ArgumentParser(
description="MCP LLMS-TXT Documentation Server",
formatter_class=CustomFormatter,
epilog=EPILOG,
)
# Allow combining multiple doc source methods
parser.add_argument(
"--yaml", "-y", type=str, help="Path to YAML config file with doc sources"
)
parser.add_argument(
"--json", "-j", type=str, help="Path to JSON config file with doc sources"
)
parser.add_argument(
"--urls",
"-u",
type=str,
nargs="+",
help="List of llms.txt URLs or file paths with optional names (format: 'url_or_path' or 'name:url_or_path')",
)
parser.add_argument(
"--follow-redirects",
action="store_true",
help="Whether to follow HTTP redirects",
)
parser.add_argument(
"--allowed-domains",
type=str,
nargs="*",
help="Additional allowed domains to fetch documentation from. Use '*' to allow all domains.",
)
parser.add_argument(
"--timeout", type=float, default=10.0, help="HTTP request timeout in seconds"
)
parser.add_argument(
"--transport",
type=str,
default="stdio",
choices=["stdio", "sse"],
help="Transport protocol for MCP server",
)
parser.add_argument(
"--log-level",
type=str,
default="INFO",
help=(
"Log level for the server. Use one on the following: DEBUG, INFO, "
"WARNING, ERROR."
" (only used with --transport sse)"
),
)
# SSE-specific options
parser.add_argument(
"--host",
type=str,
default="127.0.0.1",
help="Host to bind the server to (only used with --transport sse)",
)
parser.add_argument(
"--port",
type=int,
default=8000,
help="Port to bind the server to (only used with --transport sse)",
)
# Version information
parser.add_argument(
"--version",
"-V",
action="version",
version=f"mcpdoc {__version__}",
help="Show version information and exit",
)
return parser.parse_args()
def load_config_file(file_path: str, file_format: str) -> List[Dict[str, str]]:
"""Load configuration from a file.
Args:
file_path: Path to the config file
file_format: Format of the config file ("yaml" or "json")
Returns:
List of doc source configurations
"""
try:
with open(file_path, "r", encoding="utf-8") as file:
if file_format.lower() == "yaml":
config = yaml.safe_load(file)
elif file_format.lower() == "json":
config = json.load(file)
else:
raise ValueError(f"Unsupported file format: {file_format}")
if not isinstance(config, list):
raise ValueError("Config file must contain a list of doc sources")
return config
except (FileNotFoundError, yaml.YAMLError, json.JSONDecodeError) as e:
print(f"Error loading config file: {e}", file=sys.stderr)
sys.exit(1)
def create_doc_sources_from_urls(urls: List[str]) -> List[DocSource]:
"""Create doc sources from a list of URLs or file paths with optional names.
Args:
urls: List of llms.txt URLs or file paths with optional names
(format: 'url_or_path' or 'name:url_or_path')
Returns:
List of DocSource objects
"""
doc_sources = []
for entry in urls:
if not entry.strip():
continue
if ":" in entry and not entry.startswith(("http:", "https:")):
# Format is name:url
name, url = entry.split(":", 1)
doc_sources.append({"name": name, "llms_txt": url})
else:
# Format is just url
doc_sources.append({"llms_txt": entry})
return doc_sources
def main() -> None:
"""Main entry point for the CLI."""
# Check if any arguments were provided
if len(sys.argv) == 1:
# No arguments, print help
# Use the same custom formatter as parse_args()
help_parser = argparse.ArgumentParser(
description="MCP LLMS-TXT Documentation Server",
formatter_class=CustomFormatter,
epilog=EPILOG,
)
# Add version to help parser too
help_parser.add_argument(
"--version",
"-V",
action="version",
version=f"mcpdoc {__version__}",
help="Show version information and exit",
)
help_parser.print_help()
sys.exit(0)
args = parse_args()
# Load doc sources based on command-line arguments
doc_sources: List[DocSource] = []
# Check if any source options were provided
if not (args.yaml or args.json or args.urls):
print(
"Error: At least one source option (--yaml, --json, or --urls) is required",
file=sys.stderr,
)
sys.exit(1)
# Merge doc sources from all provided methods
if args.yaml:
doc_sources.extend(load_config_file(args.yaml, "yaml"))
if args.json:
doc_sources.extend(load_config_file(args.json, "json"))
if args.urls:
doc_sources.extend(create_doc_sources_from_urls(args.urls))
# Only used with SSE transport
settings = {
"host": args.host,
"port": args.port,
"log_level": "INFO",
}
# Create and run the server
server = create_server(
doc_sources,
follow_redirects=args.follow_redirects,
timeout=args.timeout,
settings=settings,
allowed_domains=args.allowed_domains,
)
if args.transport == "sse":
print()
print(SPLASH)
print()
print(
f"Launching MCPDOC server with {len(doc_sources)} doc sources",
)
# Pass transport-specific options
server.run(transport=args.transport)
if __name__ == "__main__":
main()
```
--------------------------------------------------------------------------------
/mcpdoc/main.py:
--------------------------------------------------------------------------------
```python
"""MCP Llms-txt server for docs."""
import os
import re
from urllib.parse import urlparse, urljoin
import httpx
from markdownify import markdownify
from mcp.server.fastmcp import FastMCP
from typing_extensions import NotRequired, TypedDict
class DocSource(TypedDict):
"""A source of documentation for a library or a package."""
name: NotRequired[str]
"""Name of the documentation source (optional)."""
llms_txt: str
"""URL to the llms.txt file or documentation source."""
description: NotRequired[str]
"""Description of the documentation source (optional)."""
def extract_domain(url: str) -> str:
"""Extract domain from URL.
Args:
url: Full URL
Returns:
Domain with scheme and trailing slash (e.g., https://example.com/)
"""
parsed = urlparse(url)
return f"{parsed.scheme}://{parsed.netloc}/"
def _is_http_or_https(url: str) -> bool:
"""Check if the URL is an HTTP or HTTPS URL."""
return url.startswith(("http:", "https:"))
def _get_fetch_description(has_local_sources: bool) -> str:
"""Get fetch docs tool description."""
description = [
"Fetch and parse documentation from a given URL or local file.",
"",
"Use this tool after list_doc_sources to:",
"1. First fetch the llms.txt file from a documentation source",
"2. Analyze the URLs listed in the llms.txt file",
"3. Then fetch specific documentation pages relevant to the user's question",
"",
]
if has_local_sources:
description.extend(
[
"Args:",
" url: The URL or file path to fetch documentation from. Can be:",
" - URL from an allowed domain",
" - A local file path (absolute or relative)",
" - A file:// URL (e.g., file:///path/to/llms.txt)",
]
)
else:
description.extend(
[
"Args:",
" url: The URL to fetch documentation from.",
]
)
description.extend(
[
"",
"Returns:",
" The fetched documentation content converted to markdown, or an error message", # noqa: E501
" if the request fails or the URL is not from an allowed domain.",
]
)
return "\n".join(description)
def _normalize_path(path: str) -> str:
"""Accept paths in file:/// or relative format and map to absolute paths."""
return (
os.path.abspath(path[7:])
if path.startswith("file://")
else os.path.abspath(path)
)
def _get_server_instructions(doc_sources: list[DocSource]) -> str:
"""Generate server instructions with available documentation source names."""
# Extract source names from doc_sources
source_names = []
for entry in doc_sources:
if "name" in entry:
source_names.append(entry["name"])
elif _is_http_or_https(entry["llms_txt"]):
# Use domain name as fallback for HTTP sources
domain = extract_domain(entry["llms_txt"])
source_names.append(domain.rstrip("/").split("//")[-1])
else:
# Use filename as fallback for local sources
source_names.append(os.path.basename(entry["llms_txt"]))
instructions = [
"Use the list_doc_sources tool to see available documentation sources.",
"This tool will return a URL for each documentation source.",
]
if source_names:
if len(source_names) == 1:
instructions.append(
f"Documentation URLs are available from this tool "
f"for {source_names[0]}."
)
else:
names_str = ", ".join(source_names[:-1]) + f", and {source_names[-1]}"
instructions.append(
f"Documentation URLs are available from this tool for {names_str}."
)
instructions.extend(
[
"",
"Once you have a source documentation URL, use the fetch_docs tool "
"to get the documentation contents. ",
"If the documentation contents contains a URL for additional documentation "
"that is relevant to your task, you can use the fetch_docs tool to "
"fetch documentation from that URL next.",
]
)
return "\n".join(instructions)
def create_server(
doc_sources: list[DocSource],
*,
follow_redirects: bool = False,
timeout: float = 10,
settings: dict | None = None,
allowed_domains: list[str] | None = None,
) -> FastMCP:
"""Create the server and generate documentation retrieval tools.
Args:
doc_sources: List of documentation sources to make available
follow_redirects: Whether to follow HTTP redirects when fetching docs
timeout: HTTP request timeout in seconds
settings: Additional settings to pass to FastMCP
allowed_domains: Additional domains to allow fetching from.
Use ['*'] to allow all domains
The domain hosting the llms.txt file is always appended to the list
of allowed domains.
Returns:
A FastMCP server instance configured with documentation tools
"""
settings = settings or {}
server = FastMCP(
name="llms-txt",
instructions=_get_server_instructions(doc_sources),
**settings,
)
httpx_client = httpx.AsyncClient(follow_redirects=follow_redirects, timeout=timeout)
local_sources = []
remote_sources = []
for entry in doc_sources:
url = entry["llms_txt"]
if _is_http_or_https(url):
remote_sources.append(entry)
else:
local_sources.append(entry)
# Let's verify that all local sources exist
for entry in local_sources:
path = entry["llms_txt"]
abs_path = _normalize_path(path)
if not os.path.exists(abs_path):
raise FileNotFoundError(f"Local file not found: {abs_path}")
    # Collect the domains of the remote llms.txt URLs; these are always allowed
domains = set(extract_domain(entry["llms_txt"]) for entry in remote_sources)
    # Add any user-specified domains; "*" is a special marker that allows all domains
if allowed_domains:
if "*" in allowed_domains:
domains = {"*"} # Special marker for allowing all domains
else:
domains.update(allowed_domains)
allowed_local_files = set(
_normalize_path(entry["llms_txt"]) for entry in local_sources
)
@server.tool()
def list_doc_sources() -> str:
"""List all available documentation sources.
This is the first tool you should call in the documentation workflow.
It provides URLs to llms.txt files or local file paths that the user has made available.
Returns:
A string containing a formatted list of documentation sources with their URLs or file paths
"""
content = ""
for entry_ in doc_sources:
url_or_path = entry_["llms_txt"]
if _is_http_or_https(url_or_path):
name = entry_.get("name", extract_domain(url_or_path))
content += f"{name}\nURL: {url_or_path}\n\n"
else:
path = _normalize_path(url_or_path)
name = entry_.get("name", path)
content += f"{name}\nPath: {path}\n\n"
return content
fetch_docs_description = _get_fetch_description(
has_local_sources=bool(local_sources)
)
@server.tool(description=fetch_docs_description)
async def fetch_docs(url: str) -> str:
nonlocal domains, follow_redirects
url = url.strip()
# Handle local file paths (either as file:// URLs or direct filesystem paths)
if not _is_http_or_https(url):
abs_path = _normalize_path(url)
if abs_path not in allowed_local_files:
raise ValueError(
f"Local file not allowed: {abs_path}. Allowed files: {allowed_local_files}"
)
try:
with open(abs_path, "r", encoding="utf-8") as f:
content = f.read()
return markdownify(content)
except Exception as e:
return f"Error reading local file: {str(e)}"
else:
# Otherwise treat as URL
if "*" not in domains and not any(
url.startswith(domain) for domain in domains
):
return (
"Error: URL not allowed. Must start with one of the following domains: "
+ ", ".join(domains)
)
try:
response = await httpx_client.get(url, timeout=timeout)
response.raise_for_status()
content = response.text
if follow_redirects:
# Check for meta refresh tag which indicates a client-side redirect
match = re.search(
r'<meta http-equiv="refresh" content="[^;]+;\s*url=([^"]+)"',
content,
re.IGNORECASE,
)
if match:
redirect_url = match.group(1)
new_url = urljoin(str(response.url), redirect_url)
if "*" not in domains and not any(
new_url.startswith(domain) for domain in domains
):
return (
"Error: Redirect URL not allowed. Must start with one of the following domains: "
+ ", ".join(domains)
)
response = await httpx_client.get(new_url, timeout=timeout)
response.raise_for_status()
content = response.text
return markdownify(content)
except (httpx.HTTPStatusError, httpx.RequestError) as e:
return f"Encountered an HTTP error: {str(e)}"
return server
```