# Directory Structure ``` ├── .github │ ├── dependabot.yaml │ └── workflows │ └── build.yml ├── .gitignore ├── docker-compose.yml ├── docs │ ├── images │ │ └── web-usage.png │ ├── search.md │ └── web.md ├── LICENSE ├── README.md └── server ├── .dockerignore ├── .python-version ├── Dockerfile ├── entrypoint.sh ├── mcp_server │ ├── __init__.py │ ├── __main__.py │ ├── server.py │ └── tools │ ├── __init__.py │ ├── helpers.py │ ├── search.py │ └── web.py ├── pyproject.toml ├── tests │ ├── __init__.py │ ├── conftest.py │ ├── test_server.py │ └── test_web.py ├── tools.yaml └── uv.lock ``` # Files -------------------------------------------------------------------------------- /server/.python-version: -------------------------------------------------------------------------------- ``` 1 | 3.13 2 | ``` -------------------------------------------------------------------------------- /server/.dockerignore: -------------------------------------------------------------------------------- ``` 1 | .dockerignore 2 | .git 3 | .github 4 | .gitignore 5 | .venv 6 | .ruff_cache 7 | .pytest_cache 8 | __pycache__ 9 | docs 10 | Dockerfile 11 | ``` -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- ``` 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | .ruff_cache/ 4 | 5 | # Environments 6 | .env 7 | .venv 8 | env/ 9 | venv/ 10 | ENV/ 11 | env.bak/ 12 | venv.bak/ 13 | 14 | # Temporary logs 15 | *.log 16 | 17 | # Sphinx documentation 18 | docs/_build/ 19 | ``` -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- ```markdown 1 | # MCP Server 2 | 3 | Give your AI assistants the power to help you more effectively. 
This server lets them safely access 4 | websites and search the web - with clear feedback about what's happening and helpful error messages 5 | when things go wrong. 6 | 7 | - [🛠️ What tools does this server offer?](#️-what-tools-does-this-server-offer) 8 | - [🏎️ How can I run it?](#️-how-can-i-run-it) 9 | - [🐋 Using Docker (recommended)](#-using-docker-recommended) 10 | - [💻 Running locally](#-running-locally) 11 | - [🔌 How to connect](#-how-to-connect) 12 | - [📚 Learn more about MCP](#-learn-more-about-mcp) 13 | - [📄 License](#-license) 14 | 15 | ## 🛠️ What tools does this server offer? 16 | 17 | The server provides two powerful tools that help AI assistants solve real-world problems: 18 | 19 | | Tool | What it can do | 20 | | ------------------ | --------------------------------------------------------------------------- | 21 | | [Search](docs/search.md) | Search the web via SearXNG for current information, specific resources, or to perform calculations. | 22 | | [Web](docs/web.md) | Access websites and process their content. Can convert pages to markdown for easy reading, get the raw content, or extract links. | 23 | 24 | ## 🏎️ How can I run it? 25 | 26 | ### 🐋 Using Docker (recommended) 27 | 28 | The server runs in Docker containers to keep things safe and simple. Here's how to get started: 29 | 30 | 1. [Install Docker](https://docs.docker.com/engine/install/) if you haven't already 31 | 2. Create a file called `docker-compose.yml` with: 32 | 33 | ```yaml:docker-compose.yml 34 | services: 35 | mcp-server: 36 | environment: 37 | # Required: URL for your SearXNG instance's Search API 38 | - SEARXNG_QUERY_URL=http://searxng:8080 39 | # Optional: Configure network mode (SSE) for LibreChat etc. 
40 | - SSE_HOST=0.0.0.0 41 | - SSE_PORT=8080 42 | # Optional: Set a custom User-Agent for web requests 43 | - USER_AGENT=MCP-Server/1.0 (github.com/tcpipuk/mcp-server) 44 | image: ghcr.io/tcpipuk/mcp-server/server:latest 45 | ports: # Only needed if using SSE_HOST/SSE_PORT 46 | - "8080:8080" # Expose port 8080 on host 47 | restart: unless-stopped 48 | stop_grace_period: 1s 49 | 50 | # Example SearXNG service (optional, adapt as needed) 51 | # searxng: 52 | # environment: 53 | # - SEARXNG_BASE_URL=http://searxng:8080 # Ensure SearXNG knows its own URL 54 | # image: searxng/searxng:latest 55 | # restart: unless-stopped 56 | # volumes: 57 | # - ./searxng:/etc/searxng:rw 58 | ``` 59 | 60 | > **Important**: You *must* provide the `SEARXNG_QUERY_URL` environment variable, pointing to 61 | > the Search API endpoint of your SearXNG instance (usually ending in `/` or `/search`). 62 | > 63 | > Setting `SSE_HOST` and `SSE_PORT` enables network mode (Server-Sent Events), recommended for 64 | > multi-container setups like LibreChat. If omitted, the server uses standard I/O. 65 | 66 | 3. Run `docker compose up -d` to start the server container (and optionally SearXNG). 67 | 68 | Most people use this with either: 69 | 70 | - [Claude Desktop](https://modelcontextprotocol.io/quickstart/user) - connects directly via stdio 71 | (omit `SSE_HOST`/`SSE_PORT` in `docker-compose.yml`). 72 | - [LibreChat](https://www.librechat.ai/docs/local) - connects over the network via SSE. 73 | 74 | For LibreChat, add this to your `librechat.yaml` (assuming `SSE_PORT=8080`): 75 | 76 | ```yaml:librechat.yaml 77 | mcpServers: 78 | mcp-server: 79 | iconPath: "/path/to/icon.png" # Optional: Custom icon 80 | label: "MCP Web/Search" # Optional: Custom label shown in UI 81 | type: sse 82 | url: http://mcp-server:8080/sse # Adjust host/port if needed 83 | ``` 84 | 85 | ### 💻 Running locally 86 | 87 | 1. 
Install `uv` (requires Python 3.13+): 88 | 89 | ```bash 90 | curl -LsSf https://astral.sh/uv/install.sh | sh 91 | ``` 92 | 93 | > **Note:** If you already have `uv` installed, update it with `uv self update`. 94 | 95 | 2. Create and activate a virtual environment: 96 | 97 | ```bash 98 | uv venv 99 | source .venv/bin/activate # Linux/macOS 100 | # or 101 | .venv\Scripts\activate # Windows 102 | ``` 103 | 104 | 3. Install dependencies from the lockfile: 105 | 106 | ```bash 107 | uv sync 108 | ``` 109 | 110 | 4. Set required environment variables: 111 | 112 | ```bash 113 | # Required: URL for your SearXNG instance's Search API 114 | export SEARXNG_QUERY_URL="http://your-searxng-instance.local:8080" 115 | # Optional: Custom User-Agent 116 | export USER_AGENT="CustomAgent/1.0" 117 | ``` 118 | 119 | 5. Run the server: 120 | 121 | ```bash 122 | # For network (SSE) mode (e.g., for LibreChat) 123 | mcp-server --sse-host 0.0.0.0 --sse-port 3001 124 | 125 | # For direct stdio mode (e.g., for Claude Desktop) 126 | mcp-server 127 | ``` 128 | 129 | Available arguments: 130 | 131 | - `--sse-host`: SSE listening address (e.g., `0.0.0.0`). Enables SSE mode. 132 | - `--sse-port`: SSE listening port (e.g., `3001`). Enables SSE mode. 133 | - `--user-agent`: Custom User-Agent string (overrides `USER_AGENT` env var). 134 | 135 | > **Note**: If neither `--sse-host` nor `--sse-port` are provided (and `SSE_HOST`/`SSE_PORT` env 136 | > vars are not set), the server defaults to `stdio` mode. The `SEARXNG_QUERY_URL` environment 137 | > variable is *always* required. 138 | 139 | ## 🔌 How to connect 140 | 141 | You can connect to the server in two ways: 142 | 143 | | Method | What it means | When to use it | 144 | | ------------------------- | ------------------------------------------------------- | ----------------------------------------------- | 145 | | Network connection (SSE) | The server listens on a network port for connections. | Best for LibreChat or other networked clients. 
| 146 | | Direct connection (stdio) | The server communicates directly via standard input/out. | Useful for local testing or Claude Desktop. | 147 | 148 | ## 📚 Learn more about MCP 149 | 150 | Here are a few resources to get you started: 151 | 152 | - [MCP Specification](https://spec.modelcontextprotocol.io/) 153 | - [MCP Python SDK](https://github.com/modelcontextprotocol/python-sdk) 154 | - [MCP Example Servers](https://github.com/modelcontextprotocol/servers) 155 | 156 | ## 📄 License 157 | 158 | This project is licensed under the GPLv3. See the [LICENSE](LICENSE) file for full details. 159 | ``` -------------------------------------------------------------------------------- /server/tests/__init__.py: -------------------------------------------------------------------------------- ```python 1 | """Test suite for MCP Server.""" 2 | ``` -------------------------------------------------------------------------------- /server/entrypoint.sh: -------------------------------------------------------------------------------- ```bash 1 | #!/bin/bash 2 | set -e 3 | 4 | # Then run the main command 5 | if [ "$BUILD_ENV" = "dev" ]; then 6 | pytest -v --log-cli-level=INFO tests/ 7 | else 8 | exec mcp-server 9 | fi 10 | ``` -------------------------------------------------------------------------------- /.github/dependabot.yaml: -------------------------------------------------------------------------------- ```yaml 1 | version: 2 2 | updates: 3 | - package-ecosystem: "pip" 4 | directory: "/" 5 | schedule: 6 | interval: "weekly" 7 | 8 | - package-ecosystem: "docker" 9 | directory: "/" 10 | schedule: 11 | interval: "weekly" 12 | 13 | - package-ecosystem: "github-actions" 14 | directory: "/" 15 | schedule: 16 | interval: "weekly" 17 | ``` -------------------------------------------------------------------------------- /server/mcp_server/__init__.py: -------------------------------------------------------------------------------- ```python 1 | """MCP Fetch Server module for 
handling web content retrieval. 2 | 3 | This module provides HTTP fetching capabilities for the Model Context Protocol (MCP) framework, 4 | allowing models to retrieve and process web content in a controlled manner. 5 | """ 6 | 7 | from __future__ import annotations 8 | 9 | from .__main__ import main 10 | from .server import MCPServer 11 | 12 | __all__ = ["MCPServer", "main"] 13 | ``` -------------------------------------------------------------------------------- /server/mcp_server/tools/__init__.py: -------------------------------------------------------------------------------- ```python 1 | """Tools submodule package for mcp_server. 2 | 3 | Provides tools that let AI assistants safely interact with external systems: 4 | 5 | - search: Use SearXNG's search API to find information on the web 6 | - web: Access and process web content with support for markdown conversion and link extraction 7 | 8 | Each tool is designed to handle errors gracefully and provide clear feedback to help AI 9 | assistants solve problems independently. 
10 | """
11 | 
12 | from __future__ import annotations
13 | 
14 | from .search import tool_search
15 | from .web import tool_web
16 | 
17 | __all__ = ["tool_search", "tool_web"]
18 | 
```
--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
```yaml
1 | services:
2 |   mcp-server:
3 |     build:
4 |       context: ./server
5 |       dockerfile: Dockerfile
6 |     environment:
7 |       - SSE_HOST=0.0.0.0
8 |       - SSE_PORT=8080
9 |       - SANDBOX_SOCKET=/run/sandbox/shell.sock
10 |       - USER_AGENT=CustomAgent/1.0
11 |     volumes:
12 |       - sandbox_sockets:/run/sandbox
13 |     image: ghcr.io/tcpipuk/mcp-server/server:latest
14 |     networks:
15 |       - mcp_net
16 |     restart: unless-stopped
17 |     stop_grace_period: 1s
18 | 
19 |   sandbox:
20 |     build:
21 |       context: ./sandbox
22 |       dockerfile: Dockerfile
23 |     environment:
24 |       - SANDBOX_SOCKET=/run/sandbox/shell.sock
25 |     image: ghcr.io/tcpipuk/mcp-sandbox:latest
26 |     volumes:
27 |       - sandbox_home:/home/sandbox
28 |       - sandbox_sockets:/run/sandbox
29 |     networks:
30 |       - mcp_net
31 |     restart: unless-stopped
32 | 
33 | # Every network and named volume mounted above must also be declared here:
34 | # `mcp_net` and `sandbox_sockets` were referenced but never defined, which
35 | # makes `docker compose up` fail validation.
36 | networks:
37 |   mcp_net:
38 | 
39 | volumes:
40 |   sandbox_home:
41 |   sandbox_sockets:
```
--------------------------------------------------------------------------------
/server/tests/conftest.py:
--------------------------------------------------------------------------------
```python
1 | """Configure pytest for the test suite."""
2 | 
3 | from __future__ import annotations
4 | 
5 | from asyncio import create_subprocess_exec, sleep as asyncio_sleep
6 | from os import setsid as os_setsid
7 | from typing import TYPE_CHECKING
8 | 
9 | import pytest
10 | import pytest_asyncio
11 | 
12 | if TYPE_CHECKING:
13 |     from collections.abc import AsyncGenerator
14 | 
15 | 
16 | @pytest.fixture(autouse=True)
17 | def _setup_test_env() -> None:
18 |     """Set up test environment variables and cleanup."""
19 | 
20 | 
21 | @pytest_asyncio.fixture
22 | async def sandbox_server(unused_tcp_port: int) -> AsyncGenerator[tuple[str, 
int]]:
23 |     """Create a socat-based TCP server for sandbox testing.
24 | 
25 |     Yields:
26 |         Tuple of (host, port) for the test server
27 |     """
28 |     # Start socat in the background, echoing input back
29 |     process = await create_subprocess_exec(
30 |         "/usr/bin/socat",
31 |         f"TCP-LISTEN:{unused_tcp_port},reuseaddr,fork",
32 |         "EXEC:'bash -i',pty,stderr,setsid,sigint,sane",
33 |         preexec_fn=os_setsid,
34 |     )
35 | 
36 |     # Give socat a moment to start up
37 |     await asyncio_sleep(0.2)
38 | 
39 |     try:
40 |         yield "127.0.0.1", unused_tcp_port
41 |     finally:
42 |         process.terminate()
43 |         await process.wait()
44 | 
```
--------------------------------------------------------------------------------
/server/Dockerfile:
--------------------------------------------------------------------------------
```dockerfile
1 | # Build stage using uv with a frozen lockfile and dependency caching
2 | FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim AS uv
3 | WORKDIR /app
4 | 
  | # BUILD_ENV must be declared in this stage too: an ARG declared only in a
  | # later stage is out of scope here, so the CI `build-args: BUILD_ENV=dev`
  | # never reached the `uv sync` calls and dev deps (pytest) were not installed.
  | ARG BUILD_ENV=prod
  | 
5 | # Enable bytecode compilation and copy mode
6 | ENV UV_COMPILE_BYTECODE=1 \
7 |     UV_LINK_MODE=copy
8 | 
9 | # Install dependencies using the lockfile and settings
10 | COPY pyproject.toml uv.lock ./
11 | RUN --mount=type=cache,target=/root/.cache/uv \
12 |     uv sync --frozen --no-install-project $(if [ "$BUILD_ENV" = "dev" ]; then echo --dev; fi) --no-editable
13 | 
14 | # Add the source code and install main project dependencies
15 | COPY . . 
16 | RUN --mount=type=cache,target=/root/.cache/uv \
17 |     uv sync --frozen $(if [ "$BUILD_ENV" = "dev" ]; then echo --dev; fi) --no-editable
18 | 
19 | # Prepare runtime image
20 | FROM python:3.13-slim-bookworm AS runtime
21 | WORKDIR /app
22 | 
23 | # Set default build environment
24 | ARG BUILD_ENV=prod
  | # Persist the build-arg for entrypoint.sh: ARG is build-time only, so the
  | # runtime script would otherwise only see BUILD_ENV when passed via `-e`.
  | ENV BUILD_ENV=${BUILD_ENV}
25 | 
26 | # Install minimal system dependencies and create runtime user
27 | RUN apt-get update \
28 |     && apt-get install -y --no-install-recommends socat \
29 |     && rm -rf /var/lib/apt/lists/* \
30 |     && groupadd -g 1000 appuser \
31 |     && useradd -u 1000 -g 1000 -m appuser
32 | 
33 | # Copy only necessary files from build stage
34 | COPY --from=uv --chown=appuser:appuser /app/ .
35 | 
36 | # Switch to non-root user
37 | USER appuser
38 | 
39 | # Set environment variables for runtime
40 | ENV PATH="/app/.venv/bin:$PATH" \
41 |     PYTHONDONTWRITEBYTECODE=1 \
42 |     PYTHONUNBUFFERED=1
43 | 
44 | # Use wrapper script to handle startup
45 | ENTRYPOINT ["/app/entrypoint.sh"]
```
--------------------------------------------------------------------------------
/server/pyproject.toml:
--------------------------------------------------------------------------------
```toml
1 | [project]
2 | name = "mcp-server"
3 | version = "0.1.0"
4 | description = "Provides tools to clients over the Model Context Protocol, supporting both stdio and SSE"
5 | requires-python = ">=3.13"
6 | authors = [{ name = "Tom Foster" }]
7 | maintainers = [{ name = "Tom Foster", email = "[email protected]" }]
8 | keywords = ["http", "mcp", "llm", "automation"]
9 | license = { text = "GPLv3" }
10 | classifiers = [
11 |     "Development Status :: 4 - Beta",
12 |     "Intended Audience :: Developers",
13 |     "License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
14 |     "Programming Language :: Python :: 3",
15 |     "Programming Language :: Python :: 3.13",
16 | ]
17 | dependencies = [
18 |     "aiohttp>=3.11.12",
19 |     "beautifulsoup4>=4.13.3",
20 |     "mcp>=1.2.1",
21 |     "pyyaml>=6.0.2",
22 |     "trafilatura>=2.0.0",
23 |     "uvicorn>=0.34.0",
24 | ]
25 | 
26 | [project.scripts] 27 | mcp-server = "mcp_server:main" 28 | 29 | [build-system] 30 | requires = ["hatchling"] 31 | build-backend = "hatchling.build" 32 | 33 | [tool.pytest.ini_options] 34 | addopts = "-ra -v" 35 | asyncio_mode = "strict" 36 | asyncio_default_fixture_loop_scope = "session" 37 | cache_dir = "/tmp/.pytest_cache" 38 | filterwarnings = [ 39 | "ignore:assertions not in test modules or plugins will be ignored:pytest.PytestConfigWarning", 40 | ] 41 | testpaths = "tests" 42 | 43 | [tool.ruff] 44 | cache-dir = "/tmp/.cache/ruff" 45 | fix = true 46 | line-length = 110 47 | target-version = "py313" 48 | unsafe-fixes = true 49 | 50 | [tool.ruff.format] 51 | skip-magic-trailing-comma = true 52 | 53 | [tool.ruff.lint] 54 | select = ["ALL"] 55 | ignore = ["COM812", "CPY", "D203", "D213", "FBT", "RUF029"] 56 | 57 | [tool.ruff.lint.isort] 58 | combine-as-imports = true 59 | required-imports = ["from __future__ import annotations"] 60 | split-on-trailing-comma = false 61 | 62 | [tool.ruff.lint.per-file-ignores] 63 | "tests/*" = ["ARG001"] 64 | 65 | [tool.ruff.lint.pydocstyle] 66 | convention = "google" 67 | 68 | [tool.uv] 69 | dev-dependencies = [ 70 | "psutil>=7.0.0", 71 | "pytest>=8.3.4", 72 | "pytest-asyncio>=0.25.3", 73 | "ruff>=0.9.6", 74 | ] 75 | ``` -------------------------------------------------------------------------------- /server/mcp_server/__main__.py: -------------------------------------------------------------------------------- ```python 1 | """Command-line entry point for the MCP fetch server. 2 | 3 | Provides configuration options for running the fetch server, including customisation 4 | of the User-Agent string for HTTP requests. The server runs asynchronously to handle 5 | concurrent requests efficiently. 
6 | """ 7 | 8 | from __future__ import annotations 9 | 10 | from argparse import ArgumentParser 11 | from asyncio import CancelledError, run as asyncio_run 12 | from contextlib import suppress as contextlib_suppress 13 | from os import environ as os_environ 14 | from pathlib import Path 15 | 16 | from yaml import safe_load as yaml_safe_load 17 | 18 | from .server import MCPServer 19 | from .tools import tool_search, tool_web 20 | 21 | 22 | def main() -> None: 23 | """Provide command-line entrypoint for the MCP fetch server.""" 24 | parser = ArgumentParser(description="Give your LLM access to external tools") 25 | parser.add_argument("--sandbox", type=str, help="TCP host:port pair (e.g. mcp-sandbox:8080)") 26 | parser.add_argument("--sse-host", type=str, help="SSE listening address (e.g. 0.0.0.0)") 27 | parser.add_argument("--sse-port", type=int, help="SSE listening port (e.g. 3001)") 28 | parser.add_argument("--user-agent", type=str, help="Custom User-Agent string") 29 | parser.add_argument("--searxng-query-url", type=str, help="URL for SearXNG search endpoint") 30 | args = parser.parse_args() 31 | 32 | if args.sandbox: 33 | os_environ["SANDBOX"] = args.sandbox 34 | if args.sse_host: 35 | os_environ["SSE_HOST"] = args.sse_host 36 | if args.sse_port: 37 | os_environ["SSE_PORT"] = str(args.sse_port) 38 | if args.user_agent: 39 | os_environ["USER_AGENT"] = args.user_agent 40 | if args.searxng_query_url: 41 | os_environ["SEARXNG_QUERY_URL"] = args.searxng_query_url 42 | 43 | config = yaml_safe_load(Path("tools.yaml").read_text(encoding="utf-8")) 44 | config["tools"]["search"]["method"] = tool_search 45 | config["tools"]["web"]["method"] = tool_web 46 | # Remove the sandbox tool if there's no sandbox 47 | if not os_environ.get("SANDBOX") and "sandbox" in config["tools"]: 48 | del config["tools"]["sandbox"] 49 | server = MCPServer(config) 50 | with contextlib_suppress(KeyboardInterrupt, CancelledError): 51 | asyncio_run(server.serve()) 52 | 53 | 54 | if __name__ == 
"__main__": 55 | main() 56 | ``` -------------------------------------------------------------------------------- /docs/search.md: -------------------------------------------------------------------------------- ```markdown 1 | # Search Tool 2 | 3 | - [Capabilities](#capabilities) 4 | - [Refining the Search](#refining-the-search) 5 | - [The Query (`q`)](#the-query-q) 6 | - [Filtering by Time (`time_range`)](#filtering-by-time-time_range) 7 | - [Content Safety (`safesearch`)](#content-safety-safesearch) 8 | - [Technical Details](#technical-details) 9 | 10 | Provides the AI assistant with web search capabilities via a SearXNG instance. It allows the AI to 11 | fetch current information, look up specific resources, and perform other search-related tasks. 12 | 13 | ## Capabilities 14 | 15 | The tool enables the AI assistant to perform tasks requiring external information lookup, such as: 16 | 17 | - Finding details on current events or recent developments. 18 | - Retrieving specific technical documentation or code examples. 19 | - Searching for various online content types (e.g., images, news). 20 | - Accessing specialised resources like scientific papers, package repositories (PyPI, npm), or Q&A 21 | sites (Stack Exchange). 22 | - Using WolframAlpha for calculations or fetching random data (UUIDs, numbers). 23 | - Calculating text hashes. 24 | 25 | ## Refining the Search 26 | 27 | The AI can tailor searches using the available parameters: 28 | 29 | ### The Query (`q`) 30 | 31 | The primary search input. Supports standard queries and SearXNG's specific syntax: 32 | 33 | - **Bang Prefixes (`!`):** Focuses the search on categories or engines (e.g. `!news`, `!images`, 34 | `!it`, `!repos`, `!pypi`, `!wa`, `!re`). Prefixes can be chained (e.g., `!it !q&a python async`). 35 | - **Keywords (No `!`):** Executes specific actions like calculations (`avg 1 2 3`), random data 36 | generation (`random uuid`), or hashing (`sha512 text`). 
37 | 38 | ### Filtering by Time (`time_range`) 39 | 40 | Restricts results to a specific period (`day`, `month`, `year`), where supported by the underlying 41 | SearXNG engines. 42 | 43 | ### Content Safety (`safesearch`) 44 | 45 | Adjusts the filtering level for potentially explicit content: `0` (Off), `1` (Moderate - default), 46 | or `2` (Strict), engine permitting. 47 | 48 | ## Technical Details 49 | 50 | Key aspects of the tool's operation: 51 | 52 | - **Backend:** Relies on the SearXNG instance specified by the server's `SEARXNG_QUERY_URL` 53 | environment variable. 54 | - **Output Format:** Returns results exclusively in JSON format for straightforward parsing by the AI. 55 | - **Request Handling:** Uses the common `get_request` helper function (shared with the `web` tool) 56 | for managing HTTP requests, including redirects, timeouts, and connection errors. Errors are 57 | reported back to the AI. 58 | - **Parameter Exposure:** Only the parameters defined in `tools.yaml` (`q`, `time_range`, 59 | `safesearch`) are available to the AI. 60 | 61 | This tool gives the AI assistant a mechanism to query a SearXNG instance, enabling access to 62 | real-time web information and specialised search functions. 
63 | 
```
--------------------------------------------------------------------------------
/.github/workflows/build.yml:
--------------------------------------------------------------------------------
```yaml
1 | name: Build MCP Server
2 | 
3 | concurrency:
4 |   group: ${{ github.workflow }}-${{ github.ref }}
5 |   cancel-in-progress: true
6 | 
7 | on:
8 |   pull_request:
9 |   push:
10 |     paths:
11 |       - "server/**"
12 |       - ".github/workflows/build.yml"
13 |   workflow_dispatch:
14 | 
15 | permissions:
16 |   contents: read
17 |   packages: write
18 |   pull-requests: write
19 |   actions: write
20 | 
21 | jobs:
22 |   test:
23 |     name: Pytest
24 |     runs-on: ubuntu-latest
25 |     steps:
26 |       - name: Checkout repository
27 |         uses: actions/checkout@v5
28 | 
29 |       - name: Set up Docker Buildx
30 |         uses: docker/setup-buildx-action@v3
31 | 
32 |       - name: Build test image
33 |         uses: docker/build-push-action@v6
34 |         env:
35 |           DOCKER_BUILD_SUMMARY: false
36 |           DOCKER_BUILD_RECORD_UPLOAD: false
37 |         with:
38 |           context: server
39 |           load: true
40 |           build-args: |
41 |             BUILD_ENV=dev
42 |           tags: mcp-server:test
43 |           cache-from: type=gha
44 |           cache-to: type=gha,mode=max
45 | 
46 |       - name: Run tests and output results
47 |         run: |
48 |           set -o pipefail
49 |           docker run --rm -e BUILD_ENV=dev mcp-server:test | tee pytest_output.txt
50 |           exit_code=${PIPESTATUS[0]}
51 |           echo '```' >> "$GITHUB_STEP_SUMMARY"
52 |           cat pytest_output.txt >> "$GITHUB_STEP_SUMMARY"
53 |           echo '```' >> "$GITHUB_STEP_SUMMARY"
54 |           exit $exit_code
55 | 
56 |   build:
57 |     name: Docker build
58 |     needs: test
59 |     runs-on: ubuntu-latest
60 |     steps:
61 |       - name: Checkout repository
62 |         uses: actions/checkout@v5
63 | 
64 |       - name: Set up Docker Buildx
65 |         uses: docker/setup-buildx-action@v3
66 | 
67 |       - name: Generate Docker metadata
68 |         id: meta
69 |         uses: docker/metadata-action@v5
70 |         with:
71 |           context: workflow
72 |           images: |
73 |             name=ghcr.io/${{ github.repository }}/server
74 |           tags: |
75 |             type=raw,value=latest,enable={{is_default_branch}}
76 
| type=ref,event=branch 77 | type=ref,event=pr 78 | type=ref,event=tag 79 | type=sha,enable={{is_default_branch}},prefix=${{ github.event_name == 'pull_request' && 'pr-' || github.ref_name }}- 80 | 81 | - name: Log in to GitHub Container Registry 82 | if: github.event_name != 'pull_request' 83 | uses: docker/login-action@v3 84 | with: 85 | registry: ghcr.io 86 | username: ${{ github.actor }} 87 | password: ${{ secrets.GITHUB_TOKEN }} 88 | 89 | - name: Build and push production image 90 | uses: docker/build-push-action@v6 91 | env: 92 | DOCKER_BUILD_RECORD_UPLOAD: false 93 | with: 94 | context: server 95 | push: ${{ github.event_name != 'pull_request' }} 96 | tags: ${{ steps.meta.outputs.tags }} 97 | labels: ${{ steps.meta.outputs.labels }} 98 | build-args: | 99 | BUILD_ENV=prod 100 | cache-from: type=gha 101 | cache-to: type=gha,mode=max 102 | ``` -------------------------------------------------------------------------------- /server/mcp_server/tools/helpers.py: -------------------------------------------------------------------------------- ```python 1 | """Helper functions for the MCP fetch server tools. 2 | 3 | Provides shared utilities for error handling and web content retrieval: 4 | 5 | - Error formatting: Consistent XML-style error messages for AI parsing 6 | - HTTP client: Robust web content fetching with configurable User-Agent 7 | - Error handling: Detailed error messages for common network issues 8 | 9 | All functions handle errors gracefully and provide clear feedback to help 10 | AI assistants understand and resolve issues independently. 
11 | """ 12 | 13 | from __future__ import annotations 14 | 15 | from os import getenv as os_getenv 16 | 17 | from aiohttp import ( 18 | ClientConnectionError, 19 | ClientError, 20 | ClientResponseError, 21 | ClientSession as AiohttpClientSession, 22 | ServerTimeoutError, 23 | TooManyRedirects, 24 | ) 25 | from mcp.shared.exceptions import McpError 26 | from mcp.types import INTERNAL_ERROR, ErrorData 27 | 28 | 29 | def add_error(text: str, error: str, append: bool = True) -> str: 30 | """Append an error message to the string. 31 | 32 | Args: 33 | text: The string to append the error to. 34 | error: The error message to append. 35 | append: Whether to append or prepend the error. 36 | 37 | Returns: 38 | The string with the error message appended. 39 | 40 | """ 41 | return f"{text}\n\n<error>{error}</error>" if append else f"<error>{error}</error>\n\n{text}" 42 | 43 | 44 | async def get_request(url: str) -> str: 45 | """Fetch content from a URL asynchronously. 46 | 47 | Args: 48 | url: The URL to fetch. 49 | 50 | Returns: 51 | The fetched content as a string. 52 | 53 | Raises: 54 | McpError: If fetching or processing fails. 
55 | 56 | """ 57 | errmsg: str = "" 58 | try: 59 | async with AiohttpClientSession( 60 | headers={ 61 | "User-Agent": os_getenv("USER_AGENT") 62 | or "Mozilla/5.0 (X11; Linux i686; rv:135.0) Gecko/20100101 Firefox/135.0" 63 | } 64 | ) as session: 65 | response = await session.get(url) 66 | response_text = (await response.text()).strip() 67 | if response.ok: 68 | if response_text: 69 | return response_text 70 | errmsg = f"Failed to fetch {url}: HTTP {response.status} with empty body" 71 | else: 72 | errmsg = f"Failed to fetch {url}: HTTP {response.status} ({response.reason})" 73 | except ServerTimeoutError as err: 74 | errmsg = f"Timeout while fetching {url}: {str(err)!r}" 75 | except ClientConnectionError as err: 76 | errmsg = f"Failed to connect to {url}: {str(err)!r}" 77 | except TooManyRedirects as err: 78 | errmsg = f"Too many redirects while fetching {url}: {str(err)!r}" 79 | except ClientResponseError as err: 80 | errmsg = f"HTTP error while fetching {url}: {err.status} - {err.message}" 81 | except ClientError as err: 82 | errmsg = f"Network error while fetching {url}: {str(err)!r}" 83 | except Exception as err: # noqa: BLE001 84 | errmsg = f"Unexpected error while fetching {url}: {str(err)!r}" 85 | 86 | raise McpError(ErrorData(code=INTERNAL_ERROR, message=errmsg)) 87 | ``` -------------------------------------------------------------------------------- /server/mcp_server/tools/search.py: -------------------------------------------------------------------------------- ```python 1 | """Provide a tool to query a SearXNG instance. 2 | 3 | Allows AI assistants to search the web using a configured SearXNG instance, 4 | leveraging its API for targeted and filtered searches. 
5 | """ 6 | 7 | from __future__ import annotations 8 | 9 | from os import getenv as os_getenv 10 | from typing import Any 11 | from urllib.parse import urlencode 12 | 13 | from mcp.shared.exceptions import McpError 14 | from mcp.types import INTERNAL_ERROR, INVALID_PARAMS, ErrorData 15 | 16 | from .helpers import get_request 17 | 18 | # Allowed parameters for the SearXNG API, excluding 'q' which is handled separately. 19 | ALLOWED_PARAMS: set[str] = { 20 | "categories", 21 | "engines", 22 | "language", 23 | "pageno", 24 | "time_range", 25 | "format", 26 | "safesearch", 27 | } 28 | 29 | 30 | async def tool_search(q: str, **kwargs: Any) -> str: 31 | """Query a SearXNG instance using its Search API. 32 | 33 | Args: 34 | q: The search query string. 35 | **kwargs: Additional optional parameters for the SearXNG API 36 | (categories, engines, language, pageno, time_range, format, safesearch). 37 | 38 | Returns: 39 | The search results as a string (content depends on the 'format' parameter). 40 | 41 | Raises: 42 | McpError: If the SEARXNG_QUERY_URL environment variable is not set, 43 | if invalid parameters are provided, or if the request fails. 44 | """ 45 | searxng_url = os_getenv("SEARXNG_QUERY_URL") 46 | if not searxng_url: 47 | raise McpError( 48 | ErrorData(code=INTERNAL_ERROR, message="SearXNG query URL is not configured on the server.") 49 | ) 50 | 51 | # Filter out any provided kwargs that are not valid SearXNG parameters 52 | search_params = {k: v for k, v in kwargs.items() if k in ALLOWED_PARAMS and v is not None} 53 | search_params["q"] = q # Add the mandatory query 54 | 55 | # Default format to json if not specified, as it's often easiest for programmatic use 56 | if "format" not in search_params: 57 | search_params["format"] = "json" 58 | 59 | # Validate format if provided 60 | if search_params["format"] not in ("json", "csv", "rss"): 61 | raise McpError( 62 | ErrorData( 63 | code=INVALID_PARAMS, 64 | message=f"Invalid format '{search_params['format']}'. 
Must be 'json', 'csv', or 'rss'.", 65 | ) 66 | ) 67 | 68 | query_string = urlencode(search_params) 69 | full_url = f"{searxng_url}?{query_string}" 70 | 71 | try: 72 | # Use the existing get_request helper 73 | result = await get_request(full_url) 74 | # Simple check for empty result which might indicate no results found 75 | # depending on the format requested. SearXNG JSON format includes metadata even for no results. 76 | if not result and search_params["format"] != "json": 77 | return f"No results found for query '{q}' with specified parameters." 78 | except McpError as e: 79 | # Re-raise McpError to ensure it's handled correctly by the server 80 | raise McpError(ErrorData(code=e.data.code, message=f"SearXNG query failed: {e.data.message}")) from e 81 | except Exception as e: 82 | # Catch any other unexpected errors during the request 83 | raise McpError( 84 | ErrorData(code=INTERNAL_ERROR, message=f"Unexpected error during SearXNG query: {e!r}") 85 | ) from e 86 | else: 87 | return result 88 | ``` -------------------------------------------------------------------------------- /server/tools.yaml: -------------------------------------------------------------------------------- ```yaml 1 | tools: 2 | search: 3 | description: > 4 | Use this tool to access SearXNG to search the internet for current information or to perform 5 | calculations. Use this tool when the user asks about recent events, technical details, to find 6 | content, or your task requires calculations. If the search summary doesn't clearly answer the 7 | question, you can read one of the search results by providing the URL to the `web` tool, or use 8 | this `search` tool again to make further narrower requests to gain context to help your answer. 9 | inputSchema: 10 | type: object 11 | properties: 12 | q: 13 | type: string 14 | description: | 15 | SearXNG search query. 
Use `!` prefixes for categories/engines (chainable to search multiple sources) followed by your query:
16 |             - General: `!news <query>`, `!map <place>`, `!images <keywords>`
17 |             - Multimedia: `!videos` (PeerTube/Vimeo/YouTube), `!music` (Bandcamp/SoundCloud/YouTube), `!lyrics`, `!yt` (YouTube specific)
18 |             - Files: `!files` (books/apps/torrents), `!1337x` or `!kc` or `!solid` or `!tpb` (Torrents), `!gpa` (Google Play), `!wcf` (Wikimedia Commons)
19 |             - IT/Dev: `!it` (all tech), `!repos` (Git repos), `!dh` (Docker Hub), `!q&a` (Stack Ex.), `!mdn` (Web Docs), `!software_wikis` (Linux/dev wikis)
20 |             - Packages: `!pypi` (Python), `!npm` (Node), `!crates` or `!lrs` (Rust), `!alp` (Alpine Linux)
21 |             - Science/Compute: `!scientific_publications` (arXiv/PubMed/etc), `!wa` (WolframAlpha calculations/facts/definitions)
22 |             - Social: `!re` (Reddit)
23 |             Special keywords (no `!`):
24 |             - Stats: `avg 1 2 3`, `max`, `min`, `sum`, `prod`
25 |             - Random: `random color`, `random int`, `random string`, `random uuid`
26 |             - Hash: `sha512 text`
27 |         time_range:
28 |           type: string
29 |           enum: ["day", "month", "year"]
30 |           description: Filter results by time range if supported
31 |         safesearch:
32 |           type: integer
33 |           enum: [0, 1, 2]
34 |           description: Safe search level (0=Off, 1=Moderate, 2=Strict) if supported
35 |           default: 1
36 |       required: ["q"]
37 |   web:
38 |     description: >
39 |       Use this tool to access live web pages using their URL. This is crucial for providing users
40 |       with accurate information from up-to-date sources. You will typically want to use `markdown`
41 |       to read content, or use 'links' mode to extract hyperlinks to find related pages on a site,
42 |       e.g. for navigating documentation.
43 |     inputSchema:
44 |       type: object
45 |       properties:
46 |         url:
47 |           type: string
48 |           description: URL to access - must be a complete and valid web address.
49 |         mode:
50 |           type: string
51 |           enum:
52 |             - markdown
53 |             - raw
54 |             - links
55 |           description: |
56 |             Processing mode:
57 |             - `markdown` (default) for clean readable text
58 |             - `links` to list all hyperlinks
59 |             - `raw` for unprocessed content (code, JSON, etc)
60 |           default: markdown
61 |         max_length:
62 |           type: integer
63 |           description: Optional character limit for the response (0 = no limit).
64 |           default: 0
65 |       required: ["url"]
66 | 
```
--------------------------------------------------------------------------------
/server/tests/test_web.py:
--------------------------------------------------------------------------------
```python
  1 | """Test the web content retrieval and processing tools."""
  2 | 
  3 | from __future__ import annotations
  4 | 
  5 | import pytest
  6 | from mcp.shared.exceptions import McpError
  7 | 
  8 | from mcp_server.tools.web import ProcessingMode, WebProcessor, tool_web
  9 | 
 10 | 
 11 | @pytest.fixture
 12 | def mock_html_content() -> str:
 13 |     """Return sample HTML content for testing.
14 | 15 | Returns: 16 | Sample HTML content as a string 17 | """ 18 | return """ 19 | <html> 20 | <body> 21 | <h1>Test Page</h1> 22 | <p>This is a test paragraph.</p> 23 | <a href="https://example.com">Example Link</a> 24 | <a href="/relative/path">Relative Link</a> 25 | <a href="#skip">Skip Link</a> 26 | <a href="javascript:void(0)">JavaScript Link</a> 27 | </body> 28 | </html> 29 | """ 30 | 31 | 32 | def test_processing_mode_from_str() -> None: 33 | """Test conversion of strings to ProcessingMode enum values.""" 34 | if ProcessingMode.from_str("markdown") != ProcessingMode.MARKDOWN: 35 | pytest.fail("Failed to convert 'markdown' to ProcessingMode.MARKDOWN") 36 | if ProcessingMode.from_str("raw") != ProcessingMode.RAW: 37 | pytest.fail("Failed to convert 'raw' to ProcessingMode.RAW") 38 | if ProcessingMode.from_str("links") != ProcessingMode.LINKS: 39 | pytest.fail("Failed to convert 'links' to ProcessingMode.LINKS") 40 | if ProcessingMode.from_str("invalid") != ProcessingMode.RAW: 41 | pytest.fail("Failed to convert invalid mode to ProcessingMode.RAW") 42 | 43 | 44 | @pytest.mark.asyncio 45 | async def test_web_processor_links(monkeypatch: pytest.MonkeyPatch, mock_html_content: str) -> None: 46 | """Test extraction and formatting of links from web content.""" 47 | 48 | async def mock_get_request(_url: str) -> str: 49 | return mock_html_content 50 | 51 | monkeypatch.setattr("mcp_server.tools.web.get_request", mock_get_request) 52 | 53 | processor = WebProcessor("https://test.com", mode=ProcessingMode.LINKS) 54 | result = await processor.process() 55 | 56 | if "Example Link: https://example.com" not in result: 57 | pytest.fail(f"Missing absolute link in output: {result}") 58 | if "https://test.com/relative/path" not in result: 59 | pytest.fail(f"Missing resolved relative link in output: {result}") 60 | if "#skip" in result: 61 | pytest.fail(f"Found invalid anchor link in output: {result}") 62 | if "javascript:void(0)" in result: 63 | pytest.fail(f"Found invalid 
JavaScript link in output: {result}") 64 | 65 | 66 | @pytest.mark.asyncio 67 | async def test_web_processor_markdown(monkeypatch: pytest.MonkeyPatch) -> None: 68 | """Test conversion of HTML content to markdown format.""" 69 | 70 | async def mock_get_request(_url: str) -> str: 71 | return """ 72 | <!DOCTYPE html> 73 | <html> 74 | <head><title>Test Page</title></head> 75 | <body> 76 | <article> 77 | <h1>Test Heading</h1> 78 | <p>This is a test paragraph with some <strong>bold text</strong>.</p> 79 | <p>And another paragraph for good measure.</p> 80 | </article> 81 | </body> 82 | </html> 83 | """ 84 | 85 | monkeypatch.setattr("mcp_server.tools.web.get_request", mock_get_request) 86 | 87 | processor = WebProcessor("https://test.com", mode=ProcessingMode.MARKDOWN) 88 | result = await processor.process() 89 | 90 | if "Test Heading" not in result: 91 | pytest.fail(f"Missing heading content in output: {result}") 92 | if "test paragraph" not in result: 93 | pytest.fail(f"Missing paragraph content in output: {result}") 94 | 95 | 96 | @pytest.mark.asyncio 97 | async def test_max_length_limit() -> None: 98 | """Test truncation of content based on max_length parameter.""" 99 | processor = WebProcessor("https://test.com", max_length=10) 100 | content = "This is a very long text that should be truncated" 101 | 102 | truncated = processor._format_links({"https://test.com": content}) # noqa: SLF001 103 | if len(truncated) > processor.max_length + 100: # Allow for header text 104 | pytest.fail(f"Content exceeds max length: {len(truncated)} > {processor.max_length + 100}") 105 | 106 | 107 | @pytest.mark.asyncio 108 | async def test_invalid_url() -> None: 109 | """Test error handling for invalid URLs.""" 110 | try: 111 | await tool_web("not-a-url") 112 | pytest.fail("Expected McpError for invalid URL") 113 | except McpError: 114 | pass 115 | 116 | 117 | @pytest.mark.asyncio 118 | async def test_empty_links() -> None: 119 | """Test error handling when no links are found.""" 120 | 
processor = WebProcessor("https://test.com", mode=ProcessingMode.LINKS) 121 | try: 122 | processor._format_links({}) # noqa: SLF001 123 | pytest.fail("Expected McpError for empty links") 124 | except McpError: 125 | pass 126 | ``` -------------------------------------------------------------------------------- /docs/web.md: -------------------------------------------------------------------------------- ```markdown 1 | # Web Tool 2 | 3 | 1. [What can it do?](#what-can-it-do) 4 | 2. [Processing Modes](#processing-modes) 5 | 1. [Markdown Mode (default)](#markdown-mode-default) 6 | 2. [Links Mode](#links-mode) 7 | 3. [Raw Mode](#raw-mode) 8 | 3. [Features and Limits](#features-and-limits) 9 | 1. [Content Management](#content-management) 10 | 2. [Safety Features](#safety-features) 11 | 12 | A tool that lets AI assistants access and process web content safely. It can convert pages to 13 | markdown, extract links, or get raw content - helping the AI give you more accurate, up-to-date 14 | information. 15 | 16 |  17 | 18 | ## What can it do? 19 | 20 | When you're discussing documentation, researching solutions, or need current information, the AI 21 | can access web content to help. It's particularly useful when you want to: 22 | 23 | - Get the latest documentation for a library or tool 24 | - Find code examples that match your specific needs 25 | - Navigate through complex documentation structures 26 | - Verify that advice is current and accurate 27 | 28 | The tool handles all the technical details like following redirects, handling errors, and cleaning 29 | up messy HTML. You just point the AI at a URL, and it'll bring back the information in a format 30 | that's easy to work with. 31 | 32 | ## Processing Modes 33 | 34 | The AI can process web content in three different ways, each designed for specific needs: 35 | 36 | ### Markdown Mode (default) 37 | 38 | Most of the time, you'll want clean, readable content without the clutter of web formatting. 
 39 | Markdown mode automatically removes adverts, navigation menus, and other distractions, focusing on
 40 | the actual content you care about. It preserves important elements like headings, lists, tables,
 41 | and images, converting them into clean markdown that's easy to read.
 42 | 
 43 | If something goes wrong with the conversion, the tool automatically falls back to raw content,
 44 | letting the AI still help you even if the page is unusually formatted.
 45 | 
 46 | Example output:
 47 | 
 48 | ```markdown
 49 | Contents of https://example.com/article:
 50 | 
 51 | # Main Heading
 52 | 
 53 | Article content in clean markdown format...
 54 | 
 55 | ## Subheadings preserved
 56 | 
 57 | * Lists kept intact
 58 | * With proper formatting
 59 | 
 60 | 
 61 | 
 62 | | Tables | Converted |
 63 | |--------|-----------|
 64 | | To     | Markdown  |
 65 | ```
 66 | 
 67 | ### Links Mode
 68 | 
 69 | When you're exploring documentation or need to navigate through a website, links mode helps map
 70 | out the available paths. It finds all the links on a page, converts them to absolute URLs so they
 71 | always work, and shows you the text used to describe each link. This is particularly helpful when
 72 | you need to:
 73 | 
 74 | - Navigate through multi-page documentation
 75 | - Find related articles or resources
 76 | - Locate specific sections in large documents
 77 | - Build a map of available information
 78 | 
 79 | The AI orders links by relevance, filters out noise like social media buttons, and gives you a
 80 | clean list of where you can go next.
 81 | 
 82 | Example output:
 83 | 
 84 | ```markdown
 85 | All 45 links found on https://example.com
 86 | 
 87 | - Home: https://example.com/
 88 | - Products: https://example.com/products
 89 | - About Us: https://example.com/about
 90 | ...
 91 | ```
 92 | 
 93 | ### Raw Mode
 94 | 
 95 | Sometimes you need the original, unprocessed content - particularly when working with APIs,
 96 | downloading code, or accessing structured data. Raw mode gives you exactly what the server sends,
 97 | while still handling things like authentication, redirects, and error handling behind the scenes.
 98 | 
 99 | ## Features and Limits
100 | 
101 | The tool includes several features to make web access both powerful and safe:
102 | 
103 | ### Content Management
104 | 
105 | The AI can handle content of any size, but you can control how much it processes at once. Setting
106 | a length limit helps when you're working with large documents or want to focus on specific
107 | sections. You'll always get complete sentences and properly formatted content, with clear warnings
108 | if anything gets truncated.
109 | 
110 | If something goes wrong - whether it's a network issue, an authentication problem, or just an
111 | unusually formatted page - you'll get clear, actionable error messages explaining what happened
112 | and often suggesting how to fix it.
113 | 
114 | ### Safety Features
115 | 
116 | Behind the scenes, the tool uses industrial-strength libraries like `trafilatura` and
117 | `BeautifulSoup` to handle web content safely. It carefully processes URLs, headers, and content to
118 | prevent common issues, while giving you the flexibility to access the resources you need.
119 | 
120 | The tool strikes a careful balance - giving AI assistants broad access to web content while
121 | maintaining security and providing clear feedback. This means you can confidently point the AI at
122 | documentation or resources, knowing it'll handle the technical details and bring back exactly what
123 | you need.
124 | 
```
--------------------------------------------------------------------------------
/server/mcp_server/server.py:
--------------------------------------------------------------------------------
```python
  1 | """Core MCPServer implementation for the MCP fetch service.
  2 | 
  3 | Provides a generic MCPServer class for serving MCP requests.
Allows drop-in tool support by mapping 4 | tool functions to configuration loaded from an external YAML file. 5 | """ 6 | 7 | from __future__ import annotations 8 | 9 | from dataclasses import dataclass, field 10 | from os import getenv as os_getenv 11 | from pathlib import Path 12 | from typing import TYPE_CHECKING, Any 13 | 14 | from mcp.server import Server as BaseMCPServer 15 | from mcp.server.sse import SseServerTransport 16 | from mcp.server.stdio import stdio_server 17 | from mcp.shared.exceptions import McpError 18 | from mcp.types import INVALID_PARAMS, ErrorData, TextContent, Tool 19 | from starlette.applications import Starlette 20 | from starlette.routing import Mount, Route 21 | from uvicorn import Config as UvicornConfig, Server as UvicornServer 22 | 23 | if TYPE_CHECKING: 24 | from starlette.requests import Request 25 | from starlette.responses import Response 26 | 27 | # Default path for tool configuration YAML file 28 | DEFAULT_TOOL_CONFIG_PATH = Path(__file__).parent / "tools.yaml" 29 | 30 | 31 | @dataclass(slots=True) 32 | class MCPServer: 33 | """Define a generic MCP server class with drop-in tool support.""" 34 | 35 | config: dict[str, Any] 36 | server: BaseMCPServer = field(init=False) 37 | server_name: str = field(default="mcp-server") 38 | tools: list[Tool] = field(default_factory=list) 39 | 40 | def __post_init__(self) -> None: 41 | """Initialise the MCPServer.""" 42 | if self.config.get("server", {}).get("name"): 43 | self.server_name = self.config["server"]["name"] 44 | # Create MCP server instance 45 | self.server = BaseMCPServer(self.server_name) 46 | # Build the tool registry and tool list 47 | self.tools = [ 48 | Tool(name=name, **{k: v for k, v in tool.items() if k != "method"}) 49 | for name, tool in self.config["tools"].items() 50 | ] 51 | # Register the tool listing/calling methods 52 | self.server.list_tools()(self.list_tools) 53 | self.server.call_tool()(self.call_tool) 54 | 55 | async def list_tools(self) -> list[Tool]: 56 | 
"""Return a list of available tools. 57 | 58 | Returns: 59 | A list of Tool objects representing the available tools. 60 | """ 61 | return self.tools 62 | 63 | async def call_tool(self, name: str, arguments: dict) -> list[TextContent]: 64 | """Call the tool specified by name with provided arguments. 65 | 66 | Returns: 67 | A list of TextContent objects containing the tool's result 68 | 69 | Raises: 70 | McpError: If the tool is unknown or fails to execute 71 | """ 72 | if name not in self.config["tools"]: 73 | raise McpError( 74 | ErrorData( 75 | code=INVALID_PARAMS, message=f"Tool '{name}' isn't available on this server anymore" 76 | ) 77 | ) 78 | if "method" not in self.config["tools"][name]: 79 | raise McpError( 80 | ErrorData( 81 | code=INVALID_PARAMS, 82 | message=( 83 | f"Tool '{name}' has no registered method: inform the user that their MCP " 84 | "server requires configuration to provide a function for this tool." 85 | ), 86 | ) 87 | ) 88 | try: 89 | result = await self.config["tools"][name]["method"](**arguments) 90 | return [TextContent(type="text", text=result)] 91 | except McpError as err: 92 | raise McpError(ErrorData(code=INVALID_PARAMS, message=str(err))) from err 93 | 94 | async def serve(self) -> None: 95 | """Run the MCP server, using either SSE or stdio mode.""" 96 | options = self.server.create_initialization_options() 97 | sse_host, sse_port = os_getenv("SSE_HOST"), os_getenv("SSE_PORT") 98 | if sse_host and sse_port: 99 | sse = SseServerTransport("/messages/") 100 | 101 | async def _handle_sse(request: Request) -> Response | None: 102 | """Handle incoming SSE connection.""" 103 | async with sse.connect_sse( 104 | request.scope, 105 | request.receive, 106 | request._send, # noqa: SLF001 107 | ) as streams: 108 | await self.server.run(streams[0], streams[1], options, raise_exceptions=True) 109 | 110 | starlette_app = Starlette( 111 | debug=True, 112 | routes=[ 113 | Route("/sse", endpoint=_handle_sse), 114 | Mount("/messages/", 
app=sse.handle_post_message), 115 | ], 116 | ) 117 | 118 | config = UvicornConfig(app=starlette_app, host=sse_host, port=int(sse_port), log_level="info") 119 | server_instance = UvicornServer(config) 120 | await server_instance.serve() 121 | else: 122 | async with stdio_server() as (read_stream, write_stream): 123 | await self.server.run(read_stream, write_stream, options, raise_exceptions=True) 124 | ``` -------------------------------------------------------------------------------- /server/mcp_server/tools/web.py: -------------------------------------------------------------------------------- ```python 1 | """Provide tools to retrieve and process web content. 2 | 3 | Helps AI assistants access and understand web content through three processing modes: 4 | 5 | - markdown: Converts HTML to clean, readable markdown (default) 6 | - links: Extracts and formats hyperlinks with their anchor text 7 | - raw: Returns unprocessed content for APIs or non-HTML resources 8 | 9 | Features include: 10 | - Smart content extraction focusing on main text 11 | - Link processing with relative URL resolution 12 | - Configurable length limits 13 | - Detailed error messages for common issues 14 | """ 15 | 16 | from __future__ import annotations 17 | 18 | from collections import Counter 19 | from dataclasses import dataclass, field 20 | from enum import Enum 21 | from typing import Final 22 | from urllib.parse import urljoin 23 | 24 | from bs4 import BeautifulSoup, Tag 25 | from bs4.filter import SoupStrainer 26 | from mcp.shared.exceptions import McpError 27 | from mcp.types import INTERNAL_ERROR, ErrorData 28 | from trafilatura import extract as trafilatura_extract 29 | 30 | from .helpers import add_error, get_request 31 | 32 | 33 | class ProcessingMode(Enum): 34 | """Define valid content processing modes.""" 35 | 36 | MARKDOWN = "markdown" 37 | RAW = "raw" 38 | LINKS = "links" 39 | 40 | @classmethod 41 | def from_str(cls, mode: str) -> ProcessingMode: 42 | """Create ProcessingMode 
from string, defaulting to RAW if invalid. 43 | 44 | Args: 45 | mode: String representation of the processing mode 46 | 47 | Returns: 48 | ProcessingMode enum value 49 | """ 50 | try: 51 | return cls(mode.lower()) 52 | except ValueError: 53 | return cls.RAW 54 | 55 | 56 | SKIP_HREF_PREFIXES: Final = ("#", "javascript:") 57 | 58 | 59 | @dataclass(slots=True) 60 | class WebProcessor: 61 | """Handle web content retrieval and processing.""" 62 | 63 | url: str 64 | mode: ProcessingMode | str = field(default=ProcessingMode.MARKDOWN) 65 | max_length: int = field(default=0) 66 | 67 | def __post_init__(self) -> None: 68 | """Validate and correct inputs as needed.""" 69 | if isinstance(self.mode, str): 70 | self.mode = ProcessingMode.from_str(self.mode) 71 | self.max_length = max(self.max_length, 0) 72 | 73 | async def process(self) -> str: 74 | """Fetch and process the content according to the specified mode. 75 | 76 | Returns: 77 | Processed content as a string 78 | """ 79 | content = await get_request(self.url) 80 | 81 | match self.mode: 82 | case ProcessingMode.LINKS: 83 | return self._format_links(self._extract_links(content)) 84 | 85 | case ProcessingMode.MARKDOWN: 86 | extracted = trafilatura_extract( 87 | content, 88 | favor_recall=True, 89 | include_formatting=True, 90 | include_images=True, 91 | include_links=True, 92 | include_tables=True, 93 | output_format="markdown", 94 | with_metadata=True, 95 | ) or add_error(content, "Extraction to markdown failed; returning raw content", append=False) 96 | 97 | case ProcessingMode.RAW: 98 | extracted = content 99 | 100 | if self.max_length > 0 and len(extracted) > self.max_length: 101 | extracted = add_error( 102 | extracted[: self.max_length], 103 | f"Content truncated to {self.max_length} characters", 104 | append=True, 105 | ) 106 | 107 | return f"Contents of {self.url}:\n\n{extracted}" 108 | 109 | def _get_absolute_url(self, href: str) -> str | None: 110 | """Get the absolute URL from a relative or absolute href. 
111 | 112 | Returns: 113 | Absolute URL or None if invalid 114 | """ 115 | stripped = href.strip() 116 | if not stripped or any(stripped.startswith(prefix) for prefix in SKIP_HREF_PREFIXES): 117 | return None 118 | return stripped if stripped.startswith(("http://", "https://")) else urljoin(self.url, stripped) 119 | 120 | def _extract_links(self, content: str) -> dict[str, str]: 121 | """Extract all valid links with their anchor text. 122 | 123 | Returns: 124 | Dictionary mapping each unique absolute URL to its first-found anchor text 125 | """ 126 | soup = BeautifulSoup(content, "html.parser", parse_only=SoupStrainer("a", href=True)) 127 | 128 | anchors = [a for a in soup.find_all("a", href=True) if isinstance(a, Tag)] 129 | valid_anchors = [ 130 | (a, url) 131 | for a in anchors 132 | if (href := a.get("href")) and isinstance(href, str) and (url := self._get_absolute_url(href)) 133 | ] 134 | 135 | url_counts = Counter(url for _, url in valid_anchors) 136 | 137 | return dict( 138 | sorted( 139 | { 140 | url: next(a.get_text(strip=True) for a, anchor_url in valid_anchors if anchor_url == url) 141 | for url in url_counts 142 | }.items(), 143 | key=lambda x: (-url_counts[x[0]], x[0]), 144 | ) 145 | ) 146 | 147 | def _format_links(self, links: dict[str, str]) -> str: 148 | """Format extracted links into a readable string. 
149 | 150 | Args: 151 | links: Dictionary of URLs and their titles 152 | 153 | Returns: 154 | Formatted string of links 155 | 156 | Raises: 157 | McpError: If no links are found 158 | """ 159 | if not links: 160 | raise McpError( 161 | ErrorData( 162 | code=INTERNAL_ERROR, 163 | message=f"No links found on {self.url} - it may require JavaScript or auth.", 164 | ) 165 | ) 166 | 167 | total_links = len(links) 168 | formatted_links = [] 169 | length = 0 170 | 171 | for url, title in links.items(): 172 | link_text = f"- {title}: {url}" if title else f"- {url}" 173 | new_length = length + len(link_text) + 1 174 | 175 | if self.max_length > 0 and new_length > self.max_length: 176 | break 177 | 178 | formatted_links.append(link_text) 179 | length = new_length 180 | 181 | added_count = len(formatted_links) 182 | header = ( 183 | f"{added_count} of {total_links} links found on {self.url}" 184 | if added_count < total_links 185 | else f"All {total_links} links found on {self.url}" 186 | ) 187 | 188 | return f"{header}\n" + "\n".join(formatted_links) 189 | 190 | 191 | async def tool_web(url: str, mode: str = "markdown", max_length: int = 0) -> str: 192 | """Access and process web content from a given URL. 
193 | 194 | Returns: 195 | Processed content as a string 196 | """ 197 | processor = WebProcessor(url=url, mode=mode, max_length=max_length) 198 | return await processor.process() 199 | ``` -------------------------------------------------------------------------------- /server/tests/test_server.py: -------------------------------------------------------------------------------- ```python 1 | """Test the MCP server initialization and configuration.""" 2 | 3 | from __future__ import annotations 4 | 5 | from os import environ as os_environ 6 | from pathlib import Path 7 | from typing import TYPE_CHECKING 8 | 9 | import pytest 10 | import pytest_asyncio 11 | from yaml import dump as yaml_dump, safe_load as yaml_safe_load 12 | 13 | from mcp_server.server import MCPServer 14 | from mcp_server.tools import tool_search, tool_web 15 | 16 | if TYPE_CHECKING: 17 | from collections.abc import Generator 18 | 19 | # Constants for testing 20 | MAX_DESCRIPTION_LENGTH = 1024 21 | 22 | 23 | @pytest.fixture 24 | def mock_yaml_file(tmp_path: Path) -> Path: 25 | """Create a temporary tools.yaml file for testing. 
26 | 27 | Args: 28 | tmp_path: Pytest fixture providing temporary directory 29 | 30 | Returns: 31 | Path to the temporary YAML file 32 | """ 33 | yaml_content = { 34 | "tools": { 35 | "search": { 36 | "description": "Test Search tool", 37 | "inputSchema": {"type": "object", "properties": {"query": {"type": "string"}}}, 38 | }, 39 | "web": { 40 | "description": "Test Web tool", 41 | "inputSchema": {"type": "object", "properties": {"url": {"type": "string"}}}, 42 | }, 43 | } 44 | } 45 | 46 | yaml_path = tmp_path / "tools.yaml" 47 | yaml_path.write_text(yaml_dump(yaml_content), encoding="utf-8") 48 | return yaml_path 49 | 50 | 51 | @pytest.fixture 52 | def server_env() -> Generator[None]: 53 | """Set up server environment variables for testing.""" 54 | os_environ["SANDBOX"] = "127.0.0.1:8080" 55 | os_environ["SSE_HOST"] = "127.0.0.1" 56 | os_environ["SSE_PORT"] = "3001" 57 | os_environ["USER_AGENT"] = "TestAgent/1.0" 58 | yield 59 | for key in ["SANDBOX", "SSE_HOST", "SSE_PORT", "USER_AGENT"]: 60 | if key in os_environ: 61 | del os_environ[key] 62 | 63 | 64 | @pytest_asyncio.fixture 65 | async def server(mock_yaml_file: Path) -> MCPServer: 66 | """Create a test server instance. 
67 | 68 | Args: 69 | mock_yaml_file: Path to test YAML configuration 70 | 71 | Returns: 72 | Configured MCPServer instance 73 | """ 74 | config = yaml_safe_load(mock_yaml_file.read_text(encoding="utf-8")) 75 | config["tools"]["search"]["method"] = tool_search 76 | config["tools"]["web"]["method"] = tool_web 77 | return MCPServer(config) 78 | 79 | 80 | def test_yaml_loading(mock_yaml_file: Path) -> None: 81 | """Test that the YAML configuration can be loaded correctly.""" 82 | config = yaml_safe_load(mock_yaml_file.read_text(encoding="utf-8")) 83 | 84 | if "tools" not in config: 85 | pytest.fail("Missing 'tools' section in config") 86 | if "search" not in config["tools"]: 87 | pytest.fail("Missing 'search' tool in config") 88 | if "web" not in config["tools"]: 89 | pytest.fail("Missing 'web' tool in config") 90 | 91 | for tool_name in ("search", "web"): 92 | if "description" not in config["tools"][tool_name]: 93 | pytest.fail(f"Missing 'description' in {tool_name} tool config") 94 | 95 | description_length = len(config["tools"][tool_name]["description"]) 96 | if description_length > MAX_DESCRIPTION_LENGTH: 97 | pytest.fail( 98 | f"Description for tool '{tool_name}' is too long: " 99 | f"{description_length} characters (max {MAX_DESCRIPTION_LENGTH})" 100 | ) 101 | 102 | 103 | def test_server_initialisation(server: MCPServer) -> None: 104 | """Test that the server initializes with the correct tools.""" 105 | if not hasattr(server, "tools"): 106 | pytest.fail("Server missing tools attribute") 107 | tool_names = {tool.name for tool in server.tools} 108 | if "search" not in tool_names: 109 | pytest.fail("Server missing search tool") 110 | if "web" not in tool_names: 111 | pytest.fail("Server missing web tool") 112 | 113 | search_tool_config = server.config["tools"]["search"] 114 | web_tool_config = server.config["tools"]["web"] 115 | 116 | if search_tool_config.get("method") != tool_search: 117 | pytest.fail("Search tool has incorrect method") 118 | if 
web_tool_config.get("method") != tool_web: 119 | pytest.fail("Web tool has incorrect method") 120 | 121 | 122 | @pytest.mark.asyncio 123 | @pytest.mark.usefixtures("server_env") 124 | async def test_server_environment() -> None: 125 | """Test that environment variables are correctly set.""" 126 | if os_environ["SANDBOX"] != "127.0.0.1:8080": 127 | pytest.fail(f"Incorrect SANDBOX: {os_environ['SANDBOX']}") 128 | if os_environ["SSE_HOST"] != "127.0.0.1": 129 | pytest.fail(f"Incorrect SSE_HOST: {os_environ['SSE_HOST']}") 130 | if os_environ["SSE_PORT"] != "3001": 131 | pytest.fail(f"Incorrect SSE_PORT: {os_environ['SSE_PORT']}") 132 | if os_environ["USER_AGENT"] != "TestAgent/1.0": 133 | pytest.fail(f"Incorrect USER_AGENT: {os_environ['USER_AGENT']}") 134 | 135 | 136 | def test_live_tools_yaml_file() -> None: 137 | """Test that the live tools.yaml file is readable and contains required keys.""" 138 | # Determine the project root (assumed one level above the tests directory) 139 | project_root = Path(__file__).parent.parent 140 | tools_yaml_path = project_root / "tools.yaml" 141 | if not tools_yaml_path.exists(): 142 | pytest.fail(f"tools.yaml file not found at {tools_yaml_path}") 143 | 144 | config = yaml_safe_load(tools_yaml_path.read_text(encoding="utf-8")) 145 | 146 | if "tools" not in config: 147 | pytest.fail("Missing 'tools' section in live tools.yaml") 148 | 149 | for tool in ("search", "web"): 150 | if tool not in config["tools"]: 151 | pytest.fail(f"Missing '{tool}' configuration in live tools.yaml") 152 | if "inputSchema" not in config["tools"][tool]: 153 | pytest.fail(f"Missing 'inputSchema' for tool '{tool}' in live tools.yaml") 154 | 155 | 156 | def test_tool_description_length() -> None: 157 | """Test that tool descriptions don't exceed the OpenAI API limit of 1024 characters.""" 158 | # Determine the project root (assumed one level above the tests directory) 159 | project_root = Path(__file__).parent.parent 160 | tools_yaml_path = project_root / 
"tools.yaml" 161 | if not tools_yaml_path.exists(): 162 | pytest.fail(f"tools.yaml file not found at {tools_yaml_path}") 163 | 164 | config = yaml_safe_load(tools_yaml_path.read_text(encoding="utf-8")) 165 | 166 | if "tools" not in config: 167 | pytest.fail("Missing 'tools' section in tools.yaml") 168 | 169 | for tool_name, tool_config in config["tools"].items(): 170 | if "description" not in tool_config: 171 | pytest.fail(f"Missing 'description' for tool '{tool_name}' in tools.yaml") 172 | 173 | description_length = len(tool_config["description"]) 174 | if description_length > MAX_DESCRIPTION_LENGTH: 175 | pytest.fail( 176 | f"Description for tool '{tool_name}' is too long: " 177 | f"{description_length} characters (max {MAX_DESCRIPTION_LENGTH})" 178 | ) 179 | ```