# Directory Structure
```
├── Dockerfile
├── README.md
├── requirements.txt
└── src
    └── server.py
```
# Files
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
1 | # PDF Reader MCP Server
2 |
3 | A Model Context Protocol (MCP) server that provides tools for reading and extracting text from PDF files, supporting both local files and URLs.
4 |
5 | ## Author
6 |
7 | Philip Van de Walker
8 | Email: [email protected]
9 | GitHub: https://github.com/trafflux
10 |
11 | ## Features
12 |
13 | - Read text content from local PDF files
14 | - Read text content from PDF URLs
15 | - Error handling for corrupt or invalid PDFs
16 | - Volume mounting for accessing local PDFs
17 | - Auto-detection of PDF encoding
18 | - Standardized JSON output format
19 |
20 | ## Installation
21 |
22 | 1. Clone the repository:
23 |
24 | ```bash
25 | git clone https://github.com/trafflux/pdf-reader-mcp.git
26 | cd pdf-reader-mcp
27 | ```
28 |
29 | 2. Build the Docker image:
30 |
31 | ```bash
32 | docker build -t mcp/pdf-reader .
33 | ```
34 |
35 | ## Usage
36 |
37 | ### Running the Server
38 |
39 | To run the server with access to local PDF files:
40 |
41 | ```bash
42 | docker run -i --rm -v /path/to/pdfs:/pdfs mcp/pdf-reader
43 | ```
44 |
45 | Replace `/path/to/pdfs` with the actual path to your PDF files directory.
46 |
47 | If not using local PDF files:
48 |
49 | ```bash
50 | docker run -i --rm mcp/pdf-reader
51 | ```
52 |
53 | ### MCP Configuration
54 |
55 | Add to your MCP settings configuration:
56 |
57 | ```json
58 | {
59 |   "mcpServers": {
60 |     "pdf-reader": {
61 |       "command": "docker",
62 |       "args": [
63 |         "run",
64 |         "-i",
65 |         "--rm",
66 |         "-v",
67 |         "/path/to/pdfs:/pdfs",
68 |         "mcp/pdf-reader"
69 |       ],
70 |       "disabled": false,
71 |       "autoApprove": []
72 |     }
73 |   }
74 | }
75 | ```
76 |
77 | Without local PDF files:
78 |
79 | ```json
80 | {
81 |   "mcpServers": {
82 |     "pdf-reader": {
83 |       "command": "docker",
84 |       "args": ["run", "-i", "--rm", "mcp/pdf-reader"],
85 |       "disabled": false,
86 |       "autoApprove": []
87 |     }
88 |   }
89 | }
90 | ```
91 |
92 | ### Available Tools
93 |
94 | 1. `read_local_pdf`
95 |
96 |    - Purpose: Read text content from a local PDF file
97 |    - Input:
98 |      ```json
99 |      {
100 |        "path": "/pdfs/document.pdf"
101 |      }
102 |      ```
103 |    - Output:
104 |      ```json
105 |      {
106 |        "success": true,
107 |        "data": {
108 |          "text": "Extracted content..."
109 |        }
110 |      }
111 |      ```
112 |
113 | 2. `read_pdf_url`
114 |    - Purpose: Read text content from a PDF URL
115 |    - Input:
116 |      ```json
117 |      {
118 |        "url": "https://example.com/document.pdf"
119 |      }
120 |      ```
121 |    - Output:
122 |      ```json
123 |      {
124 |        "success": true,
125 |        "data": {
126 |          "text": "Extracted content..."
127 |        }
128 |      }
129 |      ```
130 |
131 | ## Error Handling
132 |
133 | The server handles various error cases with clear error messages:
134 |
135 | - Invalid or corrupt PDF files
136 | - Missing files
137 | - Failed URL requests
138 | - Permission issues
139 | - Network connectivity problems
140 |
141 | Error responses follow the format:
142 |
143 | ```json
144 | {
145 | "success": false,
146 | "error": "Detailed error message"
147 | }
148 | ```
149 |
150 | ## Dependencies
151 |
152 | - Python 3.11+
153 | - PyPDF2: PDF parsing and text extraction
154 | - requests: HTTP client for fetching PDFs from URLs
155 | - MCP SDK: Model Context Protocol implementation
156 |
157 | ## Project Structure
158 |
159 | ```
160 | .
161 | ├── Dockerfile          # Container configuration
162 | ├── README.md           # This documentation
163 | ├── requirements.txt    # Python dependencies
164 | └── src/
165 |     ├── __init__.py     # Package initialization
166 |     └── server.py       # Main server implementation
167 | ```
168 |
169 | ## License
170 |
171 | Copyright 2025 Philip Van de Walker
172 |
173 | Licensed under the Apache License, Version 2.0 (the "License");
174 | you may not use this file except in compliance with the License.
175 | You may obtain a copy of the License at
176 |
177 | http://www.apache.org/licenses/LICENSE-2.0
178 |
179 | Unless required by applicable law or agreed to in writing, software
180 | distributed under the License is distributed on an "AS IS" BASIS,
181 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
182 | See the License for the specific language governing permissions and
183 | limitations under the License.
184 |
185 | ## Contributing
186 |
187 | Contributions are welcome! Please feel free to submit a Pull Request.
188 |
189 | ## Contact
190 |
191 | For questions, issues, or contributions, please contact Philip Van de Walker:
192 |
193 | - Email: [email protected]
194 | - GitHub: https://github.com/trafflux
195 |
```
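The two MCP configuration variants in the README differ only in the `-v` volume arguments. As a rough sketch, the entry could be generated programmatically as shown below. The helper name `build_pdf_reader_config` and its `host_pdf_dir` parameter are hypothetical and not part of the repository; the image name, the `/pdfs` mount point, and the JSON keys are taken from the README.

```python
import json
from typing import Optional


def build_pdf_reader_config(host_pdf_dir: Optional[str] = None) -> dict:
    """Build the "mcpServers" entry described in the README.

    host_pdf_dir is a hypothetical parameter: when given, the directory is
    mounted at /pdfs inside the container; when omitted, no volume is mounted
    and only the URL tool is useful.
    """
    args = ["run", "-i", "--rm"]
    if host_pdf_dir is not None:
        # Mount the host directory at the path the container expects.
        args += ["-v", f"{host_pdf_dir}:/pdfs"]
    args.append("mcp/pdf-reader")

    return {
        "mcpServers": {
            "pdf-reader": {
                "command": "docker",
                "args": args,
                "disabled": False,
                "autoApprove": [],
            }
        }
    }


if __name__ == "__main__":
    # Print the variant with a local PDF directory mounted.
    print(json.dumps(build_pdf_reader_config("/path/to/pdfs"), indent=2))
```

Calling `build_pdf_reader_config()` with no argument yields the second variant, without the volume mount.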
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
```
PyPDF2>=3.0.0
requests>=2.31.0
```
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
```dockerfile
# PDF Reader MCP Server Dockerfile
# Author: Philip Van de Walker
# Email: [email protected]
# Repo: https://github.com/trafflux/pdf-reader-mcp
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Use Python 3.11 slim image as base
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install git for MCP SDK installation
RUN apt-get update && \
    apt-get install -y git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Install MCP SDK directly from GitHub repository
RUN pip install git+https://github.com/modelcontextprotocol/python-sdk.git

# Install project Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy source code into container
COPY src/ .

# Command to run the server
# The container expects a volume mount at /pdfs for accessing local PDF files
ENTRYPOINT ["python", "server.py"]
```
--------------------------------------------------------------------------------
/src/server.py:
--------------------------------------------------------------------------------
```python
"""
PDF Reader MCP Server
---------------------

A Model Context Protocol (MCP) server that provides tools for reading and extracting text from PDF files.
Supports both local files and URLs, with comprehensive error handling and standardized output format.

Author: Philip Van de Walker
Email: [email protected]
Repo: https://github.com/trafflux/pdf-reader-mcp

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

This module implements an MCP server with two main tools:
- read_local_pdf: Extracts text from local PDF files
- read_pdf_url: Extracts text from PDFs accessed via URLs

The server uses FastMCP for simplified tool registration and standardized error handling.
All text extraction is done using PyPDF2 with proper error handling for various edge cases.
"""

import io
import logging
from typing import Any, Dict

import PyPDF2
import requests
from mcp.server.fastmcp import FastMCP


def get_logger(name: str) -> logging.Logger:
    """Return the logger for the given module name."""
    return logging.getLogger(name)


logger = get_logger(__name__)

# Create server instance using FastMCP
mcp = FastMCP("pdf-reader")


def extract_text_from_pdf(pdf_file) -> str:
    """Extract text content from a PDF file (a binary file-like object)."""
    try:
        reader = PyPDF2.PdfReader(pdf_file)
        # Guard against pages that yield no text (e.g. image-only pages).
        pages = [page.extract_text() or "" for page in reader.pages]
        return "\n".join(pages).strip()
    except Exception as e:
        logger.error(f"Failed to extract text from PDF: {str(e)}")
        raise ValueError(f"Failed to extract text from PDF: {str(e)}") from e


@mcp.tool()
async def read_local_pdf(path: str) -> Dict[str, Any]:
    """Read text content from a local PDF file."""
    try:
        with open(path, 'rb') as file:
            text = extract_text_from_pdf(file)
        return {
            "success": True,
            "data": {
                "text": text
            }
        }
    except FileNotFoundError:
        logger.error(f"PDF file not found: {path}")
        return {
            "success": False,
            "error": f"PDF file not found: {path}"
        }
    except Exception as e:
        # Covers permission errors and extraction failures.
        logger.error(str(e))
        return {
            "success": False,
            "error": str(e)
        }


@mcp.tool()
async def read_pdf_url(url: str) -> Dict[str, Any]:
    """Read text content from a PDF URL."""
    try:
        # Without a timeout, a stalled remote host would block the tool
        # indefinitely. The 30-second value is an arbitrary choice.
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        pdf_file = io.BytesIO(response.content)
        text = extract_text_from_pdf(pdf_file)
        return {
            "success": True,
            "data": {
                "text": text
            }
        }
    except requests.RequestException as e:
        logger.error(f"Failed to fetch PDF from URL: {str(e)}")
        return {
            "success": False,
            "error": f"Failed to fetch PDF from URL: {str(e)}"
        }
    except Exception as e:
        logger.error(str(e))
        return {
            "success": False,
            "error": str(e)
        }


def main() -> None:
    """Run the MCP server."""
    try:
        mcp.run()
    except Exception as e:
        logger.error(f"Error starting server: {str(e)}")
        raise


if __name__ == "__main__":
    main()
```
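Both tools return the same envelope: `success` plus either `data.text` or `error`. A caller might unwrap that envelope along the lines of the sketch below. The names `PdfToolError` and `unwrap_pdf_result` are hypothetical and do not exist in the repository, and the sketch assumes the caller has already decoded the tool result into a Python dict.

```python
from typing import Any, Dict


class PdfToolError(RuntimeError):
    """Hypothetical exception raised when a tool reports success == False."""


def unwrap_pdf_result(result: Dict[str, Any]) -> str:
    """Return the extracted text, or raise PdfToolError with the tool's message."""
    if result.get("success"):
        return result["data"]["text"]
    # Error responses carry a human-readable message under "error".
    raise PdfToolError(result.get("error", "Unknown error"))


if __name__ == "__main__":
    ok = {"success": True, "data": {"text": "Extracted content..."}}
    print(unwrap_pdf_result(ok))

    failed = {"success": False, "error": "PDF file not found: /pdfs/missing.pdf"}
    try:
        unwrap_pdf_result(failed)
    except PdfToolError as exc:
        print(f"Tool failed: {exc}")
```

Raising on failure keeps the calling code free of repeated `success` checks; a caller that prefers to branch explicitly can inspect the dict directly instead.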