# Directory Structure
```
├── .gitignore
├── assets
│ └── deep-research-mcp-logo.png
├── CHANGELOG.md
├── Dockerfile
├── example_config.json
├── LICENSE
├── OVERVIEW.md
├── package-lock.json
├── package.json
├── README.md
├── smithery.yaml
├── src
│ └── index.ts
├── test-file-writing.js
├── test-output
│ └── test-summary.md
└── tsconfig.json
```
# Files
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
1 | # Dependencies
2 | node_modules/
3 | npm-debug.log
4 | yarn-debug.log
5 | yarn-error.log
6 |
7 | # Build outputs
8 | dist/
9 |
10 | # Environment variables
11 | .env
12 | .env.local
13 | .env.development.local
14 | .env.test.local
15 | .env.production.local
16 |
17 | # Configuration
18 | config.json
19 |
20 | # OS specific files
21 | .DS_Store
22 | Thumbs.db
23 |
24 | # Editor specific files
25 | .idea/
26 | .vs/
27 | .vscode/
28 | *.swp
29 | *.swo
30 | Tavily API Reference.md
31 | Tavily Javascript SDK.md
32 |
33 | # Other
34 | reference-mcptools/
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
1 | <p align="center">
2 | <img src="assets/deep-research-mcp-logo.png" alt="Deep Research MCP Logo" width="250" height="250">
3 | </p>
4 |
5 | <h1 align="center">Deep Research MCP Server</h1>
6 |
7 | <p align="center">
8 | <a href="https://www.npmjs.com/package/@pinkpixel/deep-research-mcp"><img src="https://img.shields.io/npm/v/@pinkpixel/deep-research-mcp.svg" alt="NPM Version"></a>
9 | <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a>
10 | <a href="https://smithery.ai/server/@pinkpixel/dev-deep-research-mcp"><img src="https://smithery.ai/badge/@pinkpixel/dev-deep-research-mcp" alt="Smithery Installs"></a>
11 | </p>
12 |
13 | The Deep Research MCP Server is a Model Context Protocol (MCP) compliant server designed to perform comprehensive web research. It leverages Tavily's powerful Search and new Crawl APIs to gather extensive, up-to-date information on a given topic. The server then aggregates this data along with documentation generation instructions into a structured JSON output, perfectly tailored for Large Language Models (LLMs) to create detailed and high-quality markdown documents.
14 |
15 | ## Features
16 |
17 | * **Multi-Step Research:** Combines Tavily's AI-powered web search with deep content crawling for thorough information gathering.
18 | * **Structured JSON Output:** Provides well-organized data (original query, search summary, detailed findings per source, and documentation instructions) optimized for LLM consumption.
19 | * **Configurable Documentation Prompt:** Includes a comprehensive default prompt for generating high-quality technical documentation. This prompt can be:
20 | * Overridden by setting the `DOCUMENTATION_PROMPT` environment variable.
21 | * Further overridden by passing a `documentation_prompt` argument directly to the tool.
22 | * **Configurable Output Path:** Specify where research documents and images should be saved through:
23 | * Environment variable configuration
24 | * JSON configuration
25 | * Direct parameter in tool calls
26 | * **Granular Control:** Offers a wide range of parameters to fine-tune both the search and crawl processes.
27 | * **MCP Compliant:** Designed to integrate seamlessly into MCP-based AI agent ecosystems.
28 |
29 | ## Prerequisites
30 |
31 | * [Node.js](https://nodejs.org/) (version 18.x or later recommended)
32 | * [npm](https://www.npmjs.com/) (comes with Node.js) or [Yarn](https://yarnpkg.com/)
33 |
34 | ## Installation
35 |
36 | ### Installing via Smithery
37 |
38 | To install deep-research-mcp for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@pinkpixel/dev-deep-research-mcp):
39 |
40 | ```bash
41 | npx -y @smithery/cli install @pinkpixel/dev-deep-research-mcp --client claude
42 | ```
43 |
44 | ### Option 1: Using with NPX (Recommended for quick use)
45 |
46 | You can run the server directly using `npx` without a global installation:
47 |
48 | ```bash
49 | npx @pinkpixel/deep-research-mcp
50 | ```
51 |
52 | ### Option 2: Global Installation (Optional)
53 |
54 | ```bash
55 | npm install -g @pinkpixel/deep-research-mcp
56 | ```
57 |
58 | Then you can run it using:
59 |
60 | ```bash
61 | deep-research-mcp
62 | ```
63 |
64 | ### Option 3: Local Project Integration or Development
65 |
66 | 1. Clone the repository (if you want to modify or contribute):
67 | ```bash
68 | git clone https://github.com/your-username/deep-research-mcp.git
69 | cd deep-research-mcp
70 | ```
71 | 2. Install dependencies:
72 | ```bash
73 | npm install
74 | ```
75 |
76 | ## Configuration
77 |
78 | The server requires a Tavily API key and can optionally accept a custom documentation prompt.
79 |
80 | ```json
81 | {
82 | "mcpServers": {
83 | "deep-research": {
84 | "command": "npx",
85 | "args": [
86 | "-y",
87 | "@pinkpixel/deep-research-mcp"
88 | ],
89 | "env": {
90 | "TAVILY_API_KEY": "tvly-YOUR_ACTUAL_API_KEY_HERE", // Required
91 | "DOCUMENTATION_PROMPT": "Your custom, detailed instructions for the LLM on how to generate markdown documents from the research data...", // Optional - if not provided, the default prompt will be used
92 | "SEARCH_TIMEOUT": "120", // Optional - timeout in seconds for search requests (default: 60)
93 | "CRAWL_TIMEOUT": "300", // Optional - timeout in seconds for crawl requests (default: 180)
94 | "MAX_SEARCH_RESULTS": "10", // Optional - maximum search results to retrieve (default: 7)
95 | "CRAWL_MAX_DEPTH": "2", // Optional - maximum crawl depth (default: 1)
96 | "CRAWL_LIMIT": "15", // Optional - maximum URLs to crawl per source (default: 10)
97 | "FILE_WRITE_ENABLED": "true", // Optional - enable file writing capability (default: false)
98 | "ALLOWED_WRITE_PATHS": "/home/user/research,/home/user/documents", // Optional - comma-separated allowed directories (default: user home directory)
99 | "FILE_WRITE_LINE_LIMIT": "300" // Optional - maximum lines per file write operation (default: 200)
100 | }
101 | }
102 | }
103 | }
104 | ```
105 |
106 | ### 1\. Tavily API Key (Required)
107 |
108 | Set the `TAVILY_API_KEY` environment variable to your Tavily API key.
109 |
110 | **Methods:**
111 |
112 | * **`.env` file:** Create a `.env` file in the project root (if running locally for development):
113 | ```env
114 | TAVILY_API_KEY="tvly-YOUR_ACTUAL_API_KEY"
115 | ```
116 | * **Directly in command line:**
117 | ```bash
118 | TAVILY_API_KEY="tvly-YOUR_ACTUAL_API_KEY" npx @pinkpixel/deep-research-mcp
119 | ```
120 | * **System Environment Variable:** Set it in your operating system's environment variables.
121 |
122 | ### 2\. Custom Documentation Prompt (Optional)
123 |
124 | You can override the default comprehensive documentation prompt by setting the `DOCUMENTATION_PROMPT` environment variable.
125 |
126 | **Methods (in order of precedence):**
127 |
128 | 1. **Tool Argument:** The `documentation_prompt` parameter passed when calling the `deep-research-tool` takes highest precedence
129 | 2. **Environment Variable:** If no parameter is provided in the tool call, the system checks for a `DOCUMENTATION_PROMPT` environment variable
130 | 3. **Default Value:** If neither of the above is set, the comprehensive built-in default prompt is used
131 |
132 | **Setting via `.env` file:**
133 |
134 | ```env
135 | DOCUMENTATION_PROMPT="Your custom, detailed instructions for the LLM on how to generate markdown..."
136 | ```
137 |
138 | **Or directly in command line:**
139 |
140 | ```bash
141 | DOCUMENTATION_PROMPT="Your custom prompt..." TAVILY_API_KEY="tvly-YOUR_KEY" npx @pinkpixel/deep-research-mcp
142 | ```
143 |
144 | ### 3\. Output Path Configuration (Optional)
145 |
146 | You can specify where research documents and images should be saved. If not configured, a default path in the user's Documents folder with a timestamp will be used.
147 |
148 | **Methods (in order of precedence):**
149 |
150 | 1. **Tool Argument:** The `output_path` parameter passed when calling the `deep-research-tool` takes highest precedence
151 | 2. **Environment Variable:** If no parameter is provided in the tool call, the system checks for a `RESEARCH_OUTPUT_PATH` environment variable
152 | 3. **Default Path:** If neither of the above is set, a timestamped subfolder in the user's Documents folder is used: `~/Documents/research/YYYY-MM-DDTHH-MM-SS/`
153 |
154 | **Setting via `.env` file:**
155 |
156 | ```env
157 | RESEARCH_OUTPUT_PATH="/path/to/your/research/folder"
158 | ```
159 |
160 | **Or directly in command line:**
161 |
162 | ```bash
163 | RESEARCH_OUTPUT_PATH="/path/to/your/research/folder" TAVILY_API_KEY="tvly-YOUR_KEY" npx @pinkpixel/deep-research-mcp
164 | ```
165 |
166 | ### 4\. Timeout and Performance Configuration (Optional)
167 |
168 | You can configure timeout and performance settings via environment variables to optimize the tool for your specific use case or deployment environment:
169 |
170 | **Available Environment Variables:**
171 |
172 | - `SEARCH_TIMEOUT` - Timeout in seconds for Tavily search requests (default: 60)
173 | - `CRAWL_TIMEOUT` - Timeout in seconds for Tavily crawl requests (default: 180)
174 | - `MAX_SEARCH_RESULTS` - Maximum number of search results to retrieve (default: 7)
175 | - `CRAWL_MAX_DEPTH` - Maximum crawl depth from base URL (default: 1)
176 | - `CRAWL_LIMIT` - Maximum number of URLs to crawl per source (default: 10)
177 |
178 | **Setting via `.env` file:**
179 |
180 | ```env
181 | SEARCH_TIMEOUT=120
182 | CRAWL_TIMEOUT=300
183 | MAX_SEARCH_RESULTS=10
184 | CRAWL_MAX_DEPTH=2
185 | CRAWL_LIMIT=15
186 | ```
187 |
188 | **Or directly in command line:**
189 |
190 | ```bash
191 | SEARCH_TIMEOUT=120 CRAWL_TIMEOUT=300 TAVILY_API_KEY="tvly-YOUR_KEY" npx @pinkpixel/deep-research-mcp
192 | ```
193 |
194 | **When to adjust these settings:**
195 |
196 | - **Increase timeouts** if you're experiencing timeout errors in LibreChat or other MCP clients
197 | - **Decrease timeouts** for faster responses when working with simpler queries
198 | - **Increase limits** for more comprehensive research (but expect longer processing times)
199 | - **Decrease limits** for faster processing with lighter resource usage
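
Internally, the server reads these variables once at startup and falls back to the documented defaults when they are unset, along the lines of this sketch (see `src/index.ts` for the actual parsing):

```typescript
// Illustrative sketch: environment values are parsed as integers,
// with the documented defaults used when a variable is not set.
const searchTimeout = process.env.SEARCH_TIMEOUT
  ? parseInt(process.env.SEARCH_TIMEOUT, 10)
  : 60; // seconds
const crawlTimeout = process.env.CRAWL_TIMEOUT
  ? parseInt(process.env.CRAWL_TIMEOUT, 10)
  : 180; // seconds
const maxSearchResults = process.env.MAX_SEARCH_RESULTS
  ? parseInt(process.env.MAX_SEARCH_RESULTS, 10)
  : 7;
```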
200 |
201 | ### 5\. File Writing Configuration (Optional)
202 |
203 | The server includes a secure file writing tool that allows LLMs to save research findings directly to files. This feature is **disabled by default** for security reasons.
204 |
205 | **Security Features:**
206 | - File writing must be explicitly enabled via `FILE_WRITE_ENABLED=true`
207 | - Directory restrictions via `ALLOWED_WRITE_PATHS` (defaults to user home directory)
208 | - Line limits per write operation to prevent abuse
209 | - Path validation and sanitization
210 | - Automatic directory creation
211 |
212 | **Configuration:**
213 |
214 | ```env
215 | FILE_WRITE_ENABLED=true
216 | ALLOWED_WRITE_PATHS=/home/user/research,/home/user/documents,/tmp/research
217 | FILE_WRITE_LINE_LIMIT=500
218 | ```
219 |
220 | **Usage Example:**
221 | Once enabled, LLMs can use the `write-research-file` tool to save content:
222 |
223 | ```json
224 | {
225 | "tool": "write-research-file",
226 | "arguments": {
227 | "file_path": "/home/user/research/quantum-computing-report.md",
228 | "content": "# Quantum Computing Research Report\n\n...",
229 | "mode": "rewrite"
230 | }
231 | }
232 | ```
233 |
234 | **Security Considerations:**
235 | - Only enable file writing in trusted environments
236 | - Use specific directory restrictions rather than allowing system-wide access
237 | - Monitor file operations through server logs
238 | - Consider using read-only directories for sensitive systems
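
The directory restriction behaves roughly like the sketch below. Function and variable names are illustrative; the actual validation in `src/index.ts` may differ in detail.

```typescript
import os from "os";
import path from "path";

// Illustrative check: a write target must resolve into one of the allowed directories.
function isWriteTargetAllowed(targetPath: string): boolean {
  const allowedDirs = (process.env.ALLOWED_WRITE_PATHS ?? os.homedir())
    .split(",")
    .map((dir) => path.resolve(dir.trim()));
  const resolved = path.resolve(targetPath);
  return allowedDirs.some((dir) => resolved === dir || resolved.startsWith(dir + path.sep));
}
```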
239 |
240 | ## Running the Server
241 |
242 | * **Development (with auto-reload):**
243 | If you've cloned the repository and are in the project directory:
244 |
245 | ```bash
246 | npm run dev
247 | ```
248 |
249 | This uses `nodemon` and `ts-node` to watch for changes and restart the server.
250 | * **Production/Standalone:**
251 | First, build the TypeScript code:
252 |
253 | ```bash
254 | npm run build
255 | ```
256 |
257 | Then, start the server:
258 |
259 | ```bash
260 | npm start
261 | ```
262 | * **With NPX or Global Install:**
263 | (Ensure environment variables are set as described in Configuration)
264 |
265 | ```bash
266 | npx @pinkpixel/deep-research-mcp
267 | ```
268 |
269 | or if globally installed:
270 |
271 | ```bash
272 | deep-research-mcp
273 | ```
274 |
275 | The server will listen for MCP requests on stdio.
276 |
277 | ## How It Works
278 |
279 | 1. An LLM or AI agent makes a `CallToolRequest` to this MCP server, specifying the `deep-research-tool` and providing a query and other optional parameters.
280 | 2. The `deep-research-tool` first performs a Tavily Search to find relevant web sources.
281 | 3. It then uses Tavily Crawl to extract detailed content from each of these sources.
282 | 4. All gathered information (search snippets, crawled content, image URLs) is aggregated.
283 | 5. The chosen documentation prompt (default, ENV, or tool argument) is included.
284 | 6. The server returns a single JSON string containing all this structured data.
285 | 7. The calling LLM/agent uses this JSON output, guided by the `documentation_instructions`, to generate a comprehensive markdown document.
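
Programmatically, an MCP client can drive this flow over stdio. The sketch below is illustrative only: it assumes the stdio client API of the `@modelcontextprotocol/sdk` package and a hypothetical agent name; adapt it to your client framework.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server over stdio and call the deep-research-tool (illustrative sketch).
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@pinkpixel/deep-research-mcp"],
  env: { TAVILY_API_KEY: "tvly-YOUR_ACTUAL_API_KEY" },
});

const client = new Client({ name: "example-agent", version: "1.0.0" }, { capabilities: {} });
await client.connect(transport);

const result = await client.callTool({
  name: "deep-research-tool",
  arguments: { query: "What are the latest advancements in quantum-resistant cryptography?" },
});
```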
286 |
287 | ## Using the `deep-research-tool`
288 |
289 | This is the primary tool exposed by the server.
290 |
291 | ### Output Structure
292 |
293 | The tool returns a JSON string with the following structure:
294 |
295 | ```json
296 | {
297 | "documentation_instructions": "string", // The detailed prompt for the LLM to generate the markdown.
298 | "original_query": "string", // The initial query provided to the tool.
299 | "search_summary": "string | null", // An LLM-generated answer/summary from Tavily's search phase (if include_answer was true).
300 | "research_data": [ // Array of findings, one element per source.
301 | {
302 | "search_rank": "number",
303 | "original_url": "string", // URL of the source found by search.
304 | "title": "string", // Title of the web page.
305 | "initial_content_snippet": "string",// Content snippet from the initial search result.
306 | "search_score": "number | undefined",// Relevance score from Tavily search.
307 | "published_date": "string | undefined",// Publication date (if 'news' topic and available).
308 | "crawled_data": [ // Array of pages crawled starting from original_url.
309 | {
310 | "url": "string", // URL of the specific page crawled.
311 | "raw_content": "string | null", // Rich, extracted content from this page.
312 | "images": ["string", "..."] // Array of image URLs found on this page.
313 | }
314 | ],
315 | "crawl_errors": ["string", "..."] // Array of error messages if crawling this source failed or had issues.
316 | }
317 | // ... more sources
318 | ],
319 | "output_path": "string" // Path where research documents and images should be saved.
320 | }
321 | ```
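
For TypeScript-based agents, the same shape can be described with an interface. This is an illustrative sketch derived from the JSON above; it is not a type exported by the package.

```typescript
// Illustrative only — mirrors the JSON structure documented above.
interface CrawledPage {
  url: string;
  raw_content: string | null;
  images: string[];
}

interface ResearchSource {
  search_rank: number;
  original_url: string;
  title: string;
  initial_content_snippet: string;
  search_score?: number;
  published_date?: string;
  crawled_data: CrawledPage[];
  crawl_errors: string[];
}

interface DeepResearchResult {
  documentation_instructions: string;
  original_query: string;
  search_summary: string | null;
  research_data: ResearchSource[];
  output_path: string;
}
```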
322 |
323 | ### Input Parameters
324 |
325 | The `deep-research-tool` accepts the following parameters in its `arguments` object:
326 |
327 | #### General Parameters
328 |
329 | * `query` (string, **required**): The main research topic or question.
330 | * `documentation_prompt` (string, optional): Custom prompt for LLM documentation generation.
331 | * *Description:* If provided, this prompt will be used by the LLM. It overrides both the `DOCUMENTATION_PROMPT` environment variable and the server's built-in default prompt. If not provided here, the server checks the environment variable, then falls back to the default.
332 | * `output_path` (string, optional): Path where generated research documents and images should be saved.
333 | * *Description:* If provided, this path will be used for saving research outputs. It overrides the `RESEARCH_OUTPUT_PATH` environment variable. If neither is set, a timestamped folder in the user's Documents directory will be used.
334 |
335 | #### Search Parameters (for Tavily Search API)
336 |
337 | * `search_depth` (string, optional, default: `"advanced"`): Depth of the initial Tavily search.
338 | * *Options:* `"basic"`, `"advanced"`. Advanced search is tailored for more relevant sources.
339 | * `topic` (string, optional, default: `"general"`): Category for the Tavily search.
340 | * *Options:* `"general"`, `"news"`.
341 | * `days` (number, optional): For `topic: "news"`, the number of days back from the current date to include search results.
342 | * `time_range` (string, optional): Time range for search results (e.g., `"d"` for day, `"w"` for week, `"m"` for month, `"y"` for year).
343 | * `max_search_results` (number, optional, default: `7`): Maximum number of search results to retrieve and consider for crawling (1-20).
344 | * `chunks_per_source` (number, optional, default: `3`): For `search_depth: "advanced"`, the number of content chunks to retrieve from each source (1-3).
345 | * `include_search_images` (boolean, optional, default: `false`): Include a list of query-related image URLs from the initial search.
346 | * `include_search_image_descriptions` (boolean, optional, default: `false`): Include image descriptions along with URLs from the initial search.
347 | * `include_answer` (boolean or string, optional, default: `false`): Include an LLM-generated answer from Tavily based on search results.
348 | * *Options:* `true` (implies `"basic"`), `false`, `"basic"`, `"advanced"`.
349 | * `include_raw_content_search` (boolean, optional, default: `false`): Include the cleaned and parsed HTML content of each initial search result.
350 | * `include_domains_search` (array of strings, optional, default: `[]`): A list of domains to specifically include in the search results.
351 | * `exclude_domains_search` (array of strings, optional, default: `[]`): A list of domains to specifically exclude from the search results.
352 | * `search_timeout` (number, optional, default: `60`): Timeout in seconds for Tavily search requests.
353 |
354 | #### Crawl Parameters (for Tavily Crawl API - applied to each URL from search)
355 |
356 | * `crawl_max_depth` (number, optional, default: `1`): Max depth of the crawl from the base URL. `0` means only the base URL, `1` means the base URL and links found on it, etc.
357 | * `crawl_max_breadth` (number, optional, default: `5`): Max number of links to follow per level of the crawl tree (i.e., per page).
358 | * `crawl_limit` (number, optional, default: `10`): Total number of links the crawler will process starting from a single root URL before stopping.
359 | * `crawl_instructions` (string, optional): Natural language instructions for the crawler for how to approach crawling the site.
360 | * `crawl_select_paths` (array of strings, optional, default: `[]`): Regex patterns to select only URLs with specific path patterns for crawling (e.g., `"/docs/.*"`).
361 | * `crawl_select_domains` (array of strings, optional, default: `[]`): Regex patterns to restrict crawling to specific domains or subdomains (e.g., `"^docs\\.example\\.com$"`). By default (when `crawl_allow_external` is false and this list is empty), crawling stays focused on the domain of the URL being crawled; providing patterns here overrides that focus.
362 | * `crawl_exclude_paths` (array of strings, optional, default: `[]`): Regex patterns to exclude URLs with specific path patterns from crawling.
363 | * `crawl_exclude_domains` (array of strings, optional, default: `[]`): Regex patterns to exclude specific domains or subdomains from crawling.
364 | * `crawl_allow_external` (boolean, optional, default: `false`): Whether to allow the crawler to follow links to external domains.
365 | * `crawl_include_images` (boolean, optional, default: `true`): Whether to extract image URLs from the crawled pages.
366 | * `crawl_categories` (array of strings, optional, default: `[]`): Filter URLs for crawling using predefined categories (e.g., `"Blog"`, `"Documentation"`, `"Careers"`). Refer to Tavily Crawl API for all options.
367 | * `crawl_extract_depth` (string, optional, default: `"advanced"`): Depth of content extraction during crawl.
368 | * *Options:* `"basic"`, `"advanced"`. Advanced retrieves more data (tables, embedded content) but may have higher latency.
369 | * `crawl_timeout` (number, optional, default: `180`): Timeout in seconds for each Tavily Crawl request.
370 |
371 | ## Understanding Documentation Prompt Precedence
372 |
373 | The `documentation_prompt` is an essential part of this tool, as it guides the LLM in how to format and structure the research findings. The system resolves which prompt to use in the following order of precedence:
374 |
375 | 1. If the LLM/agent provides a `documentation_prompt` parameter in the tool call:
376 |
377 | - This takes highest precedence and will be used regardless of other settings
378 | - This allows end users to customize documentation format through natural language requests to the LLM
379 | 2. If no parameter is provided in the tool call, but the `DOCUMENTATION_PROMPT` environment variable is set:
380 |
381 | - The environment variable value will be used
382 | - This is useful for system administrators or developers to set a consistent prompt across all tool calls
383 | 3. If neither of the above is set:
384 |
385 | - The comprehensive built-in default prompt is used
386 | - This default prompt is designed to produce high-quality technical documentation
387 |
388 | This flexibility allows:
389 |
390 | - End users to customize documentation through natural language requests to the LLM
391 | - Developers to set system-wide defaults
392 | - A fallback to well-designed defaults if no customization is provided
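
Conceptually, the resolution works like the following sketch (function and parameter names are illustrative, not the actual identifiers in `src/index.ts`):

```typescript
// Illustrative sketch of the documented precedence: tool argument > env var > built-in default.
function resolveDocumentationPrompt(toolArgPrompt: string | undefined, defaultPrompt: string): string {
  if (toolArgPrompt) {
    return toolArgPrompt;                     // 1. documentation_prompt tool argument
  }
  if (process.env.DOCUMENTATION_PROMPT) {
    return process.env.DOCUMENTATION_PROMPT; // 2. DOCUMENTATION_PROMPT environment variable
  }
  return defaultPrompt;                       // 3. server's built-in default prompt
}
```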
393 |
394 | ## Working with Output Paths
395 |
396 | The `output_path` parameter determines where research documents and images will be saved. This is especially important when the LLM needs to:
397 |
398 | 1. Save generated markdown documents
399 | 2. Download and save images from the research
400 | 3. Create supplementary files or resources
401 |
402 | The system follows this precedence to determine the output path:
403 |
404 | 1. If the LLM/agent provides an `output_path` parameter in the tool call:
405 |
406 | - This takes highest precedence
407 | - Allows end users to specify a custom save location through natural language requests
408 | 2. If no parameter is provided, but the `RESEARCH_OUTPUT_PATH` environment variable is set:
409 |
410 | - The environment variable value will be used
411 | - Good for system-wide configuration
412 | 3. If neither of the above is set:
413 |
414 | - A default path with timestamp is used: `~/Documents/research/YYYY-MM-DDTHH-MM-SS/`
415 | - This prevents overwriting previous research results
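
The same order can be sketched in TypeScript (identifiers are illustrative; the actual implementation lives in `src/index.ts`):

```typescript
import os from "os";
import path from "path";

// Illustrative sketch: output_path argument > RESEARCH_OUTPUT_PATH > timestamped default.
function resolveOutputPath(toolArgPath?: string): string {
  if (toolArgPath) {
    return toolArgPath;
  }
  if (process.env.RESEARCH_OUTPUT_PATH) {
    return process.env.RESEARCH_OUTPUT_PATH;
  }
  const timestamp = new Date().toISOString().replace(/[:.]/g, "-").slice(0, 19); // YYYY-MM-DDTHH-MM-SS
  return path.join(os.homedir(), "Documents", "research", timestamp);
}
```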
416 |
417 | The LLM receives the final resolved output path in the tool's response JSON as the `output_path` field, so it always knows where to save generated content.
418 |
419 | **Note for LLMs:** When processing the tool results, check the `output_path` field to determine where to save any files you generate. This path is guaranteed to be present in the response.
420 |
421 | ## Instructions for the LLM
422 |
423 | As an LLM using the output of the `deep-research-tool`, your primary goal is to generate a comprehensive, accurate, and well-structured markdown document that addresses the `original_query`.
424 |
425 | **Key Steps:**
426 |
427 | 1. **Parse the JSON Output:** The tool will return a JSON string. Parse this to access its fields: `documentation_instructions`, `original_query`, `search_summary`, and `research_data`.
428 | 2. **Adhere to `documentation_instructions`:** This field contains the **primary set of guidelines** for creating the markdown document. It will either be the server's extensive default prompt (focused on high-quality technical documentation) or a custom prompt provided by the user. **Follow these instructions meticulously** regarding content quality, style, structure, markdown formatting, and handling of technical details.
429 | 3. **Utilize `research_data` for Content:**
430 | * The `research_data` array is your main source of information. Each object in this array represents a distinct web source.
431 | * For each source, pay attention to its `title`, `original_url`, and `initial_content_snippet` for context.
432 | * The core information for your document will come from the `crawled_data` array within each source. Specifically, the `raw_content` field of each `crawled_data` object contains the rich text extracted from that page.
433 | * Synthesize information *across multiple sources* in `research_data` to provide a comprehensive view. Do not just list content from one source after another.
434 | * If `crawled_data[].images` are present, you can mention them or list their URLs if appropriate and aligned with the `documentation_instructions`.
435 | * If `crawl_errors` are present for a source, it means that particular source might be incomplete. You can choose to note this subtly if it impacts coverage.
436 | 4. **Address the `original_query`:** The final document must comprehensively answer or address the `original_query`.
437 | 5. **Leverage `search_summary`:** If the `search_summary` field is present (from Tavily's `include_answer` feature), it can serve as a helpful starting point, an executive summary, or a way to frame the introduction. However, the main body of your document should be built from the more detailed `research_data`.
438 | 6. **Synthesize, Don't Just Copy:** Your role is not to dump the `raw_content`. You must process, understand, synthesize, rephrase, and organize the information from various sources into a coherent, well-written document that flows logically, as per the `documentation_instructions`.
439 | 7. **Markdown Formatting:** Strictly follow the markdown formatting guidelines provided in the `documentation_instructions` (headings, lists, code blocks, emphasis, links, etc.).
440 | 8. **Handling Large Volumes:** The `research_data` can be extensive. If you have limitations on processing large inputs, the system calling you might need to provide you with chunks of the `research_data` or make multiple requests to you to build the document section by section. The `deep-research-tool` itself will always attempt to return all collected data in one JSON output.
441 | 9. **Technical Accuracy:** Preserve all technical details, code examples, and important specifics from the source content, as mandated by the `documentation_instructions`. Do not oversimplify.
442 | 10. **Visual Appeal (If Instructed):** If the `documentation_instructions` include guidelines for visual appeal (like colored text or emojis using HTML), apply them judiciously.
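
As a concrete example, a TypeScript agent might unpack the tool result like the sketch below before prompting the LLM. Field names follow the Output Structure documented above; the helper itself is hypothetical.

```typescript
// Sketch: unpack the deep-research-tool result (returned as a JSON string) for the LLM.
function unpackResearch(resultText: string) {
  const research = JSON.parse(resultText);
  const pages = research.research_data.flatMap((source: any) =>
    source.crawled_data
      .filter((page: any) => page.raw_content)
      .map((page: any) => ({ title: source.title, url: page.url, text: page.raw_content }))
  );
  return {
    instructions: research.documentation_instructions, // guidelines to follow meticulously
    summary: research.search_summary,                  // optional executive summary
    pages,                                             // synthesize across these, don't copy
    outputPath: research.output_path,                  // where generated files belong
  };
}
```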
443 |
444 | **Example LLM Invocation Thought Process:**
445 |
446 | *Agent to LLM:*
447 | "Okay, I've called the `deep-research-tool` with the query '\<em\>What are the latest advancements in quantum-resistant cryptography?\</em\>' and requested 5 sources with advanced crawling. Here's the JSON output:
448 | `{ ... (JSON output from the tool) ... }`
449 |
450 | Now, using the `documentation_instructions` provided within this JSON, and the `research_data`, please generate a comprehensive markdown document on 'The Latest Advancements in Quantum-Resistant Cryptography'. Ensure you follow all formatting and content guidelines from the instructions."
451 |
452 | ## Example `CallToolRequest` (Conceptual Arguments)
453 |
454 | An agent might make a call to the MCP server with arguments like this:
455 |
456 | ```json
457 | {
458 | "name": "deep-research-tool",
459 | "arguments": {
460 | "query": "Explain the architecture of modern data lakes and data lakehouses.",
461 | "max_search_results": 5,
462 | "search_depth": "advanced",
463 | "topic": "general",
464 | "crawl_max_depth": 1,
465 | "crawl_extract_depth": "advanced",
466 | "include_answer": true,
467 | "documentation_prompt": "Generate a highly technical whitepaper. Start with an abstract, then introduction, detailed sections for data lakes, data lakehouses, comparison, use cases, and a future outlook. Use academic tone. Include all diagrams mentioned by URL if possible as [Diagram: URL].",
468 | "output_path": "C:/Users/username/Documents/research/datalakes-whitepaper"
469 | }
470 | }
471 | ```
472 |
473 | ## Troubleshooting
474 |
475 | * **API Key Errors:** Ensure `TAVILY_API_KEY` is correctly set and valid.
476 | * **SDK Issues:** Make sure `@modelcontextprotocol/sdk` and `@tavily/core` are installed and up-to-date.
477 | * **No Output/Errors:** Check the server console logs for any error messages. Increase verbosity if needed for debugging.
478 |
479 | ## Changelog
480 |
481 | ### v0.1.2 (2024-05-10)
482 |
483 | - Added configurable output path functionality
484 | - Fixed type errors with latest Tavily SDK
485 | - Added comprehensive documentation about output paths
486 | - Added logo and improved documentation
487 |
488 | ### v0.1.1
489 |
490 | - Initial public release
491 |
492 | ## Contributing
493 |
494 | Contributions are welcome! Please feel free to submit issues, fork the repository, and create pull requests.
495 |
496 | ## License
497 |
498 | This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
499 |
```
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
```dockerfile
1 | # Generated by https://smithery.ai. See: https://smithery.ai/docs/build/project-config
2 | # Builder stage
3 | FROM node:lts-alpine AS builder
4 | WORKDIR /app
5 |
6 | # Install all dependencies including dev
7 | COPY package.json package-lock.json tsconfig.json ./
8 | COPY src ./src
9 | RUN npm install
10 | RUN npm run build
11 |
12 | # Runtime stage
13 | FROM node:lts-alpine
14 | WORKDIR /app
15 |
16 | # Install production dependencies
17 | COPY package.json package-lock.json ./
18 | RUN npm install --production
19 |
20 | # Copy built files and assets
21 | COPY --from=builder /app/dist ./dist
22 | COPY assets ./assets
23 |
24 | # Set environment
25 | ENV NODE_ENV=production
26 |
27 | # Start the server
28 | CMD ["node", "dist/index.js"]
29 |
```
--------------------------------------------------------------------------------
/example_config.json:
--------------------------------------------------------------------------------
```json
1 | {
2 | "mcpServers": {
3 | "deep-research": {
4 | "command": "npx",
5 | "args": [
6 | "-y",
7 | "@pinkpixel/deep-research-mcp"
8 | ],
9 | "env": {
10 | "TAVILY_API_KEY": "tvly-YOUR_ACTUAL_API_KEY_HERE",
11 | "DOCUMENTATION_PROMPT": "Your custom, detailed instructions for the LLM on how to generate markdown documents from the research data...",
12 | "SEARCH_TIMEOUT": "120",
13 | "CRAWL_TIMEOUT": "300",
14 | "MAX_SEARCH_RESULTS": "10",
15 | "CRAWL_MAX_DEPTH": "2",
16 | "CRAWL_LIMIT": "15",
17 | "FILE_WRITE_ENABLED": "true",
18 | "ALLOWED_WRITE_PATHS": "/home/user/research,/home/user/documents",
19 | "FILE_WRITE_LINE_LIMIT": "300"
20 | }
21 | }
22 | }
23 | }
```
--------------------------------------------------------------------------------
/tsconfig.json:
--------------------------------------------------------------------------------
```json
1 | {
2 | "compilerOptions": {
3 | "target": "ES2022",
4 | "module": "NodeNext",
5 | "moduleResolution": "NodeNext",
6 | "baseUrl": "./",
7 | "outDir": "./dist",
8 | "rootDir": "./src",
9 |
10 | "strict": true,
11 | "esModuleInterop": true,
12 | "skipLibCheck": true,
13 | "forceConsistentCasingInFileNames": true,
14 | "resolveJsonModule": true,
15 |
16 | "declaration": true,
17 | "declarationMap": true,
18 | "sourceMap": true,
19 |
20 | "noUnusedLocals": true,
21 | "noUnusedParameters": true,
22 | "noImplicitReturns": true,
23 | "noFallthroughCasesInSwitch": true,
24 | "allowSyntheticDefaultImports": true
25 | },
26 | "include": [
27 | "src/**/*.ts"
28 | ],
29 | "exclude": [
30 | "node_modules",
31 | "dist",
32 | "**/*.spec.ts",
33 | "**/*.test.ts"
34 | ]
35 | }
```
--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
```json
1 | {
2 | "name": "@pinkpixel/deep-research-mcp",
3 | "version": "1.3.1",
4 | "description": "A Model Context Protocol (MCP) server for performing deep web research using Tavily Search and Crawl APIs, preparing structured data for LLM-powered documentation generation.",
5 | "main": "dist/index.js",
6 | "types": "dist/index.d.ts",
7 | "bin": {
8 | "deep-research-mcp": "dist/index.js"
9 | },
10 | "scripts": {
11 | "build": "tsc",
12 | "start": "node dist/index.js",
13 | "dev": "nodemon --watch src -e ts --exec ts-node src/index.ts",
14 | "serve": "npm run build && npm start",
15 | "lint": "eslint src/**/*.ts",
16 | "format": "eslint src/**/*.ts --fix",
17 | "prepublishOnly": "npm run build"
18 | },
19 | "repository": {
20 | "type": "git",
21 | "url": "git+https://github.com/pinkpixel-dev/deep-research-mcp.git"
22 | },
23 | "keywords": [
24 | "mcp",
25 | "model-context-protocol",
26 | "tavily",
27 | "ai",
28 | "llm",
29 | "research",
30 | "web-crawl",
31 | "documentation"
32 | ],
33 | "author": "PinkPixel",
34 | "license": "MIT",
35 | "dependencies": {
36 | "@modelcontextprotocol/sdk": "^1.11.1",
37 | "@tavily/core": "^0.5.2",
38 | "dotenv": "^16.5.0"
39 | },
40 | "devDependencies": {
41 | "@types/node": "^22.15.17",
42 | "@typescript-eslint/eslint-plugin": "^8.32.0",
43 | "@typescript-eslint/parser": "^8.32.0",
44 | "eslint": "^9.26.0",
45 | "nodemon": "^3.1.10",
46 | "ts-node": "^10.9.2",
47 | "typescript": "^5.8.3"
48 | },
49 | "engines": {
50 | "node": ">=18.0.0"
51 | },
52 | "publishConfig": {
53 | "access": "public"
54 | },
55 | "files": [
56 | "dist",
57 | "README.md",
58 | "LICENSE",
59 | "assets",
60 | "OVERVIEW.md"
61 | ]
62 | }
63 |
```
--------------------------------------------------------------------------------
/test-output/test-summary.md:
--------------------------------------------------------------------------------
```markdown
1 | # Deep Research MCP Tool Test Summary
2 |
3 | ## Test Details
4 | - **Date**: May 29, 2025
5 | - **Query**: "Latest developments in quantum computing breakthroughs 2024"
6 | - **Output Path**: C:/Users/sizzlebop/Desktop/projects/github/deep-research-mcp/test-output
7 |
8 | ## Test Parameters Used
9 | - **Max Search Results**: 5
10 | - **Search Depth**: Advanced
11 | - **Crawl Max Depth**: 1
12 | - **Crawl Limit**: 8
13 | - **Include Answer**: True
14 | - **Custom Documentation Prompt**: Technical report format
15 |
16 | ## Test Results ✅
17 |
18 | ### ✅ Tool Functionality
19 | - Deep Research MCP tool executed successfully
20 | - Retrieved comprehensive data from 5 sources
21 | - Generated structured JSON output with all expected fields
22 |
23 | ### ✅ Search Performance
24 | - Found relevant, high-quality sources about quantum computing in 2024
25 | - Search summary provided: "In 2024, quantum computing saw significant breakthroughs in algorithms and error correction, pushing the boundaries of practical applications. Quantum machine learning and funding reached record highs."
26 |
27 | ### ✅ Crawl Performance
28 | - Successfully crawled multiple pages from each source
29 | - Retrieved detailed content from:
30 | - Microtime.com (quantum computing overview)
31 | - NetworkWorld.com (10 quantum milestones)
32 | - IDTechEx.com (market research)
33 | - No crawl errors reported
34 |
35 | ### ✅ Output Generation
36 | - Created comprehensive 164-line technical report
37 | - Properly structured with executive summary, company developments, technical details
38 | - Saved to specified output path successfully
39 | - Custom documentation prompt was followed correctly
40 |
41 | ### ✅ Data Quality
42 | - Rich, detailed content from authoritative sources
43 | - Current information from 2024
44 | - Technical specifications and company details included
45 | - Industry analysis and future projections covered
46 |
47 | ## Files Generated
48 | 1. `quantum-computing-breakthroughs-2024.md` - Full technical report (164 lines)
49 | 2. `test-summary.md` - This test summary
50 |
51 | ## Conclusion
52 | The Deep Research MCP tool is working perfectly! ✨
53 | - All core functionality operational
54 | - High-quality research data retrieval
55 | - Proper output formatting and file generation
56 | - Custom prompts and output paths working as expected
```
--------------------------------------------------------------------------------
/test-file-writing.js:
--------------------------------------------------------------------------------
```javascript
1 | // Simple test script to verify file writing functionality
2 | const fs = require('fs');
3 | const path = require('path');
4 | const os = require('os');
5 |
6 | // Test the file writing utility functions
7 | async function testFileWriting() {
8 | console.log('Testing file writing functionality...');
9 |
10 | // Set environment variables for testing
11 | process.env.FILE_WRITE_ENABLED = 'true';
12 | process.env.FILE_WRITE_LINE_LIMIT = '50';
13 |
14 | // Import the module after setting env vars
15 | const { writeResearchFile, isPathAllowed, validateWritePath } = require('./dist/index.js');
16 |
17 | const testDir = path.join(os.homedir(), 'test-deep-research');
18 | const testFile = path.join(testDir, 'test-output.md');
19 |
20 | try {
21 | // Test path validation
22 | console.log('✓ Testing path validation...');
23 | const validPath = await validateWritePath(testFile);
24 | console.log(`✓ Valid path: ${validPath}`);
25 |
26 | // Test file writing
27 | console.log('✓ Testing file writing...');
28 | const testContent = '# Test File\n\nThis is a test of the file writing functionality.\n\n- Feature 1\n- Feature 2\n- Feature 3';
29 | await writeResearchFile(testFile, testContent, 'rewrite');
30 | console.log(`✓ File written successfully: ${testFile}`);
31 |
32 | // Verify file exists and has correct content
33 | if (fs.existsSync(testFile)) {
34 | const content = fs.readFileSync(testFile, 'utf8');
35 | if (content === testContent) {
36 | console.log('✓ File content verified successfully');
37 | } else {
38 | console.log('✗ File content mismatch');
39 | }
40 | } else {
41 | console.log('✗ File was not created');
42 | }
43 |
44 | // Test append mode
45 | console.log('✓ Testing append mode...');
46 | const appendContent = '\n\n## Additional Section\n\nThis was appended to the file.';
47 | await writeResearchFile(testFile, appendContent, 'append');
48 | console.log('✓ Content appended successfully');
49 |
50 | // Clean up
51 | fs.unlinkSync(testFile);
52 | fs.rmdirSync(testDir);
53 | console.log('✓ Test cleanup completed');
54 |
55 | console.log('\n🎉 All file writing tests passed!');
56 |
57 | } catch (error) {
58 | console.error('✗ Test failed:', error.message);
59 |
60 | // Clean up on error
61 | try {
62 | if (fs.existsSync(testFile)) fs.unlinkSync(testFile);
63 | if (fs.existsSync(testDir)) fs.rmdirSync(testDir);
64 | } catch (cleanupError) {
65 | console.error('Cleanup error:', cleanupError.message);
66 | }
67 | }
68 | }
69 |
70 | // Only run if this file is executed directly
71 | if (require.main === module) {
72 | testFileWriting().catch(console.error);
73 | }
74 |
75 | module.exports = { testFileWriting };
```
--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------
```yaml
1 | # Smithery configuration file: https://smithery.ai/docs/build/project-config
2 |
3 | startCommand:
4 | type: stdio
5 | commandFunction:
6 | # A JS function that produces the CLI command based on the given config to start the MCP on stdio.
7 | |-
8 | (config) => ({ command: 'node', args: ['dist/index.js'], env: { TAVILY_API_KEY: config.tavilyApiKey, ...(config.documentationPrompt !== undefined && { DOCUMENTATION_PROMPT: config.documentationPrompt }), ...(config.searchTimeout !== undefined && { SEARCH_TIMEOUT: config.searchTimeout.toString() }), ...(config.crawlTimeout !== undefined && { CRAWL_TIMEOUT: config.crawlTimeout.toString() }), ...(config.maxSearchResults !== undefined && { MAX_SEARCH_RESULTS: config.maxSearchResults.toString() }), ...(config.crawlMaxDepth !== undefined && { CRAWL_MAX_DEPTH: config.crawlMaxDepth.toString() }), ...(config.crawlLimit !== undefined && { CRAWL_LIMIT: config.crawlLimit.toString() }), ...(config.fileWriteEnabled !== undefined && { FILE_WRITE_ENABLED: config.fileWriteEnabled.toString() }), ...(config.allowedWritePaths !== undefined && { ALLOWED_WRITE_PATHS: config.allowedWritePaths }), ...(config.fileWriteLineLimit !== undefined && { FILE_WRITE_LINE_LIMIT: config.fileWriteLineLimit.toString() }) }} )
9 | configSchema:
10 | # JSON Schema defining the configuration options for the MCP.
11 | type: object
12 | required:
13 | - tavilyApiKey
14 | properties:
15 | tavilyApiKey:
16 | type: string
17 | description: Tavily API key for authentication.
18 | documentationPrompt:
19 | type: string
20 | description: Optional custom documentation prompt to override default.
21 | searchTimeout:
22 | type: number
23 | default: 60
24 | description: Optional timeout in seconds for search requests.
25 | crawlTimeout:
26 | type: number
27 | default: 180
28 | description: Optional timeout in seconds for crawl requests.
29 | maxSearchResults:
30 | type: number
31 | default: 7
32 | description: Optional maximum number of search results to retrieve.
33 | crawlMaxDepth:
34 | type: number
35 | default: 1
36 | description: Optional maximum crawl depth from source URL.
37 | crawlLimit:
38 | type: number
39 | default: 10
40 | description: Optional maximum number of URLs to crawl per source.
41 | fileWriteEnabled:
42 | type: boolean
43 | default: false
44 | description: Enable file writing tool.
45 | allowedWritePaths:
46 | type: string
47 | description: Comma-separated allowed directories for file writing.
48 | fileWriteLineLimit:
49 | type: number
50 | default: 200
51 | description: Maximum lines per file write operation.
52 | exampleConfig:
53 | tavilyApiKey: tvly-EXAMPLE_KEY_12345
54 | documentationPrompt: Generate a concise summary of key findings.
55 | searchTimeout: 120
56 | crawlTimeout: 300
57 | maxSearchResults: 10
58 | crawlMaxDepth: 2
59 | crawlLimit: 15
60 | fileWriteEnabled: false
61 | allowedWritePaths: /home/user/research,/tmp
62 | fileWriteLineLimit: 300
63 |
```
--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
```markdown
1 | # Changelog
2 |
3 | All notable changes to the Deep Research MCP Server will be documented in this file.
4 |
5 | ## [1.2.1] - 2025-05-29
6 |
7 | ### Added
8 | - Environment variable support for timeout and performance configuration
9 | - `SEARCH_TIMEOUT` environment variable for configuring search request timeouts
10 | - `CRAWL_TIMEOUT` environment variable for configuring crawl request timeouts
11 | - `MAX_SEARCH_RESULTS` environment variable for setting maximum search results
12 | - `CRAWL_MAX_DEPTH` environment variable for setting maximum crawl depth
13 | - `CRAWL_LIMIT` environment variable for setting maximum URLs to crawl per source
14 | - **NEW: File Writing Tool** - `write-research-file` tool for saving research content to files
15 | - `FILE_WRITE_ENABLED` environment variable to enable/disable file writing (default: disabled)
16 | - `ALLOWED_WRITE_PATHS` environment variable for directory restrictions (default: user home)
17 | - `FILE_WRITE_LINE_LIMIT` environment variable for write operation limits (default: 200 lines)
18 | - Secure file writing with path validation, directory creation, and permission controls
19 | - Enhanced startup logging showing current timeout, limit, and file writing configurations
20 | - Updated example configuration with new environment variables
21 |
22 | ### Fixed
23 | - Timeout configuration now properly respects environment variables in addition to tool parameters
24 | - LibreChat timeout issues can now be resolved by setting appropriate environment variables
25 |
26 | ### Changed
27 | - Tool parameter precedence: tool arguments > environment variables > defaults
28 | - Improved documentation with detailed timeout, performance, and file writing configuration guides
29 | - Added comprehensive security documentation for file writing feature
30 |
31 | ### Security
32 | - File writing feature disabled by default for security
33 | - Directory-based access controls for file operations
34 | - Path validation and sanitization to prevent directory traversal
35 | - Configurable line limits to prevent abuse
36 |
37 | ## [1.2.0] - 2024-05-29
38 |
39 | ### Fixed
40 | - Fixed issue with console logging interfering with MCP protocol by replacing all `console.log` and `console.debug` calls with `console.error`
41 | - Fixed proper response structure to match MCP specifications, removing the extra `tools` wrapper from responses
42 | - Fixed type errors in Tavily SDK parameters, ensuring correct typing for `includeAnswer` and `timeRange`
43 | - Fixed parameter handling for crawl API, ensuring required parameters are always provided
44 |
45 | ### Added
46 | - Added progress tracking during long-running operations
47 | - Added memory usage tracking and optimization
48 | - Added hardware acceleration option with `hardware_acceleration` parameter
49 | - Added proper domain validation to prevent excessive crawling
50 | - Added timeout handling for both search and crawl operations
51 |
52 | ### Changed
53 | - Reduced default crawl limits for better performance:
54 | - Maximum depth reduced to 2 (from unlimited)
55 | - Default breadth reduced to 10 (from 20)
56 | - Default limit reduced to 10 URLs (from 50)
57 | - Improved error handling and reporting
58 | - Updated documentation to reflect parameter changes
59 |
60 | ## [1.1.0] - 2024-05-01
61 |
62 | ### Added
63 | - Initial public release
64 | - Integration with Tavily Search and Crawl APIs
65 | - MCP compliant tool interface
66 | - Structured JSON output for LLM consumption
67 | - Configurable documentation prompt
68 | - Configurable output path
```
--------------------------------------------------------------------------------
/OVERVIEW.md:
--------------------------------------------------------------------------------
```markdown
1 | <!--- ✨ OVERVIEW.md for Deep Research MCP Server (Last Updated: May 29, 2025) ✨ --->
2 |
3 | <h1 align="center"><span style="color:#7f5af0;">Deep Research MCP Server</span> ✨</h1>
4 | <p align="center"><b><span style="color:#2cb67d;">Dream it, Pixel it</span></b> — <i>by Pink Pixel</i></p>
5 |
6 | ---
7 |
8 | ## <span style="color:#7f5af0;">🚀 Project Purpose</span>
9 |
10 | The <b>Deep Research MCP Server</b> is a <b>Model Context Protocol (MCP)</b> compliant server for <span style="color:#2cb67d;">comprehensive, up-to-date web research</span>. It leverages <b>Tavily's Search & Crawl APIs</b> to gather, aggregate, and structure information for <b>LLM-powered documentation generation</b>.
11 |
12 | ---
13 |
14 | ## <span style="color:#7f5af0;">🧩 Architecture Overview</span>
15 |
16 | - <b>MCP Server</b> (Node.js, TypeScript)
17 | - <b>Stdio Transport</b> for agent/server communication
18 | - <b>Tavily API Integration</b> (Search + Crawl)
19 | - <b>Configurable Documentation Prompt</b> (default, ENV, or per-request)
20 | - <b>Structured JSON Output</b> for LLMs
21 |
22 | <details>
23 | <summary><b>Architecture Diagram (Text)</b></summary>
24 |
25 | ```
26 | [LLM/Agent]
27 | │
28 | ▼
29 | [Deep Research MCP Server]
30 | │ ├─> Tavily Search API
31 | │ └─> Tavily Crawl API
32 | ▼
33 | [Aggregated JSON Output + Documentation Instructions]
34 | ```
35 | </details>
36 |
37 | ---
38 |
39 | ## <span style="color:#7f5af0;">✨ Main Features</span>
40 |
41 | - <b>Multi-Step Research</b>: Combines AI-powered search with deep content crawling
42 | - <b>Structured Output</b>: JSON with query, search summary, findings, and doc instructions
43 | - <b>Configurable Prompts</b>: Override documentation style via ENV or per-request
44 | - <b>Configurable Output Path</b>: Specify where research documents and images should be saved
45 | - <b>Granular Control</b>: Fine-tune search/crawl with many parameters
46 | - <b>MCP Compliant</b>: Plug-and-play for agent ecosystems
47 | - <b>Resource Optimized</b>: Memory tracking, auto-garbage collection, and hardware acceleration support
48 |
49 | ---
50 |
51 | ## <span style="color:#7f5af0;">🛠️ Key Dependencies</span>
52 |
53 | - <b>@modelcontextprotocol/sdk</b> (v1.11.1) — MCP server framework
54 | - <b>@tavily/core</b> (v0.5.2) — Tavily Search & Crawl APIs
55 | - <b>dotenv</b> (v16.5.0) — Environment variable management
56 |
57 | ---
58 |
59 | ## <span style="color:#7f5af0;">📁 File Structure</span>
60 |
61 | ```
62 | deep-research-mcp/
63 | ├── dist/ # Compiled JS output
64 | ├── src/
65 | │ └── index.ts # Main server logic
66 | ├── assets/ # Project assets (logo)
67 | ├── README.md # Full documentation
68 | ├── OVERVIEW.md # (You are here!)
69 | ├── example_config.json # Example MCP config
70 | ├── package.json # Project metadata & dependencies
71 | ├── tsconfig.json # TypeScript config
72 | ├── CHANGELOG.md # Version history and changes
73 | ```
74 |
75 | ---
76 |
77 | ## <span style="color:#7f5af0;">⚡ Usage & Integration</span>
78 |
79 | - <b>Install & Run:</b>
80 | - <code>npx @pinkpixel/deep-research-mcp</code> <span style="color:#2cb67d;">(quickest)</span>
81 | - Or clone & <code>npm install</code>, then <code>npm start</code>
82 | - <b>Configure:</b> Set <code>TAVILY_API_KEY</code> in your environment (see <b>README.md</b>)
83 | - <b>Integrate:</b> Connect to your LLM/agent via MCP stdio
84 | - <b>Customize:</b> Override documentation prompt via ENV or tool argument
85 | - <b>Output:</b> Specify where research documents and images should be saved
86 | - <b>Performance:</b> Enable hardware acceleration with <code>hardware_acceleration: true</code> parameter
87 |
88 | ---
89 |
90 | ## <span style="color:#7f5af0;">🔄 Recent Updates</span>
91 |
92 | - <b>Optimized Resource Usage</b>: Reduced default crawl limits to prevent excessive memory consumption
93 | - <b>MCP Protocol Compliance</b>: Fixed response structure to properly follow MCP specifications
94 | - <b>Improved Error Handling</b>: Better error reporting and handling of timeouts
95 | - <b>Performance Optimizations</b>: Added optional hardware acceleration (WebGPU) support
96 | - <b>Smarter Crawling</b>: Added domain validation to focus crawling and prevent overly broad searches
97 |
98 | <i>See <b>CHANGELOG.md</b> for complete version history</i>
99 |
100 | ---
101 |
102 | ## <span style="color:#7f5af0;">📚 More Info</span>
103 |
104 | - See <b>README.md</b> for full usage, parameters, and troubleshooting
105 | - Example config: <b>example_config.json</b>
106 | - License: <b>MIT</b>
107 | - Node.js: <b>>=18.0.0 required</b>
108 |
109 | ---
110 |
111 | <p align="center"><span style="color:#7f5af0;">Made with ❤️ by Pink Pixel</span></p>
```
--------------------------------------------------------------------------------
/src/index.ts:
--------------------------------------------------------------------------------
```typescript
1 | #!/usr/bin/env node
2 |
3 | import { Server } from "@modelcontextprotocol/sdk/server/index.js";
4 | import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
5 | import {
6 | CallToolRequestSchema,
7 | ListToolsRequestSchema,
8 | Tool,
9 | McpError,
10 | ErrorCode,
11 | } from "@modelcontextprotocol/sdk/types.js";
12 |
13 | import { tavily as createTavilyClient } from "@tavily/core";
14 | import type { TavilyClient } from "@tavily/core"; // For typing the Tavily client instance
15 | import dotenv from "dotenv";
16 | import fs from "fs/promises";
17 | import path from "path";
18 | import os from "os";
19 |
20 | dotenv.config(); // Load .env file if present (for local development)
21 |
22 | const TAVILY_API_KEY = process.env.TAVILY_API_KEY;
23 | if (!TAVILY_API_KEY) {
24 | throw new Error(
25 | "TAVILY_API_KEY environment variable is required. Please set it in your .env file or execution environment."
26 | );
27 | }
28 |
29 | const ENV_DOCUMENTATION_PROMPT = process.env.DOCUMENTATION_PROMPT;
30 |
31 | // Environment variables for timeout configuration
32 | const ENV_SEARCH_TIMEOUT = process.env.SEARCH_TIMEOUT ? parseInt(process.env.SEARCH_TIMEOUT, 10) : undefined;
33 | const ENV_CRAWL_TIMEOUT = process.env.CRAWL_TIMEOUT ? parseInt(process.env.CRAWL_TIMEOUT, 10) : undefined;
34 | const ENV_MAX_SEARCH_RESULTS = process.env.MAX_SEARCH_RESULTS ? parseInt(process.env.MAX_SEARCH_RESULTS, 10) : undefined;
35 | const ENV_CRAWL_MAX_DEPTH = process.env.CRAWL_MAX_DEPTH ? parseInt(process.env.CRAWL_MAX_DEPTH, 10) : undefined;
36 | const ENV_CRAWL_LIMIT = process.env.CRAWL_LIMIT ? parseInt(process.env.CRAWL_LIMIT, 10) : undefined;
37 |
38 | // Environment variables for file writing configuration
39 | const ENV_ALLOWED_WRITE_PATHS = process.env.ALLOWED_WRITE_PATHS ? process.env.ALLOWED_WRITE_PATHS.split(',').map(p => p.trim()) : undefined;
40 | const ENV_FILE_WRITE_ENABLED = process.env.FILE_WRITE_ENABLED === 'true';
41 | const ENV_FILE_WRITE_LINE_LIMIT = process.env.FILE_WRITE_LINE_LIMIT ? parseInt(process.env.FILE_WRITE_LINE_LIMIT, 10) : 200;
42 |
43 | const DEFAULT_DOCUMENTATION_PROMPT = `
44 | For all queries, search the web extensively to acquire up to date information. Research several sources. Use all the tools provided to you to gather as much context as possible.
45 | Adhere to these guidelines when creating documentation:
46 | Include screenshots when appropriate
47 |
48 | 1. CONTENT QUALITY:
49 | Clear, concise, and factually accurate
50 | Structured with logical organization
51 | Comprehensive coverage of topics
52 | Technical precision and attention to detail
53 | Free of unnecessary commentary or humor
54 | DOCUMENTATION STYLE:
55 | Professional and objective tone
56 | Thorough explanations with appropriate technical depth
57 | Well-formatted with proper headings, lists, and code blocks
58 | Consistent terminology and naming conventions
59 | Clean, readable layout without extraneous elements
60 | CODE QUALITY:
61 | Clean, maintainable, and well-commented code
62 | Best practices and modern patterns
63 | Proper error handling and edge case considerations
64 | Optimal performance and efficiency
65 | Follows language-specific style guidelines
66 | TECHNICAL EXPERTISE:
67 | Programming languages and frameworks
68 | System architecture and design patterns
69 | Development methodologies and practices
70 | Security considerations and standards
71 | Industry-standard tools and technologies
72 | Documentation guidelines
73 | Create an extremely detailed, comprehensive markdown document about a given topic when asked. Follow the below instructions:
74 | Start with an INTRODUCTION and FIRST MAJOR SECTIONS of the topic, covering:
75 | Overview and definition of the topic
76 | Historical background and origins
77 | Core concepts and fundamentals
78 | Early developments and pioneers
79 | Create a strong foundation and then continue with ADDITIONAL SECTIONS:
80 | Advanced concepts and developments
81 | Modern applications and technologies
82 | Current trends and future directions
83 | Challenges and limitations
84 | IMPORTANT GUIDELINES:
85 | Create a SUBSTANTIAL document section (2000-3000 words for this section)
86 | PRESERVE all technical details, code examples, and important specifics from the sources
87 | MAINTAIN the depth and complexity of the original content
88 | DO NOT simplify or omit technical information
89 | Include all relevant examples, specifications, and implementation details
90 | Format with proper markdown headings (## for main sections, ### for subsections).
91 | Include examples and code snippets. Maintain relationships between concepts
92 | Avoid omitting "less important" sections that might be critical for complete documentation
93 | Preserve hierarchical structures in documentation
94 | Guidelines for Proper Markdown Formatting:
95 | Document Structure:
96 | Use an informative title at the top of the document.
97 | Include a brief introduction to the topic.
98 | Organize content into logical sections using headings.
99 | Headings:
100 | Use # Heading for the main title.
101 | Use ## Heading for major sections.
102 | Use ### Heading for subsections.
103 | Limit headings to three levels for clarity.
104 | Text Formatting:
105 | Use *italic text* for emphasis.
106 | Use **bold text** for strong emphasis.
107 | Combine emphasis with ***bold and italic***.
108 | Lists:
109 | Use -, +, or * for unordered lists.
110 | Use 1., 2., etc., for ordered lists.
111 | Avoid ending list items with periods unless they contain multiple sentences.
112 | Links and Images:
113 | Use [Descriptive Text](https://example.com) for links.
114 | Use ![Alt Text](https://example.com/image.png) for images.
115 | Ensure descriptive text is informative.
116 | Code Blocks:
117 | Use triple backticks \`\`\` to enclose code blocks. Specify the programming language after the opening backticks if needed (e.g., \`\`\`python).
118 | Line Breaks and Paragraphs:
119 | Use a blank line to separate paragraphs.
120 | Use two spaces at the end of a line for a line break.
121 | Special Characters:
122 | Escape special characters with a backslash (\\) if needed (e.g., \\# for a literal #).
123 | Metadata:
124 | Include metadata at the top of the document if necessary (e.g., author, date).
125 | Consistency and Style:
126 | Follow a consistent naming convention for files and directories.
127 | Use a project-specific style guide if available.
128 | Additional Tips:
129 | Use Markdown extensions if supported by your platform (e.g., tables, footnotes).
130 | Preview your documentation regularly to ensure formatting is correct.
131 | Use linting tools to enforce style and formatting rules.
132 | Output Requirements:
133 | The documentation should be in Markdown format.
134 | Ensure all links and images are properly formatted and functional.
135 | The document should be easy to navigate with clear headings and sections.
136 | By following these guidelines, you will produce high-quality Markdown documentation that is both informative and visually appealing.
137 | To make your Markdown document visually appealing with colored text and emojis, you can incorporate the following elements:
138 | Using Colored Text
139 | Since Markdown does not natively support colored text, you can use HTML tags to achieve this:
140 | HTML \`<span>\` Tag:
141 | Use the \`<span>\` tag with inline CSS to change the text color. For example:
142 | html
143 | \`<span style="color:red;">\`This text will be red.\`</span>\`
144 | You can replace red with any color name or hex code (e.g., #FF0000 for red).
145 | HTML \`<font>\` Tag (Deprecated):
146 | Although the \`<font>\` tag is deprecated, it can still be used in some environments:
147 | html
148 | \`<font color="red">\`This text will be red.\`</font>\`
149 | However, it's recommended to use the \`<span>\` tag for better compatibility.
150 | Incorporating Emojis
151 | Emojis can add visual appeal and convey emotions or ideas effectively:
152 | Copy and Paste Emojis:
153 | You can copy emojis from sources like Emojipedia and paste them directly into your Markdown document. For example:
154 | markdown
155 | This is a happy face 😊.
156 | Emoji Shortcodes:
157 | Some platforms support emoji shortcodes, which vary by application. For example, on GitHub, you can use :star: for a star emoji ⭐️.
158 | Best Practices for Visual Appeal
159 | Consistency: Use colors and emojis consistently throughout the document to maintain a cohesive look.
160 | Accessibility: Ensure that colored text has sufficient contrast with the background and avoid relying solely on color to convey information.
161 | Purposeful Use: Use colors and emojis to highlight important information or add visual interest, but avoid overusing them.
162 | Example of a Visually Appealing Markdown Document
163 | markdown
164 |
165 | # Introduction to Python 🐍
166 |
167 | ## What is Python?
168 |
169 | Python is a versatile and widely-used programming language. It is \`<span style="color:blue;">\`easy to learn\`</span>\` and has a vast number of libraries for various tasks.
170 |
171 | ### Key Features
172 |
173 | - **Easy to Read:** Python's syntax is designed to be clear and concise.
174 | - **Versatile:** From web development to data analysis, Python can do it all 📊.
175 | - **Large Community:** Python has a large and active community, which means there are many resources available 🌟.
176 |
177 | ## Conclusion
178 |
179 | Python is a great language for beginners and experienced developers alike. Start your Python journey today! 🚀
180 | This example incorporates colored text and emojis to enhance readability and visual appeal.
181 | VERY IMPORTANT - Remember that your goal is to produce high-quality, clean, professional technical content, documentation, and code that meet the highest standards, without unnecessary commentary, following all of the above guidelines.
182 | `;
183 |
184 | // Interface for the write-research-file tool arguments
185 | interface WriteResearchFileArguments {
186 | file_path: string;
187 | content: string;
188 | mode?: 'rewrite' | 'append';
189 | }
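// Illustrative write-research-file call arguments (values are examples only):
//   { "file_path": "/home/user/research/summary.md", "content": "# Findings\n...", "mode": "append" }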
190 |
191 | // Interface for the arguments our deep-research-tool will accept
192 | interface DeepResearchToolArguments {
193 | query: string;
194 | search_depth?: "basic" | "advanced";
195 | topic?: "general" | "news";
196 | days?: number;
197 | time_range?: string;
198 | max_search_results?: number;
199 | chunks_per_source?: number;
200 | include_search_images?: boolean;
201 | include_search_image_descriptions?: boolean;
202 | include_answer?: boolean | "basic" | "advanced";
203 | include_raw_content_search?: boolean;
204 | include_domains_search?: string[];
205 | exclude_domains_search?: string[];
206 | search_timeout?: number;
207 |
208 | crawl_max_depth?: number;
209 | crawl_max_breadth?: number;
210 | crawl_limit?: number;
211 | crawl_instructions?: string;
212 | crawl_select_paths?: string[];
213 | crawl_select_domains?: string[];
214 | crawl_exclude_paths?: string[];
215 | crawl_exclude_domains?: string[];
216 | crawl_allow_external?: boolean;
217 | crawl_include_images?: boolean;
218 | crawl_categories?: string[];
219 | crawl_extract_depth?: "basic" | "advanced";
220 | crawl_timeout?: number;
221 | documentation_prompt?: string; // For custom documentation instructions
222 | hardware_acceleration?: boolean;
223 | }
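// Illustrative deep-research-tool call arguments (only "query" is required; other values are examples):
//   {
//     "query": "history of WebAssembly",
//     "search_depth": "advanced",
//     "max_search_results": 5,
//     "crawl_max_depth": 1,
//     "crawl_limit": 10
//   }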
224 |
225 | // Structure for storing combined search and crawl results for one source
226 | interface CombinedResult {
227 | search_rank: number;
228 | original_url: string;
229 | title: string;
230 | initial_content_snippet: string; // Snippet from the search result
231 | search_score?: number;
232 | published_date?: string; // If topic is 'news'
233 | crawled_data: CrawledPageData[];
234 | crawl_errors: string[];
235 | }
236 |
237 | // Structure for data from a single crawled page
238 | interface CrawledPageData {
239 | url: string; // The specific URL that was crawled (could be same as original_url or deeper)
240 | raw_content: string | null; // The main content extracted from this page
241 | images?: string[]; // URLs of images found on this page
242 | }
243 |
244 | // Add Tavily API parameter interfaces based on documentation
245 | interface TavilySearchParams {
246 | query: string;
247 | searchDepth?: "basic" | "advanced";
248 | topic?: "general" | "news";
249 | days?: number;
250 | timeRange?: string;
251 | maxResults?: number;
252 | chunksPerSource?: number;
253 | includeImages?: boolean;
254 | includeImageDescriptions?: boolean;
255 | includeAnswer?: boolean | "basic" | "advanced";
256 | includeRawContent?: boolean;
257 | includeDomains?: string[];
258 | excludeDomains?: string[];
259 | timeout?: number;
260 | }
261 |
262 | // Add the necessary TavilyCrawlCategory type
263 | type TavilyCrawlCategory =
264 | | "Careers"
265 | | "Blog"
266 | | "Documentation"
267 | | "About"
268 | | "Pricing"
269 | | "Community"
270 | | "Developers"
271 | | "Contact"
272 | | "Media";
273 |
274 | // Update interface with all required fields
275 | interface TavilyCrawlParams {
276 | url: string;
277 | maxDepth?: number;
278 | maxBreadth?: number;
279 | limit?: number;
280 | instructions?: string;
281 | selectPaths?: string[];
282 | selectDomains?: string[];
283 | excludePaths?: string[];
284 | excludeDomains?: string[];
285 | allowExternal?: boolean;
286 | includeImages?: boolean;
287 | categories?: TavilyCrawlCategory[];
288 | extractDepth?: "basic" | "advanced";
289 | timeout?: number;
290 | }
291 |
292 | // File writing utility functions
293 | function normalizePath(p: string): string {
294 | return path.normalize(p).toLowerCase();
295 | }
296 |
297 | async function isPathAllowed(filePath: string): Promise<boolean> {
298 | if (!ENV_FILE_WRITE_ENABLED) {
299 | return false; // File writing disabled
300 | }
301 |
302 | if (!ENV_ALLOWED_WRITE_PATHS || ENV_ALLOWED_WRITE_PATHS.length === 0) {
303 | // If no allowed paths specified, allow writing to user's home directory and subdirectories
304 | const userHome = os.homedir();
305 | let normalizedPathToCheck = normalizePath(path.resolve(filePath));
306 | let normalizedUserHome = normalizePath(userHome);
307 |
308 | // Remove trailing separators
309 | if (normalizedPathToCheck.slice(-1) === path.sep) {
310 | normalizedPathToCheck = normalizedPathToCheck.slice(0, -1);
311 | }
312 | if (normalizedUserHome.slice(-1) === path.sep) {
313 | normalizedUserHome = normalizedUserHome.slice(0, -1);
314 | }
315 |
316 | // Check if path is exactly the home directory or a subdirectory
317 | return normalizedPathToCheck === normalizedUserHome ||
318 | normalizedPathToCheck.startsWith(normalizedUserHome + path.sep);
319 | }
320 |
321 | // Check if the file path is within any of the allowed directories
322 | let normalizedPathToCheck = normalizePath(path.resolve(filePath));
323 | if (normalizedPathToCheck.slice(-1) === path.sep) {
324 | normalizedPathToCheck = normalizedPathToCheck.slice(0, -1);
325 | }
326 |
327 | return ENV_ALLOWED_WRITE_PATHS.some(allowedPath => {
328 | let normalizedAllowedPath = normalizePath(path.resolve(allowedPath));
329 | if (normalizedAllowedPath.slice(-1) === path.sep) {
330 | normalizedAllowedPath = normalizedAllowedPath.slice(0, -1);
331 | }
332 |
333 | // Check if path is exactly the allowed directory
334 | if (normalizedPathToCheck === normalizedAllowedPath) {
335 | return true;
336 | }
337 |
338 | // Check if path is a subdirectory of the allowed directory
339 | // Add separator to prevent partial directory name matches
340 | return normalizedPathToCheck.startsWith(normalizedAllowedPath + path.sep);
341 | });
342 | }
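// Behaviour sketch, assuming FILE_WRITE_ENABLED=true and ALLOWED_WRITE_PATHS=/home/user/research:
//   await isPathAllowed('/home/user/research/notes.md')        // true  (inside an allowed directory)
//   await isPathAllowed('/home/user/research')                  // true  (exactly the allowed directory)
//   await isPathAllowed('/home/user/research-other/notes.md')   // false (separator check blocks prefix-only matches)
//   await isPathAllowed('/etc/passwd')                          // false (outside all allowed directories)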
343 |
344 | async function validateWritePath(filePath: string): Promise<string> {
345 | // Convert to absolute path
346 | const absolutePath = path.resolve(filePath);
347 |
348 | // Check if path is allowed
349 | if (!(await isPathAllowed(absolutePath))) {
350 | const allowedPaths = ENV_ALLOWED_WRITE_PATHS || [os.homedir()];
351 | throw new Error(`File writing not allowed to path: ${filePath}. Must be within one of these directories: ${allowedPaths.join(', ')}`);
352 | }
353 |
354 | // Ensure parent directory exists
355 | const parentDir = path.dirname(absolutePath);
356 | try {
357 | await fs.mkdir(parentDir, { recursive: true });
358 | } catch (error) {
359 | throw new Error(`Failed to create parent directory: ${parentDir}`);
360 | }
361 |
362 | return absolutePath;
363 | }
364 |
365 | async function writeResearchFile(filePath: string, content: string, mode: 'rewrite' | 'append' = 'rewrite'): Promise<void> {
366 | const validPath = await validateWritePath(filePath);
367 |
368 | // Check line limit
369 | const lines = content.split('\n');
370 | const lineCount = lines.length;
371 |
372 | if (lineCount > ENV_FILE_WRITE_LINE_LIMIT) {
373 | throw new Error(`Content exceeds line limit: ${lineCount} lines (maximum: ${ENV_FILE_WRITE_LINE_LIMIT}). Please split content into smaller chunks.`);
374 | }
375 |
376 | // Write the file
377 | if (mode === 'append') {
378 | await fs.appendFile(validPath, content);
379 | } else {
380 | await fs.writeFile(validPath, content);
381 | }
382 | }
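// Usage sketch (assumes file writing is enabled and the target path is allowed):
//   await writeResearchFile('/home/user/research/summary.md', '# Findings\n', 'rewrite');
//   await writeResearchFile('/home/user/research/summary.md', 'More notes\n', 'append');
// Either call throws if the content exceeds FILE_WRITE_LINE_LIMIT lines.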
383 |
384 | class DeepResearchMcpServer {
385 | private server: Server;
386 | private tavilyClient: TavilyClient;
387 |
388 | constructor() {
389 | this.server = new Server(
390 | {
391 | name: "deep-research-mcp",
392 | version: "1.3.1", // Version with file writing tool and path validation fixes
393 | },
394 | {
395 | capabilities: {
396 | resources: {},
397 | tools: {},
398 | prompts: {}, // Prompts handled by the tool's output logic
399 | },
400 | }
401 | );
402 |
403 | try {
404 | this.tavilyClient = createTavilyClient({ apiKey: TAVILY_API_KEY }) as unknown as TavilyClient;
405 | } catch (e: any) {
406 | console.error("Failed to instantiate Tavily Client:", e.message);
407 | throw new Error(
408 | `Could not initialize Tavily Client. Check API key and SDK usage: ${e.message}`
409 | );
410 | }
411 |
412 | this.setupRequestHandlers();
413 | this.setupErrorHandling();
414 | }
415 |
416 | private setupErrorHandling(): void {
417 | this.server.onerror = (error) => {
418 | console.error("[DeepResearchMCP Error]", error);
419 | };
420 |
421 | const shutdown = async () => {
422 | console.error("Shutting down DeepResearchMcpServer...");
423 | try {
424 | await this.server.close();
425 | } catch (err) {
426 | console.error("Error during server shutdown:", err);
427 | }
428 | process.exit(0);
429 | };
430 |
431 | process.on("SIGINT", shutdown);
432 | process.on("SIGTERM", shutdown);
433 | }
434 |
435 | private setupRequestHandlers(): void {
436 | this.server.setRequestHandler(ListToolsRequestSchema, async () => {
437 | const tools: Tool[] = [
438 | {
439 | name: "deep-research-tool",
440 | description:
441 | "Performs extensive web research using Tavily Search and Crawl. Returns aggregated JSON data including the query, search summary (if any), detailed research findings, and documentation instructions. The documentation instructions will guide you on how the user wants the research data to be formatted into markdown.",
442 | inputSchema: {
443 | type: "object",
444 | properties: {
445 | query: { type: "string", description: "The main research topic or question." },
446 | search_depth: { type: "string", enum: ["basic", "advanced"], default: "advanced", description: "Depth of the initial Tavily search ('basic' or 'advanced')." },
447 | topic: { type: "string", enum: ["general", "news"], default: "general", description: "Category for the Tavily search ('general' or 'news')." },
448 | days: { type: "number", description: "For 'news' topic: number of days back from current date to include results." },
449 | time_range: { type: "string", description: "Time range for search results (e.g., 'd' for day, 'w' for week, 'm' for month, 'y' for year)." },
450 | max_search_results: { type: "number", default: 7, minimum: 1, maximum: 20, description: "Max search results to retrieve for crawling (1-20). Can be set via MAX_SEARCH_RESULTS environment variable." },
451 | chunks_per_source: { type: "number", default: 3, minimum: 1, maximum: 3, description: "For 'advanced' search: number of content chunks from each source (1-3)." },
452 | include_search_images: { type: "boolean", default: false, description: "Include image URLs from initial search results." },
453 | include_search_image_descriptions: { type: "boolean", default: false, description: "Include image descriptions from initial search results." },
454 | include_answer: {
455 | anyOf: [
456 | { type: "boolean" },
457 | { type: "string", enum: ["basic", "advanced"] }
458 | ],
459 | default: false,
460 | description: "Include an LLM-generated answer from Tavily search (true implies 'basic')."
461 | },
462 | include_raw_content_search: { type: "boolean", default: false, description: "Include cleaned HTML from initial search results." },
463 | include_domains_search: { type: "array", items: { type: "string" }, default: [], description: "List of domains to specifically include in search." },
464 | exclude_domains_search: { type: "array", items: { type: "string" }, default: [], description: "List of domains to specifically exclude from search." },
465 | search_timeout: { type: "number", default: 60, description: "Timeout in seconds for Tavily search requests. Can be set via SEARCH_TIMEOUT environment variable." },
466 | crawl_max_depth: { type: "number", default: 1, description: "Max crawl depth from base URL (1-2). Higher values increase processing time significantly. Can be set via CRAWL_MAX_DEPTH environment variable." },
467 | crawl_max_breadth: { type: "number", default: 10, description: "Max links to follow per page level during crawl (1-10)." },
468 | crawl_limit: { type: "number", default: 10, description: "Total links crawler will process per root URL (1-20). Can be set via CRAWL_LIMIT environment variable." },
469 | crawl_instructions: { type: "string", description: "Natural language instructions for the crawler." },
470 | crawl_select_paths: { type: "array", items: { type: "string" }, default: [], description: "Regex for URL paths to crawl (e.g., '/docs/.*')." },
471 | crawl_select_domains: { type: "array", items: { type: "string" }, default: [], description: "Regex for domains/subdomains to crawl (e.g., '^docs\\.example\\.com$'). Overrides auto-domain focus." },
472 | crawl_exclude_paths: { type: "array", items: { type: "string" }, default: [], description: "Regex for URL paths to exclude." },
473 | crawl_exclude_domains: { type: "array", items: { type: "string" }, default: [], description: "Regex for domains/subdomains to exclude." },
474 | crawl_allow_external: { type: "boolean", default: false, description: "Allow crawler to follow links to external domains." },
475 | crawl_include_images: { type: "boolean", default: false, description: "Extract image URLs from crawled pages." },
476 | crawl_categories: { type: "array", items: { type: "string" }, default: [], description: "Filter crawl URLs by categories (e.g., 'Blog', 'Documentation')." },
477 | crawl_extract_depth: { type: "string", enum: ["basic", "advanced"], default: "basic", description: "Extraction depth for crawl ('basic' or 'advanced')." },
478 | crawl_timeout: { type: "number", default: 180, description: "Timeout in seconds for Tavily crawl requests. Can be set via CRAWL_TIMEOUT environment variable." },
479 | documentation_prompt: {
480 | type: "string",
481 | description: "Optional. Custom prompt for LLM documentation generation. Overrides 'DOCUMENTATION_PROMPT' env var and default. If none set, a comprehensive default is used.",
482 | },
483 | hardware_acceleration: { type: "boolean", default: false, description: "Try to use hardware acceleration (WebGPU) if available." },
484 | },
485 | required: ["query"],
486 | },
487 | },
488 | {
489 | name: "write-research-file",
490 | description: `Write research content to a file. This tool allows you to save research findings, documentation, or any text content to a specified file path.
491 |
492 | SECURITY: File writing is controlled by environment variables:
493 | - FILE_WRITE_ENABLED must be set to 'true' to enable file writing
494 | - ALLOWED_WRITE_PATHS can specify allowed directories (comma-separated)
495 | - If no ALLOWED_WRITE_PATHS specified, defaults to user's home directory
496 | - FILE_WRITE_LINE_LIMIT controls maximum lines per write operation (default: 200)
497 |
498 | Use this tool to save research reports, documentation, or any content generated from the deep-research-tool results.`,
499 | inputSchema: {
500 | type: "object",
501 | properties: {
502 | file_path: {
503 | type: "string",
504 | description: "Path where the file should be written. Must be within allowed directories."
505 | },
506 | content: {
507 | type: "string",
508 | description: "Content to write to the file."
509 | },
510 | mode: {
511 | type: "string",
512 | enum: ["rewrite", "append"],
513 | default: "rewrite",
514 | description: "Write mode: 'rewrite' to overwrite file, 'append' to add to existing content."
515 | },
516 | },
517 | required: ["file_path", "content"],
518 | },
519 | },
520 | ];
521 | return { tools };
522 | });
523 |
524 | this.server.setRequestHandler(
525 | CallToolRequestSchema,
526 | async (request) => {
527 | if (!request.params || typeof request.params.name !== 'string' || typeof request.params.arguments !== 'object') {
528 | console.error("Invalid CallToolRequest structure:", request);
529 | throw new McpError(ErrorCode.InvalidParams, "Invalid tool call request structure.");
530 | }
531 |
532 | if (request.params.name === "write-research-file") {
533 | return await this.handleWriteResearchFile(request.params.arguments);
534 | } else if (request.params.name === "deep-research-tool") {
535 | return await this.handleDeepResearchTool(request.params.arguments);
536 | } else {
537 | throw new McpError(ErrorCode.MethodNotFound, `Unknown tool: ${request.params.name}`);
538 | }
539 | }
540 | );
541 | }
542 |
543 | private async handleWriteResearchFile(rawArgs: any): Promise<any> {
544 | if (!ENV_FILE_WRITE_ENABLED) {
545 | throw new McpError(ErrorCode.InvalidParams, "File writing is disabled. Set FILE_WRITE_ENABLED=true to enable this feature.");
546 | }
547 |
548 | const args: WriteResearchFileArguments = {
549 | file_path: typeof rawArgs.file_path === 'string' ? rawArgs.file_path : '',
550 | content: typeof rawArgs.content === 'string' ? rawArgs.content : '',
551 | mode: rawArgs.mode === 'append' ? 'append' : 'rewrite',
552 | };
553 |
554 | if (!args.file_path || !args.content) {
555 | throw new McpError(ErrorCode.InvalidParams, "Both file_path and content are required.");
556 | }
557 |
558 | try {
559 | await writeResearchFile(args.file_path, args.content, args.mode);
560 |
561 | const successMessage = `Successfully ${args.mode === 'append' ? 'appended to' : 'wrote'} file: ${args.file_path}`;
562 | console.error(successMessage);
563 |
564 | return {
565 | content: [{
566 | type: "text",
567 | text: JSON.stringify({
568 | success: true,
569 | message: successMessage,
570 | file_path: args.file_path,
571 | mode: args.mode,
572 | content_length: args.content.length,
573 | line_count: args.content.split('\n').length
574 | }, null, 2)
575 | }]
576 | };
577 | } catch (error: any) {
578 | const errorMessage = error.message || 'Failed to write file';
579 | console.error(`File write error: ${errorMessage}`);
580 |
581 | return {
582 | content: [{
583 | type: "text",
584 | text: JSON.stringify({
585 | success: false,
586 | error: errorMessage,
587 | file_path: args.file_path,
588 | mode: args.mode
589 | }, null, 2)
590 | }],
591 | isError: true
592 | };
593 | }
594 | }
595 |
596 | private async handleDeepResearchTool(rawArgs: any): Promise<any> {
597 | const args: DeepResearchToolArguments = {
598 | query: typeof rawArgs.query === 'string' ? rawArgs.query : '',
599 | search_depth: rawArgs.search_depth as "basic" | "advanced" | undefined,
600 | topic: rawArgs.topic as "general" | "news" | undefined,
601 | days: rawArgs.days as number | undefined,
602 | time_range: rawArgs.time_range as string | undefined,
603 | max_search_results: rawArgs.max_search_results as number | undefined,
604 | chunks_per_source: rawArgs.chunks_per_source as number | undefined,
605 | include_search_images: rawArgs.include_search_images as boolean | undefined,
606 | include_search_image_descriptions: rawArgs.include_search_image_descriptions as boolean | undefined,
607 | include_answer: rawArgs.include_answer as boolean | "basic" | "advanced" | undefined,
608 | include_raw_content_search: rawArgs.include_raw_content_search as boolean | undefined,
609 | include_domains_search: rawArgs.include_domains_search as string[] | undefined,
610 | exclude_domains_search: rawArgs.exclude_domains_search as string[] | undefined,
611 | search_timeout: rawArgs.search_timeout as number | undefined,
612 | crawl_max_depth: rawArgs.crawl_max_depth as number | undefined,
613 | crawl_max_breadth: rawArgs.crawl_max_breadth as number | undefined,
614 | crawl_limit: rawArgs.crawl_limit as number | undefined,
615 | crawl_instructions: rawArgs.crawl_instructions as string | undefined,
616 | crawl_select_paths: rawArgs.crawl_select_paths as string[] | undefined,
617 | crawl_select_domains: rawArgs.crawl_select_domains as string[] | undefined,
618 | crawl_exclude_paths: rawArgs.crawl_exclude_paths as string[] | undefined,
619 | crawl_exclude_domains: rawArgs.crawl_exclude_domains as string[] | undefined,
620 | crawl_allow_external: rawArgs.crawl_allow_external as boolean | undefined,
621 | crawl_include_images: rawArgs.crawl_include_images as boolean | undefined,
622 | crawl_categories: rawArgs.crawl_categories as string[] | undefined,
623 | crawl_extract_depth: rawArgs.crawl_extract_depth as "basic" | "advanced" | undefined,
624 | crawl_timeout: rawArgs.crawl_timeout as number | undefined,
625 | documentation_prompt: rawArgs.documentation_prompt as string | undefined,
626 | hardware_acceleration: rawArgs.hardware_acceleration as boolean | undefined,
627 | };
628 |
629 | if (!args.query) {
630 | throw new McpError(ErrorCode.InvalidParams, "The 'query' argument is required and must be a non-empty string.");
631 | }
632 |
633 | let finalDocumentationPrompt = DEFAULT_DOCUMENTATION_PROMPT;
634 | if (ENV_DOCUMENTATION_PROMPT) {
635 | finalDocumentationPrompt = ENV_DOCUMENTATION_PROMPT;
636 | }
637 | if (args.documentation_prompt) {
638 | finalDocumentationPrompt = args.documentation_prompt;
639 | }
640 |
641 | try {
642 | // Check if hardware acceleration is requested for this specific call
643 | if (args.hardware_acceleration) {
644 | console.error("Hardware acceleration requested for this research query");
645 | try {
646 | // Try to enable Node.js flags for GPU if not already enabled
647 | process.env.NODE_OPTIONS = process.env.NODE_OPTIONS || '';
648 | if (!process.env.NODE_OPTIONS.includes('--enable-webgpu')) {
649 | process.env.NODE_OPTIONS += ' --enable-webgpu';
650 | console.error("Added WebGPU flag to Node options");
651 | } else {
652 | console.error("WebGPU flag already present in Node options");
653 | }
654 | } catch (err) {
655 | console.error("Failed to set hardware acceleration:", err);
656 | }
657 | }
658 |
659 | // Convert our parameters to Tavily Search API format
660 | const searchParams: TavilySearchParams = {
661 | query: args.query,
662 | searchDepth: args.search_depth ?? "advanced",
663 | topic: args.topic ?? "general",
664 | maxResults: args.max_search_results ?? ENV_MAX_SEARCH_RESULTS ?? 7,
665 | includeImages: args.include_search_images ?? false,
666 | includeImageDescriptions: args.include_search_image_descriptions ?? false,
667 | includeAnswer: args.include_answer ?? false,
668 | includeRawContent: args.include_raw_content_search ?? false,
669 | includeDomains: args.include_domains_search ?? [],
670 | excludeDomains: args.exclude_domains_search ?? [],
671 | timeout: args.search_timeout ?? ENV_SEARCH_TIMEOUT ?? 60,
672 | };
673 |
674 | if (searchParams.searchDepth === "advanced" && (args.chunks_per_source !== undefined && args.chunks_per_source !== null)) {
675 | searchParams.chunksPerSource = args.chunks_per_source;
676 | }
677 | if (searchParams.topic === "news" && (args.days !== undefined && args.days !== null)) {
678 | searchParams.days = args.days;
679 | }
680 | if (args.time_range) {
681 | searchParams.timeRange = args.time_range;
682 | }
683 |
684 | console.error("Tavily Search API Parameters:", JSON.stringify(searchParams, null, 2));
685 | // Set search timeout for faster response
686 | const searchTimeout = args.search_timeout ?? ENV_SEARCH_TIMEOUT ?? 60; // Default 60 seconds
687 | console.error(`Starting search with timeout: ${searchTimeout}s`);
688 | const startSearchTime = Date.now();
689 |
690 | // Execute search with timeout
691 | let searchResponse: any; // Use any to avoid unknown type errors
692 | try {
693 | searchResponse = await Promise.race([
694 | this.tavilyClient.search(searchParams.query, {
695 | searchDepth: searchParams.searchDepth,
696 | topic: searchParams.topic,
697 | maxResults: searchParams.maxResults,
698 | chunksPerSource: searchParams.chunksPerSource,
699 | includeImages: searchParams.includeImages,
700 | includeImageDescriptions: searchParams.includeImageDescriptions,
701 | // Convert string types to boolean for includeAnswer
702 | includeAnswer: typeof searchParams.includeAnswer === 'boolean' ?
703 | searchParams.includeAnswer : false,
704 | includeRawContent: searchParams.includeRawContent,
705 | includeDomains: searchParams.includeDomains,
706 | excludeDomains: searchParams.excludeDomains,
707 | // Fix timeRange to match allowed values
708 | timeRange: (searchParams.timeRange as "year" | "month" | "week" | "day" | "y" | "m" | "w" | "d" | undefined),
709 | days: searchParams.days
710 | }),
711 | new Promise((_, reject) =>
712 | setTimeout(() => reject(new Error(`Search timeout after ${searchTimeout}s`)), searchTimeout * 1000)
713 | )
714 | ]);
715 | console.error(`Search completed in ${((Date.now() - startSearchTime) / 1000).toFixed(1)}s`);
716 | } catch (error: any) {
717 | console.error(`Search error: ${error.message}`);
718 | throw error;
719 | }
720 |
721 | const combinedResults: CombinedResult[] = [];
722 | let searchRank = 1;
723 |
724 | if (!searchResponse.results || searchResponse.results.length === 0) {
725 | const noResultsOutput = JSON.stringify({
726 | documentation_instructions: finalDocumentationPrompt,
727 | original_query: args.query,
728 | search_summary: searchResponse.answer || `No search results found for query: "${args.query}".`,
729 | research_data: [],
730 | }, null, 2);
731 | return {
732 | content: [{ type: "text", text: noResultsOutput }]
733 | };
734 | }
735 |
736 | for (const searchResult of searchResponse.results) {
737 | if (!searchResult.url) {
738 | console.warn(`Search result "${searchResult.title}" missing URL, skipping crawl.`);
739 | continue;
740 | }
741 |
742 | // Ensure crawl parameters are strictly enforced with smaller defaults
743 | const crawlParams: TavilyCrawlParams = {
744 | url: searchResult.url,
745 | maxDepth: Math.min(2, args.crawl_max_depth ?? ENV_CRAWL_MAX_DEPTH ?? 1), // Hard cap at 2, default to 1
746 | maxBreadth: Math.min(10, args.crawl_max_breadth ?? 10), // Hard cap at 10, default to 10 (down from 20)
747 | limit: Math.min(20, args.crawl_limit ?? ENV_CRAWL_LIMIT ?? 10), // Hard cap at 20, default to 10 (down from 50)
748 | instructions: args.crawl_instructions || "",
749 | selectPaths: args.crawl_select_paths ?? [],
750 | selectDomains: args.crawl_select_domains ?? [],
751 | excludePaths: args.crawl_exclude_paths ?? [],
752 | excludeDomains: args.crawl_exclude_domains ?? [],
753 | allowExternal: args.crawl_allow_external ?? false,
754 | includeImages: args.crawl_include_images ?? false,
755 | categories: (args.crawl_categories ?? []) as TavilyCrawlCategory[],
756 | extractDepth: args.crawl_extract_depth ?? "basic"
757 | };
758 |
759 | // If no select_domains provided and not allowing external domains,
760 | // restrict to the current domain to prevent excessive crawling
761 | if ((!args.crawl_select_domains || args.crawl_select_domains.length === 0) &&
762 | !crawlParams.allowExternal) {
763 | try {
764 | const currentUrlDomain = new URL(searchResult.url).hostname;
765 | crawlParams.selectDomains = [`^${currentUrlDomain.replace(/\./g, "\\.")}$`];
766 | console.error(`Auto-limiting crawl to domain: ${currentUrlDomain}`);
767 | } catch (e: any) {
768 | console.error(`Could not parse URL to limit crawl domain: ${searchResult.url}. Error: ${e.message}`);
769 | }
770 | }
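// Example: for https://docs.example.com/guide the hostname docs.example.com yields the
// pattern ^docs\.example\.com$, so the crawl stays on that host unless
// crawl_allow_external or crawl_select_domains is provided.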
771 |
772 | console.error(`Crawling ${searchResult.url} with maxDepth=${crawlParams.maxDepth}, maxBreadth=${crawlParams.maxBreadth}, limit=${crawlParams.limit}`);
773 |
774 | // Add memory usage tracking
775 | if (process.memoryUsage) {
776 | const memUsage = process.memoryUsage();
777 | console.error(`Memory usage before crawl: RSS=${Math.round(memUsage.rss / 1024 / 1024)}MB, Heap=${Math.round(memUsage.heapUsed / 1024 / 1024)}MB`);
778 | }
779 |
780 | console.error(`Crawling URL: ${searchResult.url} with params:`, JSON.stringify(crawlParams, null, 2));
781 | const currentCombinedResult: CombinedResult = {
782 | search_rank: searchRank++,
783 | original_url: searchResult.url,
784 | title: searchResult.title,
785 | initial_content_snippet: searchResult.content,
786 | search_score: searchResult.score,
787 | published_date: searchResult.publishedDate,
788 | crawled_data: [],
789 | crawl_errors: [],
790 | };
791 |
792 | try {
793 | const startCrawlTime = Date.now();
794 | const crawlTimeout = args.crawl_timeout ?? ENV_CRAWL_TIMEOUT ?? 180; // Default 3 minutes
795 | console.error(`Starting crawl with timeout: ${crawlTimeout}s`);
796 |
797 | // Progress tracking for the crawl
798 | let progressTimer = setInterval(() => {
799 | const elapsedSec = (Date.now() - startCrawlTime) / 1000;
800 | console.error(`Crawl in progress... (${elapsedSec.toFixed(0)}s elapsed)`);
801 | }, 15000); // Report every 15 seconds
802 |
803 | // Ensure timer is always cleared
804 | let crawlResponse: any; // Use any to avoid unknown type errors
805 | try {
806 | // Execute crawl with timeout
807 | crawlResponse = await Promise.race([
808 | this.tavilyClient.crawl(crawlParams.url, {
809 | // Ensure all parameters have non-undefined values to match API requirements
810 | maxDepth: crawlParams.maxDepth ?? 1,
811 | maxBreadth: crawlParams.maxBreadth ?? 10,
812 | limit: crawlParams.limit ?? 10,
813 | instructions: crawlParams.instructions ?? "",
814 | selectPaths: crawlParams.selectPaths ?? [],
815 | selectDomains: crawlParams.selectDomains ?? [],
816 | excludePaths: crawlParams.excludePaths ?? [],
817 | excludeDomains: crawlParams.excludeDomains ?? [],
818 | allowExternal: crawlParams.allowExternal ?? false,
819 | includeImages: crawlParams.includeImages ?? false,
820 | // Cast categories to the proper type
821 | categories: (crawlParams.categories ?? []) as TavilyCrawlCategory[],
822 | extractDepth: crawlParams.extractDepth ?? "basic",
823 | // Add the required timeout parameter
824 | timeout: args.crawl_timeout ?? ENV_CRAWL_TIMEOUT ?? 180
825 | }),
826 | new Promise((_, reject) =>
827 | setTimeout(() => reject(new Error(`Crawl timeout after ${crawlTimeout}s`)), crawlTimeout * 1000)
828 | )
829 | ]);
830 | } catch (err) {
831 | clearInterval(progressTimer); // Clear timer on error
832 | throw err; // Re-throw to be caught by outer try/catch
833 | }
834 |
835 | // Clear the progress timer once the crawl is complete
836 | clearInterval(progressTimer);
837 |
838 | console.error(`Crawl completed in ${((Date.now() - startCrawlTime) / 1000).toFixed(1)}s`);
839 |
840 | if (crawlResponse.results && crawlResponse.results.length > 0) {
841 | crawlResponse.results.forEach((page: any) => {
842 | currentCombinedResult.crawled_data.push({
843 | url: page.url,
844 | raw_content: page.rawContent || null,
845 | images: page.images || [],
846 | });
847 | });
848 | } else {
849 | currentCombinedResult.crawl_errors.push(`No content pages returned from crawl of ${searchResult.url}.`);
850 | }
851 |
852 | // After crawl completes, log memory usage
853 | if (process.memoryUsage) {
854 | const memUsage = process.memoryUsage();
855 | console.error(`Memory usage after crawl: RSS=${Math.round(memUsage.rss / 1024 / 1024)}MB, Heap=${Math.round(memUsage.heapUsed / 1024 / 1024)}MB`);
856 |
857 | // Force garbage collection if available and memory usage is high
858 | if (memUsage.heapUsed > 500 * 1024 * 1024 && global.gc) {
859 | console.error("Memory usage high, forcing garbage collection");
860 | try {
861 | global.gc();
862 | } catch (e) {
863 | console.error("Failed to force garbage collection", e);
864 | }
865 | }
866 | }
867 | } catch (crawlError: any) {
868 | const errorMessage = crawlError.response?.data?.error || crawlError.message || 'Unknown crawl error';
869 | console.error(`Error crawling ${searchResult.url}:`, errorMessage, crawlError.stack);
870 | currentCombinedResult.crawl_errors.push(
871 | `Failed to crawl ${searchResult.url}: ${errorMessage}`
872 | );
873 | }
874 | combinedResults.push(currentCombinedResult);
875 | }
876 |
877 | const outputText = JSON.stringify({
878 | documentation_instructions: finalDocumentationPrompt,
879 | original_query: args.query,
880 | search_summary: searchResponse.answer || null,
881 | research_data: combinedResults,
882 | }, null, 2);
883 |
884 | return {
885 | content: [{ type: "text", text: outputText }]
886 | };
887 |
888 | } catch (error: any) {
889 | const errorMessage = error.response?.data?.error || error.message || 'An unexpected error occurred in deep-research-tool.';
890 | console.error("[DeepResearchTool Error]", errorMessage, error.stack);
891 |
892 | const errorOutput = JSON.stringify({
893 | documentation_instructions: finalDocumentationPrompt,
894 | error: errorMessage,
895 | original_query: args.query,
896 | }, null, 2);
897 |
898 | return {
899 | content: [{ type: "text", text: errorOutput }],
900 | isError: true
901 | };
902 | }
903 | }
904 |
905 | public async run(): Promise<void> {
906 | const transport = new StdioServerTransport();
907 | await this.server.connect(transport);
908 |
909 | // Check if we should try to enable hardware acceleration
910 | const useHardwareAcceleration = process.env.HARDWARE_ACCELERATION === 'true';
911 | if (useHardwareAcceleration) {
912 | console.error("Attempting to enable hardware acceleration");
913 | try {
914 | // Try to enable Node.js flags for GPU
915 | process.env.NODE_OPTIONS = process.env.NODE_OPTIONS || '';
916 | if (!process.env.NODE_OPTIONS.includes('--enable-webgpu')) {
917 | process.env.NODE_OPTIONS += ' --enable-webgpu';
918 | }
919 | } catch (err) {
920 | console.error("Failed to set hardware acceleration:", err);
921 | }
922 | }
923 |
924 | console.error(
925 | "Deep Research MCP Server (deep-research-mcp) is running and connected via stdio.\n" +
926 | `Documentation prompt source: ` +
927 | (process.env.npm_config_documentation_prompt || ENV_DOCUMENTATION_PROMPT ? `Environment Variable ('DOCUMENTATION_PROMPT')` : `Default (can be overridden by tool argument)`) +
928 | `.\n` +
929 | `Timeout configuration: ` +
930 | `Search=${ENV_SEARCH_TIMEOUT || 60}s, Crawl=${ENV_CRAWL_TIMEOUT || 180}s` +
931 | (ENV_SEARCH_TIMEOUT || ENV_CRAWL_TIMEOUT ? ` (from environment variables)` : ` (defaults)`) +
932 | `.\n` +
933 | `Limits configuration: ` +
934 | `MaxResults=${ENV_MAX_SEARCH_RESULTS || 7}, CrawlDepth=${ENV_CRAWL_MAX_DEPTH || 1}, CrawlLimit=${ENV_CRAWL_LIMIT || 10}` +
935 | (ENV_MAX_SEARCH_RESULTS || ENV_CRAWL_MAX_DEPTH || ENV_CRAWL_LIMIT ? ` (from environment variables)` : ` (defaults)`) +
936 | `.\n` +
937 | `File writing: ` +
938 | (ENV_FILE_WRITE_ENABLED ? `Enabled` : `Disabled`) +
939 | (ENV_FILE_WRITE_ENABLED ? ` (LineLimit=${ENV_FILE_WRITE_LINE_LIMIT}, AllowedPaths=${ENV_ALLOWED_WRITE_PATHS ? ENV_ALLOWED_WRITE_PATHS.join(',') : 'user home directory'})` : ` (set FILE_WRITE_ENABLED=true to enable)`) +
940 | `.\n` +
941 | "Awaiting requests..."
942 | );
943 | }
944 | }
945 |
946 | // Main execution block for running the server directly
947 | if (require.main === module || (typeof module !== 'undefined' && !module.parent)) {
948 | const deepResearchServer = new DeepResearchMcpServer();
949 | deepResearchServer
950 | .run()
951 | .catch((err) => {
952 | console.error("Failed to start or run Deep Research MCP Server:", err.stack || err);
953 | process.exit(1);
954 | });
955 | }
956 |
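// Programmatic use sketch (illustrative; assumes TAVILY_API_KEY is set in the environment):
//   import { DeepResearchMcpServer } from "@pinkpixel/deep-research-mcp";
//   const server = new DeepResearchMcpServer();
//   await server.run(); // connects over stdio and awaits MCP requests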
957 | export { DeepResearchMcpServer }; // Export for potential programmatic use
```