# Directory Structure
```
├── .gitignore
├── assets
│ └── deep-research-mcp-logo.png
├── CHANGELOG.md
├── Dockerfile
├── example_config.json
├── LICENSE
├── OVERVIEW.md
├── package-lock.json
├── package.json
├── README.md
├── smithery.yaml
├── src
│ └── index.ts
├── test-file-writing.js
├── test-output
│ └── test-summary.md
└── tsconfig.json
```
# Files
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
# Dependencies
node_modules/
npm-debug.log
yarn-debug.log
yarn-error.log
# Build outputs
dist/
# Environment variables
.env
.env.local
.env.development.local
.env.test.local
.env.production.local
# Configuration
config.json
# OS specific files
.DS_Store
Thumbs.db
# Editor specific files
.idea/
.vs/
.vscode/
*.swp
*.swo
Tavily API Reference.md
Tavily Javascript SDK.md
# Other
reference-mcptools/
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
<p align="center">
<img src="assets/deep-research-mcp-logo.png" alt="Deep Research MCP Logo" width="250" height="250">
</p>
<h1 align="center">Deep Research MCP Server</h1>
<p align="center">
<a href="https://www.npmjs.com/package/@pinkpixel/deep-research-mcp"><img src="https://img.shields.io/npm/v/@pinkpixel/deep-research-mcp.svg" alt="NPM Version"></a>
<a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a>
<a href="https://smithery.ai/server/@pinkpixel/dev-deep-research-mcp"><img src="https://smithery.ai/badge/@pinkpixel/dev-deep-research-mcp" alt="Smithery Installs"></a>
</p>
The Deep Research MCP Server is a Model Context Protocol (MCP) compliant server designed to perform comprehensive web research. It leverages Tavily's powerful Search and new Crawl APIs to gather extensive, up-to-date information on a given topic. The server then aggregates this data along with documentation generation instructions into a structured JSON output, perfectly tailored for Large Language Models (LLMs) to create detailed and high-quality markdown documents.
## Features
* **Multi-Step Research:** Combines Tavily's AI-powered web search with deep content crawling for thorough information gathering.
* **Structured JSON Output:** Provides well-organized data (original query, search summary, detailed findings per source, and documentation instructions) optimized for LLM consumption.
* **Configurable Documentation Prompt:** Includes a comprehensive default prompt for generating high-quality technical documentation. This prompt can be:
* Overridden by setting the `DOCUMENTATION_PROMPT` environment variable.
* Further overridden by passing a `documentation_prompt` argument directly to the tool.
* **Configurable Output Path:** Specify where research documents and images should be saved through:
* Environment variable configuration
* JSON configuration
* Direct parameter in tool calls
* **Granular Control:** Offers a wide range of parameters to fine-tune both the search and crawl processes.
* **MCP Compliant:** Designed to integrate seamlessly into MCP-based AI agent ecosystems.
## Prerequisites
* [Node.js](https://nodejs.org/) (version 18.x or later recommended)
* [npm](https://www.npmjs.com/) (comes with Node.js) or [Yarn](https://yarnpkg.com/)
## Installation
### Installing via Smithery
To install deep-research-mcp for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@pinkpixel/dev-deep-research-mcp):
```bash
npx -y @smithery/cli install @pinkpixel/dev-deep-research-mcp --client claude
```
### Option 1: Using with NPX (Recommended for quick use)
You can run the server directly using `npx` without a global installation:
```bash
npx @pinkpixel/deep-research-mcp
```
### Option 2: Global Installation (Optional)
```bash
npm install -g @pinkpixel/deep-research-mcp
```
Then you can run it using:
```bash
deep-research-mcp
```
### Option 3: Local Project Integration or Development
1. Clone the repository (if you want to modify or contribute):
```bash
git clone https://github.com/your-username/deep-research-mcp.git
cd deep-research-mcp
```
2. Install dependencies:
```bash
npm install
```
## Configuration
The server requires a Tavily API key and supports several optional settings. All of them can be supplied as environment variables in your MCP client configuration, for example:
```json
{
"mcpServers": {
"deep-research": {
"command": "npx",
"args": [
"-y",
"@pinkpixel/deep-research-mcp"
],
"env": {
"TAVILY_API_KEY": "tvly-YOUR_ACTUAL_API_KEY_HERE", // Required
"DOCUMENTATION_PROMPT": "Your custom, detailed instructions for the LLM on how to generate markdown documents from the research data...", // Optional - if not provided, the default prompt will be used
"SEARCH_TIMEOUT": "120", // Optional - timeout in seconds for search requests (default: 60)
"CRAWL_TIMEOUT": "300", // Optional - timeout in seconds for crawl requests (default: 180)
"MAX_SEARCH_RESULTS": "10", // Optional - maximum search results to retrieve (default: 7)
"CRAWL_MAX_DEPTH": "2", // Optional - maximum crawl depth (default: 1)
"CRAWL_LIMIT": "15", // Optional - maximum URLs to crawl per source (default: 10)
"FILE_WRITE_ENABLED": "true", // Optional - enable file writing capability (default: false)
"ALLOWED_WRITE_PATHS": "/home/user/research,/home/user/documents", // Optional - comma-separated allowed directories (default: user home directory)
"FILE_WRITE_LINE_LIMIT": "300" // Optional - maximum lines per file write operation (default: 200)
}
}
}
}
```
### 1\. Tavily API Key (Required)
Set the `TAVILY_API_KEY` environment variable to your Tavily API key.
**Methods:**
* **`.env` file:** Create a `.env` file in the project root (if running locally for development):
```env
TAVILY_API_KEY="tvly-YOUR_ACTUAL_API_KEY"
```
* **Directly in command line:**
```bash
TAVILY_API_KEY="tvly-YOUR_ACTUAL_API_KEY" npx @pinkpixel/deep-research-mcp
```
* **System Environment Variable:** Set it in your operating system's environment variables.
### 2\. Custom Documentation Prompt (Optional)
You can override the default comprehensive documentation prompt by setting the `DOCUMENTATION_PROMPT` environment variable.
**Methods (in order of precedence):**
1. **Tool Argument:** The `documentation_prompt` parameter passed when calling the `deep-research-tool` takes highest precedence
2. **Environment Variable:** If no parameter is provided in the tool call, the system checks for a `DOCUMENTATION_PROMPT` environment variable
3. **Default Value:** If neither of the above are set, the comprehensive built-in default prompt is used
**Setting via `.env` file:**
```env
DOCUMENTATION_PROMPT="Your custom, detailed instructions for the LLM on how to generate markdown..."
```
**Or directly in command line:**
```bash
DOCUMENTATION_PROMPT="Your custom prompt..." TAVILY_API_KEY="tvly-YOUR_KEY" npx @pinkpixel/deep-research-mcp
```
### 3\. Output Path Configuration (Optional)
You can specify where research documents and images should be saved. If not configured, a default path in the user's Documents folder with a timestamp will be used.
**Methods (in order of precedence):**
1. **Tool Argument:** The `output_path` parameter passed when calling the `deep-research-tool` takes highest precedence
2. **Environment Variable:** If no parameter is provided in the tool call, the system checks for a `RESEARCH_OUTPUT_PATH` environment variable
3. **Default Path:** If neither of the above are set, a timestamped subfolder in the user's Documents folder is used: `~/Documents/research/YYYY-MM-DDTHH-MM-SS/`
**Setting via `.env` file:**
```env
RESEARCH_OUTPUT_PATH="/path/to/your/research/folder"
```
**Or directly in command line:**
```bash
RESEARCH_OUTPUT_PATH="/path/to/your/research/folder" TAVILY_API_KEY="tvly-YOUR_KEY" npx @pinkpixel/deep-research-mcp
```
### 4\. Timeout and Performance Configuration (Optional)
You can configure timeout and performance settings via environment variables to optimize the tool for your specific use case or deployment environment:
**Available Environment Variables:**
- `SEARCH_TIMEOUT` - Timeout in seconds for Tavily search requests (default: 60)
- `CRAWL_TIMEOUT` - Timeout in seconds for Tavily crawl requests (default: 180)
- `MAX_SEARCH_RESULTS` - Maximum number of search results to retrieve (default: 7)
- `CRAWL_MAX_DEPTH` - Maximum crawl depth from base URL (default: 1)
- `CRAWL_LIMIT` - Maximum number of URLs to crawl per source (default: 10)
**Setting via `.env` file:**
```env
SEARCH_TIMEOUT=120
CRAWL_TIMEOUT=300
MAX_SEARCH_RESULTS=10
CRAWL_MAX_DEPTH=2
CRAWL_LIMIT=15
```
**Or directly in command line:**
```bash
SEARCH_TIMEOUT=120 CRAWL_TIMEOUT=300 TAVILY_API_KEY="tvly-YOUR_KEY" npx @pinkpixel/deep-research-mcp
```
**When to adjust these settings:**
- **Increase timeouts** if you're experiencing timeout errors in LibreChat or other MCP clients
- **Decrease timeouts** for faster responses when working with simpler queries
- **Increase limits** for more comprehensive research (but expect longer processing times)
- **Decrease limits** for faster processing with lighter resource usage
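Internally, each of these variables is read once at startup and parsed as an integer, falling back to the documented default when unset. The following is a simplified TypeScript sketch of that pattern (the actual parsing lives in `src/index.ts`):
```typescript
// Simplified sketch: read a numeric environment variable with a fallback default.
function numberFromEnv(name: string, fallback: number): number {
  const raw = process.env[name];
  const parsed = raw ? parseInt(raw, 10) : NaN;
  return Number.isNaN(parsed) ? fallback : parsed;
}

const searchTimeout = numberFromEnv("SEARCH_TIMEOUT", 60);      // seconds
const crawlTimeout = numberFromEnv("CRAWL_TIMEOUT", 180);       // seconds
const maxSearchResults = numberFromEnv("MAX_SEARCH_RESULTS", 7);
const crawlMaxDepth = numberFromEnv("CRAWL_MAX_DEPTH", 1);
const crawlLimit = numberFromEnv("CRAWL_LIMIT", 10);
```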
### 5\. File Writing Configuration (Optional)
The server includes a secure file writing tool that allows LLMs to save research findings directly to files. This feature is **disabled by default** for security reasons.
**Security Features:**
- File writing must be explicitly enabled via `FILE_WRITE_ENABLED=true`
- Directory restrictions via `ALLOWED_WRITE_PATHS` (defaults to user home directory)
- Line limits per write operation to prevent abuse
- Path validation and sanitization
- Automatic directory creation
**Configuration:**
```env
FILE_WRITE_ENABLED=true
ALLOWED_WRITE_PATHS=/home/user/research,/home/user/documents,/tmp/research
FILE_WRITE_LINE_LIMIT=500
```
**Usage Example:**
Once enabled, LLMs can use the `write-research-file` tool to save content:
```json
{
"tool": "write-research-file",
"arguments": {
"file_path": "/home/user/research/quantum-computing-report.md",
"content": "# Quantum Computing Research Report\n\n...",
"mode": "rewrite"
}
}
```
**Security Considerations:**
- Only enable file writing in trusted environments
- Use specific directory restrictions rather than allowing system-wide access
- Monitor file operations through server logs
- Consider using read-only directories for sensitive systems
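Because each write is capped by `FILE_WRITE_LINE_LIMIT`, long documents should be written in several calls: the first with `mode: "rewrite"`, the rest with `mode: "append"`. Below is a minimal TypeScript sketch of that chunking on the agent side; `callWriteResearchFile` is a placeholder for however your agent issues the `write-research-file` call:
```typescript
// Sketch: split a long markdown document into pieces that respect FILE_WRITE_LINE_LIMIT.
type WriteMode = "rewrite" | "append";
type WriteFn = (args: { file_path: string; content: string; mode: WriteMode }) => Promise<void>;

async function writeInChunks(
  filePath: string,
  markdown: string,
  lineLimit: number,
  callWriteResearchFile: WriteFn // placeholder for your agent's MCP tool invocation
): Promise<void> {
  const lines = markdown.split("\n");
  for (let start = 0; start < lines.length; start += lineLimit) {
    const chunk = lines.slice(start, start + lineLimit).join("\n");
    await callWriteResearchFile({
      file_path: filePath,
      // Re-add the newline that split() removed at each chunk boundary.
      content: start === 0 ? chunk : "\n" + chunk,
      mode: start === 0 ? "rewrite" : "append",
    });
  }
}
```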
## Running the Server
* **Development (with auto-reload):**
If you've cloned the repository and are in the project directory:
```bash
npm run dev
```
This uses `nodemon` and `ts-node` to watch for changes and restart the server.
* **Production/Standalone:**
First, build the TypeScript code:
```bash
npm run build
```
Then, start the server:
```bash
npm start
```
* **With NPX or Global Install:**
(Ensure environment variables are set as described in Configuration)
```bash
npx @pinkpixel/deep-research-mcp
```
or if globally installed:
```bash
deep-research-mcp
```
The server will listen for MCP requests on stdio.
## How It Works
1. An LLM or AI agent makes a `CallToolRequest` to this MCP server, specifying the `deep-research-tool` and providing a query and other optional parameters.
2. The `deep-research-tool` first performs a Tavily Search to find relevant web sources.
3. It then uses Tavily Crawl to extract detailed content from each of these sources.
4. All gathered information (search snippets, crawled content, image URLs) is aggregated.
5. The chosen documentation prompt (default, ENV, or tool argument) is included.
6. The server returns a single JSON string containing all this structured data.
7. The calling LLM/agent uses this JSON output, guided by the `documentation_instructions`, to generate a comprehensive markdown document.
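As an illustration of step 1, an agent written in TypeScript could spawn this server over stdio and call the tool with the client classes from `@modelcontextprotocol/sdk`. This is a sketch, not code from this repository; it assumes the SDK's stdio client API and that a valid `TAVILY_API_KEY` is available in the environment:
```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function runDeepResearch(query: string): Promise<string> {
  // Spawn the server over stdio, forwarding the required Tavily key.
  const transport = new StdioClientTransport({
    command: "npx",
    args: ["-y", "@pinkpixel/deep-research-mcp"],
    env: { TAVILY_API_KEY: process.env.TAVILY_API_KEY ?? "" },
  });

  const client = new Client({ name: "example-agent", version: "1.0.0" });
  await client.connect(transport);

  // Call the tool; omitted parameters fall back to their documented defaults.
  const result = await client.callTool({
    name: "deep-research-tool",
    arguments: { query, max_search_results: 5, include_answer: true },
  });

  await client.close();

  // The structured research data is returned as a JSON string in the text content.
  const content = result.content as Array<{ type: string; text: string }>;
  return content[0].text;
}
```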
## Using the `deep-research-tool`
This is the primary tool exposed by the server.
### Output Structure
The tool returns a JSON string with the following structure:
```json
{
"documentation_instructions": "string", // The detailed prompt for the LLM to generate the markdown.
"original_query": "string", // The initial query provided to the tool.
"search_summary": "string | null", // An LLM-generated answer/summary from Tavily's search phase (if include_answer was true).
"research_data": [ // Array of findings, one element per source.
{
"search_rank": "number",
"original_url": "string", // URL of the source found by search.
"title": "string", // Title of the web page.
"initial_content_snippet": "string",// Content snippet from the initial search result.
"search_score": "number | undefined",// Relevance score from Tavily search.
"published_date": "string | undefined",// Publication date (if 'news' topic and available).
"crawled_data": [ // Array of pages crawled starting from original_url.
{
"url": "string", // URL of the specific page crawled.
"raw_content": "string | null", // Rich, extracted content from this page.
"images": ["string", "..."] // Array of image URLs found on this page.
}
],
"crawl_errors": ["string", "..."] // Array of error messages if crawling this source failed or had issues.
}
// ... more sources
],
"output_path": "string" // Path where research documents and images should be saved.
}
```
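On the consuming side, the JSON string can be parsed into a typed shape mirroring the structure above. The TypeScript sketch below is illustrative only; the field names come from the documented output, while the helper itself is not part of this repository:
```typescript
// Types mirroring the documented output structure of the deep-research-tool.
interface CrawledPage {
  url: string;
  raw_content: string | null;
  images?: string[];
}

interface ResearchSource {
  search_rank: number;
  original_url: string;
  title: string;
  initial_content_snippet: string;
  search_score?: number;
  published_date?: string;
  crawled_data: CrawledPage[];
  crawl_errors: string[];
}

interface DeepResearchOutput {
  documentation_instructions: string;
  original_query: string;
  search_summary: string | null;
  research_data: ResearchSource[];
  output_path: string;
}

// `toolResultText` is the JSON string returned by the deep-research-tool.
function summarizeResearch(toolResultText: string): void {
  const result: DeepResearchOutput = JSON.parse(toolResultText);
  console.log(`Query: ${result.original_query}`);
  console.log(`Save outputs to: ${result.output_path}`);
  for (const source of result.research_data) {
    console.log(`[#${source.search_rank}] ${source.title} (${source.crawled_data.length} crawled pages)`);
    if (source.crawl_errors.length > 0) {
      console.log(`  Crawl issues: ${source.crawl_errors.join("; ")}`);
    }
  }
}
```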
### Input Parameters
The `deep-research-tool` accepts the following parameters in its `arguments` object:
#### General Parameters
* `query` (string, **required**): The main research topic or question.
* `documentation_prompt` (string, optional): Custom prompt for LLM documentation generation.
* *Description:* If provided, this prompt will be used by the LLM. It overrides both the `DOCUMENTATION_PROMPT` environment variable and the server's built-in default prompt. If not provided here, the server checks the environment variable, then falls back to the default.
* `output_path` (string, optional): Path where generated research documents and images should be saved.
* *Description:* If provided, this path will be used for saving research outputs. It overrides the `RESEARCH_OUTPUT_PATH` environment variable. If neither is set, a timestamped folder in the user's Documents directory will be used.
#### Search Parameters (for Tavily Search API)
* `search_depth` (string, optional, default: `"advanced"`): Depth of the initial Tavily search.
* *Options:* `"basic"`, `"advanced"`. Advanced search is tailored for more relevant sources.
* `topic` (string, optional, default: `"general"`): Category for the Tavily search.
* *Options:* `"general"`, `"news"`.
* `days` (number, optional): For `topic: "news"`, the number of days back from the current date to include search results.
* `time_range` (string, optional): Time range for search results (e.g., `"d"` for day, `"w"` for week, `"m"` for month, `"y"` for year).
* `max_search_results` (number, optional, default: `7`): Maximum number of search results to retrieve and consider for crawling (1-20).
* `chunks_per_source` (number, optional, default: `3`): For `search_depth: "advanced"`, the number of content chunks to retrieve from each source (1-3).
* `include_search_images` (boolean, optional, default: `false`): Include a list of query-related image URLs from the initial search.
* `include_search_image_descriptions` (boolean, optional, default: `false`): Include image descriptions along with URLs from the initial search.
* `include_answer` (boolean or string, optional, default: `false`): Include an LLM-generated answer from Tavily based on search results.
* *Options:* `true` (implies `"basic"`), `false`, `"basic"`, `"advanced"`.
* `include_raw_content_search` (boolean, optional, default: `false`): Include the cleaned and parsed HTML content of each initial search result.
* `include_domains_search` (array of strings, optional, default: `[]`): A list of domains to specifically include in the search results.
* `exclude_domains_search` (array of strings, optional, default: `[]`): A list of domains to specifically exclude from the search results.
* `search_timeout` (number, optional, default: `60`): Timeout in seconds for Tavily search requests.
#### Crawl Parameters (for Tavily Crawl API - applied to each URL from search)
* `crawl_max_depth` (number, optional, default: `1`): Max depth of the crawl from the base URL. `0` means only the base URL, `1` means the base URL and links found on it, etc.
* `crawl_max_breadth` (number, optional, default: `5`): Max number of links to follow per level of the crawl tree (i.e., per page).
* `crawl_limit` (number, optional, default: `10`): Total number of links the crawler will process starting from a single root URL before stopping.
* `crawl_instructions` (string, optional): Natural language instructions for the crawler for how to approach crawling the site.
* `crawl_select_paths` (array of strings, optional, default: `[]`): Regex patterns to select only URLs with specific path patterns for crawling (e.g., `"/docs/.*"`).
* `crawl_select_domains` (array of strings, optional, default: `[]`): Regex patterns to restrict crawling to specific domains or subdomains (e.g., `"^docs\\.example\\.com$"`). By default (when `crawl_allow_external` is `false` and this list is empty), crawling stays focused on the domain of the URL being crawled; providing patterns here overrides that focus.
* `crawl_exclude_paths` (array of strings, optional, default: `[]`): Regex patterns to exclude URLs with specific path patterns from crawling.
* `crawl_exclude_domains` (array of strings, optional, default: `[]`): Regex patterns to exclude specific domains or subdomains from crawling.
* `crawl_allow_external` (boolean, optional, default: `false`): Whether to allow the crawler to follow links to external domains.
* `crawl_include_images` (boolean, optional, default: `true`): Whether to extract image URLs from the crawled pages.
* `crawl_categories` (array of strings, optional, default: `[]`): Filter URLs for crawling using predefined categories (e.g., `"Blog"`, `"Documentation"`, `"Careers"`). Refer to Tavily Crawl API for all options.
* `crawl_extract_depth` (string, optional, default: `"advanced"`): Depth of content extraction during crawl.
* *Options:* `"basic"`, `"advanced"`. Advanced retrieves more data (tables, embedded content) but may have higher latency.
* `crawl_timeout` (number, optional, default: `180`): Timeout in seconds for each Tavily Crawl request.
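To make the crawl parameters above concrete, here is a hypothetical argument object that keeps the crawl focused on documentation pages (the topic and regex patterns are illustrative, not defaults):
```typescript
// Hypothetical deep-research-tool arguments that focus crawling on documentation pages.
const focusedCrawlArguments = {
  query: "How does the Example SDK handle authentication?", // illustrative topic
  max_search_results: 5,
  crawl_max_depth: 2,                           // follow links two levels down from each source URL
  crawl_limit: 15,                              // process at most 15 pages per source
  crawl_select_paths: ["/docs/.*"],             // regex: only crawl URLs under /docs/
  crawl_exclude_paths: ["/docs/changelog/.*"],  // regex: skip changelog pages
  crawl_categories: ["Documentation"],          // Tavily's predefined category filter
  crawl_extract_depth: "advanced",              // richer extraction (tables, embedded content)
};
```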
## Understanding Documentation Prompt Precedence
The `documentation_prompt` is an essential part of this tool as it guides the LLM in how to format and structure the research findings. The system uses this precedence to determine which prompt to use:
1. If the LLM/agent provides a `documentation_prompt` parameter in the tool call:
- This takes highest precedence and will be used regardless of other settings
- This allows end users to customize documentation format through natural language requests to the LLM
2. If no parameter is provided in the tool call, but the `DOCUMENTATION_PROMPT` environment variable is set:
- The environment variable value will be used
- This is useful for system administrators or developers to set a consistent prompt across all tool calls
3. If neither of the above are set:
- The comprehensive built-in default prompt is used
- This default prompt is designed to produce high-quality technical documentation
This flexibility allows:
- End users to customize documentation through natural language requests to the LLM
- Developers to set system-wide defaults
- A fallback to well-designed defaults if no customization is provided
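Expressed as code, the precedence is just a fallback chain. The sketch below restates the documented behavior; it is not a verbatim excerpt from `src/index.ts`:
```typescript
// Resolve the documentation prompt: tool argument > DOCUMENTATION_PROMPT env var > built-in default.
function resolveDocumentationPrompt(
  toolArgumentPrompt: string | undefined,
  defaultPrompt: string
): string {
  return toolArgumentPrompt ?? process.env.DOCUMENTATION_PROMPT ?? defaultPrompt;
}
```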
## Working with Output Paths
The `output_path` parameter determines where research documents and images will be saved. This is especially important when the LLM needs to:
1. Save generated markdown documents
2. Download and save images from the research
3. Create supplementary files or resources
The system follows this precedence to determine the output path:
1. If the LLM/agent provides an `output_path` parameter in the tool call:
- This takes highest precedence
- Allows end users to specify a custom save location through natural language requests
2. If no parameter is provided, but the `RESEARCH_OUTPUT_PATH` environment variable is set:
- The environment variable value will be used
- Good for system-wide configuration
3. If neither of the above are set:
- A default path with timestamp is used: `~/Documents/research/YYYY-MM-DDTHH-MM-SS/`
- This prevents overwriting previous research results
The LLM receives the final resolved output path in the tool's response JSON as the `output_path` field, so it always knows where to save generated content.
**Note for LLMs:** When processing the tool results, check the `output_path` field to determine where to save any files you generate. This path is guaranteed to be present in the response.
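For reference, the same kind of fallback chain applies to paths. The exact timestamp formatting used by the server may differ from this approximation:
```typescript
import os from "os";
import path from "path";

// Resolve the output path: tool argument > RESEARCH_OUTPUT_PATH env var > timestamped default.
function resolveOutputPath(toolArgumentPath?: string): string {
  if (toolArgumentPath) return toolArgumentPath;
  if (process.env.RESEARCH_OUTPUT_PATH) return process.env.RESEARCH_OUTPUT_PATH;
  // Default: ~/Documents/research/YYYY-MM-DDTHH-MM-SS/
  const timestamp = new Date().toISOString().replace(/\.\d+Z$/, "").replace(/:/g, "-");
  return path.join(os.homedir(), "Documents", "research", timestamp);
}
```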
## Instructions for the LLM
As an LLM using the output of the `deep-research-tool`, your primary goal is to generate a comprehensive, accurate, and well-structured markdown document that addresses the `original_query`.
**Key Steps:**
1. **Parse the JSON Output:** The tool will return a JSON string. Parse this to access its fields: `documentation_instructions`, `original_query`, `search_summary`, and `research_data`.
2. **Adhere to `documentation_instructions`:** This field contains the **primary set of guidelines** for creating the markdown document. It will either be the server's extensive default prompt (focused on high-quality technical documentation) or a custom prompt provided by the user. **Follow these instructions meticulously** regarding content quality, style, structure, markdown formatting, and handling of technical details.
3. **Utilize `research_data` for Content:**
* The `research_data` array is your main source of information. Each object in this array represents a distinct web source.
* For each source, pay attention to its `title`, `original_url`, and `initial_content_snippet` for context.
* The core information for your document will come from the `crawled_data` array within each source. Specifically, the `raw_content` field of each `crawled_data` object contains the rich text extracted from that page.
* Synthesize information *across multiple sources* in `research_data` to provide a comprehensive view. Do not just list content from one source after another.
* If `crawled_data[].images` are present, you can mention them or list their URLs if appropriate and aligned with the `documentation_instructions`.
* If `crawl_errors` are present for a source, it means that particular source might be incomplete. You can choose to note this subtly if it impacts coverage.
4. **Address the `original_query`:** The final document must comprehensively answer or address the `original_query`.
5. **Leverage `search_summary`:** If the `search_summary` field is present (from Tavily's `include_answer` feature), it can serve as a helpful starting point, an executive summary, or a way to frame the introduction. However, the main body of your document should be built from the more detailed `research_data`.
6. **Synthesize, Don't Just Copy:** Your role is not to dump the `raw_content`. You must process, understand, synthesize, rephrase, and organize the information from various sources into a coherent, well-written document that flows logically, as per the `documentation_instructions`.
7. **Markdown Formatting:** Strictly follow the markdown formatting guidelines provided in the `documentation_instructions` (headings, lists, code blocks, emphasis, links, etc.).
8. **Handling Large Volumes:** The `research_data` can be extensive. If you have limitations on processing large inputs, the system calling you might need to provide you with chunks of the `research_data` or make multiple requests to you to build the document section by section. The `deep-research-tool` itself will always attempt to return all collected data in one JSON output.
9. **Technical Accuracy:** Preserve all technical details, code examples, and important specifics from the source content, as mandated by the `documentation_instructions`. Do not oversimplify.
10. **Visual Appeal (If Instructed):** If the `documentation_instructions` include guidelines for visual appeal (like colored text or emojis using HTML), apply them judiciously.
**Example LLM Invocation Thought Process:**
*Agent to LLM:*
"Okay, I've called the `deep-research-tool` with the query '\<em\>What are the latest advancements in quantum-resistant cryptography?\</em\>' and requested 5 sources with advanced crawling. Here's the JSON output:
`{ ... (JSON output from the tool) ... }`
Now, using the `documentation_instructions` provided within this JSON, and the `research_data`, please generate a comprehensive markdown document on 'The Latest Advancements in Quantum-Resistant Cryptography'. Ensure you follow all formatting and content guidelines from the instructions."
## Example `CallToolRequest` (Conceptual Arguments)
An agent might make a call to the MCP server with arguments like this:
```json
{
"name": "deep-research-tool",
"arguments": {
"query": "Explain the architecture of modern data lakes and data lakehouses.",
"max_search_results": 5,
"search_depth": "advanced",
"topic": "general",
"crawl_max_depth": 1,
"crawl_extract_depth": "advanced",
"include_answer": true,
"documentation_prompt": "Generate a highly technical whitepaper. Start with an abstract, then introduction, detailed sections for data lakes, data lakehouses, comparison, use cases, and a future outlook. Use academic tone. Include all diagrams mentioned by URL if possible as [Diagram: URL].",
"output_path": "C:/Users/username/Documents/research/datalakes-whitepaper"
}
}
```
## Troubleshooting
* **API Key Errors:** Ensure `TAVILY_API_KEY` is correctly set and valid.
* **SDK Issues:** Make sure `@modelcontextprotocol/sdk` and `@tavily/core` are installed and up-to-date.
* **No Output/Errors:** Check the server console logs for any error messages. Increase verbosity if needed for debugging.
## Changelog
### v0.1.2 (2024-05-10)
- Added configurable output path functionality
- Fixed type errors with latest Tavily SDK
- Added comprehensive documentation about output paths
- Added logo and improved documentation
### v0.1.1
- Initial public release
## Contributing
Contributions are welcome! Please feel free to submit issues, fork the repository, and create pull requests.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
```
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
```dockerfile
# Generated by https://smithery.ai. See: https://smithery.ai/docs/build/project-config
# Builder stage
FROM node:lts-alpine AS builder
WORKDIR /app
# Install all dependencies including dev
COPY package.json package-lock.json tsconfig.json ./
COPY src ./src
RUN npm install
RUN npm run build
# Runtime stage
FROM node:lts-alpine
WORKDIR /app
# Install production dependencies
COPY package.json package-lock.json ./
RUN npm install --production
# Copy built files and assets
COPY --from=builder /app/dist ./dist
COPY assets ./assets
# Set environment
ENV NODE_ENV=production
# Start the server
CMD ["node", "dist/index.js"]
```
--------------------------------------------------------------------------------
/example_config.json:
--------------------------------------------------------------------------------
```json
{
"mcpServers": {
"deep-research": {
"command": "npx",
"args": [
"-y",
"@pinkpixel/deep-research-mcp"
],
"env": {
"TAVILY_API_KEY": "tvly-YOUR_ACTUAL_API_KEY_HERE",
"DOCUMENTATION_PROMPT": "Your custom, detailed instructions for the LLM on how to generate markdown documents from the research data...",
"SEARCH_TIMEOUT": "120",
"CRAWL_TIMEOUT": "300",
"MAX_SEARCH_RESULTS": "10",
"CRAWL_MAX_DEPTH": "2",
"CRAWL_LIMIT": "15",
"FILE_WRITE_ENABLED": "true",
"ALLOWED_WRITE_PATHS": "/home/user/research,/home/user/documents",
"FILE_WRITE_LINE_LIMIT": "300"
}
}
}
}
```
--------------------------------------------------------------------------------
/tsconfig.json:
--------------------------------------------------------------------------------
```json
{
"compilerOptions": {
"target": "ES2022",
"module": "NodeNext",
"moduleResolution": "NodeNext",
"baseUrl": "./",
"outDir": "./dist",
"rootDir": "./src",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"resolveJsonModule": true,
"declaration": true,
"declarationMap": true,
"sourceMap": true,
"noUnusedLocals": true,
"noUnusedParameters": true,
"noImplicitReturns": true,
"noFallthroughCasesInSwitch": true,
"allowSyntheticDefaultImports": true
},
"include": [
"src/**/*.ts"
],
"exclude": [
"node_modules",
"dist",
"**/*.spec.ts",
"**/*.test.ts"
]
}
```
--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
```json
{
"name": "@pinkpixel/deep-research-mcp",
"version": "1.3.1",
"description": "A Model Context Protocol (MCP) server for performing deep web research using Tavily Search and Crawl APIs, preparing structured data for LLM-powered documentation generation.",
"main": "dist/index.js",
"types": "dist/index.d.ts",
"bin": {
"deep-research-mcp": "dist/index.js"
},
"scripts": {
"build": "tsc",
"start": "node dist/index.js",
"dev": "nodemon --watch src -e ts --exec ts-node src/index.ts",
"serve": "npm run build && npm start",
"lint": "eslint src/**/*.ts",
"format": "eslint src/**/*.ts --fix",
"prepublishOnly": "npm run build"
},
"repository": {
"type": "git",
"url": "git+https://github.com/pinkpixel-dev/deep-research-mcp.git"
},
"keywords": [
"mcp",
"model-context-protocol",
"tavily",
"ai",
"llm",
"research",
"web-crawl",
"documentation"
],
"author": "PinkPixel",
"license": "MIT",
"dependencies": {
"@modelcontextprotocol/sdk": "^1.11.1",
"@tavily/core": "^0.5.2",
"dotenv": "^16.5.0"
},
"devDependencies": {
"@types/node": "^22.15.17",
"@typescript-eslint/eslint-plugin": "^8.32.0",
"@typescript-eslint/parser": "^8.32.0",
"eslint": "^9.26.0",
"nodemon": "^3.1.10",
"ts-node": "^10.9.2",
"typescript": "^5.8.3"
},
"engines": {
"node": ">=18.0.0"
},
"publishConfig": {
"access": "public"
},
"files": [
"dist",
"README.md",
"LICENSE",
"assets",
"OVERVIEW.md"
]
}
```
--------------------------------------------------------------------------------
/test-output/test-summary.md:
--------------------------------------------------------------------------------
```markdown
# Deep Research MCP Tool Test Summary
## Test Details
- **Date**: May 29, 2025
- **Query**: "Latest developments in quantum computing breakthroughs 2024"
- **Output Path**: C:/Users/sizzlebop/Desktop/projects/github/deep-research-mcp/test-output
## Test Parameters Used
- **Max Search Results**: 5
- **Search Depth**: Advanced
- **Crawl Max Depth**: 1
- **Crawl Limit**: 8
- **Include Answer**: True
- **Custom Documentation Prompt**: Technical report format
## Test Results ✅
### ✅ Tool Functionality
- Deep Research MCP tool executed successfully
- Retrieved comprehensive data from 5 sources
- Generated structured JSON output with all expected fields
### ✅ Search Performance
- Found relevant, high-quality sources about quantum computing in 2024
- Search summary provided: "In 2024, quantum computing saw significant breakthroughs in algorithms and error correction, pushing the boundaries of practical applications. Quantum machine learning and funding reached record highs."
### ✅ Crawl Performance
- Successfully crawled multiple pages from each source
- Retrieved detailed content from:
- Microtime.com (quantum computing overview)
- NetworkWorld.com (10 quantum milestones)
- IDTechEx.com (market research)
- No crawl errors reported
### ✅ Output Generation
- Created comprehensive 164-line technical report
- Properly structured with executive summary, company developments, technical details
- Saved to specified output path successfully
- Custom documentation prompt was followed correctly
### ✅ Data Quality
- Rich, detailed content from authoritative sources
- Current information from 2024
- Technical specifications and company details included
- Industry analysis and future projections covered
## Files Generated
1. `quantum-computing-breakthroughs-2024.md` - Full technical report (164 lines)
2. `test-summary.md` - This test summary
## Conclusion
The Deep Research MCP tool is working perfectly! ✨
- All core functionality operational
- High-quality research data retrieval
- Proper output formatting and file generation
- Custom prompts and output paths working as expected
```
--------------------------------------------------------------------------------
/test-file-writing.js:
--------------------------------------------------------------------------------
```javascript
// Simple test script to verify file writing functionality
const fs = require('fs');
const path = require('path');
const os = require('os');
// Test the file writing utility functions
async function testFileWriting() {
console.log('Testing file writing functionality...');
// Set environment variables for testing
process.env.FILE_WRITE_ENABLED = 'true';
process.env.FILE_WRITE_LINE_LIMIT = '50';
// Import the module after setting env vars
const { writeResearchFile, isPathAllowed, validateWritePath } = require('./dist/index.js');
const testDir = path.join(os.homedir(), 'test-deep-research');
const testFile = path.join(testDir, 'test-output.md');
try {
// Test path validation
console.log('✓ Testing path validation...');
const validPath = await validateWritePath(testFile);
console.log(`✓ Valid path: ${validPath}`);
// Test file writing
console.log('✓ Testing file writing...');
const testContent = '# Test File\n\nThis is a test of the file writing functionality.\n\n- Feature 1\n- Feature 2\n- Feature 3';
await writeResearchFile(testFile, testContent, 'rewrite');
console.log(`✓ File written successfully: ${testFile}`);
// Verify file exists and has correct content
if (fs.existsSync(testFile)) {
const content = fs.readFileSync(testFile, 'utf8');
if (content === testContent) {
console.log('✓ File content verified successfully');
} else {
console.log('✗ File content mismatch');
}
} else {
console.log('✗ File was not created');
}
// Test append mode
console.log('✓ Testing append mode...');
const appendContent = '\n\n## Additional Section\n\nThis was appended to the file.';
await writeResearchFile(testFile, appendContent, 'append');
console.log('✓ Content appended successfully');
// Clean up
fs.unlinkSync(testFile);
fs.rmdirSync(testDir);
console.log('✓ Test cleanup completed');
console.log('\n🎉 All file writing tests passed!');
} catch (error) {
console.error('✗ Test failed:', error.message);
// Clean up on error
try {
if (fs.existsSync(testFile)) fs.unlinkSync(testFile);
if (fs.existsSync(testDir)) fs.rmdirSync(testDir);
} catch (cleanupError) {
console.error('Cleanup error:', cleanupError.message);
}
}
}
// Only run if this file is executed directly
if (require.main === module) {
testFileWriting().catch(console.error);
}
module.exports = { testFileWriting };
```
--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------
```yaml
# Smithery configuration file: https://smithery.ai/docs/build/project-config
startCommand:
type: stdio
commandFunction:
# A JS function that produces the CLI command based on the given config to start the MCP on stdio.
|-
(config) => ({ command: 'node', args: ['dist/index.js'], env: { TAVILY_API_KEY: config.tavilyApiKey, ...(config.documentationPrompt !== undefined && { DOCUMENTATION_PROMPT: config.documentationPrompt }), ...(config.searchTimeout !== undefined && { SEARCH_TIMEOUT: config.searchTimeout.toString() }), ...(config.crawlTimeout !== undefined && { CRAWL_TIMEOUT: config.crawlTimeout.toString() }), ...(config.maxSearchResults !== undefined && { MAX_SEARCH_RESULTS: config.maxSearchResults.toString() }), ...(config.crawlMaxDepth !== undefined && { CRAWL_MAX_DEPTH: config.crawlMaxDepth.toString() }), ...(config.crawlLimit !== undefined && { CRAWL_LIMIT: config.crawlLimit.toString() }), ...(config.fileWriteEnabled !== undefined && { FILE_WRITE_ENABLED: config.fileWriteEnabled.toString() }), ...(config.allowedWritePaths !== undefined && { ALLOWED_WRITE_PATHS: config.allowedWritePaths }), ...(config.fileWriteLineLimit !== undefined && { FILE_WRITE_LINE_LIMIT: config.fileWriteLineLimit.toString() }) }} )
configSchema:
# JSON Schema defining the configuration options for the MCP.
type: object
required:
- tavilyApiKey
properties:
tavilyApiKey:
type: string
description: Tavily API key for authentication.
documentationPrompt:
type: string
description: Optional custom documentation prompt to override default.
searchTimeout:
type: number
default: 60
description: Optional timeout in seconds for search requests.
crawlTimeout:
type: number
default: 180
description: Optional timeout in seconds for crawl requests.
maxSearchResults:
type: number
default: 7
description: Optional maximum number of search results to retrieve.
crawlMaxDepth:
type: number
default: 1
description: Optional maximum crawl depth from source URL.
crawlLimit:
type: number
default: 10
description: Optional maximum number of URLs to crawl per source.
fileWriteEnabled:
type: boolean
default: false
description: Enable file writing tool.
allowedWritePaths:
type: string
description: Comma-separated allowed directories for file writing.
fileWriteLineLimit:
type: number
default: 200
description: Maximum lines per file write operation.
exampleConfig:
tavilyApiKey: tvly-EXAMPLE_KEY_12345
documentationPrompt: Generate a concise summary of key findings.
searchTimeout: 120
crawlTimeout: 300
maxSearchResults: 10
crawlMaxDepth: 2
crawlLimit: 15
fileWriteEnabled: false
allowedWritePaths: /home/user/research,/tmp
fileWriteLineLimit: 300
```
--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
```markdown
# Changelog
All notable changes to the Deep Research MCP Server will be documented in this file.
## [1.2.1] - 2025-05-29
### Added
- Environment variable support for timeout and performance configuration
- `SEARCH_TIMEOUT` environment variable for configuring search request timeouts
- `CRAWL_TIMEOUT` environment variable for configuring crawl request timeouts
- `MAX_SEARCH_RESULTS` environment variable for setting maximum search results
- `CRAWL_MAX_DEPTH` environment variable for setting maximum crawl depth
- `CRAWL_LIMIT` environment variable for setting maximum URLs to crawl per source
- **NEW: File Writing Tool** - `write-research-file` tool for saving research content to files
- `FILE_WRITE_ENABLED` environment variable to enable/disable file writing (default: disabled)
- `ALLOWED_WRITE_PATHS` environment variable for directory restrictions (default: user home)
- `FILE_WRITE_LINE_LIMIT` environment variable for write operation limits (default: 200 lines)
- Secure file writing with path validation, directory creation, and permission controls
- Enhanced startup logging showing current timeout, limit, and file writing configurations
- Updated example configuration with new environment variables
### Fixed
- Timeout configuration now properly respects environment variables in addition to tool parameters
- LibreChat timeout issues can now be resolved by setting appropriate environment variables
### Changed
- Tool parameter precedence: tool arguments > environment variables > defaults
- Improved documentation with detailed timeout, performance, and file writing configuration guides
- Added comprehensive security documentation for file writing feature
### Security
- File writing feature disabled by default for security
- Directory-based access controls for file operations
- Path validation and sanitization to prevent directory traversal
- Configurable line limits to prevent abuse
## [1.2.0] - 2024-05-29
### Fixed
- Fixed issue with console logging interfering with MCP protocol by replacing all `console.log` and `console.debug` calls with `console.error`
- Fixed proper response structure to match MCP specifications, removing the extra `tools` wrapper from responses
- Fixed type errors in Tavily SDK parameters, ensuring correct typing for `includeAnswer` and `timeRange`
- Fixed parameter handling for crawl API, ensuring required parameters are always provided
### Added
- Added progress tracking during long-running operations
- Added memory usage tracking and optimization
- Added hardware acceleration option with `hardware_acceleration` parameter
- Added proper domain validation to prevent excessive crawling
- Added timeout handling for both search and crawl operations
### Changed
- Reduced default crawl limits for better performance:
- Maximum depth reduced to 2 (from unlimited)
- Default breadth reduced to 10 (from 20)
- Default limit reduced to 10 URLs (from 50)
- Improved error handling and reporting
- Updated documentation to reflect parameter changes
## [1.1.0] - 2024-05-01
### Added
- Initial public release
- Integration with Tavily Search and Crawl APIs
- MCP compliant tool interface
- Structured JSON output for LLM consumption
- Configurable documentation prompt
- Configurable output path
```
--------------------------------------------------------------------------------
/OVERVIEW.md:
--------------------------------------------------------------------------------
```markdown
<!--- ✨ OVERVIEW.md for Deep Research MCP Server (Last Updated: May 29, 2025) ✨ --->
<h1 align="center"><span style="color:#7f5af0;">Deep Research MCP Server</span> ✨</h1>
<p align="center"><b><span style="color:#2cb67d;">Dream it, Pixel it</span></b> — <i>by Pink Pixel</i></p>
---
## <span style="color:#7f5af0;">🚀 Project Purpose</span>
The <b>Deep Research MCP Server</b> is a <b>Model Context Protocol (MCP)</b> compliant server for <span style="color:#2cb67d;">comprehensive, up-to-date web research</span>. It leverages <b>Tavily's Search & Crawl APIs</b> to gather, aggregate, and structure information for <b>LLM-powered documentation generation</b>.
---
## <span style="color:#7f5af0;">🧩 Architecture Overview</span>
- <b>MCP Server</b> (Node.js, TypeScript)
- <b>Stdio Transport</b> for agent/server communication
- <b>Tavily API Integration</b> (Search + Crawl)
- <b>Configurable Documentation Prompt</b> (default, ENV, or per-request)
- <b>Structured JSON Output</b> for LLMs
<details>
<summary><b>Architecture Diagram (Text)</b></summary>
```
[LLM/Agent]
│
▼
[Deep Research MCP Server]
│ ├─> Tavily Search API
│ └─> Tavily Crawl API
▼
[Aggregated JSON Output + Documentation Instructions]
```
</details>
---
## <span style="color:#7f5af0;">✨ Main Features</span>
- <b>Multi-Step Research</b>: Combines AI-powered search with deep content crawling
- <b>Structured Output</b>: JSON with query, search summary, findings, and doc instructions
- <b>Configurable Prompts</b>: Override documentation style via ENV or per-request
- <b>Configurable Output Path</b>: Specify where research documents and images should be saved
- <b>Granular Control</b>: Fine-tune search/crawl with many parameters
- <b>MCP Compliant</b>: Plug-and-play for agent ecosystems
- <b>Resource Optimized</b>: Memory tracking, auto-garbage collection, and hardware acceleration support
---
## <span style="color:#7f5af0;">🛠️ Key Dependencies</span>
- <b>@modelcontextprotocol/sdk</b> (v1.11.1) — MCP server framework
- <b>@tavily/core</b> (v0.5.2) — Tavily Search & Crawl APIs
- <b>dotenv</b> (v16.5.0) — Environment variable management
---
## <span style="color:#7f5af0;">📁 File Structure</span>
```
deep-research-mcp/
├── dist/ # Compiled JS output
├── src/
│ └── index.ts # Main server logic
├── assets/ # Project assets (logo)
├── README.md # Full documentation
├── OVERVIEW.md # (You are here!)
├── example_config.json # Example MCP config
├── package.json # Project metadata & dependencies
├── tsconfig.json # TypeScript config
└── CHANGELOG.md # Version history and changes
```
---
## <span style="color:#7f5af0;">⚡ Usage & Integration</span>
- <b>Install & Run:</b>
- <code>npx @pinkpixel/deep-research-mcp</code> <span style="color:#2cb67d;">(quickest)</span>
- Or clone & <code>npm install</code>, then <code>npm start</code>
- <b>Configure:</b> Set <code>TAVILY_API_KEY</code> in your environment (see <b>README.md</b>)
- <b>Integrate:</b> Connect to your LLM/agent via MCP stdio
- <b>Customize:</b> Override documentation prompt via ENV or tool argument
- <b>Output:</b> Specify where research documents and images should be saved
- <b>Performance:</b> Enable hardware acceleration with <code>hardware_acceleration: true</code> parameter
---
## <span style="color:#7f5af0;">🔄 Recent Updates</span>
- <b>Optimized Resource Usage</b>: Reduced default crawl limits to prevent excessive memory consumption
- <b>MCP Protocol Compliance</b>: Fixed response structure to properly follow MCP specifications
- <b>Improved Error Handling</b>: Better error reporting and handling of timeouts
- <b>Performance Optimizations</b>: Added optional hardware acceleration (WebGPU) support
- <b>Smarter Crawling</b>: Added domain validation to focus crawling and prevent overly broad searches
<i>See <b>CHANGELOG.md</b> for complete version history</i>
---
## <span style="color:#7f5af0;">📚 More Info</span>
- See <b>README.md</b> for full usage, parameters, and troubleshooting
- Example config: <b>example_config.json</b>
- License: <b>MIT</b>
- Node.js: <b>>=18.0.0 required</b>
---
<p align="center"><span style="color:#7f5af0;">Made with ❤️ by Pink Pixel</span></p>
```
--------------------------------------------------------------------------------
/src/index.ts:
--------------------------------------------------------------------------------
```typescript
#!/usr/bin/env node
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
CallToolRequestSchema,
ListToolsRequestSchema,
Tool,
McpError,
ErrorCode,
} from "@modelcontextprotocol/sdk/types.js";
import { tavily as createTavilyClient } from "@tavily/core";
import type { TavilyClient } from "@tavily/core"; // For typing the Tavily client instance
import dotenv from "dotenv";
import fs from "fs/promises";
import path from "path";
import os from "os";
dotenv.config(); // Load .env file if present (for local development)
const TAVILY_API_KEY = process.env.TAVILY_API_KEY;
if (!TAVILY_API_KEY) {
throw new Error(
"TAVILY_API_KEY environment variable is required. Please set it in your .env file or execution environment."
);
}
const ENV_DOCUMENTATION_PROMPT = process.env.DOCUMENTATION_PROMPT;
// Environment variables for timeout configuration
const ENV_SEARCH_TIMEOUT = process.env.SEARCH_TIMEOUT ? parseInt(process.env.SEARCH_TIMEOUT, 10) : undefined;
const ENV_CRAWL_TIMEOUT = process.env.CRAWL_TIMEOUT ? parseInt(process.env.CRAWL_TIMEOUT, 10) : undefined;
const ENV_MAX_SEARCH_RESULTS = process.env.MAX_SEARCH_RESULTS ? parseInt(process.env.MAX_SEARCH_RESULTS, 10) : undefined;
const ENV_CRAWL_MAX_DEPTH = process.env.CRAWL_MAX_DEPTH ? parseInt(process.env.CRAWL_MAX_DEPTH, 10) : undefined;
const ENV_CRAWL_LIMIT = process.env.CRAWL_LIMIT ? parseInt(process.env.CRAWL_LIMIT, 10) : undefined;
// Environment variables for file writing configuration
const ENV_ALLOWED_WRITE_PATHS = process.env.ALLOWED_WRITE_PATHS ? process.env.ALLOWED_WRITE_PATHS.split(',').map(p => p.trim()) : undefined;
const ENV_FILE_WRITE_ENABLED = process.env.FILE_WRITE_ENABLED === 'true';
const ENV_FILE_WRITE_LINE_LIMIT = process.env.FILE_WRITE_LINE_LIMIT ? parseInt(process.env.FILE_WRITE_LINE_LIMIT, 10) : 200;
const DEFAULT_DOCUMENTATION_PROMPT = `
For all queries, search the web extensively to acquire up to date information. Research several sources. Use all the tools provided to you to gather as much context as possible.
Adhere to these guidelines when creating documentation:
Include screenshots when appropriate
CONTENT QUALITY:
Clear, concise, and factually accurate
Structured with logical organization
Comprehensive coverage of topics
Technical precision and attention to detail
Free of unnecessary commentary or humor
DOCUMENTATION STYLE:
Professional and objective tone
Thorough explanations with appropriate technical depth
Well-formatted with proper headings, lists, and code blocks
Consistent terminology and naming conventions
Clean, readable layout without extraneous elements
CODE QUALITY:
Clean, maintainable, and well-commented code
Best practices and modern patterns
Proper error handling and edge case considerations
Optimal performance and efficiency
Follows language-specific style guidelines
TECHNICAL EXPERTISE:
Programming languages and frameworks
System architecture and design patterns
Development methodologies and practices
Security considerations and standards
Industry-standard tools and technologies
Documentation guidelines
Create an extremely detailed, comprehensive markdown document about a given topic when asked. Follow the below instructions:
Start with an INTRODUCTION and FIRST MAJOR SECTIONS of the topic, covering:
Overview and definition of the topic
Historical background and origins
Core concepts and fundamentals
Early developments and pioneers
Create a strong foundation and then continue with ADDITIONAL SECTIONS:
Advanced concepts and developments
Modern applications and technologies
Current trends and future directions
Challenges and limitations
IMPORTANT GUIDELINES:
Create a SUBSTANTIAL document section (2000-3000 words for this section)
PRESERVE all technical details, code examples, and important specifics from the sources
MAINTAIN the depth and complexity of the original content
DO NOT simplify or omit technical information
Include all relevant examples, specifications, and implementation details
Format with proper markdown headings (## for main sections, ### for subsections).
Include examples and code snippets. Maintain relationships between concepts
Avoid omitting "less important" sections that might be critical for complete documentation
Preserve hierarchical structures in documentation
Guidelines for Proper Markdown Formatting:
Document Structure:
Use an informative title at the top of the document.
Include a brief introduction to the topic.
Organize content into logical sections using headings.
Headings:
Use # Heading for the main title.
Use ## Heading for major sections.
Use ### Heading for subsections.
Limit headings to three levels for clarity.
Text Formatting:
Use *italic text* for emphasis.
Use **bold text** for strong emphasis.
Combine emphasis with ***bold and italic***.
Lists:
Use -, +, or * for unordered lists.
Use 1., 2., etc., for ordered lists.
Avoid ending list items with periods unless they contain multiple sentences.
Links and Images:
Use [Descriptive Text](https://example.com) for links.
Use Alt Text for images.
Ensure descriptive text is informative.
Code Blocks:
Use triple backticks \`\`\` to enclose code blocks. Specify the programming language after the opening backticks if necessary (e.g., \`\`\`python).
Line Breaks and Paragraphs:
Use a blank line to separate paragraphs.
Use two spaces at the end of a line for a line break.
Special Characters:
Escape special characters with a backslash (\\) if needed (e.g., \\# for a literal #).
Metadata:
Include metadata at the top of the document if necessary (e.g., author, date).
Consistency and Style:
Follow a consistent naming convention for files and directories.
Use a project-specific style guide if available.
Additional Tips:
Use Markdown extensions if supported by your platform (e.g., tables, footnotes).
Preview your documentation regularly to ensure formatting is correct.
Use linting tools to enforce style and formatting rules.
Output Requirements:
The documentation should be in Markdown format.
Ensure all links and images are properly formatted and functional.
The document should be easy to navigate with clear headings and sections.
By following these guidelines, you produce high-quality Markdown documentation that is both informative and visually appealing.
To make your Markdown document visually appealing with colored text and emojis, you can incorporate the following elements:
Using Colored Text
Since Markdown does not natively support colored text, you can use HTML tags to achieve this:
HTML \`<span>\` Tag:
Use the \`<span>\` tag with inline CSS to change the text color. For example:
html
\`<span style="color:red;">\`This text will be red.\`</span>\`
You can replace red with any color name or hex code (e.g., #FF0000 for red).
HTML \`<font>\` Tag (Deprecated):
Although the \`<font>\` tag is deprecated, it can still be used in some environments:
html
\`<font color="red">\`This text will be red.\`</font>\`
However, it's recommended to use the \`<span>\` tag for better compatibility.
Incorporating Emojis
Emojis can add visual appeal and convey emotions or ideas effectively:
Copy and Paste Emojis:
You can copy emojis from sources like Emojipedia and paste them directly into your Markdown document. For example:
markdown
This is a happy face 😊.
Emoji Shortcodes:
Some platforms support emoji shortcodes, which vary by application. For example, on GitHub, you can use :star: for a star emoji ⭐️.
Best Practices for Visual Appeal
Consistency: Use colors and emojis consistently throughout the document to maintain a cohesive look.
Accessibility: Ensure that colored text has sufficient contrast with the background and avoid relying solely on color to convey information.
Purposeful Use: Use colors and emojis to highlight important information or add visual interest, but avoid overusing them.
Example of a Visually Appealing Markdown Document
markdown
# Introduction to Python 🐍
## What is Python?
Python is a versatile and widely-used programming language. It is \`<span style="color:blue;">\`easy to learn\`</span>\` and has a vast number of libraries for various tasks.
### Key Features
- **Easy to Read:** Python's syntax is designed to be clear and concise.
- **Versatile:** From web development to data analysis, Python can do it all 📊.
- **Large Community:** Python has a large and active community, which means there are many resources available 🌟.
## Conclusion
Python is a great language for beginners and experienced developers alike. Start your Python journey today! 🚀
This example incorporates colored text and emojis to enhance readability and visual appeal.
VERY IMPORTANT - Remember that your goal is to produce high-quality, clean, professional technical content, documentation and code that meets the highest standards, without unnecessary commentary, following all of the above guidelines
`;
// Interface for the write-research-file tool arguments
interface WriteResearchFileArguments {
file_path: string;
content: string;
mode?: 'rewrite' | 'append';
}
// Interface for the arguments our deep-research-tool will accept
interface DeepResearchToolArguments {
query: string;
search_depth?: "basic" | "advanced";
topic?: "general" | "news";
days?: number;
time_range?: string;
max_search_results?: number;
chunks_per_source?: number;
include_search_images?: boolean;
include_search_image_descriptions?: boolean;
include_answer?: boolean | "basic" | "advanced";
include_raw_content_search?: boolean;
include_domains_search?: string[];
exclude_domains_search?: string[];
search_timeout?: number;
crawl_max_depth?: number;
crawl_max_breadth?: number;
crawl_limit?: number;
crawl_instructions?: string;
crawl_select_paths?: string[];
crawl_select_domains?: string[];
crawl_exclude_paths?: string[];
crawl_exclude_domains?: string[];
crawl_allow_external?: boolean;
crawl_include_images?: boolean;
crawl_categories?: string[];
crawl_extract_depth?: "basic" | "advanced";
crawl_timeout?: number;
documentation_prompt?: string; // For custom documentation instructions
hardware_acceleration?: boolean;
}
// Structure for storing combined search and crawl results for one source
interface CombinedResult {
search_rank: number;
original_url: string;
title: string;
initial_content_snippet: string; // Snippet from the search result
search_score?: number;
published_date?: string; // If topic is 'news'
crawled_data: CrawledPageData[];
crawl_errors: string[];
}
// Structure for data from a single crawled page
interface CrawledPageData {
url: string; // The specific URL that was crawled (could be same as original_url or deeper)
raw_content: string | null; // The main content extracted from this page
images?: string[]; // URLs of images found on this page
}
// Add Tavily API parameter interfaces based on documentation
interface TavilySearchParams {
query: string;
searchDepth?: "basic" | "advanced";
topic?: "general" | "news";
days?: number;
timeRange?: string;
maxResults?: number;
chunksPerSource?: number;
includeImages?: boolean;
includeImageDescriptions?: boolean;
includeAnswer?: boolean | "basic" | "advanced";
includeRawContent?: boolean;
includeDomains?: string[];
excludeDomains?: string[];
timeout?: number;
}
// Add the necessary TavilyCrawlCategory type
type TavilyCrawlCategory =
| "Careers"
| "Blog"
| "Documentation"
| "About"
| "Pricing"
| "Community"
| "Developers"
| "Contact"
| "Media";
// Update interface with all required fields
interface TavilyCrawlParams {
url: string;
maxDepth?: number;
maxBreadth?: number;
limit?: number;
instructions?: string;
selectPaths?: string[];
selectDomains?: string[];
excludePaths?: string[];
excludeDomains?: string[];
allowExternal?: boolean;
includeImages?: boolean;
categories?: TavilyCrawlCategory[];
extractDepth?: "basic" | "advanced";
timeout?: number;
}
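// Note: TavilySearchParams/TavilyCrawlParams mirror the Tavily SDK's camelCase options,
// while the MCP tool schema below exposes the same settings in snake_case
// (e.g. crawl_max_depth -> maxDepth); the handlers translate between the two.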
// File writing utility functions
function normalizePath(p: string): string {
return path.normalize(p).toLowerCase();
}
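// The lowercase normalization above makes the path comparisons below case-insensitive
// (matching Windows semantics); on case-sensitive filesystems this is deliberately permissive.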
async function isPathAllowed(filePath: string): Promise<boolean> {
if (!ENV_FILE_WRITE_ENABLED) {
return false; // File writing disabled
}
if (!ENV_ALLOWED_WRITE_PATHS || ENV_ALLOWED_WRITE_PATHS.length === 0) {
// If no allowed paths specified, allow writing to user's home directory and subdirectories
const userHome = os.homedir();
let normalizedPathToCheck = normalizePath(path.resolve(filePath));
let normalizedUserHome = normalizePath(userHome);
// Remove trailing separators
if (normalizedPathToCheck.slice(-1) === path.sep) {
normalizedPathToCheck = normalizedPathToCheck.slice(0, -1);
}
if (normalizedUserHome.slice(-1) === path.sep) {
normalizedUserHome = normalizedUserHome.slice(0, -1);
}
// Check if path is exactly the home directory or a subdirectory
return normalizedPathToCheck === normalizedUserHome ||
normalizedPathToCheck.startsWith(normalizedUserHome + path.sep);
}
// Check if the file path is within any of the allowed directories
let normalizedPathToCheck = normalizePath(path.resolve(filePath));
if (normalizedPathToCheck.slice(-1) === path.sep) {
normalizedPathToCheck = normalizedPathToCheck.slice(0, -1);
}
return ENV_ALLOWED_WRITE_PATHS.some(allowedPath => {
let normalizedAllowedPath = normalizePath(path.resolve(allowedPath));
if (normalizedAllowedPath.slice(-1) === path.sep) {
normalizedAllowedPath = normalizedAllowedPath.slice(0, -1);
}
// Check if path is exactly the allowed directory
if (normalizedPathToCheck === normalizedAllowedPath) {
return true;
}
// Check if path is a subdirectory of the allowed directory
// Add separator to prevent partial directory name matches
return normalizedPathToCheck.startsWith(normalizedAllowedPath + path.sep);
});
}
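// Example (assuming ALLOWED_WRITE_PATHS is set to /home/user/research):
//   /home/user/research/report.md    -> allowed (inside the directory)
//   /home/user/research-private/x.md -> rejected (the trailing path.sep check prevents partial name matches)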
async function validateWritePath(filePath: string): Promise<string> {
// Convert to absolute path
const absolutePath = path.resolve(filePath);
// Check if path is allowed
if (!(await isPathAllowed(absolutePath))) {
const allowedPaths = ENV_ALLOWED_WRITE_PATHS || [os.homedir()];
throw new Error(`File writing not allowed to path: ${filePath}. Must be within one of these directories: ${allowedPaths.join(', ')}`);
}
// Ensure parent directory exists
const parentDir = path.dirname(absolutePath);
try {
await fs.mkdir(parentDir, { recursive: true });
} catch (error) {
throw new Error(`Failed to create parent directory: ${parentDir}`);
}
return absolutePath;
}
async function writeResearchFile(filePath: string, content: string, mode: 'rewrite' | 'append' = 'rewrite'): Promise<void> {
const validPath = await validateWritePath(filePath);
// Check line limit
const lines = content.split('\n');
const lineCount = lines.length;
if (lineCount > ENV_FILE_WRITE_LINE_LIMIT) {
throw new Error(`Content exceeds line limit: ${lineCount} lines (maximum: ${ENV_FILE_WRITE_LINE_LIMIT}). Please split content into smaller chunks.`);
}
// Write the file
if (mode === 'append') {
await fs.appendFile(validPath, content);
} else {
await fs.writeFile(validPath, content);
}
}
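// Illustrative usage (paths are examples only):
//   await writeResearchFile('/home/user/research/summary.md', '# Findings\n...', 'append');
// Throws if file writing is disabled, the path falls outside the allowed directories,
// or the content exceeds FILE_WRITE_LINE_LIMIT lines.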
class DeepResearchMcpServer {
private server: Server;
private tavilyClient: TavilyClient;
constructor() {
this.server = new Server(
{
name: "deep-research-mcp",
version: "1.3.1", // Version with file writing tool and path validation fixes
},
{
capabilities: {
resources: {},
tools: {},
prompts: {}, // Prompts handled by the tool's output logic
},
}
);
try {
this.tavilyClient = createTavilyClient({ apiKey: TAVILY_API_KEY }) as unknown as TavilyClient;
} catch (e: any) {
console.error("Failed to instantiate Tavily Client:", e.message);
throw new Error(
`Could not initialize Tavily Client. Check API key and SDK usage: ${e.message}`
);
}
this.setupRequestHandlers();
this.setupErrorHandling();
}
private setupErrorHandling(): void {
this.server.onerror = (error) => {
console.error("[DeepResearchMCP Error]", error);
};
const shutdown = async () => {
console.error("Shutting down DeepResearchMcpServer...");
try {
await this.server.close();
} catch (err) {
console.error("Error during server shutdown:", err);
}
process.exit(0);
};
process.on("SIGINT", shutdown);
process.on("SIGTERM", shutdown);
}
private setupRequestHandlers(): void {
this.server.setRequestHandler(ListToolsRequestSchema, async () => {
const tools: Tool[] = [
{
name: "deep-research-tool",
description:
"Performs extensive web research using Tavily Search and Crawl. Returns aggregated JSON data including the query, search summary (if any), detailed research findings, and documentation instructions. The documentation instructions will guide you on how the user wants the research data to be formatted into markdown.",
inputSchema: {
type: "object",
properties: {
query: { type: "string", description: "The main research topic or question." },
search_depth: { type: "string", enum: ["basic", "advanced"], default: "advanced", description: "Depth of the initial Tavily search ('basic' or 'advanced')." },
topic: { type: "string", enum: ["general", "news"], default: "general", description: "Category for the Tavily search ('general' or 'news')." },
days: { type: "number", description: "For 'news' topic: number of days back from current date to include results." },
time_range: { type: "string", description: "Time range for search results (e.g., 'd' for day, 'w' for week, 'm' for month, 'y' for year)." },
max_search_results: { type: "number", default: 7, minimum: 1, maximum: 20, description: "Max search results to retrieve for crawling (1-20). Can be set via MAX_SEARCH_RESULTS environment variable." },
chunks_per_source: { type: "number", default: 3, minimum: 1, maximum: 3, description: "For 'advanced' search: number of content chunks from each source (1-3)." },
include_search_images: { type: "boolean", default: false, description: "Include image URLs from initial search results." },
include_search_image_descriptions: { type: "boolean", default: false, description: "Include image descriptions from initial search results." },
include_answer: {
anyOf: [
{ type: "boolean" },
{ type: "string", enum: ["basic", "advanced"] }
],
default: false,
description: "Include an LLM-generated answer from Tavily search (true implies 'basic')."
},
include_raw_content_search: { type: "boolean", default: false, description: "Include cleaned HTML from initial search results." },
include_domains_search: { type: "array", items: { type: "string" }, default: [], description: "List of domains to specifically include in search." },
exclude_domains_search: { type: "array", items: { type: "string" }, default: [], description: "List of domains to specifically exclude from search." },
search_timeout: { type: "number", default: 60, description: "Timeout in seconds for Tavily search requests. Can be set via SEARCH_TIMEOUT environment variable." },
crawl_max_depth: { type: "number", default: 1, description: "Max crawl depth from base URL (1-2). Higher values increase processing time significantly. Can be set via CRAWL_MAX_DEPTH environment variable." },
crawl_max_breadth: { type: "number", default: 10, description: "Max links to follow per page level during crawl (1-10)." },
crawl_limit: { type: "number", default: 10, description: "Total number of links the crawler will process per root URL (1-20). Can be set via CRAWL_LIMIT environment variable." },
crawl_instructions: { type: "string", description: "Natural language instructions for the crawler." },
crawl_select_paths: { type: "array", items: { type: "string" }, default: [], description: "Regex for URL paths to crawl (e.g., '/docs/.*')." },
crawl_select_domains: { type: "array", items: { type: "string" }, default: [], description: "Regex for domains/subdomains to crawl (e.g., '^docs\\.example\\.com$'). Overrides auto-domain focus." },
crawl_exclude_paths: { type: "array", items: { type: "string" }, default: [], description: "Regex for URL paths to exclude." },
crawl_exclude_domains: { type: "array", items: { type: "string" }, default: [], description: "Regex for domains/subdomains to exclude." },
crawl_allow_external: { type: "boolean", default: false, description: "Allow crawler to follow links to external domains." },
crawl_include_images: { type: "boolean", default: false, description: "Extract image URLs from crawled pages." },
crawl_categories: { type: "array", items: { type: "string" }, default: [], description: "Filter crawl URLs by categories (e.g., 'Blog', 'Documentation')." },
crawl_extract_depth: { type: "string", enum: ["basic", "advanced"], default: "basic", description: "Extraction depth for crawl ('basic' or 'advanced')." },
crawl_timeout: { type: "number", default: 180, description: "Timeout in seconds for Tavily crawl requests. Can be set via CRAWL_TIMEOUT environment variable." },
documentation_prompt: {
type: "string",
description: "Optional. Custom prompt for LLM documentation generation. Overrides 'DOCUMENTATION_PROMPT' env var and default. If none set, a comprehensive default is used.",
},
hardware_acceleration: { type: "boolean", default: false, description: "Try to use hardware acceleration (WebGPU) if available." },
},
required: ["query"],
},
},
{
name: "write-research-file",
description: `Write research content to a file. This tool allows you to save research findings, documentation, or any text content to a specified file path.
SECURITY: File writing is controlled by environment variables:
- FILE_WRITE_ENABLED must be set to 'true' to enable file writing
- ALLOWED_WRITE_PATHS can specify allowed directories (comma-separated)
- If no ALLOWED_WRITE_PATHS specified, defaults to user's home directory
- FILE_WRITE_LINE_LIMIT controls maximum lines per write operation (default: 200)
Use this tool to save research reports, documentation, or any content generated from the deep-research-tool results.`,
inputSchema: {
type: "object",
properties: {
file_path: {
type: "string",
description: "Path where the file should be written. Must be within allowed directories."
},
content: {
type: "string",
description: "Content to write to the file."
},
mode: {
type: "string",
enum: ["rewrite", "append"],
default: "rewrite",
description: "Write mode: 'rewrite' to overwrite file, 'append' to add to existing content."
},
},
required: ["file_path", "content"],
},
},
];
return { tools };
});
this.server.setRequestHandler(
CallToolRequestSchema,
async (request) => {
if (!request.params || typeof request.params.name !== 'string' || typeof request.params.arguments !== 'object') {
console.error("Invalid CallToolRequest structure:", request);
throw new McpError(ErrorCode.InvalidParams, "Invalid tool call request structure.");
}
if (request.params.name === "write-research-file") {
return await this.handleWriteResearchFile(request.params.arguments);
} else if (request.params.name === "deep-research-tool") {
return await this.handleDeepResearchTool(request.params.arguments);
} else {
throw new McpError(ErrorCode.MethodNotFound, `Unknown tool: ${request.params.name}`);
}
}
);
}
private async handleWriteResearchFile(rawArgs: any): Promise<any> {
if (!ENV_FILE_WRITE_ENABLED) {
throw new McpError(ErrorCode.InvalidParams, "File writing is disabled. Set FILE_WRITE_ENABLED=true to enable this feature.");
}
const args: WriteResearchFileArguments = {
file_path: typeof rawArgs.file_path === 'string' ? rawArgs.file_path : '',
content: typeof rawArgs.content === 'string' ? rawArgs.content : '',
mode: rawArgs.mode === 'append' ? 'append' : 'rewrite',
};
if (!args.file_path || !args.content) {
throw new McpError(ErrorCode.InvalidParams, "Both file_path and content are required.");
}
try {
await writeResearchFile(args.file_path, args.content, args.mode);
const successMessage = `Successfully ${args.mode === 'append' ? 'appended to' : 'wrote'} file: ${args.file_path}`;
console.error(successMessage);
return {
content: [{
type: "text",
text: JSON.stringify({
success: true,
message: successMessage,
file_path: args.file_path,
mode: args.mode,
content_length: args.content.length,
line_count: args.content.split('\n').length
}, null, 2)
}]
};
} catch (error: any) {
const errorMessage = error.message || 'Failed to write file';
console.error(`File write error: ${errorMessage}`);
return {
content: [{
type: "text",
text: JSON.stringify({
success: false,
error: errorMessage,
file_path: args.file_path,
mode: args.mode
}, null, 2)
}],
isError: true
};
}
}
private async handleDeepResearchTool(rawArgs: any): Promise<any> {
const args: DeepResearchToolArguments = {
query: typeof rawArgs.query === 'string' ? rawArgs.query : '',
search_depth: rawArgs.search_depth as "basic" | "advanced" | undefined,
topic: rawArgs.topic as "general" | "news" | undefined,
days: rawArgs.days as number | undefined,
time_range: rawArgs.time_range as string | undefined,
max_search_results: rawArgs.max_search_results as number | undefined,
chunks_per_source: rawArgs.chunks_per_source as number | undefined,
include_search_images: rawArgs.include_search_images as boolean | undefined,
include_search_image_descriptions: rawArgs.include_search_image_descriptions as boolean | undefined,
include_answer: rawArgs.include_answer as boolean | "basic" | "advanced" | undefined,
include_raw_content_search: rawArgs.include_raw_content_search as boolean | undefined,
include_domains_search: rawArgs.include_domains_search as string[] | undefined,
exclude_domains_search: rawArgs.exclude_domains_search as string[] | undefined,
search_timeout: rawArgs.search_timeout as number | undefined,
crawl_max_depth: rawArgs.crawl_max_depth as number | undefined,
crawl_max_breadth: rawArgs.crawl_max_breadth as number | undefined,
crawl_limit: rawArgs.crawl_limit as number | undefined,
crawl_instructions: rawArgs.crawl_instructions as string | undefined,
crawl_select_paths: rawArgs.crawl_select_paths as string[] | undefined,
crawl_select_domains: rawArgs.crawl_select_domains as string[] | undefined,
crawl_exclude_paths: rawArgs.crawl_exclude_paths as string[] | undefined,
crawl_exclude_domains: rawArgs.crawl_exclude_domains as string[] | undefined,
crawl_allow_external: rawArgs.crawl_allow_external as boolean | undefined,
crawl_include_images: rawArgs.crawl_include_images as boolean | undefined,
crawl_categories: rawArgs.crawl_categories as string[] | undefined,
crawl_extract_depth: rawArgs.crawl_extract_depth as "basic" | "advanced" | undefined,
crawl_timeout: rawArgs.crawl_timeout as number | undefined,
documentation_prompt: rawArgs.documentation_prompt as string | undefined,
hardware_acceleration: rawArgs.hardware_acceleration as boolean | undefined,
};
if (!args.query) {
throw new McpError(ErrorCode.InvalidParams, "The 'query' argument is required and must be a non-empty string.");
}
let finalDocumentationPrompt = DEFAULT_DOCUMENTATION_PROMPT;
if (ENV_DOCUMENTATION_PROMPT) {
finalDocumentationPrompt = ENV_DOCUMENTATION_PROMPT;
}
if (args.documentation_prompt) {
finalDocumentationPrompt = args.documentation_prompt;
}
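// Effective precedence: tool argument > DOCUMENTATION_PROMPT env var > built-in default.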
try {
// Check if hardware acceleration is requested for this specific call
if (args.hardware_acceleration) {
console.error("Hardware acceleration requested for this research query");
try {
// Try to enable Node.js flags for GPU if not already enabled
process.env.NODE_OPTIONS = process.env.NODE_OPTIONS || '';
if (!process.env.NODE_OPTIONS.includes('--enable-webgpu')) {
process.env.NODE_OPTIONS += ' --enable-webgpu';
console.error("Added WebGPU flag to Node options");
} else {
console.error("WebGPU flag already present in Node options");
}
} catch (err) {
console.error("Failed to set hardware acceleration:", err);
}
}
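// Note: NODE_OPTIONS is read at process startup, so appending the flag here mainly
// affects any child processes spawned later rather than this already-running process.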
// Convert our parameters to Tavily Search API format
const searchParams: TavilySearchParams = {
query: args.query,
searchDepth: args.search_depth ?? "advanced",
topic: args.topic ?? "general",
maxResults: args.max_search_results ?? ENV_MAX_SEARCH_RESULTS ?? 7,
includeImages: args.include_search_images ?? false,
includeImageDescriptions: args.include_search_image_descriptions ?? false,
includeAnswer: args.include_answer ?? false,
includeRawContent: args.include_raw_content_search ?? false,
includeDomains: args.include_domains_search ?? [],
excludeDomains: args.exclude_domains_search ?? [],
timeout: args.search_timeout ?? ENV_SEARCH_TIMEOUT ?? 60,
};
if (searchParams.searchDepth === "advanced" && (args.chunks_per_source !== undefined && args.chunks_per_source !== null)) {
searchParams.chunksPerSource = args.chunks_per_source;
}
if (searchParams.topic === "news" && (args.days !== undefined && args.days !== null)) {
searchParams.days = args.days;
}
if (args.time_range) {
searchParams.timeRange = args.time_range;
}
console.error("Tavily Search API Parameters:", JSON.stringify(searchParams, null, 2));
// Set search timeout for faster response
const searchTimeout = args.search_timeout ?? ENV_SEARCH_TIMEOUT ?? 60; // Default 60 seconds
console.error(`Starting search with timeout: ${searchTimeout}s`);
const startSearchTime = Date.now();
// Execute search with timeout
let searchResponse: any; // Use any to avoid unknown type errors
try {
searchResponse = await Promise.race([
this.tavilyClient.search(searchParams.query, {
searchDepth: searchParams.searchDepth,
topic: searchParams.topic,
maxResults: searchParams.maxResults,
chunksPerSource: searchParams.chunksPerSource,
includeImages: searchParams.includeImages,
includeImageDescriptions: searchParams.includeImageDescriptions,
// Convert string types to boolean for includeAnswer
includeAnswer: typeof searchParams.includeAnswer === 'boolean' ?
searchParams.includeAnswer : false,
includeRawContent: searchParams.includeRawContent,
includeDomains: searchParams.includeDomains,
excludeDomains: searchParams.excludeDomains,
// Fix timeRange to match allowed values
timeRange: (searchParams.timeRange as "year" | "month" | "week" | "day" | "y" | "m" | "w" | "d" | undefined),
days: searchParams.days
}),
new Promise((_, reject) =>
setTimeout(() => reject(new Error(`Search timeout after ${searchTimeout}s`)), searchTimeout * 1000)
)
]);
console.error(`Search completed in ${((Date.now() - startSearchTime) / 1000).toFixed(1)}s`);
} catch (error: any) {
console.error(`Search error: ${error.message}`);
throw error;
}
const combinedResults: CombinedResult[] = [];
let searchRank = 1;
if (!searchResponse.results || searchResponse.results.length === 0) {
const noResultsOutput = JSON.stringify({
documentation_instructions: finalDocumentationPrompt,
original_query: args.query,
search_summary: searchResponse.answer || `No search results found for query: "${args.query}".`,
research_data: [],
}, null, 2);
return {
content: [{ type: "text", text: noResultsOutput }]
};
}
for (const searchResult of searchResponse.results) {
if (!searchResult.url) {
console.warn(`Search result "${searchResult.title}" missing URL, skipping crawl.`);
continue;
}
// Ensure crawl parameters are strictly enforced with smaller defaults
const crawlParams: TavilyCrawlParams = {
url: searchResult.url,
maxDepth: Math.min(2, args.crawl_max_depth ?? ENV_CRAWL_MAX_DEPTH ?? 1), // Hard cap at 2, default to 1
maxBreadth: Math.min(10, args.crawl_max_breadth ?? 10), // Hard cap at 10, default to 10 (down from 20)
limit: Math.min(20, args.crawl_limit ?? ENV_CRAWL_LIMIT ?? 10), // Hard cap at 20, default to 10 (down from 50)
instructions: args.crawl_instructions || "",
selectPaths: args.crawl_select_paths ?? [],
selectDomains: args.crawl_select_domains ?? [],
excludePaths: args.crawl_exclude_paths ?? [],
excludeDomains: args.crawl_exclude_domains ?? [],
allowExternal: args.crawl_allow_external ?? false,
includeImages: args.crawl_include_images ?? false,
categories: (args.crawl_categories ?? []) as TavilyCrawlCategory[],
extractDepth: args.crawl_extract_depth ?? "basic"
};
// If no select_domains provided and not allowing external domains,
// restrict to the current domain to prevent excessive crawling
if ((!args.crawl_select_domains || args.crawl_select_domains.length === 0) &&
!crawlParams.allowExternal) {
try {
const currentUrlDomain = new URL(searchResult.url).hostname;
crawlParams.selectDomains = [`^${currentUrlDomain.replace(/\./g, "\\.")}$`];
console.error(`Auto-limiting crawl to domain: ${currentUrlDomain}`);
} catch (e: any) {
console.error(`Could not parse URL to limit crawl domain: ${searchResult.url}. Error: ${e.message}`);
}
}
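// e.g. crawling https://docs.example.com/guide auto-limits the crawl to the regex ^docs\.example\.com$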
console.error(`Crawling ${searchResult.url} with maxDepth=${crawlParams.maxDepth}, maxBreadth=${crawlParams.maxBreadth}, limit=${crawlParams.limit}`);
// Add memory usage tracking
if (process.memoryUsage) {
const memUsage = process.memoryUsage();
console.error(`Memory usage before crawl: RSS=${Math.round(memUsage.rss / 1024 / 1024)}MB, Heap=${Math.round(memUsage.heapUsed / 1024 / 1024)}MB`);
}
console.error(`Crawling URL: ${searchResult.url} with params:`, JSON.stringify(crawlParams, null, 2));
const currentCombinedResult: CombinedResult = {
search_rank: searchRank++,
original_url: searchResult.url,
title: searchResult.title,
initial_content_snippet: searchResult.content,
search_score: searchResult.score,
published_date: searchResult.publishedDate,
crawled_data: [],
crawl_errors: [],
};
try {
const startCrawlTime = Date.now();
const crawlTimeout = args.crawl_timeout ?? ENV_CRAWL_TIMEOUT ?? 180; // Default 3 minutes
console.error(`Starting crawl with timeout: ${crawlTimeout}s`);
// Progress tracking for the crawl
let progressTimer = setInterval(() => {
const elapsedSec = (Date.now() - startCrawlTime) / 1000;
console.error(`Crawl in progress... (${elapsedSec.toFixed(0)}s elapsed)`);
}, 15000); // Report every 15 seconds
// Ensure timer is always cleared
let crawlResponse: any; // Use any to avoid unknown type errors
try {
// Execute crawl with timeout
crawlResponse = await Promise.race([
this.tavilyClient.crawl(crawlParams.url, {
// Ensure all parameters have non-undefined values to match API requirements
maxDepth: crawlParams.maxDepth ?? 1,
maxBreadth: crawlParams.maxBreadth ?? 10,
limit: crawlParams.limit ?? 10,
instructions: crawlParams.instructions ?? "",
selectPaths: crawlParams.selectPaths ?? [],
selectDomains: crawlParams.selectDomains ?? [],
excludePaths: crawlParams.excludePaths ?? [],
excludeDomains: crawlParams.excludeDomains ?? [],
allowExternal: crawlParams.allowExternal ?? false,
includeImages: crawlParams.includeImages ?? false,
// Cast categories to the proper type
categories: (crawlParams.categories ?? []) as TavilyCrawlCategory[],
extractDepth: crawlParams.extractDepth ?? "basic",
// Add the required timeout parameter
timeout: args.crawl_timeout ?? ENV_CRAWL_TIMEOUT ?? 180
}),
new Promise((_, reject) =>
setTimeout(() => reject(new Error(`Crawl timeout after ${crawlTimeout}s`)), crawlTimeout * 1000)
)
]);
} catch (err) {
clearInterval(progressTimer); // Clear timer on error
throw err; // Re-throw to be caught by outer try/catch
}
// Clear the progress timer once the crawl is complete
clearInterval(progressTimer);
console.error(`Crawl completed in ${((Date.now() - startCrawlTime) / 1000).toFixed(1)}s`);
if (crawlResponse.results && crawlResponse.results.length > 0) {
crawlResponse.results.forEach((page: any) => {
currentCombinedResult.crawled_data.push({
url: page.url,
raw_content: page.rawContent || null,
images: page.images || [],
});
});
} else {
currentCombinedResult.crawl_errors.push(`No content pages returned from crawl of ${searchResult.url}.`);
}
// After crawl completes, log memory usage
if (process.memoryUsage) {
const memUsage = process.memoryUsage();
console.error(`Memory usage after crawl: RSS=${Math.round(memUsage.rss / 1024 / 1024)}MB, Heap=${Math.round(memUsage.heapUsed / 1024 / 1024)}MB`);
// Force garbage collection if available and memory usage is high
if (memUsage.heapUsed > 500 * 1024 * 1024 && global.gc) {
console.error("Memory usage high, forcing garbage collection");
try {
global.gc();
} catch (e) {
console.error("Failed to force garbage collection", e);
}
}
}
} catch (crawlError: any) {
const errorMessage = crawlError.response?.data?.error || crawlError.message || 'Unknown crawl error';
console.error(`Error crawling ${searchResult.url}:`, errorMessage, crawlError.stack);
currentCombinedResult.crawl_errors.push(
`Failed to crawl ${searchResult.url}: ${errorMessage}`
);
}
combinedResults.push(currentCombinedResult);
}
const outputText = JSON.stringify({
documentation_instructions: finalDocumentationPrompt,
original_query: args.query,
search_summary: searchResponse.answer || null,
research_data: combinedResults,
}, null, 2);
return {
content: [{ type: "text", text: outputText }]
};
} catch (error: any) {
const errorMessage = error.response?.data?.error || error.message || 'An unexpected error occurred in deep-research-tool.';
console.error("[DeepResearchTool Error]", errorMessage, error.stack);
const errorOutput = JSON.stringify({
documentation_instructions: finalDocumentationPrompt,
error: errorMessage,
original_query: args.query,
}, null, 2);
return {
content: [{ type: "text", text: errorOutput }],
isError: true
};
}
}
public async run(): Promise<void> {
const transport = new StdioServerTransport();
await this.server.connect(transport);
// Check if we should try to enable hardware acceleration
const useHardwareAcceleration = process.env.HARDWARE_ACCELERATION === 'true';
if (useHardwareAcceleration) {
console.error("Attempting to enable hardware acceleration");
try {
// Try to enable Node.js flags for GPU
process.env.NODE_OPTIONS = process.env.NODE_OPTIONS || '';
if (!process.env.NODE_OPTIONS.includes('--enable-webgpu')) {
process.env.NODE_OPTIONS += ' --enable-webgpu';
}
} catch (err) {
console.error("Failed to set hardware acceleration:", err);
}
}
console.error(
"Deep Research MCP Server (deep-research-mcp) is running and connected via stdio.\n" +
`Documentation prompt source: ` +
(process.env.npm_config_documentation_prompt || ENV_DOCUMENTATION_PROMPT ? `Environment Variable ('DOCUMENTATION_PROMPT')` : `Default (can be overridden by tool argument)`) +
`.\n` +
`Timeout configuration: ` +
`Search=${ENV_SEARCH_TIMEOUT || 60}s, Crawl=${ENV_CRAWL_TIMEOUT || 180}s` +
(ENV_SEARCH_TIMEOUT || ENV_CRAWL_TIMEOUT ? ` (from environment variables)` : ` (defaults)`) +
`.\n` +
`Limits configuration: ` +
`MaxResults=${ENV_MAX_SEARCH_RESULTS || 7}, CrawlDepth=${ENV_CRAWL_MAX_DEPTH || 1}, CrawlLimit=${ENV_CRAWL_LIMIT || 10}` +
(ENV_MAX_SEARCH_RESULTS || ENV_CRAWL_MAX_DEPTH || ENV_CRAWL_LIMIT ? ` (from environment variables)` : ` (defaults)`) +
`.\n` +
`File writing: ` +
(ENV_FILE_WRITE_ENABLED ? `Enabled` : `Disabled`) +
(ENV_FILE_WRITE_ENABLED ? ` (LineLimit=${ENV_FILE_WRITE_LINE_LIMIT}, AllowedPaths=${ENV_ALLOWED_WRITE_PATHS ? ENV_ALLOWED_WRITE_PATHS.join(',') : 'user home directory'})` : ` (set FILE_WRITE_ENABLED=true to enable)`) +
`.\n` +
"Awaiting requests..."
);
}
}
// Main execution block for running the server directly
if (require.main === module || (typeof module !== 'undefined' && !module.parent)) {
const deepResearchServer = new DeepResearchMcpServer();
deepResearchServer
.run()
.catch((err) => {
console.error("Failed to start or run Deep Research MCP Server:", err.stack || err);
process.exit(1);
});
}
export { DeepResearchMcpServer }; // Export for potential programmatic use
```