This is page 1 of 2. Use http://codebase.md/phialsbasement/mcp-webresearch-stealthified?lines=true&page={x} to view the full context. # Directory Structure ``` ├── .cursorrules ├── .gitignore ├── docs │ └── mcp_spec │ └── llms-full.txt ├── index.ts ├── LICENSE ├── package.json ├── pnpm-lock.yaml ├── README.md └── tsconfig.json ``` # Files -------------------------------------------------------------------------------- /.cursorrules: -------------------------------------------------------------------------------- ``` 1 | 1. Use pnpm instead of npm when generating packaging-related commands. 2 | 2. Only make changes to comments, code, or dependencies that are needed to accomplish the objective defined by the user. When editing code, don't remove comments or change dependencies or make changes that are unrelated to the code changes at hand. ``` -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- ``` 1 | # Logs 2 | logs 3 | *.log 4 | npm-debug.log* 5 | yarn-debug.log* 6 | yarn-error.log* 7 | lerna-debug.log* 8 | .pnpm-debug.log* 9 | 10 | # Diagnostic reports (https://nodejs.org/api/report.html) 11 | report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json 12 | 13 | # Runtime data 14 | pids 15 | *.pid 16 | *.seed 17 | *.pid.lock 18 | 19 | # Directory for instrumented libs generated by jscoverage/JSCover 20 | lib-cov 21 | 22 | # Coverage directory used by tools like istanbul 23 | coverage 24 | *.lcov 25 | 26 | # nyc test coverage 27 | .nyc_output 28 | 29 | # Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files) 30 | .grunt 31 | 32 | # Bower dependency directory (https://bower.io/) 33 | bower_components 34 | 35 | # node-waf configuration 36 | .lock-wscript 37 | 38 | # Compiled binary addons (https://nodejs.org/api/addons.html) 39 | build/Release 40 | 41 | # Dependency directories 42 | node_modules/ 43 | jspm_packages/ 44 | 45 | # Snowpack dependency directory (https://snowpack.dev/) 46 | web_modules/ 47 | 48 | # TypeScript cache 49 | *.tsbuildinfo 50 | 51 | # Optional npm cache directory 52 | .npm 53 | 54 | # Optional eslint cache 55 | .eslintcache 56 | 57 | # Optional stylelint cache 58 | .stylelintcache 59 | 60 | # Microbundle cache 61 | .rpt2_cache/ 62 | .rts2_cache_cjs/ 63 | .rts2_cache_es/ 64 | .rts2_cache_umd/ 65 | 66 | # Optional REPL history 67 | .node_repl_history 68 | 69 | # Output of 'npm pack' 70 | *.tgz 71 | 72 | # Yarn Integrity file 73 | .yarn-integrity 74 | 75 | # dotenv environment variable files 76 | .env 77 | .env.development.local 78 | .env.test.local 79 | .env.production.local 80 | .env.local 81 | 82 | # parcel-bundler cache (https://parceljs.org/) 83 | .cache 84 | .parcel-cache 85 | 86 | # Next.js build output 87 | .next 88 | out 89 | 90 | # Nuxt.js build / generate output 91 | .nuxt 92 | dist 93 | 94 | # Gatsby files 95 | .cache/ 96 | # Comment in the public line in if your project uses Gatsby and not Next.js 97 | # https://nextjs.org/blog/next-9-1#public-directory-support 98 | # public 99 | 100 | # vuepress build output 101 | .vuepress/dist 102 | 103 | # vuepress v2.x temp and cache directory 104 | .temp 105 | .cache 106 | 107 | # Docusaurus cache and generated files 108 | .docusaurus 109 | 110 | # Serverless directories 111 | .serverless/ 112 | 113 | # FuseBox cache 114 | .fusebox/ 115 | 116 | # DynamoDB Local files 117 | .dynamodb/ 118 | 119 | # TernJS port file 120 | .tern-port 121 | 122 | # Stores VSCode versions used for testing 
VSCode extensions 123 | .vscode-test 124 | 125 | # yarn v2 126 | .yarn/cache 127 | .yarn/unplugged 128 | .yarn/build-state.yml 129 | .yarn/install-state.gz 130 | .pnp.* 131 | ``` -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- ```markdown 1 | # MCP Web Research Server 2 | 3 | A Model Context Protocol (MCP) server for web research. 4 | 5 | Bring real-time info into Claude and easily research any topic. 6 | 7 | ## Features 8 | 9 | - Google search integration (this fork adds stealth hardening, so searches are no longer blocked by CAPTCHAs) 10 | - Webpage content extraction 11 | - Research session tracking (list of visited pages, search queries, etc.) 12 | - Screenshot capture 13 | 14 | ## Prerequisites 15 | 16 | - [Node.js](https://nodejs.org/) >= 18 (includes `npm` and `npx`) 17 | - [Claude Desktop app](https://claude.ai/download) 18 | 19 | ## Installation 20 | 21 | First, ensure you've downloaded and installed the [Claude Desktop app](https://claude.ai/download) and that you have npm installed. 22 | 23 | Next, add this entry to your `claude_desktop_config.json` (on Mac, found at `~/Library/Application\ Support/Claude/claude_desktop_config.json`): 24 | 25 | ```json 26 | { 27 | "mcpServers": { 28 | "webresearch": { 29 | "command": "npx", 30 | "args": ["-y", "@mzxrai/mcp-webresearch@latest"] 31 | } 32 | } 33 | } 34 | ``` 35 | 36 | This config allows Claude Desktop to automatically start the web research MCP server when needed. 37 | 38 | ## Usage 39 | 40 | Simply start a chat with Claude and send a prompt that would benefit from web research. If you'd like a prebuilt prompt customized for deeper web research, you can use the `agentic-research` prompt that we provide through this package. Access that prompt in Claude Desktop by clicking the Paperclip icon in the chat input and then selecting `Choose an integration` → `webresearch` → `agentic-research`. 41 | 42 | <img src="https://i.ibb.co/N6Y3C0q/Screenshot-2024-12-05-at-11-01-27-PM.png" alt="Example screenshot of web research" width="400"/> 43 | 44 | ### Tools 45 | 46 | 1. `search_google` 47 | - Performs Google searches and extracts results 48 | - Arguments: `{ query: string }` 49 | 50 | 2. `visit_page` 51 | - Visits a webpage and extracts its content 52 | - Arguments: `{ url: string, takeScreenshot?: boolean }` 53 | 54 | 3. `take_screenshot` 55 | - Takes a screenshot of the current page 56 | - No arguments required 57 | 58 | ### Prompts 59 | 60 | #### `agentic-research` 61 | A guided research prompt that helps Claude conduct thorough web research. The prompt instructs Claude to: 62 | - Start with broad searches to understand the topic landscape 63 | - Prioritize high-quality, authoritative sources 64 | - Iteratively refine the research direction based on findings 65 | - Keep you informed and let you guide the research interactively 66 | - Always cite sources with URLs 67 | 68 | ### Resources 69 | 70 | We expose two things as MCP resources: (1) captured webpage screenshots, and (2) the research session. 71 | 72 | #### Screenshots 73 | 74 | When you take a screenshot, it's saved as an MCP resource. You can access captured screenshots in Claude Desktop via the Paperclip icon.
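Screenshots are listed as resources with URIs of the form `research://screenshots/<index>` and are returned as base64-encoded PNG blobs. As a rough, hypothetical illustration (not part of this package), an MCP client built with the TypeScript SDK could fetch and save one roughly like this; the client API used here (`Client`, `StdioClientTransport`, `readResource`) is an assumption and may differ between SDK versions:

```typescript
// Hypothetical client-side sketch: reading a captured screenshot resource.
// The SDK client classes/methods below are assumptions; verify them against
// the @modelcontextprotocol/sdk version you actually install.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
import * as fs from "fs";

async function saveFirstScreenshot() {
    const transport = new StdioClientTransport({
        command: "npx",
        args: ["-y", "@mzxrai/mcp-webresearch@latest"],
    });
    const client = new Client(
        { name: "example-client", version: "0.0.1" },
        { capabilities: {} }
    );
    await client.connect(transport);

    // The server exposes each screenshot at research://screenshots/<index>
    const { contents } = await client.readResource({ uri: "research://screenshots/0" });
    const image = contents[0] as { mimeType?: string; blob?: string };
    if (image?.mimeType === "image/png" && image.blob) {
        // The server returns the PNG as a base64-encoded blob
        fs.writeFileSync("screenshot.png", Buffer.from(image.blob, "base64"));
    }
}

saveFirstScreenshot().catch(console.error);
```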
75 | 76 | #### Research Session 77 | 78 | The server maintains a research session that includes: 79 | - Search queries 80 | - Visited pages 81 | - Extracted content 82 | - Screenshots 83 | - Timestamps 84 | 85 | ### Suggestions 86 | 87 | For the best results, if you choose not to use the `agentic-research` prompt when doing your research, it may be helpful to suggest high-quality sources for Claude to use when researching general topics. For example, you could prompt `news today from reuters or AP` instead of `news today`. 88 | 89 | ## Problems 90 | 91 | This is very much pre-alpha code. And it is also AIGC, so expect bugs. 92 | 93 | If you run into issues, it may be helpful to check Claude Desktop's MCP logs: 94 | 95 | ```bash 96 | tail -n 20 -f ~/Library/Logs/Claude/mcp*.log 97 | ``` 98 | 99 | ## Development 100 | 101 | ```bash 102 | # Install dependencies 103 | pnpm install 104 | 105 | # Build the project 106 | pnpm build 107 | 108 | # Watch for changes 109 | pnpm watch 110 | 111 | # Run in development mode 112 | pnpm dev 113 | ``` 114 | 115 | ## Requirements 116 | 117 | - Node.js >= 18 118 | - Playwright (automatically installed as a dependency) 119 | 120 | ## Verified Platforms 121 | 122 | - [x] macOS 123 | - [x] Linux 124 | - [x] Windows 125 | 126 | ## License 127 | 128 | MIT 129 | 130 | ## Author 131 | 132 | [mzxrai](https://github.com/mzxrai) 133 | ``` -------------------------------------------------------------------------------- /tsconfig.json: -------------------------------------------------------------------------------- ```json 1 | { 2 | "compilerOptions": { 3 | "target": "ES2023", 4 | "module": "NodeNext", 5 | "moduleResolution": "NodeNext", 6 | "esModuleInterop": true, 7 | "strict": true, 8 | "outDir": "dist", 9 | "sourceMap": true, 10 | "declaration": true, 11 | "skipLibCheck": true, 12 | "lib": [ 13 | "ES2023", 14 | "DOM", 15 | "DOM.Iterable" 16 | ] 17 | }, 18 | "include": [ 19 | "*.ts" 20 | ], 21 | "exclude": [ 22 | "node_modules", 23 | "dist" 24 | ] 25 | } ``` -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- ```json 1 | { 2 | "name": "@mzxrai/mcp-webresearch", 3 | "version": "0.1.7", 4 | "description": "MCP server for web research", 5 | "license": "MIT", 6 | "author": "mzxrai", 7 | "homepage": "https://github.com/mzxrai/mcp-webresearch", 8 | "bugs": "https://github.com/mzxrai/mcp-webresearch/issues", 9 | "type": "module", 10 | "bin": { 11 | "mcp-server-webresearch": "dist/index.js" 12 | }, 13 | "files": [ 14 | "dist" 15 | ], 16 | "scripts": { 17 | "build": "tsc && shx chmod +x dist/*.js", 18 | "prepare": "pnpm run build", 19 | "postinstall": "playwright install chromium", 20 | "watch": "tsc --watch", 21 | "dev": "tsx watch index.ts" 22 | }, 23 | "publishConfig": { 24 | "access": "public" 25 | }, 26 | "keywords": [ 27 | "mcp", 28 | "model-context-protocol", 29 | "web-research", 30 | "ai", 31 | "web-scraping" 32 | ], 33 | "dependencies": { 34 | "@modelcontextprotocol/sdk": "1.0.1", 35 | "playwright": "^1.49.0", 36 | "turndown": "^7.1.2" 37 | }, 38 | "devDependencies": { 39 | "shx": "^0.3.4", 40 | "tsx": "^4.19.2", 41 | "typescript": "^5.6.2", 42 | "@types/turndown": "^5.0.4" 43 | } 44 | } ``` -------------------------------------------------------------------------------- /index.ts: -------------------------------------------------------------------------------- ```typescript 1 | #!/usr/bin/env node 2 | 3 | // Core dependencies for MCP server 
and protocol handling 4 | import { Server } from "@modelcontextprotocol/sdk/server/index.js"; 5 | import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js"; 6 | import { 7 | CallToolRequestSchema, 8 | ListResourcesRequestSchema, 9 | ListToolsRequestSchema, 10 | ReadResourceRequestSchema, 11 | ListPromptsRequestSchema, 12 | GetPromptRequestSchema, 13 | Tool, 14 | Resource, 15 | McpError, 16 | ErrorCode, 17 | TextContent, 18 | ImageContent, 19 | } from "@modelcontextprotocol/sdk/types.js"; 20 | 21 | // Web scraping and content processing dependencies 22 | import { chromium, Browser, Page } from 'playwright'; 23 | import TurndownService from "turndown"; 24 | import type { Node } from "turndown"; 25 | import * as fs from 'fs'; 26 | import * as path from 'path'; 27 | import * as os from 'os'; 28 | // Add type declaration for window extensions 29 | declare global { 30 | interface Window { 31 | chrome: { 32 | runtime: Record<string, unknown>; 33 | loadTimes: () => void; 34 | csi: () => void; 35 | app: Record<string, unknown>; 36 | }; 37 | Notification: { 38 | permission: NotificationPermission; 39 | requestPermission?: (callback?: NotificationPermissionCallback) => Promise<NotificationPermission>; 40 | }; 41 | } 42 | } 43 | // Initialize temp directory for screenshots 44 | const SCREENSHOTS_DIR = fs.mkdtempSync(path.join(os.tmpdir(), 'mcp-screenshots-')); 45 | 46 | // Initialize Turndown service for converting HTML to Markdown 47 | // Configure with specific formatting preferences 48 | const turndownService: TurndownService = new TurndownService({ 49 | headingStyle: 'atx', // Use # style headings 50 | hr: '---', // Horizontal rule style 51 | bulletListMarker: '-', // List item marker 52 | codeBlockStyle: 'fenced', // Use ``` for code blocks 53 | emDelimiter: '_', // Italics style 54 | strongDelimiter: '**', // Bold style 55 | linkStyle: 'inlined', // Use inline links 56 | }); 57 | 58 | // Custom Turndown rules for better content extraction 59 | // Remove script and style tags completely 60 | turndownService.addRule('removeScripts', { 61 | filter: ['script', 'style', 'noscript'], 62 | replacement: () => '' 63 | }); 64 | 65 | // Preserve link elements with their href attributes 66 | turndownService.addRule('preserveLinks', { 67 | filter: 'a', 68 | replacement: (content: string, node: Node) => { 69 | const element = node as HTMLAnchorElement; 70 | const href = element.getAttribute('href'); 71 | return href ? `[${content}](${href})` : content; 72 | } 73 | }); 74 | 75 | // Preserve image elements with their src and alt attributes 76 | turndownService.addRule('preserveImages', { 77 | filter: 'img', 78 | replacement: (content: string, node: Node) => { 79 | const element = node as HTMLImageElement; 80 | const alt = element.getAttribute('alt') || ''; 81 | const src = element.getAttribute('src'); 82 | return src ? 
`` : ''; 83 | } 84 | }); 85 | 86 | // Core interfaces for research data management 87 | interface ResearchResult { 88 | url: string; // URL of the researched page 89 | title: string; // Page title 90 | content: string; // Extracted content in markdown 91 | timestamp: string; // When the result was captured 92 | screenshotPath?: string; // Path to screenshot file on disk 93 | } 94 | 95 | // Define structure for research session data 96 | interface ResearchSession { 97 | query: string; // Search query that initiated the session 98 | results: ResearchResult[]; // Collection of research results 99 | lastUpdated: string; // Timestamp of last update 100 | } 101 | 102 | // Screenshot management functions 103 | async function saveScreenshot(screenshot: string, title: string): Promise<string> { 104 | // Convert screenshot from base64 to buffer 105 | const buffer = Buffer.from(screenshot, 'base64'); 106 | 107 | // Check size before saving 108 | const MAX_SIZE = 5 * 1024 * 1024; // 5MB 109 | if (buffer.length > MAX_SIZE) { 110 | throw new McpError( 111 | ErrorCode.InvalidRequest, 112 | `Screenshot too large: ${Math.round(buffer.length / (1024 * 1024))}MB exceeds ${MAX_SIZE / (1024 * 1024)}MB limit` 113 | ); 114 | } 115 | 116 | // Generate a safe filename 117 | const timestamp = new Date().getTime(); 118 | const safeTitle = title.replace(/[^a-z0-9]/gi, '_').toLowerCase(); 119 | const filename = `${safeTitle}-${timestamp}.png`; 120 | const filepath = path.join(SCREENSHOTS_DIR, filename); 121 | 122 | // Save the validated screenshot 123 | await fs.promises.writeFile(filepath, buffer); 124 | 125 | // Return the filepath to the saved screenshot 126 | return filepath; 127 | } 128 | 129 | // Fix the cleanupScreenshots function 130 | async function cleanupScreenshots(): Promise<void> { 131 | try { 132 | const files = await fs.promises.readdir(SCREENSHOTS_DIR); 133 | await Promise.all(files.map((file: string) => 134 | fs.promises.unlink(path.join(SCREENSHOTS_DIR, file)) 135 | )); 136 | await fs.promises.rmdir(SCREENSHOTS_DIR); 137 | } catch (error) { 138 | console.error('Error cleaning up screenshots:', error); 139 | } 140 | } 141 | 142 | // Available tools for web research functionality 143 | const TOOLS: Tool[] = [ 144 | { 145 | name: "search_google", 146 | description: "Performs a web search using Google, ideal for finding current information, news, websites, and general knowledge. Use this tool when you need to research topics, find recent information, or gather data from the web. Returns structured search results with titles, URLs, and snippets.", 147 | inputSchema: { 148 | type: "object", 149 | properties: { 150 | query: { type: "string", description: "Search query" }, 151 | }, 152 | required: ["query"], 153 | }, 154 | }, 155 | { 156 | name: "visit_page", 157 | description: "Navigates to a specific URL and extracts the page content in readable format, with option to capture a screenshot. Use this tool to deeply analyze specific web pages, read articles, examine documentation, or verify information directly from the source. Especially useful for in-depth research after identifying relevant pages via search.", 158 | inputSchema: { 159 | type: "object", 160 | properties: { 161 | url: { type: "string", description: "URL to visit" }, 162 | takeScreenshot: { type: "boolean", description: "Whether to take a screenshot" }, 163 | }, 164 | required: ["url"], 165 | }, 166 | }, 167 | { 168 | name: "take_screenshot", 169 | description: "Captures a visual image of the currently loaded webpage. 
Use this tool when you need to preserve visual information, analyze page layouts, or document the current state of a webpage. Perfect for situations where textual content alone doesn't convey the full context.", 170 | inputSchema: { 171 | type: "object", 172 | properties: {}, // No parameters needed 173 | }, 174 | }, 175 | { 176 | name: "search_scholar", 177 | description: "Searches Google Scholar for academic papers and scholarly articles. Use this tool when researching scientific topics, looking for peer-reviewed research, academic citations, or scholarly literature. Returns structured data including titles, authors, publication details, and citation counts. Ideal for academic research and evidence-based inquiries.", 178 | inputSchema: { 179 | type: "object", 180 | properties: { 181 | query: { type: "string", description: "Academic search query" }, 182 | }, 183 | required: ["query"], 184 | }, 185 | }, 186 | ]; 187 | 188 | // Define available prompt types for type safety 189 | type PromptName = "agentic-research"; 190 | 191 | // Define structure for research prompt arguments 192 | interface AgenticResearchArgs { 193 | topic: string; // Research topic provided by user 194 | } 195 | 196 | // Configure available prompts with their specifications 197 | const PROMPTS = { 198 | // Agentic research prompt configuration 199 | "agentic-research": { 200 | name: "agentic-research" as const, // Type-safe name 201 | description: "Conduct iterative web research on a topic, exploring it thoroughly through multiple steps while maintaining a dialogue with the user", 202 | arguments: [ 203 | { 204 | name: "topic", // Topic argument specification 205 | description: "The topic or question to research", // Description of the argument 206 | required: true // Topic is mandatory 207 | } 208 | ] 209 | } 210 | } as const; // Make object immutable 211 | 212 | // Global state management for browser and research session 213 | let browser: Browser | undefined; // Puppeteer browser instance 214 | let page: Page | undefined; // Current active page 215 | let currentSession: ResearchSession | undefined; // Current research session data 216 | 217 | // Configuration constants for session management 218 | const MAX_RESULTS_PER_SESSION = 100; // Maximum number of results to store per session 219 | const MAX_RETRIES = 3; // Maximum retry attempts for operations 220 | const RETRY_DELAY = 1000; // Delay between retries in milliseconds 221 | 222 | // Generic retry mechanism for handling transient failures 223 | async function withRetry<T>( 224 | operation: () => Promise<T>, // Operation to retry 225 | retries = MAX_RETRIES, // Number of retry attempts 226 | delay = RETRY_DELAY // Delay between retries 227 | ): Promise<T> { 228 | let lastError: Error; 229 | 230 | // Attempt operation up to max retries 231 | for (let i = 0; i < retries; i++) { 232 | try { 233 | return await operation(); 234 | } catch (error) { 235 | lastError = error as Error; 236 | if (i < retries - 1) { 237 | console.error(`Attempt ${i + 1} failed, retrying in ${delay}ms:`, error); 238 | await new Promise(resolve => setTimeout(resolve, delay)); 239 | } 240 | } 241 | } 242 | 243 | throw lastError!; // Throw last error if all retries failed 244 | } 245 | 246 | // Add a new research result to the current session with data management 247 | function addResult(result: ResearchResult): void { 248 | // If no current session exists, initialize a new one 249 | if (!currentSession) { 250 | currentSession = { 251 | query: "Research Session", 252 | results: [], 253 | 
lastUpdated: new Date().toISOString(), 254 | }; 255 | } 256 | 257 | // If the session has reached the maximum number of results, remove the oldest result 258 | if (currentSession.results.length >= MAX_RESULTS_PER_SESSION) { 259 | currentSession.results.shift(); 260 | } 261 | 262 | // Add the new result to the session and update the last updated timestamp 263 | currentSession.results.push(result); 264 | currentSession.lastUpdated = new Date().toISOString(); 265 | } 266 | 267 | // Safe page navigation with error handling and bot detection 268 | async function safePageNavigation(page: Page, url: string): Promise<void> { 269 | try { 270 | // Step 1: Set cookies to bypass consent banner 271 | await page.context().addCookies([{ 272 | name: 'CONSENT', 273 | value: 'YES+', 274 | domain: '.google.com', 275 | path: '/' 276 | }]); 277 | 278 | // Step 2: Initial navigation 279 | const response = await page.goto(url, { 280 | waitUntil: 'domcontentloaded', 281 | timeout: 15000 282 | }); 283 | 284 | // Step 3: Basic response validation 285 | if (!response) { 286 | throw new Error('Navigation failed: no response received'); 287 | } 288 | 289 | // Check HTTP status code; if 400 or higher, throw an error 290 | const status = response.status(); 291 | if (status >= 400) { 292 | throw new Error(`HTTP ${status}: ${response.statusText()}`); 293 | } 294 | 295 | // Step 4: Wait for network to become idle or timeout 296 | await Promise.race([ 297 | page.waitForLoadState('networkidle', { timeout: 5000 }) 298 | .catch(() => {/* ignore timeout */ }), 299 | // Fallback timeout in case networkidle never occurs 300 | new Promise(resolve => setTimeout(resolve, 5000)) 301 | ]); 302 | 303 | // Step 5: Security and content validation 304 | const validation = await page.evaluate(() => { 305 | const botProtectionExists = [ 306 | '#challenge-running', // Cloudflare 307 | '#cf-challenge-running', // Cloudflare 308 | '#px-captcha', // PerimeterX 309 | '#ddos-protection', // Various 310 | '#waf-challenge-html' // Various WAFs 311 | ].some(selector => document.querySelector(selector)); 312 | 313 | // Check for suspicious page titles 314 | const suspiciousTitle = [ 315 | 'security check', 316 | 'ddos protection', 317 | 'please wait', 318 | 'just a moment', 319 | 'attention required' 320 | ].some(phrase => document.title.toLowerCase().includes(phrase)); 321 | 322 | // Count words in the page content 323 | const bodyText = document.body.innerText || ''; 324 | const words = bodyText.trim().split(/\s+/).length; 325 | 326 | // Return validation results 327 | return { 328 | wordCount: words, 329 | botProtection: botProtectionExists, 330 | suspiciousTitle, 331 | title: document.title 332 | }; 333 | }); 334 | 335 | // If bot protection is detected, throw an error 336 | if (validation.botProtection) { 337 | throw new Error('Bot protection detected'); 338 | } 339 | 340 | // If the page title is suspicious, throw an error 341 | if (validation.suspiciousTitle) { 342 | throw new Error(`Suspicious page title detected: "${validation.title}"`); 343 | } 344 | 345 | // If the page contains insufficient content, throw an error 346 | if (validation.wordCount < 1) { 347 | throw new Error('Page contains insufficient content'); 348 | } 349 | 350 | } catch (error) { 351 | // If an error occurs during navigation, throw an error with the URL and the error message 352 | throw new Error(`Navigation to ${url} failed: ${(error as Error).message}`); 353 | } 354 | } 355 | 356 | // Take and optimize a screenshot 357 | async function 
takeScreenshotWithSizeLimit(page: Page): Promise<string> { 358 | const MAX_SIZE = 5 * 1024 * 1024; 359 | const MAX_DIMENSION = 1920; 360 | const MIN_DIMENSION = 800; 361 | 362 | // Set viewport size 363 | await page.setViewportSize({ 364 | width: 1920, 365 | height: 1080 366 | }); 367 | 368 | // Take initial screenshot 369 | let screenshot = await page.screenshot({ 370 | type: 'png', 371 | fullPage: false 372 | }); 373 | 374 | // Handle buffer conversion 375 | let buffer = screenshot; 376 | let attempts = 0; 377 | const MAX_ATTEMPTS = 3; 378 | 379 | // While screenshot is too large, reduce size 380 | while (buffer.length > MAX_SIZE && attempts < MAX_ATTEMPTS) { 381 | // Get current viewport size 382 | const viewport = page.viewportSize(); 383 | if (!viewport) continue; 384 | 385 | // Calculate new dimensions 386 | const scaleFactor = Math.pow(0.75, attempts + 1); 387 | let newWidth = Math.round(viewport.width * scaleFactor); 388 | let newHeight = Math.round(viewport.height * scaleFactor); 389 | 390 | // Ensure dimensions are within bounds 391 | newWidth = Math.max(MIN_DIMENSION, Math.min(MAX_DIMENSION, newWidth)); 392 | newHeight = Math.max(MIN_DIMENSION, Math.min(MAX_DIMENSION, newHeight)); 393 | 394 | // Update viewport with new dimensions 395 | await page.setViewportSize({ 396 | width: newWidth, 397 | height: newHeight 398 | }); 399 | 400 | // Take new screenshot 401 | screenshot = await page.screenshot({ 402 | type: 'png', 403 | fullPage: false 404 | }); 405 | 406 | // Update buffer with new screenshot 407 | buffer = screenshot; 408 | 409 | // Increment retry attempts 410 | attempts++; 411 | } 412 | 413 | // Final attempt with minimum settings 414 | if (buffer.length > MAX_SIZE) { 415 | await page.setViewportSize({ 416 | width: MIN_DIMENSION, 417 | height: MIN_DIMENSION 418 | }); 419 | 420 | // Take final screenshot 421 | screenshot = await page.screenshot({ 422 | type: 'png', 423 | fullPage: false 424 | }); 425 | 426 | // Update buffer with final screenshot 427 | buffer = screenshot; 428 | 429 | // Throw error if final screenshot is still too large 430 | if (buffer.length > MAX_SIZE) { 431 | throw new McpError( 432 | ErrorCode.InvalidRequest, 433 | `Failed to reduce screenshot to under 5MB even with minimum settings` 434 | ); 435 | } 436 | } 437 | 438 | // Convert Buffer to base64 string before returning 439 | return buffer.toString('base64'); 440 | } 441 | 442 | // Initialize MCP server with basic configuration 443 | const server: Server = new Server( 444 | { 445 | name: "webresearch", // Server name identifier 446 | version: "0.1.6", // Server version number 447 | }, 448 | { 449 | capabilities: { 450 | tools: {}, // Available tool configurations 451 | resources: {}, // Resource handling capabilities 452 | prompts: {} // Prompt processing capabilities 453 | }, 454 | } 455 | ); 456 | 457 | // Register handler for tool listing requests 458 | server.setRequestHandler(ListToolsRequestSchema, async () => ({ 459 | tools: TOOLS // Return list of available research tools 460 | })); 461 | 462 | // Register handler for resource listing requests 463 | server.setRequestHandler(ListResourcesRequestSchema, async () => { 464 | // Return empty list if no active session 465 | if (!currentSession) { 466 | return { resources: [] }; 467 | } 468 | 469 | // Compile list of available resources 470 | const resources: Resource[] = [ 471 | // Add session summary resource 472 | { 473 | uri: "research://current/summary", // Resource identifier 474 | name: "Current Research Session Summary", 475 | description: 
"Summary of the current research session including queries and results", 476 | mimeType: "application/json" 477 | }, 478 | // Add screenshot resources if available 479 | ...currentSession.results 480 | .map((r, i): Resource | undefined => r.screenshotPath ? { 481 | uri: `research://screenshots/${i}`, 482 | name: `Screenshot of ${r.title}`, 483 | description: `Screenshot taken from ${r.url}`, 484 | mimeType: "image/png" 485 | } : undefined) 486 | .filter((r): r is Resource => r !== undefined) 487 | ]; 488 | 489 | // Return compiled list of resources 490 | return { resources }; 491 | }); 492 | 493 | // Register handler for resource content requests 494 | server.setRequestHandler(ReadResourceRequestSchema, async (request) => { 495 | const uri = request.params.uri.toString(); 496 | 497 | // Handle session summary requests for research data 498 | if (uri === "research://current/summary") { 499 | if (!currentSession) { 500 | throw new McpError( 501 | ErrorCode.InvalidRequest, 502 | "No active research session" 503 | ); 504 | } 505 | 506 | // Return compiled list of resources 507 | return { 508 | contents: [{ 509 | uri, 510 | mimeType: "application/json", 511 | text: JSON.stringify({ 512 | query: currentSession.query, 513 | resultCount: currentSession.results.length, 514 | lastUpdated: currentSession.lastUpdated, 515 | results: currentSession.results.map(r => ({ 516 | title: r.title, 517 | url: r.url, 518 | timestamp: r.timestamp, 519 | screenshotPath: r.screenshotPath 520 | })) 521 | }, null, 2) 522 | }] 523 | }; 524 | } 525 | 526 | // Handle screenshot requests 527 | if (uri.startsWith("research://screenshots/")) { 528 | const index = parseInt(uri.split("/").pop() || "", 10); 529 | 530 | // Verify session exists 531 | if (!currentSession) { 532 | throw new McpError( 533 | ErrorCode.InvalidRequest, 534 | "No active research session" 535 | ); 536 | } 537 | 538 | // Verify index is within bounds 539 | if (isNaN(index) || index < 0 || index >= currentSession.results.length) { 540 | throw new McpError( 541 | ErrorCode.InvalidRequest, 542 | `Screenshot index out of bounds: ${index}` 543 | ); 544 | } 545 | 546 | // Get result containing screenshot 547 | const result = currentSession.results[index]; 548 | if (!result?.screenshotPath) { 549 | throw new McpError( 550 | ErrorCode.InvalidRequest, 551 | `No screenshot available at index: ${index}` 552 | ); 553 | } 554 | 555 | try { 556 | // Read the binary data and convert to base64 557 | const screenshotData = await fs.promises.readFile(result.screenshotPath); 558 | 559 | // Convert Buffer to base64 string before returning 560 | const base64Data = screenshotData.toString('base64'); 561 | 562 | // Return compiled list of resources 563 | return { 564 | contents: [{ 565 | uri, 566 | mimeType: "image/png", 567 | blob: base64Data 568 | }] 569 | }; 570 | } catch (error: unknown) { 571 | // Handle error if screenshot cannot be read 572 | const errorMessage = error instanceof Error ? 
error.message : 'Unknown error occurred'; 573 | throw new McpError( 574 | ErrorCode.InternalError, 575 | `Failed to read screenshot: ${errorMessage}` 576 | ); 577 | } 578 | } 579 | 580 | // Handle unknown resource types 581 | throw new McpError( 582 | ErrorCode.InvalidRequest, 583 | `Unknown resource: ${uri}` 584 | ); 585 | }); 586 | 587 | // Initialize MCP server connection using stdio transport 588 | const transport = new StdioServerTransport(); 589 | server.connect(transport).catch((error) => { 590 | console.error("Failed to start server:", error); 591 | process.exit(1); 592 | }); 593 | 594 | // Convert HTML content to clean, readable markdown format 595 | async function extractContentAsMarkdown( 596 | page: Page, // Puppeteer page to extract from 597 | selector?: string // Optional CSS selector to target specific content 598 | ): Promise<string> { 599 | // Step 1: Execute content extraction in browser context 600 | const html = await page.evaluate((sel) => { 601 | // Handle case where specific selector is provided 602 | if (sel) { 603 | const element = document.querySelector(sel); 604 | // Return element content or empty string if not found 605 | return element ? element.outerHTML : ''; 606 | } 607 | 608 | // Step 2: Try standard content containers first 609 | const contentSelectors = [ 610 | 'main', // HTML5 semantic main content 611 | 'article', // HTML5 semantic article content 612 | '[role="main"]', // ARIA main content role 613 | '#content', // Common content ID 614 | '.content', // Common content class 615 | '.main', // Alternative main class 616 | '.post', // Blog post content 617 | '.article', // Article content container 618 | ]; 619 | 620 | // Try each selector in priority order 621 | for (const contentSelector of contentSelectors) { 622 | const element = document.querySelector(contentSelector); 623 | if (element) { 624 | return element.outerHTML; // Return first matching content 625 | } 626 | } 627 | 628 | // Step 3: Fallback to cleaning full body content 629 | const body = document.body; 630 | 631 | // Define elements to remove for cleaner content 632 | const elementsToRemove = [ 633 | // Navigation elements 634 | 'header', // Page header 635 | 'footer', // Page footer 636 | 'nav', // Navigation sections 637 | '[role="navigation"]', // ARIA navigation elements 638 | 639 | // Sidebars and complementary content 640 | 'aside', // Sidebar content 641 | '.sidebar', // Sidebar by class 642 | '[role="complementary"]', // ARIA complementary content 643 | 644 | // Navigation-related elements 645 | '.nav', // Navigation classes 646 | '.menu', // Menu elements 647 | 648 | // Page structure elements 649 | '.header', // Header classes 650 | '.footer', // Footer classes 651 | 652 | // Advertising and notices 653 | '.advertisement', // Advertisement containers 654 | '.ads', // Ad containers 655 | '.cookie-notice', // Cookie consent notices 656 | ]; 657 | 658 | // Remove each unwanted element from content 659 | elementsToRemove.forEach(sel => { 660 | body.querySelectorAll(sel).forEach(el => el.remove()); 661 | }); 662 | 663 | // Return cleaned body content 664 | return body.outerHTML; 665 | }, selector); 666 | 667 | // Step 4: Handle empty content case 668 | if (!html) { 669 | return ''; 670 | } 671 | 672 | try { 673 | // Step 5: Convert HTML to Markdown 674 | const markdown = turndownService.turndown(html); 675 | 676 | // Step 6: Clean up and format markdown 677 | return markdown 678 | .replace(/\n{3,}/g, '\n\n') // Replace excessive newlines with double 679 | .replace(/^- $/gm, '') // 
Remove empty list items 680 | .replace(/^\s+$/gm, '') // Remove whitespace-only lines 681 | .trim(); // Remove leading/trailing whitespace 682 | 683 | } catch (error) { 684 | // Log conversion errors and return original HTML as fallback 685 | console.error('Error converting HTML to Markdown:', error); 686 | return html; 687 | } 688 | } 689 | 690 | // Validate URL format and ensure security constraints 691 | function isValidUrl(urlString: string): boolean { 692 | try { 693 | // Attempt to parse URL string 694 | const url = new URL(urlString); 695 | 696 | // Only allow HTTP and HTTPS protocols for security 697 | return url.protocol === 'http:' || url.protocol === 'https:'; 698 | } catch { 699 | // Return false for any invalid URL format 700 | return false; 701 | } 702 | } 703 | 704 | // Define result type for tool operations 705 | type ToolResult = { 706 | content: (TextContent | ImageContent)[]; // Array of text or image content 707 | isError?: boolean; // Optional error flag 708 | }; 709 | 710 | // Tool request handler for executing research operations 711 | server.setRequestHandler(CallToolRequestSchema, async (request): Promise<ToolResult> => { 712 | // Initialize browser for tool operations 713 | const page = await ensureBrowser(); 714 | 715 | switch (request.params.name) { 716 | // Handle Google search operations 717 | case "search_google": { 718 | // Extract search query from request parameters 719 | const { query } = request.params.arguments as { query: string }; 720 | 721 | try { 722 | // Execute search with retry mechanism 723 | const results = await withRetry(async () => { 724 | // Step 1: Navigate to Google search page 725 | await safePageNavigation(page, 'https://www.google.com'); 726 | 727 | // Step 2: Find and interact with search input 728 | await withRetry(async () => { 729 | // Wait for any search input element to appear 730 | await Promise.race([ 731 | // Try multiple possible selectors for search input 732 | page.waitForSelector('input[name="q"]', { timeout: 5000 }), 733 | page.waitForSelector('textarea[name="q"]', { timeout: 5000 }), 734 | page.waitForSelector('input[type="text"]', { timeout: 5000 }) 735 | ]).catch(() => { 736 | throw new Error('Search input not found - no matching selectors'); 737 | }); 738 | 739 | // Find the actual search input element 740 | const searchInput = await page.$('input[name="q"]') || 741 | await page.$('textarea[name="q"]') || 742 | await page.$('input[type="text"]'); 743 | 744 | // Verify search input was found 745 | if (!searchInput) { 746 | throw new Error('Search input element not found after waiting'); 747 | } 748 | 749 | // Step 3: Enter search query 750 | await searchInput.click({ clickCount: 3 }); // Select all existing text 751 | await searchInput.press('Backspace'); // Clear selected text 752 | await searchInput.type(query); // Type new query 753 | }, 3, 2000); // Allow 3 retries with 2s delay 754 | 755 | // Step 4: Submit search and wait for results 756 | await withRetry(async () => { 757 | await Promise.all([ 758 | page.keyboard.press('Enter'), 759 | page.waitForLoadState('networkidle', { timeout: 15000 }), 760 | ]); 761 | }); 762 | 763 | // Step 5: Extract search results 764 | const searchResults = await withRetry(async () => { 765 | const results = await page.evaluate(() => { 766 | // Find all search result containers 767 | const elements = document.querySelectorAll('div.g'); 768 | if (!elements || elements.length === 0) { 769 | throw new Error('No search results found'); 770 | } 771 | 772 | // Extract data from each result 
773 | return Array.from(elements).map((el) => { 774 | // Find required elements within result container 775 | const titleEl = el.querySelector('h3'); // Title element 776 | const linkEl = el.querySelector('a'); // Link element 777 | const snippetEl = el.querySelector('div.VwiC3b'); // Snippet element 778 | 779 | // Skip results missing required elements 780 | if (!titleEl || !linkEl || !snippetEl) { 781 | return null; 782 | } 783 | 784 | // Return structured result data 785 | return { 786 | title: titleEl.textContent || '', // Result title 787 | url: linkEl.getAttribute('href') || '', // Result URL 788 | snippet: snippetEl.textContent || '', // Result description 789 | }; 790 | }).filter(result => result !== null); // Remove invalid results 791 | }); 792 | 793 | // Verify we found valid results 794 | if (!results || results.length === 0) { 795 | throw new Error('No valid search results found'); 796 | } 797 | 798 | // Return compiled list of results 799 | return results; 800 | }); 801 | 802 | // Step 6: Store results in session 803 | searchResults.forEach((result) => { 804 | addResult({ 805 | url: result.url, 806 | title: result.title, 807 | content: result.snippet, 808 | timestamp: new Date().toISOString(), 809 | }); 810 | }); 811 | 812 | // Return compiled list of results 813 | return searchResults; 814 | }); 815 | 816 | // Step 7: Return formatted results 817 | return { 818 | content: [{ 819 | type: "text", 820 | text: JSON.stringify(results, null, 2) // Pretty-print JSON results 821 | }] 822 | }; 823 | } catch (error) { 824 | // Handle and format search errors 825 | return { 826 | content: [{ 827 | type: "text", 828 | text: `Failed to perform search: ${(error as Error).message}` 829 | }], 830 | isError: true 831 | }; 832 | } 833 | } 834 | // Handle Google Scholar search operations 835 | case "search_scholar": { 836 | // Extract search query from request parameters 837 | const { query } = request.params.arguments as { query: string }; 838 | 839 | try { 840 | // Execute search with retry mechanism 841 | const results = await withRetry(async () => { 842 | // Step 1: Navigate to Google Scholar search page 843 | await safePageNavigation(page, 'https://scholar.google.com'); 844 | 845 | // Step 2: Find and interact with search input 846 | await withRetry(async () => { 847 | // Wait for search input element to appear 848 | await page.waitForSelector('input[name="q"]', { timeout: 5000 }) 849 | .catch(() => { 850 | throw new Error('Scholar search input not found'); 851 | }); 852 | 853 | // Find the search input element 854 | const searchInput = await page.$('input[name="q"]'); 855 | 856 | // Verify search input was found 857 | if (!searchInput) { 858 | throw new Error('Scholar search input element not found after waiting'); 859 | } 860 | 861 | // Step 3: Enter search query 862 | await searchInput.click({ clickCount: 3 }); // Select all existing text 863 | await searchInput.press('Backspace'); // Clear selected text 864 | await searchInput.type(query); // Type new query 865 | }, 3, 2000); // Allow 3 retries with 2s delay 866 | 867 | // Step 4: Submit search and wait for results 868 | await withRetry(async () => { 869 | await Promise.all([ 870 | page.keyboard.press('Enter'), 871 | page.waitForLoadState('networkidle', { timeout: 15000 }), 872 | ]); 873 | }); 874 | 875 | // Step 5: Extract scholar search results 876 | const scholarResults = await withRetry(async () => { 877 | const results = await page.evaluate(() => { 878 | // Find all scholar result containers 879 | const elements = 
document.querySelectorAll('.gs_r.gs_or.gs_scl'); 880 | if (!elements || elements.length === 0) { 881 | throw new Error('No scholar search results found'); 882 | } 883 | 884 | // Extract data from each result 885 | return Array.from(elements).map((el) => { 886 | try { 887 | // Find required elements within result container 888 | const titleEl = el.querySelector('.gs_rt'); // Title element 889 | const authorEl = el.querySelector('.gs_a'); // Authors, venue, year 890 | const snippetEl = el.querySelector('.gs_rs'); // Snippet/abstract 891 | const citedByEl = el.querySelector('.gs_fl a:nth-child(3)'); // Cited by element 892 | 893 | // Extract title and URL 894 | let title = ''; 895 | let url = ''; 896 | if (titleEl) { 897 | const titleLink = titleEl.querySelector('a'); 898 | title = titleEl.textContent?.trim() || ''; 899 | url = titleLink?.getAttribute('href') || ''; 900 | } 901 | 902 | // Extract author, venue, and year information 903 | const authorInfo = authorEl?.textContent?.trim() || ''; 904 | 905 | // Extract snippet 906 | const snippet = snippetEl?.textContent?.trim() || ''; 907 | 908 | // Extract citation count 909 | let citationCount = ''; 910 | if (citedByEl && citedByEl.textContent?.includes('Cited by')) { 911 | citationCount = citedByEl.textContent.trim(); 912 | } 913 | 914 | // Skip results missing critical data 915 | if (!title) { 916 | return null; 917 | } 918 | 919 | // Return structured result data 920 | return { 921 | title, // Paper title 922 | url, // Paper URL if available 923 | authorInfo, // Authors, venue, year 924 | snippet, // Abstract/snippet 925 | citationCount, // Citation information 926 | }; 927 | } catch (err) { 928 | // Skip problematic results 929 | return null; 930 | } 931 | }).filter(result => result !== null); // Remove invalid results 932 | }); 933 | 934 | // Verify we found valid results 935 | if (!results || results.length === 0) { 936 | throw new Error('No valid scholar search results found'); 937 | } 938 | 939 | // Return compiled list of results 940 | return results; 941 | }); 942 | 943 | // Step 6: Store results in session 944 | scholarResults.forEach((result) => { 945 | addResult({ 946 | url: result.url || 'https://scholar.google.com', 947 | title: result.title, 948 | content: `${result.authorInfo}\n\n${result.snippet}\n\n${result.citationCount}`, 949 | timestamp: new Date().toISOString(), 950 | }); 951 | }); 952 | 953 | // Return compiled list of results 954 | return scholarResults; 955 | }); 956 | 957 | // Step 7: Return formatted results 958 | return { 959 | content: [{ 960 | type: "text", 961 | text: JSON.stringify(results, null, 2) // Pretty-print JSON results 962 | }] 963 | }; 964 | } catch (error) { 965 | // Handle and format search errors 966 | return { 967 | content: [{ 968 | type: "text", 969 | text: `Failed to perform scholar search: ${(error as Error).message}` 970 | }], 971 | isError: true 972 | }; 973 | } 974 | } 975 | 976 | // Handle webpage visit and content extraction 977 | case "visit_page": { 978 | // Extract URL and screenshot flag from request 979 | const { url, takeScreenshot } = request.params.arguments as { 980 | url: string; // Target URL to visit 981 | takeScreenshot?: boolean; // Optional screenshot flag 982 | }; 983 | 984 | // Step 1: Validate URL format and security 985 | if (!isValidUrl(url)) { 986 | return { 987 | content: [{ 988 | type: "text" as const, 989 | text: `Invalid URL: ${url}. 
Only http and https protocols are supported.` 990 | }], 991 | isError: true 992 | }; 993 | } 994 | 995 | try { 996 | // Step 2: Visit page and extract content with retry mechanism 997 | const result = await withRetry(async () => { 998 | // Navigate to target URL safely 999 | await safePageNavigation(page, url); 1000 | const title = await page.title(); 1001 | 1002 | // Step 3: Extract and process page content 1003 | const content = await withRetry(async () => { 1004 | // Convert page content to markdown 1005 | const extractedContent = await extractContentAsMarkdown(page); 1006 | 1007 | // If no content is extracted, throw an error 1008 | if (!extractedContent) { 1009 | throw new Error('Failed to extract content'); 1010 | } 1011 | 1012 | // Return the extracted content 1013 | return extractedContent; 1014 | }); 1015 | 1016 | // Step 4: Create result object with page data 1017 | const pageResult: ResearchResult = { 1018 | url, // Original URL 1019 | title, // Page title 1020 | content, // Markdown content 1021 | timestamp: new Date().toISOString(), // Capture time 1022 | }; 1023 | 1024 | // Step 5: Take screenshot if requested 1025 | let screenshotUri: string | undefined; 1026 | if (takeScreenshot) { 1027 | // Capture and process screenshot 1028 | const screenshot = await takeScreenshotWithSizeLimit(page); 1029 | pageResult.screenshotPath = await saveScreenshot(screenshot, title); 1030 | 1031 | // Get the index for the resource URI 1032 | const resultIndex = currentSession ? currentSession.results.length : 0; 1033 | screenshotUri = `research://screenshots/${resultIndex}`; 1034 | 1035 | // Notify clients about new screenshot resource 1036 | server.notification({ 1037 | method: "notifications/resources/list_changed" 1038 | }); 1039 | } 1040 | 1041 | // Step 6: Store result in session 1042 | addResult(pageResult); 1043 | return { pageResult, screenshotUri }; 1044 | }); 1045 | 1046 | // Step 7: Return formatted result with screenshot URI if taken 1047 | const response: ToolResult = { 1048 | content: [{ 1049 | type: "text" as const, 1050 | text: JSON.stringify({ 1051 | url: result.pageResult.url, 1052 | title: result.pageResult.title, 1053 | content: result.pageResult.content, 1054 | timestamp: result.pageResult.timestamp, 1055 | screenshot: result.screenshotUri ? 
`View screenshot via *MCP Resources* (Paperclip icon) @ URI: ${result.screenshotUri}` : undefined 1056 | }, null, 2) 1057 | }] 1058 | }; 1059 | 1060 | return response; 1061 | } catch (error) { 1062 | // Handle and format page visit errors 1063 | return { 1064 | content: [{ 1065 | type: "text" as const, 1066 | text: `Failed to visit page: ${(error as Error).message}` 1067 | }], 1068 | isError: true 1069 | }; 1070 | } 1071 | } 1072 | 1073 | // Handle standalone screenshot requests 1074 | case "take_screenshot": { 1075 | try { 1076 | // Step 1: Capture screenshot with retry mechanism 1077 | const screenshot = await withRetry(async () => { 1078 | // Take and optimize screenshot with default size limits 1079 | return await takeScreenshotWithSizeLimit(page); 1080 | }); 1081 | 1082 | // Step 2: Initialize session if needed 1083 | if (!currentSession) { 1084 | currentSession = { 1085 | query: "Screenshot Session", // Session identifier 1086 | results: [], // Empty results array 1087 | lastUpdated: new Date().toISOString(), // Current timestamp 1088 | }; 1089 | } 1090 | 1091 | // Step 3: Get current page information 1092 | const pageUrl = await page.url(); // Current page URL 1093 | const pageTitle = await page.title(); // Current page title 1094 | 1095 | // Step 4: Save screenshot to disk 1096 | const screenshotPath = await saveScreenshot(screenshot, pageTitle || 'untitled'); 1097 | 1098 | // Step 5: Create and store screenshot result 1099 | const resultIndex = currentSession ? currentSession.results.length : 0; 1100 | addResult({ 1101 | url: pageUrl, 1102 | title: pageTitle || "Untitled Page", // Fallback title if none available 1103 | content: "Screenshot taken", // Simple content description 1104 | timestamp: new Date().toISOString(), // Capture time 1105 | screenshotPath // Path to screenshot file 1106 | }); 1107 | 1108 | // Step 6: Notify clients about new screenshot resource 1109 | server.notification({ 1110 | method: "notifications/resources/list_changed" 1111 | }); 1112 | 1113 | // Step 7: Return success message with resource URI 1114 | const resourceUri = `research://screenshots/${resultIndex}`; 1115 | return { 1116 | content: [{ 1117 | type: "text" as const, 1118 | text: `Screenshot taken successfully. 
You can view it via *MCP Resources* (Paperclip icon) @ URI: ${resourceUri}` 1119 | }] 1120 | }; 1121 | } catch (error) { 1122 | // Handle and format screenshot errors 1123 | return { 1124 | content: [{ 1125 | type: "text" as const, 1126 | text: `Failed to take screenshot: ${(error as Error).message}` 1127 | }], 1128 | isError: true 1129 | }; 1130 | } 1131 | } 1132 | 1133 | // Handle unknown tool requests 1134 | default: 1135 | throw new McpError( 1136 | ErrorCode.MethodNotFound, 1137 | `Unknown tool: ${request.params.name}` 1138 | ); 1139 | } 1140 | }); 1141 | 1142 | // Register handler for prompt listing requests 1143 | server.setRequestHandler(ListPromptsRequestSchema, async () => { 1144 | // Return all available prompts 1145 | return { prompts: Object.values(PROMPTS) }; 1146 | }); 1147 | 1148 | // Register handler for prompt retrieval and execution 1149 | server.setRequestHandler(GetPromptRequestSchema, async (request) => { 1150 | // Extract and validate prompt name 1151 | const promptName = request.params.name as PromptName; 1152 | const prompt = PROMPTS[promptName]; 1153 | 1154 | // Handle unknown prompt requests 1155 | if (!prompt) { 1156 | throw new McpError(ErrorCode.InvalidRequest, `Prompt not found: ${promptName}`); 1157 | } 1158 | 1159 | // Handle agentic research prompt 1160 | if (promptName === "agentic-research") { 1161 | // Extract research topic from request arguments 1162 | const args = request.params.arguments as AgenticResearchArgs | undefined; 1163 | const topic = args?.topic || ""; // Use empty string if no topic provided 1164 | 1165 | // Return research assistant prompt with instructions 1166 | return { 1167 | messages: [ 1168 | // Initial assistant message establishing role 1169 | { 1170 | role: "assistant", 1171 | content: { 1172 | type: "text", 1173 | text: "I am ready to help you with your research. I will conduct thorough web research, explore topics deeply, and maintain a dialogue with you throughout the process." 1174 | } 1175 | }, 1176 | // Detailed research instructions for the user 1177 | { 1178 | role: "user", 1179 | content: { 1180 | type: "text", 1181 | text: `I'd like to research this topic: <topic>${topic}</topic> 1182 | 1183 | Please help me explore it deeply, like you're a thoughtful, highly-trained research assistant. 1184 | 1185 | General instructions: 1186 | 1. Start by proposing your research approach -- namely, formulate what initial query you will use to search the web. Propose a relatively broad search to understand the topic landscape. At the same time, make your queries optimized for returning high-quality results based on what you know about constructing Google search queries. 1187 | 2. Next, get my input on whether you should proceed with that query or if you should refine it. 1188 | 3. Once you have an approved query, perform the search. 1189 | 4. Prioritize high quality, authoritative sources when they are available and relevant to the topic. Avoid low quality or spammy sources. 1190 | 5. Retrieve information that is relevant to the topic at hand. 1191 | 6. Iteratively refine your research direction based on what you find. 1192 | 7. Keep me informed of what you find and let *me* guide the direction of the research interactively. 1193 | 8. If you run into a dead end while researching, do a Google search for the topic and attempt to find a URL for a relevant page. Then, explore that page in depth. 1194 | 9. Only conclude when my research goals are met. 1195 | 10. 
**Always cite your sources**, providing URLs to the sources you used in a citation block at the end of your response. 1196 | 1197 | You can use these tools: 1198 | - search_google: Search for information 1199 | - visit_page: Visit and extract content from web pages 1200 | 1201 | Do *NOT* use the following tools: 1202 | - Anything related to knowledge graphs or memory, unless explicitly instructed to do so by the user.` 1203 | } 1204 | } 1205 | ] 1206 | }; 1207 | } 1208 | 1209 | // Handle unsupported prompt types 1210 | throw new McpError(ErrorCode.InvalidRequest, "Prompt implementation not found"); 1211 | }); 1212 | 1213 | // In the ensureBrowser function, modify the initialization script: 1214 | async function ensureBrowser(): Promise<Page> { 1215 | if (!browser) { 1216 | browser = await chromium.launch({ 1217 | headless: true, 1218 | args: [ 1219 | '--disable-blink-features=AutomationControlled', 1220 | '--disable-features=IsolateOrigins,site-per-process', 1221 | '--disable-site-isolation-trials', 1222 | '--disable-setuid-sandbox', 1223 | '--no-sandbox', 1224 | '--disable-dev-shm-usage', 1225 | '--disable-accelerated-2d-canvas', 1226 | '--no-first-run', 1227 | '--no-service-autorun', 1228 | '--password-store=basic', 1229 | '--system-developer-mode', 1230 | '--enable-javascript', 1231 | `--window-size=${1366 + Math.floor(Math.random() * 100)},${768 + Math.floor(Math.random() * 100)}`, 1232 | ] 1233 | }); 1234 | 1235 | const context = await browser.newContext({ 1236 | userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36', 1237 | viewport: { width: 1366, height: 768 }, // Set fixed viewport instead of null 1238 | deviceScaleFactor: 1, 1239 | javaScriptEnabled: true, 1240 | }); 1241 | 1242 | await context.addInitScript(() => { 1243 | Object.defineProperty(navigator, 'webdriver', { 1244 | get: () => undefined 1245 | }); 1246 | 1247 | Object.defineProperty(navigator, 'permissions', { 1248 | get: () => ({ 1249 | query: async () => ({ state: 'prompt' as PermissionState }) 1250 | }) 1251 | }); 1252 | 1253 | window.chrome = { 1254 | runtime: {}, 1255 | loadTimes: function(){}, 1256 | csi: function(){}, 1257 | app: {}, 1258 | }; 1259 | 1260 | const originalToString = Error.prototype.toString; 1261 | Error.prototype.toString = function(this: Error) { 1262 | return originalToString.call(this).replace(/\n.*puppeteer.*\n/g, '\n'); 1263 | }; 1264 | 1265 | if (!window.Notification) { 1266 | const NotificationClass = function(title: string, options?: NotificationOptions) { 1267 | return { 1268 | title, 1269 | ...options 1270 | }; 1271 | } as unknown as typeof Notification; 1272 | 1273 | Object.defineProperty(NotificationClass, 'permission', { 1274 | value: 'default' as NotificationPermission, 1275 | writable: false, 1276 | configurable: false, 1277 | enumerable: true 1278 | }); 1279 | 1280 | NotificationClass.requestPermission = async () => 'default' as NotificationPermission; 1281 | NotificationClass.prototype = {} as Notification; 1282 | 1283 | Object.defineProperty(window, 'Notification', { 1284 | value: NotificationClass, 1285 | writable: false, 1286 | configurable: false 1287 | }); 1288 | } 1289 | 1290 | Object.defineProperty(navigator, 'languages', { 1291 | get: () => ['en-US', 'en'] 1292 | }); 1293 | }); 1294 | 1295 | page = await context.newPage(); 1296 | 1297 | await page.route('**', async (route) => { 1298 | const request = route.request(); 1299 | if (request.resourceType() === 'script') { 1300 | route.continue({ 1301 | 
headers: { 1302 | ...request.headers(), 1303 | 'sec-ch-ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"', 1304 | 'sec-ch-ua-mobile': '?0', 1305 | 'sec-ch-ua-platform': '"Windows"' 1306 | } 1307 | }); 1308 | } else { 1309 | route.continue(); 1310 | } 1311 | }); 1312 | } 1313 | 1314 | if (!page) { 1315 | const context = await browser.newContext(); 1316 | page = await context.newPage(); 1317 | } 1318 | 1319 | return page; 1320 | } 1321 | 1322 | 1323 | // Cleanup function 1324 | async function cleanup(): Promise<void> { 1325 | try { 1326 | // Clean up screenshots first 1327 | await cleanupScreenshots(); 1328 | 1329 | // Then close the browser 1330 | if (browser) { 1331 | await browser.close(); 1332 | } 1333 | } catch (error) { 1334 | console.error('Error during cleanup:', error); 1335 | } finally { 1336 | browser = undefined; 1337 | page = undefined; 1338 | } 1339 | } 1340 | 1341 | // Register cleanup handlers 1342 | process.on('exit', cleanup); 1343 | process.on('SIGTERM', cleanup); 1344 | process.on('SIGINT', cleanup); 1345 | process.on('SIGHUP', cleanup); 1346 | ```
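For reference, every tool handler in `index.ts` returns its results as a single `text` content item whose `text` field is pretty-printed JSON (see the `CallToolRequestSchema` handler above). The sketch below shows one hypothetical way a client might invoke `search_google` and parse that JSON; the SDK client API used here (`Client`, `StdioClientTransport`, `callTool`, `close`) is an assumption and may differ between SDK versions.

```typescript
// Hypothetical sketch of calling the search_google tool from an MCP client.
// The client-side SDK API is assumed; verify it against the installed
// @modelcontextprotocol/sdk version before relying on it.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

interface SearchResult {
    title: string;
    url: string;
    snippet: string;
}

async function main() {
    const transport = new StdioClientTransport({
        command: "npx",
        args: ["-y", "@mzxrai/mcp-webresearch@latest"],
    });
    const client = new Client(
        { name: "example-client", version: "0.0.1" },
        { capabilities: {} }
    );
    await client.connect(transport);

    // search_google takes { query: string } (see TOOLS in index.ts) and
    // returns its results as one text content item containing JSON.
    const result = await client.callTool({
        name: "search_google",
        arguments: { query: "model context protocol" },
    });

    const { content } = result as { content: Array<{ type: string; text?: string }> };
    if (content[0]?.type === "text" && content[0].text) {
        const searchResults = JSON.parse(content[0].text) as SearchResult[];
        console.log(searchResults.slice(0, 3));
    }

    await client.close();
}

main().catch(console.error);
```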