chuanmingliu/mcp-webresearch # codebase.md

# Directory Structure

```
├── .cursorrules
├── .gitignore
├── docs
│   └── mcp_spec
│       └── llms-full.txt
├── index.ts
├── LICENSE
├── package.json
├── pnpm-lock.yaml
├── README.md
└── tsconfig.json
```

# Files

--------------------------------------------------------------------------------
/.cursorrules:
--------------------------------------------------------------------------------

```
1 | 1. Use pnpm instead of npm when generating packaging-related commands.
2 | 2. Only make changes to comments, code, or dependencies that are needed to accomplish the objective defined by the user. When editing code, don't remove comments or change dependencies or make changes that are unrelated to the code changes at hand. 
```

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------

```
  1 | # Logs
  2 | logs
  3 | *.log
  4 | npm-debug.log*
  5 | yarn-debug.log*
  6 | yarn-error.log*
  7 | lerna-debug.log*
  8 | .pnpm-debug.log*
  9 | 
 10 | # Diagnostic reports (https://nodejs.org/api/report.html)
 11 | report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
 12 | 
 13 | # Runtime data
 14 | pids
 15 | *.pid
 16 | *.seed
 17 | *.pid.lock
 18 | 
 19 | # Directory for instrumented libs generated by jscoverage/JSCover
 20 | lib-cov
 21 | 
 22 | # Coverage directory used by tools like istanbul
 23 | coverage
 24 | *.lcov
 25 | 
 26 | # nyc test coverage
 27 | .nyc_output
 28 | 
 29 | # Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
 30 | .grunt
 31 | 
 32 | # Bower dependency directory (https://bower.io/)
 33 | bower_components
 34 | 
 35 | # node-waf configuration
 36 | .lock-wscript
 37 | 
 38 | # Compiled binary addons (https://nodejs.org/api/addons.html)
 39 | build/Release
 40 | 
 41 | # Dependency directories
 42 | node_modules/
 43 | jspm_packages/
 44 | 
 45 | # Snowpack dependency directory (https://snowpack.dev/)
 46 | web_modules/
 47 | 
 48 | # TypeScript cache
 49 | *.tsbuildinfo
 50 | 
 51 | # Optional npm cache directory
 52 | .npm
 53 | 
 54 | # Optional eslint cache
 55 | .eslintcache
 56 | 
 57 | # Optional stylelint cache
 58 | .stylelintcache
 59 | 
 60 | # Microbundle cache
 61 | .rpt2_cache/
 62 | .rts2_cache_cjs/
 63 | .rts2_cache_es/
 64 | .rts2_cache_umd/
 65 | 
 66 | # Optional REPL history
 67 | .node_repl_history
 68 | 
 69 | # Output of 'npm pack'
 70 | *.tgz
 71 | 
 72 | # Yarn Integrity file
 73 | .yarn-integrity
 74 | 
 75 | # dotenv environment variable files
 76 | .env
 77 | .env.development.local
 78 | .env.test.local
 79 | .env.production.local
 80 | .env.local
 81 | 
 82 | # parcel-bundler cache (https://parceljs.org/)
 83 | .cache
 84 | .parcel-cache
 85 | 
 86 | # Next.js build output
 87 | .next
 88 | out
 89 | 
 90 | # Nuxt.js build / generate output
 91 | .nuxt
 92 | dist
 93 | 
 94 | # Gatsby files
 95 | .cache/
 96 | # Comment in the public line in if your project uses Gatsby and not Next.js
 97 | # https://nextjs.org/blog/next-9-1#public-directory-support
 98 | # public
 99 | 
100 | # vuepress build output
101 | .vuepress/dist
102 | 
103 | # vuepress v2.x temp and cache directory
104 | .temp
105 | .cache
106 | 
107 | # Docusaurus cache and generated files
108 | .docusaurus
109 | 
110 | # Serverless directories
111 | .serverless/
112 | 
113 | # FuseBox cache
114 | .fusebox/
115 | 
116 | # DynamoDB Local files
117 | .dynamodb/
118 | 
119 | # TernJS port file
120 | .tern-port
121 | 
122 | # Stores VSCode versions used for testing VSCode extensions
123 | .vscode-test
124 | 
125 | # yarn v2
126 | .yarn/cache
127 | .yarn/unplugged
128 | .yarn/build-state.yml
129 | .yarn/install-state.gz
130 | .pnp.*
131 | 
```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

```markdown
  1 | # MCP Web Research Server
  2 | 
  3 | A Model Context Protocol (MCP) server for web research. 
  4 | 
  5 | Bring real-time info into Claude and easily research any topic.
  6 | 
  7 | ## Features
  8 | 
  9 | - Google search integration
 10 | - Webpage content extraction
 11 | - Research session tracking (list of visited pages, search queries, etc.)
 12 | - Screenshot capture
 13 | 
 14 | ## Prerequisites
 15 | 
 16 | - [Node.js](https://nodejs.org/) >= 18 (includes `npm` and `npx`)
 17 | - [Claude Desktop app](https://claude.ai/download)
 18 | 
 19 | ## Installation
 20 | 
 21 | First, ensure you've downloaded and installed the [Claude Desktop app](https://claude.ai/download) and you have npm installed.
 22 | 
 23 | Next, add this entry to your `claude_desktop_config.json` (on Mac, found at `~/Library/Application\ Support/Claude/claude_desktop_config.json`):
 24 | 
 25 | ```json
 26 | {
 27 |   "mcpServers": {
 28 |     "webresearch": {
 29 |       "command": "npx",
 30 |       "args": ["-y", "@mzxrai/mcp-webresearch@latest"]
 31 |     }
 32 |   }
 33 | }
 34 | ```
 35 | 
 36 | This config allows Claude Desktop to automatically start the web research MCP server when needed.
 37 | 
 38 | ## Usage
 39 | 
 40 | Simply start a chat with Claude and send a prompt that would benefit from web research. If you'd like a prebuilt prompt customized for deeper web research, you can use the `agentic-research` prompt that we provide through this package. Access that prompt in Claude Desktop by clicking the Paperclip icon in the chat input and then selecting `Choose an integration` → `webresearch` → `agentic-research`.
 41 | 
 42 | <img src="https://i.ibb.co/N6Y3C0q/Screenshot-2024-12-05-at-11-01-27-PM.png" alt="Example screenshot of web research" width="400"/>
 43 | 
 44 | ### Tools
 45 | 
 46 | 1. `search_google`
 47 |    - Performs Google searches and extracts results
 48 |    - Arguments: `{ query: string }`
 49 | 
 50 | 2. `visit_page`
 51 |    - Visits a webpage and extracts its content
 52 |    - Arguments: `{ url: string, takeScreenshot?: boolean }`
 53 | 
 54 | 3. `take_screenshot`
 55 |    - Takes a screenshot of the current page
 56 |    - No arguments required
 57 | 
 58 | ### Prompts
 59 | 
 60 | #### `agentic-research`
 61 | A guided research prompt that helps Claude conduct thorough web research. The prompt instructs Claude to:
 62 | - Start with broad searches to understand the topic landscape
 63 | - Prioritize high-quality, authoritative sources
 64 | - Iteratively refine the research direction based on findings
 65 | - Keep you informed and let you guide the research interactively
 66 | - Always cite sources with URLs
 67 | 
 68 | ### Resources
 69 | 
 70 | We expose two things as MCP resources: (1) captured webpage screenshots, and (2) the research session.
 71 | 
 72 | #### Screenshots
 73 | 
 74 | When you take a screenshot, it's saved as an MCP resource. You can access captured screenshots in Claude Desktop via the Paperclip icon.
 75 | 
 76 | #### Research Session
 77 | 
 78 | The server maintains a research session that includes:
 79 | - Search queries
 80 | - Visited pages
 81 | - Extracted content
 82 | - Screenshots
 83 | - Timestamps
 84 | 
 85 | ### Suggestions
 86 | 
 87 | For the best results, if you choose not to use the `agentic-research` prompt when doing your research, it may be helpful to suggest high-quality sources for Claude to use when researching general topics. For example, you could prompt `news today from reuters or AP` instead of `news today`.
 88 | 
 89 | ## Problems
 90 | 
  91 | This is pre-alpha code, and it is AI-generated (AIGC), so expect bugs.
 92 | 
 93 | If you run into issues, it may be helpful to check Claude Desktop's MCP logs:
 94 | 
 95 | ```bash
 96 | tail -n 20 -f ~/Library/Logs/Claude/mcp*.log
 97 | ```
 98 | 
 99 | ## Development
100 | 
101 | ```bash
102 | # Install dependencies
103 | pnpm install
104 | 
105 | # Build the project
106 | pnpm build
107 | 
108 | # Watch for changes
109 | pnpm watch
110 | 
111 | # Run in development mode
112 | pnpm dev
113 | ```
114 | 
115 | ## Requirements
116 | 
117 | - Node.js >= 18
118 | - Playwright (automatically installed as a dependency)
119 | 
120 | ## Verified Platforms
121 | 
122 | - [x] macOS
123 | - [ ] Linux
124 | 
125 | ## License
126 | 
127 | MIT
128 | 
129 | ## Author
130 | 
131 | [mzxrai](https://github.com/mzxrai) 
```
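The tool argument shapes listed in the README's Tools section can be sketched as TypeScript types with a small runtime check. This is an illustrative sketch only: `ToolCall` and `parseToolCall` are hypothetical names that do not appear in the package, and the http(s)-only URL check is an assumption rather than documented server behavior.

```typescript
// Argument shapes for the three tools, as listed in the README.
interface SearchGoogleArgs { query: string }
interface VisitPageArgs { url: string; takeScreenshot?: boolean }
type TakeScreenshotArgs = Record<string, never>;

// Discriminated union over the tool name (hypothetical, for illustration).
type ToolCall =
    | { name: "search_google"; arguments: SearchGoogleArgs }
    | { name: "visit_page"; arguments: VisitPageArgs }
    | { name: "take_screenshot"; arguments: TakeScreenshotArgs };

// Hypothetical helper: checks untyped input against the documented shapes
// and returns a typed call, or throws with a descriptive message.
function parseToolCall(name: string, args: unknown): ToolCall {
    const obj = (typeof args === "object" && args !== null ? args : {}) as Record<string, unknown>;

    switch (name) {
        case "search_google": {
            const query = obj["query"];
            if (typeof query !== "string" || query.length === 0) {
                throw new Error("search_google requires a non-empty string `query`");
            }
            return { name: "search_google", arguments: { query } };
        }
        case "visit_page": {
            const url = obj["url"];
            if (typeof url !== "string") {
                throw new Error("visit_page requires a string `url`");
            }
            // Assumption: only http(s) URLs are accepted. `new URL` throws on malformed input.
            const protocol = new URL(url).protocol;
            if (protocol !== "http:" && protocol !== "https:") {
                throw new Error(`visit_page only accepts http(s) URLs, got ${protocol}`);
            }
            const shot = obj["takeScreenshot"];
            let takeScreenshot: boolean | undefined;
            if (typeof shot === "boolean") {
                takeScreenshot = shot;
            } else if (shot !== undefined) {
                throw new Error("`takeScreenshot` must be a boolean when provided");
            }
            return { name: "visit_page", arguments: { url, takeScreenshot } };
        }
        case "take_screenshot":
            // No arguments required; anything passed in is ignored.
            return { name: "take_screenshot", arguments: {} };
        default:
            throw new Error(`Unknown tool: ${name}`);
    }
}
```

A caller could run `parseToolCall("visit_page", { url: "https://example.com", takeScreenshot: true })` before forwarding a request. The discriminated union lets the compiler narrow `arguments` once `name` has been checked.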

--------------------------------------------------------------------------------
/tsconfig.json:
--------------------------------------------------------------------------------

```json
 1 | {
 2 |   "compilerOptions": {
 3 |     "target": "ES2023",
 4 |     "module": "NodeNext",
 5 |     "moduleResolution": "NodeNext",
 6 |     "esModuleInterop": true,
 7 |     "strict": true,
 8 |     "outDir": "dist",
 9 |     "sourceMap": true,
10 |     "declaration": true,
11 |     "skipLibCheck": true,
12 |     "lib": [
13 |       "ES2023",
14 |       "DOM",
15 |       "DOM.Iterable"
16 |     ]
17 |   },
18 |   "include": [
19 |     "*.ts"
20 |   ],
21 |   "exclude": [
22 |     "node_modules",
23 |     "dist"
24 |   ]
25 | }
```

--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------

```json
 1 | {
 2 |   "name": "@mzxrai/mcp-webresearch",
 3 |   "version": "0.1.7",
 4 |   "description": "MCP server for web research",
 5 |   "license": "MIT",
 6 |   "author": "mzxrai",
 7 |   "homepage": "https://github.com/mzxrai/mcp-webresearch",
 8 |   "bugs": "https://github.com/mzxrai/mcp-webresearch/issues",
 9 |   "type": "module",
10 |   "bin": {
11 |     "mcp-server-webresearch": "dist/index.js"
12 |   },
13 |   "files": [
14 |     "dist"
15 |   ],
16 |   "scripts": {
17 |     "build": "tsc && shx chmod +x dist/*.js",
18 |     "prepare": "pnpm run build",
19 |     "postinstall": "playwright install chromium",
20 |     "watch": "tsc --watch",
21 |     "dev": "tsx watch index.ts"
22 |   },
23 |   "publishConfig": {
24 |     "access": "public"
25 |   },
26 |   "keywords": [
27 |     "mcp",
28 |     "model-context-protocol",
29 |     "web-research",
30 |     "ai",
31 |     "web-scraping"
32 |   ],
33 |   "dependencies": {
34 |     "@modelcontextprotocol/sdk": "1.0.1",
35 |     "playwright": "^1.49.0",
36 |     "turndown": "^7.1.2"
37 |   },
38 |   "devDependencies": {
39 |     "shx": "^0.3.4",
40 |     "tsx": "^4.19.2",
41 |     "typescript": "^5.6.2",
42 |     "@types/turndown": "^5.0.4"
43 |   }
44 | }
```

--------------------------------------------------------------------------------
/index.ts:
--------------------------------------------------------------------------------

```typescript
   1 | #!/usr/bin/env node
   2 | 
   3 | // Core dependencies for MCP server and protocol handling
   4 | import { Server } from "@modelcontextprotocol/sdk/server/index.js";
   5 | import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
   6 | import {
   7 |     CallToolRequestSchema,
   8 |     ListResourcesRequestSchema,
   9 |     ListToolsRequestSchema,
  10 |     ReadResourceRequestSchema,
  11 |     ListPromptsRequestSchema,
  12 |     GetPromptRequestSchema,
  13 |     Tool,
  14 |     Resource,
  15 |     McpError,
  16 |     ErrorCode,
  17 |     TextContent,
  18 |     ImageContent,
  19 | } from "@modelcontextprotocol/sdk/types.js";
  20 | 
  21 | // Web scraping and content processing dependencies
  22 | import { chromium, Browser, Page } from 'playwright';
  23 | import TurndownService from "turndown";
  24 | import type { Node } from "turndown";
  25 | import * as fs from 'fs';
  26 | import * as path from 'path';
  27 | import * as os from 'os';
  28 | 
  29 | // Initialize temp directory for screenshots
  30 | const SCREENSHOTS_DIR = fs.mkdtempSync(path.join(os.tmpdir(), 'mcp-screenshots-'));
  31 | 
  32 | // Initialize Turndown service for converting HTML to Markdown
  33 | // Configure with specific formatting preferences
  34 | const turndownService: TurndownService = new TurndownService({
  35 |     headingStyle: 'atx',       // Use # style headings
  36 |     hr: '---',                 // Horizontal rule style
  37 |     bulletListMarker: '-',     // List item marker
  38 |     codeBlockStyle: 'fenced',  // Use ``` for code blocks
  39 |     emDelimiter: '_',          // Italics style
  40 |     strongDelimiter: '**',     // Bold style
  41 |     linkStyle: 'inlined',      // Use inline links
  42 | });
  43 | 
  44 | // Custom Turndown rules for better content extraction
  45 | // Remove script and style tags completely
  46 | turndownService.addRule('removeScripts', {
  47 |     filter: ['script', 'style', 'noscript'],
  48 |     replacement: () => ''
  49 | });
  50 | 
  51 | // Preserve link elements with their href attributes
  52 | turndownService.addRule('preserveLinks', {
  53 |     filter: 'a',
  54 |     replacement: (content: string, node: Node) => {
  55 |         const element = node as HTMLAnchorElement;
  56 |         const href = element.getAttribute('href');
  57 |         return href ? `[${content}](${href})` : content;
  58 |     }
  59 | });
  60 | 
  61 | // Preserve image elements with their src and alt attributes
  62 | turndownService.addRule('preserveImages', {
  63 |     filter: 'img',
  64 |     replacement: (content: string, node: Node) => {
  65 |         const element = node as HTMLImageElement;
  66 |         const alt = element.getAttribute('alt') || '';
  67 |         const src = element.getAttribute('src');
  68 |         return src ? `![${alt}](${src})` : '';
  69 |     }
  70 | });
  71 | 
  72 | // Core interfaces for research data management
  73 | interface ResearchResult {
  74 |     url: string;             // URL of the researched page
  75 |     title: string;           // Page title
  76 |     content: string;         // Extracted content in markdown
  77 |     timestamp: string;       // When the result was captured
  78 |     screenshotPath?: string; // Path to screenshot file on disk
  79 | }
  80 | 
  81 | // Define structure for research session data
  82 | interface ResearchSession {
  83 |     query: string;              // Search query that initiated the session
  84 |     results: ResearchResult[];  // Collection of research results
  85 |     lastUpdated: string;        // Timestamp of last update
  86 | }
  87 | 
  88 | // Screenshot management functions
  89 | async function saveScreenshot(screenshot: string, title: string): Promise<string> {
  90 |     // Convert screenshot from base64 to buffer
  91 |     const buffer = Buffer.from(screenshot, 'base64');
  92 | 
  93 |     // Check size before saving
  94 |     const MAX_SIZE = 5 * 1024 * 1024;  // 5MB
  95 |     if (buffer.length > MAX_SIZE) {
  96 |         throw new McpError(
  97 |             ErrorCode.InvalidRequest,
  98 |             `Screenshot too large: ${Math.round(buffer.length / (1024 * 1024))}MB exceeds ${MAX_SIZE / (1024 * 1024)}MB limit`
  99 |         );
 100 |     }
 101 | 
 102 |     // Generate a safe filename
 103 |     const timestamp = new Date().getTime();
 104 |     const safeTitle = title.replace(/[^a-z0-9]/gi, '_').toLowerCase();
 105 |     const filename = `${safeTitle}-${timestamp}.png`;
 106 |     const filepath = path.join(SCREENSHOTS_DIR, filename);
 107 | 
 108 |     // Save the validated screenshot
 109 |     await fs.promises.writeFile(filepath, buffer);
 110 | 
 111 |     // Return the filepath to the saved screenshot
 112 |     return filepath;
 113 | }
 114 | 
 115 | // Cleanup function to remove all screenshots from disk
 116 | async function cleanupScreenshots(): Promise<void> {
 117 |     try {
 118 |         // Remove all files in the screenshots directory
 119 |         const files = await fs.promises.readdir(SCREENSHOTS_DIR);
 120 |         await Promise.all(files.map(file =>
 121 |             fs.promises.unlink(path.join(SCREENSHOTS_DIR, file))
 122 |         ));
 123 | 
 124 |         // Remove the directory itself
 125 |         await fs.promises.rmdir(SCREENSHOTS_DIR);
 126 |     } catch (error) {
 127 |         console.error('Error cleaning up screenshots:', error);
 128 |     }
 129 | }
 130 | 
 131 | // Available tools for web research functionality
 132 | const TOOLS: Tool[] = [
 133 |     {
 134 |         name: "search_google",
 135 |         description: "Search Google for a query",
 136 |         inputSchema: {
 137 |             type: "object",
 138 |             properties: {
 139 |                 query: { type: "string", description: "Search query" },
 140 |             },
 141 |             required: ["query"],
 142 |         },
 143 |     },
 144 |     {
 145 |         name: "visit_page",
 146 |         description: "Visit a webpage and extract its content",
 147 |         inputSchema: {
 148 |             type: "object",
 149 |             properties: {
 150 |                 url: { type: "string", description: "URL to visit" },
 151 |                 takeScreenshot: { type: "boolean", description: "Whether to take a screenshot" },
 152 |             },
 153 |             required: ["url"],
 154 |         },
 155 |     },
 156 |     {
 157 |         name: "take_screenshot",
 158 |         description: "Take a screenshot of the current page",
 159 |         inputSchema: {
 160 |             type: "object",
 161 |             properties: {},  // No parameters needed
 162 |         },
 163 |     },
 164 | ];
 165 | 
 166 | // Define available prompt types for type safety
 167 | type PromptName = "agentic-research";
 168 | 
 169 | // Define structure for research prompt arguments
 170 | interface AgenticResearchArgs {
 171 |     topic: string;  // Research topic provided by user
 172 | }
 173 | 
 174 | // Configure available prompts with their specifications
 175 | const PROMPTS = {
 176 |     // Agentic research prompt configuration
 177 |     "agentic-research": {
 178 |         name: "agentic-research" as const,  // Type-safe name
 179 |         description: "Conduct iterative web research on a topic, exploring it thoroughly through multiple steps while maintaining a dialogue with the user",
 180 |         arguments: [
 181 |             {
 182 |                 name: "topic",                                     // Topic argument specification
 183 |                 description: "The topic or question to research",  // Description of the argument
 184 |                 required: true                                     // Topic is mandatory
 185 |             }
 186 |         ]
 187 |     }
  188 | } as const;  // Read-only at the type level (not frozen at runtime)
 189 | 
 190 | // Global state management for browser and research session
  191 | let browser: Browser | undefined;                 // Playwright browser instance
 192 | let page: Page | undefined;                       // Current active page
 193 | let currentSession: ResearchSession | undefined;  // Current research session data
 194 | 
 195 | // Configuration constants for session management
 196 | const MAX_RESULTS_PER_SESSION = 100;  // Maximum number of results to store per session
 197 | const MAX_RETRIES = 3;                // Maximum retry attempts for operations
 198 | const RETRY_DELAY = 1000;             // Delay between retries in milliseconds
 199 | 
 200 | // Generic retry mechanism for handling transient failures
 201 | async function withRetry<T>(
 202 |     operation: () => Promise<T>,  // Operation to retry
  203 |     retries = MAX_RETRIES,        // Total number of attempts, including the first
 204 |     delay = RETRY_DELAY           // Delay between retries
 205 | ): Promise<T> {
 206 |     let lastError: Error;
 207 | 
 208 |     // Attempt operation up to max retries
 209 |     for (let i = 0; i < retries; i++) {
 210 |         try {
 211 |             return await operation();
 212 |         } catch (error) {
 213 |             lastError = error as Error;
 214 |             if (i < retries - 1) {
 215 |                 console.error(`Attempt ${i + 1} failed, retrying in ${delay}ms:`, error);
 216 |                 await new Promise(resolve => setTimeout(resolve, delay));
 217 |             }
 218 |         }
 219 |     }
 220 | 
 221 |     throw lastError!;  // Throw last error if all retries failed
 222 | }
 223 | 
 224 | // Add a new research result to the current session with data management
 225 | function addResult(result: ResearchResult): void {
 226 |     // If no current session exists, initialize a new one
 227 |     if (!currentSession) {
 228 |         currentSession = {
 229 |             query: "Research Session",
 230 |             results: [],
 231 |             lastUpdated: new Date().toISOString(),
 232 |         };
 233 |     }
 234 | 
 235 |     // If the session has reached the maximum number of results, remove the oldest result
 236 |     if (currentSession.results.length >= MAX_RESULTS_PER_SESSION) {
 237 |         currentSession.results.shift();
 238 |     }
 239 | 
 240 |     // Add the new result to the session and update the last updated timestamp
 241 |     currentSession.results.push(result);
 242 |     currentSession.lastUpdated = new Date().toISOString();
 243 | }
 244 | 
 245 | /**
 246 |  * Specifically handles Google's consent dialog in regions that require it
 247 |  * @param page - Playwright Page object
 248 |  */
 249 | async function dismissGoogleConsent(page: Page): Promise<void> {
 250 |     // Regions that commonly show cookie/consent banners
 251 |     const regions = [
 252 |         // Europe
 253 |         '.google.de', '.google.fr', '.google.co.uk',
 254 |         '.google.it', '.google.es', '.google.nl',
 255 |         '.google.pl', '.google.ie', '.google.dk',
 256 |         '.google.no', '.google.se', '.google.fi',
 257 |         '.google.at', '.google.ch', '.google.be',
 258 |         '.google.pt', '.google.gr', '.google.com.tr',
 259 |         // Asia Pacific
 260 |         '.google.co.id', '.google.com.sg', '.google.co.th',
 261 |         '.google.com.my', '.google.com.ph', '.google.com.au',
 262 |         '.google.co.nz', '.google.com.vn',
 263 |         // Generic domains
 264 |         '.google.com', '.google.co'
 265 |     ];
 266 | 
 267 |     try {
 268 |         // Get current URL
 269 |         const currentUrl = page.url();
 270 | 
 271 |         // Skip consent check if not in a supported region
 272 |         if (!regions.some(domain => currentUrl.includes(domain))) {
 273 |             return;
 274 |         }
 275 | 
 276 |         // Quick check for consent dialog existence
 277 |         const hasConsent = await page.$(
 278 |             'form:has(button[aria-label]), div[aria-modal="true"], ' +
 279 |             // Common dialog containers
 280 |             'div[role="dialog"], div[role="alertdialog"], ' +
 281 |             // Common cookie/consent specific elements
 282 |             'div[class*="consent"], div[id*="consent"], ' +
 283 |             'div[class*="cookie"], div[id*="cookie"], ' +
 284 |             // Common modal/popup classes
 285 |             'div[class*="modal"]:has(button), div[class*="popup"]:has(button), ' +
 286 |             // Common banner patterns
 287 |             'div[class*="banner"]:has(button), div[id*="banner"]:has(button)'
 288 |         ).then(Boolean);
 289 | 
 290 |         // If no consent dialog is found, return
 291 |         if (!hasConsent) {
 292 |             return;
 293 |         }
 294 | 
 295 |         // Handle the consent dialog using common consent button patterns
 296 |         await page.evaluate(() => {
 297 |             const consentPatterns = {
 298 |                 // Common accept button text patterns across languages
 299 |                 text: [
 300 |                     // English
 301 |                     'accept all', 'agree', 'consent',
 302 |                     // German
 303 |                     'alle akzeptieren', 'ich stimme zu', 'zustimmen',
 304 |                     // French
 305 |                     'tout accepter', 'j\'accepte',
 306 |                     // Spanish
 307 |                     'aceptar todo', 'acepto',
 308 |                     // Italian
 309 |                     'accetta tutto', 'accetto',
 310 |                     // Portuguese
 311 |                     'aceitar tudo', 'concordo',
 312 |                     // Dutch
 313 |                     'alles accepteren', 'akkoord',
 314 |                     // Polish
 315 |                     'zaakceptuj wszystko', 'zgadzam się',
 316 |                     // Swedish
 317 |                     'godkänn alla', 'godkänn',
 318 |                     // Danish
 319 |                     'accepter alle', 'accepter',
 320 |                     // Norwegian
 321 |                     'godta alle', 'godta',
 322 |                     // Finnish
 323 |                     'hyväksy kaikki', 'hyväksy',
 324 |                     // Indonesian
 325 |                     'terima semua', 'setuju', 'saya setuju',
 326 |                     // Malay
 327 |                     'terima semua', 'setuju',
 328 |                     // Thai
 329 |                     'ยอมรับทั้งหมด', 'ยอมรับ',
 330 |                     // Vietnamese
 331 |                     'chấp nhận tất cả', 'đồng ý',
 332 |                     // Filipino/Tagalog
 333 |                     'tanggapin lahat', 'sumang-ayon',
 334 |                     // Japanese
 335 |                     'すべて同意する', '同意する',
 336 |                     // Korean
 337 |                     '모두 동의', '동의'
 338 |                 ],
 339 |                 // Common aria-label patterns
 340 |                 ariaLabels: [
 341 |                     'consent', 'accept', 'agree',
 342 |                     'cookie', 'privacy', 'terms',
 343 |                     'persetujuan', 'setuju',  // Indonesian
 344 |                     'ยอมรับ',  // Thai
 345 |                     'đồng ý',  // Vietnamese
 346 |                     '同意'     // Japanese/Chinese
 347 |                 ]
 348 |             };
 349 | 
 350 |             // Finds the accept button by text or aria-label
 351 |             const findAcceptButton = () => {
 352 |                 // Get all buttons on the page
 353 |                 const buttons = Array.from(document.querySelectorAll('button'));
 354 | 
 355 |                 // Find the accept button
 356 |                 return buttons.find(button => {
 357 |                     // Get the text content and aria-label of the button
 358 |                     const text = button.textContent?.toLowerCase() || '';
 359 |                     const label = button.getAttribute('aria-label')?.toLowerCase() || '';
 360 | 
 361 |                     // Check for matching text patterns
 362 |                     const hasMatchingText = consentPatterns.text.some(pattern =>
 363 |                         text.includes(pattern)
 364 |                     );
 365 | 
 366 |                     // Check for matching aria-labels
 367 |                     const hasMatchingLabel = consentPatterns.ariaLabels.some(pattern =>
 368 |                         label.includes(pattern)
 369 |                     );
 370 | 
 371 |                     // Return true if either text or aria-label matches
 372 |                     return hasMatchingText || hasMatchingLabel;
 373 |                 });
 374 |             };
 375 | 
 376 |             // Find the accept button
 377 |             const acceptButton = findAcceptButton();
 378 | 
 379 |             // If an accept button is found, click it
 380 |             if (acceptButton) {
 381 |                 acceptButton.click();
 382 |             }
 383 |         });
 384 |     } catch (error) {
  385 |         console.error('Consent handling failed:', error);  // stderr only: stdout carries the stdio transport's protocol messages
 386 |     }
 387 | }
 388 | 
 389 | // Safe page navigation with error handling and bot detection
 390 | async function safePageNavigation(page: Page, url: string): Promise<void> {
 391 |     try {
 392 |         // Step 1: Set cookies to bypass consent banner
 393 |         await page.context().addCookies([{
 394 |             name: 'CONSENT',
 395 |             value: 'YES+',
 396 |             domain: '.google.com',
 397 |             path: '/'
 398 |         }]);
 399 | 
 400 |         // Step 2: Initial navigation
 401 |         const response = await page.goto(url, {
 402 |             waitUntil: 'domcontentloaded',
 403 |             timeout: 15000
 404 |         });
 405 | 
 406 |         // Step 3: Basic response validation
 407 |         if (!response) {
 408 |             throw new Error('Navigation failed: no response received');
 409 |         }
 410 | 
 411 |         // Check HTTP status code; if 400 or higher, throw an error
 412 |         const status = response.status();
 413 |         if (status >= 400) {
 414 |             throw new Error(`HTTP ${status}: ${response.statusText()}`);
 415 |         }
 416 | 
 417 |         // Step 4: Wait for network to become idle or timeout
 418 |         await Promise.race([
 419 |             page.waitForLoadState('networkidle', { timeout: 5000 })
 420 |                 .catch(() => {/* ignore timeout */ }),
 421 |             // Fallback timeout in case networkidle never occurs
 422 |             new Promise(resolve => setTimeout(resolve, 5000))
 423 |         ]);
 424 | 
 425 |         // Step 5: Security and content validation
 426 |         const validation = await page.evaluate(() => {
 427 |             const botProtectionExists = [
 428 |                 '#challenge-running',     // Cloudflare
 429 |                 '#cf-challenge-running',  // Cloudflare
 430 |                 '#px-captcha',            // PerimeterX
 431 |                 '#ddos-protection',       // Various
 432 |                 '#waf-challenge-html'     // Various WAFs
 433 |             ].some(selector => document.querySelector(selector));
 434 | 
 435 |             // Check for suspicious page titles
 436 |             const suspiciousTitle = [
 437 |                 'security check',
 438 |                 'ddos protection',
 439 |                 'please wait',
 440 |                 'just a moment',
 441 |                 'attention required'
 442 |             ].some(phrase => document.title.toLowerCase().includes(phrase));
 443 | 
 444 |             // Count words in the page content
 445 |             const bodyText = document.body.innerText || '';
 446 |             const words = bodyText.trim().split(/\s+/).length;
 447 | 
 448 |             // Return validation results
 449 |             return {
 450 |                 wordCount: words,
 451 |                 botProtection: botProtectionExists,
 452 |                 suspiciousTitle,
 453 |                 title: document.title
 454 |             };
 455 |         });
 456 | 
 457 |         // If bot protection is detected, throw an error
 458 |         if (validation.botProtection) {
 459 |             throw new Error('Bot protection detected');
 460 |         }
 461 | 
 462 |         // If the page title is suspicious, throw an error
 463 |         if (validation.suspiciousTitle) {
 464 |             throw new Error(`Suspicious page title detected: "${validation.title}"`);
 465 |         }
 466 | 
 467 |         // If the page contains insufficient content, throw an error
 468 |         if (validation.wordCount < 10) {
 469 |             throw new Error('Page contains insufficient content');
 470 |         }
 471 | 
 472 |     } catch (error) {
 473 |         // If an error occurs during navigation, throw an error with the URL and the error message
 474 |         throw new Error(`Navigation to ${url} failed: ${(error as Error).message}`);
 475 |     }
 476 | }
 477 | 
 478 | // Take and optimize a screenshot
 479 | async function takeScreenshotWithSizeLimit(page: Page): Promise<string> {
 480 |     const MAX_SIZE = 5 * 1024 * 1024;
 481 |     const MAX_DIMENSION = 1920;
 482 |     const MIN_DIMENSION = 800;
 483 | 
 484 |     // Set viewport size
 485 |     await page.setViewportSize({
 486 |         width: 1600,
 487 |         height: 900
 488 |     });
 489 | 
 490 |     // Take initial screenshot
 491 |     let screenshot = await page.screenshot({
 492 |         type: 'png',
 493 |         fullPage: false
 494 |     });
 495 | 
 496 |     // Track the latest screenshot buffer and the number of resize attempts
 497 |     let buffer = screenshot;
 498 |     let attempts = 0;
 499 |     const MAX_ATTEMPTS = 3;
 500 | 
 501 |     // While screenshot is too large, reduce size
 502 |     while (buffer.length > MAX_SIZE && attempts < MAX_ATTEMPTS) {
 503 |         // Get current viewport size
 504 |         const viewport = page.viewportSize();
 505 |         if (!viewport) break;  // Exit instead of looping forever: attempts is not incremented here
 506 | 
 507 |         // Calculate new dimensions (the factor compounds, since it scales the already-reduced viewport)
 508 |         const scaleFactor = Math.pow(0.75, attempts + 1);
 509 |         let newWidth = Math.round(viewport.width * scaleFactor);
 510 |         let newHeight = Math.round(viewport.height * scaleFactor);
 511 | 
 512 |         // Ensure dimensions are within bounds
 513 |         newWidth = Math.max(MIN_DIMENSION, Math.min(MAX_DIMENSION, newWidth));
 514 |         newHeight = Math.max(MIN_DIMENSION, Math.min(MAX_DIMENSION, newHeight));
 515 | 
 516 |         // Update viewport with new dimensions
 517 |         await page.setViewportSize({
 518 |             width: newWidth,
 519 |             height: newHeight
 520 |         });
 521 | 
 522 |         // Take new screenshot
 523 |         screenshot = await page.screenshot({
 524 |             type: 'png',
 525 |             fullPage: false
 526 |         });
 527 | 
 528 |         // Update buffer with new screenshot
 529 |         buffer = screenshot;
 530 | 
 531 |         // Increment retry attempts
 532 |         attempts++;
 533 |     }
 534 | 
 535 |     // Final attempt with minimum settings
 536 |     if (buffer.length > MAX_SIZE) {
 537 |         await page.setViewportSize({
 538 |             width: MIN_DIMENSION,
 539 |             height: MIN_DIMENSION
 540 |         });
 541 | 
 542 |         // Take final screenshot
 543 |         screenshot = await page.screenshot({
 544 |             type: 'png',
 545 |             fullPage: false
 546 |         });
 547 | 
 548 |         // Update buffer with final screenshot
 549 |         buffer = screenshot;
 550 | 
 551 |         // Throw error if final screenshot is still too large
 552 |         if (buffer.length > MAX_SIZE) {
 553 |             throw new McpError(
 554 |                 ErrorCode.InvalidRequest,
 555 |                 `Failed to reduce screenshot to under 5MB even with minimum settings`
 556 |             );
 557 |         }
 558 |     }
 559 | 
 560 |     // Convert Buffer to base64 string before returning
 561 |     return buffer.toString('base64');
 562 | }
 563 | 
 564 | // Initialize MCP server with basic configuration
 565 | const server: Server = new Server(
 566 |     {
 567 |         name: "webresearch",  // Server name identifier
 568 |         version: "0.1.7",     // Server version number
 569 |     },
 570 |     {
 571 |         capabilities: {
 572 |             tools: {},      // Available tool configurations
 573 |             resources: {},  // Resource handling capabilities
 574 |             prompts: {}     // Prompt processing capabilities
 575 |         },
 576 |     }
 577 | );
 578 | 
 579 | // Register handler for tool listing requests
 580 | server.setRequestHandler(ListToolsRequestSchema, async () => ({
 581 |     tools: TOOLS  // Return list of available research tools
 582 | }));
 583 | 
 584 | // Register handler for resource listing requests
 585 | server.setRequestHandler(ListResourcesRequestSchema, async () => {
 586 |     // Return empty list if no active session
 587 |     if (!currentSession) {
 588 |         return { resources: [] };
 589 |     }
 590 | 
 591 |     // Compile list of available resources
 592 |     const resources: Resource[] = [
 593 |         // Add session summary resource
 594 |         {
 595 |             uri: "research://current/summary",  // Resource identifier
 596 |             name: "Current Research Session Summary",
 597 |             description: "Summary of the current research session including queries and results",
 598 |             mimeType: "application/json"
 599 |         },
 600 |         // Add screenshot resources if available
 601 |         ...currentSession.results
 602 |             .map((r, i): Resource | undefined => r.screenshotPath ? {
 603 |                 uri: `research://screenshots/${i}`,
 604 |                 name: `Screenshot of ${r.title}`,
 605 |                 description: `Screenshot taken from ${r.url}`,
 606 |                 mimeType: "image/png"
 607 |             } : undefined)
 608 |             .filter((r): r is Resource => r !== undefined)
 609 |     ];
 610 | 
 611 |     // Return compiled list of resources
 612 |     return { resources };
 613 | });
 614 | 
 615 | // Register handler for resource content requests
 616 | server.setRequestHandler(ReadResourceRequestSchema, async (request) => {
 617 |     const uri = request.params.uri.toString();
 618 | 
 619 |     // Handle session summary requests for research data
 620 |     if (uri === "research://current/summary") {
 621 |         if (!currentSession) {
 622 |             throw new McpError(
 623 |                 ErrorCode.InvalidRequest,
 624 |                 "No active research session"
 625 |             );
 626 |         }
 627 | 
 628 |         // Return the session summary as formatted JSON
 629 |         return {
 630 |             contents: [{
 631 |                 uri,
 632 |                 mimeType: "application/json",
 633 |                 text: JSON.stringify({
 634 |                     query: currentSession.query,
 635 |                     resultCount: currentSession.results.length,
 636 |                     lastUpdated: currentSession.lastUpdated,
 637 |                     results: currentSession.results.map(r => ({
 638 |                         title: r.title,
 639 |                         url: r.url,
 640 |                         timestamp: r.timestamp,
 641 |                         screenshotPath: r.screenshotPath
 642 |                     }))
 643 |                 }, null, 2)
 644 |             }]
 645 |         };
 646 |     }
 647 | 
 648 |     // Handle screenshot requests
 649 |     if (uri.startsWith("research://screenshots/")) {
 650 |         const index = parseInt(uri.split("/").pop() || "", 10);
 651 | 
 652 |         // Verify session exists
 653 |         if (!currentSession) {
 654 |             throw new McpError(
 655 |                 ErrorCode.InvalidRequest,
 656 |                 "No active research session"
 657 |             );
 658 |         }
 659 | 
 660 |         // Verify index is within bounds
 661 |         if (isNaN(index) || index < 0 || index >= currentSession.results.length) {
 662 |             throw new McpError(
 663 |                 ErrorCode.InvalidRequest,
 664 |                 `Screenshot index out of bounds: ${index}`
 665 |             );
 666 |         }
 667 | 
 668 |         // Get result containing screenshot
 669 |         const result = currentSession.results[index];
 670 |         if (!result?.screenshotPath) {
 671 |             throw new McpError(
 672 |                 ErrorCode.InvalidRequest,
 673 |                 `No screenshot available at index: ${index}`
 674 |             );
 675 |         }
 676 | 
 677 |         try {
 678 |             // Read the screenshot file from disk
 679 |             const screenshotData = await fs.promises.readFile(result.screenshotPath);
 680 | 
 681 |             // Convert Buffer to base64 string before returning
 682 |             const base64Data = screenshotData.toString('base64');
 683 | 
 684 |             // Return the screenshot as a base64-encoded blob
 685 |             return {
 686 |                 contents: [{
 687 |                     uri,
 688 |                     mimeType: "image/png",
 689 |                     blob: base64Data
 690 |                 }]
 691 |             };
 692 |         } catch (error: unknown) {
 693 |             // Handle error if screenshot cannot be read
 694 |             const errorMessage = error instanceof Error ? error.message : 'Unknown error occurred';
 695 |             throw new McpError(
 696 |                 ErrorCode.InternalError,
 697 |                 `Failed to read screenshot: ${errorMessage}`
 698 |             );
 699 |         }
 700 |     }
 701 | 
 702 |     // Handle unknown resource types
 703 |     throw new McpError(
 704 |         ErrorCode.InvalidRequest,
 705 |         `Unknown resource: ${uri}`
 706 |     );
 707 | });
 708 | 
 709 | // Initialize MCP server connection using stdio transport
 710 | const transport = new StdioServerTransport();
 711 | server.connect(transport).catch((error) => {
 712 |     console.error("Failed to start server:", error);
 713 |     process.exit(1);
 714 | });
 715 | 
 716 | // Convert HTML content to clean, readable markdown format
 717 | async function extractContentAsMarkdown(
 718 |     page: Page,        // Playwright page to extract from
 719 |     selector?: string  // Optional CSS selector to target specific content
 720 | ): Promise<string> {
 721 |     // Step 1: Execute content extraction in browser context
 722 |     const html = await page.evaluate((sel) => {
 723 |         // Handle case where specific selector is provided
 724 |         if (sel) {
 725 |             const element = document.querySelector(sel);
 726 |             // Return element content or empty string if not found
 727 |             return element ? element.outerHTML : '';
 728 |         }
 729 | 
 730 |         // Step 2: Try standard content containers first
 731 |         const contentSelectors = [
 732 |             'main',           // HTML5 semantic main content
 733 |             'article',        // HTML5 semantic article content
 734 |             '[role="main"]',  // ARIA main content role
 735 |             '#content',       // Common content ID
 736 |             '.content',       // Common content class
 737 |             '.main',          // Alternative main class
 738 |             '.post',          // Blog post content
 739 |             '.article',       // Article content container
 740 |         ];
 741 | 
 742 |         // Try each selector in priority order
 743 |         for (const contentSelector of contentSelectors) {
 744 |             const element = document.querySelector(contentSelector);
 745 |             if (element) {
 746 |                 return element.outerHTML;  // Return first matching content
 747 |             }
 748 |         }
 749 | 
 750 |         // Step 3: Fallback to cleaning full body content
 751 |         const body = document.body;
 752 | 
 753 |         // Define elements to remove for cleaner content
 754 |         const elementsToRemove = [
 755 |             // Navigation elements
 756 |             'header',                    // Page header
 757 |             'footer',                    // Page footer
 758 |             'nav',                       // Navigation sections
 759 |             '[role="navigation"]',       // ARIA navigation elements
 760 | 
 761 |             // Sidebars and complementary content
 762 |             'aside',                     // Sidebar content
 763 |             '.sidebar',                  // Sidebar by class
 764 |             '[role="complementary"]',    // ARIA complementary content
 765 | 
 766 |             // Navigation-related elements
 767 |             '.nav',                      // Navigation classes
 768 |             '.menu',                     // Menu elements
 769 | 
 770 |             // Page structure elements
 771 |             '.header',                   // Header classes
 772 |             '.footer',                   // Footer classes
 773 | 
 774 |             // Advertising and notices
 775 |             '.advertisement',            // Advertisement containers
 776 |             '.ads',                      // Ad containers
 777 |             '.cookie-notice',            // Cookie consent notices
 778 |         ];
 779 | 
 780 |         // Remove each unwanted element from content
 781 |         elementsToRemove.forEach(removeSelector => {
 782 |             body.querySelectorAll(removeSelector).forEach(el => el.remove());
 783 |         });
 784 | 
 785 |         // Return cleaned body content
 786 |         return body.outerHTML;
 787 |     }, selector);
 788 | 
 789 |     // Step 4: Handle empty content case
 790 |     if (!html) {
 791 |         return '';
 792 |     }
 793 | 
 794 |     try {
 795 |         // Step 5: Convert HTML to Markdown
 796 |         const markdown = turndownService.turndown(html);
 797 | 
 798 |         // Step 6: Clean up and format markdown
 799 |         return markdown
 800 |             .replace(/\n{3,}/g, '\n\n')  // Replace excessive newlines with double
 801 |             .replace(/^- $/gm, '')       // Remove empty list items
 802 |             .replace(/^\s+$/gm, '')      // Remove whitespace-only lines
 803 |             .trim();                     // Remove leading/trailing whitespace
 804 | 
 805 |     } catch (error) {
 806 |         // Log conversion errors and return original HTML as fallback
 807 |         console.error('Error converting HTML to Markdown:', error);
 808 |         return html;
 809 |     }
 810 | }
 811 | 
 812 | // Validate URL format and ensure security constraints
 813 | function isValidUrl(urlString: string): boolean {
 814 |     try {
 815 |         // Attempt to parse URL string
 816 |         const url = new URL(urlString);
 817 | 
 818 |         // Only allow HTTP and HTTPS protocols for security
 819 |         return url.protocol === 'http:' || url.protocol === 'https:';
 820 |     } catch {
 821 |         // Return false for any invalid URL format
 822 |         return false;
 823 |     }
 824 | }
 825 | 
 826 | // Define result type for tool operations
 827 | type ToolResult = {
 828 |     content: (TextContent | ImageContent)[];  // Array of text or image content
 829 |     isError?: boolean;                        // Optional error flag
 830 | };
 831 | 
 832 | // Tool request handler for executing research operations
 833 | server.setRequestHandler(CallToolRequestSchema, async (request): Promise<ToolResult> => {
 834 |     // Initialize browser for tool operations
 835 |     const page = await ensureBrowser();
 836 | 
 837 |     switch (request.params.name) {
 838 |         // Handle Google search operations
 839 |         case "search_google": {
 840 |             // Extract search query from request parameters
 841 |             const { query } = request.params.arguments as { query: string };
 842 | 
 843 |             try {
 844 |                 // Execute search with retry mechanism
 845 |                 const results = await withRetry(async () => {
 846 |                     // Step 1: Navigate to Google search page
 847 |                     await safePageNavigation(page, 'https://www.google.com');
 848 |                     await dismissGoogleConsent(page);
 849 | 
 850 |                     // Step 2: Find and interact with search input
 851 |                     await withRetry(async () => {
 852 |                         // Wait for any search input element to appear
 853 |                         await Promise.race([
 854 |                             // Try multiple possible selectors for search input
 855 |                             page.waitForSelector('input[name="q"]', { timeout: 5000 }),
 856 |                             page.waitForSelector('textarea[name="q"]', { timeout: 5000 }),
 857 |                             page.waitForSelector('input[type="text"]', { timeout: 5000 })
 858 |                         ]).catch(() => {
 859 |                             throw new Error('Search input not found - no matching selectors');
 860 |                         });
 861 | 
 862 |                         // Find the actual search input element
 863 |                         const searchInput = await page.$('input[name="q"]') ||
 864 |                             await page.$('textarea[name="q"]') ||
 865 |                             await page.$('input[type="text"]');
 866 | 
 867 |                         // Verify search input was found
 868 |                         if (!searchInput) {
 869 |                             throw new Error('Search input element not found after waiting');
 870 |                         }
 871 | 
 872 |                         // Step 3: Enter search query
 873 |                         await searchInput.click({ clickCount: 3 });  // Select all existing text
 874 |                         await searchInput.press('Backspace');        // Clear selected text
 875 |                         await searchInput.type(query);               // Type new query
 876 |                     }, 3, 2000);  // Allow 3 retries with 2s delay
 877 | 
 878 |                     // Step 4: Submit search and wait for results
 879 |                     await withRetry(async () => {
 880 |                         await Promise.all([
 881 |                             page.keyboard.press('Enter'),
 882 |                             page.waitForLoadState('networkidle', { timeout: 15000 }),
 883 |                         ]);
 884 |                     });
 885 | 
 886 |                     // Step 5: Extract search results
 887 |                     const searchResults = await withRetry(async () => {
 888 |                         const results = await page.evaluate(() => {
 889 |                             // Find all search result containers
 890 |                             const elements = document.querySelectorAll('div.g');
 891 |                             if (!elements || elements.length === 0) {
 892 |                                 throw new Error('No search results found');
 893 |                             }
 894 | 
 895 |                             // Extract data from each result
 896 |                             return Array.from(elements).map((el) => {
 897 |                                 // Find required elements within result container
 898 |                                 const titleEl = el.querySelector('h3');            // Title element
 899 |                                 const linkEl = el.querySelector('a');              // Link element
 900 |                                 const snippetEl = el.querySelector('div.VwiC3b');  // Snippet element
 901 | 
 902 |                                 // Skip results missing required elements
 903 |                                 if (!titleEl || !linkEl || !snippetEl) {
 904 |                                     return null;
 905 |                                 }
 906 | 
 907 |                                 // Return structured result data
 908 |                                 return {
 909 |                                     title: titleEl.textContent || '',        // Result title
 910 |                                     url: linkEl.getAttribute('href') || '',  // Result URL
 911 |                                     snippet: snippetEl.textContent || '',    // Result description
 912 |                                 };
 913 |                             }).filter(result => result !== null);  // Remove invalid results
 914 |                         });
 915 | 
 916 |                         // Verify we found valid results
 917 |                         if (!results || results.length === 0) {
 918 |                             throw new Error('No valid search results found');
 919 |                         }
 920 | 
 921 |                         // Return compiled list of results
 922 |                         return results;
 923 |                     });
 924 | 
 925 |                     // Step 6: Store results in session
 926 |                     searchResults.forEach((result) => {
 927 |                         addResult({
 928 |                             url: result.url,
 929 |                             title: result.title,
 930 |                             content: result.snippet,
 931 |                             timestamp: new Date().toISOString(),
 932 |                         });
 933 |                     });
 934 | 
 935 |                     // Return compiled list of results
 936 |                     return searchResults;
 937 |                 });
 938 | 
 939 |                 // Step 7: Return formatted results
 940 |                 return {
 941 |                     content: [{
 942 |                         type: "text",
 943 |                         text: JSON.stringify(results, null, 2)  // Pretty-print JSON results
 944 |                     }]
 945 |                 };
 946 |             } catch (error) {
 947 |                 // Handle and format search errors
 948 |                 return {
 949 |                     content: [{
 950 |                         type: "text",
 951 |                         text: `Failed to perform search: ${(error as Error).message}`
 952 |                     }],
 953 |                     isError: true
 954 |                 };
 955 |             }
 956 |         }
 957 | 
 958 |         // Handle webpage visit and content extraction
 959 |         case "visit_page": {
 960 |             // Extract URL and screenshot flag from request
 961 |             const { url, takeScreenshot } = request.params.arguments as {
 962 |                 url: string;                    // Target URL to visit
 963 |                 takeScreenshot?: boolean;       // Optional screenshot flag
 964 |             };
 965 | 
 966 |             // Step 1: Validate URL format and security
 967 |             if (!isValidUrl(url)) {
 968 |                 return {
 969 |                     content: [{
 970 |                         type: "text" as const,
 971 |                         text: `Invalid URL: ${url}. Only http and https protocols are supported.`
 972 |                     }],
 973 |                     isError: true
 974 |                 };
 975 |             }
 976 | 
 977 |             try {
 978 |                 // Step 2: Visit page and extract content with retry mechanism
 979 |                 const result = await withRetry(async () => {
 980 |                     // Navigate to target URL safely
 981 |                     await safePageNavigation(page, url);
 982 |                     const title = await page.title();
 983 | 
 984 |                     // Step 3: Extract and process page content
 985 |                     const content = await withRetry(async () => {
 986 |                         // Convert page content to markdown
 987 |                         const extractedContent = await extractContentAsMarkdown(page);
 988 | 
 989 |                         // If no content is extracted, throw an error
 990 |                         if (!extractedContent) {
 991 |                             throw new Error('Failed to extract content');
 992 |                         }
 993 | 
 994 |                         // Return the extracted content
 995 |                         return extractedContent;
 996 |                     });
 997 | 
 998 |                     // Step 4: Create result object with page data
 999 |                     const pageResult: ResearchResult = {
1000 |                         url,      // Original URL
1001 |                         title,    // Page title
1002 |                         content,  // Markdown content
1003 |                         timestamp: new Date().toISOString(),  // Capture time
1004 |                     };
1005 | 
1006 |                     // Step 5: Take screenshot if requested
1007 |                     let screenshotUri: string | undefined;
1008 |                     if (takeScreenshot) {
1009 |                         // Capture and process screenshot
1010 |                         const screenshot = await takeScreenshotWithSizeLimit(page);
1011 |                         pageResult.screenshotPath = await saveScreenshot(screenshot, title);
1012 | 
1013 |                         // Get the index for the resource URI
1014 |                         const resultIndex = currentSession ? currentSession.results.length : 0;
1015 |                         screenshotUri = `research://screenshots/${resultIndex}`;
1016 | 
1017 |                         // Notify clients about new screenshot resource
1018 |                         server.notification({
1019 |                             method: "notifications/resources/list_changed"
1020 |                         });
1021 |                     }
1022 | 
1023 |                     // Step 6: Store result in session
1024 |                     addResult(pageResult);
1025 |                     return { pageResult, screenshotUri };
1026 |                 });
1027 | 
1028 |                 // Step 7: Return formatted result with screenshot URI if taken
1029 |                 const response: ToolResult = {
1030 |                     content: [{
1031 |                         type: "text" as const,
1032 |                         text: JSON.stringify({
1033 |                             url: result.pageResult.url,
1034 |                             title: result.pageResult.title,
1035 |                             content: result.pageResult.content,
1036 |                             timestamp: result.pageResult.timestamp,
1037 |                             screenshot: result.screenshotUri ? `View screenshot via *MCP Resources* (Paperclip icon) @ URI: ${result.screenshotUri}` : undefined
1038 |                         }, null, 2)
1039 |                     }]
1040 |                 };
1041 | 
1042 |                 return response;
1043 |             } catch (error) {
1044 |                 // Handle and format page visit errors
1045 |                 return {
1046 |                     content: [{
1047 |                         type: "text" as const,
1048 |                         text: `Failed to visit page: ${(error as Error).message}`
1049 |                     }],
1050 |                     isError: true
1051 |                 };
1052 |             }
1053 |         }
1054 | 
1055 |         // Handle standalone screenshot requests
1056 |         case "take_screenshot": {
1057 |             try {
1058 |                 // Step 1: Capture screenshot with retry mechanism
1059 |                 const screenshot = await withRetry(async () => {
1060 |                     // Take and optimize screenshot with default size limits
1061 |                     return await takeScreenshotWithSizeLimit(page);
1062 |                 });
1063 | 
1064 |                 // Step 2: Initialize session if needed
1065 |                 if (!currentSession) {
1066 |                     currentSession = {
1067 |                         query: "Screenshot Session",            // Session identifier
1068 |                         results: [],                            // Empty results array
1069 |                         lastUpdated: new Date().toISOString(),  // Current timestamp
1070 |                     };
1071 |                 }
1072 | 
1073 |                 // Step 3: Get current page information
1074 |                 const pageUrl = page.url();            // Current page URL (synchronous in Playwright)
1075 |                 const pageTitle = await page.title();  // Current page title
1076 | 
1077 |                 // Step 4: Save screenshot to disk
1078 |                 const screenshotPath = await saveScreenshot(screenshot, pageTitle || 'untitled');
1079 | 
1080 |                 // Step 5: Create and store screenshot result
1081 |                 const resultIndex = currentSession ? currentSession.results.length : 0;
1082 |                 addResult({
1083 |                     url: pageUrl,
1084 |                     title: pageTitle || "Untitled Page",  // Fallback title if none available
1085 |                     content: "Screenshot taken",          // Simple content description
1086 |                     timestamp: new Date().toISOString(),  // Capture time
1087 |                     screenshotPath                        // Path to screenshot file
1088 |                 });
1089 | 
1090 |                 // Step 6: Notify clients about new screenshot resource
1091 |                 server.notification({
1092 |                     method: "notifications/resources/list_changed"
1093 |                 });
1094 | 
1095 |                 // Step 7: Return success message with resource URI
1096 |                 const resourceUri = `research://screenshots/${resultIndex}`;
1097 |                 return {
1098 |                     content: [{
1099 |                         type: "text" as const,
1100 |                         text: `Screenshot taken successfully. You can view it via *MCP Resources* (Paperclip icon) @ URI: ${resourceUri}`
1101 |                     }]
1102 |                 };
1103 |             } catch (error) {
1104 |                 // Handle and format screenshot errors
1105 |                 return {
1106 |                     content: [{
1107 |                         type: "text" as const,
1108 |                         text: `Failed to take screenshot: ${(error as Error).message}`
1109 |                     }],
1110 |                     isError: true
1111 |                 };
1112 |             }
1113 |         }
1114 | 
1115 |         // Handle unknown tool requests
1116 |         default:
1117 |             throw new McpError(
1118 |                 ErrorCode.MethodNotFound,
1119 |                 `Unknown tool: ${request.params.name}`
1120 |             );
1121 |     }
1122 | });
1123 | 
1124 | // Register handler for prompt listing requests
1125 | server.setRequestHandler(ListPromptsRequestSchema, async () => {
1126 |     // Return all available prompts
1127 |     return { prompts: Object.values(PROMPTS) };
1128 | });
1129 | 
1130 | // Register handler for prompt retrieval and execution
1131 | server.setRequestHandler(GetPromptRequestSchema, async (request) => {
1132 |     // Extract and validate prompt name
1133 |     const promptName = request.params.name as PromptName;
1134 |     const prompt = PROMPTS[promptName];
1135 | 
1136 |     // Handle unknown prompt requests
1137 |     if (!prompt) {
1138 |         throw new McpError(ErrorCode.InvalidRequest, `Prompt not found: ${promptName}`);
1139 |     }
1140 | 
1141 |     // Handle agentic research prompt
1142 |     if (promptName === "agentic-research") {
1143 |         // Extract research topic from request arguments
1144 |         const args = request.params.arguments as AgenticResearchArgs | undefined;
1145 |         const topic = args?.topic || "";  // Use empty string if no topic provided
1146 | 
1147 |         // Return research assistant prompt with instructions
1148 |         return {
1149 |             messages: [
1150 |                 // Initial assistant message establishing role
1151 |                 {
1152 |                     role: "assistant",
1153 |                     content: {
1154 |                         type: "text",
1155 |                         text: "I am ready to help you with your research. I will conduct thorough web research, explore topics deeply, and maintain a dialogue with you throughout the process."
1156 |                     }
1157 |                 },
1158 |                 // User message carrying the detailed research instructions
1159 |                 {
1160 |                     role: "user",
1161 |                     content: {
1162 |                         type: "text",
1163 |                         text: `I'd like to research this topic: <topic>${topic}</topic>
1164 | 
1165 | Please help me explore it deeply, like you're a thoughtful, highly-trained research assistant.
1166 | 
1167 | General instructions:
1168 | 1. Start by proposing your research approach -- namely, formulate what initial query you will use to search the web. Propose a relatively broad search to understand the topic landscape. At the same time, make your queries optimized for returning high-quality results based on what you know about constructing Google search queries.
1169 | 2. Next, get my input on whether you should proceed with that query or if you should refine it.
1170 | 3. Once you have an approved query, perform the search.
1171 | 4. Prioritize high-quality, authoritative sources when they are available and relevant to the topic. Avoid low-quality or spammy sources.
1172 | 5. Retrieve information that is relevant to the topic at hand.
1173 | 6. Iteratively refine your research direction based on what you find.
1174 | 7. Keep me informed of what you find and let *me* guide the direction of the research interactively.
1175 | 8. If you run into a dead end while researching, do a Google search for the topic and attempt to find a URL for a relevant page. Then, explore that page in depth.
1176 | 9. Only conclude when my research goals are met.
1177 | 10. **Always cite your sources**, providing URLs to the sources you used in a citation block at the end of your response.
1178 | 
1179 | You can use these tools:
1180 | - search_google: Search for information
1181 | - visit_page: Visit and extract content from web pages
1182 | 
1183 | Do *NOT* use the following tools:
1184 | - Anything related to knowledge graphs or memory, unless explicitly instructed to do so by the user.`
1185 |                     }
1186 |                 }
1187 |             ]
1188 |         };
1189 |     }
1190 | 
1191 |     // Handle unsupported prompt types
1192 |     throw new McpError(ErrorCode.InvalidRequest, "Prompt implementation not found");
1193 | });
1194 | 
1195 | // Ensures browser is running, and creates a new page if needed
1196 | async function ensureBrowser(): Promise<Page> {
1197 |     // Launch browser if not already running
1198 |     if (!browser) {
1199 |         browser = await chromium.launch({
1200 |             headless: true,  // Run in headless mode for automation
1201 |         });
1202 | 
1203 |         // Create initial context and page
1204 |         const context = await browser.newContext();
1205 |         page = await context.newPage();
1206 |     }
1207 | 
1208 |     // Create new page if current one is missing or has been closed
1209 |     if (!page || page.isClosed()) {
1210 |         const context = await browser.newContext();
1211 |         page = await context.newPage();
1212 |     }
1213 | 
1214 |     // Return the current page
1215 |     return page;
1216 | }
1217 | 
1218 | // Cleanup function: removes screenshots, closes the browser, and resets module state
1219 | async function cleanup(): Promise<void> {
1220 |     try {
1221 |         // Clean up screenshots first
1222 |         await cleanupScreenshots();
1223 | 
1224 |         // Then close the browser
1225 |         if (browser) {
1226 |             await browser.close();
1227 |         }
1228 |     } catch (error) {
1229 |         console.error('Error during cleanup:', error);
1230 |     } finally {
1231 |         browser = undefined;
1232 |         page = undefined;
1233 |     }
1234 | }
1235 | 
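1236 | // Register cleanup handlers; signal handlers exit explicitly, since 'exit' listeners cannot await async work
1237 | const shutdown = () => { void cleanup().finally(() => process.exit(0)); };
1238 | process.on('SIGTERM', shutdown);
1239 | process.on('SIGINT', shutdown);
1240 | process.on('SIGHUP', shutdown);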
```
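
Because `cleanup` is asynchronous, a signal handler has to wait for it to finish before the process exits, and Node.js `'exit'` listeners can only run synchronous code. The following is a minimal sketch of one way to wrap such a function so that it runs at most once and the process exits only afterwards. The name `makeShutdownHandler` and its `exit` parameter are hypothetical and not part of this repository; the wiring to `process` is shown as comments and is an assumption about how it might be hooked up.

```typescript
// Hypothetical helper (not part of index.ts): wraps an async cleanup so it
// runs at most once, then calls `exit` with the requested code.
type Cleanup = () => Promise<void>;
type Exit = (code: number) => void;

function makeShutdownHandler(
    cleanup: Cleanup,
    exit: Exit,
): (code?: number) => Promise<void> {
    // Shared promise: repeated signals reuse the first shutdown attempt
    let pending: Promise<void> | undefined;

    return (code = 0) => {
        if (!pending) {
            pending = cleanup()
                // Log failures, but still exit afterwards
                .catch((error) => console.error("Error during cleanup:", error))
                .then(() => exit(code));
        }
        return pending;
    };
}

// Possible wiring (assumption), using the cleanup() function from the listing:
//
//   const shutdown = makeShutdownHandler(cleanup, (code) => process.exit(code));
//   for (const signal of ["SIGTERM", "SIGINT", "SIGHUP"] as const) {
//       process.on(signal, () => void shutdown(0));
//   }
```

Injecting `exit` rather than calling `process.exit` directly keeps the helper testable, and the shared promise means a second Ctrl+C during shutdown does not start a second browser teardown.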