#
tokens: 17410/50000 6/7 files (page 1/2)
lines: on (toggle) GitHub
raw markdown copy reset
This is page 1 of 2. Use http://codebase.md/phialsbasement/mcp-webresearch-stealthified?lines=true&page={x} to view the full context.

# Directory Structure

```
├── .cursorrules
├── .gitignore
├── docs
│   └── mcp_spec
│       └── llms-full.txt
├── index.ts
├── LICENSE
├── package.json
├── pnpm-lock.yaml
├── README.md
└── tsconfig.json
```

# Files

--------------------------------------------------------------------------------
/.cursorrules:
--------------------------------------------------------------------------------

```
1 | 1. Use pnpm instead of npm when generating packaging-related commands.
2 | 2. Only make changes to comments, code, or dependencies that are needed to accomplish the objective defined by the user. When editing code, don't remove comments or change dependencies or make changes that are unrelated to the code changes at hand. 
```

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------

```
  1 | # Logs
  2 | logs
  3 | *.log
  4 | npm-debug.log*
  5 | yarn-debug.log*
  6 | yarn-error.log*
  7 | lerna-debug.log*
  8 | .pnpm-debug.log*
  9 | 
 10 | # Diagnostic reports (https://nodejs.org/api/report.html)
 11 | report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
 12 | 
 13 | # Runtime data
 14 | pids
 15 | *.pid
 16 | *.seed
 17 | *.pid.lock
 18 | 
 19 | # Directory for instrumented libs generated by jscoverage/JSCover
 20 | lib-cov
 21 | 
 22 | # Coverage directory used by tools like istanbul
 23 | coverage
 24 | *.lcov
 25 | 
 26 | # nyc test coverage
 27 | .nyc_output
 28 | 
 29 | # Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
 30 | .grunt
 31 | 
 32 | # Bower dependency directory (https://bower.io/)
 33 | bower_components
 34 | 
 35 | # node-waf configuration
 36 | .lock-wscript
 37 | 
 38 | # Compiled binary addons (https://nodejs.org/api/addons.html)
 39 | build/Release
 40 | 
 41 | # Dependency directories
 42 | node_modules/
 43 | jspm_packages/
 44 | 
 45 | # Snowpack dependency directory (https://snowpack.dev/)
 46 | web_modules/
 47 | 
 48 | # TypeScript cache
 49 | *.tsbuildinfo
 50 | 
 51 | # Optional npm cache directory
 52 | .npm
 53 | 
 54 | # Optional eslint cache
 55 | .eslintcache
 56 | 
 57 | # Optional stylelint cache
 58 | .stylelintcache
 59 | 
 60 | # Microbundle cache
 61 | .rpt2_cache/
 62 | .rts2_cache_cjs/
 63 | .rts2_cache_es/
 64 | .rts2_cache_umd/
 65 | 
 66 | # Optional REPL history
 67 | .node_repl_history
 68 | 
 69 | # Output of 'npm pack'
 70 | *.tgz
 71 | 
 72 | # Yarn Integrity file
 73 | .yarn-integrity
 74 | 
 75 | # dotenv environment variable files
 76 | .env
 77 | .env.development.local
 78 | .env.test.local
 79 | .env.production.local
 80 | .env.local
 81 | 
 82 | # parcel-bundler cache (https://parceljs.org/)
 83 | .cache
 84 | .parcel-cache
 85 | 
 86 | # Next.js build output
 87 | .next
 88 | out
 89 | 
 90 | # Nuxt.js build / generate output
 91 | .nuxt
 92 | dist
 93 | 
 94 | # Gatsby files
 95 | .cache/
 96 | # Comment in the public line in if your project uses Gatsby and not Next.js
 97 | # https://nextjs.org/blog/next-9-1#public-directory-support
 98 | # public
 99 | 
100 | # vuepress build output
101 | .vuepress/dist
102 | 
103 | # vuepress v2.x temp and cache directory
104 | .temp
105 | .cache
106 | 
107 | # Docusaurus cache and generated files
108 | .docusaurus
109 | 
110 | # Serverless directories
111 | .serverless/
112 | 
113 | # FuseBox cache
114 | .fusebox/
115 | 
116 | # DynamoDB Local files
117 | .dynamodb/
118 | 
119 | # TernJS port file
120 | .tern-port
121 | 
122 | # Stores VSCode versions used for testing VSCode extensions
123 | .vscode-test
124 | 
125 | # yarn v2
126 | .yarn/cache
127 | .yarn/unplugged
128 | .yarn/build-state.yml
129 | .yarn/install-state.gz
130 | .pnp.*
131 | 
```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

```markdown
  1 | # MCP Web Research Server
  2 | 
  3 | A Model Context Protocol (MCP) server for web research. 
  4 | 
  5 | Bring real-time info into Claude and easily research any topic.
  6 | 
  7 | ## Features
  8 | 
  9 | - Google search integration --- THIS FORK FIXES THIS --- NOW NO LONGER GETTING CAPTCHA BLOCKED
 10 | - Webpage content extraction
 11 | - Research session tracking (list of visited pages, search queries, etc.)
 12 | - Screenshot capture
 13 | 
 14 | ## Prerequisites
 15 | 
 16 | - [Node.js](https://nodejs.org/) >= 18 (includes `npm` and `npx`)
 17 | - [Claude Desktop app](https://claude.ai/download)
 18 | 
 19 | ## Installation
 20 | 
 21 | First, ensure you've downloaded and installed the [Claude Desktop app](https://claude.ai/download) and you have npm installed.
 22 | 
 23 | Next, add this entry to your `claude_desktop_config.json` (on Mac, found at `~/Library/Application\ Support/Claude/claude_desktop_config.json`):
 24 | 
 25 | ```json
 26 | {
 27 |   "mcpServers": {
 28 |     "webresearch": {
 29 |       "command": "npx",
 30 |       "args": ["-y", "@mzxrai/mcp-webresearch@latest"]
 31 |     }
 32 |   }
 33 | }
 34 | ```
 35 | 
 36 | This config allows Claude Desktop to automatically start the web research MCP server when needed.
 37 | 
 38 | ## Usage
 39 | 
 40 | Simply start a chat with Claude and send a prompt that would benefit from web research. If you'd like a prebuilt prompt customized for deeper web research, you can use the `agentic-research` prompt that we provide through this package. Access that prompt in Claude Desktop by clicking the Paperclip icon in the chat input and then selecting `Choose an integration` → `webresearch` → `agentic-research`.
 41 | 
 42 | <img src="https://i.ibb.co/N6Y3C0q/Screenshot-2024-12-05-at-11-01-27-PM.png" alt="Example screenshot of web research" width="400"/>
 43 | 
 44 | ### Tools
 45 | 
 46 | 1. `search_google`
 47 |    - Performs Google searches and extracts results
 48 |    - Arguments: `{ query: string }`
 49 | 
 50 | 2. `visit_page`
 51 |    - Visits a webpage and extracts its content
 52 |    - Arguments: `{ url: string, takeScreenshot?: boolean }`
 53 | 
 54 | 3. `take_screenshot`
 55 |    - Takes a screenshot of the current page
 56 |    - No arguments required
 57 | 
 58 | ### Prompts
 59 | 
 60 | #### `agentic-research`
 61 | A guided research prompt that helps Claude conduct thorough web research. The prompt instructs Claude to:
 62 | - Start with broad searches to understand the topic landscape
 63 | - Prioritize high-quality, authoritative sources
 64 | - Iteratively refine the research direction based on findings
 65 | - Keep you informed and let you guide the research interactively
 66 | - Always cite sources with URLs
 67 | 
 68 | ### Resources
 69 | 
 70 | We expose two things as MCP resources: (1) captured webpage screenshots, and (2) the research session.
 71 | 
 72 | #### Screenshots
 73 | 
 74 | When you take a screenshot, it's saved as an MCP resource. You can access captured screenshots in Claude Desktop via the Paperclip icon.
 75 | 
 76 | #### Research Session
 77 | 
 78 | The server maintains a research session that includes:
 79 | - Search queries
 80 | - Visited pages
 81 | - Extracted content
 82 | - Screenshots
 83 | - Timestamps
 84 | 
 85 | ### Suggestions
 86 | 
 87 | For the best results, if you choose not to use the `agentic-research` prompt when doing your research, it may be helpful to suggest high-quality sources for Claude to use when researching general topics. For example, you could prompt `news today from reuters or AP` instead of `news today`.
 88 | 
 89 | ## Problems
 90 | 
 91 | This is very much pre-alpha code. And it is also AIGC, so expect bugs.
 92 | 
 93 | If you run into issues, it may be helpful to check Claude Desktop's MCP logs:
 94 | 
 95 | ```bash
 96 | tail -n 20 -f ~/Library/Logs/Claude/mcp*.log
 97 | ```
 98 | 
 99 | ## Development
100 | 
101 | ```bash
102 | # Install dependencies
103 | pnpm install
104 | 
105 | # Build the project
106 | pnpm build
107 | 
108 | # Watch for changes
109 | pnpm watch
110 | 
111 | # Run in development mode
112 | pnpm dev
113 | ```
114 | 
115 | ## Requirements
116 | 
117 | - Node.js >= 18
118 | - Playwright (automatically installed as a dependency)
119 | 
120 | ## Verified Platforms
121 | 
122 | - [x] macOS
123 | - [x] Linux
124 | - [x] Windows
125 | 
126 | ## License
127 | 
128 | MIT
129 | 
130 | ## Author
131 | 
132 | [mzxrai](https://github.com/mzxrai) 
133 | 
```

--------------------------------------------------------------------------------
/tsconfig.json:
--------------------------------------------------------------------------------

```json
 1 | {
 2 |   "compilerOptions": {
 3 |     "target": "ES2023",
 4 |     "module": "NodeNext",
 5 |     "moduleResolution": "NodeNext",
 6 |     "esModuleInterop": true,
 7 |     "strict": true,
 8 |     "outDir": "dist",
 9 |     "sourceMap": true,
10 |     "declaration": true,
11 |     "skipLibCheck": true,
12 |     "lib": [
13 |       "ES2023",
14 |       "DOM",
15 |       "DOM.Iterable"
16 |     ]
17 |   },
18 |   "include": [
19 |     "*.ts"
20 |   ],
21 |   "exclude": [
22 |     "node_modules",
23 |     "dist"
24 |   ]
25 | }
```

--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------

```json
 1 | {
 2 |   "name": "@mzxrai/mcp-webresearch",
 3 |   "version": "0.1.7",
 4 |   "description": "MCP server for web research",
 5 |   "license": "MIT",
 6 |   "author": "mzxrai",
 7 |   "homepage": "https://github.com/mzxrai/mcp-webresearch",
 8 |   "bugs": "https://github.com/mzxrai/mcp-webresearch/issues",
 9 |   "type": "module",
10 |   "bin": {
11 |     "mcp-server-webresearch": "dist/index.js"
12 |   },
13 |   "files": [
14 |     "dist"
15 |   ],
16 |   "scripts": {
17 |     "build": "tsc && shx chmod +x dist/*.js",
18 |     "prepare": "pnpm run build",
19 |     "postinstall": "playwright install chromium",
20 |     "watch": "tsc --watch",
21 |     "dev": "tsx watch index.ts"
22 |   },
23 |   "publishConfig": {
24 |     "access": "public"
25 |   },
26 |   "keywords": [
27 |     "mcp",
28 |     "model-context-protocol",
29 |     "web-research",
30 |     "ai",
31 |     "web-scraping"
32 |   ],
33 |   "dependencies": {
34 |     "@modelcontextprotocol/sdk": "1.0.1",
35 |     "playwright": "^1.49.0",
36 |     "turndown": "^7.1.2"
37 |   },
38 |   "devDependencies": {
39 |     "shx": "^0.3.4",
40 |     "tsx": "^4.19.2",
41 |     "typescript": "^5.6.2",
42 |     "@types/turndown": "^5.0.4"
43 |   }
44 | }
```

--------------------------------------------------------------------------------
/index.ts:
--------------------------------------------------------------------------------

```typescript
   1 | #!/usr/bin/env node
   2 | 
   3 | // Core dependencies for MCP server and protocol handling
   4 | import { Server } from "@modelcontextprotocol/sdk/server/index.js";
   5 | import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
   6 | import {
   7 |     CallToolRequestSchema,
   8 |     ListResourcesRequestSchema,
   9 |     ListToolsRequestSchema,
  10 |     ReadResourceRequestSchema,
  11 |     ListPromptsRequestSchema,
  12 |     GetPromptRequestSchema,
  13 |     Tool,
  14 |     Resource,
  15 |     McpError,
  16 |     ErrorCode,
  17 |     TextContent,
  18 |     ImageContent,
  19 | } from "@modelcontextprotocol/sdk/types.js";
  20 | 
  21 | // Web scraping and content processing dependencies
  22 | import { chromium, Browser, Page } from 'playwright';
  23 | import TurndownService from "turndown";
  24 | import type { Node } from "turndown";
  25 | import * as fs from 'fs';
  26 | import * as path from 'path';
  27 | import * as os from 'os';
  28 | // Add type declaration for window extensions
  29 | declare global {
  30 |     interface Window {
  31 |         chrome: {
  32 |             runtime: Record<string, unknown>;
  33 |             loadTimes: () => void;
  34 |             csi: () => void;
  35 |             app: Record<string, unknown>;
  36 |         };
  37 |         Notification: {
  38 |             permission: NotificationPermission;
  39 |             requestPermission?: (callback?: NotificationPermissionCallback) => Promise<NotificationPermission>;
  40 |         };
  41 |     }
  42 | }
  43 | // Initialize temp directory for screenshots
  44 | const SCREENSHOTS_DIR = fs.mkdtempSync(path.join(os.tmpdir(), 'mcp-screenshots-'));
  45 | 
  46 | // Initialize Turndown service for converting HTML to Markdown
  47 | // Configure with specific formatting preferences
  48 | const turndownService: TurndownService = new TurndownService({
  49 |     headingStyle: 'atx',       // Use # style headings
  50 |     hr: '---',                 // Horizontal rule style
  51 |     bulletListMarker: '-',     // List item marker
  52 |     codeBlockStyle: 'fenced',  // Use ``` for code blocks
  53 |     emDelimiter: '_',          // Italics style
  54 |     strongDelimiter: '**',     // Bold style
  55 |     linkStyle: 'inlined',      // Use inline links
  56 | });
  57 | 
  58 | // Custom Turndown rules for better content extraction
  59 | // Remove script and style tags completely
  60 | turndownService.addRule('removeScripts', {
  61 |     filter: ['script', 'style', 'noscript'],
  62 |     replacement: () => ''
  63 | });
  64 | 
  65 | // Preserve link elements with their href attributes
  66 | turndownService.addRule('preserveLinks', {
  67 |     filter: 'a',
  68 |     replacement: (content: string, node: Node) => {
  69 |         const element = node as HTMLAnchorElement;
  70 |         const href = element.getAttribute('href');
  71 |         return href ? `[${content}](${href})` : content;
  72 |     }
  73 | });
  74 | 
  75 | // Preserve image elements with their src and alt attributes
  76 | turndownService.addRule('preserveImages', {
  77 |     filter: 'img',
  78 |     replacement: (content: string, node: Node) => {
  79 |         const element = node as HTMLImageElement;
  80 |         const alt = element.getAttribute('alt') || '';
  81 |         const src = element.getAttribute('src');
  82 |         return src ? `![${alt}](${src})` : '';
  83 |     }
  84 | });
  85 | 
  86 | // Core interfaces for research data management
  87 | interface ResearchResult {
  88 |     url: string;             // URL of the researched page
  89 |     title: string;           // Page title
  90 |     content: string;         // Extracted content in markdown
  91 |     timestamp: string;       // When the result was captured
  92 |     screenshotPath?: string; // Path to screenshot file on disk
  93 | }
  94 | 
  95 | // Define structure for research session data
  96 | interface ResearchSession {
  97 |     query: string;              // Search query that initiated the session
  98 |     results: ResearchResult[];  // Collection of research results
  99 |     lastUpdated: string;        // Timestamp of last update
 100 | }
 101 | 
 102 | // Screenshot management functions
 103 | async function saveScreenshot(screenshot: string, title: string): Promise<string> {
 104 |     // Convert screenshot from base64 to buffer
 105 |     const buffer = Buffer.from(screenshot, 'base64');
 106 | 
 107 |     // Check size before saving
 108 |     const MAX_SIZE = 5 * 1024 * 1024;  // 5MB
 109 |     if (buffer.length > MAX_SIZE) {
 110 |         throw new McpError(
 111 |             ErrorCode.InvalidRequest,
 112 |             `Screenshot too large: ${Math.round(buffer.length / (1024 * 1024))}MB exceeds ${MAX_SIZE / (1024 * 1024)}MB limit`
 113 |         );
 114 |     }
 115 | 
 116 |     // Generate a safe filename
 117 |     const timestamp = new Date().getTime();
 118 |     const safeTitle = title.replace(/[^a-z0-9]/gi, '_').toLowerCase();
 119 |     const filename = `${safeTitle}-${timestamp}.png`;
 120 |     const filepath = path.join(SCREENSHOTS_DIR, filename);
 121 | 
 122 |     // Save the validated screenshot
 123 |     await fs.promises.writeFile(filepath, buffer);
 124 | 
 125 |     // Return the filepath to the saved screenshot
 126 |     return filepath;
 127 | }
 128 | 
 129 | // Fix the cleanupScreenshots function
 130 | async function cleanupScreenshots(): Promise<void> {
 131 |     try {
 132 |         const files = await fs.promises.readdir(SCREENSHOTS_DIR);
 133 |         await Promise.all(files.map((file: string) =>
 134 |         fs.promises.unlink(path.join(SCREENSHOTS_DIR, file))
 135 |         ));
 136 |         await fs.promises.rmdir(SCREENSHOTS_DIR);
 137 |     } catch (error) {
 138 |         console.error('Error cleaning up screenshots:', error);
 139 |     }
 140 | }
 141 | 
 142 | // Available tools for web research functionality
 143 | const TOOLS: Tool[] = [
 144 |     {
 145 |         name: "search_google",
 146 |         description: "Performs a web search using Google, ideal for finding current information, news, websites, and general knowledge. Use this tool when you need to research topics, find recent information, or gather data from the web. Returns structured search results with titles, URLs, and snippets.",
 147 |         inputSchema: {
 148 |             type: "object",
 149 |             properties: {
 150 |                 query: { type: "string", description: "Search query" },
 151 |             },
 152 |             required: ["query"],
 153 |         },
 154 |     },
 155 | {
 156 |     name: "visit_page",
 157 |     description: "Navigates to a specific URL and extracts the page content in readable format, with option to capture a screenshot. Use this tool to deeply analyze specific web pages, read articles, examine documentation, or verify information directly from the source. Especially useful for in-depth research after identifying relevant pages via search.",
 158 |     inputSchema: {
 159 |         type: "object",
 160 |         properties: {
 161 |             url: { type: "string", description: "URL to visit" },
 162 |             takeScreenshot: { type: "boolean", description: "Whether to take a screenshot" },
 163 |         },
 164 |         required: ["url"],
 165 |     },
 166 | },
 167 | {
 168 |     name: "take_screenshot",
 169 |     description: "Captures a visual image of the currently loaded webpage. Use this tool when you need to preserve visual information, analyze page layouts, or document the current state of a webpage. Perfect for situations where textual content alone doesn't convey the full context.",
 170 |     inputSchema: {
 171 |         type: "object",
 172 |         properties: {},  // No parameters needed
 173 |     },
 174 | },
 175 | {
 176 |     name: "search_scholar",
 177 |     description: "Searches Google Scholar for academic papers and scholarly articles. Use this tool when researching scientific topics, looking for peer-reviewed research, academic citations, or scholarly literature. Returns structured data including titles, authors, publication details, and citation counts. Ideal for academic research and evidence-based inquiries.",
 178 |     inputSchema: {
 179 |         type: "object",
 180 |         properties: {
 181 |             query: { type: "string", description: "Academic search query" },
 182 |         },
 183 |         required: ["query"],
 184 |     },
 185 | },
 186 | ];
 187 | 
 188 | // Define available prompt types for type safety
 189 | type PromptName = "agentic-research";
 190 | 
 191 | // Define structure for research prompt arguments
 192 | interface AgenticResearchArgs {
 193 |     topic: string;  // Research topic provided by user
 194 | }
 195 | 
 196 | // Configure available prompts with their specifications
 197 | const PROMPTS = {
 198 |     // Agentic research prompt configuration
 199 |     "agentic-research": {
 200 |         name: "agentic-research" as const,  // Type-safe name
 201 |         description: "Conduct iterative web research on a topic, exploring it thoroughly through multiple steps while maintaining a dialogue with the user",
 202 |         arguments: [
 203 |             {
 204 |                 name: "topic",                                     // Topic argument specification
 205 |                 description: "The topic or question to research",  // Description of the argument
 206 |                 required: true                                     // Topic is mandatory
 207 |             }
 208 |         ]
 209 |     }
 210 | } as const;  // Make object immutable
 211 | 
 212 | // Global state management for browser and research session
 213 | let browser: Browser | undefined;                 // Puppeteer browser instance
 214 | let page: Page | undefined;                       // Current active page
 215 | let currentSession: ResearchSession | undefined;  // Current research session data
 216 | 
 217 | // Configuration constants for session management
 218 | const MAX_RESULTS_PER_SESSION = 100;  // Maximum number of results to store per session
 219 | const MAX_RETRIES = 3;                // Maximum retry attempts for operations
 220 | const RETRY_DELAY = 1000;             // Delay between retries in milliseconds
 221 | 
 222 | // Generic retry mechanism for handling transient failures
 223 | async function withRetry<T>(
 224 |     operation: () => Promise<T>,  // Operation to retry
 225 |                             retries = MAX_RETRIES,        // Number of retry attempts
 226 |                             delay = RETRY_DELAY           // Delay between retries
 227 | ): Promise<T> {
 228 |     let lastError: Error;
 229 | 
 230 |     // Attempt operation up to max retries
 231 |     for (let i = 0; i < retries; i++) {
 232 |         try {
 233 |             return await operation();
 234 |         } catch (error) {
 235 |             lastError = error as Error;
 236 |             if (i < retries - 1) {
 237 |                 console.error(`Attempt ${i + 1} failed, retrying in ${delay}ms:`, error);
 238 |                 await new Promise(resolve => setTimeout(resolve, delay));
 239 |             }
 240 |         }
 241 |     }
 242 | 
 243 |     throw lastError!;  // Throw last error if all retries failed
 244 | }
 245 | 
 246 | // Add a new research result to the current session with data management
 247 | function addResult(result: ResearchResult): void {
 248 |     // If no current session exists, initialize a new one
 249 |     if (!currentSession) {
 250 |         currentSession = {
 251 |             query: "Research Session",
 252 |             results: [],
 253 |             lastUpdated: new Date().toISOString(),
 254 |         };
 255 |     }
 256 | 
 257 |     // If the session has reached the maximum number of results, remove the oldest result
 258 |     if (currentSession.results.length >= MAX_RESULTS_PER_SESSION) {
 259 |         currentSession.results.shift();
 260 |     }
 261 | 
 262 |     // Add the new result to the session and update the last updated timestamp
 263 |     currentSession.results.push(result);
 264 |     currentSession.lastUpdated = new Date().toISOString();
 265 | }
 266 | 
 267 | // Safe page navigation with error handling and bot detection
 268 | async function safePageNavigation(page: Page, url: string): Promise<void> {
 269 |     try {
 270 |         // Step 1: Set cookies to bypass consent banner
 271 |         await page.context().addCookies([{
 272 |             name: 'CONSENT',
 273 |             value: 'YES+',
 274 |             domain: '.google.com',
 275 |             path: '/'
 276 |         }]);
 277 | 
 278 |         // Step 2: Initial navigation
 279 |         const response = await page.goto(url, {
 280 |             waitUntil: 'domcontentloaded',
 281 |             timeout: 15000
 282 |         });
 283 | 
 284 |         // Step 3: Basic response validation
 285 |         if (!response) {
 286 |             throw new Error('Navigation failed: no response received');
 287 |         }
 288 | 
 289 |         // Check HTTP status code; if 400 or higher, throw an error
 290 |         const status = response.status();
 291 |         if (status >= 400) {
 292 |             throw new Error(`HTTP ${status}: ${response.statusText()}`);
 293 |         }
 294 | 
 295 |         // Step 4: Wait for network to become idle or timeout
 296 |         await Promise.race([
 297 |             page.waitForLoadState('networkidle', { timeout: 5000 })
 298 |             .catch(() => {/* ignore timeout */ }),
 299 |                            // Fallback timeout in case networkidle never occurs
 300 |                            new Promise(resolve => setTimeout(resolve, 5000))
 301 |         ]);
 302 | 
 303 |         // Step 5: Security and content validation
 304 |         const validation = await page.evaluate(() => {
 305 |             const botProtectionExists = [
 306 |                 '#challenge-running',     // Cloudflare
 307 |                 '#cf-challenge-running',  // Cloudflare
 308 |                 '#px-captcha',           // PerimeterX
 309 |                 '#ddos-protection',       // Various
 310 |                 '#waf-challenge-html'     // Various WAFs
 311 |             ].some(selector => document.querySelector(selector));
 312 | 
 313 |             // Check for suspicious page titles
 314 |             const suspiciousTitle = [
 315 |                 'security check',
 316 |                 'ddos protection',
 317 |                 'please wait',
 318 |                 'just a moment',
 319 |                 'attention required'
 320 |             ].some(phrase => document.title.toLowerCase().includes(phrase));
 321 | 
 322 |             // Count words in the page content
 323 |             const bodyText = document.body.innerText || '';
 324 |             const words = bodyText.trim().split(/\s+/).length;
 325 | 
 326 |             // Return validation results
 327 |             return {
 328 |                 wordCount: words,
 329 |                 botProtection: botProtectionExists,
 330 |                 suspiciousTitle,
 331 |                 title: document.title
 332 |             };
 333 |         });
 334 | 
 335 |         // If bot protection is detected, throw an error
 336 |         if (validation.botProtection) {
 337 |             throw new Error('Bot protection detected');
 338 |         }
 339 | 
 340 |         // If the page title is suspicious, throw an error
 341 |         if (validation.suspiciousTitle) {
 342 |             throw new Error(`Suspicious page title detected: "${validation.title}"`);
 343 |         }
 344 | 
 345 |         // If the page contains insufficient content, throw an error
 346 |         if (validation.wordCount < 1) {
 347 |             throw new Error('Page contains insufficient content');
 348 |         }
 349 | 
 350 |     } catch (error) {
 351 |         // If an error occurs during navigation, throw an error with the URL and the error message
 352 |         throw new Error(`Navigation to ${url} failed: ${(error as Error).message}`);
 353 |     }
 354 | }
 355 | 
 356 | // Take and optimize a screenshot
 357 | async function takeScreenshotWithSizeLimit(page: Page): Promise<string> {
 358 |     const MAX_SIZE = 5 * 1024 * 1024;
 359 |     const MAX_DIMENSION = 1920;
 360 |     const MIN_DIMENSION = 800;
 361 | 
 362 |     // Set viewport size
 363 |     await page.setViewportSize({
 364 |         width: 1920,
 365 |         height: 1080
 366 |     });
 367 | 
 368 |     // Take initial screenshot
 369 |     let screenshot = await page.screenshot({
 370 |         type: 'png',
 371 |         fullPage: false
 372 |     });
 373 | 
 374 |     // Handle buffer conversion
 375 |     let buffer = screenshot;
 376 |     let attempts = 0;
 377 |     const MAX_ATTEMPTS = 3;
 378 | 
 379 |     // While screenshot is too large, reduce size
 380 |     while (buffer.length > MAX_SIZE && attempts < MAX_ATTEMPTS) {
 381 |         // Get current viewport size
 382 |         const viewport = page.viewportSize();
 383 |         if (!viewport) continue;
 384 | 
 385 |         // Calculate new dimensions
 386 |         const scaleFactor = Math.pow(0.75, attempts + 1);
 387 |         let newWidth = Math.round(viewport.width * scaleFactor);
 388 |         let newHeight = Math.round(viewport.height * scaleFactor);
 389 | 
 390 |         // Ensure dimensions are within bounds
 391 |         newWidth = Math.max(MIN_DIMENSION, Math.min(MAX_DIMENSION, newWidth));
 392 |         newHeight = Math.max(MIN_DIMENSION, Math.min(MAX_DIMENSION, newHeight));
 393 | 
 394 |         // Update viewport with new dimensions
 395 |         await page.setViewportSize({
 396 |             width: newWidth,
 397 |             height: newHeight
 398 |         });
 399 | 
 400 |         // Take new screenshot
 401 |         screenshot = await page.screenshot({
 402 |             type: 'png',
 403 |             fullPage: false
 404 |         });
 405 | 
 406 |         // Update buffer with new screenshot
 407 |         buffer = screenshot;
 408 | 
 409 |         // Increment retry attempts
 410 |         attempts++;
 411 |     }
 412 | 
 413 |     // Final attempt with minimum settings
 414 |     if (buffer.length > MAX_SIZE) {
 415 |         await page.setViewportSize({
 416 |             width: MIN_DIMENSION,
 417 |             height: MIN_DIMENSION
 418 |         });
 419 | 
 420 |         // Take final screenshot
 421 |         screenshot = await page.screenshot({
 422 |             type: 'png',
 423 |             fullPage: false
 424 |         });
 425 | 
 426 |         // Update buffer with final screenshot
 427 |         buffer = screenshot;
 428 | 
 429 |         // Throw error if final screenshot is still too large
 430 |         if (buffer.length > MAX_SIZE) {
 431 |             throw new McpError(
 432 |                 ErrorCode.InvalidRequest,
 433 |                 `Failed to reduce screenshot to under 5MB even with minimum settings`
 434 |             );
 435 |         }
 436 |     }
 437 | 
 438 |     // Convert Buffer to base64 string before returning
 439 |     return buffer.toString('base64');
 440 | }
 441 | 
 442 | // Initialize MCP server with basic configuration
 443 | const server: Server = new Server(
 444 |     {
 445 |         name: "webresearch",  // Server name identifier
 446 |         version: "0.1.6",     // Server version number
 447 |     },
 448 |     {
 449 |         capabilities: {
 450 |             tools: {},      // Available tool configurations
 451 |             resources: {},  // Resource handling capabilities
 452 |             prompts: {}     // Prompt processing capabilities
 453 |         },
 454 |     }
 455 | );
 456 | 
 457 | // Register handler for tool listing requests
 458 | server.setRequestHandler(ListToolsRequestSchema, async () => ({
 459 |     tools: TOOLS  // Return list of available research tools
 460 | }));
 461 | 
 462 | // Register handler for resource listing requests
 463 | server.setRequestHandler(ListResourcesRequestSchema, async () => {
 464 |     // Return empty list if no active session
 465 |     if (!currentSession) {
 466 |         return { resources: [] };
 467 |     }
 468 | 
 469 |     // Compile list of available resources
 470 |     const resources: Resource[] = [
 471 |         // Add session summary resource
 472 |         {
 473 |             uri: "research://current/summary",  // Resource identifier
 474 |             name: "Current Research Session Summary",
 475 |             description: "Summary of the current research session including queries and results",
 476 |             mimeType: "application/json"
 477 |         },
 478 |         // Add screenshot resources if available
 479 |         ...currentSession.results
 480 |         .map((r, i): Resource | undefined => r.screenshotPath ? {
 481 |             uri: `research://screenshots/${i}`,
 482 |             name: `Screenshot of ${r.title}`,
 483 |             description: `Screenshot taken from ${r.url}`,
 484 |             mimeType: "image/png"
 485 |         } : undefined)
 486 |         .filter((r): r is Resource => r !== undefined)
 487 |     ];
 488 | 
 489 |     // Return compiled list of resources
 490 |     return { resources };
 491 | });
 492 | 
 493 | // Register handler for resource content requests
 494 | server.setRequestHandler(ReadResourceRequestSchema, async (request) => {
 495 |     const uri = request.params.uri.toString();
 496 | 
 497 |     // Handle session summary requests for research data
 498 |     if (uri === "research://current/summary") {
 499 |         if (!currentSession) {
 500 |             throw new McpError(
 501 |                 ErrorCode.InvalidRequest,
 502 |                 "No active research session"
 503 |             );
 504 |         }
 505 | 
 506 |         // Return compiled list of resources
 507 |         return {
 508 |             contents: [{
 509 |                 uri,
 510 |                 mimeType: "application/json",
 511 |                 text: JSON.stringify({
 512 |                     query: currentSession.query,
 513 |                     resultCount: currentSession.results.length,
 514 |                     lastUpdated: currentSession.lastUpdated,
 515 |                     results: currentSession.results.map(r => ({
 516 |                         title: r.title,
 517 |                         url: r.url,
 518 |                         timestamp: r.timestamp,
 519 |                         screenshotPath: r.screenshotPath
 520 |                     }))
 521 |                 }, null, 2)
 522 |             }]
 523 |         };
 524 |     }
 525 | 
 526 |     // Handle screenshot requests
 527 |     if (uri.startsWith("research://screenshots/")) {
 528 |         const index = parseInt(uri.split("/").pop() || "", 10);
 529 | 
 530 |         // Verify session exists
 531 |         if (!currentSession) {
 532 |             throw new McpError(
 533 |                 ErrorCode.InvalidRequest,
 534 |                 "No active research session"
 535 |             );
 536 |         }
 537 | 
 538 |         // Verify index is within bounds
 539 |         if (isNaN(index) || index < 0 || index >= currentSession.results.length) {
 540 |             throw new McpError(
 541 |                 ErrorCode.InvalidRequest,
 542 |                 `Screenshot index out of bounds: ${index}`
 543 |             );
 544 |         }
 545 | 
 546 |         // Get result containing screenshot
 547 |         const result = currentSession.results[index];
 548 |         if (!result?.screenshotPath) {
 549 |             throw new McpError(
 550 |                 ErrorCode.InvalidRequest,
 551 |                 `No screenshot available at index: ${index}`
 552 |             );
 553 |         }
 554 | 
 555 |         try {
 556 |             // Read the binary data and convert to base64
 557 |             const screenshotData = await fs.promises.readFile(result.screenshotPath);
 558 | 
 559 |             // Convert Buffer to base64 string before returning
 560 |             const base64Data = screenshotData.toString('base64');
 561 | 
 562 |             // Return compiled list of resources
 563 |             return {
 564 |                 contents: [{
 565 |                     uri,
 566 |                     mimeType: "image/png",
 567 |                     blob: base64Data
 568 |                 }]
 569 |             };
 570 |         } catch (error: unknown) {
 571 |             // Handle error if screenshot cannot be read
 572 |             const errorMessage = error instanceof Error ? error.message : 'Unknown error occurred';
 573 | throw new McpError(
 574 |     ErrorCode.InternalError,
 575 |     `Failed to read screenshot: ${errorMessage}`
 576 | );
 577 |         }
 578 |     }
 579 | 
 580 |     // Handle unknown resource types
 581 |     throw new McpError(
 582 |         ErrorCode.InvalidRequest,
 583 |         `Unknown resource: ${uri}`
 584 |     );
 585 | });
 586 | 
 587 | // Initialize MCP server connection using stdio transport
 588 | const transport = new StdioServerTransport();
 589 | server.connect(transport).catch((error) => {
 590 |     console.error("Failed to start server:", error);
 591 |     process.exit(1);
 592 | });
 593 | 
 594 | // Convert HTML content to clean, readable markdown format
 595 | async function extractContentAsMarkdown(
 596 |     page: Page,        // Puppeteer page to extract from
 597 |     selector?: string  // Optional CSS selector to target specific content
 598 | ): Promise<string> {
 599 |     // Step 1: Execute content extraction in browser context
 600 |     const html = await page.evaluate((sel) => {
 601 |         // Handle case where specific selector is provided
 602 |         if (sel) {
 603 |             const element = document.querySelector(sel);
 604 |             // Return element content or empty string if not found
 605 |             return element ? element.outerHTML : '';
 606 |         }
 607 | 
 608 |         // Step 2: Try standard content containers first
 609 |         const contentSelectors = [
 610 |             'main',           // HTML5 semantic main content
 611 |             'article',        // HTML5 semantic article content
 612 |             '[role="main"]',  // ARIA main content role
 613 |             '#content',       // Common content ID
 614 |             '.content',       // Common content class
 615 |             '.main',          // Alternative main class
 616 |             '.post',          // Blog post content
 617 |             '.article',       // Article content container
 618 |         ];
 619 | 
 620 |         // Try each selector in priority order
 621 |         for (const contentSelector of contentSelectors) {
 622 |             const element = document.querySelector(contentSelector);
 623 |             if (element) {
 624 |                 return element.outerHTML;  // Return first matching content
 625 |             }
 626 |         }
 627 | 
 628 |         // Step 3: Fallback to cleaning full body content
 629 |         const body = document.body;
 630 | 
 631 |         // Define elements to remove for cleaner content
 632 |         const elementsToRemove = [
 633 |             // Navigation elements
 634 |             'header',                    // Page header
 635 |             'footer',                    // Page footer
 636 |             'nav',                       // Navigation sections
 637 |             '[role="navigation"]',       // ARIA navigation elements
 638 | 
 639 |             // Sidebars and complementary content
 640 |             'aside',                     // Sidebar content
 641 |             '.sidebar',                  // Sidebar by class
 642 |             '[role="complementary"]',    // ARIA complementary content
 643 | 
 644 |             // Navigation-related elements
 645 |             '.nav',                      // Navigation classes
 646 |             '.menu',                     // Menu elements
 647 | 
 648 |             // Page structure elements
 649 |             '.header',                   // Header classes
 650 |             '.footer',                   // Footer classes
 651 | 
 652 |             // Advertising and notices
 653 |             '.advertisement',            // Advertisement containers
 654 |             '.ads',                      // Ad containers
 655 |             '.cookie-notice',            // Cookie consent notices
 656 |         ];
 657 | 
 658 |         // Remove each unwanted element from content
 659 |         elementsToRemove.forEach(sel => {
 660 |             body.querySelectorAll(sel).forEach(el => el.remove());
 661 |         });
 662 | 
 663 |         // Return cleaned body content
 664 |         return body.outerHTML;
 665 |     }, selector);
 666 | 
 667 |     // Step 4: Handle empty content case
 668 |     if (!html) {
 669 |         return '';
 670 |     }
 671 | 
 672 |     try {
 673 |         // Step 5: Convert HTML to Markdown
 674 |         const markdown = turndownService.turndown(html);
 675 | 
 676 |         // Step 6: Clean up and format markdown
 677 |         return markdown
 678 |         .replace(/\n{3,}/g, '\n\n')  // Replace excessive newlines with double
 679 |         .replace(/^- $/gm, '')       // Remove empty list items
 680 |         .replace(/^\s+$/gm, '')      // Remove whitespace-only lines
 681 |         .trim();                     // Remove leading/trailing whitespace
 682 | 
 683 |     } catch (error) {
 684 |         // Log conversion errors and return original HTML as fallback
 685 |         console.error('Error converting HTML to Markdown:', error);
 686 |         return html;
 687 |     }
 688 | }
 689 | 
 690 | // Validate URL format and ensure security constraints
 691 | function isValidUrl(urlString: string): boolean {
 692 |     try {
 693 |         // Attempt to parse URL string
 694 |         const url = new URL(urlString);
 695 | 
 696 |         // Only allow HTTP and HTTPS protocols for security
 697 |         return url.protocol === 'http:' || url.protocol === 'https:';
 698 |     } catch {
 699 |         // Return false for any invalid URL format
 700 |         return false;
 701 |     }
 702 | }
 703 | 
 704 | // Define result type for tool operations
 705 | type ToolResult = {
 706 |     content: (TextContent | ImageContent)[];  // Array of text or image content
 707 |     isError?: boolean;                        // Optional error flag
 708 | };
 709 | 
 710 | // Tool request handler for executing research operations
 711 | server.setRequestHandler(CallToolRequestSchema, async (request): Promise<ToolResult> => {
 712 |     // Initialize browser for tool operations
 713 |     const page = await ensureBrowser();
 714 | 
 715 |     switch (request.params.name) {
 716 |         // Handle Google search operations
 717 |         case "search_google": {
 718 |             // Extract search query from request parameters
 719 |             const { query } = request.params.arguments as { query: string };
 720 | 
 721 |             try {
 722 |                 // Execute search with retry mechanism
 723 |                 const results = await withRetry(async () => {
 724 |                     // Step 1: Navigate to Google search page
 725 |                     await safePageNavigation(page, 'https://www.google.com');
 726 | 
 727 |                     // Step 2: Find and interact with search input
 728 |                     await withRetry(async () => {
 729 |                         // Wait for any search input element to appear
 730 |                         await Promise.race([
 731 |                             // Try multiple possible selectors for search input
 732 |                             page.waitForSelector('input[name="q"]', { timeout: 5000 }),
 733 |                                            page.waitForSelector('textarea[name="q"]', { timeout: 5000 }),
 734 |                                            page.waitForSelector('input[type="text"]', { timeout: 5000 })
 735 |                         ]).catch(() => {
 736 |                             throw new Error('Search input not found - no matching selectors');
 737 |                         });
 738 | 
 739 |                         // Find the actual search input element
 740 |                         const searchInput = await page.$('input[name="q"]') ||
 741 |                         await page.$('textarea[name="q"]') ||
 742 |                         await page.$('input[type="text"]');
 743 | 
 744 |                         // Verify search input was found
 745 |                         if (!searchInput) {
 746 |                             throw new Error('Search input element not found after waiting');
 747 |                         }
 748 | 
 749 |                         // Step 3: Enter search query
 750 |                         await searchInput.click({ clickCount: 3 });  // Select all existing text
 751 |                         await searchInput.press('Backspace');        // Clear selected text
 752 |                         await searchInput.type(query);               // Type new query
 753 |                     }, 3, 2000);  // Allow 3 retries with 2s delay
 754 | 
 755 |                     // Step 4: Submit search and wait for results
 756 |                     await withRetry(async () => {
 757 |                         await Promise.all([
 758 |                             page.keyboard.press('Enter'),
 759 |                                           page.waitForLoadState('networkidle', { timeout: 15000 }),
 760 |                         ]);
 761 |                     });
 762 | 
 763 |                     // Step 5: Extract search results
 764 |                     const searchResults = await withRetry(async () => {
 765 |                         const results = await page.evaluate(() => {
 766 |                             // Find all search result containers
 767 |                             const elements = document.querySelectorAll('div.g');
 768 |                             if (!elements || elements.length === 0) {
 769 |                                 throw new Error('No search results found');
 770 |                             }
 771 | 
 772 |                             // Extract data from each result
 773 |                             return Array.from(elements).map((el) => {
 774 |                                 // Find required elements within result container
 775 |                                 const titleEl = el.querySelector('h3');            // Title element
 776 |                                 const linkEl = el.querySelector('a');              // Link element
 777 |                                 const snippetEl = el.querySelector('div.VwiC3b');  // Snippet element
 778 | 
 779 |                                 // Skip results missing required elements
 780 |                                 if (!titleEl || !linkEl || !snippetEl) {
 781 |                                     return null;
 782 |                                 }
 783 | 
 784 |                                 // Return structured result data
 785 |                                 return {
 786 |                                     title: titleEl.textContent || '',        // Result title
 787 |                                     url: linkEl.getAttribute('href') || '',  // Result URL
 788 |                                                             snippet: snippetEl.textContent || '',    // Result description
 789 |                                 };
 790 |                             }).filter(result => result !== null);  // Remove invalid results
 791 |                         });
 792 | 
 793 |                         // Verify we found valid results
 794 |                         if (!results || results.length === 0) {
 795 |                             throw new Error('No valid search results found');
 796 |                         }
 797 | 
 798 |                         // Return compiled list of results
 799 |                         return results;
 800 |                     });
 801 | 
 802 |                     // Step 6: Store results in session
 803 |                     searchResults.forEach((result) => {
 804 |                         addResult({
 805 |                             url: result.url,
 806 |                             title: result.title,
 807 |                             content: result.snippet,
 808 |                             timestamp: new Date().toISOString(),
 809 |                         });
 810 |                     });
 811 | 
 812 |                     // Return compiled list of results
 813 |                     return searchResults;
 814 |                 });
 815 | 
 816 |                 // Step 7: Return formatted results
 817 |                 return {
 818 |                     content: [{
 819 |                         type: "text",
 820 |                         text: JSON.stringify(results, null, 2)  // Pretty-print JSON results
 821 |                     }]
 822 |                 };
 823 |             } catch (error) {
 824 |                 // Handle and format search errors
 825 |                 return {
 826 |                     content: [{
 827 |                         type: "text",
 828 |                         text: `Failed to perform search: ${(error as Error).message}`
 829 |                     }],
 830 |                     isError: true
 831 |                 };
 832 |             }
 833 |         }
 834 |         // Handle Google Scholar search operations
 835 |         case "search_scholar": {
 836 |             // Extract search query from request parameters
 837 |             const { query } = request.params.arguments as { query: string };
 838 | 
 839 |             try {
 840 |                 // Execute search with retry mechanism
 841 |                 const results = await withRetry(async () => {
 842 |                     // Step 1: Navigate to Google Scholar search page
 843 |                     await safePageNavigation(page, 'https://scholar.google.com');
 844 | 
 845 |                     // Step 2: Find and interact with search input
 846 |                     await withRetry(async () => {
 847 |                         // Wait for search input element to appear
 848 |                         await page.waitForSelector('input[name="q"]', { timeout: 5000 })
 849 |                         .catch(() => {
 850 |                             throw new Error('Scholar search input not found');
 851 |                         });
 852 | 
 853 |                         // Find the search input element
 854 |                         const searchInput = await page.$('input[name="q"]');
 855 | 
 856 |                         // Verify search input was found
 857 |                         if (!searchInput) {
 858 |                             throw new Error('Scholar search input element not found after waiting');
 859 |                         }
 860 | 
 861 |                         // Step 3: Enter search query
 862 |                         await searchInput.click({ clickCount: 3 });  // Select all existing text
 863 |                         await searchInput.press('Backspace');        // Clear selected text
 864 |                         await searchInput.type(query);               // Type new query
 865 |                     }, 3, 2000);  // Allow 3 retries with 2s delay
 866 | 
 867 |                     // Step 4: Submit search and wait for results
 868 |                     await withRetry(async () => {
 869 |                         await Promise.all([
 870 |                             page.keyboard.press('Enter'),
 871 |                                           page.waitForLoadState('networkidle', { timeout: 15000 }),
 872 |                         ]);
 873 |                     });
 874 | 
 875 |                     // Step 5: Extract scholar search results
 876 |                     const scholarResults = await withRetry(async () => {
 877 |                         const results = await page.evaluate(() => {
 878 |                             // Find all scholar result containers
 879 |                             const elements = document.querySelectorAll('.gs_r.gs_or.gs_scl');
 880 |                             if (!elements || elements.length === 0) {
 881 |                                 throw new Error('No scholar search results found');
 882 |                             }
 883 | 
 884 |                             // Extract data from each result
 885 |                             return Array.from(elements).map((el) => {
 886 |                                 try {
 887 |                                     // Find required elements within result container
 888 |                                     const titleEl = el.querySelector('.gs_rt');                // Title element
 889 |                                     const authorEl = el.querySelector('.gs_a');                // Authors, venue, year
 890 |                                     const snippetEl = el.querySelector('.gs_rs');              // Snippet/abstract
 891 |                                     const citedByEl = el.querySelector('.gs_fl a:nth-child(3)'); // Cited by element
 892 | 
 893 |                                     // Extract title and URL
 894 |                                     let title = '';
 895 |                                     let url = '';
 896 |                             if (titleEl) {
 897 |                                 const titleLink = titleEl.querySelector('a');
 898 |                                 title = titleEl.textContent?.trim() || '';
 899 |                                 url = titleLink?.getAttribute('href') || '';
 900 |                             }
 901 | 
 902 |                             // Extract author, venue, and year information
 903 |                             const authorInfo = authorEl?.textContent?.trim() || '';
 904 | 
 905 |                             // Extract snippet
 906 |                             const snippet = snippetEl?.textContent?.trim() || '';
 907 | 
 908 |                             // Extract citation count
 909 |                             let citationCount = '';
 910 |                             if (citedByEl && citedByEl.textContent?.includes('Cited by')) {
 911 |                                 citationCount = citedByEl.textContent.trim();
 912 |                             }
 913 | 
 914 |                             // Skip results missing critical data
 915 |                             if (!title) {
 916 |                                 return null;
 917 |                             }
 918 | 
 919 |                             // Return structured result data
 920 |                             return {
 921 |                                 title,                 // Paper title
 922 |                                 url,                   // Paper URL if available
 923 |                                 authorInfo,            // Authors, venue, year
 924 |                                 snippet,               // Abstract/snippet
 925 |                                 citationCount,         // Citation information
 926 |                             };
 927 |                                 } catch (err) {
 928 |                                     // Skip problematic results
 929 |                                     return null;
 930 |                                 }
 931 |                             }).filter(result => result !== null);  // Remove invalid results
 932 |                         });
 933 | 
 934 |                         // Verify we found valid results
 935 |                         if (!results || results.length === 0) {
 936 |                             throw new Error('No valid scholar search results found');
 937 |                         }
 938 | 
 939 |                         // Return compiled list of results
 940 |                         return results;
 941 |                     });
 942 | 
 943 |                     // Step 6: Store results in session
 944 |                     scholarResults.forEach((result) => {
 945 |                         addResult({
 946 |                             url: result.url || 'https://scholar.google.com',
 947 |                             title: result.title,
 948 |                             content: `${result.authorInfo}\n\n${result.snippet}\n\n${result.citationCount}`,
 949 |                             timestamp: new Date().toISOString(),
 950 |                         });
 951 |                     });
 952 | 
 953 |                     // Return compiled list of results
 954 |                     return scholarResults;
 955 |                 });
 956 | 
 957 |                 // Step 7: Return formatted results
 958 |                 return {
 959 |                     content: [{
 960 |                         type: "text",
 961 |                         text: JSON.stringify(results, null, 2)  // Pretty-print JSON results
 962 |                     }]
 963 |                 };
 964 |             } catch (error) {
 965 |                 // Handle and format search errors
 966 |                 return {
 967 |                     content: [{
 968 |                         type: "text",
 969 |                         text: `Failed to perform scholar search: ${(error as Error).message}`
 970 |                     }],
 971 |                     isError: true
 972 |                 };
 973 |             }
 974 |         }
 975 | 
 976 |         // Handle webpage visit and content extraction
 977 |         case "visit_page": {
 978 |             // Extract URL and screenshot flag from request
 979 |             const { url, takeScreenshot } = request.params.arguments as {
 980 |                 url: string;                    // Target URL to visit
 981 |                 takeScreenshot?: boolean;       // Optional screenshot flag
 982 |             };
 983 | 
 984 |             // Step 1: Validate URL format and security
 985 |             if (!isValidUrl(url)) {
 986 |                 return {
 987 |                     content: [{
 988 |                         type: "text" as const,
 989 |                         text: `Invalid URL: ${url}. Only http and https protocols are supported.`
 990 |                     }],
 991 |                     isError: true
 992 |                 };
 993 |             }
 994 | 
 995 |             try {
 996 |                 // Step 2: Visit page and extract content with retry mechanism
 997 |                 const result = await withRetry(async () => {
 998 |                     // Navigate to target URL safely
 999 |                     await safePageNavigation(page, url);
1000 |                     const title = await page.title();
1001 | 
1002 |                     // Step 3: Extract and process page content
1003 |                     const content = await withRetry(async () => {
1004 |                         // Convert page content to markdown
1005 |                         const extractedContent = await extractContentAsMarkdown(page);
1006 | 
1007 |                         // If no content is extracted, throw an error
1008 |                         if (!extractedContent) {
1009 |                             throw new Error('Failed to extract content');
1010 |                         }
1011 | 
1012 |                         // Return the extracted content
1013 |                         return extractedContent;
1014 |                     });
1015 | 
1016 |                     // Step 4: Create result object with page data
1017 |                     const pageResult: ResearchResult = {
1018 |                         url,      // Original URL
1019 |                         title,    // Page title
1020 |                         content,  // Markdown content
1021 |                         timestamp: new Date().toISOString(),  // Capture time
1022 |                     };
1023 | 
1024 |                     // Step 5: Take screenshot if requested
1025 |                     let screenshotUri: string | undefined;
1026 |                     if (takeScreenshot) {
1027 |                         // Capture and process screenshot
1028 |                         const screenshot = await takeScreenshotWithSizeLimit(page);
1029 |                         pageResult.screenshotPath = await saveScreenshot(screenshot, title);
1030 | 
1031 |                         // Get the index for the resource URI
1032 |                         const resultIndex = currentSession ? currentSession.results.length : 0;
1033 |                         screenshotUri = `research://screenshots/${resultIndex}`;
1034 | 
1035 |                         // Notify clients about new screenshot resource
1036 |                         server.notification({
1037 |                             method: "notifications/resources/list_changed"
1038 |                         });
1039 |                     }
1040 | 
1041 |                     // Step 6: Store result in session
1042 |                     addResult(pageResult);
1043 |                     return { pageResult, screenshotUri };
1044 |                 });
1045 | 
1046 |                 // Step 7: Return formatted result with screenshot URI if taken
1047 |                 const response: ToolResult = {
1048 |                     content: [{
1049 |                         type: "text" as const,
1050 |                         text: JSON.stringify({
1051 |                             url: result.pageResult.url,
1052 |                             title: result.pageResult.title,
1053 |                             content: result.pageResult.content,
1054 |                             timestamp: result.pageResult.timestamp,
1055 |                             screenshot: result.screenshotUri ? `View screenshot via *MCP Resources* (Paperclip icon) @ URI: ${result.screenshotUri}` : undefined
1056 |                         }, null, 2)
1057 |                     }]
1058 |                 };
1059 | 
1060 |                 return response;
1061 |             } catch (error) {
1062 |                 // Handle and format page visit errors
1063 |                 return {
1064 |                     content: [{
1065 |                         type: "text" as const,
1066 |                         text: `Failed to visit page: ${(error as Error).message}`
1067 |                     }],
1068 |                     isError: true
1069 |                 };
1070 |             }
1071 |         }
1072 | 
1073 |         // Handle standalone screenshot requests
1074 |         case "take_screenshot": {
1075 |             try {
1076 |                 // Step 1: Capture screenshot with retry mechanism
1077 |                 const screenshot = await withRetry(async () => {
1078 |                     // Take and optimize screenshot with default size limits
1079 |                     return await takeScreenshotWithSizeLimit(page);
1080 |                 });
1081 | 
1082 |                 // Step 2: Initialize session if needed
1083 |                 if (!currentSession) {
1084 |                     currentSession = {
1085 |                         query: "Screenshot Session",            // Session identifier
1086 |                         results: [],                            // Empty results array
1087 |                         lastUpdated: new Date().toISOString(),  // Current timestamp
1088 |                     };
1089 |                 }
1090 | 
1091 |                 // Step 3: Get current page information
1092 |                 const pageUrl = await page.url();      // Current page URL
1093 |                 const pageTitle = await page.title();  // Current page title
1094 | 
1095 |                 // Step 4: Save screenshot to disk
1096 |                 const screenshotPath = await saveScreenshot(screenshot, pageTitle || 'untitled');
1097 | 
1098 |                 // Step 5: Create and store screenshot result
1099 |                 const resultIndex = currentSession ? currentSession.results.length : 0;
1100 |                 addResult({
1101 |                     url: pageUrl,
1102 |                     title: pageTitle || "Untitled Page",  // Fallback title if none available
1103 |                     content: "Screenshot taken",          // Simple content description
1104 |                     timestamp: new Date().toISOString(),  // Capture time
1105 |                           screenshotPath                        // Path to screenshot file
1106 |                 });
1107 | 
1108 |                 // Step 6: Notify clients about new screenshot resource
1109 |                 server.notification({
1110 |                     method: "notifications/resources/list_changed"
1111 |                 });
1112 | 
1113 |                 // Step 7: Return success message with resource URI
1114 |                 const resourceUri = `research://screenshots/${resultIndex}`;
1115 |                 return {
1116 |                     content: [{
1117 |                         type: "text" as const,
1118 |                         text: `Screenshot taken successfully. You can view it via *MCP Resources* (Paperclip icon) @ URI: ${resourceUri}`
1119 |                     }]
1120 |                 };
1121 |             } catch (error) {
1122 |                 // Handle and format screenshot errors
1123 |                 return {
1124 |                     content: [{
1125 |                         type: "text" as const,
1126 |                         text: `Failed to take screenshot: ${(error as Error).message}`
1127 |                     }],
1128 |                     isError: true
1129 |                 };
1130 |             }
1131 |         }
1132 | 
1133 |         // Handle unknown tool requests
1134 |         default:
1135 |             throw new McpError(
1136 |                 ErrorCode.MethodNotFound,
1137 |                 `Unknown tool: ${request.params.name}`
1138 |             );
1139 |     }
1140 | });
1141 | 
1142 | // Register handler for prompt listing requests
1143 | server.setRequestHandler(ListPromptsRequestSchema, async () => {
1144 |     // Return all available prompts
1145 |     return { prompts: Object.values(PROMPTS) };
1146 | });
1147 | 
1148 | // Register handler for prompt retrieval and execution
1149 | server.setRequestHandler(GetPromptRequestSchema, async (request) => {
1150 |     // Extract and validate prompt name
1151 |     const promptName = request.params.name as PromptName;
1152 |     const prompt = PROMPTS[promptName];
1153 | 
1154 |     // Handle unknown prompt requests
1155 |     if (!prompt) {
1156 |         throw new McpError(ErrorCode.InvalidRequest, `Prompt not found: ${promptName}`);
1157 |     }
1158 | 
1159 |     // Handle agentic research prompt
1160 |     if (promptName === "agentic-research") {
1161 |         // Extract research topic from request arguments
1162 |         const args = request.params.arguments as AgenticResearchArgs | undefined;
1163 |         const topic = args?.topic || "";  // Use empty string if no topic provided
1164 | 
1165 |         // Return research assistant prompt with instructions
1166 |         return {
1167 |             messages: [
1168 |                 // Initial assistant message establishing role
1169 |                 {
1170 |                     role: "assistant",
1171 |                     content: {
1172 |                         type: "text",
1173 |                          text: "I am ready to help you with your research. I will conduct thorough web research, explore topics deeply, and maintain a dialogue with you throughout the process."
1174 |                     }
1175 |                 },
1176 |                 // Detailed research instructions for the user
1177 |                 {
1178 |                     role: "user",
1179 |                     content: {
1180 |                         type: "text",
1181 |                          text: `I'd like to research this topic: <topic>${topic}</topic>
1182 | 
1183 |                          Please help me explore it deeply, like you're a thoughtful, highly-trained research assistant.
1184 | 
1185 |                          General instructions:
1186 |                          1. Start by proposing your research approach -- namely, formulate what initial query you will use to search the web. Propose a relatively broad search to understand the topic landscape. At the same time, make your queries optimized for returning high-quality results based on what you know about constructing Google search queries.
1187 |                          2. Next, get my input on whether you should proceed with that query or if you should refine it.
1188 |                          3. Once you have an approved query, perform the search.
1189 |                          4. Prioritize high quality, authoritative sources when they are available and relevant to the topic. Avoid low quality or spammy sources.
1190 |                          5. Retrieve information that is relevant to the topic at hand.
1191 |                          6. Iteratively refine your research direction based on what you find.
1192 |                          7. Keep me informed of what you find and let *me* guide the direction of the research interactively.
1193 |                          8. If you run into a dead end while researching, do a Google search for the topic and attempt to find a URL for a relevant page. Then, explore that page in depth.
1194 |                          9. Only conclude when my research goals are met.
1195 |                          10. **Always cite your sources**, providing URLs to the sources you used in a citation block at the end of your response.
1196 | 
1197 |                          You can use these tools:
1198 |                          - search_google: Search for information
1199 |                          - visit_page: Visit and extract content from web pages
1200 | 
1201 |                          Do *NOT* use the following tools:
1202 |                          - Anything related to knowledge graphs or memory, unless explicitly instructed to do so by the user.`
1203 |                     }
1204 |                 }
1205 |             ]
1206 |         };
1207 |     }
1208 | 
1209 |     // Handle unsupported prompt types
1210 |     throw new McpError(ErrorCode.InvalidRequest, "Prompt implementation not found");
1211 | });
1212 | 
1213 | // In the ensureBrowser function, modify the initialization script:
1214 | async function ensureBrowser(): Promise<Page> {
1215 |     if (!browser) {
1216 |         browser = await chromium.launch({
1217 |             headless: true,
1218 |             args: [
1219 |                 '--disable-blink-features=AutomationControlled',
1220 |                 '--disable-features=IsolateOrigins,site-per-process',
1221 |                 '--disable-site-isolation-trials',
1222 |                 '--disable-setuid-sandbox',
1223 |                 '--no-sandbox',
1224 |                 '--disable-dev-shm-usage',
1225 |                 '--disable-accelerated-2d-canvas',
1226 |                 '--no-first-run',
1227 |                 '--no-service-autorun',
1228 |                 '--password-store=basic',
1229 |                 '--system-developer-mode',
1230 |                 '--enable-javascript',
1231 |                 `--window-size=${1366 + Math.floor(Math.random() * 100)},${768 + Math.floor(Math.random() * 100)}`,
1232 |             ]
1233 |         });
1234 | 
1235 |         const context = await browser.newContext({
1236 |             userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
1237 |                                                  viewport: { width: 1366, height: 768 },  // Set fixed viewport instead of null
1238 |                                                  deviceScaleFactor: 1,
1239 |                                                  javaScriptEnabled: true,
1240 |         });
1241 | 
1242 |         await context.addInitScript(() => {
1243 |             Object.defineProperty(navigator, 'webdriver', {
1244 |                 get: () => undefined
1245 |             });
1246 | 
1247 |             Object.defineProperty(navigator, 'permissions', {
1248 |                 get: () => ({
1249 |                     query: async () => ({ state: 'prompt' as PermissionState })
1250 |                 })
1251 |             });
1252 | 
1253 |             window.chrome = {
1254 |                 runtime: {},
1255 |                 loadTimes: function(){},
1256 |                                     csi: function(){},
1257 |                                     app: {},
1258 |             };
1259 | 
1260 |             const originalToString = Error.prototype.toString;
1261 |             Error.prototype.toString = function(this: Error) {
1262 |                 return originalToString.call(this).replace(/\n.*puppeteer.*\n/g, '\n');
1263 |             };
1264 | 
1265 |             if (!window.Notification) {
1266 |                 const NotificationClass = function(title: string, options?: NotificationOptions) {
1267 |                     return {
1268 |                         title,
1269 |                         ...options
1270 |                     };
1271 |                 } as unknown as typeof Notification;
1272 | 
1273 |                 Object.defineProperty(NotificationClass, 'permission', {
1274 |                     value: 'default' as NotificationPermission,
1275 |                     writable: false,
1276 |                     configurable: false,
1277 |                     enumerable: true
1278 |                 });
1279 | 
1280 |                 NotificationClass.requestPermission = async () => 'default' as NotificationPermission;
1281 |                 NotificationClass.prototype = {} as Notification;
1282 | 
1283 |                 Object.defineProperty(window, 'Notification', {
1284 |                     value: NotificationClass,
1285 |                     writable: false,
1286 |                     configurable: false
1287 |                 });
1288 |             }
1289 | 
1290 |             Object.defineProperty(navigator, 'languages', {
1291 |                 get: () => ['en-US', 'en']
1292 |             });
1293 |         });
1294 | 
1295 |         page = await context.newPage();
1296 | 
1297 |         await page.route('**', async (route) => {
1298 |             const request = route.request();
1299 |             if (request.resourceType() === 'script') {
1300 |                 route.continue({
1301 |                     headers: {
1302 |                         ...request.headers(),
1303 |                                'sec-ch-ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
1304 |                                'sec-ch-ua-mobile': '?0',
1305 |                                'sec-ch-ua-platform': '"Windows"'
1306 |                     }
1307 |                 });
1308 |             } else {
1309 |                 route.continue();
1310 |             }
1311 |         });
1312 |     }
1313 | 
1314 |     if (!page) {
1315 |         const context = await browser.newContext();
1316 |         page = await context.newPage();
1317 |     }
1318 | 
1319 |     return page;
1320 | }
1321 | 
1322 | 
1323 | // Cleanup function
1324 | async function cleanup(): Promise<void> {
1325 |     try {
1326 |         // Clean up screenshots first
1327 |         await cleanupScreenshots();
1328 | 
1329 |         // Then close the browser
1330 |         if (browser) {
1331 |             await browser.close();
1332 |         }
1333 |     } catch (error) {
1334 |         console.error('Error during cleanup:', error);
1335 |     } finally {
1336 |         browser = undefined;
1337 |         page = undefined;
1338 |     }
1339 | }
1340 | 
1341 | // Register cleanup handlers
1342 | process.on('exit', cleanup);
1343 | process.on('SIGTERM', cleanup);
1344 | process.on('SIGINT', cleanup);
1345 | process.on('SIGHUP', cleanup);
1346 | 
```
Page 1/2FirstPrevNextLast