# Directory Structure
```
├── .cursorrules
├── .gitignore
├── docs
│   └── mcp_spec
│       └── llms-full.txt
├── index.ts
├── LICENSE
├── package.json
├── pnpm-lock.yaml
├── README.md
└── tsconfig.json
```
# Files
--------------------------------------------------------------------------------
/.cursorrules:
--------------------------------------------------------------------------------
```
1 | 1. Use pnpm instead of npm when generating packaging-related commands.
2 | 2. Only make changes to comments, code, or dependencies that are needed to accomplish the objective defined by the user. When editing code, don't remove comments or change dependencies or make changes that are unrelated to the code changes at hand.
```
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
1 | # Logs
2 | logs
3 | *.log
4 | npm-debug.log*
5 | yarn-debug.log*
6 | yarn-error.log*
7 | lerna-debug.log*
8 | .pnpm-debug.log*
9 |
10 | # Diagnostic reports (https://nodejs.org/api/report.html)
11 | report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
12 |
13 | # Runtime data
14 | pids
15 | *.pid
16 | *.seed
17 | *.pid.lock
18 |
19 | # Directory for instrumented libs generated by jscoverage/JSCover
20 | lib-cov
21 |
22 | # Coverage directory used by tools like istanbul
23 | coverage
24 | *.lcov
25 |
26 | # nyc test coverage
27 | .nyc_output
28 |
29 | # Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
30 | .grunt
31 |
32 | # Bower dependency directory (https://bower.io/)
33 | bower_components
34 |
35 | # node-waf configuration
36 | .lock-wscript
37 |
38 | # Compiled binary addons (https://nodejs.org/api/addons.html)
39 | build/Release
40 |
41 | # Dependency directories
42 | node_modules/
43 | jspm_packages/
44 |
45 | # Snowpack dependency directory (https://snowpack.dev/)
46 | web_modules/
47 |
48 | # TypeScript cache
49 | *.tsbuildinfo
50 |
51 | # Optional npm cache directory
52 | .npm
53 |
54 | # Optional eslint cache
55 | .eslintcache
56 |
57 | # Optional stylelint cache
58 | .stylelintcache
59 |
60 | # Microbundle cache
61 | .rpt2_cache/
62 | .rts2_cache_cjs/
63 | .rts2_cache_es/
64 | .rts2_cache_umd/
65 |
66 | # Optional REPL history
67 | .node_repl_history
68 |
69 | # Output of 'npm pack'
70 | *.tgz
71 |
72 | # Yarn Integrity file
73 | .yarn-integrity
74 |
75 | # dotenv environment variable files
76 | .env
77 | .env.development.local
78 | .env.test.local
79 | .env.production.local
80 | .env.local
81 |
82 | # parcel-bundler cache (https://parceljs.org/)
83 | .cache
84 | .parcel-cache
85 |
86 | # Next.js build output
87 | .next
88 | out
89 |
90 | # Nuxt.js build / generate output
91 | .nuxt
92 | dist
93 |
94 | # Gatsby files
95 | .cache/
96 | # Comment in the public line in if your project uses Gatsby and not Next.js
97 | # https://nextjs.org/blog/next-9-1#public-directory-support
98 | # public
99 |
100 | # vuepress build output
101 | .vuepress/dist
102 |
103 | # vuepress v2.x temp and cache directory
104 | .temp
105 | .cache
106 |
107 | # Docusaurus cache and generated files
108 | .docusaurus
109 |
110 | # Serverless directories
111 | .serverless/
112 |
113 | # FuseBox cache
114 | .fusebox/
115 |
116 | # DynamoDB Local files
117 | .dynamodb/
118 |
119 | # TernJS port file
120 | .tern-port
121 |
122 | # Stores VSCode versions used for testing VSCode extensions
123 | .vscode-test
124 |
125 | # yarn v2
126 | .yarn/cache
127 | .yarn/unplugged
128 | .yarn/build-state.yml
129 | .yarn/install-state.gz
130 | .pnp.*
131 |
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
1 | # MCP Web Research Server
2 |
3 | A Model Context Protocol (MCP) server for web research.
4 |
5 | Bring real-time info into Claude and easily research any topic.
6 |
7 | ## Features
8 |
9 | - Google search integration
10 | - Webpage content extraction
11 | - Research session tracking (list of visited pages, search queries, etc.)
12 | - Screenshot capture
13 |
14 | ## Prerequisites
15 |
16 | - [Node.js](https://nodejs.org/) >= 18 (includes `npm` and `npx`)
17 | - [Claude Desktop app](https://claude.ai/download)
18 |
19 | ## Installation
20 |
21 | First, ensure you've downloaded and installed the [Claude Desktop app](https://claude.ai/download) and you have npm installed.
22 |
23 | Next, add this entry to your `claude_desktop_config.json` (on Mac, found at `~/Library/Application\ Support/Claude/claude_desktop_config.json`):
24 |
25 | ```json
26 | {
27 | "mcpServers": {
28 | "webresearch": {
29 | "command": "npx",
30 | "args": ["-y", "@mzxrai/mcp-webresearch@latest"]
31 | }
32 | }
33 | }
34 | ```
35 |
36 | This config allows Claude Desktop to automatically start the web research MCP server when needed.
37 |
38 | ## Usage
39 |
40 | Simply start a chat with Claude and send a prompt that would benefit from web research. If you'd like a prebuilt prompt customized for deeper web research, you can use the `agentic-research` prompt that we provide through this package. Access that prompt in Claude Desktop by clicking the Paperclip icon in the chat input and then selecting `Choose an integration` → `webresearch` → `agentic-research`.
41 |
42 | <img src="https://i.ibb.co/N6Y3C0q/Screenshot-2024-12-05-at-11-01-27-PM.png" alt="Example screenshot of web research" width="400"/>
43 |
44 | ### Tools
45 |
46 | 1. `search_google`
47 | - Performs Google searches and extracts results
48 | - Arguments: `{ query: string }`
49 |
50 | 2. `visit_page`
51 | - Visits a webpage and extracts its content
52 | - Arguments: `{ url: string, takeScreenshot?: boolean }`
53 |
54 | 3. `take_screenshot`
55 | - Takes a screenshot of the current page
56 | - No arguments required
57 |
58 | ### Prompts
59 |
60 | #### `agentic-research`
61 | A guided research prompt that helps Claude conduct thorough web research. The prompt instructs Claude to:
62 | - Start with broad searches to understand the topic landscape
63 | - Prioritize high-quality, authoritative sources
64 | - Iteratively refine the research direction based on findings
65 | - Keep you informed and let you guide the research interactively
66 | - Always cite sources with URLs
67 |
68 | ### Resources
69 |
70 | We expose two things as MCP resources: (1) captured webpage screenshots, and (2) the research session.
71 |
72 | #### Screenshots
73 |
74 | When you take a screenshot, it's saved as an MCP resource. You can access captured screenshots in Claude Desktop via the Paperclip icon.
75 |
76 | #### Research Session
77 |
78 | The server maintains a research session that includes:
79 | - Search queries
80 | - Visited pages
81 | - Extracted content
82 | - Screenshots
83 | - Timestamps
84 |
85 | ### Suggestions
86 |
87 | For the best results, if you choose not to use the `agentic-research` prompt when doing your research, it may be helpful to suggest high-quality sources for Claude to use when researching general topics. For example, you could prompt `news today from reuters or AP` instead of `news today`.
88 |
89 | ## Problems
90 |
91 | This is very much pre-alpha code, and it is AI-generated (AIGC), so expect bugs.
92 |
93 | If you run into issues, it may be helpful to check Claude Desktop's MCP logs:
94 |
95 | ```bash
96 | tail -n 20 -f ~/Library/Logs/Claude/mcp*.log
97 | ```
98 |
99 | ## Development
100 |
101 | ```bash
102 | # Install dependencies
103 | pnpm install
104 |
105 | # Build the project
106 | pnpm build
107 |
108 | # Watch for changes
109 | pnpm watch
110 |
111 | # Run in development mode
112 | pnpm dev
113 | ```
114 |
115 | ## Requirements
116 |
117 | - Node.js >= 18
118 | - Playwright (automatically installed as a dependency)
119 |
120 | ## Verified Platforms
121 |
122 | - [x] macOS
123 | - [ ] Linux
124 |
125 | ## License
126 |
127 | MIT
128 |
129 | ## Author
130 |
131 | [mzxrai](https://github.com/mzxrai)
```
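
The `visit_page` arguments documented in the README above (`{ url: string, takeScreenshot?: boolean }`) could be checked with a small type guard before any browser work starts. The following is a minimal sketch, not code from this repository: `VisitPageArgs` and `isVisitPageArgs` are hypothetical names, and restricting URLs to `http`/`https` is an assumption made for illustration.

```typescript
// Hypothetical shape of the `visit_page` arguments, taken from the README.
interface VisitPageArgs {
  url: string;
  takeScreenshot?: boolean;
}

// Hypothetical type guard: accepts only objects with an http(s) URL string
// and, if present, a boolean `takeScreenshot`.
function isVisitPageArgs(value: unknown): value is VisitPageArgs {
  if (typeof value !== "object" || value === null) return false;
  const args = value as Record<string, unknown>;
  if (typeof args.url !== "string") return false;
  if (args.takeScreenshot !== undefined && typeof args.takeScreenshot !== "boolean") {
    return false;
  }
  try {
    // Assumption: only http and https URLs are worth visiting.
    const protocol = new URL(args.url).protocol;
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // `new URL` throws on malformed input
  }
}
```

A tool handler could call `isVisitPageArgs(request.params.arguments)` first and reject the request when it returns `false`, so malformed input never reaches the browser.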
--------------------------------------------------------------------------------
/tsconfig.json:
--------------------------------------------------------------------------------
```json
1 | {
2 | "compilerOptions": {
3 | "target": "ES2023",
4 | "module": "NodeNext",
5 | "moduleResolution": "NodeNext",
6 | "esModuleInterop": true,
7 | "strict": true,
8 | "outDir": "dist",
9 | "sourceMap": true,
10 | "declaration": true,
11 | "skipLibCheck": true,
12 | "lib": [
13 | "ES2023",
14 | "DOM",
15 | "DOM.Iterable"
16 | ]
17 | },
18 | "include": [
19 | "*.ts"
20 | ],
21 | "exclude": [
22 | "node_modules",
23 | "dist"
24 | ]
25 | }
```
--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
```json
1 | {
2 | "name": "@mzxrai/mcp-webresearch",
3 | "version": "0.1.7",
4 | "description": "MCP server for web research",
5 | "license": "MIT",
6 | "author": "mzxrai",
7 | "homepage": "https://github.com/mzxrai/mcp-webresearch",
8 | "bugs": "https://github.com/mzxrai/mcp-webresearch/issues",
9 | "type": "module",
10 | "bin": {
11 | "mcp-server-webresearch": "dist/index.js"
12 | },
13 | "files": [
14 | "dist"
15 | ],
16 | "scripts": {
17 | "build": "tsc && shx chmod +x dist/*.js",
18 | "prepare": "pnpm run build",
19 | "postinstall": "playwright install chromium",
20 | "watch": "tsc --watch",
21 | "dev": "tsx watch index.ts"
22 | },
23 | "publishConfig": {
24 | "access": "public"
25 | },
26 | "keywords": [
27 | "mcp",
28 | "model-context-protocol",
29 | "web-research",
30 | "ai",
31 | "web-scraping"
32 | ],
33 | "dependencies": {
34 | "@modelcontextprotocol/sdk": "1.0.1",
35 | "playwright": "^1.49.0",
36 | "turndown": "^7.1.2"
37 | },
38 | "devDependencies": {
39 | "shx": "^0.3.4",
40 | "tsx": "^4.19.2",
41 | "typescript": "^5.6.2",
42 | "@types/turndown": "^5.0.4"
43 | }
44 | }
```
--------------------------------------------------------------------------------
/index.ts:
--------------------------------------------------------------------------------
```typescript
1 | #!/usr/bin/env node
2 |
3 | // Core dependencies for MCP server and protocol handling
4 | import { Server } from "@modelcontextprotocol/sdk/server/index.js";
5 | import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
6 | import {
7 | CallToolRequestSchema,
8 | ListResourcesRequestSchema,
9 | ListToolsRequestSchema,
10 | ReadResourceRequestSchema,
11 | ListPromptsRequestSchema,
12 | GetPromptRequestSchema,
13 | Tool,
14 | Resource,
15 | McpError,
16 | ErrorCode,
17 | TextContent,
18 | ImageContent,
19 | } from "@modelcontextprotocol/sdk/types.js";
20 |
21 | // Web scraping and content processing dependencies
22 | import { chromium, Browser, Page } from 'playwright';
23 | import TurndownService from "turndown";
24 | import type { Node } from "turndown";
25 | import * as fs from 'fs';
26 | import * as path from 'path';
27 | import * as os from 'os';
28 |
29 | // Initialize temp directory for screenshots
30 | const SCREENSHOTS_DIR = fs.mkdtempSync(path.join(os.tmpdir(), 'mcp-screenshots-'));
31 |
32 | // Initialize Turndown service for converting HTML to Markdown
33 | // Configure with specific formatting preferences
34 | const turndownService: TurndownService = new TurndownService({
35 | headingStyle: 'atx', // Use # style headings
36 | hr: '---', // Horizontal rule style
37 | bulletListMarker: '-', // List item marker
38 | codeBlockStyle: 'fenced', // Use ``` for code blocks
39 | emDelimiter: '_', // Italics style
40 | strongDelimiter: '**', // Bold style
41 | linkStyle: 'inlined', // Use inline links
42 | });
43 |
44 | // Custom Turndown rules for better content extraction
45 | // Remove script and style tags completely
46 | turndownService.addRule('removeScripts', {
47 | filter: ['script', 'style', 'noscript'],
48 | replacement: () => ''
49 | });
50 |
51 | // Preserve link elements with their href attributes
52 | turndownService.addRule('preserveLinks', {
53 | filter: 'a',
54 | replacement: (content: string, node: Node) => {
55 | const element = node as HTMLAnchorElement;
56 | const href = element.getAttribute('href');
57 | return href ? `[${content}](${href})` : content;
58 | }
59 | });
60 |
61 | // Preserve image elements with their src and alt attributes
62 | turndownService.addRule('preserveImages', {
63 | filter: 'img',
64 | replacement: (content: string, node: Node) => {
65 | const element = node as HTMLImageElement;
66 | const alt = element.getAttribute('alt') || '';
67 | const src = element.getAttribute('src');
68 | return src ? `![${alt}](${src})` : '';
69 | }
70 | });
71 |
72 | // Core interfaces for research data management
73 | interface ResearchResult {
74 | url: string; // URL of the researched page
75 | title: string; // Page title
76 | content: string; // Extracted content in markdown
77 | timestamp: string; // When the result was captured
78 | screenshotPath?: string; // Path to screenshot file on disk
79 | }
80 |
81 | // Define structure for research session data
82 | interface ResearchSession {
83 | query: string; // Search query that initiated the session
84 | results: ResearchResult[]; // Collection of research results
85 | lastUpdated: string; // Timestamp of last update
86 | }
87 |
88 | // Screenshot management functions
89 | async function saveScreenshot(screenshot: string, title: string): Promise<string> {
90 | // Convert screenshot from base64 to buffer
91 | const buffer = Buffer.from(screenshot, 'base64');
92 |
93 | // Check size before saving
94 | const MAX_SIZE = 5 * 1024 * 1024; // 5MB
95 | if (buffer.length > MAX_SIZE) {
96 | throw new McpError(
97 | ErrorCode.InvalidRequest,
98 | `Screenshot too large: ${Math.round(buffer.length / (1024 * 1024))}MB exceeds ${MAX_SIZE / (1024 * 1024)}MB limit`
99 | );
100 | }
101 |
102 | // Generate a safe filename
103 | const timestamp = new Date().getTime();
104 | const safeTitle = title.replace(/[^a-z0-9]/gi, '_').toLowerCase();
105 | const filename = `${safeTitle}-${timestamp}.png`;
106 | const filepath = path.join(SCREENSHOTS_DIR, filename);
107 |
108 | // Save the validated screenshot
109 | await fs.promises.writeFile(filepath, buffer);
110 |
111 | // Return the filepath to the saved screenshot
112 | return filepath;
113 | }
114 |
115 | // Cleanup function to remove all screenshots from disk
116 | async function cleanupScreenshots(): Promise<void> {
117 | try {
118 | // Remove all files in the screenshots directory
119 | const files = await fs.promises.readdir(SCREENSHOTS_DIR);
120 | await Promise.all(files.map(file =>
121 | fs.promises.unlink(path.join(SCREENSHOTS_DIR, file))
122 | ));
123 |
124 | // Remove the directory itself
125 | await fs.promises.rmdir(SCREENSHOTS_DIR);
126 | } catch (error) {
127 | console.error('Error cleaning up screenshots:', error);
128 | }
129 | }
130 |
131 | // Available tools for web research functionality
132 | const TOOLS: Tool[] = [
133 | {
134 | name: "search_google",
135 | description: "Search Google for a query",
136 | inputSchema: {
137 | type: "object",
138 | properties: {
139 | query: { type: "string", description: "Search query" },
140 | },
141 | required: ["query"],
142 | },
143 | },
144 | {
145 | name: "visit_page",
146 | description: "Visit a webpage and extract its content",
147 | inputSchema: {
148 | type: "object",
149 | properties: {
150 | url: { type: "string", description: "URL to visit" },
151 | takeScreenshot: { type: "boolean", description: "Whether to take a screenshot" },
152 | },
153 | required: ["url"],
154 | },
155 | },
156 | {
157 | name: "take_screenshot",
158 | description: "Take a screenshot of the current page",
159 | inputSchema: {
160 | type: "object",
161 | properties: {}, // No parameters needed
162 | },
163 | },
164 | ];
165 |
166 | // Define available prompt types for type safety
167 | type PromptName = "agentic-research";
168 |
169 | // Define structure for research prompt arguments
170 | interface AgenticResearchArgs {
171 | topic: string; // Research topic provided by user
172 | }
173 |
174 | // Configure available prompts with their specifications
175 | const PROMPTS = {
176 | // Agentic research prompt configuration
177 | "agentic-research": {
178 | name: "agentic-research" as const, // Type-safe name
179 | description: "Conduct iterative web research on a topic, exploring it thoroughly through multiple steps while maintaining a dialogue with the user",
180 | arguments: [
181 | {
182 | name: "topic", // Topic argument specification
183 | description: "The topic or question to research", // Description of the argument
184 | required: true // Topic is mandatory
185 | }
186 | ]
187 | }
188 | } as const; // Readonly at the type level (no runtime freeze)
189 |
190 | // Global state management for browser and research session
191 | let browser: Browser | undefined; // Playwright browser instance
192 | let page: Page | undefined; // Current active page
193 | let currentSession: ResearchSession | undefined; // Current research session data
194 |
195 | // Configuration constants for session management
196 | const MAX_RESULTS_PER_SESSION = 100; // Maximum number of results to store per session
197 | const MAX_RETRIES = 3; // Maximum retry attempts for operations
198 | const RETRY_DELAY = 1000; // Delay between retries in milliseconds
199 |
200 | // Generic retry mechanism for handling transient failures
201 | async function withRetry<T>(
202 | operation: () => Promise<T>, // Operation to retry
203 | retries = MAX_RETRIES, // Number of retry attempts
204 | delay = RETRY_DELAY // Delay between retries
205 | ): Promise<T> {
206 | let lastError: Error;
207 |
208 | // Attempt operation up to max retries
209 | for (let i = 0; i < retries; i++) {
210 | try {
211 | return await operation();
212 | } catch (error) {
213 | lastError = error as Error;
214 | if (i < retries - 1) {
215 | console.error(`Attempt ${i + 1} failed, retrying in ${delay}ms:`, error);
216 | await new Promise(resolve => setTimeout(resolve, delay));
217 | }
218 | }
219 | }
220 |
221 | throw lastError!; // Throw last error if all retries failed
222 | }
223 |
224 | // Add a new research result to the current session with data management
225 | function addResult(result: ResearchResult): void {
226 | // If no current session exists, initialize a new one
227 | if (!currentSession) {
228 | currentSession = {
229 | query: "Research Session",
230 | results: [],
231 | lastUpdated: new Date().toISOString(),
232 | };
233 | }
234 |
235 | // If the session has reached the maximum number of results, remove the oldest result
236 | if (currentSession.results.length >= MAX_RESULTS_PER_SESSION) {
237 | currentSession.results.shift();
238 | }
239 |
240 | // Add the new result to the session and update the last updated timestamp
241 | currentSession.results.push(result);
242 | currentSession.lastUpdated = new Date().toISOString();
243 | }
244 |
245 | /**
246 | * Specifically handles Google's consent dialog in regions that require it
247 | * @param page - Playwright Page object
248 | */
249 | async function dismissGoogleConsent(page: Page): Promise<void> {
250 | // Regions that commonly show cookie/consent banners
251 | const regions = [
252 | // Europe
253 | '.google.de', '.google.fr', '.google.co.uk',
254 | '.google.it', '.google.es', '.google.nl',
255 | '.google.pl', '.google.ie', '.google.dk',
256 | '.google.no', '.google.se', '.google.fi',
257 | '.google.at', '.google.ch', '.google.be',
258 | '.google.pt', '.google.gr', '.google.com.tr',
259 | // Asia Pacific
260 | '.google.co.id', '.google.com.sg', '.google.co.th',
261 | '.google.com.my', '.google.com.ph', '.google.com.au',
262 | '.google.co.nz', '.google.com.vn',
263 | // Generic domains
264 | '.google.com', '.google.co'
265 | ];
266 |
267 | try {
268 | // Get current URL
269 | const currentUrl = page.url();
270 |
271 | // Skip consent check if not in a supported region
272 | if (!regions.some(domain => currentUrl.includes(domain))) {
273 | return;
274 | }
275 |
276 | // Quick check for consent dialog existence
277 | const hasConsent = await page.$(
278 | 'form:has(button[aria-label]), div[aria-modal="true"], ' +
279 | // Common dialog containers
280 | 'div[role="dialog"], div[role="alertdialog"], ' +
281 | // Common cookie/consent specific elements
282 | 'div[class*="consent"], div[id*="consent"], ' +
283 | 'div[class*="cookie"], div[id*="cookie"], ' +
284 | // Common modal/popup classes
285 | 'div[class*="modal"]:has(button), div[class*="popup"]:has(button), ' +
286 | // Common banner patterns
287 | 'div[class*="banner"]:has(button), div[id*="banner"]:has(button)'
288 | ).then(Boolean);
289 |
290 | // If no consent dialog is found, return
291 | if (!hasConsent) {
292 | return;
293 | }
294 |
295 | // Handle the consent dialog using common consent button patterns
296 | await page.evaluate(() => {
297 | const consentPatterns = {
298 | // Common accept button text patterns across languages
299 | text: [
300 | // English
301 | 'accept all', 'agree', 'consent',
302 | // German
303 | 'alle akzeptieren', 'ich stimme zu', 'zustimmen',
304 | // French
305 | 'tout accepter', 'j\'accepte',
306 | // Spanish
307 | 'aceptar todo', 'acepto',
308 | // Italian
309 | 'accetta tutto', 'accetto',
310 | // Portuguese
311 | 'aceitar tudo', 'concordo',
312 | // Dutch
313 | 'alles accepteren', 'akkoord',
314 | // Polish
315 | 'zaakceptuj wszystko', 'zgadzam się',
316 | // Swedish
317 | 'godkänn alla', 'godkänn',
318 | // Danish
319 | 'accepter alle', 'accepter',
320 | // Norwegian
321 | 'godta alle', 'godta',
322 | // Finnish
323 | 'hyväksy kaikki', 'hyväksy',
324 | // Indonesian
325 | 'terima semua', 'setuju', 'saya setuju',
326 | // Malay
327 | 'terima semua', 'setuju',
328 | // Thai
329 | 'ยอมรับทั้งหมด', 'ยอมรับ',
330 | // Vietnamese
331 | 'chấp nhận tất cả', 'đồng ý',
332 | // Filipino/Tagalog
333 | 'tanggapin lahat', 'sumang-ayon',
334 | // Japanese
335 | 'すべて同意する', '同意する',
336 | // Korean
337 | '모두 동의', '동의'
338 | ],
339 | // Common aria-label patterns
340 | ariaLabels: [
341 | 'consent', 'accept', 'agree',
342 | 'cookie', 'privacy', 'terms',
343 | 'persetujuan', 'setuju', // Indonesian
344 | 'ยอมรับ', // Thai
345 | 'đồng ý', // Vietnamese
346 | '同意' // Japanese/Chinese
347 | ]
348 | };
349 |
350 | // Finds the accept button by text or aria-label
351 | const findAcceptButton = () => {
352 | // Get all buttons on the page
353 | const buttons = Array.from(document.querySelectorAll('button'));
354 |
355 | // Find the accept button
356 | return buttons.find(button => {
357 | // Get the text content and aria-label of the button
358 | const text = button.textContent?.toLowerCase() || '';
359 | const label = button.getAttribute('aria-label')?.toLowerCase() || '';
360 |
361 | // Check for matching text patterns
362 | const hasMatchingText = consentPatterns.text.some(pattern =>
363 | text.includes(pattern)
364 | );
365 |
366 | // Check for matching aria-labels
367 | const hasMatchingLabel = consentPatterns.ariaLabels.some(pattern =>
368 | label.includes(pattern)
369 | );
370 |
371 | // Return true if either text or aria-label matches
372 | return hasMatchingText || hasMatchingLabel;
373 | });
374 | };
375 |
376 | // Find the accept button
377 | const acceptButton = findAcceptButton();
378 |
379 | // If an accept button is found, click it
380 | if (acceptButton) {
381 | acceptButton.click();
382 | }
383 | });
384 | } catch (error) {
385 | console.error('Consent handling failed:', error); // Log to stderr; stdout carries the MCP stdio protocol
386 | }
387 | }
388 |
389 | // Safe page navigation with error handling and bot detection
390 | async function safePageNavigation(page: Page, url: string): Promise<void> {
391 | try {
392 | // Step 1: Set cookies to bypass consent banner
393 | await page.context().addCookies([{
394 | name: 'CONSENT',
395 | value: 'YES+',
396 | domain: '.google.com',
397 | path: '/'
398 | }]);
399 |
400 | // Step 2: Initial navigation
401 | const response = await page.goto(url, {
402 | waitUntil: 'domcontentloaded',
403 | timeout: 15000
404 | });
405 |
406 | // Step 3: Basic response validation
407 | if (!response) {
408 | throw new Error('Navigation failed: no response received');
409 | }
410 |
411 | // Check HTTP status code; if 400 or higher, throw an error
412 | const status = response.status();
413 | if (status >= 400) {
414 | throw new Error(`HTTP ${status}: ${response.statusText()}`);
415 | }
416 |
417 | // Step 4: Wait for network to become idle or timeout
418 | await Promise.race([
419 | page.waitForLoadState('networkidle', { timeout: 5000 })
420 | .catch(() => {/* ignore timeout */ }),
421 | // Fallback timeout in case networkidle never occurs
422 | new Promise(resolve => setTimeout(resolve, 5000))
423 | ]);
424 |
425 | // Step 5: Security and content validation
426 | const validation = await page.evaluate(() => {
427 | const botProtectionExists = [
428 | '#challenge-running', // Cloudflare
429 | '#cf-challenge-running', // Cloudflare
430 | '#px-captcha', // PerimeterX
431 | '#ddos-protection', // Various
432 | '#waf-challenge-html' // Various WAFs
433 | ].some(selector => document.querySelector(selector));
434 |
435 | // Check for suspicious page titles
436 | const suspiciousTitle = [
437 | 'security check',
438 | 'ddos protection',
439 | 'please wait',
440 | 'just a moment',
441 | 'attention required'
442 | ].some(phrase => document.title.toLowerCase().includes(phrase));
443 |
444 | // Count words in the page content
445 | const bodyText = document.body.innerText || '';
446 | const words = bodyText.trim().split(/\s+/).length;
447 |
448 | // Return validation results
449 | return {
450 | wordCount: words,
451 | botProtection: botProtectionExists,
452 | suspiciousTitle,
453 | title: document.title
454 | };
455 | });
456 |
457 | // If bot protection is detected, throw an error
458 | if (validation.botProtection) {
459 | throw new Error('Bot protection detected');
460 | }
461 |
462 | // If the page title is suspicious, throw an error
463 | if (validation.suspiciousTitle) {
464 | throw new Error(`Suspicious page title detected: "${validation.title}"`);
465 | }
466 |
467 | // If the page contains insufficient content, throw an error
468 | if (validation.wordCount < 10) {
469 | throw new Error('Page contains insufficient content');
470 | }
471 |
472 | } catch (error) {
473 | // If an error occurs during navigation, throw an error with the URL and the error message
474 | throw new Error(`Navigation to ${url} failed: ${(error as Error).message}`);
475 | }
476 | }
477 |
478 | // Take and optimize a screenshot
479 | async function takeScreenshotWithSizeLimit(page: Page): Promise<string> {
480 | const MAX_SIZE = 5 * 1024 * 1024;
481 | const MAX_DIMENSION = 1920;
482 | const MIN_DIMENSION = 800;
483 |
484 | // Set viewport size
485 | await page.setViewportSize({
486 | width: 1600,
487 | height: 900
488 | });
489 |
490 | // Take initial screenshot
491 | let screenshot = await page.screenshot({
492 | type: 'png',
493 | fullPage: false
494 | });
495 |
496 | // Track the current screenshot buffer and the number of resize attempts
497 | let buffer = screenshot;
498 | let attempts = 0;
499 | const MAX_ATTEMPTS = 3;
500 |
501 | // While screenshot is too large, reduce size
502 | while (buffer.length > MAX_SIZE && attempts < MAX_ATTEMPTS) {
503 | // Get current viewport size
504 | const viewport = page.viewportSize();
505 | if (!viewport) break; // Viewport unknown: stop resizing instead of looping forever
506 |
507 | // Calculate new dimensions
508 | const scaleFactor = Math.pow(0.75, attempts + 1);
509 | let newWidth = Math.round(viewport.width * scaleFactor);
510 | let newHeight = Math.round(viewport.height * scaleFactor);
511 |
512 | // Ensure dimensions are within bounds
513 | newWidth = Math.max(MIN_DIMENSION, Math.min(MAX_DIMENSION, newWidth));
514 | newHeight = Math.max(MIN_DIMENSION, Math.min(MAX_DIMENSION, newHeight));
515 |
516 | // Update viewport with new dimensions
517 | await page.setViewportSize({
518 | width: newWidth,
519 | height: newHeight
520 | });
521 |
522 | // Take new screenshot
523 | screenshot = await page.screenshot({
524 | type: 'png',
525 | fullPage: false
526 | });
527 |
528 | // Update buffer with new screenshot
529 | buffer = screenshot;
530 |
531 | // Increment retry attempts
532 | attempts++;
533 | }
534 |
535 | // Final attempt with minimum settings
536 | if (buffer.length > MAX_SIZE) {
537 | await page.setViewportSize({
538 | width: MIN_DIMENSION,
539 | height: MIN_DIMENSION
540 | });
541 |
542 | // Take final screenshot
543 | screenshot = await page.screenshot({
544 | type: 'png',
545 | fullPage: false
546 | });
547 |
548 | // Update buffer with final screenshot
549 | buffer = screenshot;
550 |
551 | // Throw error if final screenshot is still too large
552 | if (buffer.length > MAX_SIZE) {
553 | throw new McpError(
554 | ErrorCode.InvalidRequest,
555 | `Failed to reduce screenshot to under 5MB even with minimum settings`
556 | );
557 | }
558 | }
559 |
560 | // Convert Buffer to base64 string before returning
561 | return buffer.toString('base64');
562 | }
563 |
564 | // Initialize MCP server with basic configuration
565 | const server: Server = new Server(
566 | {
567 | name: "webresearch", // Server name identifier
568 | version: "0.1.7", // Server version number
569 | },
570 | {
571 | capabilities: {
572 | tools: {}, // Available tool configurations
573 | resources: {}, // Resource handling capabilities
574 | prompts: {} // Prompt processing capabilities
575 | },
576 | }
577 | );
578 |
579 | // Register handler for tool listing requests
580 | server.setRequestHandler(ListToolsRequestSchema, async () => ({
581 | tools: TOOLS // Return list of available research tools
582 | }));
583 |
584 | // Register handler for resource listing requests
585 | server.setRequestHandler(ListResourcesRequestSchema, async () => {
586 | // Return empty list if no active session
587 | if (!currentSession) {
588 | return { resources: [] };
589 | }
590 |
591 | // Compile list of available resources
592 | const resources: Resource[] = [
593 | // Add session summary resource
594 | {
595 | uri: "research://current/summary", // Resource identifier
596 | name: "Current Research Session Summary",
597 | description: "Summary of the current research session including queries and results",
598 | mimeType: "application/json"
599 | },
600 | // Add screenshot resources if available
601 | ...currentSession.results
602 | .map((r, i): Resource | undefined => r.screenshotPath ? {
603 | uri: `research://screenshots/${i}`,
604 | name: `Screenshot of ${r.title}`,
605 | description: `Screenshot taken from ${r.url}`,
606 | mimeType: "image/png"
607 | } : undefined)
608 | .filter((r): r is Resource => r !== undefined)
609 | ];
610 |
611 | // Return compiled list of resources
612 | return { resources };
613 | });
614 |
615 | // Register handler for resource content requests
616 | server.setRequestHandler(ReadResourceRequestSchema, async (request) => {
617 | const uri = request.params.uri.toString();
618 |
619 | // Handle session summary requests for research data
620 | if (uri === "research://current/summary") {
621 | if (!currentSession) {
622 | throw new McpError(
623 | ErrorCode.InvalidRequest,
624 | "No active research session"
625 | );
626 | }
627 |
628 | // Return the session summary as JSON
629 | return {
630 | contents: [{
631 | uri,
632 | mimeType: "application/json",
633 | text: JSON.stringify({
634 | query: currentSession.query,
635 | resultCount: currentSession.results.length,
636 | lastUpdated: currentSession.lastUpdated,
637 | results: currentSession.results.map(r => ({
638 | title: r.title,
639 | url: r.url,
640 | timestamp: r.timestamp,
641 | screenshotPath: r.screenshotPath
642 | }))
643 | }, null, 2)
644 | }]
645 | };
646 | }
647 |
648 | // Handle screenshot requests
649 | if (uri.startsWith("research://screenshots/")) {
650 | const index = parseInt(uri.split("/").pop() || "", 10);
651 |
652 | // Verify session exists
653 | if (!currentSession) {
654 | throw new McpError(
655 | ErrorCode.InvalidRequest,
656 | "No active research session"
657 | );
658 | }
659 |
660 | // Verify index is within bounds
661 | if (isNaN(index) || index < 0 || index >= currentSession.results.length) {
662 | throw new McpError(
663 | ErrorCode.InvalidRequest,
664 | `Screenshot index out of bounds: ${index}`
665 | );
666 | }
667 |
668 | // Get result containing screenshot
669 | const result = currentSession.results[index];
670 | if (!result?.screenshotPath) {
671 | throw new McpError(
672 | ErrorCode.InvalidRequest,
673 | `No screenshot available at index: ${index}`
674 | );
675 | }
676 |
677 | try {
678 | // Read the screenshot file from disk as a Buffer
679 | const screenshotData = await fs.promises.readFile(result.screenshotPath);
680 |
681 | // Convert Buffer to base64 string before returning
682 | const base64Data = screenshotData.toString('base64');
683 |
684 | // Return the screenshot as a base64-encoded blob
685 | return {
686 | contents: [{
687 | uri,
688 | mimeType: "image/png",
689 | blob: base64Data
690 | }]
691 | };
692 | } catch (error: unknown) {
693 | // Handle error if screenshot cannot be read
694 | const errorMessage = error instanceof Error ? error.message : 'Unknown error occurred';
695 | throw new McpError(
696 | ErrorCode.InternalError,
697 | `Failed to read screenshot: ${errorMessage}`
698 | );
699 | }
700 | }
701 |
702 | // Handle unknown resource types
703 | throw new McpError(
704 | ErrorCode.InvalidRequest,
705 | `Unknown resource: ${uri}`
706 | );
707 | });
708 |
709 | // Initialize MCP server connection using stdio transport
710 | const transport = new StdioServerTransport();
711 | server.connect(transport).catch((error) => {
712 | console.error("Failed to start server:", error);
713 | process.exit(1);
714 | });
715 |
716 | // Convert HTML content to clean, readable markdown format
717 | async function extractContentAsMarkdown(
718 | page: Page, // Playwright page to extract from
719 | selector?: string // Optional CSS selector to target specific content
720 | ): Promise<string> {
721 | // Step 1: Execute content extraction in browser context
722 | const html = await page.evaluate((sel) => {
723 | // Handle case where specific selector is provided
724 | if (sel) {
725 | const element = document.querySelector(sel);
726 | // Return element content or empty string if not found
727 | return element ? element.outerHTML : '';
728 | }
729 |
730 | // Step 2: Try standard content containers first
731 | const contentSelectors = [
732 | 'main', // HTML5 semantic main content
733 | 'article', // HTML5 semantic article content
734 | '[role="main"]', // ARIA main content role
735 | '#content', // Common content ID
736 | '.content', // Common content class
737 | '.main', // Alternative main class
738 | '.post', // Blog post content
739 | '.article', // Article content container
740 | ];
741 |
742 | // Try each selector in priority order
743 | for (const contentSelector of contentSelectors) {
744 | const element = document.querySelector(contentSelector);
745 | if (element) {
746 | return element.outerHTML; // Return first matching content
747 | }
748 | }
749 |
750 | // Step 3: Fallback to cleaning full body content
751 | const body = document.body;
752 |
753 | // Define elements to remove for cleaner content
754 | const elementsToRemove = [
755 | // Navigation elements
756 | 'header', // Page header
757 | 'footer', // Page footer
758 | 'nav', // Navigation sections
759 | '[role="navigation"]', // ARIA navigation elements
760 |
761 | // Sidebars and complementary content
762 | 'aside', // Sidebar content
763 | '.sidebar', // Sidebar by class
764 | '[role="complementary"]', // ARIA complementary content
765 |
766 | // Navigation-related elements
767 | '.nav', // Navigation classes
768 | '.menu', // Menu elements
769 |
770 | // Page structure elements
771 | '.header', // Header classes
772 | '.footer', // Footer classes
773 |
774 | // Advertising and notices
775 | '.advertisement', // Advertisement containers
776 | '.ads', // Ad containers
777 | '.cookie-notice', // Cookie consent notices
778 | ];
779 |
780 | // Remove each unwanted element from content
781 | elementsToRemove.forEach(sel => {
782 | body.querySelectorAll(sel).forEach(el => el.remove());
783 | });
784 |
785 | // Return cleaned body content
786 | return body.outerHTML;
787 | }, selector);
788 |
789 | // Step 4: Handle empty content case
790 | if (!html) {
791 | return '';
792 | }
793 |
794 | try {
795 | // Step 5: Convert HTML to Markdown
796 | const markdown = turndownService.turndown(html);
797 |
798 | // Step 6: Clean up and format markdown
799 | return markdown
800 | .replace(/\n{3,}/g, '\n\n') // Replace excessive newlines with double
801 | .replace(/^- $/gm, '') // Remove empty list items
802 | .replace(/^\s+$/gm, '') // Remove whitespace-only lines
803 | .trim(); // Remove leading/trailing whitespace
804 |
805 | } catch (error) {
806 | // Log conversion errors and return original HTML as fallback
807 | console.error('Error converting HTML to Markdown:', error);
808 | return html;
809 | }
810 | }
811 |
812 | // Validate URL format and ensure security constraints
813 | function isValidUrl(urlString: string): boolean {
814 | try {
815 | // Attempt to parse URL string
816 | const url = new URL(urlString);
817 |
818 | // Only allow HTTP and HTTPS protocols for security
819 | return url.protocol === 'http:' || url.protocol === 'https:';
820 | } catch {
821 | // Return false for any invalid URL format
822 | return false;
823 | }
824 | }
825 |
826 | // Define result type for tool operations
827 | type ToolResult = {
828 | content: (TextContent | ImageContent)[]; // Array of text or image content
829 | isError?: boolean; // Optional error flag
830 | };
831 |
832 | // Tool request handler for executing research operations
833 | server.setRequestHandler(CallToolRequestSchema, async (request): Promise<ToolResult> => {
834 | // Initialize browser for tool operations
835 | const page = await ensureBrowser();
836 |
837 | switch (request.params.name) {
838 | // Handle Google search operations
839 | case "search_google": {
840 | // Extract search query from request parameters
841 | const { query } = request.params.arguments as { query: string };
842 |
843 | try {
844 | // Execute search with retry mechanism
845 | const results = await withRetry(async () => {
846 | // Step 1: Navigate to Google search page
847 | await safePageNavigation(page, 'https://www.google.com');
848 | await dismissGoogleConsent(page);
849 |
850 | // Step 2: Find and interact with search input
851 | await withRetry(async () => {
852 | // Wait for any search input element to appear
853 | await Promise.race([
854 | // Try multiple possible selectors for search input
855 | page.waitForSelector('input[name="q"]', { timeout: 5000 }),
856 | page.waitForSelector('textarea[name="q"]', { timeout: 5000 }),
857 | page.waitForSelector('input[type="text"]', { timeout: 5000 })
858 | ]).catch(() => {
859 | throw new Error('Search input not found - no matching selectors');
860 | });
861 |
862 | // Find the actual search input element
863 | const searchInput = await page.$('input[name="q"]') ||
864 | await page.$('textarea[name="q"]') ||
865 | await page.$('input[type="text"]');
866 |
867 | // Verify search input was found
868 | if (!searchInput) {
869 | throw new Error('Search input element not found after waiting');
870 | }
871 |
872 | // Step 3: Enter search query
873 | await searchInput.click({ clickCount: 3 }); // Select all existing text
874 | await searchInput.press('Backspace'); // Clear selected text
875 | await searchInput.type(query); // Type new query
876 | }, 3, 2000); // Allow 3 retries with 2s delay
877 |
878 | // Step 4: Submit search and wait for results
879 | await withRetry(async () => {
880 | await Promise.all([
881 | page.keyboard.press('Enter'),
882 | page.waitForLoadState('networkidle', { timeout: 15000 }),
883 | ]);
884 | });
885 |
886 | // Step 5: Extract search results
887 | const searchResults = await withRetry(async () => {
888 | const results = await page.evaluate(() => {
889 | // Find all search result containers
890 | const elements = document.querySelectorAll('div.g');
891 | if (!elements || elements.length === 0) {
892 | throw new Error('No search results found');
893 | }
894 |
895 | // Extract data from each result
896 | return Array.from(elements).map((el) => {
897 | // Find required elements within result container
898 | const titleEl = el.querySelector('h3'); // Title element
899 | const linkEl = el.querySelector('a'); // Link element
900 | const snippetEl = el.querySelector('div.VwiC3b'); // Snippet element
901 |
902 | // Skip results missing required elements
903 | if (!titleEl || !linkEl || !snippetEl) {
904 | return null;
905 | }
906 |
907 | // Return structured result data
908 | return {
909 | title: titleEl.textContent || '', // Result title
910 | url: linkEl.getAttribute('href') || '', // Result URL
911 | snippet: snippetEl.textContent || '', // Result description
912 | };
913 | }).filter(result => result !== null); // Remove invalid results
914 | });
915 |
916 | // Verify we found valid results
917 | if (!results || results.length === 0) {
918 | throw new Error('No valid search results found');
919 | }
920 |
921 | // Return compiled list of results
922 | return results;
923 | });
924 |
925 | // Step 6: Store results in session
926 | searchResults.forEach((result) => {
927 | addResult({
928 | url: result.url,
929 | title: result.title,
930 | content: result.snippet,
931 | timestamp: new Date().toISOString(),
932 | });
933 | });
934 |
935 | // Return compiled list of results
936 | return searchResults;
937 | });
938 |
939 | // Step 7: Return formatted results
940 | return {
941 | content: [{
942 | type: "text",
943 | text: JSON.stringify(results, null, 2) // Pretty-print JSON results
944 | }]
945 | };
946 | } catch (error) {
947 | // Handle and format search errors
948 | return {
949 | content: [{
950 | type: "text",
951 | text: `Failed to perform search: ${(error as Error).message}`
952 | }],
953 | isError: true
954 | };
955 | }
956 | }
957 |
958 | // Handle webpage visit and content extraction
959 | case "visit_page": {
960 | // Extract URL and screenshot flag from request
961 | const { url, takeScreenshot } = request.params.arguments as {
962 | url: string; // Target URL to visit
963 | takeScreenshot?: boolean; // Optional screenshot flag
964 | };
965 |
966 | // Step 1: Validate URL format and security
967 | if (!isValidUrl(url)) {
968 | return {
969 | content: [{
970 | type: "text" as const,
971 | text: `Invalid URL: ${url}. Only http and https protocols are supported.`
972 | }],
973 | isError: true
974 | };
975 | }
976 |
977 | try {
978 | // Step 2: Visit page and extract content with retry mechanism
979 | const result = await withRetry(async () => {
980 | // Navigate to target URL safely
981 | await safePageNavigation(page, url);
982 | const title = await page.title();
983 |
984 | // Step 3: Extract and process page content
985 | const content = await withRetry(async () => {
986 | // Convert page content to markdown
987 | const extractedContent = await extractContentAsMarkdown(page);
988 |
989 | // If no content is extracted, throw an error
990 | if (!extractedContent) {
991 | throw new Error('Failed to extract content');
992 | }
993 |
994 | // Return the extracted content
995 | return extractedContent;
996 | });
997 |
998 | // Step 4: Create result object with page data
999 | const pageResult: ResearchResult = {
1000 | url, // Original URL
1001 | title, // Page title
1002 | content, // Markdown content
1003 | timestamp: new Date().toISOString(), // Capture time
1004 | };
1005 |
1006 | // Step 5: Take screenshot if requested
1007 | let screenshotUri: string | undefined;
1008 | if (takeScreenshot) {
1009 | // Capture and process screenshot
1010 | const screenshot = await takeScreenshotWithSizeLimit(page);
1011 | pageResult.screenshotPath = await saveScreenshot(screenshot, title);
1012 |
1013 | // Get the index for the resource URI
1014 | const resultIndex = currentSession ? currentSession.results.length : 0;
1015 | screenshotUri = `research://screenshots/${resultIndex}`;
1016 |
1017 | // Notify clients about new screenshot resource
1018 | server.notification({
1019 | method: "notifications/resources/list_changed"
1020 | });
1021 | }
1022 |
1023 | // Step 6: Store result in session
1024 | addResult(pageResult);
1025 | return { pageResult, screenshotUri };
1026 | });
1027 |
1028 | // Step 7: Return formatted result with screenshot URI if taken
1029 | const response: ToolResult = {
1030 | content: [{
1031 | type: "text" as const,
1032 | text: JSON.stringify({
1033 | url: result.pageResult.url,
1034 | title: result.pageResult.title,
1035 | content: result.pageResult.content,
1036 | timestamp: result.pageResult.timestamp,
1037 | screenshot: result.screenshotUri ? `View screenshot via *MCP Resources* (Paperclip icon) @ URI: ${result.screenshotUri}` : undefined
1038 | }, null, 2)
1039 | }]
1040 | };
1041 |
1042 | return response;
1043 | } catch (error) {
1044 | // Handle and format page visit errors
1045 | return {
1046 | content: [{
1047 | type: "text" as const,
1048 | text: `Failed to visit page: ${(error as Error).message}`
1049 | }],
1050 | isError: true
1051 | };
1052 | }
1053 | }
1054 |
1055 | // Handle standalone screenshot requests
1056 | case "take_screenshot": {
1057 | try {
1058 | // Step 1: Capture screenshot with retry mechanism
1059 | const screenshot = await withRetry(async () => {
1060 | // Take and optimize screenshot with default size limits
1061 | return await takeScreenshotWithSizeLimit(page);
1062 | });
1063 |
1064 | // Step 2: Initialize session if needed
1065 | if (!currentSession) {
1066 | currentSession = {
1067 | query: "Screenshot Session", // Session identifier
1068 | results: [], // Empty results array
1069 | lastUpdated: new Date().toISOString(), // Current timestamp
1070 | };
1071 | }
1072 |
1073 | // Step 3: Get current page information
1074 | const pageUrl = page.url(); // Current page URL (page.url() is synchronous in Playwright)
1075 | const pageTitle = await page.title(); // Current page title
1076 |
1077 | // Step 4: Save screenshot to disk
1078 | const screenshotPath = await saveScreenshot(screenshot, pageTitle || 'untitled');
1079 |
1080 | // Step 5: Create and store screenshot result
1081 | const resultIndex = currentSession ? currentSession.results.length : 0;
1082 | addResult({
1083 | url: pageUrl,
1084 | title: pageTitle || "Untitled Page", // Fallback title if none available
1085 | content: "Screenshot taken", // Simple content description
1086 | timestamp: new Date().toISOString(), // Capture time
1087 | screenshotPath // Path to screenshot file
1088 | });
1089 |
1090 | // Step 6: Notify clients about new screenshot resource
1091 | server.notification({
1092 | method: "notifications/resources/list_changed"
1093 | });
1094 |
1095 | // Step 7: Return success message with resource URI
1096 | const resourceUri = `research://screenshots/${resultIndex}`;
1097 | return {
1098 | content: [{
1099 | type: "text" as const,
1100 | text: `Screenshot taken successfully. You can view it via *MCP Resources* (Paperclip icon) @ URI: ${resourceUri}`
1101 | }]
1102 | };
1103 | } catch (error) {
1104 | // Handle and format screenshot errors
1105 | return {
1106 | content: [{
1107 | type: "text" as const,
1108 | text: `Failed to take screenshot: ${(error as Error).message}`
1109 | }],
1110 | isError: true
1111 | };
1112 | }
1113 | }
1114 |
1115 | // Handle unknown tool requests
1116 | default:
1117 | throw new McpError(
1118 | ErrorCode.MethodNotFound,
1119 | `Unknown tool: ${request.params.name}`
1120 | );
1121 | }
1122 | });
1123 |
1124 | // Register handler for prompt listing requests
1125 | server.setRequestHandler(ListPromptsRequestSchema, async () => {
1126 | // Return all available prompts
1127 | return { prompts: Object.values(PROMPTS) };
1128 | });
1129 |
1130 | // Register handler for prompt retrieval and execution
1131 | server.setRequestHandler(GetPromptRequestSchema, async (request) => {
1132 | // Extract and validate prompt name
1133 | const promptName = request.params.name as PromptName;
1134 | const prompt = PROMPTS[promptName];
1135 |
1136 | // Handle unknown prompt requests
1137 | if (!prompt) {
1138 | throw new McpError(ErrorCode.InvalidRequest, `Prompt not found: ${promptName}`);
1139 | }
1140 |
1141 | // Handle agentic research prompt
1142 | if (promptName === "agentic-research") {
1143 | // Extract research topic from request arguments
1144 | const args = request.params.arguments as AgenticResearchArgs | undefined;
1145 | const topic = args?.topic || ""; // Use empty string if no topic provided
1146 |
1147 | // Return research assistant prompt with instructions
1148 | return {
1149 | messages: [
1150 | // Initial assistant message establishing role
1151 | {
1152 | role: "assistant",
1153 | content: {
1154 | type: "text",
1155 | text: "I am ready to help you with your research. I will conduct thorough web research, explore topics deeply, and maintain a dialogue with you throughout the process."
1156 | }
1157 | },
1158 | // User message containing the detailed research instructions
1159 | {
1160 | role: "user",
1161 | content: {
1162 | type: "text",
1163 | text: `I'd like to research this topic: <topic>${topic}</topic>
1164 |
1165 | Please help me explore it deeply, like you're a thoughtful, highly-trained research assistant.
1166 |
1167 | General instructions:
1168 | 1. Start by proposing your research approach -- namely, formulate what initial query you will use to search the web. Propose a relatively broad search to understand the topic landscape. At the same time, make your queries optimized for returning high-quality results based on what you know about constructing Google search queries.
1169 | 2. Next, get my input on whether you should proceed with that query or if you should refine it.
1170 | 3. Once you have an approved query, perform the search.
1171 | 4. Prioritize high quality, authoritative sources when they are available and relevant to the topic. Avoid low quality or spammy sources.
1172 | 5. Retrieve information that is relevant to the topic at hand.
1173 | 6. Iteratively refine your research direction based on what you find.
1174 | 7. Keep me informed of what you find and let *me* guide the direction of the research interactively.
1175 | 8. If you run into a dead end while researching, do a Google search for the topic and attempt to find a URL for a relevant page. Then, explore that page in depth.
1176 | 9. Only conclude when my research goals are met.
1177 | 10. **Always cite your sources**, providing URLs to the sources you used in a citation block at the end of your response.
1178 |
1179 | You can use these tools:
1180 | - search_google: Search for information
1181 | - visit_page: Visit and extract content from web pages
1182 |
1183 | Do *NOT* use the following tools:
1184 | - Anything related to knowledge graphs or memory, unless explicitly instructed to do so by the user.`
1185 | }
1186 | }
1187 | ]
1188 | };
1189 | }
1190 |
1191 | // Handle unsupported prompt types
1192 | throw new McpError(ErrorCode.InvalidRequest, "Prompt implementation not found");
1193 | });
1194 |
1195 | // Ensures browser is running, and creates a new page if needed
1196 | async function ensureBrowser(): Promise<Page> {
1197 | // Launch browser if not already running
1198 | if (!browser) {
1199 | browser = await chromium.launch({
1200 | headless: true, // Run in headless mode for automation
1201 | });
1202 |
1203 | // Create initial context and page
1204 | const context = await browser.newContext();
1205 | page = await context.newPage();
1206 | }
1207 |
1208 | // Create new page if current one is closed/invalid
1209 | if (!page) {
1210 | const context = await browser.newContext();
1211 | page = await context.newPage();
1212 | }
1213 |
1214 | // Return the current page
1215 | return page;
1216 | }
1217 |
1218 | // Cleanup function
1219 | async function cleanup(): Promise<void> {
1220 | try {
1221 | // Clean up screenshots first
1222 | await cleanupScreenshots();
1223 |
1224 | // Then close the browser
1225 | if (browser) {
1226 | await browser.close();
1227 | }
1228 | } catch (error) {
1229 | console.error('Error during cleanup:', error);
1230 | } finally {
1231 | browser = undefined;
1232 | page = undefined;
1233 | }
1234 | }
1235 |
1236 | // Register cleanup handlers (note: Node.js does not wait for async work started in an 'exit' listener)
1237 | process.on('exit', cleanup);
1238 | process.on('SIGTERM', cleanup);
1239 | process.on('SIGINT', cleanup);
1240 | process.on('SIGHUP', cleanup);
```
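The `ReadResourceRequestSchema` handler above derives the screenshot index with `parseInt`, which accepts trailing garbage (for example, `parseInt("1abc", 10)` yields `1`). A minimal sketch of a stricter parse follows. The helper `parseScreenshotIndex` and the constant `SCREENSHOT_PREFIX` are hypothetical names introduced here for illustration and are not part of `index.ts`.

```typescript
// Hypothetical helper (not in index.ts): extract the index from a
// "research://screenshots/<n>" URI, or return undefined if the URI is not a
// well-formed, in-range screenshot URI.
const SCREENSHOT_PREFIX = "research://screenshots/";

function parseScreenshotIndex(uri: string, resultCount: number): number | undefined {
  // Not a screenshot URI at all
  if (!uri.startsWith(SCREENSHOT_PREFIX)) return undefined;

  const raw = uri.slice(SCREENSHOT_PREFIX.length);

  // Accept only plain decimal digits; parseInt alone would read "1abc" as 1
  if (!/^\d+$/.test(raw)) return undefined;

  const index = Number(raw);

  // Reject indices outside the current results array
  return index < resultCount ? index : undefined;
}
```

A handler could call this once and throw `McpError` with `ErrorCode.InvalidRequest` whenever the result is `undefined`, instead of checking `isNaN` and the bounds separately.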