# Directory Structure
```
├── package-lock.json
├── package.json
├── README.md
├── src
│   ├── ai
│   │   ├── page-interactions.ts
│   │   └── vision-analyzer.ts
│   ├── config.ts
│   ├── index.ts
│   ├── scrapers
│   │   ├── content-processor.ts
│   │   └── webpage-scraper.ts
│   ├── server
│   │   ├── mcp-server.ts
│   │   ├── tools.ts
│   │   └── transports.ts
│   ├── types
│   │   └── index.ts
│   └── utils
│       ├── html-helpers.ts
│       └── markdown-formatters.ts
└── tsconfig.json
```
# Files
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
# Puppeteer Vision MCP Server
This Model Context Protocol (MCP) server provides a tool for scraping webpages and converting them to Markdown using Puppeteer, Readability, and Turndown. It features AI-driven interaction capabilities to handle cookie banners, CAPTCHAs, and other interactive elements automatically.
**Now easily runnable via `npx`!**
## Features
- Scrapes webpages using Puppeteer with stealth mode
- Uses AI-powered interaction to automatically handle:
  - Cookie consent banners
  - CAPTCHAs
  - Newsletter or subscription prompts
  - Paywalls and login walls
  - Age verification prompts
  - Interstitial ads
  - Any other interactive elements blocking content
- Extracts main content with Mozilla's Readability
- Converts HTML to well-formatted Markdown
- Special handling for code blocks, tables, and other structured content
- Accessible via the Model Context Protocol
- Option to view browser interaction in real-time by disabling headless mode
- Easily consumable as an `npx` package.
## Quick Start with NPX
The recommended way to use this server is via `npx`, which fetches the published package from npm (or reuses a cached copy) so you do not need to clone or manually install anything.
1. **Prerequisites:** Ensure you have Node.js and npm installed.
2. **Environment Setup:**
The server requires an `OPENAI_API_KEY`. You can provide this and other optional configurations in two ways:
* **`.env` file:** Create a `.env` file in the directory where you will run the `npx` command.
* **Shell Environment Variables:** Export the variables in your terminal session.
**Example `.env` file or shell exports:**
```env
# Required
OPENAI_API_KEY=your_api_key_here
# Optional (defaults shown)
# VISION_MODEL=gpt-4.1
# API_BASE_URL=https://api.openai.com/v1 # Uncomment to override
# TRANSPORT_TYPE=stdio # Options: stdio, sse, http
# USE_SSE=true # Deprecated: use TRANSPORT_TYPE=sse instead
# PORT=3001 # Only used in sse/http modes
# DISABLE_HEADLESS=true # Uncomment to see the browser in action
```
3. **Run the Server:**
Open your terminal and run:
```bash
npx -y puppeteer-vision-mcp-server
```
* The `-y` flag automatically confirms any prompts from `npx`.
* This command will download (if not already cached) and execute the server.
* By default, it starts in `stdio` mode. Set `TRANSPORT_TYPE=sse` or `TRANSPORT_TYPE=http` for HTTP server modes.
## Using as an MCP Tool with NPX
This server is designed to be integrated as a tool within an MCP-compatible LLM orchestrator. Here's an example configuration snippet (the `//` comments are illustrative; remove them if your client requires strict JSON):
```json
{
"mcpServers": {
"web-scraper": {
"command": "npx",
"args": ["-y", "puppeteer-vision-mcp-server"],
"env": {
"OPENAI_API_KEY": "YOUR_OPENAI_API_KEY_HERE",
// Optional:
// "VISION_MODEL": "gpt-4.1",
// "API_BASE_URL": "https://api.example.com/v1",
// "TRANSPORT_TYPE": "stdio", // or "sse" or "http"
// "DISABLE_HEADLESS": "true" // To see the browser during operations
}
}
// ... other MCP servers
}
}
```
When configured this way, the MCP orchestrator will manage the lifecycle of the `puppeteer-vision-mcp-server` process.
## Environment Configuration Details
Regardless of how you run the server (NPX or local development), it uses the following environment variables:
- **`OPENAI_API_KEY`**: (Required) Your API key for accessing the vision model.
- **`VISION_MODEL`**: (Optional) The model to use for vision analysis.
  - Default: `gpt-4.1`
  - Can be any model with vision capabilities.
- **`API_BASE_URL`**: (Optional) Custom API endpoint URL.
  - Use this to connect to alternative OpenAI-compatible providers (e.g., Together.ai, Groq, Anthropic, local deployments).
- **`TRANSPORT_TYPE`**: (Optional) The transport protocol to use.
  - Options: `stdio` (default), `sse`, `http`
  - `stdio`: Direct process communication (recommended for most use cases)
  - `sse`: Server-Sent Events over HTTP (legacy mode)
  - `http`: Streamable HTTP transport with session management
- **`USE_SSE`**: (Optional, deprecated) Set to `true` to enable SSE mode over HTTP.
  - Deprecated: Use `TRANSPORT_TYPE=sse` instead.
- **`PORT`**: (Optional) The port for the HTTP server in SSE or HTTP mode.
  - Default: `3001`.
- **`DISABLE_HEADLESS`**: (Optional) Set to `true` to run the browser in visible mode.
  - Default: `false` (browser runs in headless mode).
## Communication Modes
The server supports three communication modes:
1. **stdio (Default)**: Communicates via standard input/output.
   - Perfect for direct integration with LLM tools that manage processes.
   - Ideal for command-line usage and scripting.
   - No HTTP server is started. This is the default mode.
2. **SSE mode**: Communicates via Server-Sent Events over HTTP.
   - Enable by setting `TRANSPORT_TYPE=sse` in your environment.
   - Starts an HTTP server on the specified `PORT` (default: 3001).
   - Use when you need to connect to the tool over a network.
   - Connect to: `http://localhost:3001/sse`
3. **HTTP mode**: Communicates via Streamable HTTP transport with session management.
   - Enable by setting `TRANSPORT_TYPE=http` in your environment.
   - Starts an HTTP server on the specified `PORT` (default: 3001).
   - Supports full session management and resumable connections.
   - Connect to: `http://localhost:3001/mcp`
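As a rough illustration of how the three modes map to connection targets, the sketch below resolves the transport from environment variables and returns the URL a client would connect to. The helper name `resolveEndpoint` and the `host` parameter are hypothetical and not part of this server; the fallback from the deprecated `USE_SSE` flag and the `/sse` and `/mcp` paths follow the descriptions above.

```typescript
// Hypothetical client-side helper; it is not part of this server's source.
const TRANSPORTS = ["stdio", "sse", "http"] as const;
type TransportType = (typeof TRANSPORTS)[number];

function isTransport(value: string): value is TransportType {
  return (TRANSPORTS as readonly string[]).includes(value);
}

function resolveEndpoint(
  env: Record<string, string | undefined>,
  host: string = "localhost", // assumption: the server runs on this machine
): { transport: TransportType; url: string | null } {
  // TRANSPORT_TYPE takes precedence; deprecated USE_SSE=true falls back to "sse".
  const raw = env.TRANSPORT_TYPE || (env.USE_SSE === "true" ? "sse" : "stdio");
  if (!isTransport(raw)) {
    throw new Error(`Invalid TRANSPORT_TYPE "${raw}"`);
  }
  if (raw === "stdio") {
    return { transport: raw, url: null }; // no HTTP server is started
  }
  const port = parseInt(env.PORT || "3001", 10);
  const path = raw === "sse" ? "/sse" : "/mcp";
  return { transport: raw, url: `http://${host}:${port}${path}` };
}

// prints "http://localhost:8080/mcp"
console.log(resolveEndpoint({ TRANSPORT_TYPE: "http", PORT: "8080" }).url);
```

For `stdio` the helper returns `url: null`, since in that mode the orchestrator talks to the process directly rather than over HTTP.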
## Tool Usage (MCP Invocation)
The server provides a `scrape-webpage` tool.
**Tool Parameters:**
- `url` (string, required): The URL of the webpage to scrape.
- `autoInteract` (boolean, optional, default: true): Whether to automatically handle interactive elements.
- `maxInteractionAttempts` (number, optional, default: 3): Maximum number of AI interaction attempts.
- `waitForNetworkIdle` (boolean, optional, default: true): Whether to wait for network to be idle before processing.
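For clients that speak MCP directly, a call to this tool is a JSON-RPC `tools/call` request. The following is a minimal sketch of building such a request; `buildScrapeRequest` and `ScrapeArgs` are hypothetical names, and the 0 to 10 range check mirrors the schema in `src/server/tools.ts`. Most MCP client libraries construct this envelope for you.

```typescript
// Hypothetical helper for illustration; not part of this server.
interface ScrapeArgs {
  url: string;
  autoInteract?: boolean;
  maxInteractionAttempts?: number;
  waitForNetworkIdle?: boolean;
}

function buildScrapeRequest(id: number, args: ScrapeArgs) {
  // Fail early on malformed URLs (the URL constructor throws a TypeError).
  new URL(args.url);
  const attempts = args.maxInteractionAttempts;
  if (
    attempts !== undefined &&
    (!Number.isInteger(attempts) || attempts < 0 || attempts > 10)
  ) {
    throw new RangeError("maxInteractionAttempts must be an integer from 0 to 10");
  }
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "scrape-webpage", arguments: args },
  };
}

const request = buildScrapeRequest(1, {
  url: "https://example.com",
  maxInteractionAttempts: 2,
});
console.log(JSON.stringify(request, null, 2));
```

Omitted optional parameters are simply left out of `arguments`, so the server-side defaults listed above apply.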
**Response Format:**
The tool returns its result in a structured format:
- **`content`**: An array containing a single text object with the raw markdown of the scraped webpage.
- **`_meta`**: Contains additional information. (`src/server/tools.ts` emits this object under the `_meta` key; the `ToolResponse` type in `src/types/index.ts` names it `metadata`.)
  - `message`: Status message.
  - `success`: Boolean indicating success.
  - `contentSize`: Size of the content in characters (on success).
- **`isError`**: Boolean that is `true` when scraping failed.
*Example Success Response:*
```json
{
  "content": [
    {
      "type": "text",
      "text": "# Page Title\n\nThis is the content..."
    }
  ],
  "_meta": {
    "message": "Scraping successful",
    "success": true,
    "contentSize": 8734
  },
  "isError": false
}
```
*Example Error Response:*
```json
{
  "content": [
    {
      "type": "text",
      "text": ""
    }
  ],
  "_meta": {
    "message": "Error scraping webpage: Failed to load the URL",
    "success": false
  },
  "isError": true
}
```
```
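A client might unpack this result along the following lines. This is a sketch under assumptions: `readScrapeResult` is a hypothetical helper, and it accepts the metadata object under either `_meta` (the key `src/server/tools.ts` emits) or `metadata` (the name used by the `ToolResponse` type), which is a defensive choice rather than documented behavior.

```typescript
// Hypothetical client-side types and helper; not part of this server.
interface ScrapeMeta {
  message: string;
  success: boolean;
  contentSize?: number;
}

interface ScrapeToolResult {
  content: { type: "text"; text: string }[];
  _meta?: ScrapeMeta;
  metadata?: ScrapeMeta;
  isError?: boolean;
}

function readScrapeResult(result: ScrapeToolResult): {
  ok: boolean;
  markdown: string;
  message: string;
} {
  const meta = result._meta ?? result.metadata;
  const markdown = result.content[0]?.text ?? "";
  // Treat the call as failed if either isError or the metadata says so.
  const ok = result.isError !== true && (meta?.success ?? markdown.length > 0);
  return { ok, markdown, message: meta?.message ?? "" };
}

const outcome = readScrapeResult({
  content: [{ type: "text", text: "# Page Title\n\nThis is the content..." }],
  _meta: { message: "Scraping successful", success: true, contentSize: 36 },
});
console.log(outcome.ok, outcome.message);
```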
## How It Works
### AI-Driven Interaction
The system uses vision-capable AI models (configurable via `VISION_MODEL` and `API_BASE_URL`) to analyze screenshots of web pages and decide on actions like clicking, typing, or scrolling to bypass overlays and consent forms. This process repeats up to `maxInteractionAttempts`.
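The loop described above can be sketched without a browser by injecting the two moving parts as functions. This is a simplified illustration, not the code in `src/ai/page-interactions.ts`: `interactionLoop`, `analyze`, and `execute` are hypothetical names, and continuing to the next attempt after a failed action is an assumption.

```typescript
// Simplified, hypothetical sketch of the analyze-then-act loop.
type Action = {
  action: "click" | "scroll" | "type" | "wait" | "none";
  reason?: string;
};

async function interactionLoop(
  analyze: () => Promise<Action>, // stands in for screenshot + vision model call
  execute: (action: Action) => Promise<boolean>, // stands in for the Puppeteer action
  maxAttempts: number = 3,
): Promise<number> {
  let performed = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const action = await analyze();
    if (action.action === "none") break; // nothing is blocking the content
    if (await execute(action)) performed++;
  }
  return performed;
}

// Scripted stand-in for the model: one cookie banner, then a clear page.
const scripted: Action[] = [
  { action: "click", reason: "cookie banner" },
  { action: "none", reason: "No interaction needed" },
];
let step = 0;
interactionLoop(
  async (): Promise<Action> => scripted[step++] ?? { action: "none" },
  async () => true,
).then((count) => console.log(`performed ${count} action(s)`)); // prints "performed 1 action(s)"
```

Injecting `analyze` and `execute` keeps the control flow testable without launching Puppeteer or calling a vision model.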
### Content Extraction
After interactions, Mozilla's Readability extracts the main content, which is then sanitized and converted to Markdown using Turndown with custom rules for code blocks and tables.
## Installation & Development (for Modifying the Code)
If you wish to contribute, modify the server, or run a local development version:
1. **Clone the Repository:**
```bash
git clone https://github.com/djannot/puppeteer-vision-mcp.git
cd puppeteer-vision-mcp
```
2. **Install Dependencies:**
```bash
npm install
```
3. **Build the Project:**
```bash
npm run build
```
4. **Set Up Environment:**
Create a `.env` file in the project's root directory with your `OPENAI_API_KEY` and any other desired configurations (see "Environment Configuration Details" above).
5. **Run for Development:**
```bash
npm start # Starts the server using the local build
```
Or, to rebuild and then start the server in one step (the `dev` script runs `npm run build && npm start`; it does not watch for changes):
```bash
npm run dev
```
## Customization (for Developers)
You can modify the behavior of the scraper by editing:
- `src/ai/vision-analyzer.ts` (`analyzePageWithAI` function): Customize the AI prompt.
- `src/ai/page-interactions.ts` (`executeAction` function): Add new action types.
- `src/scrapers/webpage-scraper.ts` (`visitWebPage` function): Change Puppeteer options.
- `src/utils/markdown-formatters.ts`: Adjust Turndown rules for Markdown conversion.
## Dependencies
Key dependencies include:
- `@modelcontextprotocol/sdk`
- `puppeteer`, `puppeteer-extra`
- `@mozilla/readability`, `jsdom`
- `turndown`, `sanitize-html`
- `openai` (or compatible API for vision models)
- `express` (for SSE mode)
- `zod`
```
--------------------------------------------------------------------------------
/src/index.ts:
--------------------------------------------------------------------------------
```typescript
#!/usr/bin/env node
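import dotenv from 'dotenv';
// Import the main server module
import { createMcpServer } from './server/mcp-server.js';
// ES module imports are hoisted, so the import above is evaluated before this call runs.
// src/config.ts therefore calls dotenv.config() itself; this call is kept as a harmless safeguard.
dotenv.config();
// Start the server
createMcpServer();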
```
--------------------------------------------------------------------------------
/src/utils/html-helpers.ts:
--------------------------------------------------------------------------------
```typescript
/**
* Recursively marks parent elements of code blocks to ensure they're preserved
* @param node The DOM node to check and mark
*/
export function markCodeParents(node: Element | null) {
if (!node) return;
// If the node contains a <pre> or <code>, mark it
if (node.querySelector('pre, code')) {
node.classList.add('article-content');
node.setAttribute('data-readable-content-score', '100');
}
// Recursively mark parents
markCodeParents(node.parentElement);
}
```
--------------------------------------------------------------------------------
/tsconfig.json:
--------------------------------------------------------------------------------
```json
{
"$schema": "https://json.schemastore.org/tsconfig",
"compilerOptions": {
"declaration": true,
"declarationMap": true,
"esModuleInterop": true,
"incremental": false,
"isolatedModules": true,
"lib": ["ES2015", "DOM", "DOM.Iterable"],
"module": "NodeNext",
"moduleDetection": "force",
"moduleResolution": "NodeNext",
"noUncheckedIndexedAccess": true,
"resolveJsonModule": true,
"skipLibCheck": true,
"strict": true,
"target": "ES2022",
"outDir": "./build",
"rootDir": "./src"
},
"include": [
"src/**/*"
],
"exclude": [
"dist",
"build",
"node_modules"
]
}
```
--------------------------------------------------------------------------------
/src/server/mcp-server.ts:
--------------------------------------------------------------------------------
```typescript
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { registerTools } from './tools.js';
import { setupTransport } from './transports.js';
// Server information
const serverName = "web-scraper-mcp-server";
const serverVersion = "1.0.0";
/**
* Creates and initializes the MCP server
* @returns The initialized MCP server instance
*/
export function createMcpServer(): McpServer {
// Create the MCP server instance
const server = new McpServer({
name: serverName,
version: serverVersion,
capabilities: {},
});
// Register available tools
registerTools(server);
// Configure transport
setupTransport(server, serverName, serverVersion);
return server;
}
```
--------------------------------------------------------------------------------
/src/types/index.ts:
--------------------------------------------------------------------------------
```typescript
import { Page, ElementHandle } from 'puppeteer';
// AI action recommendation types
export interface AIAction {
action: 'click' | 'scroll' | 'type' | 'wait' | 'none';
targetText?: string;
targetSelector?: string;
inputText?: string;
scrollAmount?: number;
waitTime?: number;
reason?: string;
}
// Scraper options
export interface WebpageScrapeOptions {
url: string;
autoInteract?: boolean;
maxInteractionAttempts?: number;
waitForNetworkIdle?: boolean;
}
// Scraper result
export interface ScrapeResult {
data?: string;
error?: { message: string };
}
// MCP tool response - updated to match MCP SDK expectations
export interface ToolResponse {
content: {
type: "text";
text: string;
}[];
metadata?: {
message: string;
success: boolean;
contentSize?: number;
};
isError?: boolean;
}
```
--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
```json
{
"name": "puppeteer-vision-mcp-server",
"version": "0.2.0",
"description": "MCP Server for scraping webpages and converting to markdown",
"main": "build/index.js",
"type": "module",
"bin": {
"puppeteer-vision-mcp-server": "./build/index.js"
},
"scripts": {
"build": "tsc && chmod 755 build/index.js",
"start": "node build/index.js",
"dev": "npm run build && npm start"
},
"files": [
"build"
],
"keywords": [
"mcp",
"puppeteer",
"web-scraper",
"markdown"
],
"author": "Denis Jannot",
"license": "Apache-2.0",
"dependencies": {
"@modelcontextprotocol/sdk": "^1.9.0",
"@mozilla/readability": "^0.5.0",
"dotenv": "^16.4.7",
"express": "^5.1.0",
"jsdom": "^24.0.0",
"openai": "^4.89.0",
"puppeteer": "^22.3.0",
"puppeteer-extra": "^3.3.6",
"puppeteer-extra-plugin-stealth": "^2.11.2",
"sanitize-html": "^2.13.0",
"turndown": "^7.1.2",
"zod": "^3.24.2"
},
"devDependencies": {
"@types/express": "^5.0.1",
"@types/jsdom": "^21.1.6",
"@types/node": "^22.14.0",
"@types/sanitize-html": "^2.9.5",
"@types/turndown": "^5.0.4",
"typescript": "^5.4.2"
}
}
```
--------------------------------------------------------------------------------
/src/config.ts:
--------------------------------------------------------------------------------
```typescript
import dotenv from 'dotenv';
// Load environment variables
dotenv.config();
// Configuration & Environment Check
export const config = {
apiKey: process.env.OPENAI_API_KEY,
visionModel: process.env.VISION_MODEL || 'gpt-4.1',
apiBaseUrl: process.env.API_BASE_URL,
serverPort: parseInt(process.env.PORT || '3001', 10),
useSSE: process.env.USE_SSE === 'true',
transportType: process.env.TRANSPORT_TYPE || (process.env.USE_SSE === 'true' ? 'sse' : 'stdio'), // 'stdio', 'sse', or 'http'
headless: process.env.DISABLE_HEADLESS !== 'true', // Default to headless mode unless explicitly disabled
};
// Validate essential configuration
if (!config.apiKey) {
console.error("Error: OPENAI_API_KEY environment variable is not set.");
process.exit(1);
}
// Validate transport type
if (!['stdio', 'sse', 'http'].includes(config.transportType)) {
console.error(`Error: Invalid TRANSPORT_TYPE "${config.transportType}". Must be 'stdio', 'sse', or 'http'.`);
process.exit(1);
}
// Configure API client settings
export const apiConfig: any = {
apiKey: config.apiKey,
};
// Add custom base URL if provided
// Diagnostics are written to stderr: in stdio mode, stdout carries the MCP protocol stream
if (config.apiBaseUrl) {
apiConfig.baseURL = config.apiBaseUrl;
console.error(`Using custom API endpoint: ${config.apiBaseUrl}`);
}
console.error(`Using vision model: ${config.visionModel}`);
console.error(`Transport type: ${config.transportType}`);
console.error(`Browser mode: ${config.headless ? 'headless' : 'visible'}`);
```
--------------------------------------------------------------------------------
/src/scrapers/content-processor.ts:
--------------------------------------------------------------------------------
```typescript
import { JSDOM } from 'jsdom';
import { Readability } from '@mozilla/readability';
import sanitizeHtml from 'sanitize-html';
import { configureTurndownService } from '../utils/markdown-formatters.js';
import { markCodeParents } from '../utils/html-helpers.js';
/**
* Processes HTML content to extract the main content and convert it to Markdown
* @param htmlContent The raw HTML content to process
* @returns Markdown formatted content
*/
export async function processHtmlContent(htmlContent: string): Promise<string> {
// Create DOM
const dom = new JSDOM(htmlContent);
const document = dom.window.document;
// Mark code blocks to influence Readability scoring
const preElements = document.querySelectorAll('pre');
preElements.forEach(pre => {
// Add classes that Readability considers as content
pre.classList.add('article-content');
// Set a very high readable score
pre.setAttribute('data-readable-content-score', '100');
// Make parent more likely to be kept
if (pre.parentElement) {
pre.parentElement.classList.add('article-content');
}
});
document.querySelectorAll('pre, code').forEach(pre => {
markCodeParents(pre.parentElement);
});
// Modify Readability options to be more lenient
const readerOptions = {
charThreshold: 20,
classesToPreserve: ['article-content'],
};
// Now run Readability
const reader = new Readability(document, readerOptions);
const article = reader.parse();
// Continue with sanitization and markdown conversion
if (!article) {
throw new Error('Failed to parse the article content.');
}
const cleanHtml = sanitizeHtml(article.content, {
allowedTags: [
'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'a', 'ul', 'ol',
'li', 'b', 'i', 'strong', 'em', 'code', 'pre',
'div', 'span', 'table', 'thead', 'tbody', 'tr', 'th', 'td'
],
allowedAttributes: {
'a': ['href'],
'pre': ['class', 'data-language'],
'code': ['class', 'data-language'],
'div': ['class'],
'span': ['class']
}
});
const turndownService = configureTurndownService();
const markdown = turndownService.turndown(cleanHtml);
return markdown;
}
```
--------------------------------------------------------------------------------
/src/ai/vision-analyzer.ts:
--------------------------------------------------------------------------------
```typescript
import { OpenAI } from 'openai';
import { apiConfig, config } from '../config.js';
import { AIAction } from '../types/index.js';
// Initialize the OpenAI client
const openai = new OpenAI(apiConfig);
/**
* Analyzes a screenshot of a webpage using AI vision to determine if interactions are needed
* @param base64Image Screenshot in base64 format
* @returns A recommended action to take on the page
*/
export async function analyzePageWithAI(base64Image: string): Promise<AIAction> {
const genericInteractionPrompt = `
You are an AI assistant helping to navigate a webpage. Analyze this screenshot and determine if there are any interactions needed to proceed with normal browsing.
Look for elements such as:
1. Cookie consent banners or popups (buttons like "Accept", "I agree", "Accept all", etc.)
2. CAPTCHA challenges
3. Login walls or paywalls
4. Newsletter or subscription prompts
5. Age verification prompts
6. Interstitial ads
7. "Continue reading" buttons
8. Any other interactive element blocking normal content viewing
If you identify any such element, respond with a JSON object specifying:
- "action": The action to take ("click", "type", "scroll", "wait", or "none")
- "targetText": The exact text of any button to click
- "targetSelector": (Optional) A CSS selector if the element has no visible text
- "inputText": Text to input if required
- "scrollAmount": Pixels to scroll if needed
- "waitTime": Time to wait in milliseconds if needed
- "reason": A brief explanation of what you identified and why this action is recommended
If no action is needed, respond with: {"action": "none", "reason": "No interaction needed"}
IMPORTANT: Your response must be valid JSON.
`;
try {
const response = await openai.chat.completions.create({
model: config.visionModel,
messages: [
{
role: "user",
content: [
{
type: "text",
text: genericInteractionPrompt
},
{
type: "image_url",
image_url: {
url: `data:image/png;base64,${base64Image}`, // screenshots are captured as PNG (Puppeteer's default)
detail: "high"
}
}
]
}
],
response_format: { type: "json_object" },
max_tokens: 500
});
const content = response.choices[0]?.message.content || '{"action": "none", "reason": "Failed to get response"}';
console.log("AI analysis:", content);
return JSON.parse(content) as AIAction;
} catch (e) {
// This catch covers both API request failures and JSON parsing errors
console.error('AI analysis failed:', e);
return {
action: 'none',
reason: 'AI analysis failed (request or response parsing error)'
};
}
}
```
--------------------------------------------------------------------------------
/src/utils/markdown-formatters.ts:
--------------------------------------------------------------------------------
```typescript
import TurndownService from 'turndown';
/**
* Configures a Turndown service with custom rules for code blocks, tables, etc.
* @returns A configured TurndownService instance
*/
export function configureTurndownService(): TurndownService {
const turndownService = new TurndownService({
codeBlockStyle: 'fenced',
headingStyle: 'atx'
});
// Code blocks rule
turndownService.addRule('codeBlocks', {
filter: function (node) {
return node.nodeName === 'PRE';
},
replacement: function (content, node) {
// Cast node to HTMLElement to access getAttribute
const htmlNode = node as HTMLElement;
const code = htmlNode.querySelector('code');
const language = code?.className?.match(/language-(\w+)/)?.[1] ||
htmlNode.getAttribute('data-language') ||
'yaml';
const cleanContent = content
.replace(/^\n+|\n+$/g, '')
.replace(/\n\n+/g, '\n');
return `\n\`\`\`${language}\n${cleanContent}\n\`\`\`\n`;
}
});
// Table cell rule
turndownService.addRule('tableCell', {
filter: ['th', 'td'],
replacement: function (content, node) {
const htmlNode = node as HTMLElement;
// Extract text from nested paragraph if present
let cellContent = '';
if (htmlNode.querySelector('p')) {
cellContent = Array.from(htmlNode.querySelectorAll('p'))
.map(p => p.textContent || '')
.join(' ')
.trim();
} else {
cellContent = content.trim();
}
return ` ${cellContent.replace(/\|/g, '\\|')} |`;
}
});
// Table row rule
turndownService.addRule('tableRow', {
filter: 'tr',
replacement: function (content, node) {
const htmlNode = node as HTMLTableRowElement;
const cells = Array.from(htmlNode.cells);
const isHeader = htmlNode.parentNode?.nodeName === 'THEAD';
let output = '|' + content.trimEnd();
// If this is a header row, add the separator row without extra newline
if (isHeader) {
const separator = cells.map(() => '---').join(' | ');
output += '\n|' + separator + '|';
}
// Only add newline if not a header row or if there's no next row
if (!isHeader || !htmlNode.nextElementSibling) {
output += '\n';
}
return output;
}
});
// Table rule
turndownService.addRule('table', {
filter: 'table',
replacement: function (content) {
// Clean up any potential double newlines
return '\n' + content.replace(/\n+/g, '\n').trim() + '\n';
}
});
// Custom rule that emits an empty cell for table cells containing only whitespace
turndownService.addRule('preserveTableWhitespace', {
filter: function (node) {
return (
(node.nodeName === 'TD' || node.nodeName === 'TH') &&
node.textContent?.trim().length === 0
);
},
replacement: function () {
return ' |';
}
});
return turndownService;
}
```
--------------------------------------------------------------------------------
/src/scrapers/webpage-scraper.ts:
--------------------------------------------------------------------------------
```typescript
import puppeteerExtraImport from 'puppeteer-extra';
import StealthPluginImport from 'puppeteer-extra-plugin-stealth';
import { handlePageInteractions } from '../ai/page-interactions.js';
import { processHtmlContent } from './content-processor.js';
import { ScrapeResult, WebpageScrapeOptions } from '../types/index.js';
import { config } from '../config.js';
// Work around TypeScript issues with puppeteer-extra
const puppeteerExtra = puppeteerExtraImport as any;
const StealthPlugin = StealthPluginImport as any;
// Apply stealth plugin
puppeteerExtra.use(StealthPlugin());
/**
* Visits a webpage, handles interactions, and extracts content
* @param options Configuration options for the scraping operation
* @returns Markdown content or error message
*/
export async function visitWebPage({
url,
autoInteract = true,
maxInteractionAttempts = 3,
waitForNetworkIdle = true,
}: WebpageScrapeOptions): Promise<ScrapeResult> {
// Launch puppeteer with stealth plugin and respect headless configuration
const browser = await puppeteerExtra.launch({
headless: config.headless ? "new" : false, // Use config.headless setting
args: ['--no-sandbox', '--disable-setuid-sandbox'],
});
try {
console.log(`Visiting webpage: ${url}`);
const page = await browser.newPage();
// Set viewport to a standard desktop size
await page.setViewport({ width: 1280, height: 800 });
// Navigate to the URL
await page.goto(url, {
waitUntil: waitForNetworkIdle ? 'networkidle2' : 'domcontentloaded'
});
// Allow initial page load to complete
await new Promise(resolve => setTimeout(resolve, 2000));
// Handle page interactions if enabled
if (autoInteract) {
console.log("Checking for interactive elements that need handling...");
await handlePageInteractions(page, maxInteractionAttempts);
}
// Extract content after handling interactions
const htmlContent: string = await page.evaluate(() => {
// Try to select the main content area, fallback to the body if no specific selector
const main = document.querySelector('main') ||
document.querySelector('article') ||
document.querySelector('.content') ||
document.querySelector('#content') ||
document.body;
return main.innerHTML;
});
// Process the HTML content
const markdown = await processHtmlContent(htmlContent);
console.log(`Successfully scraped and converted to markdown: ${url}`);
return { data: markdown };
} catch (error) {
const message = error instanceof Error ? error.message : "An unknown error occurred";
console.error(`Error scraping ${url}:`, message);
return { error: { message } };
} finally {
// Always close the browser, on success and on failure alike
await browser.close();
}
}
```
--------------------------------------------------------------------------------
/src/server/tools.ts:
--------------------------------------------------------------------------------
```typescript
import { z } from 'zod';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { visitWebPage } from '../scrapers/webpage-scraper.js';
import { ScrapeResult } from '../types/index.js';
/**
* Registers MCP tools with the server
* @param server The MCP server instance
*/
export function registerTools(server: McpServer): void {
server.tool(
"scrape-webpage",
"Scrapes a webpage and converts it to markdown format",
{
url: z.string().url().describe("The URL of the webpage to scrape"),
autoInteract: z.boolean().optional().default(true).describe("Whether to automatically handle interactive elements like cookies, captchas, etc."),
maxInteractionAttempts: z.number().int().min(0).max(10).optional().default(3).describe("Maximum number of interaction attempts"),
waitForNetworkIdle: z.boolean().optional().default(true).describe("Whether to wait for network to be idle before processing")
},
async ({ url, autoInteract, maxInteractionAttempts, waitForNetworkIdle }, _extra) => {
console.log(`Received scrape request for URL: ${url}, autoInteract: ${autoInteract}, maxAttempts: ${maxInteractionAttempts}`);
try {
const result = await visitWebPage({
url,
autoInteract,
maxInteractionAttempts,
waitForNetworkIdle
});
if (result.error) {
return createErrorResponse(result.error.message);
}
// Limit the size of returned content if too large
const maxLength = 100000; // Set a reasonable limit
let markdownContent = result.data || "";
let message = "Scraping successful";
if (markdownContent.length > maxLength) {
// Record the original size before truncating so the message reports it accurately
const originalLength = markdownContent.length;
markdownContent = markdownContent.substring(0, maxLength);
message = `Content truncated to ${maxLength} characters (original size: ${originalLength} characters)`;
}
console.log(`Scraping successful. Payload size: ${markdownContent.length} chars.`);
return createSuccessResponse(markdownContent, message);
} catch (error: any) {
console.error("Error processing 'scrape-webpage' tool:", error);
return createErrorResponse(`Error scraping webpage: ${error.message}`);
}
}
);
}
/**
* Creates a success response for the MCP tool
* @param text The markdown text content
* @param message An optional message to include
* @returns The formatted tool response
*/
function createSuccessResponse(text: string, message: string = "Scraping successful") {
return {
content: [{ type: "text" as const, text }],
_meta: {
message,
success: true,
contentSize: text.length
},
isError: false
};
}
/**
* Creates an error response for the MCP tool
* @param message The error message
* @returns The formatted tool response
*/
function createErrorResponse(message: string) {
return {
content: [{ type: "text" as const, text: "" }],
_meta: {
message: message,
success: false
},
isError: true
};
}
```
--------------------------------------------------------------------------------
/src/ai/page-interactions.ts:
--------------------------------------------------------------------------------
```typescript
import fs from 'fs';
import { Page } from 'puppeteer';
import { AIAction } from '../types/index.js';
import { analyzePageWithAI } from './vision-analyzer.js';
/**
* Executes an AI-recommended action on the page
* @param page Puppeteer page instance
* @param action The action to execute
* @returns Whether the action was successfully executed
*/
export async function executeAction(page: Page, action: AIAction): Promise<boolean> {
console.log(`Executing action: ${action.action}`, action);
switch (action.action) {
case 'click':
if (action.targetText) {
// Try to click by text content
await clickElementsByText(page, action.targetText);
return true;
} else if (action.targetSelector) {
// Try to click by CSS selector
try {
await page.waitForSelector(action.targetSelector, { timeout: 5000 });
await page.click(action.targetSelector);
console.log(`Clicked element with selector: ${action.targetSelector}`);
return true;
} catch (error) {
console.error(`Failed to click element with selector ${action.targetSelector}:`, error);
return false;
}
}
return false;
case 'type':
if (action.targetSelector && action.inputText) {
try {
await page.waitForSelector(action.targetSelector, { timeout: 5000 });
await page.type(action.targetSelector, action.inputText);
console.log(`Typed "${action.inputText}" into element with selector: ${action.targetSelector}`);
return true;
} catch (error) {
console.error(`Failed to type into element with selector ${action.targetSelector}:`, error);
return false;
}
}
return false;
case 'scroll':
if (action.scrollAmount) {
await page.evaluate((amount) => window.scrollBy(0, amount), action.scrollAmount);
console.log(`Scrolled by ${action.scrollAmount} pixels`);
return true;
}
return false;
case 'wait':
if (action.waitTime) {
await new Promise(resolve => setTimeout(resolve, action.waitTime));
console.log(`Waited for ${action.waitTime} milliseconds`);
return true;
}
return false;
default:
console.log("No action taken");
return false;
}
}
/**
* Clicks on elements that match the target text across all frames
* @param page Puppeteer page instance
* @param targetText The text to search for in clickable elements
*/
export async function clickElementsByText(page: Page, targetText: string): Promise<void> {
const frames = page.frames();
const searchText = targetText.toLowerCase();
// Process all frames in parallel using Promise.all
const results = await Promise.all(
frames.map(async (frame) => {
try {
// 1. Gather indexes of all matching elements
const elementIndexes = await frame.$$eval(
'a, button',
(elements, t) => {
const matches: number[] = [];
elements.forEach((el, idx) => {
if (el.textContent?.toLowerCase().includes(t)) {
matches.push(idx);
}
});
return matches;
},
searchText
);
if (elementIndexes.length === 0) {
return {
frame: frame.name() || frame.url(),
found: false,
count: 0
};
}
// 2. Click matching elements one at a time. Concurrent clicks share the
// page's single mouse and can interfere with each other.
const allElements = await frame.$$('a, button');
let clicked = 0;
for (const idx of elementIndexes) {
try {
if (allElements[idx]) {
await allElements[idx].click();
clicked += 1;
}
} catch (clickError) {
console.error(`Error clicking element at index ${idx}:`, clickError);
}
}
return {
frame: frame.name() || frame.url(),
found: true,
// Report only the clicks that actually succeeded
count: clicked
};
} catch (error) {
console.error(`Error in frame "${frame.name() || frame.url()}"`, error);
return {
frame: frame.name() || frame.url(),
found: false,
count: 0,
error
};
}
})
);
// Aggregate and log results
const successfulFrames = results.filter(r => r.found);
const totalClicks = successfulFrames.reduce((sum, r) => sum + r.count, 0);
if (successfulFrames.length > 0) {
console.log(`Found and clicked ${totalClicks} element(s) with text "${targetText}" across ${successfulFrames.length} frame(s):`);
successfulFrames.forEach(r => {
console.log(`- Frame "${r.frame}": ${r.count} element(s)`);
});
} else {
console.log(`No elements with text "${targetText}" were found in any frame.`);
}
}
/**
* Handles interactions with the page using AI vision analysis
* @param page Puppeteer page instance
* @param maxAttempts Maximum number of interaction attempts
* @returns Whether any interactions were performed
*/
export async function handlePageInteractions(page: Page, maxAttempts: number = 3): Promise<boolean> {
let interactionFound = false;
let attempts = 0;
while (attempts < maxAttempts) {
console.log(`Interaction attempt ${attempts + 1}/${maxAttempts}`);
// Take screenshot of the current page state
const screenshot = await page.screenshot({ encoding: 'base64' }) as string;
// Save the screenshot to the working directory for debugging (runs on every attempt)
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
const filename = `screenshot-interaction-${timestamp}.png`;
const buffer = Buffer.from(screenshot, 'base64');
await fs.promises.writeFile(filename, buffer);
console.log(`Saved screenshot to ${filename}`);
// Analyze the page using AI
const action = await analyzePageWithAI(screenshot);
// If no interaction needed, we're done
if (action.action === 'none') {
console.log("No interactions needed:", action.reason);
return interactionFound;
}
// Try to execute the recommended action
const actionSuccess = await executeAction(page, action);
if (actionSuccess) {
interactionFound = true;
console.log(`Successfully executed ${action.action} action: ${action.reason}`);
// Wait for any page changes to settle
await new Promise(resolve => setTimeout(resolve, 2000));
} else {
console.log(`Failed to execute ${action.action} action`);
}
attempts += 1;
}
return interactionFound;
}
```
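The attempt loop in `handlePageInteractions` can be exercised without a browser or an API key if the analysis and execution steps are passed in as callbacks. The following is a minimal sketch under that assumption; `runInteractionLoop`, `SketchAction`, and the `analyze`/`execute` parameters are hypothetical names introduced for illustration and are not part of this repository. Screenshot capture and the settle delay are left out.

```typescript
// Hypothetical, simplified shape of the action returned by the AI analysis step.
interface SketchAction {
  action: 'click' | 'type' | 'scroll' | 'wait' | 'none';
  reason: string;
}

/**
 * Runs the same bounded loop as handlePageInteractions, with the analysis
 * and execution steps injected so the control flow can be tested in isolation.
 * Returns true if at least one action was executed successfully.
 */
async function runInteractionLoop(
  analyze: () => Promise<SketchAction>,
  execute: (action: SketchAction) => Promise<boolean>,
  maxAttempts: number = 3
): Promise<boolean> {
  let interactionFound = false;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const action = await analyze();
    // Stop early once the analyzer reports that nothing blocks the content
    if (action.action === 'none') {
      return interactionFound;
    }
    // A failed action still consumes an attempt, as in the original loop
    if (await execute(action)) {
      interactionFound = true;
    }
  }
  return interactionFound;
}
```

Injecting the two steps keeps the loop's stopping rules (early exit on `'none'`, hard cap at `maxAttempts`) checkable with plain stubs, which may be useful because the real analysis step calls a paid API.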
--------------------------------------------------------------------------------
/src/server/transports.ts:
--------------------------------------------------------------------------------
```typescript
import express, { Request, Response } from 'express';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { SSEServerTransport } from '@modelcontextprotocol/sdk/server/sse.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { randomUUID } from 'crypto';
import { config } from '../config.js';
/**
* Sets up the appropriate transport for the MCP server
* @param server The MCP server instance
* @param serverName Server name for logging
* @param serverVersion Server version for logging
*/
export function setupTransport(server: McpServer, serverName: string, serverVersion: string): void {
switch (config.transportType) {
case 'stdio':
setupStdioTransport(server, serverName, serverVersion);
break;
case 'sse':
setupSSETransport(server, serverName, serverVersion);
break;
case 'http':
setupHTTPTransport(server, serverName, serverVersion);
break;
default:
console.error(`Unknown transport type: ${config.transportType}. Use 'stdio', 'sse', or 'http'.`);
process.exit(1);
}
}
/**
* Sets up an SSE transport over HTTP
* @param server The MCP server instance
* @param serverName Server name for logging
* @param serverVersion Server version for logging
*/
function setupSSETransport(server: McpServer, serverName: string, serverVersion: string): void {
// Setup Express server for SSE mode
const app = express();
// To support multiple simultaneous connections, keep a lookup from
// sessionId to its transport
const transports: {[sessionId: string]: SSEServerTransport} = {};
app.get("/sse", async (_: Request, res: Response) => {
console.error('Received SSE connection request');
const transport = new SSEServerTransport('/messages', res);
transports[transport.sessionId] = transport;
res.on("close", () => {
console.error(`SSE connection closed for session ${transport.sessionId}`);
delete transports[transport.sessionId];
});
await server.connect(transport);
});
app.post("/messages", async (req: Request, res: Response) => {
console.error('Received SSE message POST request');
const sessionId = req.query.sessionId as string;
const transport = transports[sessionId];
if (transport) {
await transport.handlePostMessage(req, res);
} else {
console.error(`No SSE transport found for sessionId: ${sessionId}`);
res.status(400).send('No transport found for sessionId');
}
});
const webserver = app.listen(config.serverPort, () => {
console.error(`${serverName} v${serverVersion} is running on port ${config.serverPort} with SSE transport`);
console.error(`Connect to: http://localhost:${config.serverPort}/sse`);
if (config.apiBaseUrl) {
console.error(`Using custom API endpoint: ${config.apiBaseUrl}`);
}
});
webserver.keepAliveTimeout = 3000;
// Log server-level errors (for example, the port is already in use)
webserver.on('error', (error) => {
console.error('HTTP server error:', error);
});
// Handle server shutdown
process.on('SIGINT', async () => {
console.error('Shutting down SSE server...');
// Close all active SSE transports
for (const [sessionId, transport] of Object.entries(transports)) {
try {
console.error(`Closing SSE transport for session ${sessionId}`);
// Close the transport so the underlying SSE response is ended
await transport.close();
delete transports[sessionId];
} catch (error) {
console.error(`Error cleaning up SSE transport for session ${sessionId}:`, error);
}
}
console.error('SSE server shutdown complete');
process.exit(0);
});
// Prevent the process from exiting
process.stdin.resume();
}
/**
* Sets up an HTTP transport for web-based communication
* @param server The MCP server instance
* @param serverName Server name for logging
* @param serverVersion Server version for logging
*/
function setupHTTPTransport(server: McpServer, serverName: string, serverVersion: string): void {
console.error("Starting MCP server with HTTP transport...");
const app = express();
const transports: Map<string, StreamableHTTPServerTransport> = new Map<string, StreamableHTTPServerTransport>();
// Handle POST requests for MCP initialization and method calls
app.post('/mcp', async (req: Request, res: Response) => {
console.error('Received MCP POST request');
try {
// Check for existing session ID
const sessionId = req.headers['mcp-session-id'] as string | undefined;
let transport: StreamableHTTPServerTransport;
if (sessionId && transports.has(sessionId)) {
// Reuse existing transport
transport = transports.get(sessionId)!;
} else if (!sessionId) {
// New initialization request
transport = new StreamableHTTPServerTransport({
sessionIdGenerator: () => randomUUID(),
onsessioninitialized: (sessionId: string) => {
// Store the transport by session ID when session is initialized
console.error(`Session initialized with ID: ${sessionId}`);
transports.set(sessionId, transport);
}
});
// Set up onclose handler to clean up transport when closed
transport.onclose = async () => {
const sid = transport.sessionId;
if (sid && transports.has(sid)) {
console.error(`Transport closed for session ${sid}, removing from transports map`);
transports.delete(sid);
}
};
// Connect the transport to the MCP server BEFORE handling the request
await server.connect(transport);
await transport.handleRequest(req, res);
return; // Already handled
} else {
// Invalid request: a session ID was provided but does not match any active session
res.status(400).json({
jsonrpc: '2.0',
error: {
code: -32000,
message: 'Bad Request: No valid session ID provided',
},
id: req?.body?.id,
});
return;
}
// Handle the request with existing transport
await transport.handleRequest(req, res);
} catch (error) {
console.error('Error handling MCP request:', error);
if (!res.headersSent) {
res.status(500).json({
jsonrpc: '2.0',
error: {
code: -32603,
message: 'Internal server error',
},
id: req?.body?.id,
});
}
}
});
// Handle GET requests for SSE streams
app.get('/mcp', async (req: Request, res: Response) => {
console.error('Received MCP GET request');
const sessionId = req.headers['mcp-session-id'] as string | undefined;
if (!sessionId || !transports.has(sessionId)) {
res.status(400).json({
jsonrpc: '2.0',
error: {
code: -32000,
message: 'Bad Request: No valid session ID provided',
},
id: req?.body?.id,
});
return;
}
// Check for Last-Event-ID header for resumability
const lastEventId = req.headers['last-event-id'] as string | undefined;
if (lastEventId) {
console.error(`Client reconnecting with Last-Event-ID: ${lastEventId}`);
} else {
console.error(`Establishing new SSE stream for session ${sessionId}`);
}
const transport = transports.get(sessionId);
await transport!.handleRequest(req, res);
});
// Handle DELETE requests for session termination
app.delete('/mcp', async (req: Request, res: Response) => {
const sessionId = req.headers['mcp-session-id'] as string | undefined;
if (!sessionId || !transports.has(sessionId)) {
res.status(400).json({
jsonrpc: '2.0',
error: {
code: -32000,
message: 'Bad Request: No valid session ID provided',
},
id: req?.body?.id,
});
return;
}
console.error(`Received session termination request for session ${sessionId}`);
try {
const transport = transports.get(sessionId);
await transport!.handleRequest(req, res);
} catch (error) {
console.error('Error handling session termination:', error);
if (!res.headersSent) {
res.status(500).json({
jsonrpc: '2.0',
error: {
code: -32603,
message: 'Error handling session termination',
},
id: req?.body?.id,
});
}
}
});
const webserver = app.listen(config.serverPort, () => {
console.error(`${serverName} v${serverVersion} is running on port ${config.serverPort} with HTTP transport`);
console.error(`Connect to: http://localhost:${config.serverPort}/mcp`);
if (config.apiBaseUrl) {
console.error(`Using custom API endpoint: ${config.apiBaseUrl}`);
}
});
webserver.keepAliveTimeout = 3000;
// Log server-level errors (for example, the port is already in use)
webserver.on('error', (error) => {
console.error('HTTP server error:', error);
});
// Handle server shutdown
process.on('SIGINT', async () => {
console.error('Shutting down server...');
// Close all active transports to properly clean up resources
for (const [sessionId, transport] of transports) {
try {
console.error(`Closing transport for session ${sessionId}`);
await transport.close();
transports.delete(sessionId);
} catch (error) {
console.error(`Error closing transport for session ${sessionId}:`, error);
}
}
console.error('Server shutdown complete');
process.exit(0);
});
// Prevent the process from exiting
process.stdin.resume();
}
/**
* Sets up a stdio transport for command-line usage
* @param server The MCP server instance
* @param serverName Server name for logging
* @param serverVersion Server version for logging
*/
function setupStdioTransport(server: McpServer, serverName: string, serverVersion: string): void {
// Use Stdio transport (default mode)
const stdioTransport = new StdioServerTransport();
// Log to stderr: in stdio mode, stdout carries the MCP protocol messages
console.error(`${serverName} v${serverVersion} starting in stdio mode`);
if (config.apiBaseUrl) {
console.error(`Using custom API endpoint: ${config.apiBaseUrl}`);
}
// Connect the transport to the server
server.connect(stdioTransport).catch((error) => {
console.error("Error connecting transport:", error);
process.exit(1);
});
}
```
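The `POST /mcp` handler in `setupHTTPTransport` chooses between three outcomes based on the `mcp-session-id` header. That branch order can be written as a small pure function, sketched below; `routeSession` and `SessionRoute` are hypothetical names used only for illustration, and the set of known IDs stands in for the keys of the `transports` map.

```typescript
// Hypothetical result type describing how a POST /mcp request would be handled.
type SessionRoute = 'reuse' | 'initialize' | 'reject';

/**
 * Mirrors the branch order of the POST /mcp handler:
 * - a known session ID reuses its existing transport,
 * - a missing session ID starts a new session,
 * - a session ID that is present but unknown is rejected with HTTP 400.
 */
function routeSession(
  sessionId: string | undefined,
  knownSessions: ReadonlySet<string>
): SessionRoute {
  if (sessionId && knownSessions.has(sessionId)) {
    return 'reuse';
  }
  if (!sessionId) {
    return 'initialize';
  }
  return 'reject';
}
```

Note that, as in the handler itself, any request without a session ID is treated as an initialization attempt; the sketch does not check that the request body is actually an `initialize` message.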