jwaldor/mcp-scrape-copilot # codebase.md

This is page 1 of 2. Use http://codebase.md/jwaldor/mcp-scrape-copilot?lines=true&page={x} to view the full context.

# Directory Structure

```
├── .gitignore
├── Dockerfile
├── index.ts
├── llm.txt
├── package-lock.json
├── package.json
├── README.md
├── tsconfig.json
├── types.ts
└── utilities.ts
```

# Files

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------

```
node_modules
dist
```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

````markdown
# Puppeteer

A Model Context Protocol server that provides browser automation capabilities using Puppeteer. This server enables LLMs to interact with web pages, take screenshots, and execute JavaScript in a real browser environment.

## Components

### Tools

- **puppeteer_navigate**
  - Navigate to any URL in the browser
  - Input: `url` (string)

- **puppeteer_screenshot**
  - Capture screenshots of the entire page or specific elements
  - Inputs:
    - `name` (string, required): Name for the screenshot
    - `selector` (string, optional): CSS selector for element to screenshot
    - `width` (number, optional, default: 800): Screenshot width
    - `height` (number, optional, default: 600): Screenshot height

- **puppeteer_click**
  - Click elements on the page
  - Input: `selector` (string): CSS selector for element to click

- **puppeteer_hover**
  - Hover over elements on the page
  - Input: `selector` (string): CSS selector for element to hover

- **puppeteer_fill**
  - Fill out input fields
  - Inputs:
    - `selector` (string): CSS selector for input field
    - `value` (string): Value to fill

- **puppeteer_select**
  - Select an option in a SELECT element
  - Inputs:
    - `selector` (string): CSS selector for element to select
    - `value` (string): Value to select

- **puppeteer_evaluate**
  - Execute JavaScript in the browser console
  - Input: `script` (string): JavaScript code to execute

### Resources

The server provides access to two types of resources:

1. **Console Logs** (`console://logs`)
   - Browser console output in text format
   - Includes all console messages from the browser

2. **Screenshots** (`screenshot://<name>`)
   - PNG images of captured screenshots
   - Accessible via the screenshot name specified during capture

## Key Features

- Browser automation
- Console log monitoring
- Screenshot capabilities
- JavaScript execution
- Basic web interaction (navigation, clicking, form filling)

## Configuration to use Puppeteer Server

Here's the Claude Desktop configuration to use the Puppeteer server:

### Docker

**NOTE** The Docker implementation uses headless Chromium, whereas the NPX version opens a browser window.

```json
{
  "mcpServers": {
    "puppeteer": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "--init", "-e", "DOCKER_CONTAINER=true", "mcp/puppeteer"]
    }
  }
}
```

### NPX

```json
{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"]
    }
  }
}
```

## Build

Docker build:

```bash
docker build -t mcp/puppeteer -f src/puppeteer/Dockerfile .
```

## License

This MCP server is licensed under the MIT License. This means you are free to use, modify, and distribute the software, subject to the terms and conditions of the MIT License. For more details, please see the LICENSE file in the project repository.
````
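MCP clients invoke tools such as `puppeteer_navigate` by sending a JSON-RPC 2.0 request with the method `tools/call`. As a rough sketch of what such a request might look like on the wire (the helper name `buildToolCall`, the request id, and the example URL are illustrative assumptions, not part of this repository):

```typescript
// Shape of a JSON-RPC 2.0 request carrying an MCP tool call.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in a request envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Example: ask the server to open a page (the URL is a placeholder).
const navigate = buildToolCall(1, "puppeteer_navigate", {
  url: "https://example.com",
});
console.log(JSON.stringify(navigate, null, 2));
```

In practice an MCP client library builds this envelope for you; the sketch only shows which fields carry the tool name and its inputs.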

--------------------------------------------------------------------------------
/types.ts:
--------------------------------------------------------------------------------

```typescript
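// One network request captured while a page was open.
export interface RequestRecord {
  url: string;
  method: string;
  headers: Record<string, string>;
  resourceType: string;
  // Raw request body as reported by Puppeteer; undefined when there is none.
  postData: string | undefined;
  // Embedding of the request text, used for semantic search.
  embedding: number[];
}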
```

--------------------------------------------------------------------------------
/tsconfig.json:
--------------------------------------------------------------------------------

```json
{
  "compilerOptions": {
    "outDir": "./dist",
    "rootDir": ".",
    "module": "ES2020",
    "moduleResolution": "node",
    "target": "ES2015"
  },
  "include": [
    "./**/*.ts"
  ]
}
```

--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------

```json
{
  "name": "@modelcontextprotocol/server-puppeteer",
  "version": "0.6.2",
  "description": "MCP server for browser automation using Puppeteer",
  "license": "MIT",
  "author": "Anthropic, PBC (https://anthropic.com)",
  "homepage": "https://modelcontextprotocol.io",
  "bugs": "https://github.com/modelcontextprotocol/servers/issues",
  "type": "module",
  "bin": {
    "mcp-server-puppeteer": "dist/index.js"
  },
  "files": [
    "dist"
  ],
  "scripts": {
    "build": "tsc && shx chmod +x dist/*.js",
    "prepare": "npm run build",
    "watch": "tsc --watch"
  },
  "dependencies": {
    "@modelcontextprotocol/sdk": "1.0.1",
    "@xenova/transformers": "^2.17.2",
    "puppeteer": "^23.4.0"
  },
  "devDependencies": {
    "shx": "^0.3.4",
    "typescript": "^5.7.2"
  }
}
```

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------

```dockerfile
FROM node:22-bookworm-slim

ENV DEBIAN_FRONTEND=noninteractive

# for arm64 support we need to install chromium provided by debian
# npm ERR! The chromium binary is not available for arm64.
# https://github.com/puppeteer/puppeteer/issues/7740

ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium

RUN apt-get update && \
    apt-get install -y wget gnupg && \
    apt-get install -y fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
        libgtk2.0-0 libnss3 libatk-bridge2.0-0 libdrm2 libxkbcommon0 libgbm1 libasound2 && \
    apt-get install -y chromium && \
    apt-get clean

COPY src/puppeteer /project
COPY tsconfig.json /tsconfig.json

WORKDIR /project

# npm install also runs the "prepare" script, which builds dist/
RUN npm install

ENTRYPOINT ["node", "dist/index.js"]
```

--------------------------------------------------------------------------------
/utilities.ts:
--------------------------------------------------------------------------------

```typescript
import { RequestRecord } from "./types.js";
import { FeatureExtractionPipeline } from "@xenova/transformers";

// Send an HTTP request with fetch and return its status, body text, and headers.
export async function makeRequest(
  url: string,
  type: string,
  headers: Record<string, string>,
  body: any
) {
  try {
    const response = await fetch(url, {
      method: type,
      headers,
      // Only POST and PUT carry a body; it is serialized as JSON.
      body:
        body && (type === "POST" || type === "PUT")
          ? JSON.stringify(body)
          : undefined,
    });

    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }

    return {
      status: response.status,
      data: await response.text(),
      headers: Object.fromEntries(response.headers),
    };
  } catch (error) {
    console.error("Error making request:", error);
    throw error;
  }
}

// Load the default feature-extraction model. Resolves to undefined on failure.
export async function initializeModelSentTransformer() {
  // The MCP stdio transport uses stdout for protocol messages, so anything
  // printed with console.log while the model loads is redirected to stderr.
  const originalLog = console.log;
  console.log = console.error;
  try {
    const model = await import("@xenova/transformers");
    return await model.pipeline("feature-extraction", undefined, {
      progress_callback: (message: { status: string; message: string }) => {
        console.error("transformer progress:", JSON.stringify(message));
      },
    });
  } catch (error) {
    console.error("Error initializing model:", error);
    return undefined;
  } finally {
    // Restore console.log whether or not loading succeeded.
    console.log = originalLog;
  }
}

export async function getEmbeddingSentTransformer(
  text: string,
  pipeline: FeatureExtractionPipeline
): Promise<number[]> {
  // Mean-pool and normalize so every text yields one fixed-length vector.
  // Without pooling the output holds one vector per token, so its length
  // varies with the input and cosine similarity between texts breaks.
  const embedding = await pipeline(text, { pooling: "mean", normalize: true });
  return Array.from(embedding.data);
}

// Disabled test harness; tf and use are not imported in this file and appear
// to be left over from an earlier TensorFlow.js-based version.
// async function testGetEmbedding() {
//   console.log("Loading model...");
//   await tf.setBackend("cpu");
//   await tf.ready();
//   const model = await use.load();
//   console.log("Model loaded");
//   const embedding = await getEmbedding("Hello, world!", model);
//   console.log(embedding);
// }

// testGetEmbedding();

export async function semanticSearchRequestsSentTransformer(
  query: string,
  requests: Array<RequestRecord>,
  pipeline: FeatureExtractionPipeline
): Promise<Array<RequestRecord & { similarity: number }>> {
  // Get embedding for the query
  const queryEmbedding = await getEmbeddingSentTransformer(query, pipeline);

  // Score every stored request against the query
  const scoredRequests = requests.map((request) => {
    const similarity = cosineSimilarity(queryEmbedding, request.embedding);
    return { ...request, similarity };
  });

  // Sort by similarity score (highest first) and take top 10
  return scoredRequests
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, 10);
}

// Helper function to compute cosine similarity between two vectors
function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  const denominator = magnitudeA * magnitudeB;
  // A zero vector has no direction; report 0 instead of dividing by zero.
  return denominator === 0 ? 0 : dotProduct / denominator;
}
```
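`semanticSearchRequestsSentTransformer` ranks stored requests by the cosine similarity between embeddings, which is only meaningful when every embedding has the same length. Token-level model output is therefore usually pooled into a single vector first. The sketch below illustrates both steps on toy data; the function names `meanPool`, `cosine`, and `topK` and the 2-dimensional vectors are made up for illustration and are not part of the repository or of `@xenova/transformers`.

```typescript
// Average token vectors column-wise so every text maps to one fixed-length vector.
function meanPool(tokenVectors: number[][]): number[] {
  if (tokenVectors.length === 0) return [];
  const dim = tokenVectors[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const vec of tokenVectors) {
    for (let i = 0; i < dim; i++) sum[i] += vec[i];
  }
  return sum.map((v) => v / tokenVectors.length);
}

// Cosine similarity with guards for mismatched lengths and zero vectors.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) return 0;
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  const denom = Math.sqrt(magA) * Math.sqrt(magB);
  return denom === 0 ? 0 : dot / denom;
}

// Score each item against the query and keep the k most similar.
function topK<T extends { embedding: number[] }>(
  query: number[],
  items: T[],
  k: number
) {
  return items
    .map((item) => ({ ...item, similarity: cosine(query, item.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}

// Toy data: two identical "token" vectors pool to [1, 0].
const query = meanPool([
  [1, 0],
  [1, 0],
]);
const ranked = topK(
  query,
  [
    { id: "login", embedding: [0, 1] },
    { id: "search", embedding: [1, 0] },
    { id: "empty", embedding: [0, 0] },
  ],
  2
);
console.log(ranked.map((r) => `${r.id}: ${r.similarity}`));
```

The item whose embedding points in the same direction as the query ranks first; the orthogonal and zero vectors both score 0.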

--------------------------------------------------------------------------------
/index.ts:
--------------------------------------------------------------------------------

```typescript
#!/usr/bin/env node

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
  CallToolResult,
  Tool,
} from "@modelcontextprotocol/sdk/types.js";
import puppeteer, { Browser, Page } from "puppeteer";
import {
  getEmbeddingSentTransformer,
  initializeModelSentTransformer,
  makeRequest,
  semanticSearchRequestsSentTransformer,
} from "./utilities.js";

import { RequestRecord } from "./types.js";
import { FeatureExtractionPipeline } from "@xenova/transformers";

// Define the tools once to avoid repetition
const TOOLS: Tool[] = [
  {
    name: "puppeteer_navigate",
    description: "Navigate to a URL",
    inputSchema: {
      type: "object",
      properties: {
        url: { type: "string" },
      },
      required: ["url"],
    },
  },
  {
    name: "puppeteer_page_history",
    description: "Get the history of visited URLs, most recent URLs first",
    inputSchema: {
      type: "object",
      properties: {},
      required: [],
    },
  },
  {
    name: "make_http_request",
    description: "Make an HTTP request (sent with fetch)",
    inputSchema: {
      type: "object",
      properties: {
        type: {
          type: "string",
          description: "Type of the request. GET, POST, PUT, DELETE",
        },
        url: {
          type: "string",
          description: "URL to make the request to",
        },
        headers: {
          type: "object",
          description: "Headers to include in the request",
        },
        body: {
          type: "object",
          description: "Body to include in the request",
        },
      },
      required: ["type", "url", "headers", "body"],
    },
  },
  {
    name: "semantic_search_requests",
    description:
      "Semantically search for requests that occurred within a page URL. Returns the top 10 results.",
    inputSchema: {
      type: "object",
      properties: {
        query: {
          type: "string",
          description:
            "Your search request. Make this specific and detailed to get the best results",
        },
        page_url: {
          type: "string",
          description: "The page within which to search for requests",
        },
      },
      required: ["query", "page_url"],
    },
  },
];

// Global state
let browser: Browser | undefined;
let page: Page | undefined;
const consoleLogs: string[] = [];
// Captured requests, keyed by the URL of the page that issued them
const requests: Map<string, RequestRecord[]> = new Map();
const urlHistory: Array<string> = [];

let pipeline: FeatureExtractionPipeline | undefined;

// Load the embedding model in the background; requests seen before it is
// ready are passed through without being recorded.
initializeModelSentTransformer().then((sent_pipeline) => {
  if (!sent_pipeline) {
    console.error("model failed to load; semantic search is unavailable");
    return;
  }
  console.error("model loaded");
  pipeline = sent_pipeline;
});

async function ensureBrowser() {
  if (!browser) {
    const npx_args = { headless: false };
    const docker_args = {
      headless: true,
      args: ["--no-sandbox", "--single-process", "--no-zygote"],
    };
    browser = await puppeteer.launch(
      process.env.DOCKER_CONTAINER ? docker_args : npx_args
    );
    const pages = await browser.pages();
    page = pages[0];
    // Wait until interception is active before attaching the request handler
    await page.setRequestInterception(true);

    page.on("console", (msg) => {
      const logEntry = `[${msg.type()}] ${msg.text()}`;
      consoleLogs.push(logEntry);
      server.notification({
        method: "notifications/resources/updated",
        params: { uri: "console://logs" },
      });
    });

    page.on("request", async (request) => {
      const pageUrl = page!.url();
      if (!pipeline) {
        console.error(
          "Request made before model was loaded.",
          request.url(),
          pageUrl
        );
        request.continue();
        return;
      }
      try {
        const record: RequestRecord = {
          url: request.url(),
          resourceType: request.resourceType(),
          method: request.method(),
          headers: request.headers(),
          postData: request.postData(),
          embedding: await getEmbeddingSentTransformer(
            request.url() +
              request.method() +
              JSON.stringify(request.headers()) +
              JSON.stringify(request.postData()),
            pipeline
          ),
        };
        // Most recent request first, grouped by the page that issued it
        const existing = requests.get(pageUrl);
        if (existing) {
          existing.unshift(record);
        } else {
          requests.set(pageUrl, [record]);
        }
      } catch (error) {
        console.error("Failed to record request:", request.url(), error);
      } finally {
        // Always release the intercepted request, even if embedding failed
        request.continue();
      }
    });
  }
  return page!;
}

declare global {
  interface Window {
    mcpHelper: {
      logs: string[];
      originalConsole: Partial<typeof console>;
    };
  }
}

async function handleToolCall(
  name: string,
  args: any
): Promise<CallToolResult> {
  const page = await ensureBrowser();
  switch (name) {
    case "puppeteer_navigate":
      await page.goto(args.url);
      // Record the visit; otherwise urlHistory is never populated
      urlHistory.push(page.url());
      return {
        content: [
          {
            type: "text",
            text: `Navigated to ${args.url}`,
          },
        ],
        isError: false,
      };

    // Must match the name advertised in TOOLS
    case "puppeteer_page_history":
      return {
        content: [
          {
            type: "text",
            // Copy before reversing: Array.prototype.reverse mutates in place
            text: [...urlHistory].reverse().join("\n"),
          },
        ],
        isError: false,
      };

    case "make_http_request": {
      const response = await makeRequest(
        args.url,
        args.type,
        args.headers,
        args.body
      );
      return {
        content: [{ type: "text", text: JSON.stringify(response, null, 2) }],
        isError: false,
      };
    }

    case "semantic_search_requests": {
      if (!pipeline) {
        return {
          content: [{ type: "text", text: "Embedding model is not loaded yet" }],
          isError: true,
        };
      }
      const searchResults = await semanticSearchRequestsSentTransformer(
        args.query,
        // No requests recorded for this page yields an empty result
        requests.get(args.page_url) ?? [],
        pipeline
      );
      // Drop the embedding and score so the response stays compact
      const withoutEmbedding = searchResults.map(
        ({ embedding, similarity, ...rest }) => rest
      );
      return {
        content: [
          { type: "text", text: JSON.stringify(withoutEmbedding, null, 2) },
        ],
        isError: false,
      };
    }

    default:
      return {
        content: [
          {
            type: "text",
            text: `Unknown tool: ${name}`,
          },
        ],
        isError: true,
      };
  }
}

const server = new Server(
  {
    name: "mcp-scrape-copilot",
    version: "0.1.0",
  },
  {
    capabilities: {
      resources: {},
      tools: {},
    },
  }
);

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: TOOLS,
}));

server.setRequestHandler(CallToolRequestSchema, async (request) =>
  handleToolCall(request.params.name, request.params.arguments ?? {})
);

async function runServer() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
}

runServer().catch(console.error);

process.stdin.on("close", () => {
  console.error("Puppeteer MCP Server closed");
  server.close();
});
```