# Directory Structure ``` ├── .gitignore ├── cline_mcp_settings.json.example ├── CONTRIBUTING.md ├── examples │ ├── basic-worker-example.js │ ├── debugging-tools │ │ └── debug-test.js │ ├── minimal-worker-example.js │ └── testing │ └── content-test.js ├── experiments │ ├── basic-rest-api │ │ └── index.ts │ ├── content-extraction │ │ └── index.ts │ └── puppeteer-binding │ └── index.ts ├── LICENSE ├── package-lock.json ├── package.json ├── puppeteer-worker.js ├── README.md ├── src │ ├── browser-client.ts │ ├── content-processor.ts │ ├── index.ts │ └── server.ts ├── test-puppeteer.js ├── tsconfig.json └── wrangler.toml ``` # Files -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- ``` 1 | # Node.js dependencies 2 | node_modules/ 3 | 4 | # Build output 5 | dist/ 6 | 7 | # Environment variables 8 | .env 9 | .env.local 10 | .env.development.local 11 | .env.test.local 12 | .env.production.local 13 | 14 | # Logs 15 | logs 16 | *.log 17 | npm-debug.log* 18 | yarn-debug.log* 19 | yarn-error.log* 20 | 21 | # Editor directories and files 22 | .idea/ 23 | .vscode/ 24 | *.swp 25 | *.swo 26 | .DS_Store 27 | ``` -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- ```markdown 1 | # Cloudflare Browser Rendering Experiments & MCP Server 2 | 3 | This project demonstrates how to use Cloudflare Browser Rendering to extract web content for LLM context. It includes experiments with the REST API and Workers Binding API, as well as an MCP server implementation that can be used to provide web context to LLMs. 
4 | 5 | <a href="https://glama.ai/mcp/servers/wg9fikq571"> 6 | <img width="380" height="200" src="https://glama.ai/mcp/servers/wg9fikq571/badge" alt="Web Content Server MCP server" /> 7 | </a> 8 | 9 | ## Project Structure 10 | 11 | ``` 12 | cloudflare-browser-rendering/ 13 | ├── examples/ # Example implementations and utilities 14 | │ ├── basic-worker-example.js # Basic Worker with Browser Rendering 15 | │ ├── minimal-worker-example.js # Minimal implementation 16 | │ ├── debugging-tools/ # Tools for debugging 17 | │ │ └── debug-test.js # Debug test utility 18 | │ └── testing/ # Testing utilities 19 | │ └── content-test.js # Content testing utility 20 | ├── experiments/ # Educational experiments 21 | │ ├── basic-rest-api/ # REST API tests 22 | │ ├── puppeteer-binding/ # Workers Binding API tests 23 | │ └── content-extraction/ # Content processing tests 24 | ├── src/ # MCP server source code 25 | │ ├── index.ts # Main entry point 26 | │ ├── server.ts # MCP server implementation 27 | │ ├── browser-client.ts # Browser Rendering client 28 | │ └── content-processor.ts # Content processing utilities 29 | ├── puppeteer-worker.js # Cloudflare Worker with Browser Rendering binding 30 | ├── test-puppeteer.js # Tests for the main implementation 31 | ├── wrangler.toml # Wrangler configuration for the Worker 32 | ├── cline_mcp_settings.json.example # Example MCP settings for Cline 33 | ├── .gitignore # Git ignore file 34 | └── LICENSE # MIT License 35 | ``` 36 | 37 | ## Prerequisites 38 | 39 | - Node.js (v16 or later) 40 | - A Cloudflare account with Browser Rendering enabled 41 | - TypeScript 42 | - Wrangler CLI (for deploying the Worker) 43 | 44 | ## Installation 45 | 46 | 1. Clone the repository: 47 | 48 | ```bash 49 | git clone https://github.com/yourusername/cloudflare-browser-rendering.git 50 | cd cloudflare-browser-rendering 51 | ``` 52 | 53 | 2. Install dependencies: 54 | 55 | ```bash 56 | npm install 57 | ``` 58 | 59 | ## Cloudflare Worker Setup 60 | 61 | 1. 
Install the Cloudflare Puppeteer package: 62 | 63 | ```bash 64 | npm install @cloudflare/puppeteer 65 | ``` 66 | 67 | 2. Configure Wrangler: 68 | 69 | ```toml 70 | # wrangler.toml 71 | name = "browser-rendering-api" 72 | main = "puppeteer-worker.js" 73 | compatibility_date = "2023-10-30" 74 | compatibility_flags = ["nodejs_compat"] 75 | 76 | [browser] 77 | binding = "browser" 78 | ``` 79 | 80 | 3. Deploy the Worker: 81 | 82 | ```bash 83 | npx wrangler deploy 84 | ``` 85 | 86 | 4. Test the Worker: 87 | 88 | ```bash 89 | node test-puppeteer.js 90 | ``` 91 | 92 | ## Running the Experiments 93 | 94 | ### Basic REST API Experiment 95 | 96 | This experiment demonstrates how to use the Cloudflare Browser Rendering REST API to fetch and process web content: 97 | 98 | ```bash 99 | npm run experiment:rest 100 | ``` 101 | 102 | ### Puppeteer Binding API Experiment 103 | 104 | This experiment demonstrates how to use the Cloudflare Browser Rendering Workers Binding API with Puppeteer for more advanced browser automation: 105 | 106 | ```bash 107 | npm run experiment:puppeteer 108 | ``` 109 | 110 | ### Content Extraction Experiment 111 | 112 | This experiment demonstrates how to extract and process web content specifically for use as context in LLMs: 113 | 114 | ```bash 115 | npm run experiment:content 116 | ``` 117 | 118 | ## MCP Server 119 | 120 | The MCP server provides tools for fetching and processing web content using Cloudflare Browser Rendering for use as context in LLMs. 121 | 122 | ### Building the MCP Server 123 | 124 | ```bash 125 | npm run build 126 | ``` 127 | 128 | ### Running the MCP Server 129 | 130 | ```bash 131 | npm start 132 | ``` 133 | 134 | Or, for development: 135 | 136 | ```bash 137 | npm run dev 138 | ``` 139 | 140 | ### MCP Server Tools 141 | 142 | The MCP server provides the following tools: 143 | 144 | 1. `fetch_page` - Fetches and processes a web page for LLM context 145 | 2. 
`search_documentation` - Searches Cloudflare documentation and returns relevant content 146 | 3. `extract_structured_content` - Extracts structured content from a web page using CSS selectors 147 | 4. `summarize_content` - Summarizes web content for more concise LLM context 148 | 149 | ## Configuration 150 | 151 | To use your Cloudflare Browser Rendering endpoint, set the `BROWSER_RENDERING_API` environment variable: 152 | 153 | ```bash 154 | export BROWSER_RENDERING_API=https://YOUR_WORKER_URL_HERE 155 | ``` 156 | 157 | Replace `YOUR_WORKER_URL_HERE` with the URL of your deployed Cloudflare Worker. You'll need to replace this placeholder in several files: 158 | 159 | 1. In test files: `test-puppeteer.js`, `examples/debugging-tools/debug-test.js`, `examples/testing/content-test.js` 160 | 2. In the MCP server configuration: `cline_mcp_settings.json.example` 161 | 3. In the browser client: `src/browser-client.ts` (as a fallback if the environment variable is not set) 162 | 163 | ## Integrating with Cline 164 | 165 | To integrate the MCP server with Cline, copy the `cline_mcp_settings.json.example` file to the appropriate location: 166 | 167 | ```bash 168 | cp cline_mcp_settings.json.example ~/Library/Application\ Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json 169 | ``` 170 | 171 | Or add the configuration to your existing `cline_mcp_settings.json` file. 172 | 173 | ## Key Learnings 174 | 175 | 1. Cloudflare Browser Rendering requires the `@cloudflare/puppeteer` package to interact with the browser binding. 176 | 2. The correct pattern for using the browser binding is: 177 | ```javascript 178 | import puppeteer from '@cloudflare/puppeteer'; 179 | 180 | // Then in your handler: 181 | const browser = await puppeteer.launch(env.browser); 182 | const page = await browser.newPage(); 183 | ``` 184 | 3. When deploying a Worker that uses the Browser Rendering binding, you need to enable the `nodejs_compat` compatibility flag. 185 | 4. 
Always close the browser after use to avoid resource leaks. 186 | 187 | ## License 188 | 189 | MIT ``` -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- ```markdown 1 | # Contributing to Cloudflare Browser Rendering MCP Server 2 | 3 | Thank you for your interest in contributing to this project! This document provides guidelines and instructions for contributing. 4 | 5 | ## Code of Conduct 6 | 7 | Please be respectful and considerate of others when contributing to this project. 8 | 9 | ## How to Contribute 10 | 11 | 1. Fork the repository 12 | 2. Create a new branch for your feature or bug fix 13 | 3. Make your changes 14 | 4. Run tests to ensure your changes don't break existing functionality 15 | 5. Submit a pull request 16 | 17 | ## Development Setup 18 | 19 | 1. Clone the repository: 20 | ```bash 21 | git clone https://github.com/yourusername/cloudflare-browser-rendering.git 22 | cd cloudflare-browser-rendering 23 | ``` 24 | 25 | 2. Install dependencies: 26 | ```bash 27 | npm install 28 | ``` 29 | 30 | 3. Build the project: 31 | ```bash 32 | npm run build 33 | ``` 34 | 35 | ## Project Structure 36 | 37 | Please refer to the README.md for a detailed explanation of the project structure. 38 | 39 | ## Testing 40 | 41 | Before submitting a pull request, please ensure that your changes pass all tests: 42 | 43 | ```bash 44 | npm test 45 | ``` 46 | 47 | ## Pull Request Process 48 | 49 | 1. Update the README.md with details of changes if applicable 50 | 2. Update the examples if you've added new functionality 51 | 3. The PR will be merged once it's reviewed and approved 52 | 53 | ## Adding New Features 54 | 55 | If you're adding a new feature: 56 | 57 | 1. Add appropriate tests 58 | 2. Update documentation 59 | 3. Add examples if applicable 60 | 61 | ## Reporting Bugs 62 | 63 | When reporting bugs, please include: 64 | 65 | 1. 
A clear description of the bug 66 | 2. Steps to reproduce 67 | 3. Expected behavior 68 | 4. Actual behavior 69 | 5. Environment details (OS, Node.js version, etc.) 70 | 71 | ## Feature Requests 72 | 73 | Feature requests are welcome. Please provide a clear description of the feature and why it would be beneficial to the project. 74 | 75 | ## License 76 | 77 | By contributing to this project, you agree that your contributions will be licensed under the project's MIT License. 78 | ``` -------------------------------------------------------------------------------- /wrangler.toml: -------------------------------------------------------------------------------- ```toml 1 | # Cloudflare Worker with Browser Rendering binding 2 | 3 | name = "browser-rendering-api" 4 | main = "puppeteer-worker.js" 5 | compatibility_date = "2023-10-30" 6 | compatibility_flags = ["nodejs_compat"] 7 | 8 | # Configure the Browser Rendering binding 9 | [browser] 10 | binding = "browser" 11 | ``` -------------------------------------------------------------------------------- /tsconfig.json: -------------------------------------------------------------------------------- ```json 1 | { 2 | "compilerOptions": { 3 | "target": "ES2020", 4 | "module": "NodeNext", 5 | "moduleResolution": "NodeNext", 6 | "esModuleInterop": true, 7 | "strict": true, 8 | "outDir": "dist", 9 | "declaration": true, 10 | "sourceMap": true, 11 | "resolveJsonModule": true 12 | }, 13 | "include": ["src/**/*", "experiments/**/*"], 14 | "exclude": ["node_modules", "dist"] 15 | } 16 | ``` -------------------------------------------------------------------------------- /src/index.ts: -------------------------------------------------------------------------------- ```typescript 1 | #!/usr/bin/env node 2 | import { Server } from '@modelcontextprotocol/sdk/server/index.js'; 3 | import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'; 4 | import { BrowserRenderingServer } from './server.js'; 5 | 6 | /** 7 | * 
Main entry point for the Cloudflare Browser Rendering MCP server 8 | */ 9 | async function main() { 10 | try { 11 | const server = new BrowserRenderingServer(); 12 | await server.run(); 13 | console.error('Cloudflare Browser Rendering MCP server running on stdio'); 14 | } catch (error) { 15 | console.error('Failed to start MCP server:', error); 16 | process.exit(1); 17 | } 18 | } 19 | 20 | // Run the server 21 | main().catch(console.error); 22 | ``` -------------------------------------------------------------------------------- /examples/minimal-worker-example.js: -------------------------------------------------------------------------------- ```javascript 1 | /** 2 | * Minimal Cloudflare Worker with Browser Rendering binding 3 | * 4 | * This is a minimal example that just returns information about 5 | * the environment and the browser binding. 6 | */ 7 | 8 | export default { 9 | async fetch(request, env, ctx) { 10 | // Check if browser binding exists 11 | const hasBrowser = 'browser' in env; 12 | 13 | // Return information about the environment 14 | return new Response(JSON.stringify({ 15 | env: Object.keys(env), 16 | hasBrowser, 17 | browser: hasBrowser ? { 18 | type: typeof env.browser, 19 | methods: Object.getOwnPropertyNames(Object.getPrototypeOf(env.browser) || {}) 20 | .filter(prop => typeof env.browser[prop] === 'function'), 21 | } : null, 22 | }, null, 2), { 23 | headers: { 'Content-Type': 'application/json' }, 24 | }); 25 | }, 26 | }; 27 | ``` -------------------------------------------------------------------------------- /examples/basic-worker-example.js: -------------------------------------------------------------------------------- ```javascript 1 | /** 2 | * Basic Cloudflare Worker with Browser Rendering binding 3 | * 4 | * This is a simple example of how to use the Browser Rendering binding 5 | * in a Cloudflare Worker. 
6 | */ 7 | 8 | export default { 9 | async fetch(request, env, ctx) { 10 | // Get the URL from the request query parameters 11 | const url = new URL(request.url); 12 | const targetUrl = url.searchParams.get('url') || 'https://example.com'; 13 | 14 | // Create a response with information about the request 15 | const info = { 16 | url: targetUrl, 17 | timestamp: new Date().toISOString(), 18 | worker: { 19 | name: 'browser-rendering-example', 20 | environment: Object.keys(env), 21 | }, 22 | }; 23 | 24 | // Return the information as JSON 25 | return new Response(JSON.stringify(info, null, 2), { 26 | headers: { 'Content-Type': 'application/json' }, 27 | }); 28 | }, 29 | }; 30 | ``` -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- ```json 1 | { 2 | "name": "cloudflare-browser-rendering", 3 | "version": "1.0.0", 4 | "description": "MCP server for providing web context to LLMs using Cloudflare Browser Rendering", 5 | "main": "dist/src/index.js", 6 | "scripts": { 7 | "test": "node test-puppeteer.js", 8 | "build": "tsc", 9 | "start": "node dist/src/index.js", 10 | "dev": "ts-node src/index.ts", 11 | "experiment:rest": "ts-node experiments/basic-rest-api/index.ts", 12 | "experiment:puppeteer": "ts-node experiments/puppeteer-binding/index.ts", 13 | "experiment:content": "ts-node experiments/content-extraction/index.ts", 14 | "deploy:worker": "npx wrangler deploy" 15 | }, 16 | "keywords": [ 17 | "cloudflare", 18 | "browser-rendering", 19 | "mcp", 20 | "llm", 21 | "context" 22 | ], 23 | "author": "", 24 | "license": "MIT", 25 | "devDependencies": { 26 | "@types/node": "^22.13.5", 27 | "ts-node": "^10.9.2", 28 | "typescript": "^5.7.3", 29 | "wrangler": "^3.111.0" 30 | }, 31 | "dependencies": { 32 | "@cloudflare/puppeteer": "^0.0.14", 33 | "@modelcontextprotocol/sdk": "^1.6.1", 34 | "axios": "^1.8.1" 35 | } 36 | } 37 | ``` 
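The README's Configuration section asks for the `YOUR_WORKER_URL_HERE` placeholder to be replaced in several files, and `package.json` above wires `npm test` to `test-puppeteer.js`, which hard-codes that placeholder. One possible way to avoid editing each script is to resolve the URL from `BROWSER_RENDERING_API` and fail early if it is still unset. The following is a minimal sketch, not part of the repository: `resolveWorkerUrl` and `PLACEHOLDER` are hypothetical names introduced here for illustration.

```javascript
// Hypothetical helper (not in the repository): resolve the Worker URL from the
// BROWSER_RENDERING_API environment variable described in the README.
const PLACEHOLDER = 'https://YOUR_WORKER_URL_HERE';

function resolveWorkerUrl(env = process.env) {
  const candidate = env.BROWSER_RENDERING_API || PLACEHOLDER;

  // Fail fast instead of sending requests to the unreplaced placeholder.
  if (candidate === PLACEHOLDER) {
    throw new Error('Set BROWSER_RENDERING_API to your deployed Worker URL');
  }

  // new URL() throws a TypeError if the value is not a valid absolute URL.
  const parsed = new URL(candidate);
  if (parsed.protocol !== 'https:' && parsed.protocol !== 'http:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }

  // Drop trailing slashes so `${base}/content` does not contain a double slash.
  return candidate.replace(/\/+$/, '');
}
```

A test script could then start with `const WORKER_URL = resolveWorkerUrl();` and keep the rest of its requests unchanged.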
-------------------------------------------------------------------------------- /examples/debugging-tools/debug-test.js: -------------------------------------------------------------------------------- ```javascript 1 | /** 2 | * Debug Test for Cloudflare Browser Rendering 3 | * 4 | * This script helps debug issues with the Browser Rendering binding 5 | * by making a request to the info endpoint of the Worker. 6 | */ 7 | 8 | const axios = require('axios'); 9 | 10 | // The URL of the deployed Worker 11 | // Replace YOUR_WORKER_URL_HERE with your actual worker URL when testing 12 | const WORKER_URL = 'https://YOUR_WORKER_URL_HERE'; 13 | 14 | /** 15 | * Test the info endpoint 16 | */ 17 | async function testInfoEndpoint() { 18 | console.log('Testing info endpoint...'); 19 | 20 | try { 21 | // Make a request to the info endpoint 22 | const response = await axios.get(`${WORKER_URL}/info`); 23 | 24 | console.log('Response status:', response.status); 25 | console.log('Response data:', response.data); 26 | 27 | return response.data; 28 | } catch (error) { 29 | console.error('Error testing info endpoint:', error.message); 30 | if (error.response) { 31 | console.error('Response data:', error.response.data); 32 | console.error('Response status:', error.response.status); 33 | } 34 | throw error; 35 | } 36 | } 37 | 38 | /** 39 | * Main function to run the tests 40 | */ 41 | async function runTests() { 42 | try { 43 | // Test the info endpoint 44 | await testInfoEndpoint(); 45 | 46 | console.log('\nTests completed successfully!'); 47 | } catch (error) { 48 | console.error('Tests failed:', error); 49 | } 50 | } 51 | 52 | // Run the tests 53 | runTests(); 54 | ``` -------------------------------------------------------------------------------- /examples/testing/content-test.js: -------------------------------------------------------------------------------- ```javascript 1 | /** 2 | * Content Test for Cloudflare Browser Rendering 3 | * 4 | * This script tests the content fetching 
functionality of the Worker 5 | * by making a request to the content endpoint. 6 | */ 7 | 8 | const axios = require('axios'); 9 | 10 | // The URL of the deployed Worker 11 | // Replace YOUR_WORKER_URL_HERE with your actual worker URL when testing 12 | const WORKER_URL = 'https://YOUR_WORKER_URL_HERE'; 13 | 14 | /** 15 | * Test the content endpoint 16 | */ 17 | async function testContentEndpoint() { 18 | console.log('Testing content endpoint...'); 19 | 20 | try { 21 | // Make a request to the content endpoint 22 | const response = await axios.post(`${WORKER_URL}/content`, { 23 | url: 'https://example.com', 24 | rejectResourceTypes: ['image', 'font', 'media'], 25 | }); 26 | 27 | console.log('Response status:', response.status); 28 | console.log('Content length:', response.data.content.length); 29 | console.log('Content preview:', response.data.content.substring(0, 200) + '...'); 30 | 31 | return response.data.content; 32 | } catch (error) { 33 | console.error('Error testing content endpoint:', error.message); 34 | if (error.response) { 35 | console.error('Response data:', error.response.data); 36 | } 37 | throw error; 38 | } 39 | } 40 | 41 | /** 42 | * Main function to run the tests 43 | */ 44 | async function runTests() { 45 | try { 46 | // Test the content endpoint 47 | await testContentEndpoint(); 48 | 49 | console.log('\nTests completed successfully!'); 50 | } catch (error) { 51 | console.error('Tests failed:', error); 52 | } 53 | } 54 | 55 | // Run the tests 56 | runTests(); 57 | ``` -------------------------------------------------------------------------------- /src/browser-client.ts: -------------------------------------------------------------------------------- ```typescript 1 | import axios from 'axios'; 2 | 3 | /** 4 | * Client for interacting with Cloudflare Browser Rendering 5 | */ 6 | export class BrowserClient { 7 | private apiEndpoint: string; 8 | 9 | constructor() { 10 | // Use the deployed Cloudflare Worker or a default placeholder 11 | // Replace 
YOUR_WORKER_URL_HERE with your actual worker URL when deploying 12 | this.apiEndpoint = process.env.BROWSER_RENDERING_API || 'https://YOUR_WORKER_URL_HERE'; 13 | } 14 | 15 | /** 16 | * Fetches rendered HTML content from a URL 17 | * @param url The URL to fetch content from 18 | * @returns The rendered HTML content 19 | */ 20 | async fetchContent(url: string): Promise<string> { 21 | try { 22 | console.error(`Fetching content from: ${url}`); // Log to stderr: stdout carries the MCP stdio protocol 23 | 24 | // Make the API call to the Cloudflare Worker 25 | const response = await axios.post(`${this.apiEndpoint}/content`, { 26 | url, 27 | rejectResourceTypes: ['image', 'font', 'media'], 28 | waitUntil: 'networkidle0', 29 | }); 30 | 31 | return response.data.content; 32 | } catch (error) { 33 | console.error('Error fetching content:', error); 34 | throw new Error(`Failed to fetch content: ${error instanceof Error ? error.message : String(error)}`); 35 | } 36 | } 37 | 38 | /** 39 | * Takes a screenshot of a URL 40 | * @param url The URL to take a screenshot of 41 | * @returns Base64-encoded screenshot image 42 | */ 43 | async takeScreenshot(url: string): Promise<string> { 44 | try { 45 | console.error(`Taking screenshot of: ${url}`); // Log to stderr: stdout carries the MCP stdio protocol 46 | 47 | // Make the API call to the Cloudflare Worker 48 | const response = await axios.post(`${this.apiEndpoint}/screenshot`, { 49 | url, 50 | fullPage: false, 51 | width: 1280, 52 | height: 800, 53 | }); 54 | 55 | return response.data.screenshot; 56 | } catch (error) { 57 | console.error('Error taking screenshot:', error); 58 | throw new Error(`Failed to take screenshot: ${error instanceof Error ? 
error.message : String(error)}`); 59 | } 60 | } 61 | } 62 | ``` -------------------------------------------------------------------------------- /test-puppeteer.js: -------------------------------------------------------------------------------- ```javascript 1 | const axios = require('axios'); 2 | 3 | // The URL of the deployed Worker 4 | // Replace YOUR_WORKER_URL_HERE with your actual worker URL when testing 5 | const WORKER_URL = 'https://YOUR_WORKER_URL_HERE'; 6 | 7 | // Test the /content endpoint 8 | async function testContentEndpoint() { 9 | console.log('Testing /content endpoint...'); 10 | 11 | try { 12 | const response = await axios.post(`${WORKER_URL}/content`, { 13 | url: 'https://example.com', 14 | rejectResourceTypes: ['image', 'font', 'media'], 15 | }); 16 | 17 | console.log('Response status:', response.status); 18 | console.log('Content length:', response.data.content.length); 19 | console.log('Content preview:', response.data.content.substring(0, 200) + '...'); 20 | 21 | return response.data.content; 22 | } catch (error) { 23 | console.error('Error testing /content endpoint:', error.message); 24 | if (error.response) { 25 | console.error('Response data:', error.response.data); 26 | } 27 | throw error; 28 | } 29 | } 30 | 31 | // Test the /screenshot endpoint 32 | async function testScreenshotEndpoint() { 33 | console.log('\nTesting /screenshot endpoint...'); 34 | 35 | try { 36 | const response = await axios.post(`${WORKER_URL}/screenshot`, { 37 | url: 'https://example.com', 38 | width: 800, 39 | height: 600, 40 | }); 41 | 42 | console.log('Response status:', response.status); 43 | console.log('Screenshot data length:', response.data.screenshot.length); 44 | console.log('Screenshot data preview:', response.data.screenshot.substring(0, 50) + '...'); 45 | 46 | return response.data.screenshot; 47 | } catch (error) { 48 | console.error('Error testing /screenshot endpoint:', error.message); 49 | if (error.response) { 50 | console.error('Response data:', 
error.response.data); 51 | } 52 | throw error; 53 | } 54 | } 55 | 56 | // Main function to run the tests 57 | async function runTests() { 58 | try { 59 | // Test the /content endpoint 60 | await testContentEndpoint(); 61 | 62 | // Test the /screenshot endpoint 63 | await testScreenshotEndpoint(); 64 | 65 | console.log('\nTests completed successfully!'); 66 | } catch (error) { 67 | console.error('Tests failed:', error); 68 | } 69 | } 70 | 71 | // Run the tests 72 | runTests(); 73 | ``` -------------------------------------------------------------------------------- /experiments/basic-rest-api/index.ts: -------------------------------------------------------------------------------- ```typescript 1 | import axios from 'axios'; 2 | 3 | // Cloudflare Browser Rendering REST API endpoint 4 | // Note: In a real implementation, this would be your Cloudflare account's endpoint 5 | const BROWSER_RENDERING_API = 'https://browser-rendering.youraccount.workers.dev'; 6 | 7 | /** 8 | * Fetches rendered HTML content from a URL using Cloudflare Browser Rendering 9 | * @param url The URL to fetch content from 10 | * @returns The rendered HTML content 11 | */ 12 | async function fetchRenderedContent(url: string): Promise<string> { 13 | try { 14 | const response = await axios.post(`${BROWSER_RENDERING_API}/content`, { 15 | url, 16 | // Optional parameters to optimize performance 17 | rejectResourceTypes: ['image', 'font', 'media'], 18 | // Wait for network to be idle (no requests for 500ms) 19 | waitUntil: 'networkidle0', 20 | }); 21 | 22 | return response.data.content; 23 | } catch (error) { 24 | if (axios.isAxiosError(error)) { 25 | console.error('Error fetching content:', error.message); 26 | if (error.response) { 27 | console.error('Response data:', error.response.data); 28 | } 29 | } else { 30 | console.error('Unexpected error:', error); 31 | } 32 | throw error; 33 | } 34 | } 35 | 36 | /** 37 | * Extracts main content from Cloudflare documentation HTML 38 | * @param html The full 
HTML content 39 | * @returns The extracted main content 40 | */ 41 | function extractMainContent(html: string): string { 42 | // In a real implementation, we would use a proper HTML parser 43 | // For this experiment, we'll use a simple regex approach 44 | 45 | // Try to find the main content container in Cloudflare docs 46 | const mainContentMatch = html.match(/<main[^>]*>([\s\S]*?)<\/main>/i); 47 | 48 | if (mainContentMatch && mainContentMatch[1]) { 49 | return mainContentMatch[1]; 50 | } 51 | 52 | return 'Could not extract main content'; 53 | } 54 | 55 | /** 56 | * Main function to run the experiment 57 | */ 58 | async function runExperiment() { 59 | console.log('Starting Browser Rendering REST API experiment...'); 60 | 61 | // Fetch content from Cloudflare docs 62 | const url = 'https://developers.cloudflare.com/browser-rendering/'; 63 | console.log(`Fetching content from: ${url}`); 64 | 65 | try { 66 | // Note: In a real implementation, you would use your actual Cloudflare Browser Rendering endpoint 67 | // For this experiment, we'll simulate the response 68 | console.log('This is a simulation since we need a real Cloudflare Browser Rendering endpoint'); 69 | 70 | // Simulated content 71 | const simulatedHtml = ` 72 | <!DOCTYPE html> 73 | <html> 74 | <head> 75 | <title>Cloudflare Browser Rendering</title> 76 | </head> 77 | <body> 78 | <header> 79 | <nav>Navigation</nav> 80 | </header> 81 | <main> 82 | <h1>Browser Rendering</h1> 83 | <p>Cloudflare Browser Rendering is a serverless headless browser service that allows execution of browser actions within Cloudflare Workers.</p> 84 | <h2>Features</h2> 85 | <ul> 86 | <li>REST API for simple operations</li> 87 | <li>Workers Binding API for complex automation</li> 88 | <li>Puppeteer integration</li> 89 | </ul> 90 | </main> 91 | <footer>Footer content</footer> 92 | </body> 93 | </html> 94 | `; 95 | 96 | // Extract main content 97 | const mainContent = extractMainContent(simulatedHtml); 98 | console.log('\nExtracted 
main content:'); 99 | console.log(mainContent); 100 | 101 | console.log('\nIn a real implementation, you would:'); 102 | console.log('1. Replace the BROWSER_RENDERING_API constant with your actual endpoint'); 103 | console.log('2. Call fetchRenderedContent(url) instead of using the simulated HTML'); 104 | console.log('3. Use a proper HTML parser for content extraction'); 105 | 106 | } catch (error) { 107 | console.error('Experiment failed:', error); 108 | } 109 | } 110 | 111 | // Run the experiment 112 | runExperiment(); 113 | ``` -------------------------------------------------------------------------------- /src/content-processor.ts: -------------------------------------------------------------------------------- ```typescript 1 | /** 2 | * Processes web content for LLM context 3 | */ 4 | export class ContentProcessor { 5 | /** 6 | * Processes HTML content for LLM context 7 | * @param html The HTML content to process 8 | * @param url The URL of the content 9 | * @returns Processed content suitable for LLM context 10 | */ 11 | processForLLM(html: string, url: string): string { 12 | // Extract metadata 13 | const metadata = this.extractMetadata(html, url); 14 | 15 | // Clean the content 16 | const cleanedContent = this.cleanContent(html); 17 | 18 | // Format for LLM context 19 | return this.formatForLLM(cleanedContent, metadata); 20 | } 21 | 22 | /** 23 | * Extracts metadata from HTML content 24 | * @param html The HTML content 25 | * @param url The URL of the content 26 | * @returns Metadata object 27 | */ 28 | private extractMetadata(html: string, url: string): Record<string, string> { 29 | const titleMatch = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i); 30 | const descriptionMatch = html.match(/<meta name="description" content="([^"]*)">/i); 31 | 32 | return { 33 | title: titleMatch ? titleMatch[1].trim() : 'Unknown Title', 34 | description: descriptionMatch ? 
descriptionMatch[1].trim() : 'No description available', 35 | url, 36 | source: new URL(url).hostname, 37 | extractedAt: new Date().toISOString(), 38 | }; 39 | } 40 | 41 | /** 42 | * Cleans HTML content for LLM context 43 | * @param html The HTML content to clean 44 | * @returns Cleaned content 45 | */ 46 | private cleanContent(html: string): string { 47 | // Extract the main content 48 | // In a real implementation, you would use a proper HTML parser 49 | // For this simulation, we'll use a simple approach with regex 50 | 51 | // Try to find the main content container 52 | let content = html; 53 | 54 | // Try to extract article content 55 | const articleMatch = html.match(/<article[^>]*>([\s\S]*?)<\/article>/i); 56 | if (articleMatch && articleMatch[1]) { 57 | content = articleMatch[1]; 58 | } else { 59 | // Try to extract main content 60 | const mainMatch = html.match(/<main[^>]*>([\s\S]*?)<\/main>/i); 61 | if (mainMatch && mainMatch[1]) { 62 | content = mainMatch[1]; 63 | } 64 | } 65 | 66 | // Remove HTML tags but preserve headings and paragraph structure 67 | content = content 68 | // Replace headings with markdown-style headings 69 | .replace(/<h1[^>]*>([\s\S]*?)<\/h1>/gi, '# $1\n\n') 70 | .replace(/<h2[^>]*>([\s\S]*?)<\/h2>/gi, '## $1\n\n') 71 | .replace(/<h3[^>]*>([\s\S]*?)<\/h3>/gi, '### $1\n\n') 72 | .replace(/<h4[^>]*>([\s\S]*?)<\/h4>/gi, '#### $1\n\n') 73 | .replace(/<h5[^>]*>([\s\S]*?)<\/h5>/gi, '##### $1\n\n') 74 | .replace(/<h6[^>]*>([\s\S]*?)<\/h6>/gi, '###### $1\n\n') 75 | // Replace list items with markdown-style list items 76 | .replace(/<li[^>]*>([\s\S]*?)<\/li>/gi, '- $1\n') 77 | // Replace paragraphs with newline-separated text 78 | .replace(/<p[^>]*>([\s\S]*?)<\/p>/gi, '$1\n\n') 79 | // Replace code blocks with markdown-style code blocks 80 | .replace(/<pre[^>]*><code[^>]*>([\s\S]*?)<\/code><\/pre>/gi, '```\n$1\n```\n\n') 81 | // Replace inline code with markdown-style inline code 82 | .replace(/<code[^>]*>([\s\S]*?)<\/code>/gi, '`$1`') 83 | 
// Replace links with markdown-style links 84 | .replace(/<a[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, '[$2]($1)') 85 | // Replace strong/bold with markdown-style bold (\b keeps <br>, <button>, <body> from matching) 86 | .replace(/<(strong|b)\b[^>]*>([\s\S]*?)<\/(strong|b)>/gi, '**$2**') 87 | // Replace emphasis/italic with markdown-style italic (\b keeps <img>, <input>, <iframe> from matching) 88 | .replace(/<(em|i)\b[^>]*>([\s\S]*?)<\/(em|i)>/gi, '*$2*') 89 | // Remove all other HTML tags 90 | .replace(/<[^>]*>/g, '') 91 | // Fix multiple newlines 92 | .replace(/\n{3,}/g, '\n\n') 93 | // Decode HTML entities (&amp; last, so already-escaped entities are not decoded twice) 94 | .replace(/&nbsp;/g, ' ') 95 | .replace(/&lt;/g, '<') 96 | .replace(/&gt;/g, '>') 97 | .replace(/&quot;/g, '"') 98 | .replace(/&#39;/g, "'") 99 | .replace(/&amp;/g, '&') 100 | // Trim whitespace 101 | .trim(); 102 | 103 | return content; 104 | } 105 | 106 | /** 107 | * Formats content for LLM context 108 | * @param content The cleaned content 109 | * @param metadata The metadata 110 | * @returns Formatted content for LLM context 111 | */ 112 | private formatForLLM(content: string, metadata: Record<string, string>): string { 113 | // Create a header with metadata 114 | const header = ` 115 | Title: ${metadata.title} 116 | Source: ${metadata.source} 117 | URL: ${metadata.url} 118 | Extracted: ${metadata.extractedAt} 119 | Description: ${metadata.description} 120 | --- 121 | 122 | `; 123 | 124 | // Combine header and content 125 | return header + content; 126 | } 127 | 128 | /** 129 | * Summarizes content (in a real implementation, this would call an LLM API) 130 | * @param content The content to summarize 131 | * @param maxLength Maximum length of the summary 132 | * @returns Summarized content 133 | */ 134 | summarizeContent(content: string, maxLength: number = 500): string { 135 | // In a real implementation, you would call an LLM API here 136 | console.error('Simulating content summarization...'); // Log to stderr: stdout carries the MCP stdio protocol 137 | 138 | // For this simulation, we'll return a mock summary 139 | const mockSummary = ` 140 | # Browser Rendering API Summary 141 | 142 | Cloudflare Browser Rendering is a 
serverless headless browser service for Cloudflare Workers that enables: 143 | 144 | 1. Rendering JavaScript-heavy websites 145 | 2. Taking screenshots and generating PDFs 146 | 3. Extracting structured data 147 | 4. Automating browser interactions 148 | 149 | It offers two main interfaces: 150 | 151 | - **REST API**: Simple endpoints for common tasks 152 | - **Workers Binding API**: Advanced integration with Puppeteer 153 | 154 | The service runs within Cloudflare's network, providing low-latency access to browser capabilities without managing infrastructure. 155 | `.trim(); 156 | 157 | // Truncate if necessary 158 | return mockSummary.length > maxLength 159 | ? mockSummary.substring(0, maxLength) + '...' 160 | : mockSummary; 161 | } 162 | } 163 | ``` -------------------------------------------------------------------------------- /puppeteer-worker.js: -------------------------------------------------------------------------------- ```javascript 1 | import puppeteer from '@cloudflare/puppeteer'; 2 | 3 | /** 4 | * Cloudflare Worker with Browser Rendering binding 5 | * 6 | * This worker demonstrates how to use the Browser Rendering binding 7 | * with the @cloudflare/puppeteer package. 8 | */ 9 | 10 | // Define the allowed origins for CORS 11 | const ALLOWED_ORIGINS = [ 12 | 'https://example.com', 13 | 'http://localhost:3000', 14 | ]; 15 | 16 | /** 17 | * Handle CORS preflight requests 18 | */ 19 | function handleOptions(request) { 20 | const origin = request.headers.get('Origin'); 21 | const isAllowedOrigin = ALLOWED_ORIGINS.includes(origin); 22 | 23 | return new Response(null, { 24 | status: 204, 25 | headers: { 26 | 'Access-Control-Allow-Origin': isAllowedOrigin ? 
origin : ALLOWED_ORIGINS[0], 27 | 'Access-Control-Allow-Methods': 'GET, POST, OPTIONS', 28 | 'Access-Control-Allow-Headers': 'Content-Type, Authorization', 29 | 'Access-Control-Max-Age': '86400', 30 | }, 31 | }); 32 | } 33 | 34 | /** 35 | * Add CORS headers to a response 36 | */ 37 | function addCorsHeaders(response, request) { 38 | const origin = request.headers.get('Origin'); 39 | const isAllowedOrigin = ALLOWED_ORIGINS.includes(origin); 40 | 41 | const headers = new Headers(response.headers); 42 | headers.set('Access-Control-Allow-Origin', isAllowedOrigin ? origin : ALLOWED_ORIGINS[0]); 43 | 44 | return new Response(response.body, { 45 | status: response.status, 46 | statusText: response.statusText, 47 | headers, 48 | }); 49 | } 50 | 51 | /** 52 | * Handle the /content endpoint 53 | */ 54 | async function handleContent(request, env) { 55 | try { 56 | // Parse the request body 57 | const body = await request.json(); 58 | 59 | if (!body.url) { 60 | return new Response(JSON.stringify({ error: 'URL is required' }), { 61 | status: 400, 62 | headers: { 'Content-Type': 'application/json' }, 63 | }); 64 | } 65 | 66 | // Launch a browser using the binding 67 | const browser = await puppeteer.launch(env.browser); 68 | 69 | try { 70 | // Create a new page 71 | const page = await browser.newPage(); 72 | 73 | // Set viewport size 74 | await page.setViewport({ 75 | width: 1280, 76 | height: 800, 77 | }); 78 | 79 | // Set request rejection patterns if provided 80 | if (body.rejectResourceTypes && Array.isArray(body.rejectResourceTypes)) { 81 | await page.setRequestInterception(true); 82 | page.on('request', (req) => { 83 | if (body.rejectResourceTypes.includes(req.resourceType())) { 84 | req.abort(); 85 | } else { 86 | req.continue(); 87 | } 88 | }); 89 | } 90 | 91 | // Navigate to the URL 92 | await page.goto(body.url, { 93 | waitUntil: body.waitUntil || 'networkidle0', 94 | timeout: body.timeout || 30000, 95 | }); 96 | 97 | // Get the page content 98 | const content = await 
page.content(); 99 | 100 | // Return the content 101 | return new Response(JSON.stringify({ content }), { 102 | headers: { 'Content-Type': 'application/json' }, 103 | }); 104 | } finally { 105 | // Always close the browser to avoid resource leaks 106 | await browser.close(); 107 | } 108 | } catch (error) { 109 | return new Response(JSON.stringify({ error: error.message, stack: error.stack }), { 110 | status: 500, 111 | headers: { 'Content-Type': 'application/json' }, 112 | }); 113 | } 114 | } 115 | 116 | /** 117 | * Handle the /screenshot endpoint 118 | */ 119 | async function handleScreenshot(request, env) { 120 | try { 121 | // Parse the request body 122 | const body = await request.json(); 123 | 124 | if (!body.url) { 125 | return new Response(JSON.stringify({ error: 'URL is required' }), { 126 | status: 400, 127 | headers: { 'Content-Type': 'application/json' }, 128 | }); 129 | } 130 | 131 | // Launch a browser using the binding 132 | const browser = await puppeteer.launch(env.browser); 133 | 134 | try { 135 | // Create a new page 136 | const page = await browser.newPage(); 137 | 138 | // Set viewport size 139 | await page.setViewport({ 140 | width: body.width || 1280, 141 | height: body.height || 800, 142 | }); 143 | 144 | // Navigate to the URL 145 | await page.goto(body.url, { 146 | waitUntil: body.waitUntil || 'networkidle0', 147 | timeout: body.timeout || 30000, 148 | }); 149 | 150 | // Take a screenshot 151 | const screenshot = await page.screenshot({ 152 | fullPage: body.fullPage || false, 153 | type: 'png', 154 | encoding: 'base64', 155 | }); 156 | 157 | // Return the screenshot 158 | return new Response(JSON.stringify({ screenshot }), { 159 | headers: { 'Content-Type': 'application/json' }, 160 | }); 161 | } finally { 162 | // Always close the browser to avoid resource leaks 163 | await browser.close(); 164 | } 165 | } catch (error) { 166 | return new Response(JSON.stringify({ error: error.message, stack: error.stack }), { 167 | status: 500, 168 | 
headers: { 'Content-Type': 'application/json' }, 169 | }); 170 | } 171 | } 172 | 173 | /** 174 | * Main worker handler 175 | */ 176 | export default { 177 | async fetch(request, env, ctx) { 178 | // Handle CORS preflight requests 179 | if (request.method === 'OPTIONS') { 180 | return handleOptions(request); 181 | } 182 | 183 | // Get the URL pathname 184 | const url = new URL(request.url); 185 | const path = url.pathname.toLowerCase(); 186 | 187 | // Route the request to the appropriate handler 188 | let response; 189 | 190 | try { 191 | if (path === '/content' && request.method === 'POST') { 192 | response = await handleContent(request, env); 193 | } else if (path === '/screenshot' && request.method === 'POST') { 194 | response = await handleScreenshot(request, env); 195 | } else if (path === '/info') { 196 | // Return information about the environment 197 | response = new Response(JSON.stringify({ 198 | env: Object.keys(env), 199 | hasBrowser: 'browser' in env, 200 | puppeteer: { 201 | version: puppeteer.version, 202 | }, 203 | }, null, 2), { 204 | headers: { 'Content-Type': 'application/json' }, 205 | }); 206 | } else { 207 | response = new Response(JSON.stringify({ 208 | error: 'Endpoint not found', 209 | availableEndpoints: ['/content', '/screenshot', '/info'], 210 | method: request.method, 211 | }), { 212 | status: 404, 213 | headers: { 'Content-Type': 'application/json' }, 214 | }); 215 | } 216 | } catch (error) { 217 | response = new Response(JSON.stringify({ 218 | error: error.message, 219 | stack: error.stack, 220 | }), { 221 | status: 500, 222 | headers: { 'Content-Type': 'application/json' }, 223 | }); 224 | } 225 | 226 | // Add CORS headers to the response 227 | return addCorsHeaders(response, request); 228 | }, 229 | }; 230 | ``` -------------------------------------------------------------------------------- /experiments/puppeteer-binding/index.ts: -------------------------------------------------------------------------------- ```typescript 1 | /** 
2 | * Experiment: Cloudflare Browser Rendering Workers Binding API with Puppeteer 3 | * 4 | * Note: This is a simulation of how you would use the Cloudflare Browser Rendering 5 | * Workers Binding API with Puppeteer. In a real implementation, this code would 6 | * run within a Cloudflare Worker with the Browser Rendering binding. 7 | */ 8 | 9 | // In a real Cloudflare Worker, you would import Puppeteer like this: 10 | // import puppeteer from '@cloudflare/puppeteer'; 11 | 12 | /** 13 | * Simulated function to navigate through Cloudflare documentation and extract structured information 14 | */ 15 | async function navigateAndExtractContent() { 16 | console.log('Simulating Puppeteer navigation and content extraction...'); 17 | 18 | // In a real implementation, you would initialize Puppeteer like this: 19 | /* 20 | const browser = await puppeteer.launch({ 21 | // Browser Rendering specific options 22 | userDataDir: '/tmp/puppeteer_user_data', 23 | }); 24 | 25 | try { 26 | const page = await browser.newPage(); 27 | 28 | // Navigate to Cloudflare docs 29 | await page.goto('https://developers.cloudflare.com/browser-rendering/', { 30 | waitUntil: 'networkidle0', 31 | }); 32 | 33 | // Extract headings 34 | const headings = await page.evaluate(() => { 35 | const headingElements = document.querySelectorAll('h1, h2, h3'); 36 | return Array.from(headingElements).map(el => ({ 37 | level: el.tagName.toLowerCase(), 38 | text: el.textContent?.trim() || '', 39 | })); 40 | }); 41 | 42 | // Extract code examples 43 | const codeExamples = await page.evaluate(() => { 44 | const codeElements = document.querySelectorAll('pre code'); 45 | return Array.from(codeElements).map(el => ({ 46 | language: el.className.replace('language-', ''), 47 | code: el.textContent?.trim() || '', 48 | })); 49 | }); 50 | 51 | // Navigate to a different section 52 | await page.click('a[href*="rest-api"]'); 53 | await page.waitForNavigation({ waitUntil: 'networkidle0' }); 54 | 55 | // Extract API endpoints 56 | 
const apiEndpoints = await page.evaluate(() => { 57 | const endpointElements = document.querySelectorAll('.endpoint'); 58 | return Array.from(endpointElements).map(el => ({ 59 | method: el.querySelector('.method')?.textContent?.trim() || '', 60 | path: el.querySelector('.path')?.textContent?.trim() || '', 61 | description: el.querySelector('.description')?.textContent?.trim() || '', 62 | })); 63 | }); 64 | 65 | return { 66 | headings, 67 | codeExamples, 68 | apiEndpoints, 69 | }; 70 | } finally { 71 | // In a real implementation with session reuse, you would use: 72 | // await browser.disconnect(); 73 | // Instead of: 74 | // await browser.close(); 75 | } 76 | */ 77 | 78 | // For this simulation, we'll return mock data 79 | return { 80 | headings: [ 81 | { level: 'h1', text: 'Browser Rendering' }, 82 | { level: 'h2', text: 'Overview' }, 83 | { level: 'h2', text: 'REST API' }, 84 | { level: 'h3', text: 'Content Endpoint' }, 85 | { level: 'h3', text: 'Screenshot Endpoint' }, 86 | { level: 'h2', text: 'Workers Binding API' }, 87 | ], 88 | codeExamples: [ 89 | { 90 | language: 'javascript', 91 | code: ` 92 | // Example of using the REST API 93 | fetch('https://browser-rendering.example.workers.dev/content', { 94 | method: 'POST', 95 | body: JSON.stringify({ 96 | url: 'https://example.com', 97 | rejectResourceTypes: ['image', 'font'] 98 | }) 99 | }) 100 | .then(response => response.json()) 101 | .then(data => console.log(data.content)); 102 | ` 103 | }, 104 | { 105 | language: 'javascript', 106 | code: ` 107 | // Example of using the Workers Binding API 108 | import puppeteer from '@cloudflare/puppeteer'; 109 | 110 | export default { 111 | async fetch(request, env) { 112 | const browser = await puppeteer.launch(); 113 | const page = await browser.newPage(); 114 | await page.goto('https://example.com'); 115 | const content = await page.content(); 116 | await browser.disconnect(); 117 | return new Response(content); 118 | } 119 | }; 120 | ` 121 | } 122 | ], 123 | 
apiEndpoints: [ 124 | { 125 | method: 'POST', 126 | path: '/content', 127 | description: 'Fetches rendered HTML content from a URL' 128 | }, 129 | { 130 | method: 'POST', 131 | path: '/screenshot', 132 | description: 'Captures a screenshot of a web page' 133 | }, 134 | { 135 | method: 'POST', 136 | path: '/pdf', 137 | description: 'Renders a web page as a PDF document' 138 | }, 139 | { 140 | method: 'POST', 141 | path: '/scrape', 142 | description: 'Extracts structured data from HTML elements' 143 | } 144 | ] 145 | }; 146 | } 147 | 148 | /** 149 | * Simulated function to demonstrate session reuse 150 | */ 151 | async function demonstrateSessionReuse() { 152 | console.log('Simulating Puppeteer session reuse...'); 153 | 154 | // In a real implementation, you would use code like this: 155 | /* 156 | // Get existing browser sessions (env.browser is the Browser Rendering binding inside a Worker) 157 | const sessions = await puppeteer.sessions(env.browser); 158 | 159 | let browser; 160 | if (sessions.length > 0) { 161 | // Connect to an existing session 162 | browser = await puppeteer.connect(env.browser, sessions[0].sessionId); 163 | console.log('Connected to existing session'); 164 | } else { 165 | // Create a new session 166 | browser = await puppeteer.launch(env.browser); 167 | console.log('Created new session'); 168 | } 169 | 170 | try { 171 | // Use the browser... 172 | const page = await browser.newPage(); 173 | await page.goto('https://example.com'); 174 | // ... 175 | } finally { 176 | // Disconnect instead of closing to keep the session alive 177 | await browser.disconnect(); 178 | } 179 | */ 180 | 181 | console.log('In a real implementation, you would:'); 182 | console.log('1. Check for existing sessions with puppeteer.sessions()'); 183 | console.log('2. Connect to an existing session or create a new one'); 184 | console.log('3.
Use browser.disconnect() instead of browser.close() to keep the session alive'); 185 | } 186 | 187 | /** 188 | * Main function to run the experiment 189 | */ 190 | async function runExperiment() { 191 | console.log('Starting Browser Rendering Workers Binding API experiment...'); 192 | 193 | try { 194 | // Simulate navigating and extracting content 195 | const extractedData = await navigateAndExtractContent(); 196 | 197 | // Display the extracted data 198 | console.log('\nExtracted headings:'); 199 | extractedData.headings.forEach(heading => { 200 | console.log(`${heading.level}: ${heading.text}`); 201 | }); 202 | 203 | console.log('\nExtracted API endpoints:'); 204 | extractedData.apiEndpoints.forEach(endpoint => { 205 | console.log(`${endpoint.method} ${endpoint.path} - ${endpoint.description}`); 206 | }); 207 | 208 | console.log('\nExtracted code examples (first example):'); 209 | if (extractedData.codeExamples.length > 0) { 210 | console.log(`Language: ${extractedData.codeExamples[0].language}`); 211 | console.log(extractedData.codeExamples[0].code); 212 | } 213 | 214 | // Simulate session reuse 215 | await demonstrateSessionReuse(); 216 | 217 | console.log('\nNote: This is a simulation. In a real implementation:'); 218 | console.log('1. This code would run within a Cloudflare Worker'); 219 | console.log('2. You would use the actual @cloudflare/puppeteer package'); 220 | console.log('3. 
You would need to set up the Browser Rendering binding in your wrangler.toml'); 221 | 222 | } catch (error) { 223 | console.error('Experiment failed:', error); 224 | } 225 | } 226 | 227 | // Run the experiment 228 | runExperiment(); 229 | ``` -------------------------------------------------------------------------------- /experiments/content-extraction/index.ts: -------------------------------------------------------------------------------- ```typescript 1 | import axios from 'axios'; 2 | 3 | /** 4 | * Experiment: Content Extraction and Processing for LLM Context 5 | * 6 | * This experiment demonstrates how to extract and process web content 7 | * specifically for use as context in LLMs using Cloudflare Browser Rendering. 8 | */ 9 | 10 | /** 11 | * Simulated function to extract clean content from a web page 12 | * @param url The URL to extract content from 13 | */ 14 | async function extractCleanContent(url: string) { 15 | console.log(`Simulating content extraction from: ${url}`); 16 | 17 | // In a real implementation with Cloudflare Browser Rendering, you would: 18 | // 1. Use the /content endpoint to get the rendered HTML 19 | // 2. Use the /scrape endpoint to extract specific elements 20 | // 3. 
Process the content to make it suitable for LLM context 21 | 22 | // Simulated HTML content from Cloudflare docs 23 | const simulatedHtml = ` 24 | <!DOCTYPE html> 25 | <html> 26 | <head> 27 | <title>Browser Rendering API | Cloudflare Docs</title> 28 | <meta name="description" content="Learn how to use Cloudflare's Browser Rendering API"> 29 | </head> 30 | <body> 31 | <header> 32 | <nav> 33 | <ul> 34 | <li><a href="/">Home</a></li> 35 | <li><a href="/products">Products</a></li> 36 | <li><a href="/developers">Developers</a></li> 37 | </ul> 38 | </nav> 39 | </header> 40 | <main> 41 | <article> 42 | <h1>Browser Rendering API</h1> 43 | <p class="description">Cloudflare Browser Rendering is a serverless headless browser service that allows execution of browser actions within Cloudflare Workers.</p> 44 | 45 | <section id="overview"> 46 | <h2>Overview</h2> 47 | <p>Browser Rendering allows you to run a headless browser directly within Cloudflare's network, enabling you to:</p> 48 | <ul> 49 | <li>Render JavaScript-heavy websites</li> 50 | <li>Take screenshots of web pages</li> 51 | <li>Generate PDFs</li> 52 | <li>Extract structured data</li> 53 | <li>Automate browser interactions</li> 54 | </ul> 55 | </section> 56 | 57 | <section id="rest-api"> 58 | <h2>REST API</h2> 59 | <p>The REST API provides simple endpoints for common browser tasks:</p> 60 | <div class="endpoint"> 61 | <h3>/content</h3> 62 | <p>Fetches rendered HTML content from a URL after JavaScript execution.</p> 63 | <pre><code> 64 | POST /content 65 | { 66 | "url": "https://example.com", 67 | "rejectResourceTypes": ["image", "font"] 68 | } 69 | </code></pre> 70 | </div> 71 | 72 | <div class="endpoint"> 73 | <h3>/screenshot</h3> 74 | <p>Captures a screenshot of a web page.</p> 75 | </div> 76 | </section> 77 | 78 | <section id="workers-binding"> 79 | <h2>Workers Binding API</h2> 80 | <p>For more advanced use cases, you can use the Workers Binding API with Puppeteer.</p> 81 | <pre><code> 82 | import puppeteer from 
'@cloudflare/puppeteer'; 83 | 84 | export default { 85 | async fetch(request, env) { 86 | const browser = await puppeteer.launch(); 87 | const page = await browser.newPage(); 88 | await page.goto('https://example.com'); 89 | const content = await page.content(); 90 | await browser.disconnect(); 91 | return new Response(content); 92 | } 93 | }; 94 | </code></pre> 95 | </section> 96 | </article> 97 | </main> 98 | <footer> 99 | <p>© 2025 Cloudflare, Inc.</p> 100 | <nav> 101 | <ul> 102 | <li><a href="/terms">Terms</a></li> 103 | <li><a href="/privacy">Privacy</a></li> 104 | </ul> 105 | </nav> 106 | </footer> 107 | </body> 108 | </html> 109 | `; 110 | 111 | return simulatedHtml; 112 | } 113 | 114 | /** 115 | * Extracts main content from HTML and removes unnecessary elements 116 | * @param html The HTML content 117 | * @returns Cleaned content suitable for LLM context 118 | */ 119 | function cleanContentForLLM(html: string): string { 120 | // In a real implementation, you would use a proper HTML parser 121 | // For this experiment, we'll use a simple approach with regex 122 | 123 | // Extract the article content 124 | const articleMatch = html.match(/<article[^>]*>([\s\S]*?)<\/article>/i); 125 | let content = articleMatch ? 
articleMatch[1] : html; 126 | 127 | // Remove HTML tags but preserve headings and paragraph structure 128 | content = content 129 | // Replace headings with markdown-style headings 130 | .replace(/<h1[^>]*>([\s\S]*?)<\/h1>/gi, '# $1\n\n') 131 | .replace(/<h2[^>]*>([\s\S]*?)<\/h2>/gi, '## $1\n\n') 132 | .replace(/<h3[^>]*>([\s\S]*?)<\/h3>/gi, '### $1\n\n') 133 | // Replace list items with markdown-style list items 134 | .replace(/<li[^>]*>([\s\S]*?)<\/li>/gi, '- $1\n') 135 | // Replace paragraphs with newline-separated text 136 | .replace(/<p[^>]*>([\s\S]*?)<\/p>/gi, '$1\n\n') 137 | // Replace code blocks with markdown-style code blocks 138 | .replace(/<pre[^>]*><code[^>]*>([\s\S]*?)<\/code><\/pre>/gi, '```\n$1\n```\n\n') 139 | // Remove all other HTML tags 140 | .replace(/<[^>]*>/g, '') 141 | // Fix multiple newlines 142 | .replace(/\n{3,}/g, '\n\n') 143 | // Trim whitespace 144 | .trim(); 145 | 146 | return content; 147 | } 148 | 149 | /** 150 | * Extracts metadata from HTML 151 | * @param html The HTML content 152 | * @returns Metadata object 153 | */ 154 | function extractMetadata(html: string): Record<string, string> { 155 | const titleMatch = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i); 156 | const descriptionMatch = html.match(/<meta name="description" content="([^"]*)">/i); 157 | 158 | return { 159 | title: titleMatch ? titleMatch[1].trim() : 'Unknown Title', 160 | description: descriptionMatch ? 
descriptionMatch[1].trim() : 'No description available', 161 | url: 'https://developers.cloudflare.com/browser-rendering/', // Simulated URL 162 | source: 'Cloudflare Documentation', 163 | extractedAt: new Date().toISOString(), 164 | }; 165 | } 166 | 167 | /** 168 | * Formats content for LLM context 169 | * @param content The cleaned content 170 | * @param metadata The metadata 171 | * @returns Formatted content for LLM context 172 | */ 173 | function formatForLLMContext(content: string, metadata: Record<string, string>): string { 174 | // Create a header with metadata 175 | const header = ` 176 | Title: ${metadata.title} 177 | Source: ${metadata.source} 178 | URL: ${metadata.url} 179 | Extracted: ${metadata.extractedAt} 180 | Description: ${metadata.description} 181 | --- 182 | 183 | `; 184 | 185 | // Combine header and content 186 | return header + content; 187 | } 188 | 189 | /** 190 | * Simulates content summarization using an LLM 191 | * @param content The content to summarize 192 | * @returns Summarized content 193 | */ 194 | function simulateLLMSummarization(content: string): string { 195 | // In a real implementation, you would call an LLM API here 196 | console.log('Simulating LLM summarization...'); 197 | 198 | // For this simulation, we'll return a mock summary 199 | return ` 200 | # Browser Rendering API Summary 201 | 202 | Cloudflare Browser Rendering is a serverless headless browser service for Cloudflare Workers that enables: 203 | 204 | 1. Rendering JavaScript-heavy websites 205 | 2. Taking screenshots and generating PDFs 206 | 3. Extracting structured data 207 | 4. 
Automating browser interactions 208 | 209 | It offers two main interfaces: 210 | 211 | - **REST API**: Simple endpoints for common tasks like fetching content (/content) and taking screenshots (/screenshot) 212 | - **Workers Binding API**: Advanced integration with Puppeteer for complex automation 213 | 214 | The service runs within Cloudflare's network, providing low-latency access to browser capabilities without managing infrastructure. 215 | `; 216 | } 217 | 218 | /** 219 | * Main function to run the experiment 220 | */ 221 | async function runExperiment() { 222 | console.log('Starting Content Extraction and Processing experiment...'); 223 | 224 | try { 225 | // Extract content from Cloudflare docs 226 | const url = 'https://developers.cloudflare.com/browser-rendering/'; 227 | const html = await extractCleanContent(url); 228 | 229 | // Clean the content for LLM context 230 | const cleanedContent = cleanContentForLLM(html); 231 | console.log('\nCleaned content for LLM:'); 232 | console.log(cleanedContent.substring(0, 500) + '...'); 233 | 234 | // Extract metadata 235 | const metadata = extractMetadata(html); 236 | console.log('\nExtracted metadata:'); 237 | console.log(metadata); 238 | 239 | // Format for LLM context 240 | const formattedContent = formatForLLMContext(cleanedContent, metadata); 241 | console.log('\nFormatted content for LLM context:'); 242 | console.log(formattedContent.substring(0, 300) + '...'); 243 | 244 | // Simulate LLM summarization 245 | const summarizedContent = simulateLLMSummarization(formattedContent); 246 | console.log('\nSimulated LLM summarization:'); 247 | console.log(summarizedContent); 248 | 249 | console.log('\nIn a real implementation, you would:'); 250 | console.log('1. Use Cloudflare Browser Rendering to fetch the actual content'); 251 | console.log('2. Use a proper HTML parser for content extraction'); 252 | console.log('3. Call a real LLM API for summarization'); 253 | console.log('4. 
Store the processed content in Cloudflare R2 or another storage solution'); 254 | 255 | } catch (error) { 256 | console.error('Experiment failed:', error); 257 | } 258 | } 259 | 260 | // Run the experiment 261 | runExperiment(); 262 | ``` -------------------------------------------------------------------------------- /src/server.ts: -------------------------------------------------------------------------------- ```typescript 1 | import { Server } from '@modelcontextprotocol/sdk/server/index.js'; 2 | import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'; 3 | import { 4 | CallToolRequestSchema, 5 | ErrorCode, 6 | ListToolsRequestSchema, 7 | McpError, 8 | } from '@modelcontextprotocol/sdk/types.js'; 9 | import { BrowserClient } from './browser-client.js'; 10 | import { ContentProcessor } from './content-processor.js'; 11 | 12 | /** 13 | * Cloudflare Browser Rendering MCP Server 14 | * 15 | * This server provides tools for fetching and processing web content 16 | * using Cloudflare Browser Rendering for use as context in LLMs. 
17 | */ 18 | export class BrowserRenderingServer { 19 | private server: Server; 20 | private browserClient: BrowserClient; 21 | private contentProcessor: ContentProcessor; 22 | 23 | constructor() { 24 | this.server = new Server( 25 | { 26 | name: 'cloudflare-browser-rendering', 27 | version: '0.1.0', 28 | }, 29 | { 30 | capabilities: { 31 | tools: {}, 32 | }, 33 | } 34 | ); 35 | 36 | // Initialize the browser client and content processor 37 | this.browserClient = new BrowserClient(); 38 | this.contentProcessor = new ContentProcessor(); 39 | 40 | // Set up request handlers 41 | this.setupToolHandlers(); 42 | 43 | // Error handling 44 | this.server.onerror = (error) => console.error('[MCP Error]', error); 45 | process.on('SIGINT', async () => { 46 | await this.server.close(); 47 | process.exit(0); 48 | }); 49 | } 50 | 51 | /** 52 | * Set up tool handlers for the MCP server 53 | */ 54 | private setupToolHandlers() { 55 | // List available tools 56 | this.server.setRequestHandler(ListToolsRequestSchema, async () => ({ 57 | tools: [ 58 | { 59 | name: 'fetch_page', 60 | description: 'Fetches and processes a web page for LLM context', 61 | inputSchema: { 62 | type: 'object', 63 | properties: { 64 | url: { 65 | type: 'string', 66 | description: 'URL to fetch', 67 | }, 68 | includeScreenshot: { 69 | type: 'boolean', 70 | description: 'Whether to include a screenshot (base64 encoded)', 71 | }, 72 | maxContentLength: { 73 | type: 'number', 74 | description: 'Maximum content length to return', 75 | }, 76 | }, 77 | required: ['url'], 78 | }, 79 | }, 80 | { 81 | name: 'search_documentation', 82 | description: 'Searches Cloudflare documentation and returns relevant content', 83 | inputSchema: { 84 | type: 'object', 85 | properties: { 86 | query: { 87 | type: 'string', 88 | description: 'Search query', 89 | }, 90 | maxResults: { 91 | type: 'number', 92 | description: 'Maximum number of results to return', 93 | }, 94 | }, 95 | required: ['query'], 96 | }, 97 | }, 98 | { 99 | name: 
'extract_structured_content', 100 | description: 'Extracts structured content from a web page using CSS selectors', 101 | inputSchema: { 102 | type: 'object', 103 | properties: { 104 | url: { 105 | type: 'string', 106 | description: 'URL to extract content from', 107 | }, 108 | selectors: { 109 | type: 'object', 110 | description: 'CSS selectors to extract content', 111 | additionalProperties: { 112 | type: 'string', 113 | }, 114 | }, 115 | }, 116 | required: ['url', 'selectors'], 117 | }, 118 | }, 119 | { 120 | name: 'summarize_content', 121 | description: 'Summarizes web content for more concise LLM context', 122 | inputSchema: { 123 | type: 'object', 124 | properties: { 125 | url: { 126 | type: 'string', 127 | description: 'URL to summarize', 128 | }, 129 | maxLength: { 130 | type: 'number', 131 | description: 'Maximum length of the summary', 132 | }, 133 | }, 134 | required: ['url'], 135 | }, 136 | }, 137 | ], 138 | })); 139 | 140 | // Handle tool calls 141 | this.server.setRequestHandler(CallToolRequestSchema, async (request) => { 142 | const { name, arguments: args } = request.params; 143 | 144 | try { 145 | switch (name) { 146 | case 'fetch_page': 147 | return await this.handleFetchPage(args); 148 | case 'search_documentation': 149 | return await this.handleSearchDocumentation(args); 150 | case 'extract_structured_content': 151 | return await this.handleExtractStructuredContent(args); 152 | case 'summarize_content': 153 | return await this.handleSummarizeContent(args); 154 | default: 155 | throw new McpError( 156 | ErrorCode.MethodNotFound, 157 | `Unknown tool: ${name}` 158 | ); 159 | } 160 | } catch (error) { 161 | if (error instanceof McpError) { 162 | throw error; 163 | } 164 | console.error(`Error in tool ${name}:`, error); 165 | throw new McpError( 166 | ErrorCode.InternalError, 167 | `Error executing tool ${name}: ${error instanceof Error ? 
error.message : String(error)}` 168 | ); 169 | } 170 | }); 171 | } 172 | 173 | /** 174 | * Handle the fetch_page tool 175 | */ 176 | private async handleFetchPage(args: any) { 177 | // Validate arguments 178 | if (typeof args !== 'object' || args === null || typeof args.url !== 'string') { 179 | throw new McpError(ErrorCode.InvalidParams, 'Invalid arguments for fetch_page'); 180 | } 181 | 182 | const { url, includeScreenshot = false, maxContentLength = 10000 } = args; 183 | 184 | try { 185 | // Fetch the page content 186 | const html = await this.browserClient.fetchContent(url); 187 | 188 | // Process the content for LLM 189 | const processedContent = this.contentProcessor.processForLLM(html, url); 190 | 191 | // Truncate if necessary 192 | const truncatedContent = processedContent.length > maxContentLength 193 | ? processedContent.substring(0, maxContentLength) + '...' 194 | : processedContent; 195 | 196 | // Get screenshot if requested 197 | let screenshot = null; 198 | if (includeScreenshot) { 199 | screenshot = await this.browserClient.takeScreenshot(url); 200 | } 201 | 202 | // Return the result 203 | return { 204 | content: [ 205 | { 206 | type: 'text', 207 | text: truncatedContent, 208 | }, 209 | ...(screenshot ? [{ 210 | type: 'image', 211 | data: screenshot, mimeType: 'image/png', // MCP image content carries base64 data plus a MIME type 212 | }] : []), 213 | ], 214 | }; 215 | } catch (error) { 216 | console.error('Error fetching page:', error); 217 | return { 218 | content: [ 219 | { 220 | type: 'text', 221 | text: `Error fetching page: ${error instanceof Error ?
error.message : String(error)}`, 222 | }, 223 | ], 224 | isError: true, 225 | }; 226 | } 227 | } 228 | 229 | /** 230 | * Handle the search_documentation tool 231 | */ 232 | private async handleSearchDocumentation(args: any) { 233 | // Validate arguments 234 | if (typeof args !== 'object' || args === null || typeof args.query !== 'string') { 235 | throw new McpError(ErrorCode.InvalidParams, 'Invalid arguments for search_documentation'); 236 | } 237 | 238 | const { query, maxResults = 3 } = args; 239 | 240 | try { 241 | // In a real implementation, you would: 242 | // 1. Use Cloudflare Browser Rendering to navigate to the docs 243 | // 2. Use the search functionality on the docs site 244 | // 3. Extract the search results 245 | 246 | // For this simulation, we'll return mock results 247 | const mockResults = [ 248 | { 249 | title: 'Browser Rendering API Overview', 250 | url: 'https://developers.cloudflare.com/browser-rendering/', 251 | snippet: 'Cloudflare Browser Rendering is a serverless headless browser service that allows execution of browser actions within Cloudflare Workers.', 252 | }, 253 | { 254 | title: 'REST API Reference', 255 | url: 'https://developers.cloudflare.com/browser-rendering/rest-api/', 256 | snippet: 'The REST API provides simple endpoints for common browser tasks like fetching content, taking screenshots, and generating PDFs.', 257 | }, 258 | { 259 | title: 'Workers Binding API Reference', 260 | url: 'https://developers.cloudflare.com/browser-rendering/workers-binding/', 261 | snippet: 'For more advanced use cases, you can use the Workers Binding API with Puppeteer to automate browser interactions.', 262 | }, 263 | ].slice(0, maxResults); 264 | 265 | // Format the results 266 | const formattedResults = mockResults.map(result => 267 | `## [${result.title}](${result.url})\n${result.snippet}\n` 268 | ).join('\n'); 269 | 270 | return { 271 | content: [ 272 | { 273 | type: 'text', 274 | text: `# Search Results for 
"${query}"\n\n${formattedResults}`, 275 | }, 276 | ], 277 | }; 278 | } catch (error) { 279 | console.error('Error searching documentation:', error); 280 | return { 281 | content: [ 282 | { 283 | type: 'text', 284 | text: `Error searching documentation: ${error instanceof Error ? error.message : String(error)}`, 285 | }, 286 | ], 287 | isError: true, 288 | }; 289 | } 290 | } 291 | 292 | /** 293 | * Handle the extract_structured_content tool 294 | */ 295 | private async handleExtractStructuredContent(args: any) { 296 | // Validate arguments 297 | if ( 298 | typeof args !== 'object' || 299 | args === null || 300 | typeof args.url !== 'string' || 301 | typeof args.selectors !== 'object' 302 | ) { 303 | throw new McpError(ErrorCode.InvalidParams, 'Invalid arguments for extract_structured_content'); 304 | } 305 | 306 | const { url, selectors } = args; 307 | 308 | try { 309 | // In a real implementation, you would: 310 | // 1. Use Cloudflare Browser Rendering to fetch the page 311 | // 2. Use the /scrape endpoint to extract content based on selectors 312 | 313 | // For this simulation, we'll return mock results 314 | const mockResults: Record<string, string> = {}; 315 | 316 | for (const [key, selector] of Object.entries(selectors)) { 317 | if (typeof selector === 'string') { 318 | // Simulate extraction based on selector 319 | mockResults[key] = `Extracted content for selector "${selector}"`; 320 | } 321 | } 322 | 323 | // Format the results 324 | const formattedResults = Object.entries(mockResults) 325 | .map(([key, value]) => `## ${key}\n${value}`) 326 | .join('\n\n'); 327 | 328 | return { 329 | content: [ 330 | { 331 | type: 'text', 332 | text: `# Structured Content from ${url}\n\n${formattedResults}`, 333 | }, 334 | ], 335 | }; 336 | } catch (error) { 337 | console.error('Error extracting structured content:', error); 338 | return { 339 | content: [ 340 | { 341 | type: 'text', 342 | text: `Error extracting structured content: ${error instanceof Error ? 
error.message : String(error)}`, 343 | }, 344 | ], 345 | isError: true, 346 | }; 347 | } 348 | } 349 | 350 | /** 351 | * Handle the summarize_content tool 352 | */ 353 | private async handleSummarizeContent(args: any) { 354 | // Validate arguments 355 | if (typeof args !== 'object' || args === null || typeof args.url !== 'string') { 356 | throw new McpError(ErrorCode.InvalidParams, 'Invalid arguments for summarize_content'); 357 | } 358 | 359 | const { url, maxLength = 500 } = args; 360 | 361 | try { 362 | // In a real implementation, you would: 363 | // 1. Fetch the page content using Cloudflare Browser Rendering 364 | // 2. Process the content for LLM 365 | // 3. Call an LLM API to summarize the content 366 | 367 | // For this simulation, we'll return a mock summary 368 | const mockSummary = ` 369 | # Browser Rendering API Summary 370 | 371 | Cloudflare Browser Rendering is a serverless headless browser service for Cloudflare Workers that enables: 372 | 373 | 1. Rendering JavaScript-heavy websites 374 | 2. Taking screenshots and generating PDFs 375 | 3. Extracting structured data 376 | 4. Automating browser interactions 377 | 378 | It offers two main interfaces: 379 | 380 | - **REST API**: Simple endpoints for common tasks 381 | - **Workers Binding API**: Advanced integration with Puppeteer 382 | 383 | The service runs within Cloudflare's network, providing low-latency access to browser capabilities without managing infrastructure. 384 | `.trim(); 385 | 386 | // Truncate if necessary 387 | const truncatedSummary = mockSummary.length > maxLength 388 | ? mockSummary.substring(0, maxLength) + '...' 389 | : mockSummary; 390 | 391 | return { 392 | content: [ 393 | { 394 | type: 'text', 395 | text: truncatedSummary, 396 | }, 397 | ], 398 | }; 399 | } catch (error) { 400 | console.error('Error summarizing content:', error); 401 | return { 402 | content: [ 403 | { 404 | type: 'text', 405 | text: `Error summarizing content: ${error instanceof Error ? 
error.message : String(error)}`, 406 | }, 407 | ], 408 | isError: true, 409 | }; 410 | } 411 | } 412 | 413 | /** 414 | * Run the MCP server 415 | */ 416 | async run() { 417 | const transport = new StdioServerTransport(); 418 | await this.server.connect(transport); 419 | } 420 | } 421 | ```
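Each handler above validates its untyped `args: any` by hand and repeats the same truncate-and-append-`'...'` expression. The sketch below shows one way that pattern could be factored out, assuming nothing beyond what the handlers already do. `FetchPageArgs`, `isFetchPageArgs`, and `truncate` are hypothetical names for illustration and are not part of `src/server.ts`.

```typescript
// Hypothetical helpers; these names do not exist in src/server.ts.
interface FetchPageArgs {
  url: string;
  includeScreenshot?: boolean;
  maxContentLength?: number;
}

// Narrow an untyped tool payload to FetchPageArgs.
// Rejects null, non-objects, a non-string url, and optional fields of the wrong type.
function isFetchPageArgs(args: unknown): args is FetchPageArgs {
  if (typeof args !== 'object' || args === null) return false;
  const a = args as Record<string, unknown>;
  return (
    typeof a.url === 'string' &&
    (a.includeScreenshot === undefined || typeof a.includeScreenshot === 'boolean') &&
    (a.maxContentLength === undefined ||
      (typeof a.maxContentLength === 'number' && a.maxContentLength >= 0))
  );
}

// Same truncation rule the handlers use: cut at maxLength and append '...'.
function truncate(text: string, maxLength: number): string {
  return text.length > maxLength ? text.substring(0, maxLength) + '...' : text;
}
```

A handler could then start with `if (!isFetchPageArgs(args)) throw new McpError(ErrorCode.InvalidParams, 'Invalid arguments for fetch_page');` and call `truncate(processedContent, maxContentLength)`. The same explicit `=== null` check would also matter for `selectors` in `handleExtractStructuredContent`, because `typeof null` is `'object'` in JavaScript.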