# Directory Structure ``` ├── .gitignore ├── bun.lock ├── package.json ├── README.md ├── src │ ├── chrome-interface.ts │ ├── runtime-templates │ │ ├── ariaInteractiveElements.js │ │ └── removeTargetAttributes.js │ └── server.ts └── tsconfig.json ``` # Files -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- ``` node_modules/ ``` -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- ```markdown # Chrome MCP Server A Model Context Protocol (MCP) server that provides fine-grained control over a Chrome browser instance through the Chrome DevTools Protocol (CDP). ## Prerequisites - [Bun](https://bun.sh/) (recommended) or Node.js (v14 or higher) - Chrome browser with remote debugging enabled ## Setup ### Installing Bun 1. Install Bun (if not already installed): ```bash # macOS, Linux, or WSL curl -fsSL https://bun.sh/install | bash # Windows (using PowerShell) powershell -c "irm bun.sh/install.ps1 | iex" # Alternatively, using npm npm install -g bun ``` 2. Start Chrome with remote debugging enabled: ```bash # macOS /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 # Windows start chrome --remote-debugging-port=9222 # Linux google-chrome --remote-debugging-port=9222 ``` 3. Install dependencies: ```bash bun install ``` 4. Start the server: ```bash bun start ``` For development with hot reloading: ```bash bun dev ``` The server will start on port 3000 by default. You can change this by setting the `PORT` environment variable. ## Configuring Roo Code to use this MCP server To use this Chrome MCP server with Roo Code: 1. Open Roo Code settings 2. 
Navigate to the MCP settings configuration file at:
   - macOS: `~/Library/Application Support/Code/User/globalStorage/rooveterinaryinc.roo-cline/settings/cline_mcp_settings.json`
   - Windows: `%APPDATA%\Code\User\globalStorage\rooveterinaryinc.roo-cline\settings\cline_mcp_settings.json`
   - Linux: `~/.config/Code/User/globalStorage/rooveterinaryinc.roo-cline/settings/cline_mcp_settings.json`

3. Add the following configuration to the `mcpServers` object:

```json
{
  "mcpServers": {
    "chrome-control": {
      "url": "http://localhost:3000/sse",
      "disabled": false,
      "alwaysAllow": []
    }
  }
}
```

4. Save the file and restart Roo Code to apply the changes.

5. You can now use the Chrome MCP tools in Roo Code to control the browser.

## Available Tools

The server provides the following tools for browser control:

### navigate
Navigate to a specific URL.

Parameters:
- `url` (string): The URL to navigate to

### click
Click at specific coordinates.

Parameters:
- `x` (number): X coordinate
- `y` (number): Y coordinate

### type
Type text at the current focus.

Parameters:
- `text` (string): Text to type

### clickElement
Click on an element by its index in the page info.

Parameters:
- `selector` (string): Element index (e.g., "0" for the first element)

### getText
Get text content of an element using a CSS selector.

Parameters:
- `selector` (string): CSS selector to find the element

### getPageInfo
Get semantic information about the page including interactive elements and text nodes.

### getPageState
Get current page state including URL, title, scroll position, and viewport size.

## Usage

The server implements the Model Context Protocol with SSE transport. Connect to the server at:

- SSE endpoint: `http://localhost:3000/sse`
- Messages endpoint: `http://localhost:3000/messages?sessionId=...`

When using with Roo Code, the configuration in the MCP settings file will handle the connection automatically.
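
For clients other than Roo Code, the two endpoint URLs can be assembled programmatically. The following is a minimal sketch, assuming the routes registered in `src/server.ts` (`/sse` for the event stream and `/messages` for client posts). `McpEndpoints` and `buildEndpoints` are hypothetical names used only for illustration and are not part of this project.

```typescript
// Hypothetical helper (not part of this repository): builds the two URLs
// a client needs, based on the routes registered in src/server.ts.
interface McpEndpoints {
  sse: string;      // GET: opens the server-sent event stream
  messages: string; // POST: delivers client messages for one session
}

function buildEndpoints(baseUrl: string, sessionId: string): McpEndpoints {
  const sse = new URL("/sse", baseUrl);
  const messages = new URL("/messages", baseUrl);
  // The server looks up the transport by the sessionId query parameter.
  messages.searchParams.set("sessionId", sessionId);
  return { sse: sse.toString(), messages: messages.toString() };
}

// Example usage with a placeholder session id.
const endpoints = buildEndpoints("http://localhost:3000", "abc-123");
console.log(endpoints.sse);
console.log(endpoints.messages);
```

In practice the session id is not chosen by the client: the server creates it when the SSE connection opens, so a real client would read it from the stream before posting.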
## Development To run the server in development mode with hot reloading: ```bash bun dev ``` This uses Bun's built-in watch mode to automatically restart the server when files change. ## License MIT ``` -------------------------------------------------------------------------------- /src/runtime-templates/removeTargetAttributes.js: -------------------------------------------------------------------------------- ```javascript // Find all links and remove target attributes document.querySelectorAll('a').forEach(link => { if (link.hasAttribute('target')) { link.removeAttribute('target'); } }); console.log('[MCP Browser] Modified all links to prevent opening in new windows/tabs'); ``` -------------------------------------------------------------------------------- /tsconfig.json: -------------------------------------------------------------------------------- ```json { "compilerOptions": { "target": "ES2020", "module": "ES2020", "moduleResolution": "node", "lib": ["ES2020", "DOM"], "outDir": "./dist", "rootDir": "./src", "strict": true, "esModuleInterop": true, "skipLibCheck": true, "forceConsistentCasingInFileNames": true, "resolveJsonModule": true, "declaration": true, "types": ["node", "express"] }, "include": ["src/**/*"], "exclude": ["node_modules", "dist"] } ``` -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- ```json { "name": "chrome-mcp", "version": "1.0.0", "description": "MCP server for Chrome browser control", "type": "module", "scripts": { "start": "bun run src/server.ts", "dev": "bun --watch src/server.ts" }, "dependencies": { "@modelcontextprotocol/sdk": "^1.8.0", "chrome-remote-interface": "^0.33.3", "cors": "^2.8.5", "diff": "^7.0.0", "express": "^4.21.2", "uuid": "^11.1.0" }, "devDependencies": { "@types/chrome-remote-interface": "^0.31.14", "@types/cors": "^2.8.17", "@types/diff": "^7.0.2", "@types/express": "^5.0.1", 
"@types/uuid": "^10.0.0" } } ``` -------------------------------------------------------------------------------- /src/server.ts: -------------------------------------------------------------------------------- ```typescript import express, { Request, Response, NextFunction } from "express"; import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js"; import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js"; import { z } from "zod"; import cors from "cors"; import * as diff from 'diff'; import { ChromeInterface } from './chrome-interface'; // Type for content items type ContentItem = { type: "text"; text: string; }; // Type for tool responses type ToolResponse = { content: ContentItem[]; }; // Helper function for logging function logToolUsage<T extends Record<string, unknown>>(toolName: string, input: T, output: ToolResponse) { console.log(`\n[Tool Used] ${toolName}`); console.log('Input:', JSON.stringify(input, null, 2)); console.log('Output:', JSON.stringify(output, null, 2)); console.log('----------------------------------------'); } async function startServer() { // Create Chrome interface const chrome = new ChromeInterface(); let lastPageInfo: string | null = null; // Create an MCP server const server = new McpServer({ name: "Chrome MCP Server", version: "1.0.0", description: "Chrome browser automation using MCP. When user is asking to 'navigate' or 'go to' a URL, use the tools provided by this server. If fails, try again." }); // Connect to Chrome console.log("Connecting to Chrome..."); await chrome.connect().catch(error => { console.error('Failed to connect to Chrome:', error); process.exit(1); }); // Add Chrome tools server.tool( "navigate", "Navigate to a specified URL in the browser. Only use this if you have reasonably inferred the URL from the user's request. 
When navigating an existing session, prefer the other tools, like click, goBack, goForward, etc.",
    { url: z.string().url() },
    async ({ url }): Promise<ToolResponse> => {
      const result: ToolResponse = { content: [{ type: "text", text: `Navigated to ${url}` }] };
      await chrome.navigate(url);
      logToolUsage("navigate", { url }, result);
      return result;
    }
  );

  server.tool(
    "click",
    "Click at specific x,y coordinates in the browser window. IMPORTANT: Always check the page info after clicking. When interacting with dropdowns, use ArrowUp and ArrowDown keys. Try to figure out what the selected item is when interacting with the dropdowns and use that to navigate.",
    { x: z.number(), y: z.number() },
    async ({ x, y }): Promise<ToolResponse> => {
      await chrome.click(x, y);
      // Delay for 1 second
      await new Promise(resolve => setTimeout(resolve, 1000));
      const result: ToolResponse = { content: [{ type: "text", text: `Clicked at (${x}, ${y})` }] };
      logToolUsage("click", { x, y }, result);
      return result;
    }
  );

  server.tool(
    "clickElementByIndex",
    "Click an interactive element by its index in the page. Indices are returned by getPageInfo. Always check the page info after clicking. For text input fields, prefer using focusElementByIndex instead.",
    { index: z.number() },
    async ({ index }): Promise<ToolResponse> => {
      await chrome.clickElementByIndex(index);
      const result: ToolResponse = { content: [{ type: "text", text: `Clicked element at index: ${index}` }] };
      logToolUsage("clickElementByIndex", { index }, result);
      return result;
    }
  );

  server.tool(
    "focusElementByIndex",
    "Focus an interactive element by its index in the page. Indices are returned by getPageInfo. This is the preferred method for focusing text input fields before typing.
Always check the page info after focusing.", { index: z.number() }, async ({ index }): Promise<ToolResponse> => { await chrome.focusElementByIndex(index); const result: ToolResponse = { content: [{ type: "text", text: `Focused element at index: ${index}` }] }; logToolUsage("focusElementByIndex", { index }, result); return result; } ); server.tool( "type", "Type text into the currently focused element, with support for special keys like {Enter}, {Tab}, etc. Use {Enter} for newlines in textareas or to submit forms. NEVER USE \n\n IN THE TEXT YOU TYPE. Use {Ctrl+A} to select all text in the focused element. If you think you're in a rich text editor, you probably can use {Ctrl+B} to bold, {Ctrl+I} to italic, {Ctrl+U} to underline, etc. IMPORTANT: Always use focusElementByIndex on text input fields before typing. ALSO IMPORTANT. NEVER RELY ON TABS AT ALL TO FOCUS ELEMENTS. EXPLICITLY USE focusElementByIndex ON ELEMENTS BEFORE TYPING. ALSO, ALWAYS CHECK THE PAGE INFO AFTER TYPING. Always check the page info after typing.", { text: z.string() }, async ({ text }): Promise<ToolResponse> => { await chrome.type(text); const result: ToolResponse = { content: [{ type: "text", text: `Typed: ${text}` }] }; logToolUsage("type", { text }, result); return result; } ); server.tool( "doubleClick", "Double click at specific x,y coordinates in the browser window. Useful for text selection or other double-click specific actions. Always check the page info after double clicking.", { x: z.number(), y: z.number() }, async ({ x, y }): Promise<ToolResponse> => { await chrome.doubleClick(x, y); const result: ToolResponse = { content: [{ type: "text", text: `Double clicked at (${x}, ${y})` }] }; logToolUsage("doubleClick", { x, y }, result); return result; } ); server.tool( "tripleClick", "Triple click at specific x,y coordinates in the browser window. Useful for selecting entire paragraphs or lines of text. 
Always check the page info after triple clicking.",
    { x: z.number(), y: z.number() },
    async ({ x, y }): Promise<ToolResponse> => {
      await chrome.tripleClick(x, y);
      const result: ToolResponse = { content: [{ type: "text", text: `Triple clicked at (${x}, ${y})` }] };
      logToolUsage("tripleClick", { x, y }, result);
      return result;
    }
  );

  // server.tool(
  //   "getText",
  //   "Get text content of an element matching the specified CSS selector",
  //   { selector: z.string() },
  //   async ({ selector }) => {
  //     const text = await chrome.getElementText(selector);
  //     return { content: [{ type: "text", text }] };
  //   }
  // );

  server.tool(
    "getPageInfo",
    "Get semantic information about the current page, including interactive elements, their indices, and all the text content on the page. Returns a diff from one of the previous calls if available and if the diff is smaller than the full content. If you're missing context of the element indices, refer to one of your previous pageInfo results. If page info is fully incomplete, or you don't have context of the element indices, or previous page info results, use the force flag to try again. WARNING: don't use the force flag unless you're sure you need it. You can also use the search and percent flags to search for a specific term and navigate to a specific percentage of the page. Use evaluate to execute JavaScript code in order to understand where in the viewport you are and infer the percent if needed.
This is useful when navigating anchor links.", { force: z.boolean().optional(), cursor: z.number().optional(), remainingPages: z.number().optional(), search: z.string().optional(), percent: z.number().optional() }, async ({ force = false, cursor = 0, remainingPages = 1, search, percent }): Promise<ToolResponse> => { const PAGE_SIZE = 10 * 1024; // 10KB per page const CONTEXT_SIZE = 100; // Characters of context around search matches const currentPageInfo = await chrome.getPageInfo(); // Helper function to get text chunk by percentage const getTextByPercent = (text: string, percentage: number) => { if (percentage < 0 || percentage > 100) return 0; return Math.floor((text.length * percentage) / 100); }; // Helper function to get search results with context const getSearchResults = (text: string, searchTerm: string) => { if (!searchTerm) return null; const results: { start: number; end: number; text: string }[] = []; const regex = new RegExp(searchTerm, 'gi'); let match; while ((match = regex.exec(text)) !== null) { const start = Math.max(0, match.index - CONTEXT_SIZE); const end = Math.min(text.length, match.index + match[0].length + CONTEXT_SIZE); // Merge with previous section if they overlap if (results.length > 0 && start <= results[results.length - 1].end) { results[results.length - 1].end = end; } else { results.push({ start, end, text: text.slice(start, end) }); } } if (results.length === 0) return null; return results.map(({ start, end, text }) => { return `---- Match at position ${start}-${end} ----\n${text}`; }).join('\n'); }; // Helper function to paginate text const paginateText = (text: string, start: number, pageSize: number) => { const end = start + pageSize; const chunk = text.slice(start, end); const hasMore = end < text.length; const nextCursor = hasMore ? 
end : -1; const remainingSize = Math.ceil((text.length - end) / pageSize); return { chunk, nextCursor, remainingSize }; }; // Handle percentage-based navigation if (typeof percent === 'number') { cursor = getTextByPercent(currentPageInfo, percent); } // If force is true or there's no previous page info, return the paginated full content if (force || !lastPageInfo) { lastPageInfo = currentPageInfo; // If search is specified, return search results if (search) { const searchResults = getSearchResults(currentPageInfo, search); if (!searchResults) { return { content: [{ type: "text", text: `No matches found for "${search}"` }] }; } const { chunk, nextCursor, remainingSize } = paginateText(searchResults, cursor, PAGE_SIZE); const paginationInfo = nextCursor >= 0 ? `\n[Page info: next_cursor=${nextCursor}, remaining_pages=${remainingSize}]` : '\n[Page info: end of content]'; const result: ToolResponse = { content: [{ type: "text", text: chunk + paginationInfo }] }; logToolUsage("getPageInfo", { force, cursor, remainingPages, search, percent }, result); return result; } const { chunk, nextCursor, remainingSize } = paginateText(currentPageInfo, cursor, PAGE_SIZE); const paginationInfo = nextCursor >= 0 ? 
`\n[Page info: next_cursor=${nextCursor}, remaining_pages=${remainingSize}]` : '\n[Page info: end of content]'; const result: ToolResponse = { content: [{ type: "text", text: chunk + paginationInfo }] }; logToolUsage("getPageInfo", { force, cursor, remainingPages, search, percent }, result); return result; } // Calculate the diff between the last and current page info const changes = diff.diffWords(lastPageInfo, currentPageInfo); const diffText = changes .filter(part => part.added || part.removed) .map(part => { if (part.added) return `[ADDED] ${part.value}`; if (part.removed) return `[REMOVED] ${part.value}`; return ''; }) .join('\n'); // Helper function to check if diff is meaningful const isNonMeaningfulDiff = (diff: string) => { // Check if diff is mostly just numbers const lines = diff.split('\n'); const numericLines = lines.filter(line => { const value = line.replace(/\[ADDED\]|\[REMOVED\]/, '').trim(); return /^\d+$/.test(value); }); if (numericLines.length / lines.length > 0.5) { return true; } // Check if diff is too fragmented (lots of tiny changes) if (lines.length > 10 && lines.every(line => line.length < 10)) { return true; } return false; }; // If the diff is larger than the current content or not meaningful, return the paginated full content if (diffText.length > currentPageInfo.length || isNonMeaningfulDiff(diffText)) { lastPageInfo = currentPageInfo; // If search is specified, return search results if (search) { const searchResults = getSearchResults(currentPageInfo, search); if (!searchResults) { return { content: [{ type: "text", text: `No matches found for "${search}"` }] }; } const { chunk, nextCursor, remainingSize } = paginateText(searchResults, cursor, PAGE_SIZE); const paginationInfo = nextCursor >= 0 ? 
`\n[Page info: next_cursor=${nextCursor}, remaining_pages=${remainingSize}]` : '\n[Page info: end of content]'; const result: ToolResponse = { content: [{ type: "text", text: chunk + paginationInfo }] }; logToolUsage("getPageInfo", { force, cursor, remainingPages, search, percent }, result); return result; } const { chunk, nextCursor, remainingSize } = paginateText(currentPageInfo, cursor, PAGE_SIZE); const paginationInfo = nextCursor >= 0 ? `\n[Page info: next_cursor=${nextCursor}, remaining_pages=${remainingSize}]` : '\n[Page info: end of content]'; const result: ToolResponse = { content: [{ type: "text", text: chunk + paginationInfo }] }; logToolUsage("getPageInfo", { force, cursor, remainingPages, search, percent }, result); return result; } // Update the last page info and return the paginated diff lastPageInfo = currentPageInfo; const baseText = diffText || 'No changes detected'; // If search is specified, return search results from the diff if (search) { const searchResults = getSearchResults(baseText, search); if (!searchResults) { return { content: [{ type: "text", text: `No matches found for "${search}"` }] }; } const { chunk, nextCursor, remainingSize } = paginateText(searchResults, cursor, PAGE_SIZE); const paginationInfo = nextCursor >= 0 ? `\n[Page info: next_cursor=${nextCursor}, remaining_pages=${remainingSize}]` : '\n[Page info: end of content]'; const result: ToolResponse = { content: [{ type: "text", text: chunk + paginationInfo }] }; logToolUsage("getPageInfo", { force, cursor, remainingPages, search, percent }, result); return result; } const { chunk, nextCursor, remainingSize } = paginateText(baseText, cursor, PAGE_SIZE); const paginationInfo = nextCursor >= 0 ? 
        `\n[Page info: next_cursor=${nextCursor}, remaining_pages=${remainingSize}]` :
        '\n[Page info: end of content]';
      const result: ToolResponse = { content: [{ type: "text", text: chunk + paginationInfo }] };
      logToolUsage("getPageInfo", { force, cursor, remainingPages, search, percent }, result);
      return result;
    }
  );

  // server.tool(
  //   "getPageState",
  //   "Get current page state including URL, title, scroll position, and viewport size",
  //   {},
  //   async () => {
  //     const state = await chrome.getPageState();
  //     return { content: [{ type: "text", text: JSON.stringify(state) }] };
  //   }
  // );

  server.tool(
    "goBack",
    "Navigate back one step in the browser history",
    {},
    async (): Promise<ToolResponse> => {
      await chrome.goBack();
      const result: ToolResponse = { content: [{ type: "text", text: "Navigated back" }] };
      logToolUsage("goBack", {}, result);
      return result;
    }
  );

  server.tool(
    "goForward",
    "Navigate forward one step in the browser history",
    {},
    async (): Promise<ToolResponse> => {
      await chrome.goForward();
      const result: ToolResponse = { content: [{ type: "text", text: "Navigated forward" }] };
      logToolUsage("goForward", {}, result);
      return result;
    }
  );

  server.tool(
    "evaluate",
    "Execute JavaScript code in the context of the current page",
    { expression: z.string() },
    async ({ expression }): Promise<ToolResponse> => {
      const result = await chrome.evaluate(expression);
      const response: ToolResponse = { content: [{ type: "text", text: JSON.stringify(result) }] };
      logToolUsage("evaluate", { expression }, response);
      return response;
    }
  );

  // Create Express app
  const app = express();
  app.use(cors());

  // Store active transports
  const transports: {[sessionId: string]: SSEServerTransport} = {};

  // SSE endpoint for client connections
  app.get("/sse", async (_: Request, res: Response) => {
    const transport = new SSEServerTransport('/messages', res);
    transports[transport.sessionId] = transport;

    // Clean up when connection closes
    res.on("close", () => {
      delete transports[transport.sessionId];
    });
    // Connect the transport to our MCP server
    await server.connect(transport);
  });

  // Endpoint for receiving messages from clients
  app.post("/messages", async (req: Request, res: Response) => {
    const sessionId = req.query.sessionId as string;
    const transport = transports[sessionId];
    if (transport) {
      await transport.handlePostMessage(req, res);
    } else {
      res.status(400).send('No transport found for sessionId');
    }
  });

  // Start the server. Honor the PORT environment variable documented in the README; default to 3000.
  const port = Number(process.env.PORT) || 3000;
  app.listen(port, '0.0.0.0', () => {
    console.log(`MCP Server running at http://localhost:${port}`);
    console.log(`SSE endpoint: http://localhost:${port}/sse`);
    console.log(`Messages endpoint: http://localhost:${port}/messages`);
  });

  // Handle cleanup
  process.on('SIGINT', async () => {
    await chrome.close();
    process.exit(0);
  });
}

// Start the server
startServer().catch(error => {
  console.error('Failed to start server:', error);
  process.exit(1);
});
```

--------------------------------------------------------------------------------
/src/chrome-interface.ts:
--------------------------------------------------------------------------------

```typescript
import CDP from 'chrome-remote-interface';
import fs from 'fs';
import path from 'path';
import diff from 'diff';

// Types for Chrome DevTools Protocol interactions
interface NavigationResult {
  navigation: string;
  pageInfo: string;
  pageState: {
    url: string;
    title: string;
    readyState: string;
    scrollPosition: { x: number; y: number };
    viewportSize: { width: number; height: number };
  };
}

type MouseButton = 'left' | 'right' | 'middle';

interface MouseEventOptions {
  type: 'mouseMoved' | 'mousePressed' | 'mouseReleased' | 'mouseWheel';
  x: number;
  y: number;
  button?: MouseButton;
  buttons?: number;
  clickCount?: number;
}

interface SpecialKeyConfig {
  key: string;
  code: string;
  text?: string;
  unmodifiedText?: string;
  windowsVirtualKeyCode: number;
  nativeVirtualKeyCode: number;
  autoRepeat: boolean;
  isKeypad: boolean;
  isSystemKey: boolean;
}

// Function to load template file
function
loadAriaTemplate(): string { const TEMPLATES_DIR = path.join(__dirname, 'runtime-templates'); try { return fs.readFileSync(path.join(TEMPLATES_DIR, 'ariaInteractiveElements.js'), 'utf-8'); } catch (error) { console.error('Failed to load ariaInteractiveElements template:', error); throw error; } } // Chrome interface class to handle CDP interactions export class ChromeInterface { private client: CDP.Client | null = null; private page: any | null = null; private ariaScriptTemplate: string = ''; constructor() { this.ariaScriptTemplate = loadAriaTemplate(); } /** * Connects to Chrome and sets up necessary event listeners */ async connect() { try { this.client = await CDP(); const { Page, DOM, Runtime, Network } = this.client; // Enable necessary domains await Promise.all([ Page.enable(), DOM.enable(), Runtime.enable(), Network.enable(), ]); // Set up simple page load handler that injects the script Page.loadEventFired(async () => { console.log('[Page Load] Load event fired, injecting ARIA script'); await this.injectAriaScript(); }); return true; } catch (error) { console.error('Failed to connect to Chrome:', error); return false; } } /** * Injects the ARIA interactive elements script into the page */ private async injectAriaScript() { if (!this.client?.Runtime) return; console.log('[ARIA] Injecting ARIA interactive elements script'); await this.client.Runtime.evaluate({ expression: this.ariaScriptTemplate }); } /** * Navigates to a URL and waits for page load */ async navigate(url: string): Promise<NavigationResult> { if (!this.client) throw new Error('Chrome not connected'); console.log(`[Navigation] Starting navigation to ${url}`); try { const NAVIGATION_TIMEOUT = 30000; // 30 seconds timeout await Promise.race([ this.client.Page.navigate({ url }), new Promise((_, reject) => setTimeout(() => reject(new Error('Navigation timeout')), NAVIGATION_TIMEOUT) ) ]); console.log('[Navigation] Navigation successful'); const pageInfo = await this.getPageInfo(); const pageState = 
await this.getPageState(); return { navigation: `Successfully navigated to ${url}`, pageInfo, pageState }; } catch (error) { console.error('[Navigation] Navigation error:', error); throw error; } } /** * Simulates a mouse click at specified coordinates with verification */ async click(x: number, y: number) { if (!this.client) throw new Error('Chrome not connected'); const { Input, Runtime } = this.client; // Get element info before clicking const preClickInfo = await Runtime.evaluate({ expression: ` (function() { const element = document.elementFromPoint(${x}, ${y}); return element ? { tagName: element.tagName, href: element instanceof HTMLAnchorElement ? element.href : null, type: element instanceof HTMLInputElement ? element.type : null, isInteractive: ( element instanceof HTMLButtonElement || element instanceof HTMLAnchorElement || element instanceof HTMLInputElement || element.hasAttribute('role') || window.getComputedStyle(element).cursor === 'pointer' ) } : null; })() `, returnByValue: true }); const elementInfo = preClickInfo.result.value; console.log('[Click] Clicking element:', elementInfo); // Dispatch a complete mouse event sequence const dispatchMouseEvent = async (options: MouseEventOptions) => { await Input.dispatchMouseEvent({ ...options, button: 'left', buttons: options.type === 'mouseMoved' ? 0 : 1, clickCount: (options.type === 'mousePressed' || options.type === 'mouseReleased') ? 
1 : 0, }); }; // Natural mouse movement sequence with hover first await dispatchMouseEvent({ type: 'mouseMoved', x: x - 50, y: y - 50 }); await new Promise(resolve => setTimeout(resolve, 50)); // Small delay for hover await dispatchMouseEvent({ type: 'mouseMoved', x, y }); await new Promise(resolve => setTimeout(resolve, 50)); // Small delay for hover effect // Click sequence await dispatchMouseEvent({ type: 'mousePressed', x, y }); await new Promise(resolve => setTimeout(resolve, 50)); // Small delay between press and release await dispatchMouseEvent({ type: 'mouseReleased', x, y, buttons: 0 }); // Verify the click had an effect and show visual feedback await Runtime.evaluate({ expression: ` (function() { const element = document.elementFromPoint(${x}, ${y}); if (element) { // Add a brief flash to show where we clicked const div = document.createElement('div'); div.style.position = 'fixed'; div.style.left = '${x}px'; div.style.top = '${y}px'; div.style.width = '20px'; div.style.height = '20px'; div.style.backgroundColor = 'rgba(255, 255, 0, 0.5)'; div.style.borderRadius = '50%'; div.style.pointerEvents = 'none'; div.style.zIndex = '999999'; div.style.transition = 'all 0.3s ease-out'; document.body.appendChild(div); // Animate the feedback setTimeout(() => { div.style.transform = 'scale(1.5)'; div.style.opacity = '0'; setTimeout(() => div.remove(), 300); }, 50); // For links, verify navigation started if (element instanceof HTMLAnchorElement) { element.dispatchEvent(new MouseEvent('click', { bubbles: true, cancelable: true, view: window })); } } })() ` }); // Additional delay for link clicks to start navigation if (elementInfo?.href) { await new Promise(resolve => setTimeout(resolve, 100)); } } /** * Simulates a double click at specified coordinates */ async doubleClick(x: number, y: number) { if (!this.client) throw new Error('Chrome not connected'); const { Input } = this.client; const dispatchMouseEvent = async (options: MouseEventOptions) => { await 
Input.dispatchMouseEvent({ ...options, button: 'left', buttons: options.type === 'mouseMoved' ? 0 : 1, clickCount: (options.type === 'mousePressed' || options.type === 'mouseReleased') ? 2 : 0, }); }; // Natural mouse movement sequence with double click await dispatchMouseEvent({ type: 'mouseMoved', x: x - 50, y: y - 50 }); await dispatchMouseEvent({ type: 'mouseMoved', x, y }); await dispatchMouseEvent({ type: 'mousePressed', x, y }); await dispatchMouseEvent({ type: 'mouseReleased', x, y, buttons: 0 }); } /** * Simulates a triple click at specified coordinates */ async tripleClick(x: number, y: number) { if (!this.client) throw new Error('Chrome not connected'); const { Input } = this.client; const dispatchMouseEvent = async (options: MouseEventOptions) => { await Input.dispatchMouseEvent({ ...options, button: 'left', buttons: options.type === 'mouseMoved' ? 0 : 1, clickCount: (options.type === 'mousePressed' || options.type === 'mouseReleased') ? 3 : 0, }); }; // Natural mouse movement sequence with triple click await dispatchMouseEvent({ type: 'mouseMoved', x: x - 50, y: y - 50 }); await dispatchMouseEvent({ type: 'mouseMoved', x, y }); await dispatchMouseEvent({ type: 'mousePressed', x, y }); await dispatchMouseEvent({ type: 'mouseReleased', x, y, buttons: 0 }); } /** * Focuses an element by its index in the interactive elements array */ async focusElementByIndex(index: number) { if (!this.client) throw new Error('Chrome not connected'); const { Runtime } = this.client; // Get element and focus it const { result } = await Runtime.evaluate({ expression: ` (function() { const element = window.interactiveElements[${index}]; if (!element) throw new Error('Element not found at index ' + ${index}); // Scroll into view with smooth behavior element.scrollIntoView({ behavior: 'smooth', block: 'center' }); // Wait a bit for scroll to complete return new Promise(resolve => { setTimeout(() => { element.focus(); resolve(true); }, 1000); }); })() `, awaitPromise: true, 
returnByValue: true }); if (result.subtype === 'error') { throw new Error(result.description); } // Highlight the element after focusing await this.highlightElement(`window.interactiveElements[${index}]`); } /** * Clicks an element by its index in the interactive elements array */ async clickElementByIndex(index: number) { if (!this.client) throw new Error('Chrome not connected'); const { Runtime } = this.client; // Get element info and coordinates const elementInfo = await Runtime.evaluate({ expression: ` (function() { const element = window.interactiveElements[${index}]; if (!element) throw new Error('Element not found at index ' + ${index}); // Scroll into view with smooth behavior element.scrollIntoView({ behavior: 'smooth', block: 'center' }); return new Promise(resolve => { setTimeout(() => { const rect = element.getBoundingClientRect(); resolve({ rect: { x: Math.round(rect.left + (rect.width * 0.5)), // Click in center y: Math.round(rect.top + (rect.height * 0.5)) }, tagName: element.tagName, href: element instanceof HTMLAnchorElement ? element.href : null, type: element instanceof HTMLInputElement ? 
element.type : null
            });
          }, 1000); // Wait for scroll
        });
      })()
      `,
      awaitPromise: true,
      returnByValue: true
    });

    if (elementInfo.result.subtype === 'error') {
      throw new Error(elementInfo.result.description);
    }

    const { x, y } = elementInfo.result.value.rect;

    // Highlight the element before clicking
    await this.highlightElement(`window.interactiveElements[${index}]`);

    // Add a small delay to make the highlight visible
    await new Promise(resolve => setTimeout(resolve, 300));

    // Perform the physical click
    await this.click(x, y);

    // For inputs, ensure they're focused after click
    if (elementInfo.result.value.type) {
      await Runtime.evaluate({
        expression: `window.interactiveElements[${index}].focus()`
      });
    }
  }

  /**
   * Types text with support for special keys
   */
  async type(text: string) {
    if (!this.client) throw new Error('Chrome not connected');
    const { Input } = this.client;

    // Add random delay between keystrokes to simulate human typing
    const getRandomDelay = () => {
      // Random delay of roughly 20-40ms per keystroke
      return Math.random() * 20 + 20;
    };

    const specialKeys: Record<string, SpecialKeyConfig> = {
      Enter: {
        key: 'Enter',
        code: 'Enter',
        text: '\r',
        unmodifiedText: '\r',
        windowsVirtualKeyCode: 13,
        nativeVirtualKeyCode: 13,
        autoRepeat: false,
        isKeypad: false,
        isSystemKey: false,
      },
      Tab: {
        key: 'Tab',
        code: 'Tab',
        windowsVirtualKeyCode: 9,
        nativeVirtualKeyCode: 9,
        autoRepeat: false,
        isKeypad: false,
        isSystemKey: false,
      },
      Backspace: {
        key: 'Backspace',
        code: 'Backspace',
        windowsVirtualKeyCode: 8,
        nativeVirtualKeyCode: 8,
        autoRepeat: false,
        isKeypad: false,
        isSystemKey: false,
      },
      ArrowUp: {
        key: 'ArrowUp',
        code: 'ArrowUp',
        windowsVirtualKeyCode: 38,
        nativeVirtualKeyCode: 38,
        autoRepeat: false,
        isKeypad: false,
        isSystemKey: false,
      },
      ArrowDown: {
        key: 'ArrowDown',
        code: 'ArrowDown',
        windowsVirtualKeyCode: 40,
        nativeVirtualKeyCode: 40,
        autoRepeat: false,
        isKeypad: false,
        isSystemKey: false,
      },
      ArrowLeft: {
        key: 'ArrowLeft',
        code: 'ArrowLeft',
        windowsVirtualKeyCode: 37,
        nativeVirtualKeyCode: 37,
        autoRepeat: false,
        isKeypad: false,
        isSystemKey: false,
      },
      ArrowRight: {
        key: 'ArrowRight',
        code: 'ArrowRight',
        windowsVirtualKeyCode: 39,
        nativeVirtualKeyCode: 39,
        autoRepeat: false,
        isKeypad: false,
        isSystemKey: false,
      },
      'Ctrl+A': { key: 'a', code: 'KeyA', windowsVirtualKeyCode: 65, nativeVirtualKeyCode: 65, autoRepeat: false, isKeypad: false, isSystemKey: false },
      'Ctrl+B': { key: 'b', code: 'KeyB', windowsVirtualKeyCode: 66, nativeVirtualKeyCode: 66, autoRepeat: false, isKeypad: false, isSystemKey: false },
      'Ctrl+C': { key: 'c', code: 'KeyC', windowsVirtualKeyCode: 67, nativeVirtualKeyCode: 67, autoRepeat: false, isKeypad: false, isSystemKey: false },
      'Ctrl+I': { key: 'i', code: 'KeyI', windowsVirtualKeyCode: 73, nativeVirtualKeyCode: 73, autoRepeat: false, isKeypad: false, isSystemKey: false },
      'Ctrl+U': { key: 'u', code: 'KeyU', windowsVirtualKeyCode: 85, nativeVirtualKeyCode: 85, autoRepeat: false, isKeypad: false, isSystemKey: false },
      'Ctrl+V': { key: 'v', code: 'KeyV', windowsVirtualKeyCode: 86, nativeVirtualKeyCode: 86, autoRepeat: false, isKeypad: false, isSystemKey: false },
      'Ctrl+X': { key: 'x', code: 'KeyX', windowsVirtualKeyCode: 88, nativeVirtualKeyCode: 88, autoRepeat: false, isKeypad: false, isSystemKey: false },
      'Ctrl+Z': { key: 'z', code: 'KeyZ', windowsVirtualKeyCode: 90, nativeVirtualKeyCode: 90, autoRepeat: false, isKeypad: false, isSystemKey: false },
    };

    const handleModifierKey = async (keyConfig: SpecialKeyConfig, modifiers: { ctrl?: boolean; shift?: boolean; alt?: boolean; meta?: boolean }) => {
      if (!this.client) return;
      const { Input } = this.client;

      if (modifiers.ctrl) {
        await Input.dispatchKeyEvent({
          type: 'keyDown',
          key: 'Control',
          code: 'ControlLeft',
          windowsVirtualKeyCode: 17,
          nativeVirtualKeyCode: 17,
          modifiers: 2,
          isSystemKey: false
        });
      }

      await Input.dispatchKeyEvent({
        type: 'keyDown',
        ...keyConfig,
        modifiers: modifiers.ctrl ?
          2 : 0,
      });

      await Input.dispatchKeyEvent({
        type: 'keyUp',
        ...keyConfig,
        modifiers: modifiers.ctrl ? 2 : 0,
      });

      if (modifiers.ctrl) {
        await Input.dispatchKeyEvent({
          type: 'keyUp',
          key: 'Control',
          code: 'ControlLeft',
          windowsVirtualKeyCode: 17,
          nativeVirtualKeyCode: 17,
          modifiers: 0,
          isSystemKey: false
        });
      }
    };

    // Types literal characters one at a time with a short random delay
    const typeChars = async (chars: string) => {
      for (const char of chars) {
        // Add random delay before each keystroke
        await new Promise(resolve => setTimeout(resolve, getRandomDelay()));

        // Only letters and digits map onto a `Key*` / `Digit*` code;
        // omit `code` for other characters instead of sending an invalid value
        const code = /^[a-zA-Z]$/.test(char)
          ? `Key${char.toUpperCase()}`
          : /^[0-9]$/.test(char)
            ? `Digit${char}`
            : undefined;
        const keyEvent = {
          text: char,
          unmodifiedText: char,
          key: char,
          ...(code ? { code } : {}),
        };

        await Input.dispatchKeyEvent({ type: 'keyDown', ...keyEvent });
        await Input.dispatchKeyEvent({ type: 'keyUp', ...keyEvent });
      }
    };

    const parts = text.split(/(\{[^}]+\})/);

    for (const part of parts) {
      if (part.startsWith('{') && part.endsWith('}')) {
        const keyName = part.slice(1, -1);
        if (keyName in specialKeys) {
          const keyConfig = specialKeys[keyName];
          if (keyName.startsWith('Ctrl+')) {
            await handleModifierKey(keyConfig, { ctrl: true });
          } else {
            await Input.dispatchKeyEvent({
              type: 'keyDown',
              ...keyConfig,
            });

            if (keyName === 'Enter') {
              await Input.dispatchKeyEvent({
                type: 'char',
                text: '\r',
                unmodifiedText: '\r',
                windowsVirtualKeyCode: 13,
                nativeVirtualKeyCode: 13,
                autoRepeat: false,
                isKeypad: false,
                isSystemKey: false,
              });
            }

            await Input.dispatchKeyEvent({
              type: 'keyUp',
              ...keyConfig,
            });

            await new Promise(resolve => setTimeout(resolve, 50));

            if (keyName === 'Enter' || keyName === 'Tab') {
              await new Promise(resolve => setTimeout(resolve, 100));
            }
          }
        } else {
          // Unknown key name: type the braces and their contents literally
          await typeChars(part);
        }
      } else {
        await typeChars(part);
      }
    }

    // Add a slightly longer delay
    // after finishing typing
    await new Promise(resolve => setTimeout(resolve, 500));
  }

  /**
   * Gets text content of an element by selector
   */
  async getElementText(selector: string): Promise<string> {
    if (!this.client) throw new Error('Chrome not connected');

    const { Runtime } = this.client;
    const result = await Runtime.evaluate({
      // JSON.stringify escapes quotes and backslashes so the selector
      // cannot break out of the string literal in the evaluated expression
      expression: `document.querySelector(${JSON.stringify(selector)})?.textContent || ''`,
      returnByValue: true,
    });

    return result.result.value;
  }

  /**
   * Closes the Chrome connection
   */
  async close() {
    if (this.client) {
      await this.client.close();
      this.client = null;
      this.page = null;
    }
  }

  /**
   * Gets semantic information about the page
   */
  async getPageInfo() {
    if (!this.client) throw new Error('Chrome not connected');

    const { Runtime } = this.client;
    const { result } = await Runtime.evaluate({
      expression: 'window.createTextRepresentation(); window.textRepresentation || "Page text representation not available"',
      returnByValue: true
    });

    return result.value;
  }

  /**
   * Highlights an element briefly before interaction
   */
  private async highlightElement(element: string) {
    if (!this.client) throw new Error('Chrome not connected');

    const { Runtime } = this.client;
    await Runtime.evaluate({
      expression: `
        (function() {
          const el = ${element};
          if (!el) return;

          // Store original styles
          const originalOutline = el.style.outline;
          const originalOutlineOffset = el.style.outlineOffset;

          // Add highlight effect
          el.style.outline = '2px solid #007AFF';
          el.style.outlineOffset = '2px';

          // Remove highlight after animation
          setTimeout(() => {
            el.style.outline = originalOutline;
            el.style.outlineOffset = originalOutlineOffset;
          }, 500);
        })()
      `
    });
  }

  /**
   * Gets the current page state
   */
  async getPageState() {
    if (!this.client) throw new Error('Chrome not connected');

    const { Runtime } = this.client;
    const result = await Runtime.evaluate({
      expression: `
        (function() {
          return {
            url: window.location.href,
            title: document.title,
            readyState: document.readyState,
            scrollPosition: {
              x: window.scrollX,
              y: window.scrollY
            },
            viewportSize: {
              width: window.innerWidth,
              height: window.innerHeight
            }
          };
        })()
      `,
      returnByValue: true,
    });

    return result.result.value;
  }

  /**
   * Navigates back in history
   */
  async goBack(): Promise<NavigationResult> {
    if (!this.client) throw new Error('Chrome not connected');

    console.log('[Navigation] Going back in history');
    await this.client.Page.navigate({ url: 'javascript:history.back()' });

    const pageInfo = await this.getPageInfo();
    const pageState = await this.getPageState();

    return {
      navigation: 'Navigated back in history',
      pageInfo,
      pageState
    };
  }

  /**
   * Navigates forward in history
   */
  async goForward(): Promise<NavigationResult> {
    if (!this.client) throw new Error('Chrome not connected');

    console.log('[Navigation] Going forward in history');
    await this.client.Page.navigate({ url: 'javascript:history.forward()' });

    const pageInfo = await this.getPageInfo();
    const pageState = await this.getPageState();

    return {
      navigation: 'Navigated forward in history',
      pageInfo,
      pageState
    };
  }

  /**
   * Evaluates JavaScript code in the page context
   */
  async evaluate(expression: string) {
    if (!this.client) throw new Error('Chrome not connected');

    const { Runtime } = this.client;
    const result = await Runtime.evaluate({
      expression,
      returnByValue: true
    });

    return result.result.value;
  }
}
```

--------------------------------------------------------------------------------
/src/runtime-templates/ariaInteractiveElements.js:
--------------------------------------------------------------------------------

```javascript
(function () {
  function createTextRepresentation() {
    // Native interactive HTML elements that are inherently focusable/clickable
    const INTERACTIVE_ELEMENTS = [
      'a[href]',
      'button',
      'input:not([type="hidden"])',
      'select',
      'textarea',
      'summary',
      'video[controls]',
      'audio[controls]',
    ];

    // Interactive ARIA roles that make elements programmatically interactive
    const INTERACTIVE_ROLES = [
      'button',
      'checkbox',
      'combobox',
      'gridcell',
      'link',
      'listbox',
      'menuitem',
      'menuitemcheckbox',
      'menuitemradio',
      'option',
      'radio',
      'searchbox',
      'slider',
      'spinbutton',
      'switch',
      'tab',
      'textbox',
      'treeitem',
    ];

    // Build complete selector for all interactive elements
    const completeSelector = [...INTERACTIVE_ELEMENTS, ...INTERACTIVE_ROLES.map((role) => `[role="${role}"]`)].join(
      ','
    );

    // Helper to get accessible name of an element following ARIA naming specs
    const getAccessibleName = (el) => {
      // First try explicit labels
      const explicitLabel = el.getAttribute('aria-label');
      if (explicitLabel) return explicitLabel;

      // Then try labelledby
      const labelledBy = el.getAttribute('aria-labelledby');
      if (labelledBy) {
        const labelElements = labelledBy.split(' ').map((id) => document.getElementById(id));
        const labelText = labelElements.map((labelEl) => (labelEl ? labelEl.textContent.trim() : '')).join(' ');
        if (labelText) return labelText;
      }

      // Then try associated label element
      const label = el.labels ? el.labels[0] : null;
      if (label) return label.textContent.trim();

      // Then try placeholder
      const placeholder = el.getAttribute('placeholder');
      if (placeholder) return placeholder;

      // Then try title
      const title = el.getAttribute('title');
      if (title) return title;

      // For inputs, use value
      if (el.tagName.toLowerCase() === 'input') {
        return el.getAttribute('value') || el.value || '';
      }

      // For other elements, get all text content including from child elements
      let textContent = '';
      const walker = document.createTreeWalker(el, NodeFilter.SHOW_TEXT, {
        acceptNode: (node) => {
          // Skip text in hidden elements
          let parent = node.parentElement;
          while (parent && parent !== el) {
            const style = window.getComputedStyle(parent);
            if (style.display === 'none' || style.visibility === 'hidden') {
              return NodeFilter.FILTER_REJECT;
            }
            parent = parent.parentElement;
          }
          return NodeFilter.FILTER_ACCEPT;
        }
      });

      let node;
      while ((node = walker.nextNode())) {
        const text = node.textContent.trim();
        if (text) textContent += (textContent ?
          ' ' : '') + text;
      }
      return textContent || '';
    };

    const interactiveElements = [];

    // Find all interactive elements in DOM order
    const findInteractiveElements = () => {
      // Clear existing elements
      interactiveElements.length = 0;

      // First find all native buttons and interactive elements
      document.querySelectorAll(completeSelector).forEach(node => {
        if (
          node.getAttribute('aria-hidden') !== 'true' &&
          !node.hasAttribute('disabled') &&
          !node.hasAttribute('inert') &&
          window.getComputedStyle(node).display !== 'none' &&
          window.getComputedStyle(node).visibility !== 'hidden'
        ) {
          interactiveElements.push(node);
        }
      });

      // Then use TreeWalker for any we might have missed
      const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_ELEMENT, {
        acceptNode: (node) => {
          if (
            !interactiveElements.includes(node) && // Skip if already found
            node.matches(completeSelector) &&
            node.getAttribute('aria-hidden') !== 'true' &&
            !node.hasAttribute('disabled') &&
            !node.hasAttribute('inert') &&
            window.getComputedStyle(node).display !== 'none' &&
            window.getComputedStyle(node).visibility !== 'hidden'
          ) {
            return NodeFilter.FILTER_ACCEPT;
          }
          return NodeFilter.FILTER_SKIP;
        }
      });

      let node;
      while ((node = walker.nextNode())) {
        if (!interactiveElements.includes(node)) {
          interactiveElements.push(node);
        }
      }
    };

    // Create text representation of the page with interactive elements
    const createTextRepresentation = () => {
      const USE_ELEMENT_POSITION_FOR_TEXT_REPRESENTATION = false; // Flag to control text representation method

      if (USE_ELEMENT_POSITION_FOR_TEXT_REPRESENTATION) {
        // Position-based text representation (existing implementation)
        const output = [];
        const processedElements = new Set();
        const LINE_HEIGHT = 20; // Base line height
        const MIN_GAP_FOR_NEWLINE = LINE_HEIGHT * 1.2; // Gap threshold for newline
        const HORIZONTAL_GAP = 50; // Minimum horizontal gap to consider elements on different lines

        // Helper to get element's bounding box
        const getBoundingBox = (node) => {
          if (node.nodeType ===
            Node.TEXT_NODE) {
            const range = document.createRange();
            range.selectNodeContents(node);
            return range.getBoundingClientRect();
          }
          return node.getBoundingClientRect();
        };

        // Store nodes with their positions for sorting
        const nodePositions = [];

        // Process all nodes in DOM order
        const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT, {
          acceptNode: (node) => {
            // Skip script/style elements and their contents
            if (
              node.nodeType === Node.ELEMENT_NODE &&
              (node.tagName.toLowerCase() === 'script' ||
                node.tagName.toLowerCase() === 'style' ||
                node.tagName.toLowerCase() === 'head' ||
                node.tagName.toLowerCase() === 'meta' ||
                node.tagName.toLowerCase() === 'link')
            ) {
              return NodeFilter.FILTER_REJECT;
            }
            return NodeFilter.FILTER_ACCEPT;
          },
        });

        let node;
        while ((node = walker.nextNode())) {
          // Handle text nodes
          if (node.nodeType === Node.TEXT_NODE) {
            const text = node.textContent.trim();
            if (!text) continue;

            // Skip text in hidden elements
            let parent = node.parentElement;
            let isHidden = false;
            let isInsideProcessedInteractive = false;
            let computedStyles = new Map(); // Cache computed styles

            while (parent) {
              // Cache and reuse computed styles
              let style = computedStyles.get(parent);
              if (!style) {
                style = window.getComputedStyle(parent);
                computedStyles.set(parent, style);
              }

              if (
                style.display === 'none' ||
                style.visibility === 'hidden' ||
                parent.getAttribute('aria-hidden') === 'true'
              ) {
                isHidden = true;
                break;
              }
              if (processedElements.has(parent)) {
                isInsideProcessedInteractive = true;
                break;
              }
              parent = parent.parentElement;
            }
            if (isHidden || isInsideProcessedInteractive) continue;

            // Skip if this is just a number inside a highlight element
            if (/^\d+$/.test(text)) {
              parent = node.parentElement;
              while (parent) {
                if (parent.classList && parent.classList.contains('claude-highlight')) {
                  isHidden = true;
                  break;
                }
                parent = parent.parentElement;
              }
              if (isHidden) continue;
            }

            // Check if this text is inside an interactive element
            let
              isInsideInteractive = false;
            let interactiveParent = null;
            parent = node.parentElement;
            while (parent) {
              if (parent.matches(completeSelector)) {
                isInsideInteractive = true;
                interactiveParent = parent;
                break;
              }
              parent = parent.parentElement;
            }

            // If inside an interactive element, add it to the interactive element's content
            if (isInsideInteractive && interactiveParent) {
              const index = interactiveElements.indexOf(interactiveParent);
              if (index !== -1 && !processedElements.has(interactiveParent)) {
                const role = interactiveParent.getAttribute('role') || interactiveParent.tagName.toLowerCase();
                const name = getAccessibleName(interactiveParent);
                if (name) {
                  const box = getBoundingBox(interactiveParent);
                  if (box.width > 0 && box.height > 0) {
                    nodePositions.push({
                      type: 'interactive',
                      content: `[${index}]{${role}}(${name})`,
                      box,
                      y: box.top + window.pageYOffset,
                      x: box.left + window.pageXOffset
                    });
                  }
                }
                processedElements.add(interactiveParent);
              }
              continue;
            }

            // If not inside an interactive element, add as regular text
            const box = getBoundingBox(node);
            if (box.width > 0 && box.height > 0) {
              nodePositions.push({
                type: 'text',
                content: text,
                box,
                y: box.top + window.pageYOffset,
                x: box.left + window.pageXOffset
              });
            }
          }

          // Handle interactive elements
          if (node.nodeType === Node.ELEMENT_NODE && node.matches(completeSelector)) {
            const index = interactiveElements.indexOf(node);
            if (index !== -1 && !processedElements.has(node)) {
              const role = node.getAttribute('role') || node.tagName.toLowerCase();
              const name = getAccessibleName(node);
              if (name) {
                const box = getBoundingBox(node);
                if (box.width > 0 && box.height > 0) {
                  nodePositions.push({
                    type: 'interactive',
                    content: `[${index}]{${role}}(${name})`,
                    box,
                    y: box.top + window.pageYOffset,
                    x: box.left + window.pageXOffset
                  });
                }
              }
              processedElements.add(node);
            }
          }
        }

        // Sort nodes by vertical position first, then horizontal
        nodePositions.sort((a, b) => {
          const yDiff = a.y - b.y;
          if (Math.abs(yDiff) < MIN_GAP_FOR_NEWLINE) {
            return a.x -
              b.x;
          }
          return yDiff;
        });

        // Group nodes into lines
        let currentLine = [];
        let lastY = 0;
        let lastX = 0;

        const flushLine = () => {
          if (currentLine.length > 0) {
            // Sort line by x position
            currentLine.sort((a, b) => a.x - b.x);
            output.push(currentLine.map(node => node.content).join(' '));
            currentLine = [];
          }
        };

        for (const node of nodePositions) {
          // Start new line if significant vertical gap or if horizontal position is before previous element
          if (
            currentLine.length > 0 &&
            (Math.abs(node.y - lastY) > MIN_GAP_FOR_NEWLINE || node.x < lastX - HORIZONTAL_GAP)
          ) {
            flushLine();
            output.push('\n');
          }

          currentLine.push(node);
          lastY = node.y;
          lastX = node.x + node.box.width;
        }

        // Flush final line
        flushLine();

        // Join all text with appropriate spacing
        return output
          .join('\n')
          .replace(/\n\s+/g, '\n') // Clean up newline spacing
          .replace(/\n{3,}/g, '\n\n') // Limit consecutive newlines to 2
          .trim();
      } else {
        // DOM-based text representation
        const output = [];
        const processedElements = new Set();

        // Process all nodes in DOM order
        const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT, {
          acceptNode: (node) => {
            // Skip script/style elements and their contents
            if (
              node.nodeType === Node.ELEMENT_NODE &&
              (node.tagName.toLowerCase() === 'script' ||
                node.tagName.toLowerCase() === 'style' ||
                node.tagName.toLowerCase() === 'head' ||
                node.tagName.toLowerCase() === 'meta' ||
                node.tagName.toLowerCase() === 'link')
            ) {
              return NodeFilter.FILTER_REJECT;
            }
            return NodeFilter.FILTER_ACCEPT;
          },
        });

        let node;
        let currentBlock = [];

        const flushBlock = () => {
          if (currentBlock.length > 0) {
            output.push(currentBlock.join(' '));
            currentBlock = [];
          }
        };

        while ((node = walker.nextNode())) {
          // Skip hidden elements
          let parent = node.parentElement;
          let isHidden = false;
          while (parent) {
            const style = window.getComputedStyle(parent);
            if (
              style.display === 'none' ||
              style.visibility === 'hidden' ||
              parent.getAttribute('aria-hidden') === 'true'
            ) {
              isHidden =
                true;
              break;
            }
            parent = parent.parentElement;
          }
          if (isHidden) continue;

          // Handle text nodes
          if (node.nodeType === Node.TEXT_NODE) {
            const text = node.textContent.trim();
            if (!text) continue;

            // Skip if this is just a number inside a highlight element
            if (/^\d+$/.test(text)) {
              parent = node.parentElement;
              while (parent) {
                if (parent.classList && parent.classList.contains('claude-highlight')) {
                  isHidden = true;
                  break;
                }
                parent = parent.parentElement;
              }
              if (isHidden) continue;
            }

            // Check if this text is inside an interactive element
            let isInsideInteractive = false;
            let interactiveParent = null;
            parent = node.parentElement;
            while (parent) {
              if (parent.matches(completeSelector)) {
                isInsideInteractive = true;
                interactiveParent = parent;
                break;
              }
              parent = parent.parentElement;
            }

            // If inside an interactive element, add it to the interactive element's content
            if (isInsideInteractive && interactiveParent) {
              if (!processedElements.has(interactiveParent)) {
                const index = interactiveElements.indexOf(interactiveParent);
                if (index !== -1) {
                  const role = interactiveParent.getAttribute('role') || interactiveParent.tagName.toLowerCase();
                  const name = getAccessibleName(interactiveParent);
                  if (name) {
                    currentBlock.push(`[${index}]{${role}}(${name})`);
                  }
                  processedElements.add(interactiveParent);
                }
              }
              continue;
            }

            // Add text to current block
            currentBlock.push(text);
          }

          // Handle block-level elements and interactive elements
          if (node.nodeType === Node.ELEMENT_NODE) {
            const style = window.getComputedStyle(node);
            const isBlockLevel =
              style.display === 'block' ||
              style.display === 'flex' ||
              style.display === 'grid' ||
              node.tagName.toLowerCase() === 'br';

            // Handle interactive elements
            if (node.matches(completeSelector) && !processedElements.has(node)) {
              const index = interactiveElements.indexOf(node);
              if (index !== -1) {
                const role = node.getAttribute('role') || node.tagName.toLowerCase();
                const name = getAccessibleName(node);
                if (name) {
                  currentBlock.push(`[${index}]{${role}}(${name})`);
                }
                processedElements.add(node);
              }
            }

            // Add newline for block-level elements
            if (isBlockLevel) {
              flushBlock();
              output.push('');
            }
          }
        }

        // Flush final block
        flushBlock();

        // Join all text with appropriate spacing
        return output
          .join('\n')
          .replace(/\n\s+/g, '\n') // Clean up newline spacing
          .replace(/\n{3,}/g, '\n\n') // Limit consecutive newlines to 2
          .trim();
      }
    };

    // Helper functions for accurate clicking
    const isElementClickable = (element) => {
      if (!element) return false;

      const style = window.getComputedStyle(element);
      const rect = element.getBoundingClientRect();

      return (
        // Element must be visible
        style.display !== 'none' &&
        style.visibility !== 'hidden' &&
        style.opacity !== '0' &&
        // Must have non-zero dimensions
        rect.width > 0 &&
        rect.height > 0 &&
        // Must be within viewport bounds
        rect.top >= 0 &&
        rect.left >= 0 &&
        rect.bottom <= (window.innerHeight || document.documentElement.clientHeight) &&
        rect.right <= (window.innerWidth || document.documentElement.clientWidth) &&
        // Must not be disabled (aria-disabled="false" still counts as enabled)
        !element.hasAttribute('disabled') &&
        element.getAttribute('aria-disabled') !== 'true' &&
        element.getAttribute('aria-hidden') !== 'true'
      );
    };

    const getClickableCenter = (element) => {
      const rect = element.getBoundingClientRect();
      // Get the actual visible area accounting for overflow
      const style = window.getComputedStyle(element);
      const overflowX = style.overflowX;
      const overflowY = style.overflowY;

      let width = rect.width;
      let height = rect.height;

      // Adjust for overflow
      if (overflowX === 'hidden') {
        width = Math.min(width, element.clientWidth);
      }
      if (overflowY === 'hidden') {
        height = Math.min(height, element.clientHeight);
      }

      // Calculate center coordinates
      const x = rect.left + (width / 2);
      const y = rect.top + (height / 2);

      return {
        x: Math.round(x + window.pageXOffset),
        y: Math.round(y + window.pageYOffset)
      };
    };

    // Expose helper functions to window for use by MCP
    window.isElementClickable = isElementClickable;
    window.getClickableCenter = getClickableCenter;

    // Main execution
    findInteractiveElements();
    const textRepresentation = createTextRepresentation();

    // The numbered highlight overlay below is currently disabled
    if (false)
      requestAnimationFrame(() => {
        // Clear existing highlights
        document.querySelectorAll('.claude-highlight').forEach((el) => el.remove());

        // Create main overlay container
        const overlay = document.createElement('div');
        overlay.className = 'claude-highlight';
        overlay.style.cssText = `
          position: absolute;
          top: 0;
          left: 0;
          width: 100%;
          height: ${Math.max(document.body.scrollHeight, document.documentElement.scrollHeight)}px;
          pointer-events: none;
          z-index: 2147483647;
        `;
        document.body.appendChild(overlay);

        // Batch DOM operations and reduce reflows
        const fragment = document.createDocumentFragment();
        const pageXOffset = window.pageXOffset;
        const pageYOffset = window.pageYOffset;

        // Create highlights in a batch
        interactiveElements.forEach((el, index) => {
          const rect = el.getBoundingClientRect();

          if (rect.width <= 0 || rect.height <= 0) return;

          const highlight = document.createElement('div');
          highlight.className = 'claude-highlight';
          highlight.style.cssText = `
            position: absolute;
            left: ${pageXOffset + rect.left}px;
            top: ${pageYOffset + rect.top}px;
            width: ${rect.width}px;
            height: ${rect.height}px;
            background-color: hsla(${(index * 30) % 360}, 80%, 50%, 0.3);
            display: flex;
            align-items: center;
            justify-content: center;
            font-size: 10px;
            font-weight: bold;
            color: #000;
            pointer-events: none;
            border: none;
            z-index: 2147483647;
          `;

          highlight.textContent = index;
          fragment.appendChild(highlight);
        });

        // Single DOM update
        overlay.appendChild(fragment);
      });

    // Return the results
    const result = {
      interactiveElements,
      textRepresentation,
    };

    window.interactiveElements = interactiveElements;
    window.textRepresentation = textRepresentation;

    console.log(`Generated ${interactiveElements.length} interactive elements`);
    console.log(`Text representation size: ${textRepresentation.length} characters`);

    return result;
  }

  // // Debounce helper function
  // function debounce(func, wait) {
  //   let timeout;
  //
  //   return function executedFunction(...args) {
  //     const later = () => {
  //       clearTimeout(timeout);
  //       func(...args);
  //     };
  //     clearTimeout(timeout);
  //     timeout = setTimeout(later, wait);
  //   };
  // }

  // // Create a debounced version of the text representation creation
  // const debouncedCreateTextRepresentation = debounce(() => {
  //   const result = createTextRepresentation();
  //   // Dispatch a custom event with the new text representation
  //   const event = new CustomEvent('textRepresentationUpdated', {
  //     detail: result,
  //   });
  //   document.dispatchEvent(event);
  // }, 250); // 250ms debounce time

  // // Set up mutation observer to watch for DOM changes
  // const observer = new MutationObserver((mutations) => {
  //   // Check if any mutation is relevant (affects visibility, attributes, or structure)
  //   const isRelevantMutation = mutations.some((mutation) => {
  //     // Check if the mutation affects visibility or attributes
  //     if (
  //       mutation.type === 'attributes' &&
  //       (mutation.attributeName === 'aria-hidden' ||
  //         mutation.attributeName === 'disabled' ||
  //         mutation.attributeName === 'inert' ||
  //         mutation.attributeName === 'style' ||
  //         mutation.attributeName === 'class')
  //     ) {
  //       return true;
  //     }

  //     // Check if the mutation affects the DOM structure
  //     if (mutation.type === 'childList') {
  //       return true;
  //     }

  //     return false;
  //   });

  //   if (isRelevantMutation) {
  //     debouncedCreateTextRepresentation();
  //   }
  // });

  // // Start observing the document with the configured parameters
  // observer.observe(document.body, {
  //   childList: true,
  //   subtree: true,
  //   attributes: true,
  //   characterData: true,
  //   attributeFilter: ['aria-hidden', 'disabled', 'inert', 'style', 'class', 'role', 'aria-label', 'aria-labelledby'],
  // });

  window.createTextRepresentation = createTextRepresentation;

  // Initial creation
  createTextRepresentation();

  // // Also rerun when dynamic content might be loaded
  // window.addEventListener('load', createTextRepresentation);
  //
  // document.addEventListener('DOMContentLoaded', createTextRepresentation);

  // // Handle dynamic updates like dialogs
  // const dynamicUpdateEvents = ['dialog', 'popstate', 'pushstate', 'replacestate'];
  // dynamicUpdateEvents.forEach(event => {
  //   window.addEventListener(event, () => {
  //     setTimeout(createTextRepresentation, 100); // Small delay to let content settle
  //   });
  // });

  console.log('Aria Interactive Elements script loaded');
})();
```
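
The `[index]{role}(name)` markers that `createTextRepresentation()` writes into `window.textRepresentation` are what a client reads before calling `clickElement`. As a rough sketch (not code from this repository), the TypeScript below shows one way a client might parse those markers. `InteractiveRef`, `parseInteractiveRefs`, and `findByName` are hypothetical names introduced for illustration, and the regular expression assumes accessible names contain no `)`, since the format has no escaping.

```typescript
// Hypothetical client-side helper, not part of the repository.
// Parses the `[index]{role}(name)` markers found in the page text representation.
interface InteractiveRef {
  index: number; // position in window.interactiveElements
  role: string;  // value of the role attribute, or the tag name when none is set
  name: string;  // accessible name computed by the page script
}

function parseInteractiveRefs(text: string): InteractiveRef[] {
  // Assumption: names never contain ")"; a name that does is cut at the first ")".
  const pattern = /\[(\d+)\]\{([^}]*)\}\(([^)]*)\)/g;
  const refs: InteractiveRef[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    refs.push({ index: Number(match[1]), role: match[2], name: match[3] });
  }
  return refs;
}

// Case-insensitive substring lookup by accessible name; returns the first hit.
function findByName(refs: InteractiveRef[], name: string): InteractiveRef | undefined {
  const wanted = name.toLowerCase();
  for (const ref of refs) {
    if (ref.name.toLowerCase().indexOf(wanted) !== -1) return ref;
  }
  return undefined;
}

const sample = 'Welcome\n[0]{a}(Home) [1]{button}(Sign in)\n[2]{input}(Search)';
const refs = parseInteractiveRefs(sample);
console.log(findByName(refs, 'sign in')?.index); // prints 1
```

The resulting `index` is the value the README's `clickElement` tool expects in its `selector` parameter, passed as a string such as `"1"`.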