# Directory Structure

```
├── .github
│   └── workflows
│       ├── publish-mcp.yml
│       └── release.yml
├── .gitignore
├── .npmignore
├── aria_snapshot_filter.js
├── assets
│   ├── Demo.gif
│   ├── Demo2.gif
│   ├── Demo3.gif
│   ├── logo.png
│   └── Tools.md
├── brightdata-mcp-extension.dxt
├── browser_session.js
├── browser_tools.js
├── CHANGELOG.md
├── Dockerfile
├── examples
│   └── README.md
├── LICENSE
├── package-lock.json
├── package.json
├── README.md
├── server.js
├── server.json
└── smithery.yaml
```

# Files

--------------------------------------------------------------------------------
/.npmignore:
--------------------------------------------------------------------------------

```
1 | *.dxt
2 | smithery.yaml
3 | Dockerfile
4 | examples
5 | assets
6 | CHANGELOG.md
7 | 
```

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------

```
 1 | # build output
 2 | dist/
 3 | 
 4 | # generated types
 5 | .astro/
 6 | 
 7 | # dependencies
 8 | node_modules/
 9 | 
10 | # logs
11 | npm-debug.log*
12 | yarn-debug.log*
13 | yarn-error.log*
14 | pnpm-debug.log*
15 | 
16 | # environment variables
17 | .env
18 | .env.production
19 | 
20 | # macOS-specific files
21 | .DS_Store
22 | 
23 | # jetbrains setting folder
24 | .idea/
25 | 
```

--------------------------------------------------------------------------------
/examples/README.md:
--------------------------------------------------------------------------------

```markdown
 1 | # MCP Usage Examples
 2 | 
 3 | A curated list of community demos using Bright Data's MCP server.
 4 | 
 5 | ## 🧠 Notable Examples
 6 | 
 7 | - **AI voice agent that closed 4 deals & made $596 overnight 🤑**  
 8 |   [📹 YouTube Demo](https://www.youtube.com/watch?v=YGzT3sVdwdY) 
 9 | 
10 |    [💻 GitHub Repo](https://github.com/llSourcell/my_ai_intern)
11 | 
 12 | - **LangGraph with MCP adapters demo**
13 | 
14 |   [📹 YouTube Demo](https://www.youtube.com/watch?v=6DXuadyaJ4g)
15 |   
16 |   [💻 Source Code](https://github.com/techwithtim/BrightDataMCPServerAgent)
17 | 
18 | - **Researcher Agent built with Google ADK that is connected to Bright Data's MCP to fetch real-time data**
19 | 
20 |    [📹 YouTube Demo](https://www.youtube.com/watch?v=r7WG6dXWdUI)
21 |   
 22 |   [💻 Source Code](https://github.com/MeirKaD/MCP_ADK)
23 | 
24 | - **Replacing 3 MCP servers with our MCP server to avoid getting blocked 🤯**  
25 | 
26 |   [📹 YouTube Demo](https://www.youtube.com/watch?v=0xmE0OJrNmg) 
27 | 
28 | - **Scrape ANY Website In Realtime With This Powerful AI MCP Server**
29 | 
30 |    [📹 YouTube Demo](https://www.youtube.com/watch?v=bL5JIeGL3J0)
31 | 
 32 | - **Multi-Agent job finder using Bright Data MCP and TypeScript from SCRATCH**
33 | 
34 |    [📹 YouTube Demo](https://www.youtube.com/watch?v=45OtteCGFiI)
35 |    
 36 |    [💻 Source Code](https://github.com/bitswired/jobwizard)
37 | 
 38 | - **Usage example with Gemini CLI**
39 | 
40 |     [📹 YouTube Tutorial](https://www.youtube.com/watch?v=FE1LChbgFEw)
41 | ---
42 | 
43 | Got a cool example? Open a PR or contact us!
44 | 
```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

```markdown
  1 | <div align="center">
  2 |   <a href="https://brightdata.com/ai/mcp-server">
  3 |     <img src="https://github.com/user-attachments/assets/c21b3f7b-7ff1-40c3-b3d8-66706913d62f" alt="Bright Data Logo">
  4 |   </a>
  5 | 
  6 |   <h1>The Web MCP</h1>
  7 |   
  8 |   <p>
  9 |     <strong>🌐 Give your AI real-time web superpowers</strong><br/>
 10 |     <i>Seamlessly connect LLMs to the live web without getting blocked</i>
 11 |   </p>
 12 | 
 13 |   <p>
 14 |     <a href="https://www.npmjs.com/package/@brightdata/mcp">
 15 |       <img src="https://img.shields.io/npm/v/@brightdata/mcp?style=for-the-badge&color=blue" alt="npm version"/>
 16 |     </a>
 17 |     <a href="https://www.npmjs.com/package/@brightdata/mcp">
 18 |       <img src="https://img.shields.io/npm/dw/@brightdata/mcp?style=for-the-badge&color=green" alt="npm downloads"/>
 19 |     </a>
 20 |     <a href="https://github.com/brightdata-com/brightdata-mcp/blob/main/LICENSE">
 21 |       <img src="https://img.shields.io/badge/license-MIT-purple?style=for-the-badge" alt="License"/>
 22 |     </a>
 23 |   </p>
 24 | 
 25 |   <p>
 26 |     <a href="#-quick-start">Quick Start</a> •
 27 |     <a href="#-features">Features</a> •
 28 |     <a href="#-pricing--modes">Pricing</a> •
 29 |     <a href="#-demos">Demos</a> •
 30 |     <a href="#-documentation">Docs</a> •
 31 |     <a href="#-support">Support</a>
 32 |   </p>
 33 | 
 34 |   <div>
 35 |     <h3>🎉 <strong>Free Tier Available!</strong> 🎉</h3>
 36 |     <p><strong>5,000 requests/month FREE</strong> <br/>
 37 |     <sub>Perfect for prototyping and everyday AI workflows</sub></p>
 38 |   </div>
 39 | </div>
 40 | 
 41 | ---
 42 | 
 43 | ## 🌟 Overview
 44 | 
 45 | **The Web MCP** is your gateway to giving AI assistants true web capabilities. No more outdated responses, no more "I can't access real-time information" - just seamless, reliable web access that actually works.
 46 | 
 47 | Built by [Bright Data](https://brightdata.com), the world's #1 web data platform, this MCP server ensures your AI never gets blocked, rate-limited, or served CAPTCHAs.
 48 | 
 49 | <div align="center">
 50 |   <table>
 51 |     <tr>
 52 |       <td align="center">✅ <strong>Works with Any LLM</strong><br/><sub>Claude, GPT, Gemini, Llama</sub></td>
 53 |       <td align="center">🛡️ <strong>Never Gets Blocked</strong><br/><sub>Enterprise-grade unblocking</sub></td>
 54 |       <td align="center">🚀 <strong>5,000 Free Requests</strong><br/><sub>Monthly</sub></td>
 55 |       <td align="center">⚡ <strong>Zero Config</strong><br/><sub>Works out of the box</sub></td>
 56 |     </tr>
 57 |   </table>
 58 | </div>
 59 | 
 60 | ---
 61 | 
 62 | ## 🎯 Perfect For
 63 | 
 64 | - 🔍 **Real-time Research** - Get current prices, news, and live data
 65 | - 🛍️ **E-commerce Intelligence** - Monitor products, prices, and availability  
 66 | - 📊 **Market Analysis** - Track competitors and industry trends
 67 | - 🤖 **AI Agents** - Build agents that can actually browse the web
 68 | - 📝 **Content Creation** - Access up-to-date information for writing
 69 | - 🎓 **Academic Research** - Gather data from multiple sources efficiently
 70 | 
 71 | ---
 72 | 
 73 | ## ⚡ Quick Start
 74 | 
 75 | 
 76 | ### 📡 Option 1: Use our hosted server (no installation needed)
 77 | 
 78 | Perfect for users who want zero setup. Just add this URL to your MCP client:
 79 | 
 80 | ```
 81 | https://mcp.brightdata.com/mcp?token=YOUR_API_TOKEN_HERE
 82 | ```
 83 | 
 84 | **Setup in Claude Desktop:**
 85 | 1. Go to: Settings → Connectors → Add custom connector
 86 | 2. Name: `Bright Data Web`
 87 | 3. URL: `https://mcp.brightdata.com/mcp?token=YOUR_API_TOKEN`
 88 | 4. Click "Add" and you're done! ✨
 89 | 
 90 | 
 91 | ### 💻 Option 2: Run locally on your machine
 92 | 
 93 | ```json
 94 | {
 95 |   "mcpServers": {
 96 |     "Bright Data": {
 97 |       "command": "npx",
 98 |       "args": ["@brightdata/mcp"],
 99 |       "env": {
100 |         "API_TOKEN": "<your-api-token-here>"
101 |       }
102 |     }
103 |   }
104 | }
105 | ```
106 | 
107 | 
108 | ---
109 | 
110 | ## 🚀 Pricing & Modes
111 | 
112 | <div align="center">
113 |   <table>
114 |     <tr>
115 |       <th width="50%">⚡ Rapid Mode (Free tier)</th>
116 |       <th width="50%">💎 Pro Mode</th>
117 |     </tr>
118 |     <tr>
119 |       <td align="center">
120 |         <h3>$0/month</h3>
121 |         <p><strong>5,000 requests</strong></p>
122 |         <hr/>
123 |         <p>✅ Web Search<br/>
124 |         ✅ Scraping with Web Unlocker<br/>
125 |         ❌ Browser Automation<br/>
126 |         ❌ Web data tools</p>
127 |         <br/>
128 |         <code>Default Mode</code>
129 |       </td>
130 |       <td align="center">
131 |         <h3>Pay-as-you-go</h3>
132 |         <p><strong>Everything in Rapid mode, plus 60+ advanced tools</strong></p>
133 |         <hr/>
134 |         <p>✅ Browser Control<br/>
135 |         ✅ Web Data APIs<br/>
136 |         <br/>
137 |         <br/>
138 |         <br/>
139 |         <code>PRO_MODE=true</code>
140 |       </td>
141 |     </tr>
142 |   </table>
143 | </div>
144 | 
145 | > **💡 Note:** Pro mode is **not included** in the free tier and incurs additional charges based on usage.
146 | 
147 | ---
148 | 
149 | ## ✨ Features
150 | 
151 | ### 🔥 Core Capabilities
152 | 
153 | <table>
154 |   <tr>
155 |     <td>🔍 <b>Smart Web Search</b><br/>Google-quality results optimized for AI</td>
156 |     <td>📄 <b>Clean Markdown</b><br/>AI-ready content extraction</td>
157 |   </tr>
158 |   <tr>
159 |     <td>🌍 <b>Global Access</b><br/>Bypass geo-restrictions automatically</td>
160 |     <td>🛡️ <b>Anti-Bot Protection</b><br/>Never get blocked or rate-limited</td>
161 |   </tr>
162 |   <tr>
163 |     <td>🤖 <b>Browser Automation</b><br/>Control real browsers remotely (Pro)</td>
164 |     <td>⚡ <b>Lightning Fast</b><br/>Optimized for minimal latency</td>
165 |   </tr>
166 | </table>
167 | 
168 | ### 🎯 Example Queries That Just Work
169 | 
170 | ```yaml
171 | ✅ "What's Tesla's current stock price?"
172 | ✅ "Find the best-rated restaurants in Tokyo right now"
173 | ✅ "Get today's weather forecast for New York"
174 | ✅ "What movies are releasing this week?"
175 | ✅ "What are the trending topics on Twitter today?"
176 | ```
177 | 
178 | ---
179 | 
180 | ## 🎬 Demos
181 | 
182 | > **Note:** These videos show earlier versions. New demos coming soon! 🎥
183 | 
184 | <details>
185 | <summary><b>View Demo Videos</b></summary>
186 | 
187 | ### Basic Web Search Demo
188 | https://github.com/user-attachments/assets/59f6ebba-801a-49ab-8278-1b2120912e33
189 | 
190 | ### Advanced Scraping Demo
191 | https://github.com/user-attachments/assets/61ab0bee-fdfa-4d50-b0de-5fab96b4b91d
192 | 
193 | [📺 More tutorials on YouTube →](https://github.com/brightdata-com/brightdata-mcp/blob/main/examples/README.md)
194 | 
195 | </details>
196 | 
197 | ---
198 | 
199 | ## 🔧 Available Tools
200 | 
201 | ### ⚡ Rapid Mode Tools (Default - Free)
202 | 
203 | | Tool | Description | Use Case |
204 | |------|-------------|----------|
205 | | 🔍 `search_engine` | Web search with AI-optimized results | Research, fact-checking, current events |
206 | | 📄 `scrape_as_markdown` | Convert any webpage to clean markdown | Content extraction, documentation |
207 | 
208 | ### 💎 Pro Mode Tools (60+ Tools)
209 | 
210 | <details>
211 | <summary><b>Click to see all Pro tools</b></summary>
212 | 
213 | | Category | Tools | Description |
214 | |----------|-------|-------------|
215 | | **Browser Control** | `scraping_browser.*` | Full browser automation |
216 | | **Web Data APIs** | `web_data_*` | Structured data extraction |
217 | | **E-commerce** | Product scrapers | Amazon, eBay, Walmart data |
218 | | **Social Media** | Social scrapers | Twitter, LinkedIn, Instagram |
219 | | **Maps & Local** | Location tools | Google Maps, business data |
220 | 
221 | [📚 View complete tool documentation →](https://github.com/brightdata-com/brightdata-mcp/blob/main/assets/Tools.md)
222 | 
223 | </details>
224 | 
225 | ---
226 | 
227 | ## 🎮 Try It Now!
228 | 
229 | ### 🧪 Online Playground
230 | Try the Web MCP without any setup:
231 | 
232 | <div align="center">
233 |   <a href="https://brightdata.com/ai/playground-chat">
234 |     <img src="https://img.shields.io/badge/Try_on-Playground-00C7B7?style=for-the-badge&logo=data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMjQiIGhlaWdodD0iMjQiIHZpZXdCb3g9IjAgMCAyNCAyNCIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPHBhdGggZD0iTTEyIDJMMyA3VjE3TDEyIDIyTDIxIDE3VjdMMTIgMloiIHN0cm9rZT0id2hpdGUiIHN0cm9rZS13aWR0aD0iMiIvPgo8L3N2Zz4=" alt="Playground"/>
235 |   </a>
236 | </div>
237 | 
238 | ---
239 | 
240 | ## 🔧 Configuration
241 | 
242 | ### Basic Setup
243 | ```json
244 | {
245 |   "mcpServers": {
246 |     "Bright Data": {
247 |       "command": "npx",
248 |       "args": ["@brightdata/mcp"],
249 |       "env": {
250 |         "API_TOKEN": "your-token-here"
251 |       }
252 |     }
253 |   }
254 | }
255 | ```
256 | 
257 | ### Advanced Configuration
258 | ```json
259 | {
260 |   "mcpServers": {
261 |     "Bright Data": {
262 |       "command": "npx",
263 |       "args": ["@brightdata/mcp"],
264 |       "env": {
265 |         "API_TOKEN": "your-token-here",
266 |         "PRO_MODE": "true",              // Enable all 60+ tools
267 |         "RATE_LIMIT": "100/1h",          // Custom rate limiting
268 |         "WEB_UNLOCKER_ZONE": "custom",   // Custom unlocker zone
269 |         "BROWSER_ZONE": "custom_browser" // Custom browser zone
270 |       }
271 |     }
272 |   }
273 | }
274 | ```
275 | 
276 | ---
277 | 
278 | ## 📚 Documentation
279 | 
280 | <div align="center">
281 |   <table>
282 |     <tr>
283 |       <td align="center">
284 |         <a href="https://docs.brightdata.com/mcp-server/overview">
285 |           <img src="https://img.shields.io/badge/📖-API_Docs-blue?style=for-the-badge" alt="API Docs"/>
286 |         </a>
287 |       </td>
288 |       <td align="center">
289 |         <a href="https://github.com/brightdata-com/brightdata-mcp/blob/main/examples">
290 |           <img src="https://img.shields.io/badge/💡-Examples-green?style=for-the-badge" alt="Examples"/>
291 |         </a>
292 |       </td>
293 |       <td align="center">
294 |         <a href="https://github.com/brightdata-com/brightdata-mcp/blob/main/CHANGELOG.md">
295 |           <img src="https://img.shields.io/badge/📝-Changelog-orange?style=for-the-badge" alt="Changelog"/>
296 |         </a>
297 |       </td>
298 |       <td align="center">
299 |         <a href="https://brightdata.com/blog/ai/web-scraping-with-mcp">
300 |           <img src="https://img.shields.io/badge/📚-Tutorial-purple?style=for-the-badge" alt="Tutorial"/>
301 |         </a>
302 |       </td>
303 |     </tr>
304 |   </table>
305 | </div>
306 | 
307 | ---
308 | 
309 | ## 🚨 Common Issues & Solutions
310 | 
311 | <details>
312 | <summary><b>🔧 Troubleshooting Guide</b></summary>
313 | 
314 | ### ❌ "spawn npx ENOENT" Error
315 | **Solution:** Install Node.js or use the full path to node:
316 | ```json
317 | "command": "/usr/local/bin/node"  // macOS/Linux
318 | "command": "C:\\Program Files\\nodejs\\node.exe"  // Windows
319 | ```
320 | 
321 | ### ⏱️ Timeouts on Complex Sites
322 | **Solution:** Increase timeout in your client settings to 180s
323 | 
324 | ### 🔑 Authentication Issues
325 | **Solution:** Ensure your API token is valid and has proper permissions
326 | 
327 | ### 📡 Remote Server Connection
328 | **Solution:** Check your internet connection and firewall settings
329 | 
330 | [More troubleshooting →](https://github.com/brightdata-com/brightdata-mcp#troubleshooting)
331 | 
332 | </details>
333 | 
334 | ---
335 | 
336 | ## 🤝 Contributing
337 | 
338 | We love contributions! Here's how you can help:
339 | 
340 | - 🐛 [Report bugs](https://github.com/brightdata-com/brightdata-mcp/issues)
341 | - 💡 [Suggest features](https://github.com/brightdata-com/brightdata-mcp/issues)
342 | - 🔧 [Submit PRs](https://github.com/brightdata-com/brightdata-mcp/pulls)
343 | - ⭐ Star this repo!
344 | 
345 | Please follow [Bright Data's coding standards](https://brightdata.com/dna/js_code).
346 | 
347 | ---
348 | 
349 | ## 📞 Support
350 | 
351 | <div align="center">
352 |   <table>
353 |     <tr>
354 |       <td align="center">
355 |         <a href="https://github.com/brightdata-com/brightdata-mcp/issues">
356 |           <strong>🐛 GitHub Issues</strong><br/>
357 |           <sub>Report bugs & features</sub>
358 |         </a>
359 |       </td>
360 |       <td align="center">
361 |         <a href="https://docs.brightdata.com/mcp-server/overview">
362 |           <strong>📚 Documentation</strong><br/>
363 |           <sub>Complete guides</sub>
364 |         </a>
365 |       </td>
366 |       <td align="center">
367 |         <a href="mailto:[email protected]">
368 |           <strong>✉️ Email</strong><br/>
369 |           <sub>[email protected]</sub>
370 |         </a>
371 |       </td>
372 |     </tr>
373 |   </table>
374 | </div>
375 | 
376 | ---
377 | 
378 | ## 📜 License
379 | 
380 | MIT © [Bright Data Ltd.](https://brightdata.com)
381 | 
382 | ---
383 | 
384 | <div align="center">
385 |   <p>
386 |     <strong>Built with ❤️ by</strong><br/>
387 |     <a href="https://brightdata.com">
388 |       <img src="https://idsai.net.technion.ac.il/files/2022/01/Logo-600.png" alt="Bright Data" height="30"/>
389 |     </a>
390 |   </p>
391 |   <p>
392 |     <sub>The world's #1 web data platform</sub>
393 |   </p>
394 |   
395 |   <br/>
396 |   
397 |   <p>
398 |     <a href="https://github.com/brightdata-com/brightdata-mcp">⭐ Star us on GitHub</a> • 
399 |     <a href="https://brightdata.com/blog">Read our Blog</a>
400 |   </p>
401 | </div>
402 | 
```

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------

```dockerfile
 1 | FROM node:22.12-alpine AS builder
 2 | 
 3 | 
 4 | COPY . /app
 5 | WORKDIR /app
 6 | 
 7 | 
 8 | RUN --mount=type=cache,target=/root/.npm npm install
 9 | 
10 | FROM node:22-alpine AS release
11 | 
12 | WORKDIR /app
13 | 
14 | 
15 | COPY --from=builder /app/server.js /app/
16 | COPY --from=builder /app/browser_tools.js /app/
17 | COPY --from=builder /app/browser_session.js /app/
18 | COPY --from=builder /app/package.json /app/
19 | COPY --from=builder /app/package-lock.json /app/
20 | 
21 | 
22 | ENV NODE_ENV=production
23 | 
24 | 
25 | RUN npm ci --ignore-scripts --omit=dev
26 | 
27 | 
28 | ENTRYPOINT ["node", "server.js"]
29 | 
```

--------------------------------------------------------------------------------
/.github/workflows/release.yml:
--------------------------------------------------------------------------------

```yaml
 1 | name: Release
 2 | on:
 3 |   push:
 4 |     tags: ["v*"]
 5 | 
 6 | jobs:
 7 |   release:
 8 |     name: Release
 9 |     runs-on: ubuntu-latest
10 |     permissions:
11 |       contents: read
12 |       id-token: write
13 |     steps:
14 |       - uses: actions/checkout@v5
15 |       - uses: actions/setup-node@v5
16 |         with:
17 |           node-version: 22
18 |           cache: "npm"
19 |           registry-url: 'https://registry.npmjs.org'
20 |           scope: '@brightdata'
21 |       - run: npm ci
22 |       - run: npm audit signatures
23 |       - run: npm publish
24 |         env:
25 |           NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
26 | 
```

--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------

```yaml
 1 | startCommand:
 2 |   type: stdio
 3 |   configSchema:
 4 |     type: object
 5 |     properties:
 6 |       apiToken:
 7 |         type: string
 8 |         description: "Bright Data API key, available at https://brightdata.com/cp/setting/users"
 9 |       webUnlockerZone:
10 |         type: string
11 |         default: 'mcp_unlocker'
12 |         description: "Optional: The Web Unlocker zone name (defaults to 'mcp_unlocker')"
13 |       browserZone:
14 |         type: string
15 |         default: 'mcp_browser'
16 |         description: "Optional: Zone name for the Browser API (enables browser control tools, defaults to 'mcp_browser')"
17 |   commandFunction: |-
18 |     config => ({ 
19 |       command: 'node', 
20 |       args: ['server.js'], 
21 |       env: { 
22 |         API_TOKEN: config.apiToken,
23 |         WEB_UNLOCKER_ZONE: config.webUnlockerZone,
24 |         BROWSER_ZONE: config.browserZone
25 |       } 
26 |     })
27 | 
```
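
For illustration, a minimal sketch of what the `commandFunction` above evaluates to for a sample config; this is just the YAML's arrow function reproduced as plain JavaScript (the token value is a placeholder) to show the process spec Smithery launches:

```javascript
// Same arrow function as declared in smithery.yaml's commandFunction
const commandFunction = config=>({
    command: 'node',
    args: ['server.js'],
    env: {
        API_TOKEN: config.apiToken,
        WEB_UNLOCKER_ZONE: config.webUnlockerZone,
        BROWSER_ZONE: config.browserZone,
    },
});

// Example config matching the configSchema above (placeholder token)
console.log(commandFunction({
    apiToken: '<your-api-token>',
    webUnlockerZone: 'mcp_unlocker',
    browserZone: 'mcp_browser',
}));
// -> { command: 'node', args: ['server.js'], env: { API_TOKEN: '<your-api-token>', ... } }
```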

--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------

```json
 1 | {
 2 |     "name": "@brightdata/mcp",
 3 |     "version": "2.6.0",
 4 |     "description": "An MCP interface into the Bright Data toolset",
 5 |     "type": "module",
 6 |     "main": "./server.js",
 7 |     "bin": {
 8 |         "@brightdata/mcp": "./server.js"
 9 |     },
10 |     "scripts": {
11 |         "start": "node server.js"
12 |     },
13 |     "keywords": [
14 |         "mcp",
15 |         "brightdata"
16 |     ],
17 |     "author": "Bright Data",
18 |     "repository": {
19 |         "type": "git",
20 |         "url": "https://github.com/brightdata/brightdata-mcp.git"
21 |     },
22 |     "bugs": {
23 |         "url": "https://github.com/brightdata/brightdata-mcp/issues"
24 |     },
25 |     "license": "MIT",
26 |     "dependencies": {
27 |         "axios": "^1.11.0",
28 |         "fastmcp": "^3.1.1",
29 |         "playwright": "^1.51.1",
30 |         "zod": "^3.24.2"
31 |     },
32 |     "publishConfig": {
33 |         "access": "public"
34 |     },
35 |     "files": [
36 |         "server.js",
37 |         "browser_tools.js",
38 |         "browser_session.js",
39 |         "aria_snapshot_filter.js"
40 |     ],
41 |     "mcpName": "io.github.brightdata/brightdata-mcp"
42 | }
43 | 
```

--------------------------------------------------------------------------------
/.github/workflows/publish-mcp.yml:
--------------------------------------------------------------------------------

```yaml
 1 | name: Publish to MCP Registry
 2 | 
 3 | on:
 4 |   push:
 5 |     tags: ["v*"] 
 6 |   workflow_dispatch: 
 7 | 
 8 | jobs:
 9 |   publish:
10 |     runs-on: ubuntu-latest
11 |     permissions:
12 |       id-token: write  
13 |       contents: read
14 | 
15 |     steps:
16 |       - name: Checkout code
17 |         uses: actions/checkout@v4
18 | 
19 |       - name: Setup Node.js 
20 |         uses: actions/setup-node@v4
21 |         with:
22 |           node-version: "22"
23 | 
24 |       - name: Sync version in server.json with package.json
25 |         run: |
26 |           VERSION=$(node -p "require('./package.json').version")
27 |           echo "Syncing version to: $VERSION"
28 |           jq --arg v "$VERSION" '.version = $v | .packages[0].version = $v' server.json > tmp.json && mv tmp.json server.json
29 |           echo "Updated server.json:"
30 |           cat server.json
31 | 
32 |       - name: Install MCP Publisher
33 |         run: |
34 |           curl -L "https://github.com/modelcontextprotocol/registry/releases/latest/download/mcp-publisher_$(uname -s | tr '[:upper:]' '[:lower:]')_$(uname -m | sed 's/x86_64/amd64/;s/aarch64/arm64/').tar.gz" | tar xz mcp-publisher
35 |           chmod +x mcp-publisher
36 | 
37 |       - name: Login to MCP Registry
38 |         run: ./mcp-publisher login github-oidc
39 | 
40 |       - name: Publish to MCP Registry
41 |         run: ./mcp-publisher publish
42 | 
```

--------------------------------------------------------------------------------
/server.json:
--------------------------------------------------------------------------------

```json
 1 | {
 2 |   "$schema": "https://static.modelcontextprotocol.io/schemas/2025-10-17/server.schema.json",
 3 |   "name": "io.github.brightdata/brightdata-mcp",
 4 |   "description": "Bright Data's Web MCP server enabling AI agents to search, extract & navigate the web",
 5 |   "repository": {
 6 |     "url": "https://github.com/brightdata/brightdata-mcp",
 7 |     "source": "github"
 8 |   },
 9 |   "version": "2.5.0",
10 |   "packages": [
11 |     {
12 |       "registryType": "npm",
13 |       "registryBaseUrl": "https://registry.npmjs.org",
14 |       "identifier": "@brightdata/mcp",
15 |       "version": "2.5.0",
16 |       "transport": {
17 |         "type": "stdio"
18 |       },
19 |       "environmentVariables": [
20 |         {
21 |           "name": "API_TOKEN",
22 |           "description": "Your API key for Bright Data",
23 |           "isRequired": true,
24 |           "isSecret": true,
25 |           "format": "string"
26 |         },
27 |         {
28 |           "name": "WEB_UNLOCKER_ZONE",
29 |           "description": "Your unlocker zone name",
30 |           "isRequired": false,
31 |           "isSecret": false,
32 |           "format": "string"
33 |         },
34 |         {
35 |           "name": "BROWSER_ZONE",
36 |           "description": "Your browser zone name",
37 |           "isRequired": false,
38 |           "isSecret": false,
39 |           "format": "string"
40 |         },
41 |         {
42 |           "name": "PRO_MODE",
 43 |           "description": "Set to true to enable pro mode",
44 |           "isRequired": false,
45 |           "isSecret": false,
46 |           "format": "boolean"
47 |         }
48 |       ]
49 |     }
50 |   ]
51 | }
52 | 
```

--------------------------------------------------------------------------------
/aria_snapshot_filter.js:
--------------------------------------------------------------------------------

```javascript
 1 | // LICENSE_CODE ZON
 2 | 'use strict'; /*jslint node:true es9:true*/
 3 | 
 4 | export class Aria_snapshot_filter {
 5 |     static INTERACTIVE_ROLES = new Set([
 6 |         'button', 'link', 'textbox', 'searchbox', 'combobox', 'checkbox',
 7 |         'radio', 'switch', 'slider', 'tab', 'menuitem', 'option',
 8 |     ]);
 9 |     static parse_playwright_snapshot(snapshot_text){
10 |         const lines = snapshot_text.split('\n');
11 |         const elements = [];
12 |         for (const line of lines)
13 |         {
14 |             const trimmed = line.trim();
15 |             if (!trimmed || !trimmed.startsWith('-'))
16 |                 continue;
17 |             const ref_match = trimmed.match(/\[ref=([^\]]+)\]/);
18 |             if (!ref_match)
19 |                 continue;
20 |             const ref = ref_match[1];
21 |             const role_match = trimmed.match(/^-\s+([a-zA-Z]+)/);
22 |             if (!role_match)
23 |                 continue;
24 |             const role = role_match[1];
25 |             if (!this.INTERACTIVE_ROLES.has(role))
26 |                 continue;
27 |             const name_match = trimmed.match(/"([^"]*)"/);
28 |             const name = name_match ? name_match[1] : '';
29 |             let url = null;
30 |             const next_line_index = lines.indexOf(line)+1;
31 |             if (next_line_index<lines.length)
32 |             {
33 |                 const next_line = lines[next_line_index];
34 |                 const url_match = next_line.match(/\/url:\s*(.+)/);
35 |                 if (url_match)
36 |                     url = url_match[1].trim().replace(/^["']|["']$/g, '');
37 |             }
38 |             elements.push({ref, role, name, url});
39 |         }
40 |         return elements;
41 |     }
42 | 
43 |     static format_compact(elements){
44 |         const lines = [];
45 |         for (const el of elements)
46 |         {
47 |             const parts = [`[${el.ref}]`, el.role];
48 |             if (el.name && el.name.length>0)
49 |             {
50 |                 const name = el.name.length>60 ?
51 |                     el.name.substring(0, 57)+'...' : el.name;
52 |                 parts.push(`"${name}"`);
53 |             }
54 |             if (el.url && el.url.length>0 && !el.url.startsWith('#'))
55 |             {
56 |                 let url = el.url;
57 |                 if (url.length>50)
58 |                     url = url.substring(0, 47)+'...';
59 |                 parts.push(`→ ${url}`);
60 |             }
61 |             lines.push(parts.join(' '));
62 |         }
63 |         return lines.join('\n');
64 |     }
65 | 
66 |     static filter_snapshot(snapshot_text){
67 |         try {
68 |             const elements = this.parse_playwright_snapshot(snapshot_text);
69 |             if (elements.length===0)
70 |                 return 'No interactive elements found';
71 |             return this.format_compact(elements);
72 |         } catch(e){
73 |             return `Error filtering snapshot: ${e.message}\n${e.stack}`;
74 |         }
75 |     }
76 | }
77 | 
```
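
A minimal usage sketch of `Aria_snapshot_filter` (the snapshot text below is a made-up example of the Playwright ARIA snapshot format the parser expects, not real tool output):

```javascript
import {Aria_snapshot_filter} from './aria_snapshot_filter.js';

// Hypothetical snapshot lines in the `- role "name" [ref=...]` shape parsed above
const sample_snapshot = [
    '- heading "Results"',
    '- link "Docs" [ref=e12]:',
    '  - /url: https://example.com/docs',
    '- button "Search" [ref=e15]',
    '- textbox "Query" [ref=e17]',
].join('\n');

// Lines without refs or with non-interactive roles are dropped; each kept
// element becomes one compact `[ref] role "name" → url` line
console.log(Aria_snapshot_filter.filter_snapshot(sample_snapshot));
// [e12] link "Docs" → https://example.com/docs
// [e15] button "Search"
// [e17] textbox "Query"
```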

--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------

```markdown
  1 | # Changelog
  2 | 
  3 | All notable changes to this project will be documented in this file.
  4 | 
  5 | ## [2.6.0] - 2025-10-27
  6 | 
  7 | ### Added
  8 | - Client name logging and header passthrough for improved observability (PR #75)
  9 | - ARIA ref-based browser automation for more reliable element interactions (PR #65)
 10 | - ARIA snapshot filtering for better element targeting
 11 | - Network request tracking in browser sessions
 12 | - MCP Registry support (PR #71)
 13 | - `scraping_browser_snapshot` tool to capture ARIA snapshots
 14 | - `scraping_browser_click_ref`, `scraping_browser_type_ref`, `scraping_browser_wait_for_ref` tools using ref-based selectors
 15 | - `scraping_browser_network_requests` tool to track HTTP requests
 16 | 
 17 | ### Changed
 18 | - Enhanced search engine tool to return JSON with only relevant fields (PR #57)
 19 | - Added `fixed_values` parameter to reduce token usage (PR #60)
 20 | - Browser tools now use ARIA refs instead of CSS selectors for better reliability
 21 | 
 22 | ### Fixed
 23 | - Stop polling on HTTP 400 errors in web data tools (PR #64)
 24 | 
 25 | ### Deprecated
 26 | - Legacy selector-based tools (`scraping_browser_click`, `scraping_browser_type`, `scraping_browser_wait_for`) replaced by ref-based equivalents
 27 | - `scraping_browser_links` tool deprecated in favor of snapshot-based approach
 28 | 
 29 | 
 30 | ## [2.0.0] - 2025-05-26
 31 | 
 32 | ### Changed
 33 | - Updated browser authentication to use API_TOKEN instead of previous authentication method
 34 | - BROWSER_ZONE is now an optional parameter; the default zone is `mcp_browser`
 35 | - Removed duplicate web_data_ tools
 36 | 
 37 | ## [1.9.2] - 2025-05-23
 38 | 
 39 | ### Fixed
 40 | - Fixed GitHub references and repository settings
 41 | 
 42 | ## [1.9.1] - 2025-05-21
 43 | 
 44 | ### Fixed
 45 | - Fixed spelling errors and improved coding conventions
 46 | - Converted files back to Unix line endings for consistency
 47 | 
 48 | ## [1.9.0] - 2025-05-21
 49 | 
 50 | ### Added
 51 | - Added 23 new web data tools for enhanced data collection capabilities
 52 | - Added progress reporting functionality for better user feedback
 53 | - Added default parameter handling for improved tool usability
 54 | 
 55 | ### Changed
 56 | - Improved coding conventions and file formatting
 57 | - Enhanced web data API endpoints integration
 58 | 
 59 | ## [1.8.3] - 2025-05-21
 60 | 
 61 | ### Added
 62 | - Added Bright Data MCP with Claude demo video to README.md
 63 | 
 64 | ### Changed
 65 | - Updated documentation with video demonstrations
 66 | 
 67 | ## [1.8.2] - 2025-05-13
 68 | 
 69 | ### Changed
 70 | - Bumped FastMCP version for improved performance
 71 | - Updated README.md with additional documentation
 72 | 
 73 | ## [1.8.1] - 2025-05-05
 74 | 
 75 | ### Added
 76 | - Added 12 new WSAPI endpoints for enhanced functionality
 77 | - Changed to polling mechanism for better reliability
 78 | 
 79 | ### Changed
 80 | - Applied dos2unix formatting for consistency
 81 | - Updated Docker configuration
 82 | - Updated smithery.yaml configuration
 83 | 
 84 | ## [1.8.0] - 2025-05-03
 85 | 
 86 | ### Added
 87 | - Added domain-based browser sessions to avoid navigation limit issues
 88 | - Added automatic creation of required unlocker zone when not present
 89 | 
 90 | ### Fixed
 91 | - Fixed browser context maintenance across tool calls with current domain tracking
 92 | - Minor lint fixes
 93 | 
 94 | ## [1.0.0] - 2025-04-29
 95 | 
 96 | ### Added
 97 | - Initial release of Bright Data MCP server
 98 | - Browser automation capabilities with Bright Data integration
 99 | - Core web scraping and data collection tools
100 | - Smithery.yaml configuration for deployment in Smithery.ai
101 | - MIT License
102 | - Demo materials and documentation
103 | 
104 | ### Documentation
105 | - Created comprehensive README.md
106 | - Added demo.md with usage examples
107 | - Created examples/README.md for sample implementations
108 | - Added Tools.md documentation for available tools
109 | 
110 | ---
111 | 
112 | ## Release Notes
113 | 
114 | ### Version 1.9.x Series
115 | The 1.9.x series focuses on expanding web data collection capabilities and improving authentication mechanisms. Key highlights include the addition of 23 new web data tools.
116 | 
117 | ### Version 1.8.x Series  
118 | The 1.8.x series introduced significant improvements to browser session management, WSAPI endpoints, and overall system reliability. Notable features include domain-based sessions and automatic zone creation.
119 | 
120 | ### Version 1.0.0
121 | Initial stable release providing core MCP server functionality for Bright Data integration with comprehensive browser automation and web scraping capabilities.
122 | 
123 | 
```

--------------------------------------------------------------------------------
/browser_session.js:
--------------------------------------------------------------------------------

```javascript
  1 | 'use strict'; /*jslint node:true es9:true*/
  2 | import * as playwright from 'playwright';
  3 | import {Aria_snapshot_filter} from './aria_snapshot_filter.js';
  4 | 
  5 | export class Browser_session {
  6 |     constructor({cdp_endpoint}){
  7 |         this.cdp_endpoint = cdp_endpoint;
  8 |         this._domainSessions = new Map();
  9 |         this._currentDomain = 'default';
 10 |     }
 11 | 
 12 |     _getDomain(url){
 13 |         try {
 14 |             const urlObj = new URL(url);
 15 |             return urlObj.hostname;
 16 |         } catch(e){
 17 |             console.error(`Error extracting domain from ${url}:`, e);
 18 |             return 'default';
 19 |         }
 20 |     }
 21 | 
 22 |     async _getDomainSession(domain, {log}={}){
 23 |         if (!this._domainSessions.has(domain)) 
 24 |         {
 25 |             this._domainSessions.set(domain, {
 26 |                 browser: null,
 27 |                 page: null,
 28 |                 browserClosed: true,
 29 |                 requests: new Map()
 30 |             });
 31 |         }
 32 |         return this._domainSessions.get(domain);
 33 |     }
 34 | 
 35 |     async get_browser({log, domain='default'}={}){
 36 |         try {
 37 |             const session = await this._getDomainSession(domain, {log});
 38 |             if (session.browser)
 39 |             {
 40 |                 try { await session.browser.contexts(); }
 41 |                 catch(e){
 42 |                     log?.(`Browser connection lost for domain ${domain} (${e.message}), `
 43 |                         +`reconnecting...`);
 44 |                     session.browser = null;
 45 |                     session.page = null;
 46 |                     session.browserClosed = true;
 47 |                 }
 48 |             }
 49 |             if (!session.browser)
 50 |             {
 51 |                 log?.(`Connecting to Bright Data Scraping Browser for domain ${domain}.`);
 52 |                 session.browser = await playwright.chromium.connectOverCDP(
 53 |                     this.cdp_endpoint);
 54 |                 session.browserClosed = false;
 55 |                 session.browser.on('disconnected', ()=>{
 56 |                     log?.(`Browser disconnected for domain ${domain}`);
 57 |                     session.browser = null;
 58 |                     session.page = null;
 59 |                     session.browserClosed = true;
 60 |                 });
 61 |                 log?.(`Connected to Bright Data Scraping Browser for domain ${domain}`);
 62 |             }
 63 |             return session.browser;
 64 |         } catch(e){
 65 |             console.error(`Error connecting to browser for domain ${domain}:`, e);
 66 |             const session = this._domainSessions.get(domain);
 67 |             if (session) 
 68 |             {
 69 |                 session.browser = null;
 70 |                 session.page = null;
 71 |                 session.browserClosed = true;
 72 |             }
 73 |             throw e;
 74 |         }
 75 |     }
 76 | 
 77 |     async get_page({url=null}={}){
 78 |         if (url) 
 79 |         {
 80 |             this._currentDomain = this._getDomain(url);
 81 |         }
 82 |         const domain = this._currentDomain;
 83 |         try {
 84 |             const session = await this._getDomainSession(domain);
 85 |             if (session.browserClosed || !session.page)
 86 |             {
 87 |                 const browser = await this.get_browser({domain});
 88 |                 const existingContexts = browser.contexts();
 89 |                 if (existingContexts.length === 0)
 90 |                 {
 91 |                     const context = await browser.newContext();
 92 |                     session.page = await context.newPage();
 93 |                 }
 94 |                 else
 95 |                 {
 96 |                     const existingPages = existingContexts[0]?.pages();
 97 |                     if (existingPages && existingPages.length > 0)
 98 |                         session.page = existingPages[0];
 99 |                     else
100 |                         session.page = await existingContexts[0].newPage();
101 |                 }
102 |                 session.page.on('request', request=>
103 |                     session.requests.set(request, null));
104 |                 session.page.on('response', response=>
105 |                     session.requests.set(response.request(), response));
106 |                 session.browserClosed = false;
107 |                 session.page.once('close', ()=>{
108 |                     session.page = null;
109 |                 });
110 |             }
111 |             return session.page;
112 |         } catch(e){
113 |             console.error(`Error getting page for domain ${domain}:`, e);
114 |             const session = this._domainSessions.get(domain);
115 |             if (session) 
116 |             {
117 |                 session.browser = null;
118 |                 session.page = null;
119 |                 session.browserClosed = true;
120 |             }
121 |             throw e;
122 |         }
123 |     }
124 | 
125 |     async capture_snapshot({filtered=true}={}){
126 |         const page = await this.get_page();
127 |         try {
128 |             const full_snapshot = await page._snapshotForAI();
129 |             if (!filtered)
130 |             {
131 |                 return {
132 |                     url: page.url(),
133 |                     title: await page.title(),
134 |                     aria_snapshot: full_snapshot,
135 |                 };
136 |             }
137 |             const filtered_snapshot = Aria_snapshot_filter.filter_snapshot(
138 |                 full_snapshot);
139 |             return {
140 |                 url: page.url(),
141 |                 title: await page.title(),
142 |                 aria_snapshot: filtered_snapshot,
143 |             };
144 |         } catch(e){
145 |             throw new Error(`Error capturing ARIA snapshot: ${e.message}`);
146 |         }
147 |     }
148 | 
149 |     async ref_locator({element, ref}){
150 |         const page = await this.get_page();
151 |         try {
152 |             const snapshot = await page._snapshotForAI();
153 |             if (!snapshot.includes(`[ref=${ref}]`))
154 |                 throw new Error('Ref '+ref+' not found in the current page '
155 |                     +'snapshot. Try capturing new snapshot.');
156 |             return page.locator(`aria-ref=${ref}`).describe(element);
157 |         } catch(e){
158 |             throw new Error(`Error creating ref locator for ${element} with ref ${ref}: ${e.message}`);
159 |         }
160 |     }
161 | 
162 |     async get_requests(){
163 |         const domain = this._currentDomain;
164 |         const session = await this._getDomainSession(domain);
165 |         return session.requests;
166 |     }
167 | 
168 |     async clear_requests(){
169 |         const domain = this._currentDomain;
170 |         const session = await this._getDomainSession(domain);
171 |         session.requests.clear();
172 |     }
173 | 
174 |     async close(domain=null){
175 |         if (domain){
176 |             const session = this._domainSessions.get(domain);
177 |             if (session && session.browser) 
178 |             {
179 |                 try { await session.browser.close(); }
180 |                 catch(e){ console.error(`Error closing browser for domain ${domain}:`, e); }
181 |                 session.browser = null;
182 |                 session.page = null;
183 |                 session.browserClosed = true;
184 |                 session.requests.clear();
185 |                 this._domainSessions.delete(domain);
186 |             }
187 |         }
188 |         else {
189 |             for (const [domain, session] of this._domainSessions.entries()) {
190 |                 if (session.browser) 
191 |                 {
192 |                     try { await session.browser.close(); }
193 |                     catch(e){ console.error(`Error closing browser for domain ${domain}:`, e); }
194 |                     session.browser = null;
195 |                     session.page = null;
196 |                     session.browserClosed = true;
197 |                     session.requests.clear();
198 |                 }
199 |             }
200 |             this._domainSessions.clear();
201 |         }
202 |         if (!domain) 
203 |         {
204 |             this._currentDomain = 'default';
205 |         }
206 |     }
207 | }
208 | 
209 | 
```
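
A minimal usage sketch of `Browser_session`, assuming a Bright Data Scraping Browser CDP endpoint (the credentials below are placeholders; `browser_tools.js` shows how the real endpoint is derived from `API_TOKEN` and the browser zone):

```javascript
import {Browser_session} from './browser_session.js';

// Placeholder CDP endpoint; substitute your real customer ID and zone password
const session = new Browser_session({
    cdp_endpoint: 'wss://brd-customer-<id>-zone-mcp_browser:<password>@brd.superproxy.io:9222',
});

// get_page() keys the session by domain and reuses an existing page if one is open
const page = await session.get_page({url: 'https://example.com'});
await page.goto('https://example.com', {waitUntil: 'domcontentloaded'});

// Filtered ARIA snapshot: {url, title, aria_snapshot} listing interactive elements only
const snapshot = await session.capture_snapshot();
console.log(snapshot.aria_snapshot);

await session.close();
```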

--------------------------------------------------------------------------------
/assets/Tools.md:
--------------------------------------------------------------------------------

```markdown
 1 | |Feature|Description|
 2 | |---|---|
 3 | |search_engine|Scrape search results from Google, Bing, or Yandex. Returns SERP results in JSON for Google and Markdown for Bing/Yandex; supports pagination with the cursor parameter.|
 4 | |scrape_as_markdown|Scrape a single webpage with advanced extraction and return Markdown. Uses Bright Data's unlocker to handle bot protection and CAPTCHA.|
 5 | |search_engine_batch|Run up to 10 search queries in parallel. Returns JSON for Google results and Markdown for Bing/Yandex.|
 6 | |scrape_batch|Scrape up to 10 webpages in one request and return an array of URL/content pairs in Markdown format.|
 7 | |scrape_as_html|Scrape a single webpage with advanced extraction and return the HTML response body. Handles sites protected by bot detection or CAPTCHA.|
 8 | |extract|Scrape a webpage as Markdown and convert it to structured JSON using AI sampling, with an optional custom extraction prompt.|
 9 | |session_stats|Report how many times each tool has been called during the current MCP session.|
10 | |web_data_amazon_product|Quickly read structured Amazon product data. Requires a valid product URL containing /dp/. Often faster and more reliable than scraping.|
11 | |web_data_amazon_product_reviews|Quickly read structured Amazon product review data. Requires a valid product URL containing /dp/. Often faster and more reliable than scraping.|
12 | |web_data_amazon_product_search|Retrieve structured Amazon search results. Requires a search keyword and Amazon domain URL; limited to the first page of results.|
13 | |web_data_walmart_product|Quickly read structured Walmart product data. Requires a product URL containing /ip/. Often faster and more reliable than scraping.|
14 | |web_data_walmart_seller|Quickly read structured Walmart seller data. Requires a valid Walmart seller URL. Often faster and more reliable than scraping.|
15 | |web_data_ebay_product|Quickly read structured eBay product data. Requires a valid eBay product URL. Often faster and more reliable than scraping.|
16 | |web_data_homedepot_products|Quickly read structured Home Depot product data. Requires a valid homedepot.com product URL. Often faster and more reliable than scraping.|
17 | |web_data_zara_products|Quickly read structured Zara product data. Requires a valid Zara product URL. Often faster and more reliable than scraping.|
18 | |web_data_etsy_products|Quickly read structured Etsy product data. Requires a valid Etsy product URL. Often faster and more reliable than scraping.|
19 | |web_data_bestbuy_products|Quickly read structured Best Buy product data. Requires a valid Best Buy product URL. Often faster and more reliable than scraping.|
20 | |web_data_linkedin_person_profile|Quickly read structured LinkedIn people profile data. Requires a valid LinkedIn profile URL. Often faster and more reliable than scraping.|
21 | |web_data_linkedin_company_profile|Quickly read structured LinkedIn company profile data. Requires a valid LinkedIn company URL. Often faster and more reliable than scraping.|
22 | |web_data_linkedin_job_listings|Quickly read structured LinkedIn job listings data. Requires a valid LinkedIn jobs URL or search URL. Often faster and more reliable than scraping.|
23 | |web_data_linkedin_posts|Quickly read structured LinkedIn posts data. Requires a valid LinkedIn post URL. Often faster and more reliable than scraping.|
24 | |web_data_linkedin_people_search|Quickly read structured LinkedIn people search data. Requires a LinkedIn people search URL. Often faster and more reliable than scraping.|
25 | |web_data_crunchbase_company|Quickly read structured Crunchbase company data. Requires a valid Crunchbase company URL. Often faster and more reliable than scraping.|
26 | |web_data_zoominfo_company_profile|Quickly read structured ZoomInfo company profile data. Requires a valid ZoomInfo company URL. Often faster and more reliable than scraping.|
27 | |web_data_instagram_profiles|Quickly read structured Instagram profile data. Requires a valid Instagram profile URL. Often faster and more reliable than scraping.|
28 | |web_data_instagram_posts|Quickly read structured Instagram post data. Requires a valid Instagram post URL. Often faster and more reliable than scraping.|
29 | |web_data_instagram_reels|Quickly read structured Instagram reel data. Requires a valid Instagram reel URL. Often faster and more reliable than scraping.|
30 | |web_data_instagram_comments|Quickly read structured Instagram comments data. Requires a valid Instagram URL. Often faster and more reliable than scraping.|
31 | |web_data_facebook_posts|Quickly read structured Facebook post data. Requires a valid Facebook post URL. Often faster and more reliable than scraping.|
32 | |web_data_facebook_marketplace_listings|Quickly read structured Facebook Marketplace listing data. Requires a valid Marketplace listing URL. Often faster and more reliable than scraping.|
33 | |web_data_facebook_company_reviews|Quickly read structured Facebook company reviews data. Requires a valid Facebook company URL and review count. Often faster and more reliable than scraping.|
34 | |web_data_facebook_events|Quickly read structured Facebook events data. Requires a valid Facebook event URL. Often faster and more reliable than scraping.|
35 | |web_data_tiktok_profiles|Quickly read structured TikTok profile data. Requires a valid TikTok profile URL. Often faster and more reliable than scraping.|
36 | |web_data_tiktok_posts|Quickly read structured TikTok post data. Requires a valid TikTok post URL. Often faster and more reliable than scraping.|
37 | |web_data_tiktok_shop|Quickly read structured TikTok Shop product data. Requires a valid TikTok Shop product URL. Often faster and more reliable than scraping.|
38 | |web_data_tiktok_comments|Quickly read structured TikTok comments data. Requires a valid TikTok video URL. Often faster and more reliable than scraping.|
39 | |web_data_google_maps_reviews|Quickly read structured Google Maps reviews data. Requires a valid Google Maps URL and optional days_limit (default 3). Often faster and more reliable than scraping.|
40 | |web_data_google_shopping|Quickly read structured Google Shopping product data. Requires a valid Google Shopping product URL. Often faster and more reliable than scraping.|
41 | |web_data_google_play_store|Quickly read structured Google Play Store app data. Requires a valid Play Store app URL. Often faster and more reliable than scraping.|
42 | |web_data_apple_app_store|Quickly read structured Apple App Store app data. Requires a valid App Store app URL. Often faster and more reliable than scraping.|
43 | |web_data_reuter_news|Quickly read structured Reuters news data. Requires a valid Reuters news article URL. Often faster and more reliable than scraping.|
44 | |web_data_github_repository_file|Quickly read structured GitHub repository file data. Requires a valid GitHub file URL. Often faster and more reliable than scraping.|
45 | |web_data_yahoo_finance_business|Quickly read structured Yahoo Finance company profile data. Requires a valid Yahoo Finance business URL. Often faster and more reliable than scraping.|
46 | |web_data_x_posts|Quickly read structured X (Twitter) post data. Requires a valid X post URL. Often faster and more reliable than scraping.|
47 | |web_data_zillow_properties_listing|Quickly read structured Zillow property listing data. Requires a valid Zillow listing URL. Often faster and more reliable than scraping.|
48 | |web_data_booking_hotel_listings|Quickly read structured Booking.com hotel listing data. Requires a valid Booking.com listing URL. Often faster and more reliable than scraping.|
49 | |web_data_youtube_profiles|Quickly read structured YouTube channel profile data. Requires a valid YouTube channel URL. Often faster and more reliable than scraping.|
50 | |web_data_youtube_comments|Quickly read structured YouTube comments data. Requires a valid YouTube video URL and optional num_of_comments (default 10). Often faster and more reliable than scraping.|
51 | |web_data_reddit_posts|Quickly read structured Reddit post data. Requires a valid Reddit post URL. Often faster and more reliable than scraping.|
52 | |web_data_youtube_videos|Quickly read structured YouTube video metadata. Requires a valid YouTube video URL. Often faster and more reliable than scraping.|
53 | |scraping_browser_navigate|Open or reuse a scraping-browser session and navigate to the provided URL, resetting tracked network requests.|
54 | |scraping_browser_go_back|Navigate the active scraping-browser session back to the previous page and report the new URL and title.|
55 | |scraping_browser_go_forward|Navigate the active scraping-browser session forward to the next page and report the new URL and title.|
56 | |scraping_browser_snapshot|Capture an ARIA snapshot of the current page listing interactive elements and their refs for later ref-based actions.|
57 | |scraping_browser_click_ref|Click an element using its ref from the latest ARIA snapshot; requires a ref and human-readable element description.|
58 | |scraping_browser_type_ref|Fill an element identified by ref from the ARIA snapshot, optionally pressing Enter to submit after typing.|
59 | |scraping_browser_screenshot|Capture a screenshot of the current page; supports optional full_page mode for full-length images.|
60 | |scraping_browser_network_requests|List the network requests recorded since page load with HTTP method, URL, and response status for debugging.|
61 | |scraping_browser_wait_for_ref|Wait until an element identified by ARIA ref becomes visible, with an optional timeout in milliseconds.|
62 | |scraping_browser_get_text|Return the text content of the current page's body element.|
63 | |scraping_browser_get_html|Return the HTML content of the current page; avoid the full_page option unless head or script tags are required.|
64 | |scraping_browser_scroll|Scroll to the bottom of the current page in the scraping-browser session.|
65 | |scraping_browser_scroll_to_ref|Scroll the page until the element referenced in the ARIA snapshot is in view.|
66 | 
```

--------------------------------------------------------------------------------
/browser_tools.js:
--------------------------------------------------------------------------------

```javascript
  1 | 'use strict'; /*jslint node:true es9:true*/
  2 | import {UserError, imageContent as image_content} from 'fastmcp';
  3 | import {z} from 'zod';
  4 | import axios from 'axios';
  5 | import {Browser_session} from './browser_session.js';
  6 | let browser_zone = process.env.BROWSER_ZONE || 'mcp_browser';
  7 | 
  8 | let open_session;
  9 | const require_browser = async()=>{
 10 |     if (!open_session)
 11 |     {
 12 |         open_session = new Browser_session({
 13 |             cdp_endpoint: await calculate_cdp_endpoint(),
 14 |         });
 15 |     }
 16 |     return open_session;
 17 | };
 18 | 
 19 | const calculate_cdp_endpoint = async()=>{
 20 |     try {
 21 |         const status_response = await axios({
 22 |             url: 'https://api.brightdata.com/status',
 23 |             method: 'GET',
 24 |             headers: {authorization: `Bearer ${process.env.API_TOKEN}`},
 25 |         });
 26 |         const customer = status_response.data.customer;
 27 |         const password_response = await axios({
 28 |             url: `https://api.brightdata.com/zone/passwords?zone=${browser_zone}`,
 29 |             method: 'GET',
 30 |             headers: {authorization: `Bearer ${process.env.API_TOKEN}`},
 31 |         });
 32 |         const password = password_response.data.passwords[0];
 33 | 
 34 |         return `wss://brd-customer-${customer}-zone-${browser_zone}:`
 35 |             +`${password}@brd.superproxy.io:9222`;
 36 |     } catch(e){
 37 |         if (e.response?.status===422)
 38 |             throw new Error(`Browser zone '${browser_zone}' does not exist`);
 39 |         throw new Error(`Error retrieving browser credentials: ${e.message}`);
 40 |     }
 41 | };
 42 | 
 43 | let scraping_browser_navigate = {
 44 |     name: 'scraping_browser_navigate',
 45 |     description: 'Navigate a scraping browser session to a new URL',
 46 |     parameters: z.object({
 47 |         url: z.string().describe('The URL to navigate to'),
 48 |     }),
 49 |     execute: async({url})=>{
 50 |         const browser_session = await require_browser();
 51 |         const page = await browser_session.get_page({url});
 52 |         await browser_session.clear_requests();
 53 |         try {
 54 |             await page.goto(url, {
 55 |                 timeout: 120000,
 56 |                 waitUntil: 'domcontentloaded',
 57 |             });
 58 |             return [
 59 |                 `Successfully navigated to ${url}`,
 60 |                 `Title: ${await page.title()}`,
 61 |                 `URL: ${page.url()}`,
 62 |             ].join('\n');
 63 |         } catch(e){
 64 |             throw new UserError(`Error navigating to ${url}: ${e}`);
 65 |         }
 66 |     },
 67 | };
 68 | 
 69 | let scraping_browser_go_back = {
 70 |     name: 'scraping_browser_go_back',
 71 |     description: 'Go back to the previous page',
 72 |     parameters: z.object({}),
 73 |     execute: async()=>{
 74 |         const page = await (await require_browser()).get_page();
 75 |         try {
 76 |             await page.goBack();
 77 |             return [
 78 |                 'Successfully navigated back',
 79 |                 `Title: ${await page.title()}`,
 80 |                 `URL: ${page.url()}`,
 81 |             ].join('\n');
 82 |         } catch(e){
 83 |             throw new UserError(`Error navigating back: ${e}`);
 84 |         }
 85 |     },
 86 | };
 87 | 
 88 | const scraping_browser_go_forward = {
 89 |     name: 'scraping_browser_go_forward',
 90 |     description: 'Go forward to the next page',
 91 |     parameters: z.object({}),
 92 |     execute: async()=>{
 93 |         const page = await (await require_browser()).get_page();
 94 |         try {
 95 |             await page.goForward();
 96 |             return [
 97 |                 'Successfully navigated forward',
 98 |                 `Title: ${await page.title()}`,
 99 |                 `URL: ${page.url()}`,
100 |             ].join('\n');
101 |         } catch(e){
102 |             throw new UserError(`Error navigating forward: ${e}`);
103 |         }
104 |     },
105 | };
106 | 
107 | let scraping_browser_snapshot = {
108 |     name: 'scraping_browser_snapshot',
109 |     description: [
110 |         'Capture an ARIA snapshot of the current page showing all interactive '
111 |         +'elements with their refs.',
112 |         'This provides accurate element references that can be used with '
113 |         +'ref-based tools.',
114 |         'Use this before interacting with elements to get proper refs instead '
115 |         +'of guessing selectors.'
116 |     ].join('\n'),
117 |     parameters: z.object({}),
118 |     execute: async()=>{
119 |         const browser_session = await require_browser();
120 |         try {
121 |             const snapshot = await browser_session.capture_snapshot();
122 |             return [
123 |                 `Page: ${snapshot.url}`,
124 |                 `Title: ${snapshot.title}`,
125 |                 '',
126 |                 'Interactive Elements:',
127 |                 snapshot.aria_snapshot
128 |             ].join('\n');
129 |         } catch(e){
130 |             throw new UserError(`Error capturing snapshot: ${e}`);
131 |         }
132 |     },
133 | };
134 | 
135 | let scraping_browser_click_ref = {
136 |     name: 'scraping_browser_click_ref',
137 |     description: [
138 |         'Click on an element using its ref from the ARIA snapshot.',
139 |         'Use scraping_browser_snapshot first to get the correct ref values.',
140 |         'This is more reliable than CSS selectors.'
141 |     ].join('\n'),
142 |     parameters: z.object({
143 |         ref: z.string().describe('The ref attribute from the ARIA snapshot (e.g., "23")'),
144 |         element: z.string().describe('Description of the element being clicked for context'),
145 |     }),
146 |     execute: async({ref, element})=>{
147 |         const browser_session = await require_browser();
148 |         try {
149 |             const locator = await browser_session.ref_locator({element, ref});
150 |             await locator.click({timeout: 5000});
151 |             return `Successfully clicked element: ${element} (ref=${ref})`;
152 |         } catch(e){
153 |             throw new UserError(`Error clicking element ${element} with ref ${ref}: ${e}`);
154 |         }
155 |     },
156 | };
157 | 
158 | let scraping_browser_type_ref = {
159 |     name: 'scraping_browser_type_ref',
160 |     description: [
161 |         'Type text into an element using its ref from the ARIA snapshot.',
162 |         'Use scraping_browser_snapshot first to get the correct ref values.',
163 |         'This is more reliable than CSS selectors.'
164 |     ].join('\n'),
165 |     parameters: z.object({
166 |         ref: z.string().describe('The ref attribute from the ARIA snapshot (e.g., "23")'),
167 |         element: z.string().describe('Description of the element being typed into for context'),
168 |         text: z.string().describe('Text to type'),
169 |         submit: z.boolean().optional()
170 |             .describe('Whether to submit the form after typing (press Enter)'),
171 |     }),
172 |     execute: async({ref, element, text, submit})=>{
173 |         const browser_session = await require_browser();
174 |         try {
175 |             const locator = await browser_session.ref_locator({element, ref});
176 |             await locator.fill(text);
177 |             if (submit)
178 |                 await locator.press('Enter');
179 |             const suffix = submit ? ' and submitted the form' : '';
180 |             return 'Successfully typed "'+text+'" into element: '+element
181 |                 +' (ref='+ref+')'+suffix;
182 |         } catch(e){
183 |             throw new UserError(`Error typing into element ${element} with ref ${ref}: ${e}`);
184 |         }
185 |     },
186 | };
187 | 
188 | let scraping_browser_screenshot = {
189 |     name: 'scraping_browser_screenshot',
190 |     description: 'Take a screenshot of the current page',
191 |     parameters: z.object({
192 |         full_page: z.boolean().optional().describe([
193 |             'Whether to screenshot the full page (default: false)',
194 |             'You should avoid full-page screenshots unless they are '
195 |             +'important, since the images can be quite large',
196 |         ].join('\n')),
197 |     }),
198 |     execute: async({full_page=false})=>{
199 |         const page = await (await require_browser()).get_page();
200 |         try {
201 |             const buffer = await page.screenshot({fullPage: full_page});
202 |             return image_content({buffer});
203 |         } catch(e){
204 |             throw new UserError(`Error taking screenshot: ${e}`);
205 |         }
206 |     },
207 | };
208 | 
209 | let scraping_browser_get_html = {
210 |     name: 'scraping_browser_get_html',
211 |     description: 'Get the HTML content of the current page. Avoid using this '
212 |     +'tool; if used, avoid the full_page option unless it is important to '
213 |     +'see things like script tags, since the output can be large',
214 |     parameters: z.object({
215 |         full_page: z.boolean().optional().describe([
216 |             'Whether to get the full page HTML including head and script tags',
217 |             'Avoid this unless you need the extra HTML, since it can be '
218 |             +'quite large',
219 |         ].join('\n')),
220 |     }),
221 |     execute: async({full_page=false})=>{
222 |         const page = await (await require_browser()).get_page();
223 |         try {
224 |             if (!full_page)
225 |                 return await page.$eval('body', body=>body.innerHTML);
226 |             // full_page requested: return the entire document HTML,
227 |             // including the head and script tags
228 |             const html = await page.content();
229 |             return html;
230 |         } catch(e){
231 |             throw new UserError(`Error getting HTML content: ${e}`);
232 |         }
233 |     },
234 | };
235 | 
236 | let scraping_browser_get_text = {
237 |     name: 'scraping_browser_get_text',
238 |     description: 'Get the text content of the current page',
239 |     parameters: z.object({}),
240 |     execute: async()=>{
241 |         const page = await (await require_browser()).get_page();
242 |         try { return await page.$eval('body', body=>body.innerText); }
243 |         catch(e){ throw new UserError(`Error getting text content: ${e}`); }
244 |     },
245 | };
246 | 
247 | let scraping_browser_scroll = {
248 |     name: 'scraping_browser_scroll',
249 |     description: 'Scroll to the bottom of the current page',
250 |     parameters: z.object({}),
251 |     execute: async()=>{
252 |         const page = await (await require_browser()).get_page();
253 |         try {
254 |             await page.evaluate(()=>{
255 |                 window.scrollTo(0, document.body.scrollHeight);
256 |             });
257 |             return 'Successfully scrolled to the bottom of the page';
258 |         } catch(e){
259 |             throw new UserError(`Error scrolling page: ${e}`);
260 |         }
261 |     },
262 | };
263 | 
264 | let scraping_browser_scroll_to_ref = {
265 |     name: 'scraping_browser_scroll_to_ref',
266 |     description: [
267 |         'Scroll to a specific element using its ref from the ARIA snapshot.',
268 |         'Use scraping_browser_snapshot first to get the correct ref values.',
269 |         'This is more reliable than CSS selectors.'
270 |     ].join('\n'),
271 |     parameters: z.object({
272 |         ref: z.string().describe('The ref attribute from the ARIA snapshot (e.g., "23")'),
273 |         element: z.string().describe('Description of the element to scroll to'),
274 |     }),
275 |     execute: async({ref, element})=>{
276 |         const browser_session = await require_browser();
277 |         try {
278 |             const locator = await browser_session.ref_locator({element, ref});
279 |             await locator.scrollIntoViewIfNeeded();
280 |             return `Successfully scrolled to element: ${element} (ref=${ref})`;
281 |         } catch(e){
282 |             throw new UserError(`Error scrolling to element ${element} with `
283 |                 +`ref ${ref}: ${e}`);
284 |         }
285 |     },
286 | };
287 | 
288 | let scraping_browser_network_requests = {
289 |     name: 'scraping_browser_network_requests',
290 |     description: [
291 |         'Get all network requests made since loading the current page.',
292 |         'Shows HTTP method, URL, status code and status text for each request.',
293 |         'Useful for debugging API calls, tracking data fetching, and '
294 |         +'understanding page behavior.'
295 |     ].join('\n'),
296 |     parameters: z.object({}),
297 |     execute: async()=>{
298 |         const browser_session = await require_browser();
299 |         try {
300 |             const requests = await browser_session.get_requests();
301 |             if (requests.size==0) 
302 |                 return 'No network requests recorded for the current page.';
303 | 
304 |             const results = [];
305 |             requests.forEach((response, request)=>{
306 |                 const result = [];
307 |                 result.push(`[${request.method().toUpperCase()}] ${request.url()}`);
308 |                 if (response)
309 |                     result.push(`=> [${response.status()}] ${response.statusText()}`);
310 | 
311 |                 results.push(result.join(' '));
312 |             });
313 |             
314 |             return [
315 |                 `Network Requests (${results.length} total):`,
316 |                 '',
317 |                 ...results
318 |             ].join('\n');
319 |         } catch(e){
320 |             throw new UserError(`Error getting network requests: ${e}`);
321 |         }
322 |     },
323 | };
324 | 
325 | let scraping_browser_wait_for_ref = {
326 |     name: 'scraping_browser_wait_for_ref',
327 |     description: [
328 |         'Wait for an element to be visible using its ref from the ARIA snapshot.',
329 |         'Use scraping_browser_snapshot first to get the correct ref values.',
330 |         'This is more reliable than CSS selectors.'
331 |     ].join('\n'),
332 |     parameters: z.object({
333 |         ref: z.string().describe('The ref attribute from the ARIA snapshot (e.g., "23")'),
334 |         element: z.string().describe('Description of the element being waited for'),
335 |         timeout: z.number().optional()
336 |             .describe('Maximum time to wait in milliseconds (default: 30000)'),
337 |     }),
338 |     execute: async({ref, element, timeout})=>{
339 |         const browser_session = await require_browser();
340 |         try {
341 |             const locator = await browser_session.ref_locator({element, ref});
342 |             await locator.waitFor({timeout: timeout || 30000});
343 |             return `Successfully waited for element: ${element} (ref=${ref})`;
344 |         } catch(e){
345 |             throw new UserError(`Error waiting for element ${element} with ref ${ref}: ${e}`);
346 |         }
347 |     },
348 | };
349 | 
350 | export const tools = [
351 |     scraping_browser_navigate,
352 |     scraping_browser_go_back,
353 |     scraping_browser_go_forward,
354 |     scraping_browser_snapshot,
355 |     scraping_browser_click_ref,
356 |     scraping_browser_type_ref,
357 |     scraping_browser_screenshot,
358 |     scraping_browser_network_requests,
359 |     scraping_browser_wait_for_ref,
360 |     scraping_browser_get_text,
361 |     scraping_browser_get_html,
362 |     scraping_browser_scroll,
363 |     scraping_browser_scroll_to_ref,
364 | ];
365 | 
```
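
The ref-based tools above are meant to be used in a fixed order: capture an ARIA snapshot first, then act on the `ref` values it returns. A minimal sketch of that flow from the client side follows; `call_tool` is a hypothetical helper standing in for whatever MCP client library is used, and the ref value "23" is purely illustrative.

```javascript
// Illustrative only: call_tool(name, args) is a hypothetical stand-in for an
// MCP client's tool-invocation API; it is not part of this repository.
async function click_via_snapshot(call_tool){
    // 1. Navigate the scraping browser session to the target page
    await call_tool('scraping_browser_navigate', {url: 'https://example.com'});
    // 2. Capture an ARIA snapshot to obtain element refs
    const snapshot = await call_tool('scraping_browser_snapshot', {});
    console.error(snapshot); // lists interactive elements with their refs
    // 3. Act on a ref taken from the snapshot output ("23" is made up here)
    await call_tool('scraping_browser_click_ref', {
        ref: '23',
        element: 'Search button',
    });
}
```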

--------------------------------------------------------------------------------
/server.js:
--------------------------------------------------------------------------------

```javascript
  1 | #!/usr/bin/env node
  2 | 'use strict'; /*jslint node:true es9:true*/
  3 | import {FastMCP} from 'fastmcp';
  4 | import {z} from 'zod';
  5 | import axios from 'axios';
  6 | import {tools as browser_tools} from './browser_tools.js';
  7 | import {createRequire} from 'node:module';
  8 | const require = createRequire(import.meta.url);
  9 | const package_json = require('./package.json');
 10 | const api_token = process.env.API_TOKEN;
 11 | const unlocker_zone = process.env.WEB_UNLOCKER_ZONE || 'mcp_unlocker';
 12 | const browser_zone = process.env.BROWSER_ZONE || 'mcp_browser';
 13 | const pro_mode = process.env.PRO_MODE === 'true';
 14 | const pro_mode_tools = ['search_engine', 'scrape_as_markdown', 
 15 |     'search_engine_batch', 'scrape_batch'];
 16 | function parse_rate_limit(rate_limit_str) {
 17 |     if (!rate_limit_str) 
 18 |         return null;
 19 |     
 20 |     const match = rate_limit_str.match(/^(\d+)\/(\d+)([mhs])$/);
 21 |     if (!match) 
 22 |         throw new Error('Invalid RATE_LIMIT format. Use: 100/1h or 50/30m');
 23 |     
 24 |     const [, limit, time, unit] = match;
 25 |     const multiplier = unit==='h' ? 3600 : unit==='m' ? 60 : 1;
 26 |     
 27 |     return {
 28 |         limit: parseInt(limit),
 29 |         window: parseInt(time) * multiplier * 1000, 
 30 |         display: rate_limit_str
 31 |     };
 32 | }
 33 | 
 34 | const rate_limit_config = parse_rate_limit(process.env.RATE_LIMIT);
 35 | 
 36 | if (!api_token)
 37 |     throw new Error('Cannot run MCP server without API_TOKEN env');
 38 | 
 39 | const api_headers = (clientName=null)=>({
 40 |     'user-agent': `${package_json.name}/${package_json.version}`,
 41 |     authorization: `Bearer ${api_token}`,
 42 |     ...(clientName ? {'x-mcp-client-name': clientName} : {}),
 43 | });
 44 | 
 45 | function check_rate_limit(){
 46 |     if (!rate_limit_config) 
 47 |         return true;
 48 |     
 49 |     const now = Date.now();
 50 |     const window_start = now - rate_limit_config.window;
 51 |     
 52 |     debug_stats.call_timestamps = debug_stats.call_timestamps.filter(timestamp=>timestamp>window_start);
 53 |     
 54 |     if (debug_stats.call_timestamps.length>=rate_limit_config.limit)
 55 |         throw new Error(`Rate limit exceeded: ${rate_limit_config.display}`);
 56 |     
 57 |     debug_stats.call_timestamps.push(now);
 58 |     return true;
 59 | }
 60 | 
 61 | async function ensure_required_zones(){
 62 |     try {
 63 |         console.error('Checking for required zones...');
 64 |         let response = await axios({
 65 |             url: 'https://api.brightdata.com/zone/get_active_zones',
 66 |             method: 'GET',
 67 |             headers: api_headers(),
 68 |         });
 69 |         let zones = response.data || [];
 70 |         let has_unlocker_zone = zones.some(zone=>zone.name==unlocker_zone);
 71 |         let has_browser_zone = zones.some(zone=>zone.name==browser_zone);
 72 |         
 73 |         if (!has_unlocker_zone)
 74 |         {
 75 |             console.error(`Required zone "${unlocker_zone}" not found, `
 76 |                 +`creating it...`);
 77 |             await axios({
 78 |                 url: 'https://api.brightdata.com/zone',
 79 |                 method: 'POST',
 80 |                 headers: {
 81 |                     ...api_headers(),
 82 |                     'Content-Type': 'application/json',
 83 |                 },
 84 |                 data: {
 85 |                     zone: {name: unlocker_zone, type: 'unblocker'},
 86 |                     plan: {type: 'unblocker'},
 87 |                 },
 88 |             });
 89 |             console.error(`Zone "${unlocker_zone}" created successfully`);
 90 |         }
 91 |         else
 92 |             console.error(`Required zone "${unlocker_zone}" already exists`);
 93 |             
 94 |         if (!has_browser_zone)
 95 |         {
 96 |             console.error(`Required zone "${browser_zone}" not found, `
 97 |                 +`creating it...`);
 98 |             await axios({
 99 |                 url: 'https://api.brightdata.com/zone',
100 |                 method: 'POST',
101 |                 headers: {
102 |                     ...api_headers(),
103 |                     'Content-Type': 'application/json',
104 |                 },
105 |                 data: {
106 |                     zone: {name: browser_zone, type: 'browser_api'},
107 |                     plan: {type: 'browser_api'},
108 |                 },
109 |             });
110 |             console.error(`Zone "${browser_zone}" created successfully`);
111 |         }
112 |         else
113 |             console.error(`Required zone "${browser_zone}" already exists`);
114 |     } catch(e){
115 |         console.error('Error checking/creating zones:',
116 |             e.response?.data||e.message);
117 |     }
118 | }
119 | 
120 | await ensure_required_zones();
121 | 
122 | let server = new FastMCP({
123 |     name: 'Bright Data',
124 |     version: package_json.version,
125 | });
126 | let debug_stats = {tool_calls: {}, session_calls: 0, call_timestamps: []};
127 | 
128 | const addTool = (tool) => {
129 |     if (!pro_mode && !pro_mode_tools.includes(tool.name)) 
130 |         return;
131 |     server.addTool(tool);
132 | };
133 | 
134 | addTool({
135 |     name: 'search_engine',
136 |     description: 'Scrape search results from Google, Bing or Yandex. Returns '
137 |         +'SERP results in JSON or Markdown (URL, title, description). Ideal '
138 |         +'for gathering current information, news, and detailed search results.',
139 |     parameters: z.object({
140 |         query: z.string(),
141 |         engine: z.enum(['google', 'bing', 'yandex'])
142 |             .optional()
143 |             .default('google'),
144 |         cursor: z.string()
145 |             .optional()
146 |             .describe('Pagination cursor for next page'),
147 |     }),
148 |     execute: tool_fn('search_engine', async ({query, engine, cursor}, ctx)=>{
149 |         const is_google = engine=='google';
150 |         const url = search_url(engine, query, cursor);
151 |         let response = await axios({
152 |             url: 'https://api.brightdata.com/request',
153 |             method: 'POST',
154 |             data: {
155 |                 url: url,
156 |                 zone: unlocker_zone,
157 |                 format: 'raw',
158 |                 data_format: is_google ? 'parsed' : 'markdown',
159 |             },
160 |             headers: api_headers(ctx.clientName),
161 |             responseType: 'text',
162 |         });
163 |         if (!is_google)
164 |             return response.data;
165 |         try {
166 |             const searchData = JSON.parse(response.data);
167 |             return JSON.stringify({
168 |                 organic: searchData.organic || [],
169 |                 images: searchData.images
170 |                     ? searchData.images.map(img=>img.link) : [],
171 |                 current_page: searchData.pagination?.current_page || {},
172 |                 related: searchData.related || [],
173 |                 ai_overview: searchData.ai_overview || null,
174 |             });
175 |         } catch(e){
176 |             return JSON.stringify({
177 |                 organic: [],
178 |                 images: [],
179 |                 current_page: {},
180 |                 related: [],
181 |             });
182 |         }
183 |     }),
184 | });
185 | 
186 | addTool({
187 |     name: 'scrape_as_markdown',
188 |     description: 'Scrape a single webpage URL with advanced options for '
189 |     +'content extraction and get back the results in Markdown. '
190 |     +'This tool can unlock any webpage even if it uses bot detection or '
191 |     +'CAPTCHA.',
192 |     parameters: z.object({url: z.string().url()}),
193 |     execute: tool_fn('scrape_as_markdown', async({url}, ctx)=>{
194 |         let response = await axios({
195 |             url: 'https://api.brightdata.com/request',
196 |             method: 'POST',
197 |             data: {
198 |                 url,
199 |                 zone: unlocker_zone,
200 |                 format: 'raw',
201 |                 data_format: 'markdown',
202 |             },
203 |             headers: api_headers(ctx.clientName),
204 |             responseType: 'text',
205 |         });
206 |         return response.data;
207 |     }),
208 | });
209 | 
210 | addTool({
211 |     name: 'search_engine_batch',
212 |     description: 'Run multiple search queries simultaneously. Returns '
213 |     +'JSON for Google, Markdown for Bing/Yandex.',
214 |     parameters: z.object({
215 |         queries: z.array(z.object({
216 |             query: z.string(),
217 |             engine: z.enum(['google', 'bing', 'yandex'])
218 |                 .optional()
219 |                 .default('google'),
220 |             cursor: z.string()
221 |                 .optional(),
222 |         })).min(1).max(10),
223 |     }),
224 |     execute: tool_fn('search_engine_batch', async ({queries}, ctx)=>{
225 |         const search_promises = queries.map(({query, engine, cursor})=>{
226 |             const is_google = (engine || 'google') === 'google';
227 |             const url = is_google
228 |                 ? `${search_url(engine || 'google', query, cursor)}&brd_json=1`
229 |                 : search_url(engine || 'google', query, cursor);
230 | 
231 |             return axios({
232 |                 url: 'https://api.brightdata.com/request',
233 |                 method: 'POST',
234 |                 data: {
235 |                     url,
236 |                     zone: unlocker_zone,
237 |                     format: 'raw',
238 |                     data_format: is_google ? undefined : 'markdown',
239 |                 },
240 |                 headers: api_headers(ctx.clientName),
241 |                 responseType: 'text',
242 |             }).then(response => {
243 |                 if (is_google) {
244 |                     const search_data = JSON.parse(response.data);
245 |                     return {
246 |                         query,
247 |                         engine: engine || 'google',
248 |                         result: {
249 |                             organic: search_data.organic || [],
250 |                             images: search_data.images ? search_data.images.map(img => img.link) : [],
251 |                             current_page: search_data.pagination?.current_page || {},
252 |                             related: search_data.related || [],
253 |                             ai_overview: search_data.ai_overview || null
254 |                         }
255 |                     };
256 |                 }
257 |                 return {
258 |                     query,
259 |                     engine: engine || 'google',
260 |                     result: response.data
261 |                 };
262 |             });
263 |         });
264 | 
265 |         const results = await Promise.all(search_promises);
266 |         return JSON.stringify(results, null, 2);
267 |     }),
268 | });
269 | 
270 | addTool({
271 |    name: 'scrape_batch',
272 |    description: 'Scrape multiple webpage URLs with advanced options for '
273 |         +'content extraction and get back the results in Markdown. '
274 |         +'This tool can unlock any webpage even if it uses bot detection or '
275 |         +'CAPTCHA.',
276 |    parameters: z.object({
277 |        urls: z.array(z.string().url()).min(1).max(10).describe('Array of URLs to scrape (max 10)')
278 |    }),
279 |    execute: tool_fn('scrape_batch', async ({urls}, ctx)=>{
280 |        const scrapePromises = urls.map(url =>
281 |            axios({
282 |                url: 'https://api.brightdata.com/request',
283 |                method: 'POST',
284 |                data: {
285 |                    url,
286 |                    zone: unlocker_zone,
287 |                    format: 'raw',
288 |                    data_format: 'markdown',
289 |                },
290 |                headers: api_headers(ctx.clientName),
291 |                responseType: 'text',
292 |            }).then(response => ({
293 |                url,
294 |                content: response.data
295 |            }))
296 |        );
297 | 
298 |        const results = await Promise.all(scrapePromises);
299 |        return JSON.stringify(results, null, 2);
300 |    }),
301 | });
302 | 
303 | addTool({
304 |     name: 'scrape_as_html',
305 |     description: 'Scrape a single webpage URL with advanced options for '
306 |     +'content extraction and get back the results in HTML. '
307 |     +'This tool can unlock any webpage even if it uses bot detection or '
308 |     +'CAPTCHA.',
309 |     parameters: z.object({url: z.string().url()}),
310 |     execute: tool_fn('scrape_as_html', async({url}, ctx)=>{
311 |         let response = await axios({
312 |             url: 'https://api.brightdata.com/request',
313 |             method: 'POST',
314 |             data: {
315 |                 url,
316 |                 zone: unlocker_zone,
317 |                 format: 'raw',
318 |             },
319 |             headers: api_headers(ctx.clientName),
320 |             responseType: 'text',
321 |         });
322 |         return response.data;
323 |     }),
324 | });
325 | 
326 | addTool({
327 |     name: 'extract',
328 |     description: 'Scrape a webpage and extract structured data as JSON. '
329 |         + 'First scrapes the page as markdown, then uses AI sampling to convert '
330 |         + 'it to structured JSON format. This tool can unlock any webpage even '
331 |         + 'if it uses bot detection or CAPTCHA.',
332 |     parameters: z.object({
333 |         url: z.string().url(),
334 |         extraction_prompt: z.string().optional().describe(
335 |             'Custom prompt to guide the extraction process. If not provided, '
336 |             + 'will extract general structured data from the page.'
337 |         ),
338 |     }),
339 |     execute: tool_fn('extract', async ({ url, extraction_prompt }, ctx) => {
340 |         let scrape_response = await axios({
341 |             url: 'https://api.brightdata.com/request',
342 |             method: 'POST',
343 |             data: {
344 |                 url,
345 |                 zone: unlocker_zone,
346 |                 format: 'raw',
347 |                 data_format: 'markdown',
348 |             },
349 |             headers: api_headers(ctx.clientName),
350 |             responseType: 'text',
351 |         });
352 | 
353 |         let markdown_content = scrape_response.data;
354 | 
355 |         let system_prompt = 'You are a data extraction specialist. You MUST respond with ONLY valid JSON, no other text or formatting. '
356 |             + 'Extract the requested information from the markdown content and return it as a properly formatted JSON object. '
357 |             + 'Do not include any explanations, markdown formatting, or text outside the JSON response.';
358 | 
359 |         let user_prompt = extraction_prompt ||
360 |             'Extract the requested information from this markdown content and return ONLY a JSON object:';
361 | 
362 |         let session = server.sessions[0]; // Get the first active session
363 |         if (!session) throw new Error('No active session available for sampling');
364 | 
365 |         let sampling_response = await session.requestSampling({
366 |             messages: [
367 |                 {
368 |                     role: "user",
369 |                     content: {
370 |                         type: "text",
371 |                         text: `${user_prompt}\n\nMarkdown content:\n${markdown_content}\n\nRemember: Respond with ONLY valid JSON, no other text.`,
372 |                     },
373 |                 },
374 |             ],
375 |             systemPrompt: system_prompt,
376 |             includeContext: "thisServer",
377 |         });
378 | 
379 |         return sampling_response.content.text;
380 |     }),
381 | });
382 | 
383 | addTool({
384 |     name: 'session_stats',
385 |     description: 'Tell the user about the tool usage during this session',
386 |     parameters: z.object({}),
387 |     execute: tool_fn('session_stats', async()=>{
388 |         let used_tools = Object.entries(debug_stats.tool_calls);
389 |         let lines = ['Tool calls this session:'];
390 |         for (let [name, calls] of used_tools)
391 |             lines.push(`- ${name} tool: called ${calls} times`);
392 |         return lines.join('\n');
393 |     }),
394 | });
395 | 
396 | const datasets = [{
397 |     id: 'amazon_product',
398 |     dataset_id: 'gd_l7q7dkf244hwjntr0',
399 |     description: [
400 |         'Quickly read structured amazon product data.',
401 |         'Requires a valid product URL with /dp/ in it.',
402 |         'This can be a cache lookup, so it can be more reliable than scraping',
403 |     ].join('\n'),
404 |     inputs: ['url'],
405 | }, {
406 |     id: 'amazon_product_reviews',
407 |     dataset_id: 'gd_le8e811kzy4ggddlq',
408 |     description: [
409 |         'Quickly read structured amazon product review data.',
410 |         'Requires a valid product URL with /dp/ in it.',
411 |         'This can be a cache lookup, so it can be more reliable than scraping',
412 |     ].join('\n'),
413 |     inputs: ['url'],
414 | }, {
415 |     id: 'amazon_product_search',
416 |     dataset_id: 'gd_lwdb4vjm1ehb499uxs',
417 |     description: [
418 |         'Quickly read structured amazon product search data.',
419 |         'Requires a valid search keyword and amazon domain URL.',
420 |         'This can be a cache lookup, so it can be more reliable than scraping',
421 |     ].join('\n'),
422 |     inputs: ['keyword', 'url'],
423 |     fixed_values: {pages_to_search: '1'}, 
424 | }, {
425 |     id: 'walmart_product',
426 |     dataset_id: 'gd_l95fol7l1ru6rlo116',
427 |     description: [
428 |         'Quickly read structured walmart product data.',
429 |         'Requires a valid product URL with /ip/ in it.',
430 |         'This can be a cache lookup, so it can be more reliable than scraping',
431 |     ].join('\n'),
432 |     inputs: ['url'],
433 | }, {
434 |     id: 'walmart_seller',
435 |     dataset_id: 'gd_m7ke48w81ocyu4hhz0',
436 |     description: [
437 |         'Quickly read structured walmart seller data.',
438 |         'Requires a valid walmart seller URL.',
439 |         'This can be a cache lookup, so it can be more reliable than scraping',
440 |     ].join('\n'),
441 |     inputs: ['url'],
442 | }, {
443 |     id: 'ebay_product',
444 |     dataset_id: 'gd_ltr9mjt81n0zzdk1fb',
445 |     description: [
446 |         'Quickly read structured ebay product data.',
447 |         'Requires a valid ebay product URL.',
448 |         'This can be a cache lookup, so it can be more reliable than scraping',
449 |     ].join('\n'),
450 |     inputs: ['url'],
451 | }, {
452 |     id: 'homedepot_products',
453 |     dataset_id: 'gd_lmusivh019i7g97q2n',
454 |     description: [
455 |         'Quickly read structured homedepot product data.',
456 |         'Requires a valid homedepot product URL.',
457 |         'This can be a cache lookup, so it can be more reliable than scraping',
458 |     ].join('\n'),
459 |     inputs: ['url'],
460 | }, {
461 |     id: 'zara_products',
462 |     dataset_id: 'gd_lct4vafw1tgx27d4o0',
463 |     description: [
464 |         'Quickly read structured zara product data.',
465 |         'Requires a valid zara product URL.',
466 |         'This can be a cache lookup, so it can be more reliable than scraping',
467 |     ].join('\n'),
468 |     inputs: ['url'],
469 | }, {
470 |     id: 'etsy_products',
471 |     dataset_id: 'gd_ltppk0jdv1jqz25mz',
472 |     description: [
473 |         'Quickly read structured etsy product data.',
474 |         'Requires a valid etsy product URL.',
475 |         'This can be a cache lookup, so it can be more reliable than scraping',
476 |     ].join('\n'),
477 |     inputs: ['url'],
478 | }, {
479 |     id: 'bestbuy_products',
480 |     dataset_id: 'gd_ltre1jqe1jfr7cccf',
481 |     description: [
482 |         'Quickly read structured bestbuy product data.',
483 |         'Requires a valid bestbuy product URL.',
484 |         'This can be a cache lookup, so it can be more reliable than scraping',
485 |     ].join('\n'),
486 |     inputs: ['url'],
487 | }, {
488 |     id: 'linkedin_person_profile',
489 |     dataset_id: 'gd_l1viktl72bvl7bjuj0',
490 |     description: [
491 |         'Quickly read structured linkedin people profile data.',
492 |         'This can be a cache lookup, so it can be more reliable than scraping',
493 |     ].join('\n'),
494 |     inputs: ['url'],
495 | }, {
496 |     id: 'linkedin_company_profile',
497 |     dataset_id: 'gd_l1vikfnt1wgvvqz95w',
498 |     description: [
499 |         'Quickly read structured linkedin company profile data',
500 |         'This can be a cache lookup, so it can be more reliable than scraping',
501 |     ].join('\n'),
502 |     inputs: ['url'],
503 | }, {
504 |     id: 'linkedin_job_listings',
505 |     dataset_id: 'gd_lpfll7v5hcqtkxl6l',
506 |     description: [
507 |         'Quickly read structured linkedin job listings data',
508 |         'This can be a cache lookup, so it can be more reliable than scraping',
509 |     ].join('\n'),
510 |     inputs: ['url'],
511 | }, {
512 |     id: 'linkedin_posts',
513 |     dataset_id: 'gd_lyy3tktm25m4avu764',
514 |     description: [
515 |         'Quickly read structured linkedin posts data',
516 |         'This can be a cache lookup, so it can be more reliable than scraping',
517 |     ].join('\n'),
518 |     inputs: ['url'],
519 | }, {
520 |     id: 'linkedin_people_search',
521 |     dataset_id: 'gd_m8d03he47z8nwb5xc',
522 |     description: [
523 |         'Quickly read structured linkedin people search data',
524 |         'This can be a cache lookup, so it can be more reliable than scraping',
525 |     ].join('\n'),
526 |     inputs: ['url', 'first_name', 'last_name'],
527 | }, {
528 |     id: 'crunchbase_company',
529 |     dataset_id: 'gd_l1vijqt9jfj7olije',
530 |     description: [
531 |         'Quickly read structured crunchbase company data',
532 |         'This can be a cache lookup, so it can be more reliable than scraping',
533 |     ].join('\n'),
534 |     inputs: ['url'],
535 | },
536 | {
537 |     id: 'zoominfo_company_profile',
538 |     dataset_id: 'gd_m0ci4a4ivx3j5l6nx',
539 |     description: [
540 |         'Quickly read structured ZoomInfo company profile data.',
541 |         'Requires a valid ZoomInfo company URL.',
542 |         'This can be a cache lookup, so it can be more reliable than scraping',
543 |     ].join('\n'),
544 |     inputs: ['url'],
545 | },
546 | {
547 |     id: 'instagram_profiles',
548 |     dataset_id: 'gd_l1vikfch901nx3by4',
549 |     description: [
550 |         'Quickly read structured Instagram profile data.',
551 |         'Requires a valid Instagram URL.',
552 |         'This can be a cache lookup, so it can be more reliable than scraping',
553 |     ].join('\n'),
554 |     inputs: ['url'],
555 | },
556 | {
557 |     id: 'instagram_posts',
558 |     dataset_id: 'gd_lk5ns7kz21pck8jpis',
559 |     description: [
560 |         'Quickly read structured Instagram post data.',
561 |         'Requires a valid Instagram URL.',
562 |         'This can be a cache lookup, so it can be more reliable than scraping',
563 |     ].join('\n'),
564 |     inputs: ['url'],
565 | },
566 | {
567 |     id: 'instagram_reels',
568 |     dataset_id: 'gd_lyclm20il4r5helnj',
569 |     description: [
570 |         'Quickly read structured Instagram reel data.',
571 |         'Requires a valid Instagram URL.',
572 |         'This can be a cache lookup, so it can be more reliable than scraping',
573 |     ].join('\n'),
574 |     inputs: ['url'],
575 | },
576 | {
577 |     id: 'instagram_comments',
578 |     dataset_id: 'gd_ltppn085pokosxh13',
579 |     description: [
580 |         'Quickly read structured Instagram comments data.',
581 |         'Requires a valid Instagram URL.',
582 |         'This can be a cache lookup, so it can be more reliable than scraping',
583 |     ].join('\n'),
584 |     inputs: ['url'],
585 | },
586 | {
587 |     id: 'facebook_posts',
588 |     dataset_id: 'gd_lyclm1571iy3mv57zw',
589 |     description: [
590 |         'Quickly read structured Facebook post data.',
591 |         'Requires a valid Facebook post URL.',
592 |         'This can be a cache lookup, so it can be more reliable than scraping',
593 |     ].join('\n'),
594 |     inputs: ['url'],
595 | },
596 | {
597 |     id: 'facebook_marketplace_listings',
598 |     dataset_id: 'gd_lvt9iwuh6fbcwmx1a',
599 |     description: [
600 |         'Quickly read structured Facebook marketplace listing data.',
601 |         'Requires a valid Facebook marketplace listing URL.',
602 |         'This can be a cache lookup, so it can be more reliable than scraping',
603 |     ].join('\n'),
604 |     inputs: ['url'],
605 | },
606 | {
607 |     id: 'facebook_company_reviews',
608 |     dataset_id: 'gd_m0dtqpiu1mbcyc2g86',
609 |     description: [
610 |         'Quickly read structured Facebook company reviews data.',
611 |         'Requires a valid Facebook company URL and number of reviews.',
612 |         'This can be a cache lookup, so it can be more reliable than scraping',
613 |     ].join('\n'),
614 |     inputs: ['url', 'num_of_reviews'],
615 | }, {
616 |     id: 'facebook_events',
617 |     dataset_id: 'gd_m14sd0to1jz48ppm51',
618 |     description: [
619 |         'Quickly read structured Facebook events data.',
620 |         'Requires a valid Facebook event URL.',
621 |         'This can be a cache lookup, so it can be more reliable than scraping',
622 |     ].join('\n'),
623 |     inputs: ['url'],
624 | }, {
625 |     id: 'tiktok_profiles',
626 |     dataset_id: 'gd_l1villgoiiidt09ci',
627 |     description: [
628 |         'Quickly read structured Tiktok profiles data.',
629 |         'Requires a valid Tiktok profile URL.',
630 |         'This can be a cache lookup, so it can be more reliable than scraping',
631 |     ].join('\n'),
632 |     inputs: ['url'],
633 | }, {
634 |     id: 'tiktok_posts',
635 |     dataset_id: 'gd_lu702nij2f790tmv9h',
636 |     description: [
637 |         'Quickly read structured Tiktok post data.',
638 |         'Requires a valid Tiktok post URL.',
639 |         'This can be a cache lookup, so it can be more reliable than scraping',
640 |     ].join('\n'),
641 |     inputs: ['url'],
642 | }, {
643 |     id: 'tiktok_shop',
644 |     dataset_id: 'gd_m45m1u911dsa4274pi',
645 |     description: [
646 |         'Quickly read structured Tiktok shop data.',
647 |         'Requires a valid Tiktok shop product URL.',
648 |         'This can be a cache lookup, so it can be more reliable than scraping',
649 |     ].join('\n'),
650 |     inputs: ['url'],
651 | }, {
652 |     id: 'tiktok_comments',
653 |     dataset_id: 'gd_lkf2st302ap89utw5k',
654 |     description: [
655 |         'Quickly read structured Tiktok comments data.',
656 |         'Requires a valid Tiktok video URL.',
657 |         'This can be a cache lookup, so it can be more reliable than scraping',
658 |     ].join('\n'),
659 |     inputs: ['url'],
660 | }, {
661 |     id: 'google_maps_reviews',
662 |     dataset_id: 'gd_luzfs1dn2oa0teb81',
663 |     description: [
664 |         'Quickly read structured Google maps reviews data.',
665 |         'Requires a valid Google maps URL.',
666 |         'This can be a cache lookup, so it can be more reliable than scraping',
667 |     ].join('\n'),
668 |     inputs: ['url', 'days_limit'],
669 |     defaults: {days_limit: '3'},
670 | }, {
671 |     id: 'google_shopping',
672 |     dataset_id: 'gd_ltppk50q18kdw67omz',
673 |     description: [
674 |         'Quickly read structured Google shopping data.',
675 |         'Requires a valid Google shopping product URL.',
676 |         'This can be a cache lookup, so it can be more reliable than scraping',
677 |     ].join('\n'),
678 |     inputs: ['url'],
679 | }, {
680 |     id: 'google_play_store',
681 |     dataset_id: 'gd_lsk382l8xei8vzm4u',
682 |     description: [
683 |         'Quickly read structured Google play store data.',
684 |         'Requires a valid Google play store app URL.',
685 |         'This can be a cache lookup, so it can be more reliable than scraping',
686 |     ].join('\n'),
687 |     inputs: ['url'],
688 | }, {
689 |     id: 'apple_app_store',
690 |     dataset_id: 'gd_lsk9ki3u2iishmwrui',
691 |     description: [
692 |         'Quickly read structured apple app store data.',
693 |         'Requires a valid apple app store app URL.',
694 |         'This can be a cache lookup, so it can be more reliable than scraping',
695 |     ].join('\n'),
696 |     inputs: ['url'],
697 | }, {
698 |     id: 'reuter_news',
699 |     dataset_id: 'gd_lyptx9h74wtlvpnfu',
700 |     description: [
701 |         'Quickly read structured Reuters news data.',
702 |         'Requires a valid Reuters news report URL.',
703 |         'This can be a cache lookup, so it can be more reliable than scraping',
704 |     ].join('\n'),
705 |     inputs: ['url'],
706 | }, {
707 |     id: 'github_repository_file',
708 |     dataset_id: 'gd_lyrexgxc24b3d4imjt',
709 |     description: [
710 |         'Quickly read structured github repository data.',
711 |         'Requires a valid github repository file URL.',
712 |         'This can be a cache lookup, so it can be more reliable than scraping',
713 |     ].join('\n'),
714 |     inputs: ['url'],
715 | }, {
716 |     id: 'yahoo_finance_business',
717 |     dataset_id: 'gd_lmrpz3vxmz972ghd7',
718 |     description: [
719 |         'Quickly read structured yahoo finance business data.',
720 |         'Requires a valid yahoo finance business URL.',
721 |         'This can be a cache lookup, so it can be more reliable than scraping',
722 |     ].join('\n'),
723 |     inputs: ['url'],
724 | },
725 | {
726 |     id: 'x_posts',
727 |     dataset_id: 'gd_lwxkxvnf1cynvib9co',
728 |     description: [
729 |         'Quickly read structured X post data.',
730 |         'Requires a valid X post URL.',
731 |         'This can be a cache lookup, so it can be more reliable than scraping',
732 |     ].join('\n'),
733 |     inputs: ['url'],
734 | },
735 | {
736 |     id: 'zillow_properties_listing',
737 |     dataset_id: 'gd_lfqkr8wm13ixtbd8f5',
738 |     description: [
739 |         'Quickly read structured zillow properties listing data.',
740 |         'Requires a valid zillow properties listing URL.',
741 |         'This can be a cache lookup, so it can be more reliable than scraping',
742 |     ].join('\n'),
743 |     inputs: ['url'],
744 | },
745 | {
746 |     id: 'booking_hotel_listings',
747 |     dataset_id: 'gd_m5mbdl081229ln6t4a',
748 |     description: [
749 |         'Quickly read structured booking hotel listings data.',
750 |         'Requires a valid booking hotel listing URL.',
751 |         'This can be a cache lookup, so it can be more reliable than scraping',
752 |     ].join('\n'),
753 |     inputs: ['url'],
754 | }, {
755 |     id: 'youtube_profiles',
756 |     dataset_id: 'gd_lk538t2k2p1k3oos71',
757 |     description: [
758 |         'Quickly read structured youtube profiles data.',
759 |         'Requires a valid youtube profile URL.',
760 |         'This can be a cache lookup, so it can be more reliable than scraping',
761 |     ].join('\n'),
762 |     inputs: ['url'],
763 | }, {
764 |     id: 'youtube_comments',
765 |     dataset_id: 'gd_lk9q0ew71spt1mxywf',
766 |     description: [
767 |         'Quickly read structured youtube comments data.',
768 |         'Requires a valid youtube video URL.',
769 |         'This can be a cache lookup, so it can be more reliable than scraping',
770 |     ].join('\n'),
771 |     inputs: ['url', 'num_of_comments'],
772 |     defaults: {num_of_comments: '10'},
773 | }, {
774 |     id: 'reddit_posts',
775 |     dataset_id: 'gd_lvz8ah06191smkebj4',
776 |     description: [
777 |         'Quickly read structured reddit posts data.',
778 |         'Requires a valid reddit post URL.',
779 |         'This can be a cache lookup, so it can be more reliable than scraping',
780 |     ].join('\n'),
781 |     inputs: ['url'],
782 | },
783 | {
784 |     id: 'youtube_videos',
785 |     dataset_id: 'gd_lk56epmy2i5g7lzu0k',
786 |     description: [
787 |         'Quickly read structured YouTube videos data.',
788 |         'Requires a valid YouTube video URL.',
789 |         'This can be a cache lookup, so it can be more reliable than scraping',
790 |     ].join('\n'),
791 |     inputs: ['url'],
792 | }];
793 | for (let {dataset_id, id, description, inputs, defaults = {}, fixed_values = {}} of datasets)
794 | {
795 |     let parameters = {};
796 |     for (let input of inputs)
797 |     {
798 |         let param_schema = input=='url' ? z.string().url() : z.string();
799 |         parameters[input] = defaults[input] !== undefined ?
800 |             param_schema.default(defaults[input]) : param_schema;
801 |     }
802 |     addTool({
803 |         name: `web_data_${id}`,
804 |         description,
805 |         parameters: z.object(parameters),
806 |         execute: tool_fn(`web_data_${id}`, async(data, ctx)=>{
807 |             data = {...data, ...fixed_values};
808 |             let trigger_response = await axios({
809 |                 url: 'https://api.brightdata.com/datasets/v3/trigger',
810 |                 params: {dataset_id, include_errors: true},
811 |                 method: 'POST',
812 |                 data: [data],
813 |                 headers: api_headers(ctx.clientName),
814 |             });
815 |             if (!trigger_response.data?.snapshot_id)
816 |                 throw new Error('No snapshot ID returned from request');
817 |             let snapshot_id = trigger_response.data.snapshot_id;
818 |             console.error(`[web_data_${id}] triggered collection with `
819 |                 +`snapshot ID: ${snapshot_id}`);
820 |             let max_attempts = 600;
821 |             let attempts = 0;
822 |             while (attempts < max_attempts)
823 |             {
824 |                 try {
825 |                     if (ctx && ctx.reportProgress)
826 |                     {
827 |                         await ctx.reportProgress({
828 |                             progress: attempts,
829 |                             total: max_attempts,
830 |                             message: `Polling for data (attempt `
831 |                                 +`${attempts + 1}/${max_attempts})`,
832 |                         });
833 |                     }
834 |                     let snapshot_response = await axios({
835 |                         url: `https://api.brightdata.com/datasets/v3`
836 |                             +`/snapshot/${snapshot_id}`,
837 |                         params: {format: 'json'},
838 |                         method: 'GET',
839 |                         headers: api_headers(ctx.clientName),
840 |                     });
841 |                     if (['running', 'building'].includes(snapshot_response.data?.status))
842 |                     {
843 |                         console.error(`[web_data_${id}] snapshot not ready, `
844 |                             +`polling again (attempt `
845 |                             +`${attempts + 1}/${max_attempts})`);
846 |                         attempts++;
847 |                         await new Promise(resolve=>setTimeout(resolve, 1000));
848 |                         continue;
849 |                     }
850 |                     console.error(`[web_data_${id}] snapshot data received `
851 |                         +`after ${attempts + 1} attempts`);
852 |                     let result_data = JSON.stringify(snapshot_response.data);
853 |                     return result_data;
854 |                 } catch(e){
855 |                     console.error(`[web_data_${id}] polling error: `
856 |                         +`${e.message}`);
857 |                     if (e.response?.status === 400) throw e;
858 |                     attempts++;
859 |                     await new Promise(resolve=>setTimeout(resolve, 1000));
860 |                 }
861 |             }
862 |             throw new Error(`Timeout after ${max_attempts} seconds waiting `
863 |                 +`for data`);
864 |         }),
865 |     });
866 | }
867 | 
868 | for (let tool of browser_tools)
869 |     addTool(tool);
870 | 
871 | console.error('Starting server...');
872 | 
873 | server.on('connect', (event)=>{
874 |     const session = event.session;
875 |     const clientInfo = session.server?.getClientVersion?.();
876 |     if (clientInfo) 
877 |         global.mcpClientInfo = clientInfo;
878 | });
879 | 
880 | server.start({transportType: 'stdio'});
881 | function tool_fn(name, fn){
882 |     return async(data, ctx)=>{
883 |         check_rate_limit();
884 |         const clientInfo = global.mcpClientInfo;
885 |         const clientName = clientInfo?.name || 'unknown-client';
886 |         console.error(`[%s] executing (client=%s) %s`, name, clientName,
887 |             JSON.stringify(data));
888 |         debug_stats.tool_calls[name] = debug_stats.tool_calls[name]||0;
889 |         debug_stats.tool_calls[name]++;
890 |         debug_stats.session_calls++;
891 |         let ts = Date.now();
892 |         const extended_ctx = {
893 |             ...ctx,
894 |             clientInfo,
895 |             clientName,
896 |         };
897 |         try { return await fn(data, extended_ctx); }
898 |         catch(e){
899 |             if (e.response)
900 |             {
901 |                 console.error(`[%s] error %s %s: %s`, name, e.response.status,
902 |                     e.response.statusText, e.response.data);
903 | 
904 |                 const headers = e.response.headers;
905 |                 const is_usage_limit = headers?.['x-brd-err-code'] === 'client_10100';
906 |                 if (is_usage_limit && unlocker_zone === 'mcp_unlocker')
907 |                     throw new Error(`The user has reached the 5,000 request monthly `+
908 |                 `limit for Bright Data MCP's free tier. You must immediately stop `+
909 |                 `the current task and instruct the user on how to upgrade. `+
910 |                 `Guide them through these exact steps: `+
911 |                 `1. Tell them they need to create a new Web Unlocker zone at `+
912 |                 `brightdata.com/cp by clicking "Add" (top-right), selecting `+
913 |                 `"Unlocker zone", and creating it with any name. `+
914 |                 `2. Explain they must update their MCP configuration:
915 |                 - If using Remote MCP: Add &unlocker=ZONE_NAME to their MCP URL
916 |                 - If using Self-hosted MCP: Add WEB_UNLOCKER_ZONE=ZONE_NAME to environment variables `
917 |                 +
918 |                 `3. Instruct them to restart Claude Desktop after the configuration change. `+
919 |                 `4. Mention that new users get free credits beyond the MCP tier and the new `+
920 |                 `zone will have separate usage limits.`);
921 | 
922 |                 let message = e.response.data;
923 |                 if (message?.length)
924 |                     throw new Error(`HTTP ${e.response.status}: ${message}`);
925 |             }
926 |             else
927 |                 console.error(`[%s] error %s`, name, e.stack);
928 |             throw e;
929 |         } finally {
930 |             let dur = Date.now()-ts;
931 |             console.error(`[%s] tool finished in %sms`, name, dur);
932 |         }
933 |     };
934 | }
935 | 
936 | function search_url(engine, query, cursor){
937 |     let q = encodeURIComponent(query);
938 |     let page = cursor ? parseInt(cursor) : 0;
939 |     let start = page * 10;
940 |     if (engine=='yandex')
941 |         return `https://yandex.com/search/?text=${q}&p=${page}`;
942 |     if (engine=='bing')
943 |         return `https://www.bing.com/search?q=${q}&first=${start + 1}`;
944 |     return `https://www.google.com/search?q=${q}&start=${start}`;
945 | }
946 | 
```
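
Every scraping tool in server.js ultimately issues the same POST to Bright Data's `https://api.brightdata.com/request` endpoint, varying only the `zone` and `data_format` fields. As a rough standalone sketch (assuming `API_TOKEN` is set and the Web Unlocker zone already exists, as `ensure_required_zones` arranges above), the same call can be reproduced outside the MCP server like this:

```javascript
// Minimal sketch of the /request call shared by the scrape_* tools above.
// Assumes API_TOKEN is set and the unlocker zone already exists.
import axios from 'axios';

const response = await axios({
    url: 'https://api.brightdata.com/request',
    method: 'POST',
    data: {
        url: 'https://example.com',
        zone: process.env.WEB_UNLOCKER_ZONE || 'mcp_unlocker',
        format: 'raw',
        data_format: 'markdown', // omit for raw HTML, as scrape_as_html does
    },
    headers: {authorization: `Bearer ${process.env.API_TOKEN}`},
    responseType: 'text',
});
console.log(response.data); // page content rendered as Markdown
```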