# Directory Structure
```
├── .github
│   └── workflows
│       ├── lint_and_test.yml
│       └── publish_to_pypi.yml
├── .gitignore
├── CHANGELOG.md
├── Dockerfile
├── LICENSE
├── Makefile
├── pyproject.toml
├── README.md
├── server.json
├── smithery.yaml
├── src
│   └── oxylabs_mcp
│       ├── __init__.py
│       ├── config.py
│       ├── exceptions.py
│       ├── tools
│       │   ├── __init__.py
│       │   ├── ai_studio.py
│       │   ├── misc.py
│       │   └── scraper.py
│       ├── url_params.py
│       └── utils.py
├── tests
│   ├── __init__.py
│   ├── conftest.py
│   ├── e2e
│   │   ├── __init__.py
│   │   ├── conftest.py
│   │   ├── example.env
│   │   ├── test_call_tools.py
│   │   └── test_llm_agent.py
│   ├── integration
│   │   ├── __init__.py
│   │   ├── params.py
│   │   ├── test_ai_studio_tools.py
│   │   ├── test_scraper_tools.py
│   │   └── test_server.py
│   ├── unit
│   │   ├── __init__.py
│   │   ├── fixtures
│   │   │   ├── __init__.py
│   │   │   ├── after_strip.html
│   │   │   ├── before_strip.html
│   │   │   └── with_links.html
│   │   └── test_utils.py
│   └── utils.py
└── uv.lock
```
# Files
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | share/python-wheels/
24 | *.egg-info/
25 | .installed.cfg
26 | *.egg
27 | MANIFEST
28 |
29 | # PyInstaller
30 | # Usually these files are written by a python script from a template
31 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
32 | *.manifest
33 | *.spec
34 |
35 | # Installer logs
36 | pip-log.txt
37 | pip-delete-this-directory.txt
38 |
39 | # Unit test / coverage reports
40 | htmlcov/
41 | .tox/
42 | .nox/
43 | .coverage
44 | .coverage.*
45 | .cache
46 | nosetests.xml
47 | coverage.xml
48 | *.cover
49 | *.py,cover
50 | .hypothesis/
51 | .pytest_cache/
52 | cover/
53 |
54 | # Translations
55 | *.mo
56 | *.pot
57 |
58 | # PyBuilder
59 | .pybuilder/
60 | target/
61 |
62 | # Jupyter Notebook
63 | .ipynb_checkpoints
64 |
65 | # IPython
66 | profile_default/
67 | ipython_config.py
68 |
69 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
70 | __pypackages__/
71 |
72 | # Environments
73 | .env
74 | tests/e2e/.env
75 | .venv
76 | env/
77 | venv/
78 | ENV/
79 | env.bak/
80 | venv.bak/
81 | .envrc
82 |
83 | # Rope project settings
84 | .ropeproject
85 |
86 | # mypy
87 | .mypy_cache/
88 | .dmypy.json
89 | dmypy.json
90 |
91 | # pytype static type analyzer
92 | .pytype/
93 |
94 | # Cython debug symbols
95 | cython_debug/
96 |
97 | # PyCharm
98 | .idea/
99 |
100 | # Ruff stuff:
101 | .ruff_cache/
102 |
103 | # PyPI configuration file
104 | .pypirc
105 |
106 | deployment/charts/
107 |
108 | .mcpregistry_*
109 |
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
1 | <p align="center">
2 | <img src="https://storage.googleapis.com/oxylabs-public-assets/oxylabs_mcp.svg" alt="Oxylabs + MCP">
3 | </p>
4 | <h1 align="center" style="border-bottom: none;">
5 | Oxylabs MCP Server
6 | </h1>
7 |
8 | <p align="center">
9 | <em>The missing link between AI models and the real‑world web: one API that delivers clean, structured data from any site.</em>
10 | </p>
11 |
12 | <div align="center">
13 |
14 | [](https://smithery.ai/server/@oxylabs/oxylabs-mcp)
15 | [](https://pypi.org/project/oxylabs-mcp/)
16 | [](https://discord.gg/Pds3gBmKMH)
17 | [](LICENSE)
18 | [](https://mseep.ai/app/f6a9c0bc-83a6-4f78-89d9-f2cec4ece98d)
19 | 
20 |
21 | <br/>
22 | <a href="https://glama.ai/mcp/servers/@oxylabs/oxylabs-mcp">
23 | <img width="380" height="200" src="https://glama.ai/mcp/servers/@oxylabs/oxylabs-mcp/badge" alt="Oxylabs Server MCP server" />
24 | </a>
25 |
26 | </div>
27 |
28 | ---
29 |
30 | ## 📖 Overview
31 |
32 | The Oxylabs MCP server provides a bridge between AI models and the web. It enables them to scrape any URL, render JavaScript-heavy pages, extract and format content for AI use, bypass anti-scraping measures, and access geo-restricted web data from 195+ countries.
33 |
34 |
35 | ## 🛠️ MCP Tools
36 |
37 | Oxylabs MCP provides two sets of tools that can be used together or independently:
38 |
39 | ### Oxylabs Web Scraper API Tools
40 | 1. **universal_scraper**: Uses Oxylabs Web Scraper API for general website scraping;
41 | 2. **google_search_scraper**: Uses Oxylabs Web Scraper API to extract results from Google Search;
42 | 3. **amazon_search_scraper**: Uses Oxylabs Web Scraper API to scrape Amazon search result pages;
43 | 4. **amazon_product_scraper**: Uses Oxylabs Web Scraper API to extract data from individual Amazon product pages.
44 |
45 | ### Oxylabs AI Studio Tools
46 |
47 | 5. **ai_scraper**: Scrape content from any URL in JSON or Markdown format with AI-powered data extraction;
48 | 6. **ai_crawler**: Based on a prompt, crawls a website and collects data in Markdown or JSON format across multiple pages;
49 | 7. **ai_browser_agent**: Based on a prompt, controls a browser and returns data in Markdown, JSON, HTML, or screenshot formats;
50 | 8. **ai_search**: Search the web for URLs and their contents with AI-powered content extraction.
51 |
52 |
53 | ## ✅ Prerequisites
54 |
55 | Before you begin, make sure you have **at least one** of the following:
56 |
57 | - **Oxylabs Web Scraper API Account**: Obtain your username and password from [Oxylabs](https://dashboard.oxylabs.io/) (1-week free trial available);
58 | - **Oxylabs AI Studio API Key**: Obtain your API key from [Oxylabs AI Studio](https://aistudio.oxylabs.io/settings/api-key). (1000 credits free).
59 |
60 | ## 📦 Configuration
61 |
62 | ### Environment variables
63 |
64 | Oxylabs MCP server supports the following environment variables:
65 | | Name | Description | Default |
66 | |----------------------------|-----------------------------------------------|---------|
67 | | `OXYLABS_USERNAME` | Your Oxylabs Web Scraper API username | |
68 | | `OXYLABS_PASSWORD` | Your Oxylabs Web Scraper API password | |
69 | | `OXYLABS_AI_STUDIO_API_KEY`| Your Oxylabs AI Studio API key | |
70 | | `LOG_LEVEL` | Log level for the logs returned to the client | `INFO` |
71 |
72 | Based on the provided credentials, the server will automatically expose the corresponding tools:
73 | - If only `OXYLABS_USERNAME` and `OXYLABS_PASSWORD` are provided, the server will expose the Web Scraper API tools;
74 | - If only `OXYLABS_AI_STUDIO_API_KEY` is provided, the server will expose the AI Studio tools;
75 | - If both the Web Scraper API credentials and `OXYLABS_AI_STUDIO_API_KEY` are provided, the server will expose all tools.
76 |
77 | ❗❗❗ **Important note: if you don't have Web Scraper API or Oxylabs AI Studio credentials, delete the corresponding environment variable placeholders.
78 | Leaving placeholder values in place will expose tools that do not work.**
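For example, a Web Scraper API-only setup keeps just the username and password entries (the AI Studio key placeholder is removed):

    {
      "mcpServers": {
        "oxylabs": {
          "command": "uvx",
          "args": ["oxylabs-mcp"],
          "env": {
            "OXYLABS_USERNAME": "OXYLABS_USERNAME",
            "OXYLABS_PASSWORD": "OXYLABS_PASSWORD"
          }
        }
      }
    }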
79 |
80 |
81 |
82 | ### Configure with uvx
83 |
84 | - Install uv (the package manager that provides the `uvx` command):
85 | ```bash
86 | # macOS and Linux
87 | curl -LsSf https://astral.sh/uv/install.sh | sh
88 | ```
89 | OR:
90 | ```bash
91 | # Windows
92 | powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
93 | ```
94 | - Use the following config:
95 | ```json
96 | {
97 | "mcpServers": {
98 | "oxylabs": {
99 | "command": "uvx",
100 | "args": ["oxylabs-mcp"],
101 | "env": {
102 | "OXYLABS_USERNAME": "OXYLABS_USERNAME",
103 | "OXYLABS_PASSWORD": "OXYLABS_PASSWORD",
104 | "OXYLABS_AI_STUDIO_API_KEY": "OXYLABS_AI_STUDIO_API_KEY"
105 | }
106 | }
107 | }
108 | }
109 | ```
110 |
111 | ### Configure with uv
112 |
113 | - Install the uv package manager:
114 | ```bash
115 | # macOS and Linux
116 | curl -LsSf https://astral.sh/uv/install.sh | sh
117 | ```
118 | OR:
119 | ```bash
120 | # Windows
121 | powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
122 | ```
123 |
124 | - Use the following config:
125 | ```json
126 | {
127 | "mcpServers": {
128 | "oxylabs": {
129 | "command": "uv",
130 | "args": [
131 | "--directory",
132 | "/<Absolute-path-to-folder>/oxylabs-mcp",
133 | "run",
134 | "oxylabs-mcp"
135 | ],
136 | "env": {
137 | "OXYLABS_USERNAME": "OXYLABS_USERNAME",
138 | "OXYLABS_PASSWORD": "OXYLABS_PASSWORD",
139 | "OXYLABS_AI_STUDIO_API_KEY": "OXYLABS_AI_STUDIO_API_KEY"
140 | }
141 | }
142 | }
143 | }
144 | ```
145 |
146 | ### Configure with Smithery OAuth2
147 |
148 | - Go to https://smithery.ai/server/@oxylabs/oxylabs-mcp;
149 | - Click _Auto_ to install the Oxylabs MCP configuration for the respective client;
150 | - OR use the following config:
151 | ```json
152 | {
153 | "mcpServers": {
154 | "oxylabs": {
155 | "url": "https://server.smithery.ai/@oxylabs/oxylabs-mcp/mcp"
156 | }
157 | }
158 | }
159 | ```
160 | - Follow the instructions to authenticate Oxylabs MCP with the OAuth2 flow.
161 |
162 | ### Configure with Smithery query parameters
163 |
164 | If your client does not support OAuth2 authentication, you can pass the Oxylabs authentication parameters directly in the URL:
165 | ```json
166 | {
167 | "mcpServers": {
168 | "oxylabs": {
169 | "url": "https://server.smithery.ai/@oxylabs/oxylabs-mcp/mcp?oxylabsUsername=OXYLABS_USERNAME&oxylabsPassword=OXYLABS_PASSWORD&oxylabsAiStudioApiKey=OXYLABS_AI_STUDIO_API_KEY"
170 | }
171 | }
172 | }
173 | ```
174 |
175 | ### Manual Setup with Claude Desktop
176 |
177 | Navigate to **Claude → Settings → Developer → Edit Config** and add one of the configurations above to the `claude_desktop_config.json` file.
178 |
179 | ### Manual Setup with Cursor AI
180 |
181 | Navigate to **Cursor → Settings → Cursor Settings → MCP**. Click **Add new global MCP server** and add one of the configurations above.
182 |
183 |
184 |
185 | ## 📝 Logging
186 |
187 | The server provides additional information about tool calls in `notifications/message` events:
188 |
189 | ```json
190 | {
191 | "method": "notifications/message",
192 | "params": {
193 | "level": "info",
194 | "data": "Create job with params: {\"url\": \"https://ip.oxylabs.io\"}"
195 | }
196 | }
197 | ```
198 |
199 | ```json
200 | {
201 | "method": "notifications/message",
202 | "params": {
203 | "level": "info",
204 | "data": "Job info: job_id=7333113830223918081 job_status=done"
205 | }
206 | }
207 | ```
208 |
209 | ```json
210 | {
211 | "method": "notifications/message",
212 | "params": {
213 | "level": "error",
214 | "data": "Error: request to Oxylabs API failed"
215 | }
216 | }
217 | ```
218 |
219 | ---
220 |
221 | ## 🛡️ License
222 |
223 | Distributed under the MIT License – see [LICENSE](LICENSE) for details.
224 |
225 | ---
226 |
227 | ## About Oxylabs
228 |
229 | Established in 2015, Oxylabs is a market-leading web intelligence collection
230 | platform, driven by the highest business, ethics, and compliance standards,
231 | enabling companies worldwide to unlock data-driven insights.
232 |
233 | [](https://oxylabs.io/)
234 |
235 | <div align="center">
236 | <sub>
237 | Made with ☕ by <a href="https://oxylabs.io">Oxylabs</a>. Feel free to give us a ⭐ if MCP saved you a weekend.
238 | </sub>
239 | </div>
240 |
241 |
242 | ## ✨ Key Features
243 |
244 | <details>
245 | <summary><strong> Scrape content from any site</strong></summary>
246 | <br>
247 |
248 | - Extract data from any URL, including complex single-page applications
249 | - Fully render dynamic websites using headless browser support
250 | - Choose full JavaScript rendering, HTML-only, or none
251 | - Emulate Mobile and Desktop viewports for realistic rendering
252 |
253 | </details>
254 |
255 | <details>
256 | <summary><strong> Automatically get AI-ready data</strong></summary>
257 | <br>
258 |
259 | - Automatically clean and convert HTML to Markdown for improved readability
260 | - Use automated parsers for popular targets like Google, Amazon, and more
261 |
262 | </details>
263 |
264 | <details>
265 | <summary><strong> Bypass blocks & geo-restrictions</strong></summary>
266 | <br>
267 |
268 | - Bypass sophisticated bot protection systems with high success rate
269 | - Reliably scrape even the most complex websites
270 | - Get automatically rotating IPs from a proxy pool covering 195+ countries
271 |
272 | </details>
273 |
274 | <details>
275 | <summary><strong> Flexible setup & cross-platform support</strong></summary>
276 | <br>
277 |
278 | - Set rendering and parsing options if needed
279 | - Feed data directly into AI models or analytics tools
280 | - Works on macOS, Windows, and Linux
281 |
282 | </details>
283 |
284 | <details>
285 | <summary><strong> Built-in error handling and request management</strong></summary>
286 | <br>
287 |
288 | - Comprehensive error handling and reporting
289 | - Smart rate limiting and request management
290 |
291 | </details>
292 |
293 | ---
294 |
295 |
296 | ## Why Oxylabs MCP? 🕸️ ➜ 📦 ➜ 🤖
297 |
298 | Imagine telling your LLM *"Summarise the latest Hacker News discussion about GPT‑5"* – and it simply answers.
299 | MCP (Model Context Protocol) makes that happen by doing the boring parts for you:
300 |
301 | | What Oxylabs MCP does | Why it matters to you |
302 | |-------------------------------------------------------------------|------------------------------------------|
303 | | **Bypasses anti‑bot walls** with the Oxylabs global proxy network | Keeps you unblocked and anonymous |
304 | | **Renders JavaScript** in headless Chrome | Single‑page apps, sorted |
305 | | **Cleans HTML → JSON** | Drop straight into vector DBs or prompts |
306 | | **Optional structured parsers** (Google, Amazon, etc.) | One‑line access to popular targets |
307 |
308 | mcp-name: io.github.oxylabs/oxylabs-mcp
309 |
```
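The credential-driven tool exposure described in the README's Configuration section can be modeled as a small pure function. This is a simplified sketch, not the server's actual implementation (tool names are taken from the README's tool list; the real filtering lives in `src/oxylabs_mcp/__init__.py`):

```python
SCRAPER_TOOLS = [
    "universal_scraper",
    "google_search_scraper",
    "amazon_search_scraper",
    "amazon_product_scraper",
]
AI_TOOLS = ["ai_scraper", "ai_crawler", "ai_browser_agent", "ai_search"]


def exposed_tools(env: dict[str, str]) -> list[str]:
    """Return the tool names the server would expose for the given env vars."""
    tools: list[str] = []
    # Web Scraper API tools require both username and password.
    if env.get("OXYLABS_USERNAME") and env.get("OXYLABS_PASSWORD"):
        tools += SCRAPER_TOOLS
    # AI Studio tools require only the API key.
    if env.get("OXYLABS_AI_STUDIO_API_KEY"):
        tools += AI_TOOLS
    return tools
```

With only `OXYLABS_AI_STUDIO_API_KEY` set, `exposed_tools` returns just the four AI Studio tools, matching the behavior documented above.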
--------------------------------------------------------------------------------
/src/oxylabs_mcp/tools/__init__.py:
--------------------------------------------------------------------------------
```python
1 |
```
--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
```python
1 |
```
--------------------------------------------------------------------------------
/tests/e2e/__init__.py:
--------------------------------------------------------------------------------
```python
1 |
```
--------------------------------------------------------------------------------
/tests/integration/__init__.py:
--------------------------------------------------------------------------------
```python
1 |
```
--------------------------------------------------------------------------------
/tests/unit/__init__.py:
--------------------------------------------------------------------------------
```python
1 |
```
--------------------------------------------------------------------------------
/tests/unit/fixtures/__init__.py:
--------------------------------------------------------------------------------
```python
1 |
```
--------------------------------------------------------------------------------
/tests/e2e/conftest.py:
--------------------------------------------------------------------------------
```python
1 | import dotenv
2 | import pytest
3 |
4 |
5 | dotenv.load_dotenv()
6 |
7 |
8 | @pytest.fixture(scope="session", autouse=True)
9 | def environment():
10 | pass
11 |
```
--------------------------------------------------------------------------------
/tests/unit/fixtures/after_strip.html:
--------------------------------------------------------------------------------
```html
1 | <html> <body> <div class="content"> <p>Welcome to my website</p> </div> <div class="other"> <p>Visible content</p> </div> </body>
2 | </html>
```
--------------------------------------------------------------------------------
/tests/e2e/example.env:
--------------------------------------------------------------------------------
```
1 | # Oxylabs Settings
2 | OXYLABS_USERNAME=
3 | OXYLABS_PASSWORD=
4 |
5 | # LLM Providers
6 | ANTHROPIC_API_KEY=
7 | GOOGLE_API_KEY=
8 | OPENAI_API_KEY=
9 |
10 | # Misc
11 | LOCAL_OXYLABS_MCP_DIRECTORY=
12 |
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/tools/misc.py:
--------------------------------------------------------------------------------
```python
1 | # mypy: disable-error-code=import-untyped
2 | from oxylabs_ai_studio import client
3 |
4 |
5 | def setup() -> None:
6 | """Setups the environment."""
7 | client._UA_API = "py-mcp"
8 |
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/exceptions.py:
--------------------------------------------------------------------------------
```python
1 | from fastmcp.server.dependencies import get_context
2 |
3 |
4 | class MCPServerError(Exception):
5 | """Generic MCP server exception."""
6 |
7 | async def process(self) -> str:
8 | """Process exception."""
9 | err = str(self)
10 | await get_context().error(err)
11 | return err
12 |
```
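`MCPServerError.process` reports the error back to the client, where it surfaces as a `notifications/message` event like the ones shown in the README's Logging section. A hypothetical client-side helper (the function name is illustrative; the payload shape follows the README examples) that renders such an event as a log line:

```python
def format_log_event(message: dict) -> str:
    """Render a notifications/message payload as a one-line log entry."""
    params = message.get("params", {})
    # Default to "info" when the level is absent, as in MCP logging events.
    level = str(params.get("level", "info")).upper()
    return f"[{level}] {params.get('data', '')}"
```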
--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
```markdown
1 | # Changelog
2 |
3 | ## [0.2.2] - 2025-05-23
4 |
5 | ### Fixed
6 |
7 | - Coverage badge
8 |
9 | ## [0.2.1] - 2025-05-23
10 |
11 | ### Added
12 |
13 | - More tests
14 |
15 | ### Changed
16 |
17 | - README.md
18 |
19 | ## [0.2.0] - 2025-05-13
20 |
21 | ### Added
22 |
23 | - Changelog
24 | - E2E tests
25 | - Geolocation and User Agent type parameters to universal scraper
26 |
27 | ### Changed
28 |
29 | - Descriptions for tools
30 | - Descriptions for tool parameters
31 | - Default values for tool parameters
32 |
33 | ### Removed
34 |
35 | - WebUnblocker tool
36 | - Parse parameter for universal scraper
37 |
```
--------------------------------------------------------------------------------
/tests/unit/fixtures/before_strip.html:
--------------------------------------------------------------------------------
```html
1 | <html>
2 | <body>
3 | <div class="content">
4 | <p>Welcome to my website</p>
5 | </div>
6 | <div id="footer">
7 | <p>This is the footer content.</p>
8 | </div>
9 | <div class="hidden">
10 | <p>This content is hidden.</p>
11 | </div>
12 | <div class="other">
13 | <p>Visible content</p>
14 | </div>
15 | <script>console.log('script tag');</script>
16 | <noscript>This is noscript content.</noscript>
17 | <form><input type="text"/></form>
18 | </body>
19 | </html>
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/config.py:
--------------------------------------------------------------------------------
```python
1 | from typing import Literal
2 |
3 | from dotenv import load_dotenv
4 | from pydantic_settings import BaseSettings
5 |
6 |
7 | load_dotenv()
8 |
9 |
10 | class Settings(BaseSettings):
11 | """Project settings."""
12 |
13 | OXYLABS_SCRAPER_URL: str = "https://realtime.oxylabs.io/v1/queries"
14 | OXYLABS_REQUEST_TIMEOUT_S: int = 100
15 | LOG_LEVEL: str = "INFO"
16 |
17 | MCP_TRANSPORT: Literal["stdio", "sse", "streamable-http"] = "stdio"
18 | MCP_PORT: int = 8000
19 | MCP_HOST: str = "localhost"
20 | MCP_STATELESS_HTTP: bool = False
21 |
22 | # smithery config
23 | PORT: int | None = None
24 |
25 |
26 | settings = Settings()
27 |
```
--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------
```yaml
1 | runtime: "container"
2 | build:
3 | dockerfile: "Dockerfile"
4 | dockerBuildPath: "."
5 | startCommand:
6 | type: "http"
7 | configSchema:
8 | type: "object"
9 | properties:
10 | oxylabsUsername:
11 | type: "string"
12 | description: "Oxylabs username"
13 | oxylabsPassword:
14 | type: "string"
15 | description: "Oxylabs password"
16 | oxylabsAiStudioApiKey:
17 | type: "string"
18 | description: "Oxylabs AI Studio api key"
19 | required: []
20 | exampleConfig:
21 | oxylabsUsername: "Your Oxylabs username"
22 | oxylabsPassword: "Your Oxylabs password"
23 | oxylabsAiStudioApiKey: "Your Oxylabs AI Studio api key"
24 |
```
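The `configSchema` property names above are the same ones the README passes as Smithery URL query parameters. Since credentials can contain characters that are unsafe in URLs, a client building that URL by hand should percent-encode them; a stdlib-only sketch (the helper name is illustrative):

```python
from urllib.parse import urlencode

BASE_URL = "https://server.smithery.ai/@oxylabs/oxylabs-mcp/mcp"


def smithery_url(username: str, password: str, ai_studio_api_key: str = "") -> str:
    """Build the Smithery MCP URL with percent-encoded Oxylabs credentials."""
    params = {"oxylabsUsername": username, "oxylabsPassword": password}
    if ai_studio_api_key:
        params["oxylabsAiStudioApiKey"] = ai_studio_api_key
    # urlencode percent-encodes each value, so symbols in passwords are safe.
    return f"{BASE_URL}?{urlencode(params)}"
```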
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
```dockerfile
1 | FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim
2 |
3 | ENV UV_COMPILE_BYTECODE=1
4 | ENV UV_LINK_MODE=copy
5 | ENV UV_CACHE_DIR=/opt/uv-cache/
6 |
7 | RUN apt-get update && apt-get install -y --no-install-recommends git
8 |
9 | WORKDIR /app
10 |
11 | RUN --mount=type=cache,target=/opt/uv-cache/ \
12 | --mount=type=bind,source=uv.lock,target=uv.lock \
13 | --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
14 | uv sync --frozen --no-install-project --no-dev --no-editable
15 |
16 | ADD . /app
17 |
18 | RUN --mount=type=cache,target=/opt/uv-cache/ \
19 | uv sync --frozen --no-dev --no-editable
20 |
21 | # Add virtual environment to PATH
22 | ENV PATH="/app/.venv/bin:$PATH"
23 | ENV MCP_TRANSPORT="streamable-http"
24 |
25 | ENTRYPOINT ["oxylabs-mcp"]
26 |
```
--------------------------------------------------------------------------------
/tests/utils.py:
--------------------------------------------------------------------------------
```python
1 | def convert_context_params(arguments: dict) -> dict:
2 | context_fields = ["category_id", "merchant_id", "currency", "autoselect_variant"]
3 | arguments_copy = {**arguments}
4 |
5 | for f in context_fields:
6 | if f in arguments_copy:
7 | if "context" not in arguments_copy:
8 | arguments_copy["context"] = []
9 |
10 | arguments_copy["context"].append({"key": f, "value": arguments_copy[f]})
11 | del arguments_copy[f]
12 |
13 | return arguments_copy
14 |
15 |
16 | def prepare_expected_arguments(arguments: dict) -> dict:
17 | arguments_copy = {**arguments}
18 | if "output_format" in arguments_copy:
19 | del arguments_copy["output_format"]
20 | return arguments_copy
21 |
```
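For reference, `convert_context_params` moves the Amazon-specific fields into a `context` list of key/value objects, mirroring the Web Scraper API payload shape. A self-contained usage sketch (the sample arguments are illustrative):

```python
def convert_context_params(arguments: dict) -> dict:
    # Same logic as in tests/utils.py above.
    context_fields = ["category_id", "merchant_id", "currency", "autoselect_variant"]
    arguments_copy = {**arguments}

    for f in context_fields:
        if f in arguments_copy:
            if "context" not in arguments_copy:
                arguments_copy["context"] = []
            # Each matched field becomes a {"key": ..., "value": ...} entry.
            arguments_copy["context"].append({"key": f, "value": arguments_copy[f]})
            del arguments_copy[f]

    return arguments_copy


result = convert_context_params({"query": "laptop", "currency": "USD", "merchant_id": "123"})
```

`result` keeps `query` untouched and replaces `currency` and `merchant_id` with entries in `context`, ordered as in `context_fields`.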
--------------------------------------------------------------------------------
/.github/workflows/publish_to_pypi.yml:
--------------------------------------------------------------------------------
```yaml
1 | name: Publish Python 🐍 distributions 📦 to PyPI
2 |
3 | on:
4 | push:
5 | tags:
6 | - 'v[0-9]+.[0-9]+.[0-9]+'
7 | jobs:
8 | build-n-publish:
9 | name: Build and publish Python distribution to PyPI
10 | runs-on: ubuntu-latest
11 | steps:
12 | - uses: actions/checkout@v4
13 |
14 | - name: Set up Python
15 | uses: actions/setup-python@v2
16 | with:
17 | python-version: 3.12
18 |
19 | - name: Install uv
20 | run: |
21 | pip install uv
22 |
23 | - name: Install dependencies
24 | run: |
25 | uv sync --no-dev
26 |
27 | - name: Build a dist package
28 | run: uv build
29 |
30 | - name: Publish distribution to PyPI
31 | uses: pypa/gh-action-pypi-publish@release/v1
32 | with:
33 | user: __token__
34 | password: ${{ secrets.PYPI_API_TOKEN }}
35 |
```
--------------------------------------------------------------------------------
/server.json:
--------------------------------------------------------------------------------
```json
1 | {
2 | "$schema": "https://static.modelcontextprotocol.io/schemas/2025-10-17/server.schema.json",
3 | "name": "io.github.oxylabs/oxylabs-mcp",
4 | "description": "Fetch and process content from specified URLs & sources using the Oxylabs Web Scraper API.",
5 | "repository": {
6 | "url": "https://github.com/oxylabs/oxylabs-mcp",
7 | "source": "github"
8 | },
9 | "version": "0.7.1",
10 | "packages": [
11 | {
12 | "registryType": "pypi",
13 | "identifier": "oxylabs-mcp",
14 | "version": "0.7.1",
15 | "transport": {
16 | "type": "stdio"
17 | },
18 | "environmentVariables": [
19 | {
20 | "description": "Your Oxylabs username",
21 | "isRequired": false,
22 | "format": "string",
23 | "isSecret": true,
24 | "name": "OXYLABS_USERNAME"
25 | },
26 | {
27 | "description": "Your Oxylabs password",
28 | "isRequired": false,
29 | "format": "string",
30 | "isSecret": true,
31 | "name": "OXYLABS_PASSWORD"
32 | },
33 | {
34 | "description": "Your Oxylabs AI Studio api key",
35 | "isRequired": false,
36 | "format": "string",
37 | "isSecret": true,
38 | "name": "OXYLABS_AI_STUDIO_API_KEY"
39 | }
40 | ]
41 | }
42 | ]
43 | }
```
--------------------------------------------------------------------------------
/tests/unit/fixtures/with_links.html:
--------------------------------------------------------------------------------
```html
1 | <!doctype html>
2 | <html>
3 | <head>
4 | <title>Example Domain</title>
5 |
6 | <meta charset="utf-8" />
7 | <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
8 | <meta name="viewport" content="width=device-width, initial-scale=1" />
9 | <style type="text/css">
10 | body {
11 | background-color: #f0f0f2;
12 | margin: 0;
13 | padding: 0;
14 | font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
15 |
16 | }
17 | div {
18 | width: 600px;
19 | margin: 5em auto;
20 | padding: 2em;
21 | background-color: #fdfdff;
22 | border-radius: 0.5em;
23 | box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
24 | }
25 | a:link, a:visited {
26 | color: #38488f;
27 | text-decoration: none;
28 | }
29 | @media (max-width: 700px) {
30 | div {
31 | margin: 0 auto;
32 | width: auto;
33 | }
34 | }
35 | </style>
36 | </head>
37 |
38 | <body>
39 | <div>
40 | <h1>Example Domain</h1>
41 | <p>This domain is for use in illustrative examples in documents. You may use this
42 | domain in literature without prior coordination or asking for permission.</p>
43 | <p><a href="https://www.iana.org/domains/example">More information...</a></p>
44 | <p><a href="https://example.com">Another link</a></p>
45 | </div>
46 | </body>
47 | </html>
48 |
```
--------------------------------------------------------------------------------
/tests/e2e/test_call_tools.py:
--------------------------------------------------------------------------------
```python
1 | import os
2 | from contextlib import asynccontextmanager
3 |
4 | import pytest
5 | from mcp import ClientSession, StdioServerParameters
6 | from mcp.client.stdio import stdio_client
7 |
8 |
9 | @asynccontextmanager
10 | async def get_oxylabs_mcp_client():
11 | server_params = StdioServerParameters(
12 | command="uv", # Using uv to run the server
13 | args=["run", "oxylabs-mcp"],
14 | env={
15 | "OXYLABS_USERNAME": os.getenv("OXYLABS_USERNAME"),
16 | "OXYLABS_PASSWORD": os.getenv("OXYLABS_PASSWORD"),
17 | },
18 | cwd=os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))),
19 | )
20 |
21 | async with stdio_client(server_params) as (read, write):
22 | async with ClientSession(read, write) as session:
23 | await session.initialize()
24 | yield session
25 |
26 |
27 | @pytest.mark.asyncio
28 | @pytest.mark.parametrize(
29 | ("url", "min_response_len"),
30 | [
31 | (
32 | "https://maisonpur.com/best-non-toxic-cutting-boards-safer-options-for-a-healthy-kitchen/",
33 | 10000,
34 | ),
35 | ("https://sandbox.oxylabs.io/products/1", 2500),
36 | ("https://sandbox.oxylabs.io/products/5", 3000),
37 | ],
38 | )
39 | async def test_universal_scraper_tool(url: str, min_response_len: int):
40 | async with get_oxylabs_mcp_client() as session:
41 | result = await session.call_tool("universal_scraper", arguments={"url": url})
42 | assert len(result.content[0].text) > min_response_len
43 |
```
--------------------------------------------------------------------------------
/.github/workflows/lint_and_test.yml:
--------------------------------------------------------------------------------
```yaml
1 | name: Lint & Test
2 |
3 | on:
4 | push:
5 | branches: [ "main" ]
6 | pull_request:
7 | branches: [ "main" ]
8 |
9 | permissions:
10 | contents: write
11 |
12 | jobs:
13 | lint_and_test:
14 |
15 | runs-on: ubuntu-latest
16 |
17 | steps:
18 | - uses: actions/checkout@v4
19 | - name: Set up Python 3.12
20 | uses: actions/setup-python@v3
21 | with:
22 | python-version: "3.12"
23 |
24 | - name: Install uv
25 | run: |
26 | pip install uv
27 |
28 | - name: Install dependencies
29 | run: |
30 | uv sync
31 |
32 | - name: Run linters
33 | run: |
34 | uv run black --check .
35 | uv run mypy src
36 | uv run ruff check .
37 |
38 | - name: Run tests
39 | run: |
40 | uv run pytest --cov=src --cov-report xml --cov-report term --cov-fail-under=90 tests/unit tests/integration
41 |
42 | - name: Generate coverage badge
43 | run: |
44 | pip install "genbadge[coverage]"
45 | genbadge coverage -i coverage.xml -o coverage-badge.svg
46 |
47 | - name: Upload coverage report artifact
48 | uses: actions/upload-artifact@v4
49 | with:
50 | name: coverage-report
51 | path: coverage.xml
52 |
53 | - name: Upload coverage badge artifact
54 | uses: actions/upload-artifact@v4
55 | with:
56 | name: coverage-badge
57 | path: coverage-badge.svg
58 |
59 | - name: Deploy coverage report to branch
60 | if: github.ref == 'refs/heads/main'
61 | uses: peaceiris/actions-gh-pages@v4
62 | with:
63 | publish_branch: 'coverage'
64 | github_token: ${{ secrets.GITHUB_TOKEN }}
65 | publish_dir: .
66 | keep_files: coverage-badge.svg
67 | user_name: 'github-actions[bot]'
68 | user_email: 'github-actions[bot]@users.noreply.github.com'
69 | commit_message: 'chore: Update coverage data from workflow run ${{ github.event.workflow_run.id }}'
70 |
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/__init__.py:
--------------------------------------------------------------------------------
```python
1 | import logging
2 | from typing import Any
3 |
4 | from fastmcp import Context, FastMCP
5 | from mcp import Tool as MCPTool
6 |
7 | from oxylabs_mcp.config import settings
8 | from oxylabs_mcp.tools.ai_studio import AI_TOOLS
9 | from oxylabs_mcp.tools.ai_studio import mcp as ai_studio_mcp
10 | from oxylabs_mcp.tools.scraper import SCRAPER_TOOLS
11 | from oxylabs_mcp.tools.scraper import mcp as scraper_mcp
12 | from oxylabs_mcp.utils import get_oxylabs_ai_studio_api_key, get_oxylabs_auth
13 |
14 |
15 | class OxylabsMCPServer(FastMCP):
16 | """Oxylabs MCP server."""
17 |
18 | async def _mcp_list_tools(self) -> list[MCPTool]:
19 | """List all available Oxylabs tools."""
20 | async with Context(fastmcp=self):
21 | tools = await self._list_tools()
22 |
23 | username, password = get_oxylabs_auth()
24 | if not username or not password:
25 | tools = [tool for tool in tools if tool.name not in SCRAPER_TOOLS]
26 |
27 | if not get_oxylabs_ai_studio_api_key():
28 | tools = [tool for tool in tools if tool.name not in AI_TOOLS]
29 |
30 | return [
31 | tool.to_mcp_tool(
32 | name=tool.key,
33 | include_fastmcp_meta=self.include_fastmcp_meta,
34 | )
35 | for tool in tools
36 | ]
37 |
38 |
39 | mcp = OxylabsMCPServer("oxylabs_mcp")
40 |
41 | mcp.mount(ai_studio_mcp)
42 | mcp.mount(scraper_mcp)
43 |
44 |
45 | def main() -> None:
46 | """Start the MCP server."""
47 | logging.getLogger("oxylabs_mcp").setLevel(settings.LOG_LEVEL)
48 |
49 | params: dict[str, Any] = {}
50 |
51 | if settings.MCP_TRANSPORT == "streamable-http":
52 | params["host"] = settings.MCP_HOST
53 | params["port"] = settings.PORT or settings.MCP_PORT
54 | params["log_level"] = settings.LOG_LEVEL
55 | params["stateless_http"] = settings.MCP_STATELESS_HTTP
56 |
57 | mcp.run(
58 | settings.MCP_TRANSPORT,
59 | **params,
60 | )
61 |
62 |
63 | # Optionally expose other important items at package level
64 | __all__ = ["main", "mcp"]
65 |
```
--------------------------------------------------------------------------------
/tests/unit/test_utils.py:
--------------------------------------------------------------------------------
```python
1 | from unittest.mock import patch
2 |
3 | import pytest
4 |
5 | from oxylabs_mcp.config import settings
6 | from oxylabs_mcp.utils import extract_links_with_text, get_oxylabs_auth, strip_html
7 |
8 |
9 | TEST_FIXTURES = "tests/unit/fixtures/"
10 |
11 |
12 | @pytest.mark.parametrize(
13 | "env_vars",
14 | [
15 | pytest.param(
16 | {"OXYLABS_USERNAME": "test_user", "OXYLABS_PASSWORD": "test_pass"},
17 | id="valid-env",
18 | ),
19 | pytest.param(
20 | {"OXYLABS_PASSWORD": "test_pass"},
21 | id="no-username",
22 | ),
23 | pytest.param(
24 | {"OXYLABS_USERNAME": "test_user"},
25 | id="no-password",
26 | ),
27 | pytest.param({}, id="no-username-and-no-password"),
28 | ],
29 | )
30 | def test_get_oxylabs_auth(env_vars):
31 | with patch("os.environ", new=env_vars):
32 | settings.MCP_TRANSPORT = "stdio"
33 | username, password = get_oxylabs_auth()
34 | assert username == env_vars.get("OXYLABS_USERNAME")
35 | assert password == env_vars.get("OXYLABS_PASSWORD")
36 |
37 |
38 | @pytest.mark.parametrize(
39 | ("html_input", "expected_output"),
40 | [pytest.param("before_strip.html", "after_strip.html", id="strip-html")],
41 | )
42 | def test_strip_html(html_input: str, expected_output: str):
43 | with (
44 | open(TEST_FIXTURES + html_input, "r", encoding="utf-8") as input_file,
45 | open(TEST_FIXTURES + expected_output, "r", encoding="utf-8") as output_file,
46 | ):
47 | input_html = input_file.read()
48 | expected_html = output_file.read()
49 |
50 | actual_output = strip_html(input_html)
51 | assert actual_output == expected_html
52 |
53 |
54 | @pytest.mark.parametrize(
55 | ("html_input", "expected_output"),
56 | [
57 | pytest.param(
58 | "with_links.html",
59 | "[More information...] https://www.iana.org/domains/example\n"
60 | "[Another link] https://example.com",
61 |             id="extract-links",
62 | )
63 | ],
64 | )
65 | def test_extract_links_with_text(html_input: str, expected_output: str):
66 |     with open(TEST_FIXTURES + html_input, "r", encoding="utf-8") as input_file:
67 | input_html = input_file.read()
68 |
69 | links = extract_links_with_text(input_html)
70 | assert "\n".join(links) == expected_output
71 |
```
--------------------------------------------------------------------------------
/tests/conftest.py:
--------------------------------------------------------------------------------
```python
1 | from contextlib import asynccontextmanager
2 | from unittest.mock import AsyncMock, MagicMock, patch
3 |
4 | import pytest
5 | from fastmcp.server.context import Context, set_context
6 | from httpx import Request
7 | from mcp.server.lowlevel.server import request_ctx
8 |
9 | from oxylabs_mcp import mcp as mcp_server
10 |
11 |
12 | @pytest.fixture
13 | def request_context():
14 | request_context = MagicMock()
15 | request_context.session.client_params.clientInfo.name = "fake_cursor"
16 | request_context.request.headers = {
17 | "x-oxylabs-username": "oxylabs_username",
18 | "x-oxylabs-password": "oxylabs_password",
19 | "x-oxylabs-ai-studio-api-key": "oxylabs_ai_studio_api_key",
20 | }
21 |
22 | ctx = Context(MagicMock())
23 | ctx.info = AsyncMock()
24 | ctx.error = AsyncMock()
25 |
26 | request_ctx.set(request_context)
27 |
28 | with set_context(ctx):
29 | yield ctx
30 |
31 |
32 | @pytest.fixture(scope="session", autouse=True)
33 | def environment():
34 | env = {
35 | "OXYLABS_USERNAME": "oxylabs_username",
36 | "OXYLABS_PASSWORD": "oxylabs_password",
37 | "OXYLABS_AI_STUDIO_API_KEY": "oxylabs_ai_studio_api_key",
38 | }
39 | with patch("os.environ", new=env):
40 | yield
41 |
42 |
43 | @pytest.fixture
44 | def mcp(request_context: Context):
45 | return mcp_server
46 |
47 |
48 | @pytest.fixture
49 | def request_data():
50 | return Request("POST", "https://example.com/v1/queries")
51 |
52 |
53 | @pytest.fixture
54 | def oxylabs_client():
55 | client_mock = AsyncMock()
56 |
57 | @asynccontextmanager
58 | async def wrapper(*args, **kwargs):
59 | client_mock.context_manager_call_args = args
60 | client_mock.context_manager_call_kwargs = kwargs
61 |
62 | yield client_mock
63 |
64 | with patch("oxylabs_mcp.utils.AsyncClient", new=wrapper):
65 | yield client_mock
66 |
67 |
68 | @pytest.fixture
69 | def request_session(request_context):
70 | token = request_ctx.set(request_context)
71 |
72 | yield request_context.session
73 |
74 | request_ctx.reset(token)
75 |
76 |
77 | @pytest.fixture(scope="session", autouse=True)
78 | def is_api_key_valid_mock():
79 | with patch("oxylabs_mcp.utils.is_api_key_valid", return_value=True):
80 | yield
81 |
82 |
83 | @pytest.fixture
84 | def mock_schema():
85 | return {"field_1": "value1", "field_2": "value2"}
86 |
87 |
88 | @pytest.fixture
89 | def ai_crawler(mock_schema):
90 | mock_crawler = MagicMock()
91 | mock_crawler.generate_schema.return_value = mock_schema
92 |
93 | with patch("oxylabs_mcp.tools.ai_studio.AiCrawler", return_value=mock_crawler):
94 | yield mock_crawler
95 |
96 |
97 | @pytest.fixture
98 | def ai_scraper(mock_schema):
99 | mock_scraper = MagicMock()
100 | mock_scraper.generate_schema.return_value = mock_schema
101 |
102 | with patch("oxylabs_mcp.tools.ai_studio.AiScraper", return_value=mock_scraper):
103 | yield mock_scraper
104 |
105 |
106 | @pytest.fixture
107 | def browser_agent(mock_schema):
108 | mock_browser_agent = MagicMock()
109 | mock_browser_agent.generate_schema.return_value = mock_schema
110 |
111 | with patch("oxylabs_mcp.tools.ai_studio.BrowserAgent", return_value=mock_browser_agent):
112 | yield mock_browser_agent
113 |
114 |
115 | @pytest.fixture
116 | def ai_search():
117 | mock_ai_search = MagicMock()
118 |
119 | with patch("oxylabs_mcp.tools.ai_studio.AiSearch", return_value=mock_ai_search):
120 | yield mock_ai_search
121 |
122 |
123 | @pytest.fixture
124 | def ai_map():
125 | mock_ai_map = MagicMock()
126 |
127 | with patch("oxylabs_mcp.tools.ai_studio.AiMap", return_value=mock_ai_map):
128 | yield mock_ai_map
129 |
```
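The `oxylabs_client` fixture above patches `AsyncClient` with an `@asynccontextmanager` wrapper that records how the context manager was entered, so tests can later assert on `context_manager_call_kwargs`. The recording trick in isolation (`demo` is an illustrative caller, not part of the suite):

```python
import asyncio
from contextlib import asynccontextmanager
from unittest.mock import AsyncMock

client_mock = AsyncMock()


@asynccontextmanager
async def wrapper(*args, **kwargs):
    # Stash the entry arguments on the mock so assertions can inspect them later.
    client_mock.context_manager_call_args = args
    client_mock.context_manager_call_kwargs = kwargs
    yield client_mock


async def demo() -> dict:
    async with wrapper(headers={"x-oxylabs-sdk": "test"}) as client:
        await client.post("https://example.com/v1/queries")
    return client_mock.context_manager_call_kwargs


print(asyncio.run(demo()))  # -> {'headers': {'x-oxylabs-sdk': 'test'}}
```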
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
```toml
1 | [project]
2 | name = "oxylabs-mcp"
3 | version = "0.7.1"
4 | description = "Oxylabs MCP server"
5 | authors = [
6 | {name="Augis Braziunas", email="[email protected]"},
7 | {name="Rostyslav Borovyk", email="[email protected]"},
8 | ]
9 | readme = "README.md"
10 | requires-python = ">=3.12"
11 | classifiers = [
12 | "Programming Language :: Python :: 3",
13 | "Programming Language :: Python :: 3.12",
14 | "Programming Language :: Python :: 3.13",
15 | "Development Status :: 4 - Beta",
16 | "Operating System :: OS Independent",
17 | ]
18 |
19 | license = "MIT"
20 | license-files = ["LICEN[CS]E*"]
21 |
22 | dependencies = [
23 | "fastmcp>=2.11.3",
24 | "httpx>=0.28.1",
25 | "lxml>=5.3.0,<6",
26 | "lxml-html-clean>=0.4.1",
27 | "markdownify>=0.14.1",
28 | "oxylabs-ai-studio>=0.2.15",
29 | "pydantic>=2.10.5",
30 | "pydantic-settings>=2.8.1",
31 | "smithery>=0.1.25",
32 | ]
33 |
34 | [dependency-groups]
35 | dev = [
36 | "bandit>=1.8.6",
37 | "black>=25.1.0",
38 | "lxml-stubs>=0.5.1",
39 | "mypy>=1.14.1",
40 | "pytest>=8.3.4",
41 | "pytest-asyncio>=0.25.2",
42 | "pytest-cov>=6.1.1",
43 | "pytest-mock>=3.14.0",
44 | "ruff>=0.9.1",
45 | ]
46 | e2e-tests = [
47 | "agno>=1.8.1",
48 | "anthropic>=0.50.0",
49 | "google-genai>=1.13.0",
50 | "openai>=1.77.0",
51 | ]
52 |
53 | [build-system]
54 | requires = ["hatchling"]
55 | build-backend = "hatchling.build"
56 |
57 | [project.scripts]
58 | oxylabs-mcp = "oxylabs_mcp:main"
59 |
60 | [project.urls]
61 | Homepage = "https://github.com/oxylabs/oxylabs-mcp"
62 | Repository = "https://github.com/oxylabs/oxylabs-mcp"
63 |
64 | [tool.mypy]
65 | strict = true
66 |
67 | [[tool.mypy.overrides]]
68 | module = "markdownify.*"
69 | ignore_missing_imports = true
70 |
69 | [tool.ruff]
70 | target-version = "py312"
71 | lint.select = [
72 | "E", # pycodestyle (E, W) - https://docs.astral.sh/ruff/rules/#pycodestyle-e-w
73 | "F", # Pyflakes (F) - https://docs.astral.sh/ruff/rules/#pyflakes-f
74 | "W", # pycodestyle (E, W) - https://docs.astral.sh/ruff/rules/#pycodestyle-e-w
75 | "I", # isort (I) https://docs.astral.sh/ruff/rules/#isort-i
76 | "D", # pydocstyle (D) https://docs.astral.sh/ruff/rules/#pydocstyle-d
77 | "S", # bandit (S) https://docs.astral.sh/ruff/rules/#flake8-bandit-s
78 | "ARG", # flake8-unused-arguments - https://docs.astral.sh/ruff/rules/#flake8-unused-arguments-arg
79 | "B", # flake8-bugbear - https://docs.astral.sh/ruff/rules/#flake8-bugbear-b
80 | "C4", # flake8-comprehensions - https://docs.astral.sh/ruff/rules/#flake8-comprehensions-c4
81 | "ISC", # flake8-implicit-str-concat - https://docs.astral.sh/ruff/rules/#flake8-implicit-str-concat-isc
82 | "FA", # flake8-future-annotations - https://docs.astral.sh/ruff/rules/#flake8-future-annotations-fa
83 | "FBT", # flake8-boolean-trap - https://docs.astral.sh/ruff/rules/#flake8-boolean-trap-fbt
84 | "Q", # flake8-quotes (Q) https://docs.astral.sh/ruff/rules/#flake8-quotes-q
85 | "ANN", # flake8-annotations (ANN) https://docs.astral.sh/ruff/rules/#flake8-annotations-ann
86 | "PLR", # Refactor (PLR) https://docs.astral.sh/ruff/rules/#refactor-plr
87 | "PT", # flake8-pytest-style (PT) https://docs.astral.sh/ruff/rules/#flake8-pytest-style-pt
88 | ]
89 | lint.ignore = [
90 | "D213", # Contradicts D212.
91 | "D203", # Contradicts D211.
92 | "D104", # Allow no docstrings in packages
93 | "D100", # Allow no docstrings in modules
94 | "ANN002", # https://docs.astral.sh/ruff/rules/missing-type-args/
95 | "ANN003", # https://docs.astral.sh/ruff/rules/missing-type-kwargs/
96 | "PLR0913", # Allow functions with many arguments
97 | "PLR0912", # Allow many branches for functions
98 | ]
99 |
100 | [tool.ruff.lint.per-file-ignores]
101 | "tests/*" = ["D", "S101", "ARG001", "ANN", "PT011", "FBT", "PLR2004"]
102 | "src/oxylabs_mcp/url_params.py" = ["E501"]
103 |
104 | [tool.ruff.lint.pycodestyle]
105 | max-line-length = 100
106 |
107 | [tool.ruff.lint.isort]
108 | known-first-party = ["src", "tests"]
109 | lines-after-imports = 2
110 |
111 | [tool.pytest.ini_options]
112 | asyncio_default_fixture_loop_scope = "session"
113 | asyncio_mode = "auto"
114 |
115 | [tool.black]
116 | line-length = 100
117 |
```
--------------------------------------------------------------------------------
/tests/integration/test_server.py:
--------------------------------------------------------------------------------
```python
1 | import json
2 | import re
3 | from unittest.mock import AsyncMock, MagicMock
4 |
5 | import pytest
6 | from fastmcp import FastMCP
7 | from httpx import HTTPStatusError, Request, RequestError, Response
8 |
9 | from oxylabs_mcp.config import settings
10 | from tests.integration import params
11 |
12 |
13 | @pytest.mark.asyncio
14 | @pytest.mark.parametrize(
15 | ("tool", "arguments"),
16 | [
17 | pytest.param(
18 | "universal_scraper",
19 | {"url": "test_url"},
20 | id="universal_scraper",
21 | ),
22 | pytest.param(
23 | "google_search_scraper",
24 | {"query": "Generic query"},
25 | id="google_search_scraper",
26 | ),
27 | pytest.param(
28 | "amazon_search_scraper",
29 | {"query": "Generic query"},
30 | id="amazon_search_scraper",
31 | ),
32 | pytest.param(
33 | "amazon_product_scraper",
34 | {"query": "Generic query"},
35 | id="amazon_product_scraper",
36 | ),
37 | ],
38 | )
39 | async def test_default_headers_are_set(
40 | mcp: FastMCP,
41 | request_data: Request,
42 | oxylabs_client: AsyncMock,
43 | tool: str,
44 | arguments: dict,
45 | ):
46 | mock_response = Response(
47 | 200,
48 | content=json.dumps(params.STR_RESPONSE),
49 | request=request_data,
50 | )
51 |
52 | oxylabs_client.post.return_value = mock_response
53 | oxylabs_client.get.return_value = mock_response
54 |
55 | await mcp._call_tool(tool, arguments=arguments)
56 |
57 | assert "x-oxylabs-sdk" in oxylabs_client.context_manager_call_kwargs["headers"]
58 |
59 | oxylabs_sdk_header = oxylabs_client.context_manager_call_kwargs["headers"]["x-oxylabs-sdk"]
60 | client_info, _ = oxylabs_sdk_header.split(maxsplit=1)
61 |
62 | client_info_pattern = re.compile(r"oxylabs-mcp-fake_cursor/(\d+)\.(\d+)\.(\d+)$")
63 | assert re.match(client_info_pattern, client_info)
64 |
65 |
66 | @pytest.mark.asyncio
67 | @pytest.mark.parametrize(
68 | ("tool", "arguments"),
69 | [
70 | pytest.param(
71 | "universal_scraper",
72 | {"url": "test_url"},
73 | id="universal_scraper",
74 | ),
75 | pytest.param(
76 | "google_search_scraper",
77 | {"query": "Generic query"},
78 | id="google_search_scraper",
79 | ),
80 | pytest.param(
81 | "amazon_search_scraper",
82 | {"query": "Generic query"},
83 | id="amazon_search_scraper",
84 | ),
85 | pytest.param(
86 | "amazon_product_scraper",
87 | {"query": "Generic query"},
88 | id="amazon_product_scraper",
89 | ),
90 | ],
91 | )
92 | @pytest.mark.parametrize(
93 | ("exception", "expected_text"),
94 | [
95 | pytest.param(
96 | HTTPStatusError(
97 | "HTTP status error",
98 | request=MagicMock(),
99 | response=MagicMock(status_code=500, text="Internal Server Error"),
100 | ),
101 | "HTTP error during POST request: 500 - Internal Server Error",
102 |             id="http_status_error",
103 | ),
104 | pytest.param(
105 | RequestError("Request error"),
106 | "Request error during POST request: Request error",
107 | id="request_error",
108 | ),
109 | pytest.param(
110 | Exception("Unexpected exception"),
111 | "Error: Unexpected exception",
112 | id="unhandled_exception",
113 | ),
114 | ],
115 | )
116 | async def test_request_client_error_handling(
117 | mcp: FastMCP,
118 | request_data: Request,
119 | oxylabs_client: AsyncMock,
120 | tool: str,
121 | arguments: dict,
122 | exception: Exception,
123 | expected_text: str,
124 | ):
125 | oxylabs_client.post.side_effect = [exception]
126 | oxylabs_client.get.side_effect = [exception]
127 |
128 | result = await mcp._call_tool(tool, arguments=arguments)
129 |
130 | assert result.content[0].text == expected_text
131 |
132 |
133 | @pytest.mark.parametrize("transport", ["stdio", "streamable-http"])
134 | async def test_list_tools(mcp: FastMCP, transport: str):
135 | settings.MCP_TRANSPORT = transport
136 | tools = await mcp._mcp_list_tools()
137 | assert len(tools) == 10
138 |
```
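The `x-oxylabs-sdk` check in `test_default_headers_are_set` splits off the leading client-info token and matches it against a semver pattern. A sketch with a hypothetical header value (the trailing platform details are made up for illustration):

```python
import re

# Hypothetical header value shaped like the one the test expects.
header = "oxylabs-mcp-fake_cursor/1.2.3 (python/3.12.0; x86_64)"
client_info, _ = header.split(maxsplit=1)

pattern = re.compile(r"oxylabs-mcp-fake_cursor/(\d+)\.(\d+)\.(\d+)$")
match = re.match(pattern, client_info)
print(match.groups())  # -> ('1', '2', '3')
```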
--------------------------------------------------------------------------------
/src/oxylabs_mcp/url_params.py:
--------------------------------------------------------------------------------
```python
1 | from typing import Annotated, Literal
2 |
3 | from pydantic import Field
4 |
5 |
6 | # Note: optional types (e.g `str | None`) break the introspection in the Cursor AI.
7 | # See: https://github.com/getcursor/cursor/issues/2932
8 | # Therefore, sentinel values (e.g. `""`, `0`) are used to represent a nullable parameter.
9 | URL_PARAM = Annotated[str, Field(description="Website url to scrape.")]
10 | PARSE_PARAM = Annotated[
11 | bool,
12 | Field(
13 |         description="Whether the result should be parsed. If the result is not parsed, the output_format parameter is applied.",
14 | ),
15 | ]
16 | RENDER_PARAM = Annotated[
17 | Literal["", "html"],
18 | Field(
19 | description="""
20 | Whether a headless browser should be used to render the page.
21 | For example:
22 |         - 'html' when a browser is required to render the page.
23 | """,
24 | examples=["", "html"],
25 | ),
26 | ]
27 | OUTPUT_FORMAT_PARAM = Annotated[
28 | Literal[
29 | "",
30 | "links",
31 | "md",
32 | "html",
33 | ],
34 | Field(
35 | description="""
36 | The format of the output. Works only when parse parameter is false.
37 | - links - Most efficient when the goal is navigation or finding specific URLs. Use this first when you need to locate a specific page within a website.
38 | - md - Best for extracting and reading visible content once you've found the right page. Use this to get structured content that's easy to read and process.
39 | - html - Should be used sparingly only when you need the raw HTML structure, JavaScript code, or styling information.
40 | """
41 | ),
42 | ]
43 | GOOGLE_QUERY_PARAM = Annotated[str, Field(description="URL-encoded keyword to search for.")]
44 | AMAZON_SEARCH_QUERY_PARAM = Annotated[str, Field(description="Keyword to search for.")]
45 | USER_AGENT_TYPE_PARAM = Annotated[
46 | Literal[
47 | "",
48 | "desktop",
49 | "desktop_chrome",
50 | "desktop_firefox",
51 | "desktop_safari",
52 | "desktop_edge",
53 | "desktop_opera",
54 | "mobile",
55 | "mobile_ios",
56 | "mobile_android",
57 | "tablet",
58 | ],
59 | Field(
60 | description="Device type and browser that will be used to "
61 | "determine User-Agent header value."
62 | ),
63 | ]
64 | START_PAGE_PARAM = Annotated[
65 | int,
66 | Field(description="Starting page number."),
67 | ]
68 | PAGES_PARAM = Annotated[
69 | int,
70 | Field(description="Number of pages to retrieve."),
71 | ]
72 | LIMIT_PARAM = Annotated[
73 | int,
74 | Field(description="Number of results to retrieve in each page."),
75 | ]
76 | DOMAIN_PARAM = Annotated[
77 | str,
78 | Field(
79 | description="""
80 | Domain localization for Google.
81 | Use country top level domains.
82 | For example:
83 | - 'co.uk' for United Kingdom
84 | - 'us' for United States
85 | - 'fr' for France
86 | """,
87 | examples=["uk", "us", "fr"],
88 | ),
89 | ]
90 | GEO_LOCATION_PARAM = Annotated[
91 | str,
92 | Field(
93 | description="""
94 | The geographical location that the result should be adapted for.
95 |         Use a free-form location name or an ISO 3166-1 alpha-2 country code.
96 | Examples:
97 | - 'California, United States'
98 | - 'Mexico'
99 | - 'US' for United States
100 | - 'DE' for Germany
101 | - 'FR' for France
102 | """,
103 | examples=["US", "DE", "FR"],
104 | ),
105 | ]
106 | LOCALE_PARAM = Annotated[
107 | str,
108 | Field(
109 | description="""
110 | Set 'Accept-Language' header value which changes your Google search page web interface language.
111 | Examples:
112 | - 'en-US' for English, United States
113 | - 'de-AT' for German, Austria
114 | - 'fr-FR' for French, France
115 | """,
116 | examples=["en-US", "de-AT", "fr-FR"],
117 | ),
118 | ]
119 | AD_MODE_PARAM = Annotated[
120 | bool,
121 | Field(
122 |         description="If true, uses the Google Ads source optimized for paid ads.",
123 | ),
124 | ]
125 | CATEGORY_ID_CONTEXT_PARAM = Annotated[
126 | str,
127 | Field(
128 | description="Search for items in a particular browse node (product category).",
129 | ),
130 | ]
131 | MERCHANT_ID_CONTEXT_PARAM = Annotated[
132 | str,
133 | Field(
134 | description="Search for items sold by a particular seller.",
135 | ),
136 | ]
137 | CURRENCY_CONTEXT_PARAM = Annotated[
138 | str,
139 | Field(
140 | description="Currency that will be used to display the prices.",
141 | examples=["USD", "EUR", "AUD"],
142 | ),
143 | ]
144 | AUTOSELECT_VARIANT_CONTEXT_PARAM = Annotated[
145 | bool,
146 | Field(
147 | description="To get accurate pricing/buybox data, set this parameter to true.",
148 | ),
149 | ]
150 |
```
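The sentinel-value pattern documented at the top of this module (empty string instead of `str | None`, to keep Cursor's introspection working) can be exercised directly with pydantic's `TypeAdapter`. A sketch, not part of the module:

```python
from typing import Annotated, Literal

from pydantic import Field, TypeAdapter, ValidationError

# Same shape as RENDER_PARAM above: "" is the "not set" sentinel.
RenderParam = Annotated[Literal["", "html"], Field(description="Render mode.")]

adapter = TypeAdapter(RenderParam)
print(adapter.validate_python("html"))  # -> html

try:
    adapter.validate_python("png")
except ValidationError:
    print("rejected")  # the Literal only admits "" and "html"
```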
--------------------------------------------------------------------------------
/tests/e2e/test_llm_agent.py:
--------------------------------------------------------------------------------
```python
1 | import json
2 | import os
3 | from contextlib import asynccontextmanager
4 |
5 | import pytest
6 | from agno.agent import Agent
7 | from agno.models.google import Gemini
8 | from agno.models.openai import OpenAIChat
9 | from agno.tools.mcp import MCPTools
10 |
11 |
12 | MCP_SERVER = "local" # local, uvx
13 | MODELS_CONFIG = [
14 | ("GOOGLE_API_KEY", "gemini"),
15 | # ("OPENAI_API_KEY", "openai"),
16 | ]
17 |
18 |
19 | def get_agent(model: str, oxylabs_mcp: MCPTools) -> Agent:
20 | if model == "gemini":
21 | model_ = Gemini(api_key=os.getenv("GOOGLE_API_KEY"))
22 | elif model == "openai":
23 | model_ = OpenAIChat(api_key=os.getenv("OPENAI_API_KEY"))
24 | else:
25 | raise ValueError(f"Unknown model: {model}")
26 |
27 | return Agent(
28 | model=model_,
29 | tools=[oxylabs_mcp],
30 | instructions=["Use MCP tools to fulfil the requests"],
31 | markdown=True,
32 | )
33 |
34 |
35 | def get_models() -> list[str]:
36 | models = []
37 |
38 | for env_var, model_name in MODELS_CONFIG:
39 | if os.getenv(env_var):
40 | models.append(model_name)
41 |
42 | return models
43 |
44 |
45 | @asynccontextmanager
46 | async def oxylabs_mcp_server():
47 | if MCP_SERVER == "local":
48 | command = f"uv run --directory {os.getenv('LOCAL_OXYLABS_MCP_DIRECTORY')} oxylabs-mcp"
49 | elif MCP_SERVER == "uvx":
50 | command = "uvx oxylabs-mcp"
51 | else:
52 | raise ValueError(f"Unknown mcp server option: {MCP_SERVER}")
53 |
54 | async with MCPTools(
55 | command,
56 | env={
57 | "OXYLABS_USERNAME": os.getenv("OXYLABS_USERNAME"),
58 | "OXYLABS_PASSWORD": os.getenv("OXYLABS_PASSWORD"),
59 | },
60 | ) as mcp_server:
61 | yield mcp_server
62 |
63 |
64 | @pytest.mark.skipif(not os.getenv("OXYLABS_USERNAME"), reason="`OXYLABS_USERNAME` is not set")
65 | @pytest.mark.skipif(not os.getenv("OXYLABS_PASSWORD"), reason="`OXYLABS_PASSWORD` is not set")
66 | @pytest.mark.asyncio
67 | @pytest.mark.parametrize("model", get_models())
68 | @pytest.mark.parametrize(
69 | ("query", "tool", "arguments", "expected_content"),
70 | [
71 | (
72 | "Search for iPhone 16 in google with parsed result",
73 | "google_search_scraper",
74 | {
75 | "query": "iPhone 16",
76 | "parse": True,
77 | },
78 | "iPhone 16",
79 | ),
80 | (
81 | "Search for iPhone 16 in google with render html",
82 | "google_search_scraper",
83 | {
84 | "query": "iPhone 16",
85 | "render": "html",
86 | },
87 | "iPhone 16",
88 | ),
89 | (
90 | "Search for iPhone 16 in google with browser rendering",
91 | "google_search_scraper",
92 | {
93 | "query": "iPhone 16",
94 | "render": "html",
95 | },
96 | "iPhone 16",
97 | ),
98 | (
99 | "Search for iPhone 16 in google with user agent type mobile",
100 | "google_search_scraper",
101 | {
102 | "query": "iPhone 16",
103 | "user_agent_type": "mobile",
104 | },
105 | "iPhone 16",
106 | ),
107 | (
108 | "Search for iPhone 16 in google starting from the second page",
109 | "google_search_scraper",
110 | {
111 | "query": "iPhone 16",
112 | "start_page": 2,
113 | },
114 | "iPhone 16",
115 | ),
116 | (
117 | "Search for iPhone 16 in google with United Kingdom domain",
118 | "google_search_scraper",
119 | {
120 | "query": "iPhone 16",
121 | "domain": "co.uk",
122 | },
123 | "iPhone 16",
124 | ),
125 | (
126 | "Search for iPhone 16 in google with Brazil geolocation",
127 | "google_search_scraper",
128 | {
129 | "query": "iPhone 16",
130 | "geo_location": "BR",
131 | },
132 | "iPhone 16",
133 | ),
134 | (
135 | "Search for iPhone 16 in google with French locale",
136 | "google_search_scraper",
137 | {
138 | "query": "iPhone 16",
139 | "locale": "fr-FR",
140 | },
141 | "iPhone 16",
142 | ),
143 | ],
144 | )
145 | async def test_basic_agent_prompts(
146 | model: str,
147 | query: str,
148 | tool: str,
149 | arguments: dict,
150 | expected_content: str,
151 | ):
152 | async with oxylabs_mcp_server() as mcp_server:
153 | agent = get_agent(model, mcp_server)
154 | response = await agent.arun(query)
155 |
156 | tool_calls = agent.memory.get_tool_calls(agent.session_id)
157 |
158 | # [tool_call, tool_call_result]
159 | assert len(tool_calls) == 2, "Extra tool calls found!"
160 |
161 | assert tool_calls[0]["function"]["name"] == tool
162 | assert json.loads(tool_calls[0]["function"]["arguments"]) == arguments
163 |
164 | assert expected_content in response.content
165 |
166 |
167 | @pytest.mark.asyncio
168 | @pytest.mark.parametrize("model", get_models())
169 | async def test_complex_agent_prompt(model: str):
170 | async with oxylabs_mcp_server() as mcp_server:
171 | agent = get_agent(model, mcp_server)
172 |
173 | await agent.arun(
174 | "Go to oxylabs.io, look for career page, "
175 | "go to it and return all job titles in markdown format. "
176 | "Don't invent URLs, start from one provided."
177 | )
178 |
179 | tool_calls = agent.memory.get_tool_calls(agent.session_id)
180 |         assert len(tool_calls) == 4, f"Expected 4 tool calls, got {len(tool_calls)}: {tool_calls}"
181 |
182 | oxylabs_page_call, _, careers_page_call, _ = agent.memory.get_tool_calls(agent.session_id)
183 | assert oxylabs_page_call["function"]["name"] == "universal_scraper"
184 | assert json.loads(oxylabs_page_call["function"]["arguments"]) == {
185 | "output_format": "links",
186 | "url": "https://oxylabs.io",
187 | }
188 | assert careers_page_call["function"]["name"] == "universal_scraper"
189 | assert json.loads(careers_page_call["function"]["arguments"]) == {
190 | "output_format": "md",
191 | "url": "https://career.oxylabs.io/",
192 | }
193 |
```
--------------------------------------------------------------------------------
/tests/integration/test_scraper_tools.py:
--------------------------------------------------------------------------------
```python
1 | import json
2 | from typing import Any
3 | from unittest.mock import AsyncMock, patch
4 |
5 | import pytest
6 | from fastmcp import FastMCP
7 | from httpx import Request, Response
8 | from mcp.types import TextContent
9 |
10 | from tests.integration import params
11 | from tests.utils import convert_context_params, prepare_expected_arguments
12 |
13 |
14 | @pytest.mark.parametrize(
15 | ("arguments", "expectation", "response_data", "expected_result"),
16 | [
17 | params.URL_ONLY,
18 | params.NO_URL,
19 | params.RENDER_HTML_WITH_URL,
20 | params.RENDER_INVALID_WITH_URL,
21 | *params.USER_AGENTS_WITH_URL,
22 | params.GEO_LOCATION_SPECIFIED_WITH_URL,
23 | ],
24 | )
25 | @pytest.mark.asyncio
26 | async def test_oxylabs_scraper_arguments(
27 | mcp: FastMCP,
28 | request_data: Request,
29 | response_data: str,
30 | arguments: dict,
31 | expectation,
32 | expected_result: str,
33 | oxylabs_client: AsyncMock,
34 | ):
35 | mock_response = Response(200, content=json.dumps(response_data), request=request_data)
36 | oxylabs_client.post.return_value = mock_response
37 |
38 | with (
39 | expectation,
40 | patch("httpx.AsyncClient.post", new=AsyncMock(return_value=mock_response)),
41 | ):
42 | result = await mcp._call_tool("universal_scraper", arguments=arguments)
43 |
44 | assert oxylabs_client.post.call_args.kwargs == {
45 | "json": convert_context_params(prepare_expected_arguments(arguments)),
46 | }
47 | assert result.content == [TextContent(type="text", text=expected_result)]
48 |
49 |
50 | @pytest.mark.parametrize(
51 | ("arguments", "expectation", "response_data", "expected_result"),
52 | [
53 | params.QUERY_ONLY,
54 | params.PARSE_ENABLED,
55 | params.RENDER_HTML_WITH_QUERY,
56 | *params.USER_AGENTS_WITH_QUERY,
57 | *params.OUTPUT_FORMATS,
58 | params.INVALID_USER_AGENT,
59 | params.START_PAGE_SPECIFIED,
60 | params.PAGES_SPECIFIED,
61 | params.LIMIT_SPECIFIED,
62 | params.DOMAIN_SPECIFIED,
63 | params.GEO_LOCATION_SPECIFIED_WITH_QUERY,
64 | params.LOCALE_SPECIFIED,
65 | ],
66 | )
67 | @pytest.mark.asyncio
68 | async def test_google_search_scraper_arguments(
69 | mcp: FastMCP,
70 | request_data: Request,
71 | response_data: str,
72 | arguments: dict,
73 | expectation,
74 | expected_result: str,
75 | oxylabs_client: AsyncMock,
76 | ):
77 | mock_response = Response(200, content=json.dumps(response_data), request=request_data)
78 | oxylabs_client.post.return_value = mock_response
79 |
80 | with expectation:
81 | result = await mcp._call_tool("google_search_scraper", arguments=arguments)
82 |
83 | assert oxylabs_client.post.call_args.kwargs == {
84 | "json": {
85 | "source": "google_search",
86 | "parse": True,
87 | **prepare_expected_arguments(arguments),
88 | }
89 | }
90 | assert result.content == [TextContent(type="text", text=expected_result)]
91 |
92 |
93 | @pytest.mark.parametrize(
94 | ("ad_mode", "expected_result"),
95 | [
96 | (False, {"parse": True, "query": "Iphone 16", "source": "google_search"}),
97 | (True, {"parse": True, "query": "Iphone 16", "source": "google_ads"}),
98 | ],
99 | )
100 | @pytest.mark.asyncio
101 | async def test_oxylabs_google_search_ad_mode_argument(
102 | mcp: FastMCP,
103 | request_data: Request,
104 | ad_mode: bool,
105 | expected_result: dict[str, Any],
106 | oxylabs_client: AsyncMock,
107 | ):
108 | arguments = {"query": "Iphone 16", "ad_mode": ad_mode}
109 | mock_response = Response(200, content=json.dumps('{"data": "value"}'), request=request_data)
110 | oxylabs_client.post.return_value = mock_response
111 |
112 | await mcp._call_tool("google_search_scraper", arguments=arguments)
113 | assert oxylabs_client.post.call_args.kwargs == {"json": expected_result}
114 | assert oxylabs_client.post.await_args.kwargs["json"] == expected_result
115 |
116 |
117 | @pytest.mark.parametrize(
118 | ("arguments", "expectation", "response_data", "expected_result"),
119 | [
120 | params.QUERY_ONLY,
121 | params.PARSE_ENABLED,
122 | params.RENDER_HTML_WITH_QUERY,
123 | *params.USER_AGENTS_WITH_QUERY,
124 | *params.OUTPUT_FORMATS,
125 | params.INVALID_USER_AGENT,
126 | params.START_PAGE_SPECIFIED,
127 | params.PAGES_SPECIFIED,
128 | params.DOMAIN_SPECIFIED,
129 | params.GEO_LOCATION_SPECIFIED_WITH_QUERY,
130 | params.LOCALE_SPECIFIED,
131 | params.CATEGORY_SPECIFIED,
132 | params.MERCHANT_ID_SPECIFIED,
133 | params.CURRENCY_SPECIFIED,
134 | ],
135 | )
136 | @pytest.mark.asyncio
137 | async def test_amazon_search_scraper_arguments(
138 | mcp: FastMCP,
139 | request_data: Request,
140 | response_data: str,
141 | arguments: dict,
142 | expectation,
143 | expected_result: str,
144 | oxylabs_client: AsyncMock,
145 | request_context,
146 | ):
147 | mock_response = Response(200, content=json.dumps(response_data), request=request_data)
148 | oxylabs_client.post.return_value = mock_response
149 |
150 | with expectation:
151 | result = await mcp._call_tool("amazon_search_scraper", arguments=arguments)
152 |
153 | assert oxylabs_client.post.call_args.kwargs == {
154 | "json": {
155 | "source": "amazon_search",
156 | "parse": True,
157 | **convert_context_params(prepare_expected_arguments(arguments)),
158 | }
159 | }
160 | assert result.content == [TextContent(type="text", text=expected_result)]
161 |
162 |
163 | @pytest.mark.parametrize(
164 | ("arguments", "expectation", "response_data", "expected_result"),
165 | [
166 | params.QUERY_ONLY,
167 | params.PARSE_ENABLED,
168 | params.RENDER_HTML_WITH_QUERY,
169 | *params.USER_AGENTS_WITH_QUERY,
170 | *params.OUTPUT_FORMATS,
171 | params.INVALID_USER_AGENT,
172 | params.DOMAIN_SPECIFIED,
173 | params.GEO_LOCATION_SPECIFIED_WITH_QUERY,
174 | params.LOCALE_SPECIFIED,
175 | params.CURRENCY_SPECIFIED,
176 | params.AUTOSELECT_VARIANT_ENABLED,
177 | ],
178 | )
179 | @pytest.mark.asyncio
180 | async def test_amazon_product_scraper_arguments(
181 | mcp: FastMCP,
182 | request_data: Request,
183 | response_data: str,
184 | arguments: dict,
185 | expectation,
186 | expected_result: str,
187 | oxylabs_client: AsyncMock,
188 | ):
189 | mock_response = Response(200, content=json.dumps(response_data), request=request_data)
190 | oxylabs_client.post.return_value = mock_response
191 |
192 | with expectation:
193 | result = await mcp._call_tool("amazon_product_scraper", arguments=arguments)
194 |
195 | assert oxylabs_client.post.call_args.kwargs == {
196 | "json": {
197 | "source": "amazon_product",
198 | "parse": True,
199 | **convert_context_params(prepare_expected_arguments(arguments)),
200 | }
201 | }
202 | assert result.content == [TextContent(type="text", text=expected_result)]
203 |
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/tools/scraper.py:
--------------------------------------------------------------------------------
```python
1 | from typing import Any
2 |
3 | from fastmcp import FastMCP
4 | from mcp.types import ToolAnnotations
5 |
6 | from oxylabs_mcp import url_params
7 | from oxylabs_mcp.exceptions import MCPServerError
8 | from oxylabs_mcp.utils import (
9 | get_content,
10 | oxylabs_client,
11 | )
12 |
13 |
14 | SCRAPER_TOOLS = [
15 | "universal_scraper",
16 | "google_search_scraper",
17 | "amazon_search_scraper",
18 | "amazon_product_scraper",
19 | ]
20 |
21 | mcp = FastMCP("scraper")
22 |
23 |
24 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
25 | async def universal_scraper(
26 | url: url_params.URL_PARAM,
27 | render: url_params.RENDER_PARAM = "",
28 | user_agent_type: url_params.USER_AGENT_TYPE_PARAM = "",
29 | geo_location: url_params.GEO_LOCATION_PARAM = "",
30 | output_format: url_params.OUTPUT_FORMAT_PARAM = "",
31 | ) -> str:
32 | """Get the content of any webpage.
33 |
34 | Supports browser rendering, parsing of certain webpages,
35 | and different output formats.
36 | """
37 | try:
38 | async with oxylabs_client() as client:
39 | payload: dict[str, Any] = {"url": url}
40 |
41 | if render:
42 | payload["render"] = render
43 | if user_agent_type:
44 | payload["user_agent_type"] = user_agent_type
45 | if geo_location:
46 | payload["geo_location"] = geo_location
47 |
48 | response_json = await client.scrape(payload)
49 |
50 | return get_content(response_json, output_format=output_format)
51 | except MCPServerError as e:
52 | return await e.process()
53 |
54 |
55 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
56 | async def google_search_scraper(
57 | query: url_params.GOOGLE_QUERY_PARAM,
58 | parse: url_params.PARSE_PARAM = True, # noqa: FBT002
59 | render: url_params.RENDER_PARAM = "",
60 | user_agent_type: url_params.USER_AGENT_TYPE_PARAM = "",
61 | start_page: url_params.START_PAGE_PARAM = 0,
62 | pages: url_params.PAGES_PARAM = 0,
63 | limit: url_params.LIMIT_PARAM = 0,
64 | domain: url_params.DOMAIN_PARAM = "",
65 | geo_location: url_params.GEO_LOCATION_PARAM = "",
66 | locale: url_params.LOCALE_PARAM = "",
67 | ad_mode: url_params.AD_MODE_PARAM = False, # noqa: FBT002
68 | output_format: url_params.OUTPUT_FORMAT_PARAM = "",
69 | ) -> str:
70 | """Scrape Google Search results.
71 |
72 | Supports content parsing, different user agent types, pagination,
73 | domain, geolocation, and locale parameters, and different output formats.
74 | """
75 | try:
76 | async with oxylabs_client() as client:
77 | payload: dict[str, Any] = {"query": query}
78 |
79 | if ad_mode:
80 | payload["source"] = "google_ads"
81 | else:
82 | payload["source"] = "google_search"
83 |
84 | if parse:
85 | payload["parse"] = parse
86 | if render:
87 | payload["render"] = render
88 | if user_agent_type:
89 | payload["user_agent_type"] = user_agent_type
90 | if start_page:
91 | payload["start_page"] = start_page
92 | if pages:
93 | payload["pages"] = pages
94 | if limit:
95 | payload["limit"] = limit
96 | if domain:
97 | payload["domain"] = domain
98 | if geo_location:
99 | payload["geo_location"] = geo_location
100 | if locale:
101 | payload["locale"] = locale
102 |
103 | response_json = await client.scrape(payload)
104 |
105 | return get_content(response_json, parse=parse, output_format=output_format)
106 | except MCPServerError as e:
107 | return await e.process()
108 |
109 |
110 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
111 | async def amazon_search_scraper(
112 | query: url_params.AMAZON_SEARCH_QUERY_PARAM,
113 | category_id: url_params.CATEGORY_ID_CONTEXT_PARAM = "",
114 | merchant_id: url_params.MERCHANT_ID_CONTEXT_PARAM = "",
115 | currency: url_params.CURRENCY_CONTEXT_PARAM = "",
116 | parse: url_params.PARSE_PARAM = True, # noqa: FBT002
117 | render: url_params.RENDER_PARAM = "",
118 | user_agent_type: url_params.USER_AGENT_TYPE_PARAM = "",
119 | start_page: url_params.START_PAGE_PARAM = 0,
120 | pages: url_params.PAGES_PARAM = 0,
121 | domain: url_params.DOMAIN_PARAM = "",
122 | geo_location: url_params.GEO_LOCATION_PARAM = "",
123 | locale: url_params.LOCALE_PARAM = "",
124 | output_format: url_params.OUTPUT_FORMAT_PARAM = "",
125 | ) -> str:
126 | """Scrape Amazon search results.
127 |
128 | Supports content parsing, different user agent types, pagination,
129 | domain, geolocation, and locale parameters, and different output formats.
130 | Supports Amazon-specific parameters such as category ID, merchant ID, and currency.
131 | """
132 | try:
133 | async with oxylabs_client() as client:
134 | payload: dict[str, Any] = {"source": "amazon_search", "query": query}
135 |
136 | context = []
137 | if category_id:
138 | context.append({"key": "category_id", "value": category_id})
139 | if merchant_id:
140 | context.append({"key": "merchant_id", "value": merchant_id})
141 | if currency:
142 | context.append({"key": "currency", "value": currency})
143 | if context:
144 | payload["context"] = context
145 |
146 | if parse:
147 | payload["parse"] = parse
148 | if render:
149 | payload["render"] = render
150 | if user_agent_type:
151 | payload["user_agent_type"] = user_agent_type
152 | if start_page:
153 | payload["start_page"] = start_page
154 | if pages:
155 | payload["pages"] = pages
156 | if domain:
157 | payload["domain"] = domain
158 | if geo_location:
159 | payload["geo_location"] = geo_location
160 | if locale:
161 | payload["locale"] = locale
162 |
163 | response_json = await client.scrape(payload)
164 |
165 | return get_content(response_json, parse=parse, output_format=output_format)
166 | except MCPServerError as e:
167 | return await e.process()
168 |
169 |
170 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
171 | async def amazon_product_scraper(
172 | query: url_params.AMAZON_SEARCH_QUERY_PARAM,
173 | autoselect_variant: url_params.AUTOSELECT_VARIANT_CONTEXT_PARAM = False, # noqa: FBT002
174 | currency: url_params.CURRENCY_CONTEXT_PARAM = "",
175 | parse: url_params.PARSE_PARAM = True, # noqa: FBT002
176 | render: url_params.RENDER_PARAM = "",
177 | user_agent_type: url_params.USER_AGENT_TYPE_PARAM = "",
178 | domain: url_params.DOMAIN_PARAM = "",
179 | geo_location: url_params.GEO_LOCATION_PARAM = "",
180 | locale: url_params.LOCALE_PARAM = "",
181 | output_format: url_params.OUTPUT_FORMAT_PARAM = "",
182 | ) -> str:
183 | """Scrape Amazon products.
184 |
185 | Supports content parsing, different user agent types, domain,
186 | geolocation, and locale parameters, and different output formats.
187 | Supports Amazon-specific parameters such as currency, and can return
188 | more accurate pricing data with the autoselect variant option.
189 | """
190 | try:
191 | async with oxylabs_client() as client:
192 | payload: dict[str, Any] = {"source": "amazon_product", "query": query}
193 |
194 | context = []
195 | if autoselect_variant:
196 | context.append({"key": "autoselect_variant", "value": autoselect_variant})
197 | if currency:
198 | context.append({"key": "currency", "value": currency})
199 | if context:
200 | payload["context"] = context
201 |
202 | if parse:
203 | payload["parse"] = parse
204 | if render:
205 | payload["render"] = render
206 | if user_agent_type:
207 | payload["user_agent_type"] = user_agent_type
208 | if domain:
209 | payload["domain"] = domain
210 | if geo_location:
211 | payload["geo_location"] = geo_location
212 | if locale:
213 | payload["locale"] = locale
214 |
215 | response_json = await client.scrape(payload)
216 |
217 | return get_content(response_json, parse=parse, output_format=output_format)
218 | except MCPServerError as e:
219 | return await e.process()
220 |
```
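All four scraper tools above share one pattern: start from a minimal payload and add each optional parameter only when it is truthy, so unset defaults (`""`, `0`, `False`) never reach the API. A minimal standalone sketch of that pattern (the `build_payload` helper is hypothetical, not part of the codebase):

```python
# Sketch of the conditional payload-building pattern used by the scraper
# tools: falsy option values are treated as "not set" and omitted.
from typing import Any


def build_payload(source: str, query: str, **options: Any) -> dict[str, Any]:
    """Build a scraper payload, omitting unset (falsy) optional parameters."""
    payload: dict[str, Any] = {"source": source, "query": query}
    for key, value in options.items():
        if value:  # "", 0, False, and None are all skipped
            payload[key] = value
    return payload


payload = build_payload(
    "google_search", "mcp servers", parse=True, render="", start_page=0, locale="ja_JP"
)
# Only the truthy options survive alongside source and query.
```

This keeps the request minimal and lets the Oxylabs API apply its own server-side defaults for anything the caller left unset.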
--------------------------------------------------------------------------------
/tests/integration/test_ai_studio_tools.py:
--------------------------------------------------------------------------------
```python
1 | import json
2 | from unittest.mock import AsyncMock, MagicMock
3 |
4 | import pytest
5 | from fastmcp import FastMCP
6 | from httpx import Request
7 | from mcp.types import TextContent
8 | from oxylabs_ai_studio.apps.ai_search import AiSearchJob, SearchResult
9 |
10 | from tests.integration import params
11 | from tests.integration.params import SimpleSchema
12 |
13 |
14 | @pytest.mark.parametrize(
15 | ("arguments", "expectation", "response_data", "expected_result"),
16 | [
17 | params.AI_STUDIO_URL_ONLY,
18 | params.AI_STUDIO_URL_AND_OUTPUT_FORMAT,
19 | params.AI_STUDIO_URL_AND_SCHEMA,
20 | params.AI_STUDIO_URL_AND_RENDER_JAVASCRIPT,
21 | params.AI_STUDIO_URL_AND_RETURN_SOURCES_LIMIT,
22 | params.AI_STUDIO_URL_AND_GEO_LOCATION,
23 | ],
24 | )
25 | @pytest.mark.asyncio
26 | async def test_ai_crawler(
27 | mcp: FastMCP,
28 | request_data: Request,
29 | response_data: str,
30 | arguments: dict,
31 | expectation,
32 | expected_result: str,
33 | oxylabs_client: AsyncMock,
34 | ai_crawler: AsyncMock,
35 | ):
36 | mock_result = MagicMock()
37 | mock_result.data = expected_result
38 | ai_crawler.crawl_async = AsyncMock(return_value=mock_result)
39 |
40 | arguments = {"user_prompt": "Scrape price and title", **arguments}
41 |
42 | with expectation:
43 | result = await mcp._call_tool("ai_crawler", arguments=arguments)
44 |
45 | assert result.content == [
46 | TextContent(type="text", text=json.dumps({"data": expected_result}))
47 | ]
48 |
49 | default_args = {
50 | "geo_location": None,
51 | "output_format": "markdown",
52 | "render_javascript": False,
53 | "return_sources_limit": 25,
54 | "schema": None,
55 | }
56 | default_args = {k: v for k, v in default_args.items() if k not in arguments}
57 |
58 | ai_crawler.crawl_async.assert_called_once_with(**default_args, **arguments)
59 |
60 |
61 | @pytest.mark.parametrize(
62 | ("arguments", "expectation", "response_data", "expected_result"),
63 | [
64 | params.AI_STUDIO_URL_ONLY,
65 | params.AI_STUDIO_URL_AND_OUTPUT_FORMAT,
66 | params.AI_STUDIO_URL_AND_SCHEMA,
67 | params.AI_STUDIO_URL_AND_RENDER_JAVASCRIPT,
68 | params.AI_STUDIO_URL_AND_GEO_LOCATION,
69 | ],
70 | )
71 | @pytest.mark.asyncio
72 | async def test_ai_scraper(
73 | mcp: FastMCP,
74 | request_data: Request,
75 | response_data: str,
76 | arguments: dict,
77 | expectation,
78 | expected_result: str,
79 | oxylabs_client: AsyncMock,
80 | ai_scraper: AsyncMock,
81 | ):
82 | mock_result = MagicMock()
83 | mock_result.data = expected_result
84 | ai_scraper.scrape_async = AsyncMock(return_value=mock_result)
85 |
86 | arguments = {**arguments}
87 |
88 | with expectation:
89 | result = await mcp._call_tool("ai_scraper", arguments=arguments)
90 |
91 | assert result.content == [
92 | TextContent(type="text", text=json.dumps({"data": expected_result}))
93 | ]
94 |
95 | default_args = {
96 | "geo_location": None,
97 | "output_format": "markdown",
98 | "render_javascript": False,
99 | "schema": None,
100 | }
101 | default_args = {k: v for k, v in default_args.items() if k not in arguments}
102 |
103 | ai_scraper.scrape_async.assert_called_once_with(**default_args, **arguments)
104 |
105 |
106 | @pytest.mark.parametrize(
107 | ("arguments", "expectation", "response_data", "expected_result"),
108 | [
109 | params.AI_STUDIO_URL_ONLY,
110 | params.AI_STUDIO_URL_AND_OUTPUT_FORMAT,
111 | params.AI_STUDIO_URL_AND_SCHEMA,
112 | params.AI_STUDIO_URL_AND_GEO_LOCATION,
113 | ],
114 | )
115 | @pytest.mark.asyncio
116 | async def test_ai_browser_agent(
117 | mcp: FastMCP,
118 | request_data: Request,
119 | response_data: str,
120 | arguments: dict,
121 | expectation,
122 | expected_result: str,
123 | oxylabs_client: AsyncMock,
124 | browser_agent: AsyncMock,
125 | ):
126 | mock_result = MagicMock()
127 | mock_data = SimpleSchema(title="Title", price=0.0)
128 | mock_result.data = mock_data
129 | browser_agent.run_async = AsyncMock(return_value=mock_result)
130 |
131 | arguments = {"task_prompt": "Scrape price and title", **arguments}
132 |
133 | with expectation:
134 | result = await mcp._call_tool("ai_browser_agent", arguments=arguments)
135 |
136 | assert result.content == [
137 | TextContent(type="text", text=json.dumps({"data": mock_data.model_dump()}))
138 | ]
139 |
140 | default_args = {
141 | "geo_location": None,
142 | "output_format": "markdown",
143 | "schema": None,
144 | "user_prompt": arguments["task_prompt"],
145 | }
146 | del arguments["task_prompt"]
147 | default_args = {k: v for k, v in default_args.items() if k not in arguments}
148 |
149 | browser_agent.run_async.assert_called_once_with(**default_args, **arguments)
150 |
151 |
152 | @pytest.mark.parametrize(
153 | ("arguments", "expectation", "response_data", "expected_result"),
154 | [
155 | params.AI_STUDIO_QUERY_ONLY,
156 | params.AI_STUDIO_URL_AND_RENDER_JAVASCRIPT,
157 | params.AI_STUDIO_URL_AND_GEO_LOCATION,
158 | params.AI_STUDIO_URL_AND_LIMIT,
159 | params.AI_STUDIO_QUERY_AND_RETURN_CONTENT,
160 | ],
161 | )
162 | @pytest.mark.asyncio
163 | async def test_ai_search(
164 | mcp: FastMCP,
165 | request_data: Request,
166 | response_data: str,
167 | arguments: dict,
168 | expectation,
169 | expected_result: str,
170 | oxylabs_client: AsyncMock,
171 | ai_search: AsyncMock,
172 | ):
173 | mock_result = AiSearchJob(
174 | run_id="123",
175 | data=[SearchResult(url="url", title="title", description="description", content=None)],
176 | )
177 | ai_search.search_async = AsyncMock(return_value=mock_result)
178 |
179 | arguments = {**arguments}
180 | if "url" in arguments:
181 | del arguments["url"]
182 | arguments["query"] = "Sample query"
183 |
184 | with expectation:
185 | result = await mcp._call_tool("ai_search", arguments=arguments)
186 |
187 | assert result.content == [
188 | TextContent(type="text", text=json.dumps({"data": [mock_result.data[0].model_dump()]}))
189 | ]
190 |
191 | default_args = {
192 | "limit": 10,
193 | "render_javascript": False,
194 | "return_content": False,
195 | "geo_location": None,
196 | }
197 | default_args = {k: v for k, v in default_args.items() if k not in arguments}
198 |
199 | ai_search.search_async.assert_called_once_with(**default_args, **arguments)
200 |
201 |
202 | @pytest.mark.parametrize(
203 | ("arguments", "expectation", "response_data", "expected_result"),
204 | [
205 | params.AI_STUDIO_USER_PROMPT,
206 | ],
207 | )
208 | @pytest.mark.parametrize(
209 | "app_name",
210 | ["ai_crawler", "ai_scraper", "browser_agent"],
211 | )
212 | @pytest.mark.asyncio
213 | async def test_generate_schema(
214 | mcp: FastMCP,
215 | request_data: Request,
216 | response_data: str,
217 | arguments: dict,
218 | expectation,
219 | expected_result: str,
220 | oxylabs_client: AsyncMock,
221 | app_name: str,
222 | ai_crawler: AsyncMock,
223 | ai_scraper: AsyncMock,
224 | browser_agent: AsyncMock,
225 | mock_schema: dict,
226 | ):
227 | arguments = {"app_name": app_name, **arguments}
228 |
229 | with expectation:
230 | result = await mcp._call_tool("generate_schema", arguments=arguments)
231 |
232 | assert result.content == [TextContent(type="text", text=json.dumps({"data": mock_schema}))]
233 |
234 | locals()[app_name].generate_schema.assert_called_once_with(prompt=arguments["user_prompt"])
235 |
236 |
237 | @pytest.mark.parametrize(
238 | ("arguments", "expectation", "response_data", "expected_result"),
239 | [
240 | params.AI_STUDIO_URL_ONLY,
241 | params.AI_STUDIO_URL_AND_RENDER_JAVASCRIPT,
242 | params.AI_STUDIO_URL_AND_RETURN_SOURCES_LIMIT,
243 | params.AI_STUDIO_URL_AND_GEO_LOCATION,
244 | ],
245 | )
246 | @pytest.mark.asyncio
247 | async def test_ai_map(
248 | mcp: FastMCP,
249 | request_data: Request,
250 | response_data: str,
251 | arguments: dict,
252 | expectation,
253 | expected_result: str,
254 | oxylabs_client: AsyncMock,
255 | ai_map: AsyncMock,
256 | ):
257 | mock_result = MagicMock()
258 | mock_result.data = expected_result
259 | ai_map.map_async = AsyncMock(return_value=mock_result)
260 |
261 | arguments = {"user_prompt": "Scrape price and title", **arguments}
262 |
263 | with expectation:
264 | result = await mcp._call_tool("ai_map", arguments=arguments)
265 |
266 | assert result.content == [
267 | TextContent(type="text", text=json.dumps({"data": expected_result}))
268 | ]
269 |
270 | default_args = {
271 | "geo_location": None,
272 | "render_javascript": False,
273 | "return_sources_limit": 25,
274 | }
275 | default_args = {k: v for k, v in default_args.items() if k not in arguments}
276 |
277 | ai_map.map_async.assert_called_once_with(**default_args, **arguments)
278 |
```
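Each test above builds the expected mock-call kwargs the same way: take the tool's default arguments, drop any key the test case supplied explicitly, then merge in the supplied arguments. A small sketch of that merging step (the `expected_call_kwargs` name is illustrative only):

```python
# Sketch of the default-argument merging used by the assertions above:
# defaults yield to caller-supplied values, and the union forms the full
# keyword set the mocked async method should have been called with.
def expected_call_kwargs(defaults: dict, arguments: dict) -> dict:
    remaining = {k: v for k, v in defaults.items() if k not in arguments}
    return {**remaining, **arguments}


kwargs = expected_call_kwargs(
    {"geo_location": None, "render_javascript": False, "return_sources_limit": 25},
    {"url": "https://example.com", "render_javascript": True},
)
# render_javascript comes from the caller; the other defaults remain.
```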
--------------------------------------------------------------------------------
/tests/integration/params.py:
--------------------------------------------------------------------------------
```python
1 | from contextlib import nullcontext as does_not_raise
2 |
3 | import pytest
4 | from fastmcp.exceptions import ToolError
5 | from pydantic import BaseModel
6 |
7 |
8 | class SimpleSchema(BaseModel):
9 | title: str
10 | price: float
11 |
12 |
13 | JOB_RESPONSE = {"id": "7333092420940211201", "status": "done"}
14 | STR_RESPONSE = {
15 | "results": [{"content": "Mocked content"}],
16 | "job": JOB_RESPONSE,
17 | }
18 | JSON_RESPONSE = {
19 | "results": [{"content": {"data": "value"}}],
20 | "job": JOB_RESPONSE,
21 | }
22 | AI_STUDIO_JSON_RESPONSE = {
23 | "results": [{"content": {"data": "value"}}],
24 | "job": JOB_RESPONSE,
25 | }
26 |
27 | QUERY_ONLY = pytest.param(
28 | {"query": "Generic query"},
29 | does_not_raise(),
30 | STR_RESPONSE,
31 | "\n\nMocked content\n\n",
32 | id="query-only-args",
33 | )
34 | PARSE_ENABLED = pytest.param(
35 | {"query": "Generic query", "parse": True},
36 | does_not_raise(),
37 | JSON_RESPONSE,
38 | '{"data": "value"}',
39 | id="parse-enabled-args",
40 | )
41 | RENDER_HTML_WITH_QUERY = pytest.param(
42 | {"query": "Generic query", "render": "html"},
43 | does_not_raise(),
44 | STR_RESPONSE,
45 | "\n\nMocked content\n\n",
46 | id="render-enabled-args",
47 | )
48 | RENDER_INVALID_WITH_QUERY = pytest.param(
49 | {"query": "Generic query", "render": "png"},
50 | pytest.raises(ToolError),
51 | STR_RESPONSE,
52 | None,
53 | id="render-invalid-args",
54 | )
55 | OUTPUT_FORMATS = [
56 | pytest.param(
57 | {"query": "Generic query", "output_format": "links"},
58 | does_not_raise(),
59 | {
60 | "results": [
61 | {
62 | "content": '<html><body><div><p><a href="https://example.com">link</a></p></div></body></html>'
63 | }
64 | ],
65 | "job": JOB_RESPONSE,
66 | },
67 | "[link] https://example.com",
68 | id="links-output-format-args",
69 | ),
70 | pytest.param(
71 | {"query": "Generic query", "output_format": "md"},
72 | does_not_raise(),
73 | STR_RESPONSE,
74 | "\n\nMocked content\n\n",
75 | id="md-output-format-args",
76 | ),
77 | pytest.param(
78 | {"query": "Generic query", "output_format": "html"},
79 | does_not_raise(),
80 | STR_RESPONSE,
81 | "Mocked content",
82 | id="html-output-format-args",
83 | ),
84 | ]
85 | USER_AGENTS_WITH_QUERY = [
86 | pytest.param(
87 | {"query": "Generic query", "user_agent_type": uat},
88 | does_not_raise(),
89 | STR_RESPONSE,
90 | "\n\nMocked content\n\n",
91 | id=f"{uat}-user-agent-specified-args",
92 | )
93 | for uat in [
94 | "desktop",
95 | "desktop_chrome",
96 | "desktop_firefox",
97 | "desktop_safari",
98 | "desktop_edge",
99 | "desktop_opera",
100 | "mobile",
101 | "mobile_ios",
102 | "mobile_android",
103 | "tablet",
104 | ]
105 | ]
106 | USER_AGENTS_WITH_URL = [
107 | pytest.param(
108 | {"url": "https://example.com", "user_agent_type": uat},
109 | does_not_raise(),
110 | STR_RESPONSE,
111 | "\n\nMocked content\n\n",
112 | id=f"{uat}-user-agent-specified-args",
113 | )
114 | for uat in [
115 | "desktop",
116 | "desktop_chrome",
117 | "desktop_firefox",
118 | "desktop_safari",
119 | "desktop_edge",
120 | "desktop_opera",
121 | "mobile",
122 | "mobile_ios",
123 | "mobile_android",
124 | "tablet",
125 | ]
126 | ]
127 | INVALID_USER_AGENT = pytest.param(
128 | {"query": "Generic query", "user_agent_type": "invalid"},
129 | pytest.raises(ToolError),
130 | STR_RESPONSE,
131 | "Mocked content",
132 | id="invalid-user-agent-specified-args",
133 | )
134 | START_PAGE_SPECIFIED = pytest.param(
135 | {"query": "Generic query", "start_page": 2},
136 | does_not_raise(),
137 | JSON_RESPONSE,
138 | '{"data": "value"}',
139 | id="start-page-specified-args",
140 | )
141 | START_PAGE_INVALID = pytest.param(
142 | {"query": "Generic query", "start_page": -1},
143 | pytest.raises(ToolError),
144 | JSON_RESPONSE,
145 | '{"data": "value"}',
146 | id="start-page-invalid-args",
147 | )
148 | PAGES_SPECIFIED = pytest.param(
149 | {"query": "Generic query", "pages": 20},
150 | does_not_raise(),
151 | JSON_RESPONSE,
152 | '{"data": "value"}',
153 | id="pages-specified-args",
154 | )
155 | PAGES_INVALID = pytest.param(
156 | {"query": "Generic query", "pages": -10},
157 | pytest.raises(ToolError),
158 | JSON_RESPONSE,
159 | '{"data": "value"}',
160 | id="pages-invalid-args",
161 | )
162 | LIMIT_SPECIFIED = pytest.param(
163 | {"query": "Generic query", "limit": 100},
164 | does_not_raise(),
165 | JSON_RESPONSE,
166 | '{"data": "value"}',
167 | id="limit-specified-args",
168 | )
169 | LIMIT_INVALID = pytest.param(
170 | {"query": "Generic query", "limit": 0},
171 | pytest.raises(ToolError),
172 | JSON_RESPONSE,
173 | '{"data": "value"}',
174 | id="limit-invalid-args",
175 | )
176 | DOMAIN_SPECIFIED = pytest.param(
177 | {"query": "Generic query", "domain": "io"},
178 | does_not_raise(),
179 | JSON_RESPONSE,
180 | '{"data": "value"}',
181 | id="domain-specified-args",
182 | )
183 | GEO_LOCATION_SPECIFIED_WITH_QUERY = pytest.param(
184 | {"query": "Generic query", "geo_location": "Miami, Florida"},
185 | does_not_raise(),
186 | JSON_RESPONSE,
187 | '{"data": "value"}',
188 | id="geo-location-specified-args",
189 | )
190 | GEO_LOCATION_SPECIFIED_WITH_URL = pytest.param(
191 | {"url": "https://example.com", "geo_location": "Miami, Florida"},
192 | does_not_raise(),
193 | STR_RESPONSE,
194 | "\n\nMocked content\n\n",
195 | id="geo-location-specified-args",
196 | )
197 | LOCALE_SPECIFIED = pytest.param(
198 | {"query": "Generic query", "locale": "ja_JP"},
199 | does_not_raise(),
200 | JSON_RESPONSE,
201 | '{"data": "value"}',
202 | id="locale-specified-args",
203 | )
204 | CATEGORY_SPECIFIED = pytest.param(
205 | {"query": "Man's T-shirt", "category_id": "QE21R9AV"},
206 | does_not_raise(),
207 | JSON_RESPONSE,
208 | '{"data": "value"}',
209 | id="category-id-specified-args",
210 | )
211 | MERCHANT_ID_SPECIFIED = pytest.param(
212 | {"query": "Man's T-shirt", "merchant_id": "QE21R9AV"},
213 | does_not_raise(),
214 | JSON_RESPONSE,
215 | '{"data": "value"}',
216 | id="merchant-id-specified-args",
217 | )
218 | CURRENCY_SPECIFIED = pytest.param(
219 | {"query": "Man's T-shirt", "currency": "USD"},
220 | does_not_raise(),
221 | JSON_RESPONSE,
222 | '{"data": "value"}',
223 | id="currency-specified-args",
224 | )
225 | AUTOSELECT_VARIANT_ENABLED = pytest.param(
226 | {"query": "B0BVF87BST", "autoselect_variant": True},
227 | does_not_raise(),
228 | JSON_RESPONSE,
229 | '{"data": "value"}',
230 | id="autoselect-variant-enabled-args",
231 | )
232 | URL_ONLY = pytest.param(
233 | {"url": "https://example.com"},
234 | does_not_raise(),
235 | STR_RESPONSE,
236 | "\n\nMocked content\n\n",
237 | id="url-only-args",
238 | )
239 | NO_URL = pytest.param(
240 | {},
241 | pytest.raises(ToolError),
242 | STR_RESPONSE,
243 | "\n\nMocked content\n\n",
244 | id="no-url-args",
245 | )
246 | RENDER_HTML_WITH_URL = pytest.param(
247 | {"url": "https://example.com", "render": "html"},
248 | does_not_raise(),
249 | STR_RESPONSE,
250 | "\n\nMocked content\n\n",
251 | id="render-enabled-args",
252 | )
253 | RENDER_INVALID_WITH_URL = pytest.param(
254 | {"url": "https://example.com", "render": "png"},
255 | pytest.raises(ToolError),
256 | JSON_RESPONSE,
257 | None,
258 | id="render-invalid-args",
259 | )
260 | AI_STUDIO_URL_ONLY = pytest.param(
261 | {"url": "https://example.com"},
262 | does_not_raise(),
263 | AI_STUDIO_JSON_RESPONSE,
264 | {"data": "value"},
265 | id="url-with-user-prompt-args",
266 | )
267 | AI_STUDIO_QUERY_ONLY = pytest.param(
268 | {"query": "Generic query"},
269 | does_not_raise(),
270 | AI_STUDIO_JSON_RESPONSE,
271 | {"data": "value"},
272 | id="query-only-args",
273 | )
274 | AI_STUDIO_URL_AND_OUTPUT_FORMAT = pytest.param(
275 | {"url": "https://example.com", "output_format": "json"},
276 | does_not_raise(),
277 | AI_STUDIO_JSON_RESPONSE,
278 | {"data": "value"},
279 | id="url-with-user-prompt-and-output-format-args",
280 | )
281 | AI_STUDIO_URL_AND_SCHEMA = pytest.param(
282 | {
283 | "url": "https://example.com",
284 | "schema": SimpleSchema.model_json_schema(),
285 | },
286 | does_not_raise(),
287 | AI_STUDIO_JSON_RESPONSE,
288 | {"data": "value"},
289 | id="url-with-user-prompt-and-schema-args",
290 | )
291 | AI_STUDIO_URL_AND_RENDER_JAVASCRIPT = pytest.param(
292 | {
293 | "url": "https://example.com",
294 | "render_javascript": True,
295 | },
296 | does_not_raise(),
297 | AI_STUDIO_JSON_RESPONSE,
298 | {"data": "value"},
299 | id="url-with-user-prompt-and-render-js-args",
300 | )
301 | AI_STUDIO_QUERY_AND_RETURN_CONTENT = pytest.param(
302 | {
303 | "url": "https://example.com",
304 | "return_content": True,
305 | },
306 | does_not_raise(),
307 | AI_STUDIO_JSON_RESPONSE,
308 | {"data": "value"},
309 | id="url-with-user-prompt-and-return-content-args",
310 | )
311 | AI_STUDIO_URL_AND_RETURN_SOURCES_LIMIT = pytest.param(
312 | {
313 | "url": "https://example.com",
314 | "return_sources_limit": 10,
315 | },
316 | does_not_raise(),
317 | AI_STUDIO_JSON_RESPONSE,
318 | {"data": "value"},
319 | id="url-with-user-prompt-and-return-sources-limit-args",
320 | )
321 | AI_STUDIO_URL_AND_GEO_LOCATION = pytest.param(
322 | {
323 | "url": "https://example.com",
324 | "geo_location": "US",
325 | },
326 | does_not_raise(),
327 | AI_STUDIO_JSON_RESPONSE,
328 | {"data": "value"},
329 | id="url-with-user-prompt-and-geo_location-args",
330 | )
331 | AI_STUDIO_URL_AND_LIMIT = pytest.param(
332 | {
333 | "url": "https://example.com",
334 | "limit": 5,
335 | },
336 | does_not_raise(),
337 | AI_STUDIO_JSON_RESPONSE,
338 | {"data": "value"},
339 | id="url-with-user-prompt-and-limit-args",
340 | )
341 | AI_STUDIO_USER_PROMPT = pytest.param(
342 | {
343 | "user_prompt": "Scrape price and title",
344 | },
345 | does_not_raise(),
346 | AI_STUDIO_JSON_RESPONSE,
347 | {"data": "value"},
348 | id="user-prompt-args",
349 | )
350 |
```
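Every `pytest.param` above pairs an argument dict with an "expectation" context manager: `does_not_raise()` (an alias for `contextlib.nullcontext`) for valid inputs, or `pytest.raises(ToolError)` for inputs that should fail validation, so a single test body covers both paths. A self-contained sketch of the pattern, using a hand-rolled `raises` stand-in for `pytest.raises`:

```python
# Sketch of the expectation-as-parameter pattern: each case carries a context
# manager that either expects success (nullcontext) or an exception (raises).
from contextlib import contextmanager, nullcontext


@contextmanager
def raises(exc_type):
    """Hypothetical stand-in for pytest.raises: pass only if exc_type is raised."""
    try:
        yield
    except exc_type:
        return  # expected failure: swallow the exception
    raise AssertionError(f"{exc_type.__name__} was not raised")


def validate_render(render: str) -> str:
    if render not in ("", "html"):
        raise ValueError(f"invalid render option: {render}")
    return render or "none"


cases = [("html", nullcontext()), ("png", raises(ValueError))]
results = []
for render, expectation in cases:
    with expectation:
        results.append(validate_render(render))
# Only the valid case produces a result; the invalid one is absorbed
# by its expectation.
```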
--------------------------------------------------------------------------------
/src/oxylabs_mcp/utils.py:
--------------------------------------------------------------------------------
```python
1 | import json
2 | import logging
3 | import os
4 | import re
5 | import typing
6 | from contextlib import asynccontextmanager
7 | from importlib.metadata import version
8 | from platform import architecture, python_version
9 | from typing import AsyncIterator
10 |
11 | from fastmcp.server.dependencies import get_context
12 | from httpx import (
13 | AsyncClient,
14 | BasicAuth,
15 | HTTPStatusError,
16 | RequestError,
17 | Timeout,
18 | )
19 | from lxml.html import defs, fromstring, tostring
20 | from lxml.html.clean import Cleaner
21 | from markdownify import markdownify
22 | from mcp.server.fastmcp import Context
23 | from mcp.shared.context import RequestContext
24 | from oxylabs_ai_studio.utils import is_api_key_valid # type: ignore[import-untyped]
25 | from starlette import status
26 |
27 | from oxylabs_mcp.config import settings
28 | from oxylabs_mcp.exceptions import MCPServerError
29 |
30 |
31 | logger = logging.getLogger(__name__)
32 |
33 | USERNAME_ENV = "OXYLABS_USERNAME"
34 | PASSWORD_ENV = "OXYLABS_PASSWORD" # noqa: S105 # nosec
35 | AI_STUDIO_API_KEY_ENV = "OXYLABS_AI_STUDIO_API_KEY"
36 |
37 | USERNAME_HEADER = "X-Oxylabs-Username"
38 | PASSWORD_HEADER = "X-Oxylabs-Password" # noqa: S105 # nosec
39 | AI_STUDIO_API_KEY_HEADER = "X-Oxylabs-AI-Studio-Api-Key"
40 |
41 | USERNAME_QUERY_PARAM = "oxylabsUsername"
42 | PASSWORD_QUERY_PARAM = "oxylabsPassword" # noqa: S105 # nosec
43 | AI_STUDIO_API_KEY_QUERY_PARAM = "oxylabsAiStudioApiKey"
44 |
45 |
46 | def clean_html(html: str) -> str:
47 | """Clean an HTML string."""
48 | cleaner = Cleaner(
49 | scripts=True,
50 | javascript=True,
51 | style=True,
52 | remove_tags=[],
53 | kill_tags=["nav", "svg", "footer", "noscript", "script", "form"],
54 | safe_attrs=list(defs.safe_attrs) + ["idx"],
55 | comments=True,
56 | inline_style=True,
57 | links=True,
58 | meta=False,
59 | page_structure=False,
60 | embedded=True,
61 | frames=False,
62 | forms=False,
63 | annoying_tags=False,
64 | )
65 | return cleaner.clean_html(html) # type: ignore[no-any-return]
66 |
67 |
68 | def strip_html(html: str) -> str:
69 | """Simplify an HTML string.
70 |
71 | Will remove unwanted elements, attributes, and redundant content.
72 | Args:
73 | html (str): The input HTML string.
74 |
75 | Returns:
76 | str: The cleaned and simplified HTML string.
77 |
78 | """
79 | cleaned_html = clean_html(html)
80 | html_tree = fromstring(cleaned_html)
81 |
82 | for element in html_tree.iter():
83 | # Remove style attributes.
84 | if "style" in element.attrib:
85 | del element.attrib["style"]
86 |
87 | # Remove elements that have no attributes, no content and no children.
88 | if (
89 | (not element.attrib or (len(element.attrib) == 1 and "idx" in element.attrib))
90 | and not element.getchildren() # type: ignore[attr-defined]
91 | and (not element.text or not element.text.strip())
92 | and (not element.tail or not element.tail.strip())
93 | ):
94 | parent = element.getparent()
95 | if parent is not None:
96 | parent.remove(element)
97 |
98 | # Remove elements with footer and hidden in class or id
99 | xpath_query = (
100 | ".//*[contains(@class, 'footer') or contains(@id, 'footer') or "
101 | "contains(@class, 'hidden') or contains(@id, 'hidden')]"
102 | )
103 | elements_to_remove = html_tree.xpath(xpath_query)
104 | for element in elements_to_remove: # type: ignore[assignment, union-attr]
105 | parent = element.getparent()
106 | if parent is not None:
107 | parent.remove(element)
108 |
109 | # Serialize the HTML tree back to a string
110 | stripped_html = tostring(html_tree, encoding="unicode")
111 | # Previous cleaning produces empty spaces.
112 | # Replace multiple spaces with a single one
113 | stripped_html = re.sub(r"\s{2,}", " ", stripped_html)
114 | # Replace consecutive newlines with an empty string
115 | stripped_html = re.sub(r"\n{2,}", "", stripped_html)
116 | return stripped_html
117 |
118 |
119 | def _get_request_context(ctx: Context) -> RequestContext | None: # type: ignore[type-arg]
120 | try:
121 | return ctx.request_context
122 | except ValueError:
123 | return None
124 |
125 |
126 | def _get_default_headers() -> dict[str, str]:
127 | headers = {}
128 | if request_ctx := get_context().request_context:
129 | if client_params := request_ctx.session.client_params:
130 | client = f"oxylabs-mcp-{client_params.clientInfo.name}"
131 | else:
132 | client = "oxylabs-mcp"
133 | else:
134 | client = "oxylabs-mcp"
135 |
136 | bits, _ = architecture()
137 | sdk_type = f"{client}/{version('oxylabs-mcp')} ({python_version()}; {bits})"
138 |
139 | headers["x-oxylabs-sdk"] = sdk_type
140 |
141 | return headers
142 |
143 |
144 | class _OxylabsClientWrapper:
145 | def __init__(
146 | self,
147 | client: AsyncClient,
148 | ) -> None:
149 | self._client = client
150 | self._ctx = get_context()
151 |
152 | async def scrape(self, payload: dict[str, typing.Any]) -> dict[str, typing.Any]:
153 | await self._ctx.info(f"Create job with params: {json.dumps(payload)}")
154 |
155 | response = await self._client.post(settings.OXYLABS_SCRAPER_URL, json=payload)
156 | response_json: dict[str, typing.Any] = response.json()
157 |
158 | if response.status_code == status.HTTP_201_CREATED:
159 | await self._ctx.info(
160 | f"Job info: "
161 | f"job_id={response_json['job']['id']} "
162 | f"job_status={response_json['job']['status']}"
163 | )
164 |
165 | response.raise_for_status()
166 |
167 | return response_json
168 |
169 |
170 | def get_oxylabs_auth() -> tuple[str | None, str | None]:
171 | """Extract the Oxylabs credentials."""
172 | if settings.MCP_TRANSPORT == "streamable-http":
173 | request_headers = dict(get_context().request_context.request.headers) # type: ignore[union-attr]
174 | username = request_headers.get(USERNAME_HEADER.lower())
175 | password = request_headers.get(PASSWORD_HEADER.lower())
176 | if not username or not password:
177 | query_params = get_context().request_context.request.query_params # type: ignore[union-attr]
178 | username = query_params.get(USERNAME_QUERY_PARAM)
179 | password = query_params.get(PASSWORD_QUERY_PARAM)
180 | else:
181 | username = os.environ.get(USERNAME_ENV)
182 | password = os.environ.get(PASSWORD_ENV)
183 |
184 | return username, password
185 |
186 |
187 | def get_oxylabs_ai_studio_api_key() -> str | None:
188 | """Extract the Oxylabs AI Studio API key."""
189 | if settings.MCP_TRANSPORT == "streamable-http":
190 | request_headers = dict(get_context().request_context.request.headers) # type: ignore[union-attr]
191 | ai_studio_api_key = request_headers.get(AI_STUDIO_API_KEY_HEADER.lower())
192 | if not ai_studio_api_key:
193 | query_params = get_context().request_context.request.query_params # type: ignore[union-attr]
194 | ai_studio_api_key = query_params.get(AI_STUDIO_API_KEY_QUERY_PARAM)
195 | else:
196 | ai_studio_api_key = os.getenv(AI_STUDIO_API_KEY_ENV)
197 |
198 | return ai_studio_api_key
199 |
200 |
201 | @asynccontextmanager
202 | async def oxylabs_client() -> AsyncIterator[_OxylabsClientWrapper]:
203 | """Async context manager for Oxylabs client that is used in MCP tools."""
204 | headers = _get_default_headers()
205 |
206 | username, password = get_oxylabs_auth()
207 |
208 | if not username or not password:
209 | raise ValueError("Oxylabs username and password must be set.")
210 |
211 | auth = BasicAuth(username=username, password=password)
212 |
213 | async with AsyncClient(
214 | timeout=Timeout(settings.OXYLABS_REQUEST_TIMEOUT_S),
215 | verify=True,
216 | headers=headers,
217 | auth=auth,
218 | ) as client:
219 | try:
220 | yield _OxylabsClientWrapper(client)
221 | except HTTPStatusError as e:
222 | raise MCPServerError(
223 | f"HTTP error during POST request: {e.response.status_code} - {e.response.text}"
224 | ) from None
225 | except RequestError as e:
226 | raise MCPServerError(f"Request error during POST request: {e}") from None
227 | except Exception as e:
228 | raise MCPServerError(f"Error: {str(e) or repr(e)}") from None
229 |
230 |
231 | def get_and_verify_oxylabs_ai_studio_api_key() -> str:
232 |     """Extract and verify the Oxylabs AI Studio API key."""
233 | ai_studio_api_key = get_oxylabs_ai_studio_api_key()
234 |
235 | if ai_studio_api_key is None:
236 | msg = "AI Studio API key is not set"
237 | logger.warning(msg)
238 | raise ValueError(msg)
239 | if not is_api_key_valid(ai_studio_api_key):
240 | raise ValueError("AI Studio API key is not valid")
241 |
242 | return ai_studio_api_key
243 |
244 |
245 | def extract_links_with_text(html: str, base_url: str | None = None) -> list[str]:
246 | """Extract links with their display text from HTML.
247 |
248 | Args:
249 | html (str): The input HTML string.
250 | base_url (str | None): Base URL to use for converting relative URLs to absolute.
251 | If None, relative URLs will remain as is.
252 |
253 | Returns:
254 | list[str]: List of links in format [Display Text] URL
255 |
256 | """
257 | html_tree = fromstring(html)
258 | links = []
259 |
260 | for link in html_tree.xpath("//a[@href]"): # type: ignore[union-attr]
261 | href = link.get("href") # type: ignore[union-attr]
262 | text = link.text_content().strip() # type: ignore[union-attr]
263 |
264 |         if href:
265 | # Skip empty or whitespace-only text
266 | if not text:
267 | continue
268 |
269 | # Skip anchor links
270 | if href.startswith("#"):
271 | continue
272 |
273 | # Skip javascript links
274 | if href.startswith("javascript:"):
275 | continue
276 |
277 | # Make relative URLs absolute if base_url is provided
278 | if base_url and href.startswith("/"):
279 | # Remove trailing slash from base_url if present
280 | base = base_url.rstrip("/")
281 | href = f"{base}{href}"
282 |
283 | links.append(f"[{text}] {href}")
284 |
285 | return links
286 |
287 |
288 | def get_content(
289 | response_json: dict[str, typing.Any],
290 | *,
291 | output_format: str,
292 | parse: bool = False,
293 | ) -> str:
294 | """Extract content from response and convert to a proper format."""
295 | content = response_json["results"][0]["content"]
296 | if parse and isinstance(content, dict):
297 | return json.dumps(content)
298 | if output_format == "html":
299 | return str(content)
300 | if output_format == "links":
301 | links = extract_links_with_text(str(content))
302 | return "\n".join(links)
303 |
304 | stripped_html = clean_html(str(content))
305 | return markdownify(stripped_html) # type: ignore[no-any-return]
306 |
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/tools/ai_studio.py:
--------------------------------------------------------------------------------
```python
1 | # mypy: disable-error-code=import-untyped
2 | import json
3 | import logging
4 | from typing import Annotated, Any, Literal
5 |
6 | from fastmcp import FastMCP
7 | from mcp.types import ToolAnnotations
8 | from oxylabs_ai_studio.apps.ai_crawler import AiCrawler
9 | from oxylabs_ai_studio.apps.ai_map import AiMap
10 | from oxylabs_ai_studio.apps.ai_scraper import AiScraper
11 | from oxylabs_ai_studio.apps.ai_search import AiSearch
12 | from oxylabs_ai_studio.apps.browser_agent import BrowserAgent
13 | from pydantic import Field
14 |
15 | from oxylabs_mcp.tools.misc import setup
16 | from oxylabs_mcp.utils import get_and_verify_oxylabs_ai_studio_api_key
17 |
18 |
19 | setup()
20 | logger = logging.getLogger(__name__)
21 |
22 |
23 | AI_TOOLS = [
24 | "generate_schema",
25 | "ai_search",
26 | "ai_scraper",
27 | "ai_crawler",
28 | "ai_browser_agent",
29 | "ai_map",
30 | ]
31 |
32 |
33 | mcp = FastMCP("ai_studio")
34 |
35 |
36 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
37 | async def ai_crawler(
38 | url: Annotated[str, Field(description="The URL from which crawling will be started.")],
39 | user_prompt: Annotated[
40 | str,
41 |         Field(description="What information the user wants to extract from the domain."),
42 | ],
43 | output_format: Annotated[
44 | Literal["json", "markdown", "csv"],
45 | Field(
46 | description=(
47 | "The format of the output. If json or csv, the schema is required. "
48 | "Markdown returns full text of the page. CSV returns data in CSV format."
49 | )
50 | ),
51 | ] = "markdown",
52 | schema: Annotated[
53 | dict[str, Any] | None,
54 | Field(
55 | description="The schema to use for the crawl. Required if output_format is json or csv."
56 | ),
57 | ] = None,
58 | render_javascript: Annotated[ # noqa: FBT002
59 | bool,
60 | Field(
61 | description=(
62 | "Whether to render the HTML of the page using javascript. Much slower, "
63 | "therefore use it only for websites "
64 | "that require javascript to render the page. "
65 | "Unless user asks to use it, first try to crawl the page without it. "
66 | "If results are unsatisfactory, try to use it."
67 | )
68 | ),
69 | ] = False,
70 | return_sources_limit: Annotated[
71 | int, Field(description="The maximum number of sources to return.", le=50)
72 | ] = 25,
73 | geo_location: Annotated[
74 | str | None,
75 | Field(description="Two letter ISO country code to use for the crawl proxy."),
76 | ] = None,
77 | ) -> str:
78 |     """Tool useful for crawling a website from a starting url and returning data in a specified format.
79 |
80 |     Schema is required only if output_format is json or csv.
81 |     'render_javascript' is used to render javascript-heavy websites.
82 |     'return_sources_limit' is used to limit the number of sources to return;
83 |     for example, if you expect results from a single source, you can set it to 1.
84 | """ # noqa: E501
85 | logger.info(
86 | f"Calling ai_crawler with: {url=}, {user_prompt=}, "
87 | f"{output_format=}, {schema=}, {render_javascript=}, "
88 | f"{return_sources_limit=}"
89 | )
90 | crawler = AiCrawler(api_key=get_and_verify_oxylabs_ai_studio_api_key())
91 | result = await crawler.crawl_async(
92 | url=url,
93 | user_prompt=user_prompt,
94 | output_format=output_format,
95 | schema=schema,
96 | render_javascript=render_javascript,
97 | return_sources_limit=return_sources_limit,
98 | geo_location=geo_location,
99 | )
100 | return json.dumps({"data": result.data})
101 |
102 |
103 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
104 | async def ai_scraper(
105 | url: Annotated[str, Field(description="The URL to scrape")],
106 | output_format: Annotated[
107 | Literal["json", "markdown", "csv"],
108 | Field(
109 | description=(
110 | "The format of the output. If json or csv, the schema is required. "
111 | "Markdown returns full text of the page. CSV returns data in CSV format, "
112 |                 "suited for tabular data."
113 | )
114 | ),
115 | ] = "markdown",
116 | schema: Annotated[
117 | dict[str, Any] | None,
118 | Field(
119 | description=(
120 | "The schema to use for the scrape. Only required if output_format is json or csv."
121 | )
122 | ),
123 | ] = None,
124 | render_javascript: Annotated[ # noqa: FBT002
125 | bool,
126 | Field(
127 | description=(
128 | "Whether to render the HTML of the page using javascript. "
129 | "Much slower, therefore use it only for websites "
130 |                 "that require javascript to render the page. "
131 | "Unless user asks to use it, first try to scrape the page without it. "
132 | "If results are unsatisfactory, try to use it."
133 | )
134 | ),
135 | ] = False,
136 | geo_location: Annotated[
137 | str | None,
138 | Field(description="Two letter ISO country code to use for the scrape proxy."),
139 | ] = None,
140 | ) -> str:
141 | """Scrape the contents of the web page and return the data in the specified format.
142 |
143 | Schema is required only if output_format is json or csv.
144 | 'render_javascript' is used to render javascript heavy websites.
145 | """
146 | logger.info(
147 | f"Calling ai_scraper with: {url=}, {output_format=}, {schema=}, {render_javascript=}"
148 | )
149 | scraper = AiScraper(api_key=get_and_verify_oxylabs_ai_studio_api_key())
150 | result = await scraper.scrape_async(
151 | url=url,
152 | output_format=output_format,
153 | schema=schema,
154 | render_javascript=render_javascript,
155 | geo_location=geo_location,
156 | )
157 | return json.dumps({"data": result.data})
158 |
159 |
160 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
161 | async def ai_browser_agent(
162 | url: Annotated[str, Field(description="The URL to start the browser agent navigation from.")],
163 | task_prompt: Annotated[str, Field(description="What browser agent should do.")],
164 | output_format: Annotated[
165 | Literal["json", "markdown", "html", "csv"],
166 | Field(
167 | description=(
168 | "The output format. "
169 | "Markdown returns full text of the page including links. "
170 | "If json or csv, the schema is required."
171 | )
172 | ),
173 | ] = "markdown",
174 | schema: Annotated[
175 | dict[str, Any] | None,
176 | Field(
177 | description=(
178 | "The schema to use for the scrape. Only required if output_format is json or csv."
179 | )
180 | ),
181 | ] = None,
182 | geo_location: Annotated[
183 | str | None,
184 | Field(description="Two letter ISO country code to use for the browser proxy."),
185 | ] = None,
186 | ) -> str:
187 | """Run the browser agent and return the data in the specified format.
188 |
189 |     This tool is useful if you need to navigate around a website and perform actions.
190 |     It allows navigating to any url, clicking on links, filling forms, scrolling, etc.
191 |     Finally, it returns the data in the specified format. Schema is required only if output_format is json or csv.
192 |     'task_prompt' describes what the browser agent should achieve.
193 | """ # noqa: E501
194 | logger.info(
195 | f"Calling ai_browser_agent with: {url=}, {task_prompt=}, {output_format=}, {schema=}"
196 | )
197 | browser_agent = BrowserAgent(api_key=get_and_verify_oxylabs_ai_studio_api_key())
198 | result = await browser_agent.run_async(
199 | url=url,
200 | user_prompt=task_prompt,
201 | output_format=output_format,
202 | schema=schema,
203 | geo_location=geo_location,
204 | )
205 | data = result.data.model_dump(mode="json") if result.data else None
206 | return json.dumps({"data": data})
207 |
208 |
209 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
210 | async def ai_search(
211 | query: Annotated[str, Field(description="The query to search for.")],
212 | limit: Annotated[int, Field(description="Maximum number of results to return.", le=50)] = 10,
213 | render_javascript: Annotated[ # noqa: FBT002
214 | bool,
215 | Field(
216 | description=(
217 | "Whether to render the HTML of the page using javascript. "
218 |                 "Much slower, therefore use it only if the user asks for it. "
219 |                 "First try searching with it set to False."
220 | )
221 | ),
222 | ] = False,
223 | return_content: Annotated[ # noqa: FBT002
224 | bool,
225 | Field(description="Whether to return markdown content of the search results."),
226 | ] = False,
227 | geo_location: Annotated[
228 | str | None,
229 | Field(description="Two letter ISO country code to use for the search proxy."),
230 | ] = None,
231 | ) -> str:
232 | """Search the web based on a provided query.
233 |
234 | 'return_content' is used to return markdown content for each search result. If 'return_content'
235 | is set to True, you don't need to use ai_scraper to get the content of the search results urls,
236 | because it is already included in the search results.
237 |     If 'return_content' is set to True, prefer a lower 'limit' to reduce payload size.
238 | """ # noqa: E501
239 | logger.info(
240 | f"Calling ai_search with: {query=}, {limit=}, {render_javascript=}, {return_content=}"
241 | )
242 | search = AiSearch(api_key=get_and_verify_oxylabs_ai_studio_api_key())
243 | result = await search.search_async(
244 | query=query,
245 | limit=limit,
246 | render_javascript=render_javascript,
247 | return_content=return_content,
248 | geo_location=geo_location,
249 | )
250 | data = result.model_dump(mode="json")["data"]
251 | return json.dumps({"data": data})
252 |
253 |
254 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
255 | async def generate_schema(
256 | user_prompt: str,
257 | app_name: Literal["ai_crawler", "ai_scraper", "browser_agent"],
258 | ) -> str:
259 | """Generate a json schema in openapi format."""
260 | if app_name == "ai_crawler":
261 | crawler = AiCrawler(api_key=get_and_verify_oxylabs_ai_studio_api_key())
262 | schema = crawler.generate_schema(prompt=user_prompt)
263 | elif app_name == "ai_scraper":
264 | scraper = AiScraper(api_key=get_and_verify_oxylabs_ai_studio_api_key())
265 | schema = scraper.generate_schema(prompt=user_prompt)
266 | elif app_name == "browser_agent":
267 | browser_agent = BrowserAgent(api_key=get_and_verify_oxylabs_ai_studio_api_key())
268 | schema = browser_agent.generate_schema(prompt=user_prompt)
269 | else:
270 | raise ValueError(f"Invalid app name: {app_name}")
271 |
272 | return json.dumps({"data": schema})
273 |
274 |
275 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
276 | async def ai_map(
277 | url: Annotated[str, Field(description="The URL from which URLs mapping will be started.")],
278 | user_prompt: Annotated[
279 | str,
280 |         Field(description="What kind of URLs the user wants to find."),
281 | ],
282 | render_javascript: Annotated[ # noqa: FBT002
283 | bool,
284 | Field(
285 | description=(
286 | "Whether to render the HTML of the page using javascript. Much slower, "
287 | "therefore use it only for websites "
288 | "that require javascript to render the page. "
289 | "Unless user asks to use it, first try to crawl the page without it. "
290 | "If results are unsatisfactory, try to use it."
291 | )
292 | ),
293 | ] = False,
294 | return_sources_limit: Annotated[
295 | int, Field(description="The maximum number of sources to return.", le=50)
296 | ] = 25,
297 | geo_location: Annotated[
298 | str | None,
299 | Field(description="Two letter ISO country code to use for the mapping proxy."),
300 | ] = None,
301 | ) -> str:
302 |     """Tool useful for mapping a website's URLs.""" # noqa: E501
303 | logger.info(
304 | f"Calling ai_map with: {url=}, {user_prompt=}, "
305 | f"{render_javascript=}, "
306 | f"{return_sources_limit=}"
307 | )
308 | ai_map = AiMap(api_key=get_and_verify_oxylabs_ai_studio_api_key())
309 | result = await ai_map.map_async(
310 | url=url,
311 | user_prompt=user_prompt,
312 | render_javascript=render_javascript,
313 | return_sources_limit=return_sources_limit,
314 | geo_location=geo_location,
315 | )
316 | return json.dumps({"data": result.data})
317 |
```
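The `if/elif` chain in `generate_schema` can equally be written as a dispatch table, which keeps the valid app names in one place. The sketch below only demonstrates the pattern: `_Stub` is a hypothetical stand-in for `AiCrawler`/`AiScraper`/`BrowserAgent`, not the real AI Studio classes.

```python
import json


class _Stub:
    """Hypothetical stand-in for an AI Studio app class exposing generate_schema()."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def generate_schema(self, prompt: str) -> dict:
        return {"type": "object", "description": prompt}


# Map app names to their classes once, instead of branching per name.
SCHEMA_APPS: dict[str, type[_Stub]] = {
    "ai_crawler": _Stub,
    "ai_scraper": _Stub,
    "browser_agent": _Stub,
}


def generate_schema(user_prompt: str, app_name: str, api_key: str) -> str:
    try:
        app_cls = SCHEMA_APPS[app_name]
    except KeyError:
        raise ValueError(f"Invalid app name: {app_name}") from None
    schema = app_cls(api_key=api_key).generate_schema(prompt=user_prompt)
    return json.dumps({"data": schema})


print(generate_schema("product fields", "ai_scraper", api_key="dummy"))
```

With the real classes, `SCHEMA_APPS` would map `"ai_crawler"` to `AiCrawler` and so on; adding a new app then means adding one dict entry rather than another `elif` branch.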