# Directory Structure
```
├── .github
│   └── workflows
│       ├── lint_and_test.yml
│       └── publish_to_pypi.yml
├── .gitignore
├── CHANGELOG.md
├── Dockerfile
├── LICENSE
├── Makefile
├── pyproject.toml
├── README.md
├── server.json
├── smithery.yaml
├── src
│   └── oxylabs_mcp
│       ├── __init__.py
│       ├── config.py
│       ├── exceptions.py
│       ├── tools
│       │   ├── __init__.py
│       │   ├── ai_studio.py
│       │   ├── misc.py
│       │   └── scraper.py
│       ├── url_params.py
│       └── utils.py
├── tests
│   ├── __init__.py
│   ├── conftest.py
│   ├── e2e
│   │   ├── __init__.py
│   │   ├── conftest.py
│   │   ├── example.env
│   │   ├── test_call_tools.py
│   │   └── test_llm_agent.py
│   ├── integration
│   │   ├── __init__.py
│   │   ├── params.py
│   │   ├── test_ai_studio_tools.py
│   │   ├── test_scraper_tools.py
│   │   └── test_server.py
│   ├── unit
│   │   ├── __init__.py
│   │   ├── fixtures
│   │   │   ├── __init__.py
│   │   │   ├── after_strip.html
│   │   │   ├── before_strip.html
│   │   │   └── with_links.html
│   │   └── test_utils.py
│   └── utils.py
└── uv.lock
```
# Files
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | share/python-wheels/
24 | *.egg-info/
25 | .installed.cfg
26 | *.egg
27 | MANIFEST
28 |
29 | # PyInstaller
30 | # Usually these files are written by a python script from a template
31 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
32 | *.manifest
33 | *.spec
34 |
35 | # Installer logs
36 | pip-log.txt
37 | pip-delete-this-directory.txt
38 |
39 | # Unit test / coverage reports
40 | htmlcov/
41 | .tox/
42 | .nox/
43 | .coverage
44 | .coverage.*
45 | .cache
46 | nosetests.xml
47 | coverage.xml
48 | *.cover
49 | *.py,cover
50 | .hypothesis/
51 | .pytest_cache/
52 | cover/
53 |
54 | # Translations
55 | *.mo
56 | *.pot
57 |
58 | # PyBuilder
59 | .pybuilder/
60 | target/
61 |
62 | # Jupyter Notebook
63 | .ipynb_checkpoints
64 |
65 | # IPython
66 | profile_default/
67 | ipython_config.py
68 |
69 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
70 | __pypackages__/
71 |
72 | # Environments
73 | .env
74 | tests/e2e/.env
75 | .venv
76 | env/
77 | venv/
78 | ENV/
79 | env.bak/
80 | venv.bak/
81 | .envrc
82 |
83 | # Rope project settings
84 | .ropeproject
85 |
86 | # mypy
87 | .mypy_cache/
88 | .dmypy.json
89 | dmypy.json
90 |
91 | # pytype static type analyzer
92 | .pytype/
93 |
94 | # Cython debug symbols
95 | cython_debug/
96 |
97 | # PyCharm
98 | .idea/
99 |
100 | # Ruff stuff:
101 | .ruff_cache/
102 |
103 | # PyPI configuration file
104 | .pypirc
105 |
106 | deployment/charts/
107 |
108 | .mcpregistry_*
109 |
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
1 | <p align="center">
2 | <img src="https://storage.googleapis.com/oxylabs-public-assets/oxylabs_mcp.svg" alt="Oxylabs + MCP">
3 | </p>
4 | <h1 align="center" style="border-bottom: none;">
5 | Oxylabs MCP Server
6 | </h1>
7 |
8 | <p align="center">
9 | <em>The missing link between AI models and the real‑world web: one API that delivers clean, structured data from any site.</em>
10 | </p>
11 |
12 | <div align="center">
13 |
14 | [](https://smithery.ai/server/@oxylabs/oxylabs-mcp)
15 | [](https://pypi.org/project/oxylabs-mcp/)
16 | [](https://discord.gg/Pds3gBmKMH)
17 | [](LICENSE)
18 | [](https://mseep.ai/app/f6a9c0bc-83a6-4f78-89d9-f2cec4ece98d)
19 | 
20 |
21 | <br/>
22 | <a href="https://glama.ai/mcp/servers/@oxylabs/oxylabs-mcp">
23 | <img width="380" height="200" src="https://glama.ai/mcp/servers/@oxylabs/oxylabs-mcp/badge" alt="Oxylabs Server MCP server" />
24 | </a>
25 |
26 | </div>
27 |
28 | ---
29 |
30 | ## 📖 Overview
31 |
32 | The Oxylabs MCP server provides a bridge between AI models and the web. It enables them to scrape any URL, render JavaScript-heavy pages, extract and format content for AI use, bypass anti-scraping measures, and access geo-restricted web data from 195+ countries.
33 |
34 |
35 | ## 🛠️ MCP Tools
36 |
37 | Oxylabs MCP provides two sets of tools that can be used together or independently:
38 |
39 | ### Oxylabs Web Scraper API Tools
40 | 1. **universal_scraper**: Uses Oxylabs Web Scraper API for general website scraping;
41 | 2. **google_search_scraper**: Uses Oxylabs Web Scraper API to extract results from Google Search;
42 | 3. **amazon_search_scraper**: Uses Oxylabs Web Scraper API to scrape Amazon search result pages;
43 | 4. **amazon_product_scraper**: Uses Oxylabs Web Scraper API to extract data from individual Amazon product pages.
44 |
45 | ### Oxylabs AI Studio Tools
46 |
47 | 5. **ai_scraper**: Scrape content from any URL in JSON or Markdown format with AI-powered data extraction;
48 | 6. **ai_crawler**: Based on a prompt, crawls a website and collects data in Markdown or JSON format across multiple pages;
49 | 7. **ai_browser_agent**: Based on a prompt, controls a browser and returns data in Markdown, JSON, HTML, or screenshot formats;
50 | 8. **ai_search**: Search the web for URLs and their contents with AI-powered content extraction.
51 |
52 |
53 | ## ✅ Prerequisites
54 |
55 | Before you begin, make sure you have **at least one** of the following:
56 |
57 | - **Oxylabs Web Scraper API Account**: Obtain your username and password from [Oxylabs](https://dashboard.oxylabs.io/) (1-week free trial available);
58 | - **Oxylabs AI Studio API Key**: Obtain your API key from [Oxylabs AI Studio](https://aistudio.oxylabs.io/settings/api-key). (1000 credits free).
59 |
60 | ## 📦 Configuration
61 |
62 | ### Environment variables
63 |
64 | Oxylabs MCP server supports the following environment variables:
65 | | Name | Description | Default |
66 | |----------------------------|-----------------------------------------------|---------|
67 | | `OXYLABS_USERNAME` | Your Oxylabs Web Scraper API username | |
68 | | `OXYLABS_PASSWORD` | Your Oxylabs Web Scraper API password | |
69 | | `OXYLABS_AI_STUDIO_API_KEY`| Your Oxylabs AI Studio API key | |
70 | | `LOG_LEVEL` | Log level for the logs returned to the client | `INFO` |
71 |
72 | Based on the provided credentials, the server will automatically expose the corresponding tools:
73 | - If only `OXYLABS_USERNAME` and `OXYLABS_PASSWORD` are provided, the server will expose the Web Scraper API tools;
74 | - If only `OXYLABS_AI_STUDIO_API_KEY` is provided, the server will expose the AI Studio tools;
75 | - If both the Web Scraper API credentials and `OXYLABS_AI_STUDIO_API_KEY` are provided, the server will expose all tools.
76 |
77 | ❗❗❗ **Important note: if you don't have Web Scraper API or Oxylabs AI Studio credentials, delete the corresponding environment variable placeholders.
78 | Leaving placeholder values in place will expose tools that do not work.**
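For example, a Web Scraper API-only setup keeps just the username and password entries (the AI Studio key placeholder is removed):

    {
      "mcpServers": {
        "oxylabs": {
          "command": "uvx",
          "args": ["oxylabs-mcp"],
          "env": {
            "OXYLABS_USERNAME": "OXYLABS_USERNAME",
            "OXYLABS_PASSWORD": "OXYLABS_PASSWORD"
          }
        }
      }
    }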
79 |
80 |
81 |
82 | ### Configure with uvx
83 |
84 | - Install uv (the package manager that provides the `uvx` command):
85 | ```bash
86 | # macOS and Linux
87 | curl -LsSf https://astral.sh/uv/install.sh | sh
88 | ```
89 | OR:
90 | ```bash
91 | # Windows
92 | powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
93 | ```
94 | - Use the following config:
95 | ```json
96 | {
97 | "mcpServers": {
98 | "oxylabs": {
99 | "command": "uvx",
100 | "args": ["oxylabs-mcp"],
101 | "env": {
102 | "OXYLABS_USERNAME": "OXYLABS_USERNAME",
103 | "OXYLABS_PASSWORD": "OXYLABS_PASSWORD",
104 | "OXYLABS_AI_STUDIO_API_KEY": "OXYLABS_AI_STUDIO_API_KEY"
105 | }
106 | }
107 | }
108 | }
109 | ```
110 |
111 | ### Configure with uv
112 |
113 | - Install the uv package manager:
114 | ```bash
115 | # macOS and Linux
116 | curl -LsSf https://astral.sh/uv/install.sh | sh
117 | ```
118 | OR:
119 | ```bash
120 | # Windows
121 | powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
122 | ```
123 |
124 | - Use the following config:
125 | ```json
126 | {
127 | "mcpServers": {
128 | "oxylabs": {
129 | "command": "uv",
130 | "args": [
131 | "--directory",
132 | "/<Absolute-path-to-folder>/oxylabs-mcp",
133 | "run",
134 | "oxylabs-mcp"
135 | ],
136 | "env": {
137 | "OXYLABS_USERNAME": "OXYLABS_USERNAME",
138 | "OXYLABS_PASSWORD": "OXYLABS_PASSWORD",
139 | "OXYLABS_AI_STUDIO_API_KEY": "OXYLABS_AI_STUDIO_API_KEY"
140 | }
141 | }
142 | }
143 | }
144 | ```
145 |
146 | ### Configure with Smithery OAuth2
147 |
148 | - Go to https://smithery.ai/server/@oxylabs/oxylabs-mcp;
149 | - Click _Auto_ to install the Oxylabs MCP configuration for the respective client;
150 | - OR use the following config:
151 | ```json
152 | {
153 | "mcpServers": {
154 | "oxylabs": {
155 | "url": "https://server.smithery.ai/@oxylabs/oxylabs-mcp/mcp"
156 | }
157 | }
158 | }
159 | ```
160 | - Follow the instructions to authenticate Oxylabs MCP with the OAuth2 flow.
161 |
162 | ### Configure with Smithery query parameters
163 |
164 | If your client does not support OAuth2 authentication, you can pass the Oxylabs authentication parameters directly in the URL:
165 | ```json
166 | {
167 | "mcpServers": {
168 | "oxylabs": {
169 | "url": "https://server.smithery.ai/@oxylabs/oxylabs-mcp/mcp?oxylabsUsername=OXYLABS_USERNAME&oxylabsPassword=OXYLABS_PASSWORD&oxylabsAiStudioApiKey=OXYLABS_AI_STUDIO_API_KEY"
170 | }
171 | }
172 | }
173 | ```
174 |
175 | ### Manual Setup with Claude Desktop
176 |
177 | Navigate to **Claude → Settings → Developer → Edit Config** and add one of the configurations above to the `claude_desktop_config.json` file.
178 |
179 | ### Manual Setup with Cursor AI
180 |
181 | Navigate to **Cursor → Settings → Cursor Settings → MCP**. Click **Add new global MCP server** and add one of the configurations above.
182 |
183 |
184 |
185 | ## 📝 Logging
186 |
187 | The server provides additional information about tool calls in `notifications/message` events:
188 |
189 | ```json
190 | {
191 | "method": "notifications/message",
192 | "params": {
193 | "level": "info",
194 | "data": "Create job with params: {\"url\": \"https://ip.oxylabs.io\"}"
195 | }
196 | }
197 | ```
198 |
199 | ```json
200 | {
201 | "method": "notifications/message",
202 | "params": {
203 | "level": "info",
204 | "data": "Job info: job_id=7333113830223918081 job_status=done"
205 | }
206 | }
207 | ```
208 |
209 | ```json
210 | {
211 | "method": "notifications/message",
212 | "params": {
213 | "level": "error",
214 | "data": "Error: request to Oxylabs API failed"
215 | }
216 | }
217 | ```
218 |
219 | ---
220 |
221 | ## 🛡️ License
222 |
223 | Distributed under the MIT License – see [LICENSE](LICENSE) for details.
224 |
225 | ---
226 |
227 | ## About Oxylabs
228 |
229 | Established in 2015, Oxylabs is a market-leading web intelligence collection
230 | platform, driven by the highest business, ethics, and compliance standards,
231 | enabling companies worldwide to unlock data-driven insights.
232 |
233 | [](https://oxylabs.io/)
234 |
235 | <div align="center">
236 | <sub>
237 | Made with ☕ by <a href="https://oxylabs.io">Oxylabs</a>. Feel free to give us a ⭐ if MCP saved you a weekend.
238 | </sub>
239 | </div>
240 |
241 |
242 | ## ✨ Key Features
243 |
244 | <details>
245 | <summary><strong> Scrape content from any site</strong></summary>
246 | <br>
247 |
248 | - Extract data from any URL, including complex single-page applications
249 | - Fully render dynamic websites using headless browser support
250 | - Choose full JavaScript rendering, HTML-only, or none
251 | - Emulate Mobile and Desktop viewports for realistic rendering
252 |
253 | </details>
254 |
255 | <details>
256 | <summary><strong> Automatically get AI-ready data</strong></summary>
257 | <br>
258 |
259 | - Automatically clean and convert HTML to Markdown for improved readability
260 | - Use automated parsers for popular targets like Google, Amazon, and more
261 |
262 | </details>
263 |
264 | <details>
265 | <summary><strong> Bypass blocks & geo-restrictions</strong></summary>
266 | <br>
267 |
268 | - Bypass sophisticated bot protection systems with high success rate
269 | - Reliably scrape even the most complex websites
270 | - Get automatically rotating IPs from a proxy pool covering 195+ countries
271 |
272 | </details>
273 |
274 | <details>
275 | <summary><strong> Flexible setup & cross-platform support</strong></summary>
276 | <br>
277 |
278 | - Set rendering and parsing options if needed
279 | - Feed data directly into AI models or analytics tools
280 | - Works on macOS, Windows, and Linux
281 |
282 | </details>
283 |
284 | <details>
285 | <summary><strong> Built-in error handling and request management</strong></summary>
286 | <br>
287 |
288 | - Comprehensive error handling and reporting
289 | - Smart rate limiting and request management
290 |
291 | </details>
292 |
293 | ---
294 |
295 |
296 | ## Why Oxylabs MCP? 🕸️ ➜ 📦 ➜ 🤖
297 |
298 | Imagine telling your LLM *"Summarise the latest Hacker News discussion about GPT‑5"* – and it simply answers.
299 | MCP (Model Context Protocol) makes that happen by doing the boring parts for you:
300 |
301 | | What Oxylabs MCP does | Why it matters to you |
302 | |-------------------------------------------------------------------|------------------------------------------|
303 | | **Bypasses anti‑bot walls** with the Oxylabs global proxy network | Keeps you unblocked and anonymous |
304 | | **Renders JavaScript** in headless Chrome | Single‑page apps, sorted |
305 | | **Cleans HTML → JSON** | Drop straight into vector DBs or prompts |
306 | | **Optional structured parsers** (Google, Amazon, etc.) | One‑line access to popular targets |
307 |
308 | mcp-name: io.github.oxylabs/oxylabs-mcp
309 |
```
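The credential-driven tool exposure described in the README's Configuration section can be modeled as a small pure function. This is a simplified sketch, not the server's actual implementation (tool names are taken from the README's tool list; the real filtering lives in `src/oxylabs_mcp/__init__.py`):

```python
SCRAPER_TOOLS = [
    "universal_scraper",
    "google_search_scraper",
    "amazon_search_scraper",
    "amazon_product_scraper",
]
AI_TOOLS = ["ai_scraper", "ai_crawler", "ai_browser_agent", "ai_search"]


def exposed_tools(env: dict[str, str]) -> list[str]:
    """Return the tool names the server would expose for the given env vars."""
    tools: list[str] = []
    # Web Scraper API tools require both username and password.
    if env.get("OXYLABS_USERNAME") and env.get("OXYLABS_PASSWORD"):
        tools += SCRAPER_TOOLS
    # AI Studio tools require only the API key.
    if env.get("OXYLABS_AI_STUDIO_API_KEY"):
        tools += AI_TOOLS
    return tools
```

With only `OXYLABS_AI_STUDIO_API_KEY` set, `exposed_tools` returns just the four AI Studio tools, matching the behavior documented above.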
--------------------------------------------------------------------------------
/src/oxylabs_mcp/tools/__init__.py:
--------------------------------------------------------------------------------
```python
1 |
```
--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
```python
1 |
```
--------------------------------------------------------------------------------
/tests/e2e/__init__.py:
--------------------------------------------------------------------------------
```python
1 |
```
--------------------------------------------------------------------------------
/tests/integration/__init__.py:
--------------------------------------------------------------------------------
```python
1 |
```
--------------------------------------------------------------------------------
/tests/unit/__init__.py:
--------------------------------------------------------------------------------
```python
1 |
```
--------------------------------------------------------------------------------
/tests/unit/fixtures/__init__.py:
--------------------------------------------------------------------------------
```python
1 |
```
--------------------------------------------------------------------------------
/tests/e2e/conftest.py:
--------------------------------------------------------------------------------
```python
1 | import dotenv
2 | import pytest
3 |
4 |
5 | dotenv.load_dotenv()
6 |
7 |
8 | @pytest.fixture(scope="session", autouse=True)
9 | def environment():
10 | pass
11 |
```
--------------------------------------------------------------------------------
/tests/unit/fixtures/after_strip.html:
--------------------------------------------------------------------------------
```html
1 | <html> <body> <div class="content"> <p>Welcome to my website</p> </div> <div class="other"> <p>Visible content</p> </div> </body>
2 | </html>
```
--------------------------------------------------------------------------------
/tests/e2e/example.env:
--------------------------------------------------------------------------------
```
1 | # Oxylabs Settings
2 | OXYLABS_USERNAME=
3 | OXYLABS_PASSWORD=
4 |
5 | # LLM Providers
6 | ANTHROPIC_API_KEY=
7 | GOOGLE_API_KEY=
8 | OPENAI_API_KEY=
9 |
10 | # Misc
11 | LOCAL_OXYLABS_MCP_DIRECTORY=
12 |
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/tools/misc.py:
--------------------------------------------------------------------------------
```python
1 | # mypy: disable-error-code=import-untyped
2 | from oxylabs_ai_studio import client
3 |
4 |
5 | def setup() -> None:
6 | """Setups the environment."""
7 | client._UA_API = "py-mcp"
8 |
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/exceptions.py:
--------------------------------------------------------------------------------
```python
1 | from fastmcp.server.dependencies import get_context
2 |
3 |
4 | class MCPServerError(Exception):
5 | """Generic MCP server exception."""
6 |
7 | async def process(self) -> str:
8 | """Process exception."""
9 | err = str(self)
10 | await get_context().error(err)
11 | return err
12 |
```
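`MCPServerError.process` reports the error back to the client, where it surfaces as a `notifications/message` event like the ones shown in the README's Logging section. A hypothetical client-side helper (the function name is illustrative; the payload shape follows the README examples) that renders such an event as a log line:

```python
def format_log_event(message: dict) -> str:
    """Render a notifications/message payload as a one-line log entry."""
    params = message.get("params", {})
    # Default to "info" when the level is absent, as in MCP logging events.
    level = str(params.get("level", "info")).upper()
    return f"[{level}] {params.get('data', '')}"
```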
--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
```markdown
1 | # Changelog
2 |
3 | ## [0.2.2] - 2025-05-23
4 |
5 | ### Fixed
6 |
7 | - Coverage badge
8 |
9 | ## [0.2.1] - 2025-05-23
10 |
11 | ### Added
12 |
13 | - More tests
14 |
15 | ### Changed
16 |
17 | - README.md
18 |
19 | ## [0.2.0] - 2025-05-13
20 |
21 | ### Added
22 |
23 | - Changelog
24 | - E2E tests
25 | - Geolocation and User Agent type parameters to universal scraper
26 |
27 | ### Changed
28 |
29 | - Descriptions for tools
30 | - Descriptions for tool parameters
31 | - Default values for tool parameters
32 |
33 | ### Removed
34 |
35 | - WebUnblocker tool
36 | - Parse parameter for universal scraper
37 |
```
--------------------------------------------------------------------------------
/tests/unit/fixtures/before_strip.html:
--------------------------------------------------------------------------------
```html
1 | <html>
2 | <body>
3 | <div class="content">
4 | <p>Welcome to my website</p>
5 | </div>
6 | <div id="footer">
7 | <p>This is the footer content.</p>
8 | </div>
9 | <div class="hidden">
10 | <p>This content is hidden.</p>
11 | </div>
12 | <div class="other">
13 | <p>Visible content</p>
14 | </div>
15 | <script>console.log('script tag');</script>
16 | <noscript>This is noscript content.</noscript>
17 | <form><input type="text"/></form>
18 | </body>
19 | </html>
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/config.py:
--------------------------------------------------------------------------------
```python
1 | from typing import Literal
2 |
3 | from dotenv import load_dotenv
4 | from pydantic_settings import BaseSettings
5 |
6 |
7 | load_dotenv()
8 |
9 |
10 | class Settings(BaseSettings):
11 | """Project settings."""
12 |
13 | OXYLABS_SCRAPER_URL: str = "https://realtime.oxylabs.io/v1/queries"
14 | OXYLABS_REQUEST_TIMEOUT_S: int = 100
15 | LOG_LEVEL: str = "INFO"
16 |
17 | MCP_TRANSPORT: Literal["stdio", "sse", "streamable-http"] = "stdio"
18 | MCP_PORT: int = 8000
19 | MCP_HOST: str = "localhost"
20 | MCP_STATELESS_HTTP: bool = False
21 |
22 | # smithery config
23 | PORT: int | None = None
24 |
25 |
26 | settings = Settings()
27 |
```
--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------
```yaml
1 | runtime: "container"
2 | build:
3 | dockerfile: "Dockerfile"
4 | dockerBuildPath: "."
5 | startCommand:
6 | type: "http"
7 | configSchema:
8 | type: "object"
9 | properties:
10 | oxylabsUsername:
11 | type: "string"
12 | description: "Oxylabs username"
13 | oxylabsPassword:
14 | type: "string"
15 | description: "Oxylabs password"
16 | oxylabsAiStudioApiKey:
17 | type: "string"
18 | description: "Oxylabs AI Studio api key"
19 | required: []
20 | exampleConfig:
21 | oxylabsUsername: "Your Oxylabs username"
22 | oxylabsPassword: "Your Oxylabs password"
23 | oxylabsAiStudioApiKey: "Your Oxylabs AI Studio api key"
24 |
```
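The `configSchema` property names above are the same ones the README passes as Smithery URL query parameters. Since credentials can contain characters that are unsafe in URLs, a client building that URL by hand should percent-encode them; a stdlib-only sketch (the helper name is illustrative):

```python
from urllib.parse import urlencode

BASE_URL = "https://server.smithery.ai/@oxylabs/oxylabs-mcp/mcp"


def smithery_url(username: str, password: str, ai_studio_api_key: str = "") -> str:
    """Build the Smithery MCP URL with percent-encoded Oxylabs credentials."""
    params = {"oxylabsUsername": username, "oxylabsPassword": password}
    if ai_studio_api_key:
        params["oxylabsAiStudioApiKey"] = ai_studio_api_key
    # urlencode percent-encodes each value, so symbols in passwords are safe.
    return f"{BASE_URL}?{urlencode(params)}"
```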
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
```dockerfile
1 | FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim
2 |
3 | ENV UV_COMPILE_BYTECODE=1
4 | ENV UV_LINK_MODE=copy
5 | ENV UV_CACHE_DIR=/opt/uv-cache/
6 |
7 | RUN apt-get update && apt-get install -y --no-install-recommends git
8 |
9 | WORKDIR /app
10 |
11 | RUN --mount=type=cache,target=/opt/uv-cache/ \
12 | --mount=type=bind,source=uv.lock,target=uv.lock \
13 | --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
14 | uv sync --frozen --no-install-project --no-dev --no-editable
15 |
16 | ADD . /app
17 |
18 | RUN --mount=type=cache,target=/opt/uv-cache/ \
19 | uv sync --frozen --no-dev --no-editable
20 |
21 | # Add virtual environment to PATH
22 | ENV PATH="/app/.venv/bin:$PATH"
23 | ENV MCP_TRANSPORT="streamable-http"
24 |
25 | ENTRYPOINT ["oxylabs-mcp"]
26 |
```
--------------------------------------------------------------------------------
/tests/utils.py:
--------------------------------------------------------------------------------
```python
1 | def convert_context_params(arguments: dict) -> dict:
2 | context_fields = ["category_id", "merchant_id", "currency", "autoselect_variant"]
3 | arguments_copy = {**arguments}
4 |
5 | for f in context_fields:
6 | if f in arguments_copy:
7 | if "context" not in arguments_copy:
8 | arguments_copy["context"] = []
9 |
10 | arguments_copy["context"].append({"key": f, "value": arguments_copy[f]})
11 | del arguments_copy[f]
12 |
13 | return arguments_copy
14 |
15 |
16 | def prepare_expected_arguments(arguments: dict) -> dict:
17 | arguments_copy = {**arguments}
18 | if "output_format" in arguments_copy:
19 | del arguments_copy["output_format"]
20 | return arguments_copy
21 |
```
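For reference, `convert_context_params` moves the Amazon-specific fields into a `context` list of key/value objects, mirroring the Web Scraper API payload shape. A self-contained usage sketch (the sample arguments are illustrative):

```python
def convert_context_params(arguments: dict) -> dict:
    # Same logic as in tests/utils.py above.
    context_fields = ["category_id", "merchant_id", "currency", "autoselect_variant"]
    arguments_copy = {**arguments}

    for f in context_fields:
        if f in arguments_copy:
            if "context" not in arguments_copy:
                arguments_copy["context"] = []
            # Each matched field becomes a {"key": ..., "value": ...} entry.
            arguments_copy["context"].append({"key": f, "value": arguments_copy[f]})
            del arguments_copy[f]

    return arguments_copy


result = convert_context_params({"query": "laptop", "currency": "USD", "merchant_id": "123"})
```

`result` keeps `query` untouched and replaces `currency` and `merchant_id` with entries in `context`, ordered as in `context_fields`.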
--------------------------------------------------------------------------------
/.github/workflows/publish_to_pypi.yml:
--------------------------------------------------------------------------------
```yaml
1 | name: Publish Python 🐍 distributions 📦 to PyPI
2 |
3 | on:
4 | push:
5 | tags:
6 | - 'v[0-9]+.[0-9]+.[0-9]+'
7 | jobs:
8 | build-n-publish:
9 | name: Build and publish Python distribution to PyPI
10 | runs-on: ubuntu-latest
11 | steps:
12 | - uses: actions/checkout@v4
13 |
14 | - name: Set up Python
15 | uses: actions/setup-python@v2
16 | with:
17 | python-version: 3.12
18 |
19 | - name: Install uv
20 | run: |
21 | pip install uv
22 |
23 | - name: Install dependencies
24 | run: |
25 | uv sync --no-dev
26 |
27 | - name: Build a dist package
28 | run: uv build
29 |
30 | - name: Publish distribution to PyPI
31 | uses: pypa/gh-action-pypi-publish@release/v1
32 | with:
33 | user: __token__
34 | password: ${{ secrets.PYPI_API_TOKEN }}
35 |
```
--------------------------------------------------------------------------------
/server.json:
--------------------------------------------------------------------------------
```json
1 | {
2 | "$schema": "https://static.modelcontextprotocol.io/schemas/2025-10-17/server.schema.json",
3 | "name": "io.github.oxylabs/oxylabs-mcp",
4 | "description": "Fetch and process content from specified URLs & sources using the Oxylabs Web Scraper API.",
5 | "repository": {
6 | "url": "https://github.com/oxylabs/oxylabs-mcp",
7 | "source": "github"
8 | },
9 | "version": "0.7.1",
10 | "packages": [
11 | {
12 | "registryType": "pypi",
13 | "identifier": "oxylabs-mcp",
14 | "version": "0.7.1",
15 | "transport": {
16 | "type": "stdio"
17 | },
18 | "environmentVariables": [
19 | {
20 | "description": "Your Oxylabs username",
21 | "isRequired": false,
22 | "format": "string",
23 | "isSecret": true,
24 | "name": "OXYLABS_USERNAME"
25 | },
26 | {
27 | "description": "Your Oxylabs password",
28 | "isRequired": false,
29 | "format": "string",
30 | "isSecret": true,
31 | "name": "OXYLABS_PASSWORD"
32 | },
33 | {
34 | "description": "Your Oxylabs AI Studio api key",
35 | "isRequired": false,
36 | "format": "string",
37 | "isSecret": true,
38 | "name": "OXYLABS_AI_STUDIO_API_KEY"
39 | }
40 | ]
41 | }
42 | ]
43 | }
```
--------------------------------------------------------------------------------
/tests/unit/fixtures/with_links.html:
--------------------------------------------------------------------------------
```html
1 | <!doctype html>
2 | <html>
3 | <head>
4 | <title>Example Domain</title>
5 |
6 | <meta charset="utf-8" />
7 | <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
8 | <meta name="viewport" content="width=device-width, initial-scale=1" />
9 | <style type="text/css">
10 | body {
11 | background-color: #f0f0f2;
12 | margin: 0;
13 | padding: 0;
14 | font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
15 |
16 | }
17 | div {
18 | width: 600px;
19 | margin: 5em auto;
20 | padding: 2em;
21 | background-color: #fdfdff;
22 | border-radius: 0.5em;
23 | box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
24 | }
25 | a:link, a:visited {
26 | color: #38488f;
27 | text-decoration: none;
28 | }
29 | @media (max-width: 700px) {
30 | div {
31 | margin: 0 auto;
32 | width: auto;
33 | }
34 | }
35 | </style>
36 | </head>
37 |
38 | <body>
39 | <div>
40 | <h1>Example Domain</h1>
41 | <p>This domain is for use in illustrative examples in documents. You may use this
42 | domain in literature without prior coordination or asking for permission.</p>
43 | <p><a href="https://www.iana.org/domains/example">More information...</a></p>
44 | <p><a href="https://example.com">Another link</a></p>
45 | </div>
46 | </body>
47 | </html>
48 |
```
--------------------------------------------------------------------------------
/tests/e2e/test_call_tools.py:
--------------------------------------------------------------------------------
```python
1 | import os
2 | from contextlib import asynccontextmanager
3 |
4 | import pytest
5 | from mcp import ClientSession, StdioServerParameters
6 | from mcp.client.stdio import stdio_client
7 |
8 |
9 | @asynccontextmanager
10 | async def get_oxylabs_mcp_client():
11 | server_params = StdioServerParameters(
12 | command="uv", # Using uv to run the server
13 | args=["run", "oxylabs-mcp"],
14 | env={
15 | "OXYLABS_USERNAME": os.getenv("OXYLABS_USERNAME"),
16 | "OXYLABS_PASSWORD": os.getenv("OXYLABS_PASSWORD"),
17 | },
18 | cwd=os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))),
19 | )
20 |
21 | async with stdio_client(server_params) as (read, write):
22 | async with ClientSession(read, write) as session:
23 | await session.initialize()
24 | yield session
25 |
26 |
27 | @pytest.mark.asyncio
28 | @pytest.mark.parametrize(
29 | ("url", "min_response_len"),
30 | [
31 | (
32 | "https://maisonpur.com/best-non-toxic-cutting-boards-safer-options-for-a-healthy-kitchen/",
33 | 10000,
34 | ),
35 | ("https://sandbox.oxylabs.io/products/1", 2500),
36 | ("https://sandbox.oxylabs.io/products/5", 3000),
37 | ],
38 | )
39 | async def test_universal_scraper_tool(url: str, min_response_len: int):
40 | async with get_oxylabs_mcp_client() as session:
41 | result = await session.call_tool("universal_scraper", arguments={"url": url})
42 | assert len(result.content[0].text) > min_response_len
43 |
```
--------------------------------------------------------------------------------
/.github/workflows/lint_and_test.yml:
--------------------------------------------------------------------------------
```yaml
1 | name: Lint & Test
2 |
3 | on:
4 | push:
5 | branches: [ "main" ]
6 | pull_request:
7 | branches: [ "main" ]
8 |
9 | permissions:
10 | contents: write
11 |
12 | jobs:
13 | lint_and_test:
14 |
15 | runs-on: ubuntu-latest
16 |
17 | steps:
18 | - uses: actions/checkout@v4
19 | - name: Set up Python 3.12
20 | uses: actions/setup-python@v3
21 | with:
22 | python-version: "3.12"
23 |
24 | - name: Install uv
25 | run: |
26 | pip install uv
27 |
28 | - name: Install dependencies
29 | run: |
30 | uv sync
31 |
32 | - name: Run linters
33 | run: |
34 | uv run black --check .
35 | uv run mypy src
36 | uv run ruff check .
37 |
38 | - name: Run tests
39 | run: |
40 | uv run pytest --cov=src --cov-report xml --cov-report term --cov-fail-under=90 tests/unit tests/integration
41 |
42 | - name: Generate coverage badge
43 | run: |
44 | pip install "genbadge[coverage]"
45 | genbadge coverage -i coverage.xml -o coverage-badge.svg
46 |
47 | - name: Upload coverage report artifact
48 | uses: actions/upload-artifact@v4
49 | with:
50 | name: coverage-report
51 | path: coverage.xml
52 |
53 | - name: Upload coverage badge artifact
54 | uses: actions/upload-artifact@v4
55 | with:
56 | name: coverage-badge
57 | path: coverage-badge.svg
58 |
59 | - name: Deploy coverage report to branch
60 | if: github.ref == 'refs/heads/main'
61 | uses: peaceiris/actions-gh-pages@v4
62 | with:
63 | publish_branch: 'coverage'
64 | github_token: ${{ secrets.GITHUB_TOKEN }}
65 | publish_dir: .
66 | keep_files: coverage-badge.svg
67 | user_name: 'github-actions[bot]'
68 | user_email: 'github-actions[bot]@users.noreply.github.com'
69 | commit_message: 'chore: Update coverage data from workflow run ${{ github.event.workflow_run.id }}'
70 |
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/__init__.py:
--------------------------------------------------------------------------------
```python
1 | import logging
2 | from typing import Any
3 |
4 | from fastmcp import Context, FastMCP
5 | from mcp import Tool as MCPTool
6 |
7 | from oxylabs_mcp.config import settings
8 | from oxylabs_mcp.tools.ai_studio import AI_TOOLS
9 | from oxylabs_mcp.tools.ai_studio import mcp as ai_studio_mcp
10 | from oxylabs_mcp.tools.scraper import SCRAPER_TOOLS
11 | from oxylabs_mcp.tools.scraper import mcp as scraper_mcp
12 | from oxylabs_mcp.utils import get_oxylabs_ai_studio_api_key, get_oxylabs_auth
13 |
14 |
15 | class OxylabsMCPServer(FastMCP):
16 | """Oxylabs MCP server."""
17 |
18 | async def _mcp_list_tools(self) -> list[MCPTool]:
19 | """List all available Oxylabs tools."""
20 | async with Context(fastmcp=self):
21 | tools = await self._list_tools()
22 |
23 | username, password = get_oxylabs_auth()
24 | if not username or not password:
25 | tools = [tool for tool in tools if tool.name not in SCRAPER_TOOLS]
26 |
27 | if not get_oxylabs_ai_studio_api_key():
28 | tools = [tool for tool in tools if tool.name not in AI_TOOLS]
29 |
30 | return [
31 | tool.to_mcp_tool(
32 | name=tool.key,
33 | include_fastmcp_meta=self.include_fastmcp_meta,
34 | )
35 | for tool in tools
36 | ]
37 |
38 |
39 | mcp = OxylabsMCPServer("oxylabs_mcp")
40 |
41 | mcp.mount(ai_studio_mcp)
42 | mcp.mount(scraper_mcp)
43 |
44 |
45 | def main() -> None:
46 | """Start the MCP server."""
47 | logging.getLogger("oxylabs_mcp").setLevel(settings.LOG_LEVEL)
48 |
49 | params: dict[str, Any] = {}
50 |
51 | if settings.MCP_TRANSPORT == "streamable-http":
52 | params["host"] = settings.MCP_HOST
53 | params["port"] = settings.PORT or settings.MCP_PORT
54 | params["log_level"] = settings.LOG_LEVEL
55 | params["stateless_http"] = settings.MCP_STATELESS_HTTP
56 |
57 | mcp.run(
58 | settings.MCP_TRANSPORT,
59 | **params,
60 | )
61 |
62 |
63 | # Optionally expose other important items at package level
64 | __all__ = ["main", "mcp"]
65 |
```
--------------------------------------------------------------------------------
/tests/unit/test_utils.py:
--------------------------------------------------------------------------------
```python
1 | from unittest.mock import patch
2 |
3 | import pytest
4 |
5 | from oxylabs_mcp.config import settings
6 | from oxylabs_mcp.utils import extract_links_with_text, get_oxylabs_auth, strip_html
7 |
8 |
9 | TEST_FIXTURES = "tests/unit/fixtures/"
10 |
11 |
12 | @pytest.mark.parametrize(
13 | "env_vars",
14 | [
15 | pytest.param(
16 | {"OXYLABS_USERNAME": "test_user", "OXYLABS_PASSWORD": "test_pass"},
17 | id="valid-env",
18 | ),
19 | pytest.param(
20 | {"OXYLABS_PASSWORD": "test_pass"},
21 | id="no-username",
22 | ),
23 | pytest.param(
24 | {"OXYLABS_USERNAME": "test_user"},
25 | id="no-password",
26 | ),
27 | pytest.param({}, id="no-username-and-no-password"),
28 | ],
29 | )
30 | def test_get_oxylabs_auth(env_vars):
31 | with patch("os.environ", new=env_vars):
32 | settings.MCP_TRANSPORT = "stdio"
33 | username, password = get_oxylabs_auth()
34 | assert username == env_vars.get("OXYLABS_USERNAME")
35 | assert password == env_vars.get("OXYLABS_PASSWORD")
36 |
37 |
38 | @pytest.mark.parametrize(
39 | ("html_input", "expected_output"),
40 | [pytest.param("before_strip.html", "after_strip.html", id="strip-html")],
41 | )
42 | def test_strip_html(html_input: str, expected_output: str):
43 | with (
44 | open(TEST_FIXTURES + html_input, "r", encoding="utf-8") as input_file,
45 | open(TEST_FIXTURES + expected_output, "r", encoding="utf-8") as output_file,
46 | ):
47 | input_html = input_file.read()
48 | expected_html = output_file.read()
49 |
50 | actual_output = strip_html(input_html)
51 | assert actual_output == expected_html
52 |
53 |
54 | @pytest.mark.parametrize(
55 | ("html_input", "expected_output"),
56 | [
57 | pytest.param(
58 | "with_links.html",
59 | "[More information...] https://www.iana.org/domains/example\n"
60 | "[Another link] https://example.com",
61 |             id="extract-links",
62 | )
63 | ],
64 | )
65 | def test_extract_links_with_text(html_input: str, expected_output: str):
66 |     with open(TEST_FIXTURES + html_input, "r", encoding="utf-8") as input_file:
67 | input_html = input_file.read()
68 |
69 | links = extract_links_with_text(input_html)
70 | assert "\n".join(links) == expected_output
71 |
```
--------------------------------------------------------------------------------
/tests/conftest.py:
--------------------------------------------------------------------------------
```python
1 | from contextlib import asynccontextmanager
2 | from unittest.mock import AsyncMock, MagicMock, patch
3 |
4 | import pytest
5 | from fastmcp.server.context import Context, set_context
6 | from httpx import Request
7 | from mcp.server.lowlevel.server import request_ctx
8 |
9 | from oxylabs_mcp import mcp as mcp_server
10 |
11 |
12 | @pytest.fixture
13 | def request_context():
14 | request_context = MagicMock()
15 | request_context.session.client_params.clientInfo.name = "fake_cursor"
16 | request_context.request.headers = {
17 | "x-oxylabs-username": "oxylabs_username",
18 | "x-oxylabs-password": "oxylabs_password",
19 | "x-oxylabs-ai-studio-api-key": "oxylabs_ai_studio_api_key",
20 | }
21 |
22 | ctx = Context(MagicMock())
23 | ctx.info = AsyncMock()
24 | ctx.error = AsyncMock()
25 |
26 | request_ctx.set(request_context)
27 |
28 | with set_context(ctx):
29 | yield ctx
30 |
31 |
32 | @pytest.fixture(scope="session", autouse=True)
33 | def environment():
34 | env = {
35 | "OXYLABS_USERNAME": "oxylabs_username",
36 | "OXYLABS_PASSWORD": "oxylabs_password",
37 | "OXYLABS_AI_STUDIO_API_KEY": "oxylabs_ai_studio_api_key",
38 | }
39 | with patch("os.environ", new=env):
40 | yield
41 |
42 |
43 | @pytest.fixture
44 | def mcp(request_context: Context):
45 | return mcp_server
46 |
47 |
48 | @pytest.fixture
49 | def request_data():
50 | return Request("POST", "https://example.com/v1/queries")
51 |
52 |
53 | @pytest.fixture
54 | def oxylabs_client():
55 | client_mock = AsyncMock()
56 |
57 | @asynccontextmanager
58 | async def wrapper(*args, **kwargs):
59 | client_mock.context_manager_call_args = args
60 | client_mock.context_manager_call_kwargs = kwargs
61 |
62 | yield client_mock
63 |
64 | with patch("oxylabs_mcp.utils.AsyncClient", new=wrapper):
65 | yield client_mock
66 |
67 |
68 | @pytest.fixture
69 | def request_session(request_context):
70 | token = request_ctx.set(request_context)
71 |
72 | yield request_context.session
73 |
74 | request_ctx.reset(token)
75 |
76 |
77 | @pytest.fixture(scope="session", autouse=True)
78 | def is_api_key_valid_mock():
79 | with patch("oxylabs_mcp.utils.is_api_key_valid", return_value=True):
80 | yield
81 |
82 |
83 | @pytest.fixture
84 | def mock_schema():
85 | return {"field_1": "value1", "field_2": "value2"}
86 |
87 |
88 | @pytest.fixture
89 | def ai_crawler(mock_schema):
90 | mock_crawler = MagicMock()
91 | mock_crawler.generate_schema.return_value = mock_schema
92 |
93 | with patch("oxylabs_mcp.tools.ai_studio.AiCrawler", return_value=mock_crawler):
94 | yield mock_crawler
95 |
96 |
97 | @pytest.fixture
98 | def ai_scraper(mock_schema):
99 | mock_scraper = MagicMock()
100 | mock_scraper.generate_schema.return_value = mock_schema
101 |
102 | with patch("oxylabs_mcp.tools.ai_studio.AiScraper", return_value=mock_scraper):
103 | yield mock_scraper
104 |
105 |
106 | @pytest.fixture
107 | def browser_agent(mock_schema):
108 | mock_browser_agent = MagicMock()
109 | mock_browser_agent.generate_schema.return_value = mock_schema
110 |
111 | with patch("oxylabs_mcp.tools.ai_studio.BrowserAgent", return_value=mock_browser_agent):
112 | yield mock_browser_agent
113 |
114 |
115 | @pytest.fixture
116 | def ai_search():
117 | mock_ai_search = MagicMock()
118 |
119 | with patch("oxylabs_mcp.tools.ai_studio.AiSearch", return_value=mock_ai_search):
120 | yield mock_ai_search
121 |
122 |
123 | @pytest.fixture
124 | def ai_map():
125 | mock_ai_map = MagicMock()
126 |
127 | with patch("oxylabs_mcp.tools.ai_studio.AiMap", return_value=mock_ai_map):
128 | yield mock_ai_map
129 |
```
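The `oxylabs_client` fixture above patches `AsyncClient` with an `@asynccontextmanager` wrapper that records how the context manager was entered, so tests can later assert on `context_manager_call_kwargs`. The recording trick in isolation (`demo` is an illustrative caller, not part of the suite):

```python
import asyncio
from contextlib import asynccontextmanager
from unittest.mock import AsyncMock

client_mock = AsyncMock()


@asynccontextmanager
async def wrapper(*args, **kwargs):
    # Stash the entry arguments on the mock so assertions can inspect them later.
    client_mock.context_manager_call_args = args
    client_mock.context_manager_call_kwargs = kwargs
    yield client_mock


async def demo() -> dict:
    async with wrapper(headers={"x-oxylabs-sdk": "test"}) as client:
        await client.post("https://example.com/v1/queries")
    return client_mock.context_manager_call_kwargs


print(asyncio.run(demo()))  # -> {'headers': {'x-oxylabs-sdk': 'test'}}
```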
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
```toml
1 | [project]
2 | name = "oxylabs-mcp"
3 | version = "0.7.1"
4 | description = "Oxylabs MCP server"
5 | authors = [
6 | {name="Augis Braziunas", email="[email protected]"},
7 | {name="Rostyslav Borovyk", email="[email protected]"},
8 | ]
9 | readme = "README.md"
10 | requires-python = ">=3.12"
11 | classifiers = [
12 | "Programming Language :: Python :: 3",
13 | "Programming Language :: Python :: 3.12",
14 | "Programming Language :: Python :: 3.13",
15 | "Development Status :: 4 - Beta",
16 | "Operating System :: OS Independent",
17 | ]
18 |
19 | license = "MIT"
20 | license-files = ["LICEN[CS]E*"]
21 |
22 | dependencies = [
23 | "fastmcp>=2.11.3",
24 | "httpx>=0.28.1",
25 | "lxml>=5.3.0,<6",
26 | "lxml-html-clean>=0.4.1",
27 | "markdownify>=0.14.1",
28 | "oxylabs-ai-studio>=0.2.15",
29 | "pydantic>=2.10.5",
30 | "pydantic-settings>=2.8.1",
31 | "smithery>=0.1.25",
32 | ]
33 |
34 | [dependency-groups]
35 | dev = [
36 | "bandit>=1.8.6",
37 | "black>=25.1.0",
38 | "lxml-stubs>=0.5.1",
39 | "mypy>=1.14.1",
40 | "pytest>=8.3.4",
41 | "pytest-asyncio>=0.25.2",
42 | "pytest-cov>=6.1.1",
43 | "pytest-mock>=3.14.0",
44 | "ruff>=0.9.1",
45 | ]
46 | e2e-tests = [
47 | "agno>=1.8.1",
48 | "anthropic>=0.50.0",
49 | "google-genai>=1.13.0",
50 | "openai>=1.77.0",
51 | ]
52 |
53 | [build-system]
54 | requires = ["hatchling"]
55 | build-backend = "hatchling.build"
56 |
57 | [project.scripts]
58 | oxylabs-mcp = "oxylabs_mcp:main"
59 |
60 | [project.urls]
61 | Homepage = "https://github.com/oxylabs/oxylabs-mcp"
62 | Repository = "https://github.com/oxylabs/oxylabs-mcp"
63 |
64 | [tool.mypy]
65 | strict = true
66 |
67 | [[tool.mypy.overrides]]
68 | module = "markdownify.*"
69 | ignore_missing_imports = true
70 |
69 | [tool.ruff]
70 | target-version = "py312"
71 | lint.select = [
72 | "E", # pycodestyle (E, W) - https://docs.astral.sh/ruff/rules/#pycodestyle-e-w
73 | "F", # Pyflakes (F) - https://docs.astral.sh/ruff/rules/#pyflakes-f
74 | "W", # pycodestyle (E, W) - https://docs.astral.sh/ruff/rules/#pycodestyle-e-w
75 | "I", # isort (I) https://docs.astral.sh/ruff/rules/#isort-i
76 | "D", # pydocstyle (D) https://docs.astral.sh/ruff/rules/#pydocstyle-d
77 | "S", # bandit (S) https://docs.astral.sh/ruff/rules/#flake8-bandit-s
78 | "ARG", # flake8-unused-arguments - https://docs.astral.sh/ruff/rules/#flake8-unused-arguments-arg
79 | "B", # flake8-bugbear - https://docs.astral.sh/ruff/rules/#flake8-bugbear-b
80 | "C4", # flake8-comprehensions - https://docs.astral.sh/ruff/rules/#flake8-comprehensions-c4
81 | "ISC", # flake8-implicit-str-concat - https://docs.astral.sh/ruff/rules/#flake8-implicit-str-concat-isc
82 | "FA", # flake8-future-annotations - https://docs.astral.sh/ruff/rules/#flake8-future-annotations-fa
83 | "FBT", # flake8-boolean-trap - https://docs.astral.sh/ruff/rules/#flake8-boolean-trap-fbt
84 | "Q", # flake8-quotes (Q) https://docs.astral.sh/ruff/rules/#flake8-quotes-q
85 | "ANN", # flake8-annotations (ANN) https://docs.astral.sh/ruff/rules/#flake8-annotations-ann
86 | "PLR", # Refactor (PLR) https://docs.astral.sh/ruff/rules/#refactor-plr
87 | "PT", # flake8-pytest-style (PT) https://docs.astral.sh/ruff/rules/#flake8-pytest-style-pt
88 | ]
89 | lint.ignore = [
90 | "D213", # Contradicts D212.
91 | "D203", # Contradicts D211.
92 | "D104", # Allow no docstrings in packages
93 | "D100", # Allow no docstrings in modules
94 | "ANN002", # https://docs.astral.sh/ruff/rules/missing-type-args/
95 | "ANN003", # https://docs.astral.sh/ruff/rules/missing-type-kwargs/
96 | "PLR0913", # Allow functions with many arguments
97 | "PLR0912", # Allow many branches for functions
98 | ]
99 |
100 | [tool.ruff.lint.per-file-ignores]
101 | "tests/*" = ["D", "S101", "ARG001", "ANN", "PT011", "FBT", "PLR2004"]
102 | "src/oxylabs_mcp/url_params.py" = ["E501"]
103 |
104 | [tool.ruff.lint.pycodestyle]
105 | max-line-length = 100
106 |
107 | [tool.ruff.lint.isort]
108 | known-first-party = ["src", "tests"]
109 | lines-after-imports = 2
110 |
111 | [tool.pytest.ini_options]
112 | asyncio_default_fixture_loop_scope = "session"
113 | asyncio_mode = "auto"
114 |
115 | [tool.black]
116 | line-length = 100
117 |
```
--------------------------------------------------------------------------------
/tests/integration/test_server.py:
--------------------------------------------------------------------------------
```python
1 | import json
2 | import re
3 | from unittest.mock import AsyncMock, MagicMock
4 |
5 | import pytest
6 | from fastmcp import FastMCP
7 | from httpx import HTTPStatusError, Request, RequestError, Response
8 |
9 | from oxylabs_mcp.config import settings
10 | from tests.integration import params
11 |
12 |
13 | @pytest.mark.asyncio
14 | @pytest.mark.parametrize(
15 | ("tool", "arguments"),
16 | [
17 | pytest.param(
18 | "universal_scraper",
19 | {"url": "test_url"},
20 | id="universal_scraper",
21 | ),
22 | pytest.param(
23 | "google_search_scraper",
24 | {"query": "Generic query"},
25 | id="google_search_scraper",
26 | ),
27 | pytest.param(
28 | "amazon_search_scraper",
29 | {"query": "Generic query"},
30 | id="amazon_search_scraper",
31 | ),
32 | pytest.param(
33 | "amazon_product_scraper",
34 | {"query": "Generic query"},
35 | id="amazon_product_scraper",
36 | ),
37 | ],
38 | )
39 | async def test_default_headers_are_set(
40 | mcp: FastMCP,
41 | request_data: Request,
42 | oxylabs_client: AsyncMock,
43 | tool: str,
44 | arguments: dict,
45 | ):
46 | mock_response = Response(
47 | 200,
48 | content=json.dumps(params.STR_RESPONSE),
49 | request=request_data,
50 | )
51 |
52 | oxylabs_client.post.return_value = mock_response
53 | oxylabs_client.get.return_value = mock_response
54 |
55 | await mcp._call_tool(tool, arguments=arguments)
56 |
57 | assert "x-oxylabs-sdk" in oxylabs_client.context_manager_call_kwargs["headers"]
58 |
59 | oxylabs_sdk_header = oxylabs_client.context_manager_call_kwargs["headers"]["x-oxylabs-sdk"]
60 | client_info, _ = oxylabs_sdk_header.split(maxsplit=1)
61 |
62 | client_info_pattern = re.compile(r"oxylabs-mcp-fake_cursor/(\d+)\.(\d+)\.(\d+)$")
63 | assert re.match(client_info_pattern, client_info)
64 |
65 |
66 | @pytest.mark.asyncio
67 | @pytest.mark.parametrize(
68 | ("tool", "arguments"),
69 | [
70 | pytest.param(
71 | "universal_scraper",
72 | {"url": "test_url"},
73 | id="universal_scraper",
74 | ),
75 | pytest.param(
76 | "google_search_scraper",
77 | {"query": "Generic query"},
78 | id="google_search_scraper",
79 | ),
80 | pytest.param(
81 | "amazon_search_scraper",
82 | {"query": "Generic query"},
83 | id="amazon_search_scraper",
84 | ),
85 | pytest.param(
86 | "amazon_product_scraper",
87 | {"query": "Generic query"},
88 | id="amazon_product_scraper",
89 | ),
90 | ],
91 | )
92 | @pytest.mark.parametrize(
93 | ("exception", "expected_text"),
94 | [
95 | pytest.param(
96 | HTTPStatusError(
97 | "HTTP status error",
98 | request=MagicMock(),
99 | response=MagicMock(status_code=500, text="Internal Server Error"),
100 | ),
101 | "HTTP error during POST request: 500 - Internal Server Error",
102 |             id="http_status_error",
103 | ),
104 | pytest.param(
105 | RequestError("Request error"),
106 | "Request error during POST request: Request error",
107 | id="request_error",
108 | ),
109 | pytest.param(
110 | Exception("Unexpected exception"),
111 | "Error: Unexpected exception",
112 | id="unhandled_exception",
113 | ),
114 | ],
115 | )
116 | async def test_request_client_error_handling(
117 | mcp: FastMCP,
118 | request_data: Request,
119 | oxylabs_client: AsyncMock,
120 | tool: str,
121 | arguments: dict,
122 | exception: Exception,
123 | expected_text: str,
124 | ):
125 | oxylabs_client.post.side_effect = [exception]
126 | oxylabs_client.get.side_effect = [exception]
127 |
128 | result = await mcp._call_tool(tool, arguments=arguments)
129 |
130 | assert result.content[0].text == expected_text
131 |
132 |
133 | @pytest.mark.parametrize("transport", ["stdio", "streamable-http"])
134 | async def test_list_tools(mcp: FastMCP, transport: str):
135 | settings.MCP_TRANSPORT = transport
136 | tools = await mcp._mcp_list_tools()
137 | assert len(tools) == 10
138 |
```
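The `x-oxylabs-sdk` check in `test_default_headers_are_set` splits off the leading client-info token and matches it against a semver pattern. A sketch with a hypothetical header value (the trailing platform details are made up for illustration):

```python
import re

# Hypothetical header value shaped like the one the test expects.
header = "oxylabs-mcp-fake_cursor/1.2.3 (python/3.12.0; x86_64)"
client_info, _ = header.split(maxsplit=1)

pattern = re.compile(r"oxylabs-mcp-fake_cursor/(\d+)\.(\d+)\.(\d+)$")
match = re.match(pattern, client_info)
print(match.groups())  # -> ('1', '2', '3')
```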
--------------------------------------------------------------------------------
/src/oxylabs_mcp/url_params.py:
--------------------------------------------------------------------------------
```python
1 | from typing import Annotated, Literal
2 |
3 | from pydantic import Field
4 |
5 |
6 | # Note: optional types (e.g `str | None`) break the introspection in the Cursor AI.
7 | # See: https://github.com/getcursor/cursor/issues/2932
8 | # Therefore, sentinel values (e.g. `""`, `0`) are used to represent a nullable parameter.
9 | URL_PARAM = Annotated[str, Field(description="Website url to scrape.")]
10 | PARSE_PARAM = Annotated[
11 | bool,
12 | Field(
13 |         description="Whether the result should be parsed. If the result is not parsed, the output_format parameter is applied.",
14 | ),
15 | ]
16 | RENDER_PARAM = Annotated[
17 | Literal["", "html"],
18 | Field(
19 | description="""
20 | Whether a headless browser should be used to render the page.
21 | For example:
22 |         - 'html' when a browser is required to render the page.
23 | """,
24 | examples=["", "html"],
25 | ),
26 | ]
27 | OUTPUT_FORMAT_PARAM = Annotated[
28 | Literal[
29 | "",
30 | "links",
31 | "md",
32 | "html",
33 | ],
34 | Field(
35 | description="""
36 | The format of the output. Works only when parse parameter is false.
37 | - links - Most efficient when the goal is navigation or finding specific URLs. Use this first when you need to locate a specific page within a website.
38 | - md - Best for extracting and reading visible content once you've found the right page. Use this to get structured content that's easy to read and process.
39 | - html - Should be used sparingly only when you need the raw HTML structure, JavaScript code, or styling information.
40 | """
41 | ),
42 | ]
43 | GOOGLE_QUERY_PARAM = Annotated[str, Field(description="URL-encoded keyword to search for.")]
44 | AMAZON_SEARCH_QUERY_PARAM = Annotated[str, Field(description="Keyword to search for.")]
45 | USER_AGENT_TYPE_PARAM = Annotated[
46 | Literal[
47 | "",
48 | "desktop",
49 | "desktop_chrome",
50 | "desktop_firefox",
51 | "desktop_safari",
52 | "desktop_edge",
53 | "desktop_opera",
54 | "mobile",
55 | "mobile_ios",
56 | "mobile_android",
57 | "tablet",
58 | ],
59 | Field(
60 | description="Device type and browser that will be used to "
61 | "determine User-Agent header value."
62 | ),
63 | ]
64 | START_PAGE_PARAM = Annotated[
65 | int,
66 | Field(description="Starting page number."),
67 | ]
68 | PAGES_PARAM = Annotated[
69 | int,
70 | Field(description="Number of pages to retrieve."),
71 | ]
72 | LIMIT_PARAM = Annotated[
73 | int,
74 | Field(description="Number of results to retrieve in each page."),
75 | ]
76 | DOMAIN_PARAM = Annotated[
77 | str,
78 | Field(
79 | description="""
80 | Domain localization for Google.
81 | Use country top level domains.
82 | For example:
83 | - 'co.uk' for United Kingdom
84 | - 'us' for United States
85 | - 'fr' for France
86 | """,
87 | examples=["uk", "us", "fr"],
88 | ),
89 | ]
90 | GEO_LOCATION_PARAM = Annotated[
91 | str,
92 | Field(
93 | description="""
94 | The geographical location that the result should be adapted for.
95 |         Use a free-form location name or an ISO 3166-1 alpha-2 country code.
96 | Examples:
97 | - 'California, United States'
98 | - 'Mexico'
99 | - 'US' for United States
100 | - 'DE' for Germany
101 | - 'FR' for France
102 | """,
103 | examples=["US", "DE", "FR"],
104 | ),
105 | ]
106 | LOCALE_PARAM = Annotated[
107 | str,
108 | Field(
109 | description="""
110 | Set 'Accept-Language' header value which changes your Google search page web interface language.
111 | Examples:
112 | - 'en-US' for English, United States
113 | - 'de-AT' for German, Austria
114 | - 'fr-FR' for French, France
115 | """,
116 | examples=["en-US", "de-AT", "fr-FR"],
117 | ),
118 | ]
119 | AD_MODE_PARAM = Annotated[
120 | bool,
121 | Field(
122 |         description="If true, uses the Google Ads source optimized for paid ads.",
123 | ),
124 | ]
125 | CATEGORY_ID_CONTEXT_PARAM = Annotated[
126 | str,
127 | Field(
128 | description="Search for items in a particular browse node (product category).",
129 | ),
130 | ]
131 | MERCHANT_ID_CONTEXT_PARAM = Annotated[
132 | str,
133 | Field(
134 | description="Search for items sold by a particular seller.",
135 | ),
136 | ]
137 | CURRENCY_CONTEXT_PARAM = Annotated[
138 | str,
139 | Field(
140 | description="Currency that will be used to display the prices.",
141 | examples=["USD", "EUR", "AUD"],
142 | ),
143 | ]
144 | AUTOSELECT_VARIANT_CONTEXT_PARAM = Annotated[
145 | bool,
146 | Field(
147 | description="To get accurate pricing/buybox data, set this parameter to true.",
148 | ),
149 | ]
150 |
```
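The sentinel-value pattern documented at the top of this module (empty string instead of `str | None`, to keep Cursor's introspection working) can be exercised directly with pydantic's `TypeAdapter`. A sketch, not part of the module:

```python
from typing import Annotated, Literal

from pydantic import Field, TypeAdapter, ValidationError

# Same shape as RENDER_PARAM above: "" is the "not set" sentinel.
RenderParam = Annotated[Literal["", "html"], Field(description="Render mode.")]

adapter = TypeAdapter(RenderParam)
print(adapter.validate_python("html"))  # -> html

try:
    adapter.validate_python("png")
except ValidationError:
    print("rejected")  # the Literal only admits "" and "html"
```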
--------------------------------------------------------------------------------
/tests/e2e/test_llm_agent.py:
--------------------------------------------------------------------------------
```python
1 | import json
2 | import os
3 | from contextlib import asynccontextmanager
4 |
5 | import pytest
6 | from agno.agent import Agent
7 | from agno.models.google import Gemini
8 | from agno.models.openai import OpenAIChat
9 | from agno.tools.mcp import MCPTools
10 |
11 |
12 | MCP_SERVER = "local" # local, uvx
13 | MODELS_CONFIG = [
14 | ("GOOGLE_API_KEY", "gemini"),
15 | # ("OPENAI_API_KEY", "openai"),
16 | ]
17 |
18 |
19 | def get_agent(model: str, oxylabs_mcp: MCPTools) -> Agent:
20 | if model == "gemini":
21 | model_ = Gemini(api_key=os.getenv("GOOGLE_API_KEY"))
22 | elif model == "openai":
23 | model_ = OpenAIChat(api_key=os.getenv("OPENAI_API_KEY"))
24 | else:
25 | raise ValueError(f"Unknown model: {model}")
26 |
27 | return Agent(
28 | model=model_,
29 | tools=[oxylabs_mcp],
30 | instructions=["Use MCP tools to fulfil the requests"],
31 | markdown=True,
32 | )
33 |
34 |
35 | def get_models() -> list[str]:
36 | models = []
37 |
38 | for env_var, model_name in MODELS_CONFIG:
39 | if os.getenv(env_var):
40 | models.append(model_name)
41 |
42 | return models
43 |
44 |
45 | @asynccontextmanager
46 | async def oxylabs_mcp_server():
47 | if MCP_SERVER == "local":
48 | command = f"uv run --directory {os.getenv('LOCAL_OXYLABS_MCP_DIRECTORY')} oxylabs-mcp"
49 | elif MCP_SERVER == "uvx":
50 | command = "uvx oxylabs-mcp"
51 | else:
52 | raise ValueError(f"Unknown mcp server option: {MCP_SERVER}")
53 |
54 | async with MCPTools(
55 | command,
56 | env={
57 | "OXYLABS_USERNAME": os.getenv("OXYLABS_USERNAME"),
58 | "OXYLABS_PASSWORD": os.getenv("OXYLABS_PASSWORD"),
59 | },
60 | ) as mcp_server:
61 | yield mcp_server
62 |
63 |
64 | @pytest.mark.skipif(not os.getenv("OXYLABS_USERNAME"), reason="`OXYLABS_USERNAME` is not set")
65 | @pytest.mark.skipif(not os.getenv("OXYLABS_PASSWORD"), reason="`OXYLABS_PASSWORD` is not set")
66 | @pytest.mark.asyncio
67 | @pytest.mark.parametrize("model", get_models())
68 | @pytest.mark.parametrize(
69 | ("query", "tool", "arguments", "expected_content"),
70 | [
71 | (
72 | "Search for iPhone 16 in google with parsed result",
73 | "google_search_scraper",
74 | {
75 | "query": "iPhone 16",
76 | "parse": True,
77 | },
78 | "iPhone 16",
79 | ),
80 | (
81 | "Search for iPhone 16 in google with render html",
82 | "google_search_scraper",
83 | {
84 | "query": "iPhone 16",
85 | "render": "html",
86 | },
87 | "iPhone 16",
88 | ),
89 | (
90 | "Search for iPhone 16 in google with browser rendering",
91 | "google_search_scraper",
92 | {
93 | "query": "iPhone 16",
94 | "render": "html",
95 | },
96 | "iPhone 16",
97 | ),
98 | (
99 | "Search for iPhone 16 in google with user agent type mobile",
100 | "google_search_scraper",
101 | {
102 | "query": "iPhone 16",
103 | "user_agent_type": "mobile",
104 | },
105 | "iPhone 16",
106 | ),
107 | (
108 | "Search for iPhone 16 in google starting from the second page",
109 | "google_search_scraper",
110 | {
111 | "query": "iPhone 16",
112 | "start_page": 2,
113 | },
114 | "iPhone 16",
115 | ),
116 | (
117 | "Search for iPhone 16 in google with United Kingdom domain",
118 | "google_search_scraper",
119 | {
120 | "query": "iPhone 16",
121 | "domain": "co.uk",
122 | },
123 | "iPhone 16",
124 | ),
125 | (
126 | "Search for iPhone 16 in google with Brazil geolocation",
127 | "google_search_scraper",
128 | {
129 | "query": "iPhone 16",
130 | "geo_location": "BR",
131 | },
132 | "iPhone 16",
133 | ),
134 | (
135 | "Search for iPhone 16 in google with French locale",
136 | "google_search_scraper",
137 | {
138 | "query": "iPhone 16",
139 | "locale": "fr-FR",
140 | },
141 | "iPhone 16",
142 | ),
143 | ],
144 | )
145 | async def test_basic_agent_prompts(
146 | model: str,
147 | query: str,
148 | tool: str,
149 | arguments: dict,
150 | expected_content: str,
151 | ):
152 | async with oxylabs_mcp_server() as mcp_server:
153 | agent = get_agent(model, mcp_server)
154 | response = await agent.arun(query)
155 |
156 | tool_calls = agent.memory.get_tool_calls(agent.session_id)
157 |
158 | # [tool_call, tool_call_result]
159 | assert len(tool_calls) == 2, "Extra tool calls found!"
160 |
161 | assert tool_calls[0]["function"]["name"] == tool
162 | assert json.loads(tool_calls[0]["function"]["arguments"]) == arguments
163 |
164 | assert expected_content in response.content
165 |
166 |
167 | @pytest.mark.asyncio
168 | @pytest.mark.parametrize("model", get_models())
169 | async def test_complex_agent_prompt(model: str):
170 | async with oxylabs_mcp_server() as mcp_server:
171 | agent = get_agent(model, mcp_server)
172 |
173 | await agent.arun(
174 | "Go to oxylabs.io, look for career page, "
175 | "go to it and return all job titles in markdown format. "
176 | "Don't invent URLs, start from one provided."
177 | )
178 |
179 | tool_calls = agent.memory.get_tool_calls(agent.session_id)
180 |         assert len(tool_calls) == 4, f"Expected 4 tool calls, got {len(tool_calls)}: {tool_calls}"
181 |
182 | oxylabs_page_call, _, careers_page_call, _ = agent.memory.get_tool_calls(agent.session_id)
183 | assert oxylabs_page_call["function"]["name"] == "universal_scraper"
184 | assert json.loads(oxylabs_page_call["function"]["arguments"]) == {
185 | "output_format": "links",
186 | "url": "https://oxylabs.io",
187 | }
188 | assert careers_page_call["function"]["name"] == "universal_scraper"
189 | assert json.loads(careers_page_call["function"]["arguments"]) == {
190 | "output_format": "md",
191 | "url": "https://career.oxylabs.io/",
192 | }
193 |
```
--------------------------------------------------------------------------------
/tests/integration/test_scraper_tools.py:
--------------------------------------------------------------------------------
```python
1 | import json
2 | from typing import Any
3 | from unittest.mock import AsyncMock, patch
4 |
5 | import pytest
6 | from fastmcp import FastMCP
7 | from httpx import Request, Response
8 | from mcp.types import TextContent
9 |
10 | from tests.integration import params
11 | from tests.utils import convert_context_params, prepare_expected_arguments
12 |
13 |
14 | @pytest.mark.parametrize(
15 | ("arguments", "expectation", "response_data", "expected_result"),
16 | [
17 | params.URL_ONLY,
18 | params.NO_URL,
19 | params.RENDER_HTML_WITH_URL,
20 | params.RENDER_INVALID_WITH_URL,
21 | *params.USER_AGENTS_WITH_URL,
22 | params.GEO_LOCATION_SPECIFIED_WITH_URL,
23 | ],
24 | )
25 | @pytest.mark.asyncio
26 | async def test_oxylabs_scraper_arguments(
27 | mcp: FastMCP,
28 | request_data: Request,
29 | response_data: str,
30 | arguments: dict,
31 | expectation,
32 | expected_result: str,
33 | oxylabs_client: AsyncMock,
34 | ):
35 | mock_response = Response(200, content=json.dumps(response_data), request=request_data)
36 | oxylabs_client.post.return_value = mock_response
37 |
38 | with (
39 | expectation,
40 | patch("httpx.AsyncClient.post", new=AsyncMock(return_value=mock_response)),
41 | ):
42 | result = await mcp._call_tool("universal_scraper", arguments=arguments)
43 |
44 | assert oxylabs_client.post.call_args.kwargs == {
45 | "json": convert_context_params(prepare_expected_arguments(arguments)),
46 | }
47 | assert result.content == [TextContent(type="text", text=expected_result)]
48 |
49 |
50 | @pytest.mark.parametrize(
51 | ("arguments", "expectation", "response_data", "expected_result"),
52 | [
53 | params.QUERY_ONLY,
54 | params.PARSE_ENABLED,
55 | params.RENDER_HTML_WITH_QUERY,
56 | *params.USER_AGENTS_WITH_QUERY,
57 | *params.OUTPUT_FORMATS,
58 | params.INVALID_USER_AGENT,
59 | params.START_PAGE_SPECIFIED,
60 | params.PAGES_SPECIFIED,
61 | params.LIMIT_SPECIFIED,
62 | params.DOMAIN_SPECIFIED,
63 | params.GEO_LOCATION_SPECIFIED_WITH_QUERY,
64 | params.LOCALE_SPECIFIED,
65 | ],
66 | )
67 | @pytest.mark.asyncio
68 | async def test_google_search_scraper_arguments(
69 | mcp: FastMCP,
70 | request_data: Request,
71 | response_data: str,
72 | arguments: dict,
73 | expectation,
74 | expected_result: str,
75 | oxylabs_client: AsyncMock,
76 | ):
77 | mock_response = Response(200, content=json.dumps(response_data), request=request_data)
78 | oxylabs_client.post.return_value = mock_response
79 |
80 | with expectation:
81 | result = await mcp._call_tool("google_search_scraper", arguments=arguments)
82 |
83 | assert oxylabs_client.post.call_args.kwargs == {
84 | "json": {
85 | "source": "google_search",
86 | "parse": True,
87 | **prepare_expected_arguments(arguments),
88 | }
89 | }
90 | assert result.content == [TextContent(type="text", text=expected_result)]
91 |
92 |
93 | @pytest.mark.parametrize(
94 | ("ad_mode", "expected_result"),
95 | [
96 | (False, {"parse": True, "query": "Iphone 16", "source": "google_search"}),
97 | (True, {"parse": True, "query": "Iphone 16", "source": "google_ads"}),
98 | ],
99 | )
100 | @pytest.mark.asyncio
101 | async def test_oxylabs_google_search_ad_mode_argument(
102 | mcp: FastMCP,
103 | request_data: Request,
104 | ad_mode: bool,
105 | expected_result: dict[str, Any],
106 | oxylabs_client: AsyncMock,
107 | ):
108 | arguments = {"query": "Iphone 16", "ad_mode": ad_mode}
109 | mock_response = Response(200, content=json.dumps('{"data": "value"}'), request=request_data)
110 | oxylabs_client.post.return_value = mock_response
111 |
112 | await mcp._call_tool("google_search_scraper", arguments=arguments)
113 | assert oxylabs_client.post.call_args.kwargs == {"json": expected_result}
114 | assert oxylabs_client.post.await_args.kwargs["json"] == expected_result
115 |
116 |
117 | @pytest.mark.parametrize(
118 | ("arguments", "expectation", "response_data", "expected_result"),
119 | [
120 | params.QUERY_ONLY,
121 | params.PARSE_ENABLED,
122 | params.RENDER_HTML_WITH_QUERY,
123 | *params.USER_AGENTS_WITH_QUERY,
124 | *params.OUTPUT_FORMATS,
125 | params.INVALID_USER_AGENT,
126 | params.START_PAGE_SPECIFIED,
127 | params.PAGES_SPECIFIED,
128 | params.DOMAIN_SPECIFIED,
129 | params.GEO_LOCATION_SPECIFIED_WITH_QUERY,
130 | params.LOCALE_SPECIFIED,
131 | params.CATEGORY_SPECIFIED,
132 | params.MERCHANT_ID_SPECIFIED,
133 | params.CURRENCY_SPECIFIED,
134 | ],
135 | )
136 | @pytest.mark.asyncio
137 | async def test_amazon_search_scraper_arguments(
138 | mcp: FastMCP,
139 | request_data: Request,
140 | response_data: str,
141 | arguments: dict,
142 | expectation,
143 | expected_result: str,
144 | oxylabs_client: AsyncMock,
145 | request_context,
146 | ):
147 | mock_response = Response(200, content=json.dumps(response_data), request=request_data)
148 | oxylabs_client.post.return_value = mock_response
149 |
150 | with expectation:
151 | result = await mcp._call_tool("amazon_search_scraper", arguments=arguments)
152 |
153 | assert oxylabs_client.post.call_args.kwargs == {
154 | "json": {
155 | "source": "amazon_search",
156 | "parse": True,
157 | **convert_context_params(prepare_expected_arguments(arguments)),
158 | }
159 | }
160 | assert result.content == [TextContent(type="text", text=expected_result)]
161 |
162 |
163 | @pytest.mark.parametrize(
164 | ("arguments", "expectation", "response_data", "expected_result"),
165 | [
166 | params.QUERY_ONLY,
167 | params.PARSE_ENABLED,
168 | params.RENDER_HTML_WITH_QUERY,
169 | *params.USER_AGENTS_WITH_QUERY,
170 | *params.OUTPUT_FORMATS,
171 | params.INVALID_USER_AGENT,
172 | params.DOMAIN_SPECIFIED,
173 | params.GEO_LOCATION_SPECIFIED_WITH_QUERY,
174 | params.LOCALE_SPECIFIED,
175 | params.CURRENCY_SPECIFIED,
176 | params.AUTOSELECT_VARIANT_ENABLED,
177 | ],
178 | )
179 | @pytest.mark.asyncio
180 | async def test_amazon_product_scraper_arguments(
181 | mcp: FastMCP,
182 | request_data: Request,
183 | response_data: str,
184 | arguments: dict,
185 | expectation,
186 | expected_result: str,
187 | oxylabs_client: AsyncMock,
188 | ):
189 | mock_response = Response(200, content=json.dumps(response_data), request=request_data)
190 | oxylabs_client.post.return_value = mock_response
191 |
192 | with expectation:
193 | result = await mcp._call_tool("amazon_product_scraper", arguments=arguments)
194 |
195 | assert oxylabs_client.post.call_args.kwargs == {
196 | "json": {
197 | "source": "amazon_product",
198 | "parse": True,
199 | **convert_context_params(prepare_expected_arguments(arguments)),
200 | }
201 | }
202 | assert result.content == [TextContent(type="text", text=expected_result)]
203 |
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/tools/scraper.py:
--------------------------------------------------------------------------------
```python
1 | from typing import Any
2 |
3 | from fastmcp import FastMCP
4 | from mcp.types import ToolAnnotations
5 |
6 | from oxylabs_mcp import url_params
7 | from oxylabs_mcp.exceptions import MCPServerError
8 | from oxylabs_mcp.utils import (
9 | get_content,
10 | oxylabs_client,
11 | )
12 |
13 |
14 | SCRAPER_TOOLS = [
15 | "universal_scraper",
16 | "google_search_scraper",
17 | "amazon_search_scraper",
18 | "amazon_product_scraper",
19 | ]
20 |
21 | mcp = FastMCP("scraper")
22 |
23 |
24 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
25 | async def universal_scraper(
26 | url: url_params.URL_PARAM,
27 | render: url_params.RENDER_PARAM = "",
28 | user_agent_type: url_params.USER_AGENT_TYPE_PARAM = "",
29 | geo_location: url_params.GEO_LOCATION_PARAM = "",
30 | output_format: url_params.OUTPUT_FORMAT_PARAM = "",
31 | ) -> str:
32 | """Get the content of any webpage.
33 |
34 | Supports browser rendering, parsing of certain webpages,
35 | and different output formats.
36 | """
37 | try:
38 | async with oxylabs_client() as client:
39 | payload: dict[str, Any] = {"url": url}
40 |
41 | if render:
42 | payload["render"] = render
43 | if user_agent_type:
44 | payload["user_agent_type"] = user_agent_type
45 | if geo_location:
46 | payload["geo_location"] = geo_location
47 |
48 | response_json = await client.scrape(payload)
49 |
50 | return get_content(response_json, output_format=output_format)
51 | except MCPServerError as e:
52 | return await e.process()
53 |
54 |
55 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
56 | async def google_search_scraper(
57 | query: url_params.GOOGLE_QUERY_PARAM,
58 | parse: url_params.PARSE_PARAM = True, # noqa: FBT002
59 | render: url_params.RENDER_PARAM = "",
60 | user_agent_type: url_params.USER_AGENT_TYPE_PARAM = "",
61 | start_page: url_params.START_PAGE_PARAM = 0,
62 | pages: url_params.PAGES_PARAM = 0,
63 | limit: url_params.LIMIT_PARAM = 0,
64 | domain: url_params.DOMAIN_PARAM = "",
65 | geo_location: url_params.GEO_LOCATION_PARAM = "",
66 | locale: url_params.LOCALE_PARAM = "",
67 | ad_mode: url_params.AD_MODE_PARAM = False, # noqa: FBT002
68 | output_format: url_params.OUTPUT_FORMAT_PARAM = "",
69 | ) -> str:
70 | """Scrape Google Search results.
71 |
72 | Supports content parsing, different user agent types, pagination,
73 | domain, geolocation, and locale parameters, and different output formats.
74 | """
75 | try:
76 | async with oxylabs_client() as client:
77 | payload: dict[str, Any] = {"query": query}
78 |
79 | if ad_mode:
80 | payload["source"] = "google_ads"
81 | else:
82 | payload["source"] = "google_search"
83 |
84 | if parse:
85 | payload["parse"] = parse
86 | if render:
87 | payload["render"] = render
88 | if user_agent_type:
89 | payload["user_agent_type"] = user_agent_type
90 | if start_page:
91 | payload["start_page"] = start_page
92 | if pages:
93 | payload["pages"] = pages
94 | if limit:
95 | payload["limit"] = limit
96 | if domain:
97 | payload["domain"] = domain
98 | if geo_location:
99 | payload["geo_location"] = geo_location
100 | if locale:
101 | payload["locale"] = locale
102 |
103 | response_json = await client.scrape(payload)
104 |
105 | return get_content(response_json, parse=parse, output_format=output_format)
106 | except MCPServerError as e:
107 | return await e.process()
108 |
109 |
110 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
111 | async def amazon_search_scraper(
112 | query: url_params.AMAZON_SEARCH_QUERY_PARAM,
113 | category_id: url_params.CATEGORY_ID_CONTEXT_PARAM = "",
114 | merchant_id: url_params.MERCHANT_ID_CONTEXT_PARAM = "",
115 | currency: url_params.CURRENCY_CONTEXT_PARAM = "",
116 | parse: url_params.PARSE_PARAM = True, # noqa: FBT002
117 | render: url_params.RENDER_PARAM = "",
118 | user_agent_type: url_params.USER_AGENT_TYPE_PARAM = "",
119 | start_page: url_params.START_PAGE_PARAM = 0,
120 | pages: url_params.PAGES_PARAM = 0,
121 | domain: url_params.DOMAIN_PARAM = "",
122 | geo_location: url_params.GEO_LOCATION_PARAM = "",
123 | locale: url_params.LOCALE_PARAM = "",
124 | output_format: url_params.OUTPUT_FORMAT_PARAM = "",
125 | ) -> str:
126 | """Scrape Amazon search results.
127 |
128 | Supports content parsing, different user agent types, pagination,
129 | domain, geolocation, and locale parameters, and different output formats.
130 | Supports Amazon-specific parameters such as category ID, merchant ID, and currency.
131 | """
132 | try:
133 | async with oxylabs_client() as client:
134 | payload: dict[str, Any] = {"source": "amazon_search", "query": query}
135 |
136 | context = []
137 | if category_id:
138 | context.append({"key": "category_id", "value": category_id})
139 | if merchant_id:
140 | context.append({"key": "merchant_id", "value": merchant_id})
141 | if currency:
142 | context.append({"key": "currency", "value": currency})
143 | if context:
144 | payload["context"] = context
145 |
146 | if parse:
147 | payload["parse"] = parse
148 | if render:
149 | payload["render"] = render
150 | if user_agent_type:
151 | payload["user_agent_type"] = user_agent_type
152 | if start_page:
153 | payload["start_page"] = start_page
154 | if pages:
155 | payload["pages"] = pages
156 | if domain:
157 | payload["domain"] = domain
158 | if geo_location:
159 | payload["geo_location"] = geo_location
160 | if locale:
161 | payload["locale"] = locale
162 |
163 | response_json = await client.scrape(payload)
164 |
165 | return get_content(response_json, parse=parse, output_format=output_format)
166 | except MCPServerError as e:
167 | return await e.process()
168 |
169 |
170 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
171 | async def amazon_product_scraper(
172 | query: url_params.AMAZON_SEARCH_QUERY_PARAM,
173 | autoselect_variant: url_params.AUTOSELECT_VARIANT_CONTEXT_PARAM = False, # noqa: FBT002
174 | currency: url_params.CURRENCY_CONTEXT_PARAM = "",
175 | parse: url_params.PARSE_PARAM = True, # noqa: FBT002
176 | render: url_params.RENDER_PARAM = "",
177 | user_agent_type: url_params.USER_AGENT_TYPE_PARAM = "",
178 | domain: url_params.DOMAIN_PARAM = "",
179 | geo_location: url_params.GEO_LOCATION_PARAM = "",
180 | locale: url_params.LOCALE_PARAM = "",
181 | output_format: url_params.OUTPUT_FORMAT_PARAM = "",
182 | ) -> str:
183 | """Scrape Amazon products.
184 |
185 | Supports content parsing, different user agent types, domain,
186 | geolocation, and locale parameters, and different output formats.
187 | Supports Amazon-specific parameters such as currency, and can return
188 | more accurate pricing data with the autoselect variant option.
189 | """
190 | try:
191 | async with oxylabs_client() as client:
192 | payload: dict[str, Any] = {"source": "amazon_product", "query": query}
193 |
194 | context = []
195 | if autoselect_variant:
196 | context.append({"key": "autoselect_variant", "value": autoselect_variant})
197 | if currency:
198 | context.append({"key": "currency", "value": currency})
199 | if context:
200 | payload["context"] = context
201 |
202 | if parse:
203 | payload["parse"] = parse
204 | if render:
205 | payload["render"] = render
206 | if user_agent_type:
207 | payload["user_agent_type"] = user_agent_type
208 | if domain:
209 | payload["domain"] = domain
210 | if geo_location:
211 | payload["geo_location"] = geo_location
212 | if locale:
213 | payload["locale"] = locale
214 |
215 | response_json = await client.scrape(payload)
216 |
217 | return get_content(response_json, parse=parse, output_format=output_format)
218 | except MCPServerError as e:
219 | return await e.process()
220 |
```
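All four scraper tools above share one pattern: start from a minimal payload and add each optional parameter only when it is truthy, so unset defaults (`""`, `0`, `False`) never reach the API. A minimal standalone sketch of that pattern (the `build_payload` helper is hypothetical, not part of the codebase):

```python
# Sketch of the conditional payload-building pattern used by the scraper
# tools: falsy option values are treated as "not set" and omitted.
from typing import Any


def build_payload(source: str, query: str, **options: Any) -> dict[str, Any]:
    """Build a scraper payload, omitting unset (falsy) optional parameters."""
    payload: dict[str, Any] = {"source": source, "query": query}
    for key, value in options.items():
        if value:  # "", 0, False, and None are all skipped
            payload[key] = value
    return payload


payload = build_payload(
    "google_search", "mcp servers", parse=True, render="", start_page=0, locale="ja_JP"
)
# Only the truthy options survive alongside source and query.
```

This keeps the request minimal and lets the Oxylabs API apply its own server-side defaults for anything the caller left unset.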
--------------------------------------------------------------------------------
/tests/integration/test_ai_studio_tools.py:
--------------------------------------------------------------------------------
```python
1 | import json
2 | from unittest.mock import AsyncMock, MagicMock
3 |
4 | import pytest
5 | from fastmcp import FastMCP
6 | from httpx import Request
7 | from mcp.types import TextContent
8 | from oxylabs_ai_studio.apps.ai_search import AiSearchJob, SearchResult
9 |
10 | from tests.integration import params
11 | from tests.integration.params import SimpleSchema
12 |
13 |
14 | @pytest.mark.parametrize(
15 | ("arguments", "expectation", "response_data", "expected_result"),
16 | [
17 | params.AI_STUDIO_URL_ONLY,
18 | params.AI_STUDIO_URL_AND_OUTPUT_FORMAT,
19 | params.AI_STUDIO_URL_AND_SCHEMA,
20 | params.AI_STUDIO_URL_AND_RENDER_JAVASCRIPT,
21 | params.AI_STUDIO_URL_AND_RETURN_SOURCES_LIMIT,
22 | params.AI_STUDIO_URL_AND_GEO_LOCATION,
23 | ],
24 | )
25 | @pytest.mark.asyncio
26 | async def test_ai_crawler(
27 | mcp: FastMCP,
28 | request_data: Request,
29 | response_data: str,
30 | arguments: dict,
31 | expectation,
32 | expected_result: str,
33 | oxylabs_client: AsyncMock,
34 | ai_crawler: AsyncMock,
35 | ):
36 | mock_result = MagicMock()
37 | mock_result.data = expected_result
38 | ai_crawler.crawl_async = AsyncMock(return_value=mock_result)
39 |
40 | arguments = {"user_prompt": "Scrape price and title", **arguments}
41 |
42 | with expectation:
43 | result = await mcp._call_tool("ai_crawler", arguments=arguments)
44 |
45 | assert result.content == [
46 | TextContent(type="text", text=json.dumps({"data": expected_result}))
47 | ]
48 |
49 | default_args = {
50 | "geo_location": None,
51 | "output_format": "markdown",
52 | "render_javascript": False,
53 | "return_sources_limit": 25,
54 | "schema": None,
55 | }
56 | default_args = {k: v for k, v in default_args.items() if k not in arguments}
57 |
58 | ai_crawler.crawl_async.assert_called_once_with(**default_args, **arguments)
59 |
60 |
61 | @pytest.mark.parametrize(
62 | ("arguments", "expectation", "response_data", "expected_result"),
63 | [
64 | params.AI_STUDIO_URL_ONLY,
65 | params.AI_STUDIO_URL_AND_OUTPUT_FORMAT,
66 | params.AI_STUDIO_URL_AND_SCHEMA,
67 | params.AI_STUDIO_URL_AND_RENDER_JAVASCRIPT,
68 | params.AI_STUDIO_URL_AND_GEO_LOCATION,
69 | ],
70 | )
71 | @pytest.mark.asyncio
72 | async def test_ai_scraper(
73 | mcp: FastMCP,
74 | request_data: Request,
75 | response_data: str,
76 | arguments: dict,
77 | expectation,
78 | expected_result: str,
79 | oxylabs_client: AsyncMock,
80 | ai_scraper: AsyncMock,
81 | ):
82 | mock_result = MagicMock()
83 | mock_result.data = expected_result
84 | ai_scraper.scrape_async = AsyncMock(return_value=mock_result)
85 |
86 | arguments = {**arguments}
87 |
88 | with expectation:
89 | result = await mcp._call_tool("ai_scraper", arguments=arguments)
90 |
91 | assert result.content == [
92 | TextContent(type="text", text=json.dumps({"data": expected_result}))
93 | ]
94 |
95 | default_args = {
96 | "geo_location": None,
97 | "output_format": "markdown",
98 | "render_javascript": False,
99 | "schema": None,
100 | }
101 | default_args = {k: v for k, v in default_args.items() if k not in arguments}
102 |
103 | ai_scraper.scrape_async.assert_called_once_with(**default_args, **arguments)
104 |
105 |
106 | @pytest.mark.parametrize(
107 | ("arguments", "expectation", "response_data", "expected_result"),
108 | [
109 | params.AI_STUDIO_URL_ONLY,
110 | params.AI_STUDIO_URL_AND_OUTPUT_FORMAT,
111 | params.AI_STUDIO_URL_AND_SCHEMA,
112 | params.AI_STUDIO_URL_AND_GEO_LOCATION,
113 | ],
114 | )
115 | @pytest.mark.asyncio
116 | async def test_ai_browser_agent(
117 | mcp: FastMCP,
118 | request_data: Request,
119 | response_data: str,
120 | arguments: dict,
121 | expectation,
122 | expected_result: str,
123 | oxylabs_client: AsyncMock,
124 | browser_agent: AsyncMock,
125 | ):
126 | mock_result = MagicMock()
127 | mock_data = SimpleSchema(title="Title", price=0.0)
128 | mock_result.data = mock_data
129 | browser_agent.run_async = AsyncMock(return_value=mock_result)
130 |
131 | arguments = {"task_prompt": "Scrape price and title", **arguments}
132 |
133 | with expectation:
134 | result = await mcp._call_tool("ai_browser_agent", arguments=arguments)
135 |
136 | assert result.content == [
137 | TextContent(type="text", text=json.dumps({"data": mock_data.model_dump()}))
138 | ]
139 |
140 | default_args = {
141 | "geo_location": None,
142 | "output_format": "markdown",
143 | "schema": None,
144 | "user_prompt": arguments["task_prompt"],
145 | }
146 | del arguments["task_prompt"]
147 | default_args = {k: v for k, v in default_args.items() if k not in arguments}
148 |
149 | browser_agent.run_async.assert_called_once_with(**default_args, **arguments)
150 |
151 |
152 | @pytest.mark.parametrize(
153 | ("arguments", "expectation", "response_data", "expected_result"),
154 | [
155 | params.AI_STUDIO_QUERY_ONLY,
156 | params.AI_STUDIO_URL_AND_RENDER_JAVASCRIPT,
157 | params.AI_STUDIO_URL_AND_GEO_LOCATION,
158 | params.AI_STUDIO_URL_AND_LIMIT,
159 | params.AI_STUDIO_QUERY_AND_RETURN_CONTENT,
160 | ],
161 | )
162 | @pytest.mark.asyncio
163 | async def test_ai_search(
164 | mcp: FastMCP,
165 | request_data: Request,
166 | response_data: str,
167 | arguments: dict,
168 | expectation,
169 | expected_result: str,
170 | oxylabs_client: AsyncMock,
171 | ai_search: AsyncMock,
172 | ):
173 | mock_result = AiSearchJob(
174 | run_id="123",
175 | data=[SearchResult(url="url", title="title", description="description", content=None)],
176 | )
177 | ai_search.search_async = AsyncMock(return_value=mock_result)
178 |
179 | arguments = {**arguments}
180 | if "url" in arguments:
181 | del arguments["url"]
182 | arguments["query"] = "Sample query"
183 |
184 | with expectation:
185 | result = await mcp._call_tool("ai_search", arguments=arguments)
186 |
187 | assert result.content == [
188 | TextContent(type="text", text=json.dumps({"data": [mock_result.data[0].model_dump()]}))
189 | ]
190 |
191 | default_args = {
192 | "limit": 10,
193 | "render_javascript": False,
194 | "return_content": False,
195 | "geo_location": None,
196 | }
197 | default_args = {k: v for k, v in default_args.items() if k not in arguments}
198 |
199 | ai_search.search_async.assert_called_once_with(**default_args, **arguments)
200 |
201 |
202 | @pytest.mark.parametrize(
203 | ("arguments", "expectation", "response_data", "expected_result"),
204 | [
205 | params.AI_STUDIO_USER_PROMPT,
206 | ],
207 | )
208 | @pytest.mark.parametrize(
209 | "app_name",
210 | ["ai_crawler", "ai_scraper", "browser_agent"],
211 | )
212 | @pytest.mark.asyncio
213 | async def test_generate_schema(
214 | mcp: FastMCP,
215 | request_data: Request,
216 | response_data: str,
217 | arguments: dict,
218 | expectation,
219 | expected_result: str,
220 | oxylabs_client: AsyncMock,
221 | app_name: str,
222 | ai_crawler: AsyncMock,
223 | ai_scraper: AsyncMock,
224 | browser_agent: AsyncMock,
225 | mock_schema: dict,
226 | ):
227 | arguments = {"app_name": app_name, **arguments}
228 |
229 | with expectation:
230 | result = await mcp._call_tool("generate_schema", arguments=arguments)
231 |
232 | assert result.content == [TextContent(type="text", text=json.dumps({"data": mock_schema}))]
233 |
234 | locals()[app_name].generate_schema.assert_called_once_with(prompt=arguments["user_prompt"])
235 |
236 |
237 | @pytest.mark.parametrize(
238 | ("arguments", "expectation", "response_data", "expected_result"),
239 | [
240 | params.AI_STUDIO_URL_ONLY,
241 | params.AI_STUDIO_URL_AND_RENDER_JAVASCRIPT,
242 | params.AI_STUDIO_URL_AND_RETURN_SOURCES_LIMIT,
243 | params.AI_STUDIO_URL_AND_GEO_LOCATION,
244 | ],
245 | )
246 | @pytest.mark.asyncio
247 | async def test_ai_map(
248 | mcp: FastMCP,
249 | request_data: Request,
250 | response_data: str,
251 | arguments: dict,
252 | expectation,
253 | expected_result: str,
254 | oxylabs_client: AsyncMock,
255 | ai_map: AsyncMock,
256 | ):
257 | mock_result = MagicMock()
258 | mock_result.data = expected_result
259 | ai_map.map_async = AsyncMock(return_value=mock_result)
260 |
261 | arguments = {"user_prompt": "Scrape price and title", **arguments}
262 |
263 | with expectation:
264 | result = await mcp._call_tool("ai_map", arguments=arguments)
265 |
266 | assert result.content == [
267 | TextContent(type="text", text=json.dumps({"data": expected_result}))
268 | ]
269 |
270 | default_args = {
271 | "geo_location": None,
272 | "render_javascript": False,
273 | "return_sources_limit": 25,
274 | }
275 | default_args = {k: v for k, v in default_args.items() if k not in arguments}
276 |
277 | ai_map.map_async.assert_called_once_with(**default_args, **arguments)
278 |
```
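Each test above builds the expected mock-call kwargs the same way: take the tool's default arguments, drop any key the test case supplied explicitly, then merge in the supplied arguments. A small sketch of that merging step (the `expected_call_kwargs` name is illustrative only):

```python
# Sketch of the default-argument merging used by the assertions above:
# defaults yield to caller-supplied values, and the union forms the full
# keyword set the mocked async method should have been called with.
def expected_call_kwargs(defaults: dict, arguments: dict) -> dict:
    remaining = {k: v for k, v in defaults.items() if k not in arguments}
    return {**remaining, **arguments}


kwargs = expected_call_kwargs(
    {"geo_location": None, "render_javascript": False, "return_sources_limit": 25},
    {"url": "https://example.com", "render_javascript": True},
)
# render_javascript comes from the caller; the other defaults remain.
```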
--------------------------------------------------------------------------------
/tests/integration/params.py:
--------------------------------------------------------------------------------
```python
1 | from contextlib import nullcontext as does_not_raise
2 |
3 | import pytest
4 | from fastmcp.exceptions import ToolError
5 | from pydantic import BaseModel
6 |
7 |
8 | class SimpleSchema(BaseModel):
9 | title: str
10 | price: float
11 |
12 |
13 | JOB_RESPONSE = {"id": "7333092420940211201", "status": "done"}
14 | STR_RESPONSE = {
15 | "results": [{"content": "Mocked content"}],
16 | "job": JOB_RESPONSE,
17 | }
18 | JSON_RESPONSE = {
19 | "results": [{"content": {"data": "value"}}],
20 | "job": JOB_RESPONSE,
21 | }
22 | AI_STUDIO_JSON_RESPONSE = {
23 | "results": [{"content": {"data": "value"}}],
24 | "job": JOB_RESPONSE,
25 | }
26 |
27 | QUERY_ONLY = pytest.param(
28 | {"query": "Generic query"},
29 | does_not_raise(),
30 | STR_RESPONSE,
31 | "\n\nMocked content\n\n",
32 | id="query-only-args",
33 | )
34 | PARSE_ENABLED = pytest.param(
35 | {"query": "Generic query", "parse": True},
36 | does_not_raise(),
37 | JSON_RESPONSE,
38 | '{"data": "value"}',
39 | id="parse-enabled-args",
40 | )
41 | RENDER_HTML_WITH_QUERY = pytest.param(
42 | {"query": "Generic query", "render": "html"},
43 | does_not_raise(),
44 | STR_RESPONSE,
45 | "\n\nMocked content\n\n",
46 | id="render-enabled-args",
47 | )
48 | RENDER_INVALID_WITH_QUERY = pytest.param(
49 | {"query": "Generic query", "render": "png"},
50 | pytest.raises(ToolError),
51 | STR_RESPONSE,
52 | None,
53 | id="render-invalid-args",
54 | )
55 | OUTPUT_FORMATS = [
56 | pytest.param(
57 | {"query": "Generic query", "output_format": "links"},
58 | does_not_raise(),
59 | {
60 | "results": [
61 | {
62 | "content": '<html><body><div><p><a href="https://example.com">link</a></p></div></body></html>'
63 | }
64 | ],
65 | "job": JOB_RESPONSE,
66 | },
67 | "[link] https://example.com",
68 | id="links-output-format-args",
69 | ),
70 | pytest.param(
71 | {"query": "Generic query", "output_format": "md"},
72 | does_not_raise(),
73 | STR_RESPONSE,
74 | "\n\nMocked content\n\n",
75 | id="md-output-format-args",
76 | ),
77 | pytest.param(
78 | {"query": "Generic query", "output_format": "html"},
79 | does_not_raise(),
80 | STR_RESPONSE,
81 | "Mocked content",
82 | id="html-output-format-args",
83 | ),
84 | ]
85 | USER_AGENTS_WITH_QUERY = [
86 | pytest.param(
87 | {"query": "Generic query", "user_agent_type": uat},
88 | does_not_raise(),
89 | STR_RESPONSE,
90 | "\n\nMocked content\n\n",
91 | id=f"{uat}-user-agent-specified-args",
92 | )
93 | for uat in [
94 | "desktop",
95 | "desktop_chrome",
96 | "desktop_firefox",
97 | "desktop_safari",
98 | "desktop_edge",
99 | "desktop_opera",
100 | "mobile",
101 | "mobile_ios",
102 | "mobile_android",
103 | "tablet",
104 | ]
105 | ]
106 | USER_AGENTS_WITH_URL = [
107 | pytest.param(
108 | {"url": "https://example.com", "user_agent_type": uat},
109 | does_not_raise(),
110 | STR_RESPONSE,
111 | "\n\nMocked content\n\n",
112 | id=f"{uat}-user-agent-specified-args",
113 | )
114 | for uat in [
115 | "desktop",
116 | "desktop_chrome",
117 | "desktop_firefox",
118 | "desktop_safari",
119 | "desktop_edge",
120 | "desktop_opera",
121 | "mobile",
122 | "mobile_ios",
123 | "mobile_android",
124 | "tablet",
125 | ]
126 | ]
127 | INVALID_USER_AGENT = pytest.param(
128 | {"query": "Generic query", "user_agent_type": "invalid"},
129 | pytest.raises(ToolError),
130 | STR_RESPONSE,
131 | "Mocked content",
132 | id="invalid-user-agent-specified-args",
133 | )
134 | START_PAGE_SPECIFIED = pytest.param(
135 | {"query": "Generic query", "start_page": 2},
136 | does_not_raise(),
137 | JSON_RESPONSE,
138 | '{"data": "value"}',
139 | id="start-page-specified-args",
140 | )
141 | START_PAGE_INVALID = pytest.param(
142 | {"query": "Generic query", "start_page": -1},
143 | pytest.raises(ToolError),
144 | JSON_RESPONSE,
145 | '{"data": "value"}',
146 | id="start-page-invalid-args",
147 | )
148 | PAGES_SPECIFIED = pytest.param(
149 | {"query": "Generic query", "pages": 20},
150 | does_not_raise(),
151 | JSON_RESPONSE,
152 | '{"data": "value"}',
153 | id="pages-specified-args",
154 | )
155 | PAGES_INVALID = pytest.param(
156 | {"query": "Generic query", "pages": -10},
157 | pytest.raises(ToolError),
158 | JSON_RESPONSE,
159 | '{"data": "value"}',
160 | id="pages-invalid-args",
161 | )
162 | LIMIT_SPECIFIED = pytest.param(
163 | {"query": "Generic query", "limit": 100},
164 | does_not_raise(),
165 | JSON_RESPONSE,
166 | '{"data": "value"}',
167 | id="limit-specified-args",
168 | )
169 | LIMIT_INVALID = pytest.param(
170 | {"query": "Generic query", "limit": 0},
171 | pytest.raises(ToolError),
172 | JSON_RESPONSE,
173 | '{"data": "value"}',
174 | id="limit-invalid-args",
175 | )
176 | DOMAIN_SPECIFIED = pytest.param(
177 | {"query": "Generic query", "domain": "io"},
178 | does_not_raise(),
179 | JSON_RESPONSE,
180 | '{"data": "value"}',
181 | id="domain-specified-args",
182 | )
183 | GEO_LOCATION_SPECIFIED_WITH_QUERY = pytest.param(
184 | {"query": "Generic query", "geo_location": "Miami, Florida"},
185 | does_not_raise(),
186 | JSON_RESPONSE,
187 | '{"data": "value"}',
188 | id="geo-location-specified-args",
189 | )
190 | GEO_LOCATION_SPECIFIED_WITH_URL = pytest.param(
191 | {"url": "https://example.com", "geo_location": "Miami, Florida"},
192 | does_not_raise(),
193 | STR_RESPONSE,
194 | "\n\nMocked content\n\n",
195 | id="geo-location-specified-args",
196 | )
197 | LOCALE_SPECIFIED = pytest.param(
198 | {"query": "Generic query", "locale": "ja_JP"},
199 | does_not_raise(),
200 | JSON_RESPONSE,
201 | '{"data": "value"}',
202 | id="locale-specified-args",
203 | )
204 | CATEGORY_SPECIFIED = pytest.param(
205 | {"query": "Man's T-shirt", "category_id": "QE21R9AV"},
206 | does_not_raise(),
207 | JSON_RESPONSE,
208 | '{"data": "value"}',
209 | id="category-id-specified-args",
210 | )
211 | MERCHANT_ID_SPECIFIED = pytest.param(
212 | {"query": "Man's T-shirt", "merchant_id": "QE21R9AV"},
213 | does_not_raise(),
214 | JSON_RESPONSE,
215 | '{"data": "value"}',
216 | id="merchant-id-specified-args",
217 | )
218 | CURRENCY_SPECIFIED = pytest.param(
219 | {"query": "Man's T-shirt", "currency": "USD"},
220 | does_not_raise(),
221 | JSON_RESPONSE,
222 | '{"data": "value"}',
223 | id="currency-specified-args",
224 | )
225 | AUTOSELECT_VARIANT_ENABLED = pytest.param(
226 | {"query": "B0BVF87BST", "autoselect_variant": True},
227 | does_not_raise(),
228 | JSON_RESPONSE,
229 | '{"data": "value"}',
230 | id="autoselect-variant-enabled-args",
231 | )
232 | URL_ONLY = pytest.param(
233 | {"url": "https://example.com"},
234 | does_not_raise(),
235 | STR_RESPONSE,
236 | "\n\nMocked content\n\n",
237 | id="url-only-args",
238 | )
239 | NO_URL = pytest.param(
240 | {},
241 | pytest.raises(ToolError),
242 | STR_RESPONSE,
243 | "\n\nMocked content\n\n",
244 | id="no-url-args",
245 | )
246 | RENDER_HTML_WITH_URL = pytest.param(
247 | {"url": "https://example.com", "render": "html"},
248 | does_not_raise(),
249 | STR_RESPONSE,
250 | "\n\nMocked content\n\n",
251 | id="render-enabled-args",
252 | )
253 | RENDER_INVALID_WITH_URL = pytest.param(
254 | {"url": "https://example.com", "render": "png"},
255 | pytest.raises(ToolError),
256 | JSON_RESPONSE,
257 | None,
258 | id="render-invalid-args",
259 | )
260 | AI_STUDIO_URL_ONLY = pytest.param(
261 | {"url": "https://example.com"},
262 | does_not_raise(),
263 | AI_STUDIO_JSON_RESPONSE,
264 | {"data": "value"},
265 | id="url-with-user-prompt-args",
266 | )
267 | AI_STUDIO_QUERY_ONLY = pytest.param(
268 | {"query": "Generic query"},
269 | does_not_raise(),
270 | AI_STUDIO_JSON_RESPONSE,
271 | {"data": "value"},
272 | id="query-only-args",
273 | )
274 | AI_STUDIO_URL_AND_OUTPUT_FORMAT = pytest.param(
275 | {"url": "https://example.com", "output_format": "json"},
276 | does_not_raise(),
277 | AI_STUDIO_JSON_RESPONSE,
278 | {"data": "value"},
279 | id="url-with-user-prompt-and-output-format-args",
280 | )
281 | AI_STUDIO_URL_AND_SCHEMA = pytest.param(
282 | {
283 | "url": "https://example.com",
284 | "schema": SimpleSchema.model_json_schema(),
285 | },
286 | does_not_raise(),
287 | AI_STUDIO_JSON_RESPONSE,
288 | {"data": "value"},
289 | id="url-with-user-prompt-and-schema-args",
290 | )
291 | AI_STUDIO_URL_AND_RENDER_JAVASCRIPT = pytest.param(
292 | {
293 | "url": "https://example.com",
294 | "render_javascript": True,
295 | },
296 | does_not_raise(),
297 | AI_STUDIO_JSON_RESPONSE,
298 | {"data": "value"},
299 | id="url-with-user-prompt-and-render-js-args",
300 | )
301 | AI_STUDIO_QUERY_AND_RETURN_CONTENT = pytest.param(
302 | {
303 | "url": "https://example.com",
304 | "return_content": True,
305 | },
306 | does_not_raise(),
307 | AI_STUDIO_JSON_RESPONSE,
308 | {"data": "value"},
309 | id="url-with-user-prompt-and-return-content-args",
310 | )
311 | AI_STUDIO_URL_AND_RETURN_SOURCES_LIMIT = pytest.param(
312 | {
313 | "url": "https://example.com",
314 | "return_sources_limit": 10,
315 | },
316 | does_not_raise(),
317 | AI_STUDIO_JSON_RESPONSE,
318 | {"data": "value"},
319 | id="url-with-user-prompt-and-return-sources-limit-args",
320 | )
321 | AI_STUDIO_URL_AND_GEO_LOCATION = pytest.param(
322 | {
323 | "url": "https://example.com",
324 | "geo_location": "US",
325 | },
326 | does_not_raise(),
327 | AI_STUDIO_JSON_RESPONSE,
328 | {"data": "value"},
329 | id="url-with-user-prompt-and-geo_location-args",
330 | )
331 | AI_STUDIO_URL_AND_LIMIT = pytest.param(
332 | {
333 | "url": "https://example.com",
334 | "limit": 5,
335 | },
336 | does_not_raise(),
337 | AI_STUDIO_JSON_RESPONSE,
338 | {"data": "value"},
339 | id="url-with-user-prompt-and-limit-args",
340 | )
341 | AI_STUDIO_USER_PROMPT = pytest.param(
342 | {
343 | "user_prompt": "Scrape price and title",
344 | },
345 | does_not_raise(),
346 | AI_STUDIO_JSON_RESPONSE,
347 | {"data": "value"},
348 | id="user-prompt-args",
349 | )
350 |
```
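Every `pytest.param` above pairs an argument dict with an "expectation" context manager: `does_not_raise()` (an alias for `contextlib.nullcontext`) for valid inputs, or `pytest.raises(ToolError)` for inputs that should fail validation, so a single test body covers both paths. A self-contained sketch of the pattern, using a hand-rolled `raises` stand-in for `pytest.raises`:

```python
# Sketch of the expectation-as-parameter pattern: each case carries a context
# manager that either expects success (nullcontext) or an exception (raises).
from contextlib import contextmanager, nullcontext


@contextmanager
def raises(exc_type):
    """Hypothetical stand-in for pytest.raises: pass only if exc_type is raised."""
    try:
        yield
    except exc_type:
        return  # expected failure: swallow the exception
    raise AssertionError(f"{exc_type.__name__} was not raised")


def validate_render(render: str) -> str:
    if render not in ("", "html"):
        raise ValueError(f"invalid render option: {render}")
    return render or "none"


cases = [("html", nullcontext()), ("png", raises(ValueError))]
results = []
for render, expectation in cases:
    with expectation:
        results.append(validate_render(render))
# Only the valid case produces a result; the invalid one is absorbed
# by its expectation.
```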
--------------------------------------------------------------------------------
/src/oxylabs_mcp/utils.py:
--------------------------------------------------------------------------------
```python
1 | import json
2 | import logging
3 | import os
4 | import re
5 | import typing
6 | from contextlib import asynccontextmanager
7 | from importlib.metadata import version
8 | from platform import architecture, python_version
9 | from typing import AsyncIterator
10 |
11 | from fastmcp.server.dependencies import get_context
12 | from httpx import (
13 | AsyncClient,
14 | BasicAuth,
15 | HTTPStatusError,
16 | RequestError,
17 | Timeout,
18 | )
19 | from lxml.html import defs, fromstring, tostring
20 | from lxml.html.clean import Cleaner
21 | from markdownify import markdownify
22 | from mcp.server.fastmcp import Context
23 | from mcp.shared.context import RequestContext
24 | from oxylabs_ai_studio.utils import is_api_key_valid # type: ignore[import-untyped]
25 | from starlette import status
26 |
27 | from oxylabs_mcp.config import settings
28 | from oxylabs_mcp.exceptions import MCPServerError
29 |
30 |
31 | logger = logging.getLogger(__name__)
32 |
33 | USERNAME_ENV = "OXYLABS_USERNAME"
34 | PASSWORD_ENV = "OXYLABS_PASSWORD" # noqa: S105 # nosec
35 | AI_STUDIO_API_KEY_ENV = "OXYLABS_AI_STUDIO_API_KEY"
36 |
37 | USERNAME_HEADER = "X-Oxylabs-Username"
38 | PASSWORD_HEADER = "X-Oxylabs-Password" # noqa: S105 # nosec
39 | AI_STUDIO_API_KEY_HEADER = "X-Oxylabs-AI-Studio-Api-Key"
40 |
41 | USERNAME_QUERY_PARAM = "oxylabsUsername"
42 | PASSWORD_QUERY_PARAM = "oxylabsPassword" # noqa: S105 # nosec
43 | AI_STUDIO_API_KEY_QUERY_PARAM = "oxylabsAiStudioApiKey"
44 |
45 |
46 | def clean_html(html: str) -> str:
47 | """Clean an HTML string."""
48 | cleaner = Cleaner(
49 | scripts=True,
50 | javascript=True,
51 | style=True,
52 | remove_tags=[],
53 | kill_tags=["nav", "svg", "footer", "noscript", "script", "form"],
54 | safe_attrs=list(defs.safe_attrs) + ["idx"],
55 | comments=True,
56 | inline_style=True,
57 | links=True,
58 | meta=False,
59 | page_structure=False,
60 | embedded=True,
61 | frames=False,
62 | forms=False,
63 | annoying_tags=False,
64 | )
65 | return cleaner.clean_html(html) # type: ignore[no-any-return]
66 |
67 |
68 | def strip_html(html: str) -> str:
69 | """Simplify an HTML string.
70 |
71 | Will remove unwanted elements, attributes, and redundant content.
72 | Args:
73 | html (str): The input HTML string.
74 |
75 | Returns:
76 | str: The cleaned and simplified HTML string.
77 |
78 | """
79 | cleaned_html = clean_html(html)
80 | html_tree = fromstring(cleaned_html)
81 |
82 | for element in html_tree.iter():
83 | # Remove style attributes.
84 | if "style" in element.attrib:
85 | del element.attrib["style"]
86 |
87 | # Remove elements that have no attributes, no content and no children.
88 | if (
89 | (not element.attrib or (len(element.attrib) == 1 and "idx" in element.attrib))
90 | and not element.getchildren() # type: ignore[attr-defined]
91 | and (not element.text or not element.text.strip())
92 | and (not element.tail or not element.tail.strip())
93 | ):
94 | parent = element.getparent()
95 | if parent is not None:
96 | parent.remove(element)
97 |
98 | # Remove elements with footer and hidden in class or id
99 | xpath_query = (
100 | ".//*[contains(@class, 'footer') or contains(@id, 'footer') or "
101 | "contains(@class, 'hidden') or contains(@id, 'hidden')]"
102 | )
103 | elements_to_remove = html_tree.xpath(xpath_query)
104 | for element in elements_to_remove: # type: ignore[assignment, union-attr]
105 | parent = element.getparent()
106 | if parent is not None:
107 | parent.remove(element)
108 |
109 | # Serialize the HTML tree back to a string
110 | stripped_html = tostring(html_tree, encoding="unicode")
111 | # Previous cleaning produces empty spaces.
112 | # Replace multiple spaces with a single one
113 | stripped_html = re.sub(r"\s{2,}", " ", stripped_html)
114 | # Replace consecutive newlines with an empty string
115 | stripped_html = re.sub(r"\n{2,}", "", stripped_html)
116 | return stripped_html
117 |
118 |
119 | def _get_request_context(ctx: Context) -> RequestContext | None: # type: ignore[type-arg]
120 | try:
121 | return ctx.request_context
122 | except ValueError:
123 | return None
124 |
125 |
126 | def _get_default_headers() -> dict[str, str]:
127 | headers = {}
128 | if request_ctx := get_context().request_context:
129 | if client_params := request_ctx.session.client_params:
130 | client = f"oxylabs-mcp-{client_params.clientInfo.name}"
131 | else:
132 | client = "oxylabs-mcp"
133 | else:
134 | client = "oxylabs-mcp"
135 |
136 | bits, _ = architecture()
137 | sdk_type = f"{client}/{version('oxylabs-mcp')} ({python_version()}; {bits})"
138 |
139 | headers["x-oxylabs-sdk"] = sdk_type
140 |
141 | return headers
142 |
143 |
144 | class _OxylabsClientWrapper:
145 | def __init__(
146 | self,
147 | client: AsyncClient,
148 | ) -> None:
149 | self._client = client
150 | self._ctx = get_context()
151 |
152 | async def scrape(self, payload: dict[str, typing.Any]) -> dict[str, typing.Any]:
153 | await self._ctx.info(f"Create job with params: {json.dumps(payload)}")
154 |
155 | response = await self._client.post(settings.OXYLABS_SCRAPER_URL, json=payload)
156 | response_json: dict[str, typing.Any] = response.json()
157 |
158 | if response.status_code == status.HTTP_201_CREATED:
159 | await self._ctx.info(
160 | f"Job info: "
161 | f"job_id={response_json['job']['id']} "
162 | f"job_status={response_json['job']['status']}"
163 | )
164 |
165 | response.raise_for_status()
166 |
167 | return response_json
168 |
169 |
170 | def get_oxylabs_auth() -> tuple[str | None, str | None]:
171 | """Extract the Oxylabs credentials."""
172 | if settings.MCP_TRANSPORT == "streamable-http":
173 | request_headers = dict(get_context().request_context.request.headers) # type: ignore[union-attr]
174 | username = request_headers.get(USERNAME_HEADER.lower())
175 | password = request_headers.get(PASSWORD_HEADER.lower())
176 | if not username or not password:
177 | query_params = get_context().request_context.request.query_params # type: ignore[union-attr]
178 | username = query_params.get(USERNAME_QUERY_PARAM)
179 | password = query_params.get(PASSWORD_QUERY_PARAM)
180 | else:
181 | username = os.environ.get(USERNAME_ENV)
182 | password = os.environ.get(PASSWORD_ENV)
183 |
184 | return username, password
185 |
186 |
187 | def get_oxylabs_ai_studio_api_key() -> str | None:
188 | """Extract the Oxylabs AI Studio API key."""
189 | if settings.MCP_TRANSPORT == "streamable-http":
190 | request_headers = dict(get_context().request_context.request.headers) # type: ignore[union-attr]
191 | ai_studio_api_key = request_headers.get(AI_STUDIO_API_KEY_HEADER.lower())
192 | if not ai_studio_api_key:
193 | query_params = get_context().request_context.request.query_params # type: ignore[union-attr]
194 | ai_studio_api_key = query_params.get(AI_STUDIO_API_KEY_QUERY_PARAM)
195 | else:
196 | ai_studio_api_key = os.getenv(AI_STUDIO_API_KEY_ENV)
197 |
198 | return ai_studio_api_key
199 |
200 |
201 | @asynccontextmanager
202 | async def oxylabs_client() -> AsyncIterator[_OxylabsClientWrapper]:
203 | """Async context manager for Oxylabs client that is used in MCP tools."""
204 | headers = _get_default_headers()
205 |
206 | username, password = get_oxylabs_auth()
207 |
208 | if not username or not password:
209 | raise ValueError("Oxylabs username and password must be set.")
210 |
211 | auth = BasicAuth(username=username, password=password)
212 |
213 | async with AsyncClient(
214 | timeout=Timeout(settings.OXYLABS_REQUEST_TIMEOUT_S),
215 | verify=True,
216 | headers=headers,
217 | auth=auth,
218 | ) as client:
219 | try:
220 | yield _OxylabsClientWrapper(client)
221 | except HTTPStatusError as e:
222 | raise MCPServerError(
223 | f"HTTP error during POST request: {e.response.status_code} - {e.response.text}"
224 | ) from None
225 | except RequestError as e:
226 | raise MCPServerError(f"Request error during POST request: {e}") from None
227 | except Exception as e:
228 | raise MCPServerError(f"Error: {str(e) or repr(e)}") from None
229 |
230 |
231 | def get_and_verify_oxylabs_ai_studio_api_key() -> str:
232 |     """Extract and verify the Oxylabs AI Studio API key."""
233 | ai_studio_api_key = get_oxylabs_ai_studio_api_key()
234 |
235 | if ai_studio_api_key is None:
236 | msg = "AI Studio API key is not set"
237 | logger.warning(msg)
238 | raise ValueError(msg)
239 | if not is_api_key_valid(ai_studio_api_key):
240 | raise ValueError("AI Studio API key is not valid")
241 |
242 | return ai_studio_api_key
243 |
244 |
245 | def extract_links_with_text(html: str, base_url: str | None = None) -> list[str]:
246 | """Extract links with their display text from HTML.
247 |
248 | Args:
249 | html (str): The input HTML string.
250 | base_url (str | None): Base URL to use for converting relative URLs to absolute.
251 | If None, relative URLs will remain as is.
252 |
253 | Returns:
254 | list[str]: List of links in format [Display Text] URL
255 |
256 | """
257 | html_tree = fromstring(html)
258 | links = []
259 |
260 | for link in html_tree.xpath("//a[@href]"): # type: ignore[union-attr]
261 | href = link.get("href") # type: ignore[union-attr]
262 | text = link.text_content().strip() # type: ignore[union-attr]
263 |
264 |         if href:
265 | # Skip empty or whitespace-only text
266 | if not text:
267 | continue
268 |
269 | # Skip anchor links
270 | if href.startswith("#"):
271 | continue
272 |
273 | # Skip javascript links
274 | if href.startswith("javascript:"):
275 | continue
276 |
277 | # Make relative URLs absolute if base_url is provided
278 | if base_url and href.startswith("/"):
279 | # Remove trailing slash from base_url if present
280 | base = base_url.rstrip("/")
281 | href = f"{base}{href}"
282 |
283 | links.append(f"[{text}] {href}")
284 |
285 | return links
286 |
287 |
288 | def get_content(
289 | response_json: dict[str, typing.Any],
290 | *,
291 | output_format: str,
292 | parse: bool = False,
293 | ) -> str:
294 | """Extract content from response and convert to a proper format."""
295 | content = response_json["results"][0]["content"]
296 | if parse and isinstance(content, dict):
297 | return json.dumps(content)
298 | if output_format == "html":
299 | return str(content)
300 | if output_format == "links":
301 | links = extract_links_with_text(str(content))
302 | return "\n".join(links)
303 |
304 | stripped_html = clean_html(str(content))
305 | return markdownify(stripped_html) # type: ignore[no-any-return]
306 |
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/tools/ai_studio.py:
--------------------------------------------------------------------------------
```python
1 | # mypy: disable-error-code=import-untyped
2 | import json
3 | import logging
4 | from typing import Annotated, Any, Literal
5 |
6 | from fastmcp import FastMCP
7 | from mcp.types import ToolAnnotations
8 | from oxylabs_ai_studio.apps.ai_crawler import AiCrawler
9 | from oxylabs_ai_studio.apps.ai_map import AiMap
10 | from oxylabs_ai_studio.apps.ai_scraper import AiScraper
11 | from oxylabs_ai_studio.apps.ai_search import AiSearch
12 | from oxylabs_ai_studio.apps.browser_agent import BrowserAgent
13 | from pydantic import Field
14 |
15 | from oxylabs_mcp.tools.misc import setup
16 | from oxylabs_mcp.utils import get_and_verify_oxylabs_ai_studio_api_key
17 |
18 |
19 | setup()
20 | logger = logging.getLogger(__name__)
21 |
22 |
23 | AI_TOOLS = [
24 | "generate_schema",
25 | "ai_search",
26 | "ai_scraper",
27 | "ai_crawler",
28 | "ai_browser_agent",
29 | "ai_map",
30 | ]
31 |
32 |
33 | mcp = FastMCP("ai_studio")
34 |
35 |
36 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
37 | async def ai_crawler(
38 | url: Annotated[str, Field(description="The URL from which crawling will be started.")],
39 | user_prompt: Annotated[
40 | str,
41 |         Field(description="What information the user wants to extract from the domain."),
42 | ],
43 | output_format: Annotated[
44 | Literal["json", "markdown", "csv"],
45 | Field(
46 | description=(
47 | "The format of the output. If json or csv, the schema is required. "
48 | "Markdown returns full text of the page. CSV returns data in CSV format."
49 | )
50 | ),
51 | ] = "markdown",
52 | schema: Annotated[
53 | dict[str, Any] | None,
54 | Field(
55 | description="The schema to use for the crawl. Required if output_format is json or csv."
56 | ),
57 | ] = None,
58 | render_javascript: Annotated[ # noqa: FBT002
59 | bool,
60 | Field(
61 | description=(
62 | "Whether to render the HTML of the page using javascript. Much slower, "
63 | "therefore use it only for websites "
64 | "that require javascript to render the page. "
65 | "Unless user asks to use it, first try to crawl the page without it. "
66 | "If results are unsatisfactory, try to use it."
67 | )
68 | ),
69 | ] = False,
70 | return_sources_limit: Annotated[
71 | int, Field(description="The maximum number of sources to return.", le=50)
72 | ] = 25,
73 | geo_location: Annotated[
74 | str | None,
75 | Field(description="Two letter ISO country code to use for the crawl proxy."),
76 | ] = None,
77 | ) -> str:
78 |     """Tool useful for crawling a website from a starting url and returning data in a specified format.
79 |
80 |     Schema is required only if output_format is json or csv.
81 |     'render_javascript' is used to render javascript-heavy websites.
82 |     'return_sources_limit' is used to limit the number of sources to return;
83 |     for example, if you expect results from a single source, you can set it to 1.
84 | """ # noqa: E501
85 | logger.info(
86 | f"Calling ai_crawler with: {url=}, {user_prompt=}, "
87 | f"{output_format=}, {schema=}, {render_javascript=}, "
88 | f"{return_sources_limit=}"
89 | )
90 | crawler = AiCrawler(api_key=get_and_verify_oxylabs_ai_studio_api_key())
91 | result = await crawler.crawl_async(
92 | url=url,
93 | user_prompt=user_prompt,
94 | output_format=output_format,
95 | schema=schema,
96 | render_javascript=render_javascript,
97 | return_sources_limit=return_sources_limit,
98 | geo_location=geo_location,
99 | )
100 | return json.dumps({"data": result.data})
101 |
102 |
103 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
104 | async def ai_scraper(
105 | url: Annotated[str, Field(description="The URL to scrape")],
106 | output_format: Annotated[
107 | Literal["json", "markdown", "csv"],
108 | Field(
109 | description=(
110 | "The format of the output. If json or csv, the schema is required. "
111 | "Markdown returns full text of the page. CSV returns data in CSV format, "
112 |                 "suited for tabular data."
113 | )
114 | ),
115 | ] = "markdown",
116 | schema: Annotated[
117 | dict[str, Any] | None,
118 | Field(
119 | description=(
120 | "The schema to use for the scrape. Only required if output_format is json or csv."
121 | )
122 | ),
123 | ] = None,
124 | render_javascript: Annotated[ # noqa: FBT002
125 | bool,
126 | Field(
127 | description=(
128 | "Whether to render the HTML of the page using javascript. "
129 | "Much slower, therefore use it only for websites "
130 |                 "that require javascript to render the page. "
131 | "Unless user asks to use it, first try to scrape the page without it. "
132 | "If results are unsatisfactory, try to use it."
133 | )
134 | ),
135 | ] = False,
136 | geo_location: Annotated[
137 | str | None,
138 | Field(description="Two letter ISO country code to use for the scrape proxy."),
139 | ] = None,
140 | ) -> str:
141 | """Scrape the contents of the web page and return the data in the specified format.
142 |
143 | Schema is required only if output_format is json or csv.
144 | 'render_javascript' is used to render javascript heavy websites.
145 | """
146 | logger.info(
147 | f"Calling ai_scraper with: {url=}, {output_format=}, {schema=}, {render_javascript=}"
148 | )
149 | scraper = AiScraper(api_key=get_and_verify_oxylabs_ai_studio_api_key())
150 | result = await scraper.scrape_async(
151 | url=url,
152 | output_format=output_format,
153 | schema=schema,
154 | render_javascript=render_javascript,
155 | geo_location=geo_location,
156 | )
157 | return json.dumps({"data": result.data})
158 |
159 |
160 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
161 | async def ai_browser_agent(
162 | url: Annotated[str, Field(description="The URL to start the browser agent navigation from.")],
163 | task_prompt: Annotated[str, Field(description="What browser agent should do.")],
164 | output_format: Annotated[
165 | Literal["json", "markdown", "html", "csv"],
166 | Field(
167 | description=(
168 | "The output format. "
169 | "Markdown returns full text of the page including links. "
170 | "If json or csv, the schema is required."
171 | )
172 | ),
173 | ] = "markdown",
174 | schema: Annotated[
175 | dict[str, Any] | None,
176 | Field(
177 | description=(
178 | "The schema to use for the scrape. Only required if output_format is json or csv."
179 | )
180 | ),
181 | ] = None,
182 | geo_location: Annotated[
183 | str | None,
184 | Field(description="Two letter ISO country code to use for the browser proxy."),
185 | ] = None,
186 | ) -> str:
187 | """Run the browser agent and return the data in the specified format.
188 |
189 |     This tool is useful if you need to navigate around a website and perform actions.
190 |     It allows navigating to any url, clicking on links, filling forms, scrolling, etc.
191 |     Finally, it returns the data in the specified format. Schema is required only if output_format is json or csv.
192 |     'task_prompt' describes what the browser agent should achieve.
193 | """ # noqa: E501
194 | logger.info(
195 | f"Calling ai_browser_agent with: {url=}, {task_prompt=}, {output_format=}, {schema=}"
196 | )
197 | browser_agent = BrowserAgent(api_key=get_and_verify_oxylabs_ai_studio_api_key())
198 | result = await browser_agent.run_async(
199 | url=url,
200 | user_prompt=task_prompt,
201 | output_format=output_format,
202 | schema=schema,
203 | geo_location=geo_location,
204 | )
205 | data = result.data.model_dump(mode="json") if result.data else None
206 | return json.dumps({"data": data})
207 |
208 |
209 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
210 | async def ai_search(
211 | query: Annotated[str, Field(description="The query to search for.")],
212 | limit: Annotated[int, Field(description="Maximum number of results to return.", le=50)] = 10,
213 | render_javascript: Annotated[ # noqa: FBT002
214 | bool,
215 | Field(
216 | description=(
217 | "Whether to render the HTML of the page using javascript. "
218 |                 "Much slower, therefore use it only if the user asks for it. "
219 |                 "First try searching with it set to False."
220 | )
221 | ),
222 | ] = False,
223 | return_content: Annotated[ # noqa: FBT002
224 | bool,
225 | Field(description="Whether to return markdown content of the search results."),
226 | ] = False,
227 | geo_location: Annotated[
228 | str | None,
229 | Field(description="Two letter ISO country code to use for the search proxy."),
230 | ] = None,
231 | ) -> str:
232 | """Search the web based on a provided query.
233 |
234 | 'return_content' is used to return markdown content for each search result. If 'return_content'
235 | is set to True, you don't need to use ai_scraper to get the content of the search results urls,
236 | because it is already included in the search results.
237 |     If 'return_content' is set to True, prefer a lower 'limit' to reduce payload size.
238 | """ # noqa: E501
239 | logger.info(
240 | f"Calling ai_search with: {query=}, {limit=}, {render_javascript=}, {return_content=}"
241 | )
242 | search = AiSearch(api_key=get_and_verify_oxylabs_ai_studio_api_key())
243 | result = await search.search_async(
244 | query=query,
245 | limit=limit,
246 | render_javascript=render_javascript,
247 | return_content=return_content,
248 | geo_location=geo_location,
249 | )
250 | data = result.model_dump(mode="json")["data"]
251 | return json.dumps({"data": data})
252 |
253 |
254 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
255 | async def generate_schema(
256 | user_prompt: str,
257 | app_name: Literal["ai_crawler", "ai_scraper", "browser_agent"],
258 | ) -> str:
259 | """Generate a json schema in openapi format."""
260 | if app_name == "ai_crawler":
261 | crawler = AiCrawler(api_key=get_and_verify_oxylabs_ai_studio_api_key())
262 | schema = crawler.generate_schema(prompt=user_prompt)
263 | elif app_name == "ai_scraper":
264 | scraper = AiScraper(api_key=get_and_verify_oxylabs_ai_studio_api_key())
265 | schema = scraper.generate_schema(prompt=user_prompt)
266 | elif app_name == "browser_agent":
267 | browser_agent = BrowserAgent(api_key=get_and_verify_oxylabs_ai_studio_api_key())
268 | schema = browser_agent.generate_schema(prompt=user_prompt)
269 | else:
270 | raise ValueError(f"Invalid app name: {app_name}")
271 |
272 | return json.dumps({"data": schema})
273 |
274 |
275 | @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
276 | async def ai_map(
277 | url: Annotated[str, Field(description="The URL from which URLs mapping will be started.")],
278 | user_prompt: Annotated[
279 | str,
280 |         Field(description="What kind of URLs the user wants to find."),
281 | ],
282 | render_javascript: Annotated[ # noqa: FBT002
283 | bool,
284 | Field(
285 | description=(
286 | "Whether to render the HTML of the page using javascript. Much slower, "
287 | "therefore use it only for websites "
288 | "that require javascript to render the page. "
289 | "Unless user asks to use it, first try to crawl the page without it. "
290 | "If results are unsatisfactory, try to use it."
291 | )
292 | ),
293 | ] = False,
294 | return_sources_limit: Annotated[
295 | int, Field(description="The maximum number of sources to return.", le=50)
296 | ] = 25,
297 | geo_location: Annotated[
298 | str | None,
299 | Field(description="Two letter ISO country code to use for the mapping proxy."),
300 | ] = None,
301 | ) -> str:
302 |     """Tool useful for mapping a website's URLs.""" # noqa: E501
303 | logger.info(
304 | f"Calling ai_map with: {url=}, {user_prompt=}, "
305 | f"{render_javascript=}, "
306 | f"{return_sources_limit=}"
307 | )
308 | ai_map = AiMap(api_key=get_and_verify_oxylabs_ai_studio_api_key())
309 | result = await ai_map.map_async(
310 | url=url,
311 | user_prompt=user_prompt,
312 | render_javascript=render_javascript,
313 | return_sources_limit=return_sources_limit,
314 | geo_location=geo_location,
315 | )
316 | return json.dumps({"data": result.data})
317 |
```
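The `if/elif` chain in `generate_schema` can equally be written as a dispatch table, which keeps the valid app names in one place. The sketch below only demonstrates the pattern: `_Stub` is a hypothetical stand-in for `AiCrawler`/`AiScraper`/`BrowserAgent`, not the real AI Studio classes.

```python
import json


class _Stub:
    """Hypothetical stand-in for an AI Studio app class exposing generate_schema()."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def generate_schema(self, prompt: str) -> dict:
        return {"type": "object", "description": prompt}


# Map app names to their classes once, instead of branching per name.
SCHEMA_APPS: dict[str, type[_Stub]] = {
    "ai_crawler": _Stub,
    "ai_scraper": _Stub,
    "browser_agent": _Stub,
}


def generate_schema(user_prompt: str, app_name: str, api_key: str) -> str:
    try:
        app_cls = SCHEMA_APPS[app_name]
    except KeyError:
        raise ValueError(f"Invalid app name: {app_name}") from None
    schema = app_cls(api_key=api_key).generate_schema(prompt=user_prompt)
    return json.dumps({"data": schema})


print(generate_schema("product fields", "ai_scraper", api_key="dummy"))
```

With the real classes, `SCHEMA_APPS` would map `"ai_crawler"` to `AiCrawler` and so on; adding a new app then means adding one dict entry rather than another `elif` branch.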