# Directory Structure
```
├── .github
│   └── workflows
│       ├── lint_and_test.yml
│       └── publish_to_pypi.yml
├── .gitignore
├── CHANGELOG.md
├── Dockerfile
├── LICENSE
├── Makefile
├── pyproject.toml
├── README.md
├── server.json
├── smithery.yaml
├── src
│   └── oxylabs_mcp
│       ├── __init__.py
│       ├── config.py
│       ├── exceptions.py
│       ├── tools
│       │   ├── __init__.py
│       │   ├── ai_studio.py
│       │   ├── misc.py
│       │   └── scraper.py
│       ├── url_params.py
│       └── utils.py
├── tests
│   ├── __init__.py
│   ├── conftest.py
│   ├── e2e
│   │   ├── __init__.py
│   │   ├── conftest.py
│   │   ├── example.env
│   │   ├── test_call_tools.py
│   │   └── test_llm_agent.py
│   ├── integration
│   │   ├── __init__.py
│   │   ├── params.py
│   │   ├── test_ai_studio_tools.py
│   │   ├── test_scraper_tools.py
│   │   └── test_server.py
│   ├── unit
│   │   ├── __init__.py
│   │   ├── fixtures
│   │   │   ├── __init__.py
│   │   │   ├── after_strip.html
│   │   │   ├── before_strip.html
│   │   │   └── with_links.html
│   │   └── test_utils.py
│   └── utils.py
└── uv.lock
```
# Files
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Environments
.env
tests/e2e/.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
.envrc
# Rope project settings
.ropeproject
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
.idea/
# Ruff stuff:
.ruff_cache/
# PyPI configuration file
.pypirc
deployment/charts/
.mcpregistry_*
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
<p align="center">
<img src="https://storage.googleapis.com/oxylabs-public-assets/oxylabs_mcp.svg" alt="Oxylabs + MCP">
</p>
<h1 align="center" style="border-bottom: none;">
Oxylabs MCP Server
</h1>
<p align="center">
<em>The missing link between AI models and the real‑world web: one API that delivers clean, structured data from any site.</em>
</p>
<div align="center">
[Smithery](https://smithery.ai/server/@oxylabs/oxylabs-mcp)
[PyPI](https://pypi.org/project/oxylabs-mcp/)
[Discord](https://discord.gg/Pds3gBmKMH)
[License](LICENSE)
[MseeP](https://mseep.ai/app/f6a9c0bc-83a6-4f78-89d9-f2cec4ece98d)

<br/>
<a href="https://glama.ai/mcp/servers/@oxylabs/oxylabs-mcp">
<img width="380" height="200" src="https://glama.ai/mcp/servers/@oxylabs/oxylabs-mcp/badge" alt="Oxylabs Server MCP server" />
</a>
</div>
---
## 📖 Overview
The Oxylabs MCP server provides a bridge between AI models and the web. It enables them to scrape any URL, render JavaScript-heavy pages, extract and format content for AI use, bypass anti-scraping measures, and access geo-restricted web data from 195+ countries.
## 🛠️ MCP Tools
Oxylabs MCP provides two sets of tools that can be used together or independently:
### Oxylabs Web Scraper API Tools
1. **universal_scraper**: Uses Oxylabs Web Scraper API for general website scraping;
2. **google_search_scraper**: Uses Oxylabs Web Scraper API to extract results from Google Search;
3. **amazon_search_scraper**: Uses Oxylabs Web Scraper API to scrape Amazon search result pages;
4. **amazon_product_scraper**: Uses Oxylabs Web Scraper API to extract data from individual Amazon product pages.
### Oxylabs AI Studio Tools
5. **ai_scraper**: Scrape content from any URL in JSON or Markdown format with AI-powered data extraction;
6. **ai_crawler**: Based on a prompt, crawls a website and collects data in Markdown or JSON format across multiple pages;
7. **ai_browser_agent**: Based on a prompt, controls a browser and returns data in Markdown, JSON, HTML, or screenshot formats;
8. **ai_search**: Search the web for URLs and their contents with AI-powered content extraction.
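As a quick taste, here is a minimal sketch of calling the `universal_scraper` tool from Python with the official `mcp` client SDK (the same pattern the e2e tests in this repository use). It assumes `uvx` is installed and your Web Scraper API credentials are exported in the environment:
```python
# Minimal sketch: invoking universal_scraper through the MCP stdio transport.
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    server = StdioServerParameters(
        command="uvx",
        args=["oxylabs-mcp"],
        env={
            "OXYLABS_USERNAME": os.environ["OXYLABS_USERNAME"],
            "OXYLABS_PASSWORD": os.environ["OXYLABS_PASSWORD"],
        },
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "universal_scraper",
                arguments={"url": "https://sandbox.oxylabs.io/products/1", "output_format": "md"},
            )
            # The tool returns the page content as text (Markdown here).
            print(result.content[0].text)


asyncio.run(main())
```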
## ✅ Prerequisites
Before you begin, make sure you have **at least one** of the following:
- **Oxylabs Web Scraper API Account**: Obtain your username and password from [Oxylabs](https://dashboard.oxylabs.io/) (1-week free trial available);
- **Oxylabs AI Studio API Key**: Obtain your API key from [Oxylabs AI Studio](https://aistudio.oxylabs.io/settings/api-key). (1000 credits free).
## 📦 Configuration
### Environment variables
Oxylabs MCP server supports the following environment variables:
| Name | Description | Default |
|----------------------------|-----------------------------------------------|---------|
| `OXYLABS_USERNAME` | Your Oxylabs Web Scraper API username | |
| `OXYLABS_PASSWORD` | Your Oxylabs Web Scraper API password | |
| `OXYLABS_AI_STUDIO_API_KEY`| Your Oxylabs AI Studio API key | |
| `LOG_LEVEL` | Log level for the logs returned to the client | `INFO` |
Based on the provided credentials, the server automatically exposes the corresponding tools:
- If only `OXYLABS_USERNAME` and `OXYLABS_PASSWORD` are provided, the server exposes the Web Scraper API tools;
- If only `OXYLABS_AI_STUDIO_API_KEY` is provided, the server exposes the AI Studio tools;
- If `OXYLABS_USERNAME`, `OXYLABS_PASSWORD`, and `OXYLABS_AI_STUDIO_API_KEY` are all provided, the server exposes all tools.
❗❗❗ **Important note: if you don't have Web Scraper API or Oxylabs AI Studio credentials, delete the corresponding environment variable placeholders.
Leaving placeholder values will expose tools that do not work.**
### Configure with uvx
- Install the uv package manager (it provides the `uvx` command):
```bash
# macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
```
OR:
```bash
# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```
- Use the following config:
```json
{
  "mcpServers": {
    "oxylabs": {
      "command": "uvx",
      "args": ["oxylabs-mcp"],
      "env": {
        "OXYLABS_USERNAME": "OXYLABS_USERNAME",
        "OXYLABS_PASSWORD": "OXYLABS_PASSWORD",
        "OXYLABS_AI_STUDIO_API_KEY": "OXYLABS_AI_STUDIO_API_KEY"
      }
    }
  }
}
```
### Configure with uv
- Install the uv package manager:
```bash
# macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
```
OR:
```bash
# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```
- Use the following config:
```json
{
  "mcpServers": {
    "oxylabs": {
      "command": "uv",
      "args": [
        "--directory",
        "/<Absolute-path-to-folder>/oxylabs-mcp",
        "run",
        "oxylabs-mcp"
      ],
      "env": {
        "OXYLABS_USERNAME": "OXYLABS_USERNAME",
        "OXYLABS_PASSWORD": "OXYLABS_PASSWORD",
        "OXYLABS_AI_STUDIO_API_KEY": "OXYLABS_AI_STUDIO_API_KEY"
      }
    }
  }
}
```
### Configure with Smithery OAuth2
- Go to https://smithery.ai/server/@oxylabs/oxylabs-mcp;
- Click _Auto_ to install the Oxylabs MCP configuration for the respective client;
- OR use the following config:
```json
{
  "mcpServers": {
    "oxylabs": {
      "url": "https://server.smithery.ai/@oxylabs/oxylabs-mcp/mcp"
    }
  }
}
```
- Follow the instructions to authenticate Oxylabs MCP via the OAuth2 flow.
### Configure with Smithery query parameters
If your client does not support OAuth2 authentication, you can pass the Oxylabs authentication parameters directly in the URL:
```json
{
  "mcpServers": {
    "oxylabs": {
      "url": "https://server.smithery.ai/@oxylabs/oxylabs-mcp/mcp?oxylabsUsername=OXYLABS_USERNAME&oxylabsPassword=OXYLABS_PASSWORD&oxylabsAiStudioApiKey=OXYLABS_AI_STUDIO_API_KEY"
    }
  }
}
```
### Manual Setup with Claude Desktop
Navigate to **Claude → Settings → Developer → Edit Config** and add one of the configurations above to the `claude_desktop_config.json` file.
### Manual Setup with Cursor AI
Navigate to **Cursor → Settings → Cursor Settings → MCP**. Click **Add new global MCP server** and add one of the configurations above.
## 📝 Logging
The server provides additional information about tool calls in `notifications/message` events:
```json
{
  "method": "notifications/message",
  "params": {
    "level": "info",
    "data": "Create job with params: {\"url\": \"https://ip.oxylabs.io\"}"
  }
}
```
```json
{
  "method": "notifications/message",
  "params": {
    "level": "info",
    "data": "Job info: job_id=7333113830223918081 job_status=done"
  }
}
```
```json
{
  "method": "notifications/message",
  "params": {
    "level": "error",
    "data": "Error: request to Oxylabs API failed"
  }
}
```
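These events can be consumed on the client side; below is a minimal sketch using the Python MCP SDK's `logging_callback` hook (the server launch mirrors the earlier example and assumes credentials are exported):
```python
# Minimal sketch: printing the server's notifications/message events.
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from mcp.types import LoggingMessageNotificationParams


async def on_log(params: LoggingMessageNotificationParams) -> None:
    # params.level is "info", "error", etc.; params.data carries the message text.
    print(f"[{params.level}] {params.data}")


async def main() -> None:
    server = StdioServerParameters(
        command="uvx",
        args=["oxylabs-mcp"],
        env={
            "OXYLABS_USERNAME": os.environ["OXYLABS_USERNAME"],
            "OXYLABS_PASSWORD": os.environ["OXYLABS_PASSWORD"],
        },
    )
    async with stdio_client(server) as (read, write):
        # logging_callback receives each notifications/message event shown above.
        async with ClientSession(read, write, logging_callback=on_log) as session:
            await session.initialize()
            await session.call_tool("universal_scraper", arguments={"url": "https://ip.oxylabs.io"})


asyncio.run(main())
```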
---
## 🛡️ License
Distributed under the MIT License – see [LICENSE](LICENSE) for details.
---
## About Oxylabs
Established in 2015, Oxylabs is a market-leading web intelligence collection
platform, driven by the highest business, ethics, and compliance standards,
enabling companies worldwide to unlock data-driven insights.
[Oxylabs](https://oxylabs.io/)
<div align="center">
<sub>
Made with ☕ by <a href="https://oxylabs.io">Oxylabs</a>. Feel free to give us a ⭐ if MCP saved you a weekend.
</sub>
</div>
## ✨ Key Features
<details>
<summary><strong> Scrape content from any site</strong></summary>
<br>
- Extract data from any URL, including complex single-page applications
- Fully render dynamic websites using headless browser support
- Choose full JavaScript rendering, HTML-only, or none
- Emulate Mobile and Desktop viewports for realistic rendering
</details>
<details>
<summary><strong> Automatically get AI-ready data</strong></summary>
<br>
- Automatically clean and convert HTML to Markdown for improved readability
- Use automated parsers for popular targets like Google, Amazon, and more
</details>
<details>
<summary><strong> Bypass blocks & geo-restrictions</strong></summary>
<br>
- Bypass sophisticated bot protection systems with a high success rate
- Reliably scrape even the most complex websites
- Get automatically rotating IPs from a proxy pool covering 195+ countries
</details>
<details>
<summary><strong> Flexible setup & cross-platform support</strong></summary>
<br>
- Set rendering and parsing options if needed
- Feed data directly into AI models or analytics tools
- Works on macOS, Windows, and Linux
</details>
<details>
<summary><strong> Built-in error handling and request management</strong></summary>
<br>
- Comprehensive error handling and reporting
- Smart rate limiting and request management
</details>
---
## Why Oxylabs MCP? 🕸️ ➜ 📦 ➜ 🤖
Imagine telling your LLM *"Summarise the latest Hacker News discussion about GPT‑5"* – and it simply answers.
MCP (Model Context Protocol) makes that happen by doing the boring parts for you:
| What Oxylabs MCP does | Why it matters to you |
|-------------------------------------------------------------------|------------------------------------------|
| **Bypasses anti‑bot walls** with the Oxylabs global proxy network | Keeps you unblocked and anonymous |
| **Renders JavaScript** in headless Chrome | Single‑page apps, sorted |
| **Cleans HTML → JSON** | Drop straight into vector DBs or prompts |
| **Optional structured parsers** (Google, Amazon, etc.) | One‑line access to popular targets |
mcp-name: io.github.oxylabs/oxylabs-mcp
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/tools/__init__.py:
--------------------------------------------------------------------------------
```python
```
--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
```python
```
--------------------------------------------------------------------------------
/tests/e2e/__init__.py:
--------------------------------------------------------------------------------
```python
```
--------------------------------------------------------------------------------
/tests/integration/__init__.py:
--------------------------------------------------------------------------------
```python
```
--------------------------------------------------------------------------------
/tests/unit/__init__.py:
--------------------------------------------------------------------------------
```python
```
--------------------------------------------------------------------------------
/tests/unit/fixtures/__init__.py:
--------------------------------------------------------------------------------
```python
```
--------------------------------------------------------------------------------
/tests/e2e/conftest.py:
--------------------------------------------------------------------------------
```python
import dotenv
import pytest


dotenv.load_dotenv()


@pytest.fixture(scope="session", autouse=True)
def environment():
    pass
```
--------------------------------------------------------------------------------
/tests/unit/fixtures/after_strip.html:
--------------------------------------------------------------------------------
```html
<html> <body> <div class="content"> <p>Welcome to my website</p> </div> <div class="other"> <p>Visible content</p> </div> </body>
</html>
```
--------------------------------------------------------------------------------
/tests/e2e/example.env:
--------------------------------------------------------------------------------
```
# Oxylabs Settings
OXYLABS_USERNAME=
OXYLABS_PASSWORD=
# LLM Providers
ANTHROPIC_API_KEY=
GOOGLE_API_KEY=
OPENAI_API_KEY=
# Misc
LOCAL_OXYLABS_MCP_DIRECTORY=
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/tools/misc.py:
--------------------------------------------------------------------------------
```python
# mypy: disable-error-code=import-untyped
from oxylabs_ai_studio import client


def setup() -> None:
    """Set up the environment."""
    client._UA_API = "py-mcp"
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/exceptions.py:
--------------------------------------------------------------------------------
```python
from fastmcp.server.dependencies import get_context


class MCPServerError(Exception):
    """Generic MCP server exception."""

    async def process(self) -> str:
        """Process exception."""
        err = str(self)
        await get_context().error(err)
        return err
```
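Tool handlers catch this exception, let `process()` push the message to the client as an `error` log, and return the same text as the tool result. A minimal sketch of that pattern (the handler name here is hypothetical; the real usage lives in `tools/scraper.py`):
```python
# Sketch of the error-handling pattern used by the tool handlers.
# Note: process() must run inside an active FastMCP request context,
# since it reports the error to the client via get_context().error().
from oxylabs_mcp.exceptions import MCPServerError


async def some_tool() -> str:  # hypothetical tool body
    try:
        raise MCPServerError("request to Oxylabs API failed")  # illustrative failure
    except MCPServerError as e:
        # The error is logged to the client and returned as readable tool output.
        return await e.process()
```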
--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
```markdown
# Changelog

## [0.2.2] - 2025-05-23

### Fixed
- Coverage badge

## [0.2.1] - 2025-05-23

### Added
- More tests

### Changed
- README.md

## [0.2.0] - 2025-05-13

### Added
- Changelog
- E2E tests
- Geolocation and User Agent type parameters to universal scraper

### Changed
- Descriptions for tools
- Descriptions for tool parameters
- Default values for tool parameters

### Removed
- WebUnblocker tool
- Parse parameter for universal scraper
--------------------------------------------------------------------------------
/tests/unit/fixtures/before_strip.html:
--------------------------------------------------------------------------------
```html
<html>
  <body>
    <div class="content">
      <p>Welcome to my website</p>
    </div>
    <div id="footer">
      <p>This is the footer content.</p>
    </div>
    <div class="hidden">
      <p>This content is hidden.</p>
    </div>
    <div class="other">
      <p>Visible content</p>
    </div>
    <script>console.log('script tag');</script>
    <noscript>This is noscript content.</noscript>
    <form><input type="text"/></form>
  </body>
</html>
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/config.py:
--------------------------------------------------------------------------------
```python
from typing import Literal

from dotenv import load_dotenv
from pydantic_settings import BaseSettings


load_dotenv()


class Settings(BaseSettings):
    """Project settings."""

    OXYLABS_SCRAPER_URL: str = "https://realtime.oxylabs.io/v1/queries"
    OXYLABS_REQUEST_TIMEOUT_S: int = 100
    LOG_LEVEL: str = "INFO"

    MCP_TRANSPORT: Literal["stdio", "sse", "streamable-http"] = "stdio"
    MCP_PORT: int = 8000
    MCP_HOST: str = "localhost"
    MCP_STATELESS_HTTP: bool = False

    # smithery config
    PORT: int | None = None


settings = Settings()
```
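Since `Settings` extends pydantic-settings' `BaseSettings`, each field can be overridden via an environment variable of the same name (and `load_dotenv()` also picks up a local `.env` file). A small sketch:
```python
# Sketch: overriding Settings fields through the environment.
import os

os.environ["MCP_TRANSPORT"] = "streamable-http"
os.environ["MCP_PORT"] = "9000"

from oxylabs_mcp.config import Settings

settings = Settings()
assert settings.MCP_TRANSPORT == "streamable-http"
assert settings.MCP_PORT == 9000  # pydantic coerces the env string to int
```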
--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------
```yaml
runtime: "container"
build:
  dockerfile: "Dockerfile"
  dockerBuildPath: "."
startCommand:
  type: "http"
  configSchema:
    type: "object"
    properties:
      oxylabsUsername:
        type: "string"
        description: "Oxylabs username"
      oxylabsPassword:
        type: "string"
        description: "Oxylabs password"
      oxylabsAiStudioApiKey:
        type: "string"
        description: "Oxylabs AI Studio api key"
    required: []
  exampleConfig:
    oxylabsUsername: "Your Oxylabs username"
    oxylabsPassword: "Your Oxylabs password"
    oxylabsAiStudioApiKey: "Your Oxylabs AI Studio api key"
```
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
```dockerfile
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim

ENV UV_COMPILE_BYTECODE=1
ENV UV_LINK_MODE=copy
ENV UV_CACHE_DIR=/opt/uv-cache/

RUN apt-get update && apt-get install -y --no-install-recommends git

WORKDIR /app

# Cache mount target must be a literal absolute path matching UV_CACHE_DIR
RUN --mount=type=cache,target=/opt/uv-cache/ \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --frozen --no-install-project --no-dev --no-editable

ADD . /app
RUN --mount=type=cache,target=/opt/uv-cache/ \
    uv sync --frozen --no-dev --no-editable

# Add virtual environment to PATH
ENV PATH="/app/.venv/bin:$PATH"

ENV MCP_TRANSPORT="streamable-http"

ENTRYPOINT ["oxylabs-mcp"]
```
--------------------------------------------------------------------------------
/tests/utils.py:
--------------------------------------------------------------------------------
```python
def convert_context_params(arguments: dict) -> dict:
    context_fields = ["category_id", "merchant_id", "currency", "autoselect_variant"]
    arguments_copy = {**arguments}
    for f in context_fields:
        if f in arguments_copy:
            if "context" not in arguments_copy:
                arguments_copy["context"] = []
            arguments_copy["context"].append({"key": f, "value": arguments_copy[f]})
            del arguments_copy[f]
    return arguments_copy


def prepare_expected_arguments(arguments: dict) -> dict:
    arguments_copy = {**arguments}
    if "output_format" in arguments_copy:
        del arguments_copy["output_format"]
    return arguments_copy
```
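These helpers mirror how the scraper tools fold Amazon-specific arguments into the Web Scraper API `context` list, keeping the test expectations in sync with the payload-building logic in `tools/scraper.py`. A quick illustration:
```python
# Sketch: what convert_context_params does to Amazon-specific arguments.
from tests.utils import convert_context_params

args = {"query": "laptop", "currency": "USD", "merchant_id": "M1"}
print(convert_context_params(args))
# {'query': 'laptop', 'context': [{'key': 'merchant_id', 'value': 'M1'}, {'key': 'currency', 'value': 'USD'}]}
```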
--------------------------------------------------------------------------------
/.github/workflows/publish_to_pypi.yml:
--------------------------------------------------------------------------------
```yaml
name: Publish Python 🐍 distributions 📦 to PyPI

on:
  push:
    tags:
      - 'v[0-9]+.[0-9]+.[0-9]+'

jobs:
  build-n-publish:
    name: Build and publish Python distribution to PyPI
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.12
      - name: Install uv
        run: |
          pip install uv
      - name: Install dependencies
        run: |
          uv sync --no-dev
      - name: Build a dist package
        run: uv build
      - name: Publish distribution to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          user: __token__
          password: ${{ secrets.PYPI_API_TOKEN }}
```
--------------------------------------------------------------------------------
/server.json:
--------------------------------------------------------------------------------
```json
{
  "$schema": "https://static.modelcontextprotocol.io/schemas/2025-10-17/server.schema.json",
  "name": "io.github.oxylabs/oxylabs-mcp",
  "description": "Fetch and process content from specified URLs & sources using the Oxylabs Web Scraper API.",
  "repository": {
    "url": "https://github.com/oxylabs/oxylabs-mcp",
    "source": "github"
  },
  "version": "0.7.1",
  "packages": [
    {
      "registryType": "pypi",
      "identifier": "oxylabs-mcp",
      "version": "0.7.1",
      "transport": {
        "type": "stdio"
      },
      "environmentVariables": [
        {
          "description": "Your Oxylabs username",
          "isRequired": false,
          "format": "string",
          "isSecret": true,
          "name": "OXYLABS_USERNAME"
        },
        {
          "description": "Your Oxylabs password",
          "isRequired": false,
          "format": "string",
          "isSecret": true,
          "name": "OXYLABS_PASSWORD"
        },
        {
          "description": "Your Oxylabs AI Studio api key",
          "isRequired": false,
          "format": "string",
          "isSecret": true,
          "name": "OXYLABS_AI_STUDIO_API_KEY"
        }
      ]
    }
  ]
}
```
--------------------------------------------------------------------------------
/tests/unit/fixtures/with_links.html:
--------------------------------------------------------------------------------
```html
<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://www.iana.org/domains/example">More information...</a></p>
    <p><a href="https://example.com">Another link</a></p>
</div>
</body>
</html>
```
--------------------------------------------------------------------------------
/tests/e2e/test_call_tools.py:
--------------------------------------------------------------------------------
```python
import os
from contextlib import asynccontextmanager

import pytest
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


@asynccontextmanager
async def get_oxylabs_mcp_client():
    server_params = StdioServerParameters(
        command="uv",  # Using uv to run the server
        args=["run", "oxylabs-mcp"],
        env={
            "OXYLABS_USERNAME": os.getenv("OXYLABS_USERNAME"),
            "OXYLABS_PASSWORD": os.getenv("OXYLABS_PASSWORD"),
        },
        cwd=os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))),
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            yield session


@pytest.mark.asyncio
@pytest.mark.parametrize(
    ("url", "min_response_len"),
    [
        (
            "https://maisonpur.com/best-non-toxic-cutting-boards-safer-options-for-a-healthy-kitchen/",
            10000,
        ),
        ("https://sandbox.oxylabs.io/products/1", 2500),
        ("https://sandbox.oxylabs.io/products/5", 3000),
    ],
)
async def test_universal_scraper_tool(url: str, min_response_len: int):
    async with get_oxylabs_mcp_client() as session:
        result = await session.call_tool("universal_scraper", arguments={"url": url})

        assert len(result.content[0].text) > min_response_len
```
--------------------------------------------------------------------------------
/.github/workflows/lint_and_test.yml:
--------------------------------------------------------------------------------
```yaml
name: Lint & Test

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

permissions:
  contents: write

jobs:
  lint_and_test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.12
        uses: actions/setup-python@v3
        with:
          python-version: "3.12"
      - name: Install uv
        run: |
          pip install uv
      - name: Install dependencies
        run: |
          uv sync
      - name: Run linters
        run: |
          uv run black --check .
          uv run mypy src
          uv run ruff check .
      - name: Run tests
        run: |
          uv run pytest --cov=src --cov-report xml --cov-report term --cov-fail-under=90 tests/unit tests/integration
      - name: Generate coverage badge
        run: |
          pip install "genbadge[coverage]"
          genbadge coverage -i coverage.xml -o coverage-badge.svg
      - name: Upload coverage report artifact
        uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: coverage.xml
      - name: Upload coverage badge artifact
        uses: actions/upload-artifact@v4
        with:
          name: coverage-badge
          path: coverage-badge.svg
      - name: Deploy coverage report to branch
        if: github.ref == 'refs/heads/main'
        uses: peaceiris/actions-gh-pages@v4
        with:
          publish_branch: 'coverage'
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: .
          keep_files: coverage-badge.svg
          user_name: 'github-actions[bot]'
          user_email: 'github-actions[bot]@users.noreply.github.com'
          commit_message: 'chore: Update coverage data from workflow run ${{ github.event.workflow_run.id }}'
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/__init__.py:
--------------------------------------------------------------------------------
```python
import logging
from typing import Any

from fastmcp import Context, FastMCP
from mcp import Tool as MCPTool

from oxylabs_mcp.config import settings
from oxylabs_mcp.tools.ai_studio import AI_TOOLS
from oxylabs_mcp.tools.ai_studio import mcp as ai_studio_mcp
from oxylabs_mcp.tools.scraper import SCRAPER_TOOLS
from oxylabs_mcp.tools.scraper import mcp as scraper_mcp
from oxylabs_mcp.utils import get_oxylabs_ai_studio_api_key, get_oxylabs_auth


class OxylabsMCPServer(FastMCP):
    """Oxylabs MCP server."""

    async def _mcp_list_tools(self) -> list[MCPTool]:
        """List all available Oxylabs tools."""
        async with Context(fastmcp=self):
            tools = await self._list_tools()

            username, password = get_oxylabs_auth()
            if not username or not password:
                tools = [tool for tool in tools if tool.name not in SCRAPER_TOOLS]

            if not get_oxylabs_ai_studio_api_key():
                tools = [tool for tool in tools if tool.name not in AI_TOOLS]

            return [
                tool.to_mcp_tool(
                    name=tool.key,
                    include_fastmcp_meta=self.include_fastmcp_meta,
                )
                for tool in tools
            ]


mcp = OxylabsMCPServer("oxylabs_mcp")

mcp.mount(ai_studio_mcp)
mcp.mount(scraper_mcp)


def main() -> None:
    """Start the MCP server."""
    logging.getLogger("oxylabs_mcp").setLevel(settings.LOG_LEVEL)

    params: dict[str, Any] = {}
    if settings.MCP_TRANSPORT == "streamable-http":
        params["host"] = settings.MCP_HOST
        params["port"] = settings.PORT or settings.MCP_PORT
        params["log_level"] = settings.LOG_LEVEL
        params["stateless_http"] = settings.MCP_STATELESS_HTTP

    mcp.run(
        settings.MCP_TRANSPORT,
        **params,
    )


# Optionally expose other important items at package level
__all__ = ["main", "mcp"]
```
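`main()` picks the transport and HTTP options from `Settings`, so the same composed server can run over stdio or HTTP purely via environment configuration. A minimal programmatic equivalent (a sketch; running `oxylabs-mcp` with `MCP_TRANSPORT=streamable-http` does the same):
```python
# Sketch: serving the composed Oxylabs MCP server over streamable HTTP,
# equivalent to `MCP_TRANSPORT=streamable-http oxylabs-mcp`.
from oxylabs_mcp import mcp

mcp.run("streamable-http", host="localhost", port=8000, log_level="INFO")
```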
--------------------------------------------------------------------------------
/tests/unit/test_utils.py:
--------------------------------------------------------------------------------
```python
from unittest.mock import patch

import pytest

from oxylabs_mcp.config import settings
from oxylabs_mcp.utils import extract_links_with_text, get_oxylabs_auth, strip_html


TEST_FIXTURES = "tests/unit/fixtures/"


@pytest.mark.parametrize(
    "env_vars",
    [
        pytest.param(
            {"OXYLABS_USERNAME": "test_user", "OXYLABS_PASSWORD": "test_pass"},
            id="valid-env",
        ),
        pytest.param(
            {"OXYLABS_PASSWORD": "test_pass"},
            id="no-username",
        ),
        pytest.param(
            {"OXYLABS_USERNAME": "test_user"},
            id="no-password",
        ),
        pytest.param({}, id="no-username-and-no-password"),
    ],
)
def test_get_oxylabs_auth(env_vars):
    with patch("os.environ", new=env_vars):
        settings.MCP_TRANSPORT = "stdio"
        username, password = get_oxylabs_auth()

        assert username == env_vars.get("OXYLABS_USERNAME")
        assert password == env_vars.get("OXYLABS_PASSWORD")


@pytest.mark.parametrize(
    ("html_input", "expected_output"),
    [pytest.param("before_strip.html", "after_strip.html", id="strip-html")],
)
def test_strip_html(html_input: str, expected_output: str):
    with (
        open(TEST_FIXTURES + html_input, "r", encoding="utf-8") as input_file,
        open(TEST_FIXTURES + expected_output, "r", encoding="utf-8") as output_file,
    ):
        input_html = input_file.read()
        expected_html = output_file.read()

    actual_output = strip_html(input_html)

    assert actual_output == expected_html


@pytest.mark.parametrize(
    ("html_input", "expected_output"),
    [
        pytest.param(
            "with_links.html",
            "[More information...] https://www.iana.org/domains/example\n"
            "[Another link] https://example.com",
            id="strip-html",
        )
    ],
)
def test_extract_links_with_text(html_input: str, expected_output: str):
    with open(TEST_FIXTURES + html_input, "r", encoding="utf-8") as input_file:
        input_html = input_file.read()

    links = extract_links_with_text(input_html)

    assert "\n".join(links) == expected_output
```
--------------------------------------------------------------------------------
/tests/conftest.py:
--------------------------------------------------------------------------------
```python
from contextlib import asynccontextmanager
from unittest.mock import AsyncMock, MagicMock, patch

import pytest
from fastmcp.server.context import Context, set_context
from httpx import Request
from mcp.server.lowlevel.server import request_ctx

from oxylabs_mcp import mcp as mcp_server


@pytest.fixture
def request_context():
    request_context = MagicMock()
    request_context.session.client_params.clientInfo.name = "fake_cursor"
    request_context.request.headers = {
        "x-oxylabs-username": "oxylabs_username",
        "x-oxylabs-password": "oxylabs_password",
        "x-oxylabs-ai-studio-api-key": "oxylabs_ai_studio_api_key",
    }

    ctx = Context(MagicMock())
    ctx.info = AsyncMock()
    ctx.error = AsyncMock()

    request_ctx.set(request_context)

    with set_context(ctx):
        yield ctx


@pytest.fixture(scope="session", autouse=True)
def environment():
    env = {
        "OXYLABS_USERNAME": "oxylabs_username",
        "OXYLABS_PASSWORD": "oxylabs_password",
        "OXYLABS_AI_STUDIO_API_KEY": "oxylabs_ai_studio_api_key",
    }
    with patch("os.environ", new=env):
        yield


@pytest.fixture
def mcp(request_context: Context):
    return mcp_server


@pytest.fixture
def request_data():
    return Request("POST", "https://example.com/v1/queries")


@pytest.fixture
def oxylabs_client():
    client_mock = AsyncMock()

    @asynccontextmanager
    async def wrapper(*args, **kwargs):
        client_mock.context_manager_call_args = args
        client_mock.context_manager_call_kwargs = kwargs
        yield client_mock

    with patch("oxylabs_mcp.utils.AsyncClient", new=wrapper):
        yield client_mock


@pytest.fixture
def request_session(request_context):
    token = request_ctx.set(request_context)

    yield request_context.session

    request_ctx.reset(token)


@pytest.fixture(scope="session", autouse=True)
def is_api_key_valid_mock():
    with patch("oxylabs_mcp.utils.is_api_key_valid", return_value=True):
        yield


@pytest.fixture
def mock_schema():
    return {"field_1": "value1", "field_2": "value2"}


@pytest.fixture
def ai_crawler(mock_schema):
    mock_crawler = MagicMock()
    mock_crawler.generate_schema.return_value = mock_schema
    with patch("oxylabs_mcp.tools.ai_studio.AiCrawler", return_value=mock_crawler):
        yield mock_crawler


@pytest.fixture
def ai_scraper(mock_schema):
    mock_scraper = MagicMock()
    mock_scraper.generate_schema.return_value = mock_schema
    with patch("oxylabs_mcp.tools.ai_studio.AiScraper", return_value=mock_scraper):
        yield mock_scraper


@pytest.fixture
def browser_agent(mock_schema):
    mock_browser_agent = MagicMock()
    mock_browser_agent.generate_schema.return_value = mock_schema
    with patch("oxylabs_mcp.tools.ai_studio.BrowserAgent", return_value=mock_browser_agent):
        yield mock_browser_agent


@pytest.fixture
def ai_search():
    mock_ai_search = MagicMock()
    with patch("oxylabs_mcp.tools.ai_studio.AiSearch", return_value=mock_ai_search):
        yield mock_ai_search


@pytest.fixture
def ai_map():
    mock_ai_map = MagicMock()
    with patch("oxylabs_mcp.tools.ai_studio.AiMap", return_value=mock_ai_map):
        yield mock_ai_map
```
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
```toml
[project]
name = "oxylabs-mcp"
version = "0.7.1"
description = "Oxylabs MCP server"
authors = [
    {name="Augis Braziunas", email="[email protected]"},
    {name="Rostyslav Borovyk", email="[email protected]"},
]
readme = "README.md"
requires-python = ">=3.12"
classifiers = [
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
    "Development Status :: 4 - Beta",
    "Operating System :: OS Independent",
]
license = "MIT"
license-files = ["LICEN[CS]E*"]
dependencies = [
    "fastmcp>=2.11.3",
    "httpx>=0.28.1",
    "lxml>=5.3.0,<6",
    "lxml-html-clean>=0.4.1",
    "markdownify>=0.14.1",
    "oxylabs-ai-studio>=0.2.15",
    "pydantic>=2.10.5",
    "pydantic-settings>=2.8.1",
    "smithery>=0.1.25",
]

[dependency-groups]
dev = [
    "bandit>=1.8.6",
    "black>=25.1.0",
    "lxml-stubs>=0.5.1",
    "mypy>=1.14.1",
    "pytest>=8.3.4",
    "pytest-asyncio>=0.25.2",
    "pytest-cov>=6.1.1",
    "pytest-mock>=3.14.0",
    "ruff>=0.9.1",
]
e2e-tests = [
    "agno>=1.8.1",
    "anthropic>=0.50.0",
    "google-genai>=1.13.0",
    "openai>=1.77.0",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project.scripts]
oxylabs-mcp = "oxylabs_mcp:main"

[project.urls]
Homepage = "https://github.com/oxylabs/oxylabs-mcp"
Repository = "https://github.com/oxylabs/oxylabs-mcp"

[tool.mypy]
strict = true

[[tool.mypy.overrides]]
module = "markdownify.*"
ignore_missing_imports = true

[tool.ruff]
target-version = "py312"
lint.select = [
    "E",    # pycodestyle (E, W) - https://docs.astral.sh/ruff/rules/#pycodestyle-e-w
    "F",    # Pyflakes (F) - https://docs.astral.sh/ruff/rules/#pyflakes-f
    "W",    # pycodestyle (E, W) - https://docs.astral.sh/ruff/rules/#pycodestyle-e-w
    "I",    # isort (I) https://docs.astral.sh/ruff/rules/#isort-i
    "D",    # pydocstyle (D) https://docs.astral.sh/ruff/rules/#pydocstyle-d
    "S",    # bandit (S) https://docs.astral.sh/ruff/rules/#flake8-bandit-s
    "ARG",  # flake8-unused-arguments - https://docs.astral.sh/ruff/rules/#flake8-unused-arguments-arg
    "B",    # flake8-bugbear - https://docs.astral.sh/ruff/rules/#flake8-bugbear-b
    "C4",   # flake8-comprehensions - https://docs.astral.sh/ruff/rules/#flake8-comprehensions-c4
    "ISC",  # flake8-implicit-str-concat - https://docs.astral.sh/ruff/rules/#flake8-implicit-str-concat-isc
    "FA",   # flake8-future-annotations - https://docs.astral.sh/ruff/rules/#flake8-future-annotations-fa
    "FBT",  # flake8-boolean-trap - https://docs.astral.sh/ruff/rules/#flake8-boolean-trap-fbt
    "Q",    # flake8-quotes (Q) https://docs.astral.sh/ruff/rules/#flake8-quotes-q
    "ANN",  # flake8-annotations (ANN) https://docs.astral.sh/ruff/rules/#flake8-annotations-ann
    "PLR",  # Refactor (PLR) https://docs.astral.sh/ruff/rules/#refactor-plr
    "PT",   # flake8-pytest-style (PT) https://docs.astral.sh/ruff/rules/#flake8-pytest-style-pt
]
lint.ignore = [
    "D213",    # Contradicts D212.
    "D203",    # Contradicts D211.
    "D104",    # Allow no docstrings in packages
    "D100",    # Allow no docstrings in modules
    "ANN002",  # https://docs.astral.sh/ruff/rules/missing-type-args/
    "ANN003",  # https://docs.astral.sh/ruff/rules/missing-type-kwargs/
    "PLR0913", # Allow functions with many arguments
    "PLR0912", # Allow many branches for functions
]

[tool.ruff.lint.per-file-ignores]
"tests/*" = ["D", "S101", "ARG001", "ANN", "PT011", "FBT", "PLR2004"]
"src/oxylabs_mcp/url_params.py" = ["E501"]

[tool.ruff.lint.pycodestyle]
max-line-length = 100

[tool.ruff.lint.isort]
known-first-party = ["src", "tests"]
lines-after-imports = 2

[tool.pytest.ini_options]
asyncio_default_fixture_loop_scope = "session"
asyncio_mode = "auto"

[tool.black]
line-length = 100
```
--------------------------------------------------------------------------------
/tests/integration/test_server.py:
--------------------------------------------------------------------------------
```python
import json
import re
from unittest.mock import AsyncMock, MagicMock

import pytest
from fastmcp import FastMCP
from httpx import HTTPStatusError, Request, RequestError, Response

from oxylabs_mcp.config import settings
from tests.integration import params


@pytest.mark.asyncio
@pytest.mark.parametrize(
    ("tool", "arguments"),
    [
        pytest.param(
            "universal_scraper",
            {"url": "test_url"},
            id="universal_scraper",
        ),
        pytest.param(
            "google_search_scraper",
            {"query": "Generic query"},
            id="google_search_scraper",
        ),
        pytest.param(
            "amazon_search_scraper",
            {"query": "Generic query"},
            id="amazon_search_scraper",
        ),
        pytest.param(
            "amazon_product_scraper",
            {"query": "Generic query"},
            id="amazon_product_scraper",
        ),
    ],
)
async def test_default_headers_are_set(
    mcp: FastMCP,
    request_data: Request,
    oxylabs_client: AsyncMock,
    tool: str,
    arguments: dict,
):
    mock_response = Response(
        200,
        content=json.dumps(params.STR_RESPONSE),
        request=request_data,
    )
    oxylabs_client.post.return_value = mock_response
    oxylabs_client.get.return_value = mock_response

    await mcp._call_tool(tool, arguments=arguments)

    assert "x-oxylabs-sdk" in oxylabs_client.context_manager_call_kwargs["headers"]

    oxylabs_sdk_header = oxylabs_client.context_manager_call_kwargs["headers"]["x-oxylabs-sdk"]
    client_info, _ = oxylabs_sdk_header.split(maxsplit=1)
    client_info_pattern = re.compile(r"oxylabs-mcp-fake_cursor/(\d+)\.(\d+)\.(\d+)$")
    assert re.match(client_info_pattern, client_info)


@pytest.mark.asyncio
@pytest.mark.parametrize(
    ("tool", "arguments"),
    [
        pytest.param(
            "universal_scraper",
            {"url": "test_url"},
            id="universal_scraper",
        ),
        pytest.param(
            "google_search_scraper",
            {"query": "Generic query"},
            id="google_search_scraper",
        ),
        pytest.param(
            "amazon_search_scraper",
            {"query": "Generic query"},
            id="amazon_search_scraper",
        ),
        pytest.param(
            "amazon_product_scraper",
            {"query": "Generic query"},
            id="amazon_product_scraper",
        ),
    ],
)
@pytest.mark.parametrize(
    ("exception", "expected_text"),
    [
        pytest.param(
            HTTPStatusError(
                "HTTP status error",
                request=MagicMock(),
                response=MagicMock(status_code=500, text="Internal Server Error"),
            ),
            "HTTP error during POST request: 500 - Internal Server Error",
            id="https_status_error",
        ),
        pytest.param(
            RequestError("Request error"),
            "Request error during POST request: Request error",
            id="request_error",
        ),
        pytest.param(
            Exception("Unexpected exception"),
            "Error: Unexpected exception",
            id="unhandled_exception",
        ),
    ],
)
async def test_request_client_error_handling(
    mcp: FastMCP,
    request_data: Request,
    oxylabs_client: AsyncMock,
    tool: str,
    arguments: dict,
    exception: Exception,
    expected_text: str,
):
    oxylabs_client.post.side_effect = [exception]
    oxylabs_client.get.side_effect = [exception]

    result = await mcp._call_tool(tool, arguments=arguments)

    assert result.content[0].text == expected_text


@pytest.mark.parametrize("transport", ["stdio", "streamable-http"])
async def test_list_tools(mcp: FastMCP, transport: str):
    settings.MCP_TRANSPORT = transport

    tools = await mcp._mcp_list_tools()

    assert len(tools) == 10
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/url_params.py:
--------------------------------------------------------------------------------
```python
from typing import Annotated, Literal

from pydantic import Field


# Note: optional types (e.g `str | None`) break the introspection in the Cursor AI.
# See: https://github.com/getcursor/cursor/issues/2932
# Therefore, sentinel values (e.g. `""`, `0`) are used to represent a nullable parameter.

URL_PARAM = Annotated[str, Field(description="Website url to scrape.")]
PARSE_PARAM = Annotated[
    bool,
    Field(
        description="Should result be parsed. If the result is not parsed, the output_format parameter is applied.",
    ),
]
RENDER_PARAM = Annotated[
    Literal["", "html"],
    Field(
        description="""
        Whether a headless browser should be used to render the page.
        For example:
        - 'html' when browser is required to render the page.
        """,
        examples=["", "html"],
    ),
]
OUTPUT_FORMAT_PARAM = Annotated[
    Literal[
        "",
        "links",
        "md",
        "html",
    ],
    Field(
        description="""
        The format of the output. Works only when parse parameter is false.
        - links - Most efficient when the goal is navigation or finding specific URLs. Use this first when you need to locate a specific page within a website.
        - md - Best for extracting and reading visible content once you've found the right page. Use this to get structured content that's easy to read and process.
        - html - Should be used sparingly only when you need the raw HTML structure, JavaScript code, or styling information.
        """
    ),
]
GOOGLE_QUERY_PARAM = Annotated[str, Field(description="URL-encoded keyword to search for.")]
AMAZON_SEARCH_QUERY_PARAM = Annotated[str, Field(description="Keyword to search for.")]
USER_AGENT_TYPE_PARAM = Annotated[
    Literal[
        "",
        "desktop",
        "desktop_chrome",
        "desktop_firefox",
        "desktop_safari",
        "desktop_edge",
        "desktop_opera",
        "mobile",
        "mobile_ios",
        "mobile_android",
        "tablet",
    ],
    Field(
        description="Device type and browser that will be used to "
        "determine User-Agent header value."
    ),
]
START_PAGE_PARAM = Annotated[
    int,
    Field(description="Starting page number."),
]
PAGES_PARAM = Annotated[
    int,
    Field(description="Number of pages to retrieve."),
]
LIMIT_PARAM = Annotated[
    int,
    Field(description="Number of results to retrieve in each page."),
]
DOMAIN_PARAM = Annotated[
    str,
    Field(
        description="""
        Domain localization for Google.
        Use country top level domains.
        For example:
        - 'co.uk' for United Kingdom
        - 'us' for United States
        - 'fr' for France
        """,
        examples=["uk", "us", "fr"],
    ),
]
GEO_LOCATION_PARAM = Annotated[
    str,
    Field(
        description="""
        The geographical location that the result should be adapted for.
        Use ISO-3166 country codes.
        Examples:
        - 'California, United States'
        - 'Mexico'
        - 'US' for United States
        - 'DE' for Germany
        - 'FR' for France
        """,
        examples=["US", "DE", "FR"],
    ),
]
LOCALE_PARAM = Annotated[
    str,
    Field(
        description="""
        Set 'Accept-Language' header value which changes your Google search page web interface language.
        Examples:
        - 'en-US' for English, United States
        - 'de-AT' for German, Austria
        - 'fr-FR' for French, France
        """,
        examples=["en-US", "de-AT", "fr-FR"],
    ),
]
AD_MODE_PARAM = Annotated[
    bool,
    Field(
        description="If true will use the Google Ads source optimized for the paid ads.",
    ),
]
CATEGORY_ID_CONTEXT_PARAM = Annotated[
    str,
    Field(
        description="Search for items in a particular browse node (product category).",
    ),
]
MERCHANT_ID_CONTEXT_PARAM = Annotated[
    str,
    Field(
        description="Search for items sold by a particular seller.",
    ),
]
CURRENCY_CONTEXT_PARAM = Annotated[
    str,
    Field(
        description="Currency that will be used to display the prices.",
        examples=["USD", "EUR", "AUD"],
    ),
]
AUTOSELECT_VARIANT_CONTEXT_PARAM = Annotated[
    bool,
    Field(
        description="To get accurate pricing/buybox data, set this parameter to true.",
    ),
]
```
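Because every parameter is an `Annotated` alias around a concrete, non-optional type, the generated tool schemas contain plain types with descriptions rather than nullable unions. One way to inspect what a single alias contributes (a sketch using pydantic's `TypeAdapter`, which is not how the server itself builds schemas):
```python
# Sketch: inspecting the JSON schema a parameter alias produces.
from pydantic import TypeAdapter

from oxylabs_mcp import url_params

# Shows the enum, description, and examples that end up in the tool's schema,
# e.g. something like {'enum': ['', 'html'], 'type': 'string', 'description': '...'}.
print(TypeAdapter(url_params.RENDER_PARAM).json_schema())
```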
--------------------------------------------------------------------------------
/tests/e2e/test_llm_agent.py:
--------------------------------------------------------------------------------
```python
import json
import os
from contextlib import asynccontextmanager

import pytest
from agno.agent import Agent
from agno.models.google import Gemini
from agno.models.openai import OpenAIChat
from agno.tools.mcp import MCPTools


MCP_SERVER = "local"  # local, uvx
MODELS_CONFIG = [
    ("GOOGLE_API_KEY", "gemini"),
    # ("OPENAI_API_KEY", "openai"),
]


def get_agent(model: str, oxylabs_mcp: MCPTools) -> Agent:
    if model == "gemini":
        model_ = Gemini(api_key=os.getenv("GOOGLE_API_KEY"))
    elif model == "openai":
        model_ = OpenAIChat(api_key=os.getenv("OPENAI_API_KEY"))
    else:
        raise ValueError(f"Unknown model: {model}")

    return Agent(
        model=model_,
        tools=[oxylabs_mcp],
        instructions=["Use MCP tools to fulfil the requests"],
        markdown=True,
    )


def get_models() -> list[str]:
    models = []
    for env_var, model_name in MODELS_CONFIG:
        if os.getenv(env_var):
            models.append(model_name)
    return models


@asynccontextmanager
async def oxylabs_mcp_server():
    if MCP_SERVER == "local":
        command = f"uv run --directory {os.getenv('LOCAL_OXYLABS_MCP_DIRECTORY')} oxylabs-mcp"
    elif MCP_SERVER == "uvx":
        command = "uvx oxylabs-mcp"
    else:
        raise ValueError(f"Unknown mcp server option: {MCP_SERVER}")

    async with MCPTools(
        command,
        env={
            "OXYLABS_USERNAME": os.getenv("OXYLABS_USERNAME"),
            "OXYLABS_PASSWORD": os.getenv("OXYLABS_PASSWORD"),
        },
    ) as mcp_server:
        yield mcp_server


@pytest.mark.skipif(not os.getenv("OXYLABS_USERNAME"), reason="`OXYLABS_USERNAME` is not set")
@pytest.mark.skipif(not os.getenv("OXYLABS_PASSWORD"), reason="`OXYLABS_PASSWORD` is not set")
@pytest.mark.asyncio
@pytest.mark.parametrize("model", get_models())
@pytest.mark.parametrize(
    ("query", "tool", "arguments", "expected_content"),
    [
        (
            "Search for iPhone 16 in google with parsed result",
            "google_search_scraper",
            {
                "query": "iPhone 16",
                "parse": True,
            },
            "iPhone 16",
        ),
        (
            "Search for iPhone 16 in google with render html",
            "google_search_scraper",
            {
                "query": "iPhone 16",
                "render": "html",
            },
            "iPhone 16",
        ),
        (
            "Search for iPhone 16 in google with browser rendering",
            "google_search_scraper",
            {
                "query": "iPhone 16",
                "render": "html",
            },
            "iPhone 16",
        ),
        (
            "Search for iPhone 16 in google with user agent type mobile",
            "google_search_scraper",
            {
                "query": "iPhone 16",
                "user_agent_type": "mobile",
            },
            "iPhone 16",
        ),
        (
            "Search for iPhone 16 in google starting from the second page",
            "google_search_scraper",
            {
                "query": "iPhone 16",
                "start_page": 2,
            },
            "iPhone 16",
        ),
        (
            "Search for iPhone 16 in google with United Kingdom domain",
            "google_search_scraper",
            {
                "query": "iPhone 16",
                "domain": "co.uk",
            },
            "iPhone 16",
        ),
        (
            "Search for iPhone 16 in google with Brazil geolocation",
            "google_search_scraper",
            {
                "query": "iPhone 16",
                "geo_location": "BR",
            },
            "iPhone 16",
        ),
        (
            "Search for iPhone 16 in google with French locale",
            "google_search_scraper",
            {
                "query": "iPhone 16",
                "locale": "fr-FR",
            },
            "iPhone 16",
        ),
    ],
)
async def test_basic_agent_prompts(
    model: str,
    query: str,
    tool: str,
    arguments: dict,
    expected_content: str,
):
    async with oxylabs_mcp_server() as mcp_server:
        agent = get_agent(model, mcp_server)
        response = await agent.arun(query)

    tool_calls = agent.memory.get_tool_calls(agent.session_id)
    # [tool_call, tool_call_result]
    assert len(tool_calls) == 2, "Extra tool calls found!"

    assert tool_calls[0]["function"]["name"] == tool
    assert json.loads(tool_calls[0]["function"]["arguments"]) == arguments

    assert expected_content in response.content


@pytest.mark.asyncio
@pytest.mark.parametrize("model", get_models())
async def test_complex_agent_prompt(model: str):
    async with oxylabs_mcp_server() as mcp_server:
        agent = get_agent(model, mcp_server)

        await agent.arun(
            "Go to oxylabs.io, look for career page, "
            "go to it and return all job titles in markdown format. "
            "Don't invent URLs, start from one provided."
        )

    tool_calls = agent.memory.get_tool_calls(agent.session_id)
    assert len(tool_calls) == 4, f"Not enough tool_calls, got {len(tool_calls)}: {tool_calls}"

    oxylabs_page_call, _, careers_page_call, _ = agent.memory.get_tool_calls(agent.session_id)
    assert oxylabs_page_call["function"]["name"] == "universal_scraper"
    assert json.loads(oxylabs_page_call["function"]["arguments"]) == {
        "output_format": "links",
        "url": "https://oxylabs.io",
    }
    assert careers_page_call["function"]["name"] == "universal_scraper"
    assert json.loads(careers_page_call["function"]["arguments"]) == {
        "output_format": "md",
        "url": "https://career.oxylabs.io/",
    }
```
--------------------------------------------------------------------------------
/tests/integration/test_scraper_tools.py:
--------------------------------------------------------------------------------
```python
import json
from typing import Any
from unittest.mock import AsyncMock, patch

import pytest
from fastmcp import FastMCP
from httpx import Request, Response
from mcp.types import TextContent

from tests.integration import params
from tests.utils import convert_context_params, prepare_expected_arguments


@pytest.mark.parametrize(
    ("arguments", "expectation", "response_data", "expected_result"),
    [
        params.URL_ONLY,
        params.NO_URL,
        params.RENDER_HTML_WITH_URL,
        params.RENDER_INVALID_WITH_URL,
        *params.USER_AGENTS_WITH_URL,
        params.GEO_LOCATION_SPECIFIED_WITH_URL,
    ],
)
@pytest.mark.asyncio
async def test_oxylabs_scraper_arguments(
    mcp: FastMCP,
    request_data: Request,
    response_data: str,
    arguments: dict,
    expectation,
    expected_result: str,
    oxylabs_client: AsyncMock,
):
    mock_response = Response(200, content=json.dumps(response_data), request=request_data)
    oxylabs_client.post.return_value = mock_response

    with (
        expectation,
        patch("httpx.AsyncClient.post", new=AsyncMock(return_value=mock_response)),
    ):
        result = await mcp._call_tool("universal_scraper", arguments=arguments)

        assert oxylabs_client.post.call_args.kwargs == {
            "json": convert_context_params(prepare_expected_arguments(arguments)),
        }
        assert result.content == [TextContent(type="text", text=expected_result)]


@pytest.mark.parametrize(
    ("arguments", "expectation", "response_data", "expected_result"),
    [
        params.QUERY_ONLY,
        params.PARSE_ENABLED,
        params.RENDER_HTML_WITH_QUERY,
        *params.USER_AGENTS_WITH_QUERY,
        *params.OUTPUT_FORMATS,
        params.INVALID_USER_AGENT,
        params.START_PAGE_SPECIFIED,
        params.PAGES_SPECIFIED,
        params.LIMIT_SPECIFIED,
        params.DOMAIN_SPECIFIED,
        params.GEO_LOCATION_SPECIFIED_WITH_QUERY,
        params.LOCALE_SPECIFIED,
    ],
)
@pytest.mark.asyncio
async def test_google_search_scraper_arguments(
    mcp: FastMCP,
    request_data: Request,
    response_data: str,
    arguments: dict,
    expectation,
    expected_result: str,
    oxylabs_client: AsyncMock,
):
    mock_response = Response(200, content=json.dumps(response_data), request=request_data)
    oxylabs_client.post.return_value = mock_response

    with expectation:
        result = await mcp._call_tool("google_search_scraper", arguments=arguments)

        assert oxylabs_client.post.call_args.kwargs == {
            "json": {
                "source": "google_search",
                "parse": True,
                **prepare_expected_arguments(arguments),
            }
        }
        assert result.content == [TextContent(type="text", text=expected_result)]


@pytest.mark.parametrize(
    ("ad_mode", "expected_result"),
    [
        (False, {"parse": True, "query": "Iphone 16", "source": "google_search"}),
        (True, {"parse": True, "query": "Iphone 16", "source": "google_ads"}),
    ],
)
@pytest.mark.asyncio
async def test_oxylabs_google_search_ad_mode_argument(
    mcp: FastMCP,
    request_data: Request,
    ad_mode: bool,
    expected_result: dict[str, Any],
    oxylabs_client: AsyncMock,
):
    arguments = {"query": "Iphone 16", "ad_mode": ad_mode}
    mock_response = Response(200, content=json.dumps('{"data": "value"}'), request=request_data)
    oxylabs_client.post.return_value = mock_response

    await mcp._call_tool("google_search_scraper", arguments=arguments)

    assert oxylabs_client.post.call_args.kwargs == {"json": expected_result}
    assert oxylabs_client.post.await_args.kwargs["json"] == expected_result


@pytest.mark.parametrize(
    ("arguments", "expectation", "response_data", "expected_result"),
    [
        params.QUERY_ONLY,
        params.PARSE_ENABLED,
        params.RENDER_HTML_WITH_QUERY,
        *params.USER_AGENTS_WITH_QUERY,
        *params.OUTPUT_FORMATS,
        params.INVALID_USER_AGENT,
        params.START_PAGE_SPECIFIED,
        params.PAGES_SPECIFIED,
        params.DOMAIN_SPECIFIED,
        params.GEO_LOCATION_SPECIFIED_WITH_QUERY,
        params.LOCALE_SPECIFIED,
        params.CATEGORY_SPECIFIED,
        params.MERCHANT_ID_SPECIFIED,
        params.CURRENCY_SPECIFIED,
    ],
)
@pytest.mark.asyncio
async def test_amazon_search_scraper_arguments(
    mcp: FastMCP,
    request_data: Request,
    response_data: str,
    arguments: dict,
    expectation,
    expected_result: str,
    oxylabs_client: AsyncMock,
    request_context,
):
    mock_response = Response(200, content=json.dumps(response_data), request=request_data)
    oxylabs_client.post.return_value = mock_response

    with expectation:
        result = await mcp._call_tool("amazon_search_scraper", arguments=arguments)

        assert oxylabs_client.post.call_args.kwargs == {
            "json": {
                "source": "amazon_search",
                "parse": True,
                **convert_context_params(prepare_expected_arguments(arguments)),
            }
        }
        assert result.content == [TextContent(type="text", text=expected_result)]


@pytest.mark.parametrize(
    ("arguments", "expectation", "response_data", "expected_result"),
    [
        params.QUERY_ONLY,
        params.PARSE_ENABLED,
        params.RENDER_HTML_WITH_QUERY,
        *params.USER_AGENTS_WITH_QUERY,
        *params.OUTPUT_FORMATS,
        params.INVALID_USER_AGENT,
        params.DOMAIN_SPECIFIED,
        params.GEO_LOCATION_SPECIFIED_WITH_QUERY,
        params.LOCALE_SPECIFIED,
        params.CURRENCY_SPECIFIED,
        params.AUTOSELECT_VARIANT_ENABLED,
    ],
)
@pytest.mark.asyncio
async def test_amazon_product_scraper_arguments(
    mcp: FastMCP,
    request_data: Request,
    response_data: str,
    arguments: dict,
    expectation,
    expected_result: str,
    oxylabs_client: AsyncMock,
):
    mock_response = Response(200, content=json.dumps(response_data), request=request_data)
    oxylabs_client.post.return_value = mock_response

    with expectation:
        result = await mcp._call_tool("amazon_product_scraper", arguments=arguments)

        assert oxylabs_client.post.call_args.kwargs == {
            "json": {
                "source": "amazon_product",
                "parse": True,
                **convert_context_params(prepare_expected_arguments(arguments)),
            }
        }
        assert result.content == [TextContent(type="text", text=expected_result)]
```
--------------------------------------------------------------------------------
/src/oxylabs_mcp/tools/scraper.py:
--------------------------------------------------------------------------------
```python
from typing import Any
from fastmcp import FastMCP
from mcp.types import ToolAnnotations
from oxylabs_mcp import url_params
from oxylabs_mcp.exceptions import MCPServerError
from oxylabs_mcp.utils import (
get_content,
oxylabs_client,
)
SCRAPER_TOOLS = [
"universal_scraper",
"google_search_scraper",
"amazon_search_scraper",
"amazon_product_scraper",
]
mcp = FastMCP("scraper")
@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
async def universal_scraper(
url: url_params.URL_PARAM,
render: url_params.RENDER_PARAM = "",
user_agent_type: url_params.USER_AGENT_TYPE_PARAM = "",
geo_location: url_params.GEO_LOCATION_PARAM = "",
output_format: url_params.OUTPUT_FORMAT_PARAM = "",
) -> str:
"""Get a content of any webpage.
Supports browser rendering, parsing of certain webpages
and different output formats.
"""
try:
async with oxylabs_client() as client:
payload: dict[str, Any] = {"url": url}
if render:
payload["render"] = render
if user_agent_type:
payload["user_agent_type"] = user_agent_type
if geo_location:
payload["geo_location"] = geo_location
response_json = await client.scrape(payload)
return get_content(response_json, output_format=output_format)
except MCPServerError as e:
return await e.process()
@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
async def google_search_scraper(
query: url_params.GOOGLE_QUERY_PARAM,
parse: url_params.PARSE_PARAM = True, # noqa: FBT002
render: url_params.RENDER_PARAM = "",
user_agent_type: url_params.USER_AGENT_TYPE_PARAM = "",
start_page: url_params.START_PAGE_PARAM = 0,
pages: url_params.PAGES_PARAM = 0,
limit: url_params.LIMIT_PARAM = 0,
domain: url_params.DOMAIN_PARAM = "",
geo_location: url_params.GEO_LOCATION_PARAM = "",
locale: url_params.LOCALE_PARAM = "",
ad_mode: url_params.AD_MODE_PARAM = False, # noqa: FBT002
output_format: url_params.OUTPUT_FORMAT_PARAM = "",
) -> str:
"""Scrape Google Search results.
Supports content parsing, different user agent types, pagination,
domain, geolocation, locale parameters and different output formats.
"""
try:
async with oxylabs_client() as client:
payload: dict[str, Any] = {"query": query}
if ad_mode:
payload["source"] = "google_ads"
else:
payload["source"] = "google_search"
if parse:
payload["parse"] = parse
if render:
payload["render"] = render
if user_agent_type:
payload["user_agent_type"] = user_agent_type
if start_page:
payload["start_page"] = start_page
if pages:
payload["pages"] = pages
if limit:
payload["limit"] = limit
if domain:
payload["domain"] = domain
if geo_location:
payload["geo_location"] = geo_location
if locale:
payload["locale"] = locale
response_json = await client.scrape(payload)
return get_content(response_json, parse=parse, output_format=output_format)
except MCPServerError as e:
return await e.process()
@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
async def amazon_search_scraper(
query: url_params.AMAZON_SEARCH_QUERY_PARAM,
category_id: url_params.CATEGORY_ID_CONTEXT_PARAM = "",
merchant_id: url_params.MERCHANT_ID_CONTEXT_PARAM = "",
currency: url_params.CURRENCY_CONTEXT_PARAM = "",
parse: url_params.PARSE_PARAM = True, # noqa: FBT002
render: url_params.RENDER_PARAM = "",
user_agent_type: url_params.USER_AGENT_TYPE_PARAM = "",
start_page: url_params.START_PAGE_PARAM = 0,
pages: url_params.PAGES_PARAM = 0,
domain: url_params.DOMAIN_PARAM = "",
geo_location: url_params.GEO_LOCATION_PARAM = "",
locale: url_params.LOCALE_PARAM = "",
output_format: url_params.OUTPUT_FORMAT_PARAM = "",
) -> str:
"""Scrape Amazon search results.
Supports content parsing, different user agent types, pagination,
domain, geolocation, locale parameters and different output formats.
    Supports Amazon-specific parameters such as category id, merchant id, and currency.
"""
try:
async with oxylabs_client() as client:
payload: dict[str, Any] = {"source": "amazon_search", "query": query}
context = []
if category_id:
context.append({"key": "category_id", "value": category_id})
if merchant_id:
context.append({"key": "merchant_id", "value": merchant_id})
if currency:
context.append({"key": "currency", "value": currency})
if context:
payload["context"] = context
if parse:
payload["parse"] = parse
if render:
payload["render"] = render
if user_agent_type:
payload["user_agent_type"] = user_agent_type
if start_page:
payload["start_page"] = start_page
if pages:
payload["pages"] = pages
if domain:
payload["domain"] = domain
if geo_location:
payload["geo_location"] = geo_location
if locale:
payload["locale"] = locale
response_json = await client.scrape(payload)
return get_content(response_json, parse=parse, output_format=output_format)
except MCPServerError as e:
return await e.process()
@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
async def amazon_product_scraper(
query: url_params.AMAZON_SEARCH_QUERY_PARAM,
autoselect_variant: url_params.AUTOSELECT_VARIANT_CONTEXT_PARAM = False, # noqa: FBT002
currency: url_params.CURRENCY_CONTEXT_PARAM = "",
parse: url_params.PARSE_PARAM = True, # noqa: FBT002
render: url_params.RENDER_PARAM = "",
user_agent_type: url_params.USER_AGENT_TYPE_PARAM = "",
domain: url_params.DOMAIN_PARAM = "",
geo_location: url_params.GEO_LOCATION_PARAM = "",
locale: url_params.LOCALE_PARAM = "",
output_format: url_params.OUTPUT_FORMAT_PARAM = "",
) -> str:
"""Scrape Amazon products.
Supports content parsing, different user agent types, domain,
geolocation, locale parameters and different output formats.
    Supports Amazon-specific parameters such as currency and getting
    more accurate pricing data with autoselect variant.
"""
try:
async with oxylabs_client() as client:
payload: dict[str, Any] = {"source": "amazon_product", "query": query}
context = []
if autoselect_variant:
context.append({"key": "autoselect_variant", "value": autoselect_variant})
if currency:
context.append({"key": "currency", "value": currency})
if context:
payload["context"] = context
if parse:
payload["parse"] = parse
if render:
payload["render"] = render
if user_agent_type:
payload["user_agent_type"] = user_agent_type
if domain:
payload["domain"] = domain
if geo_location:
payload["geo_location"] = geo_location
if locale:
payload["locale"] = locale
response_json = await client.scrape(payload)
return get_content(response_json, parse=parse, output_format=output_format)
except MCPServerError as e:
return await e.process()
```
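For reference, a minimal sketch of exercising these tools end to end, assuming a fastmcp release that ships the in-memory `Client` transport and that the `OXYLABS_USERNAME`/`OXYLABS_PASSWORD` environment variables from `utils.py` are set; the URL and query are illustrative only:
```python
import asyncio

from fastmcp import Client

from oxylabs_mcp.tools.scraper import mcp


async def main() -> None:
    # Client(mcp) talks to the server in-memory; no external transport needed.
    async with Client(mcp) as client:
        # Plain URL scrape; output defaults to markdown.
        page = await client.call_tool(
            "universal_scraper",
            {"url": "https://example.com", "render": "html"},
        )
        print(page)

        # Parsed Google Search results for a specific location.
        results = await client.call_tool(
            "google_search_scraper",
            {"query": "oxylabs mcp", "geo_location": "Miami, Florida"},
        )
        print(results)


asyncio.run(main())
```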
--------------------------------------------------------------------------------
/tests/integration/test_ai_studio_tools.py:
--------------------------------------------------------------------------------
```python
import json
from unittest.mock import AsyncMock, MagicMock
import pytest
from fastmcp import FastMCP
from httpx import Request
from mcp.types import TextContent
from oxylabs_ai_studio.apps.ai_search import AiSearchJob, SearchResult
from tests.integration import params
from tests.integration.params import SimpleSchema
@pytest.mark.parametrize(
("arguments", "expectation", "response_data", "expected_result"),
[
params.AI_STUDIO_URL_ONLY,
params.AI_STUDIO_URL_AND_OUTPUT_FORMAT,
params.AI_STUDIO_URL_AND_SCHEMA,
params.AI_STUDIO_URL_AND_RENDER_JAVASCRIPT,
params.AI_STUDIO_URL_AND_RETURN_SOURCES_LIMIT,
params.AI_STUDIO_URL_AND_GEO_LOCATION,
],
)
@pytest.mark.asyncio
async def test_ai_crawler(
mcp: FastMCP,
request_data: Request,
response_data: str,
arguments: dict,
expectation,
expected_result: str,
oxylabs_client: AsyncMock,
ai_crawler: AsyncMock,
):
mock_result = MagicMock()
mock_result.data = expected_result
ai_crawler.crawl_async = AsyncMock(return_value=mock_result)
arguments = {"user_prompt": "Scrape price and title", **arguments}
with expectation:
result = await mcp._call_tool("ai_crawler", arguments=arguments)
assert result.content == [
TextContent(type="text", text=json.dumps({"data": expected_result}))
]
default_args = {
"geo_location": None,
"output_format": "markdown",
"render_javascript": False,
"return_sources_limit": 25,
"schema": None,
}
default_args = {k: v for k, v in default_args.items() if k not in arguments}
ai_crawler.crawl_async.assert_called_once_with(**default_args, **arguments)
@pytest.mark.parametrize(
("arguments", "expectation", "response_data", "expected_result"),
[
params.AI_STUDIO_URL_ONLY,
params.AI_STUDIO_URL_AND_OUTPUT_FORMAT,
params.AI_STUDIO_URL_AND_SCHEMA,
params.AI_STUDIO_URL_AND_RENDER_JAVASCRIPT,
params.AI_STUDIO_URL_AND_GEO_LOCATION,
],
)
@pytest.mark.asyncio
async def test_ai_scraper(
mcp: FastMCP,
request_data: Request,
response_data: str,
arguments: dict,
expectation,
expected_result: str,
oxylabs_client: AsyncMock,
ai_scraper: AsyncMock,
):
mock_result = MagicMock()
mock_result.data = expected_result
ai_scraper.scrape_async = AsyncMock(return_value=mock_result)
arguments = {**arguments}
with expectation:
result = await mcp._call_tool("ai_scraper", arguments=arguments)
assert result.content == [
TextContent(type="text", text=json.dumps({"data": expected_result}))
]
default_args = {
"geo_location": None,
"output_format": "markdown",
"render_javascript": False,
"schema": None,
}
default_args = {k: v for k, v in default_args.items() if k not in arguments}
ai_scraper.scrape_async.assert_called_once_with(**default_args, **arguments)
@pytest.mark.parametrize(
("arguments", "expectation", "response_data", "expected_result"),
[
params.AI_STUDIO_URL_ONLY,
params.AI_STUDIO_URL_AND_OUTPUT_FORMAT,
params.AI_STUDIO_URL_AND_SCHEMA,
params.AI_STUDIO_URL_AND_GEO_LOCATION,
],
)
@pytest.mark.asyncio
async def test_ai_browser_agent(
mcp: FastMCP,
request_data: Request,
response_data: str,
arguments: dict,
expectation,
expected_result: str,
oxylabs_client: AsyncMock,
browser_agent: AsyncMock,
):
mock_result = MagicMock()
mock_data = SimpleSchema(title="Title", price=0.0)
mock_result.data = mock_data
browser_agent.run_async = AsyncMock(return_value=mock_result)
arguments = {"task_prompt": "Scrape price and title", **arguments}
with expectation:
result = await mcp._call_tool("ai_browser_agent", arguments=arguments)
assert result.content == [
TextContent(type="text", text=json.dumps({"data": mock_data.model_dump()}))
]
default_args = {
"geo_location": None,
"output_format": "markdown",
"schema": None,
"user_prompt": arguments["task_prompt"],
}
del arguments["task_prompt"]
default_args = {k: v for k, v in default_args.items() if k not in arguments}
browser_agent.run_async.assert_called_once_with(**default_args, **arguments)
@pytest.mark.parametrize(
("arguments", "expectation", "response_data", "expected_result"),
[
params.AI_STUDIO_QUERY_ONLY,
params.AI_STUDIO_URL_AND_RENDER_JAVASCRIPT,
params.AI_STUDIO_URL_AND_GEO_LOCATION,
params.AI_STUDIO_URL_AND_LIMIT,
params.AI_STUDIO_QUERY_AND_RETURN_CONTENT,
],
)
@pytest.mark.asyncio
async def test_ai_search(
mcp: FastMCP,
request_data: Request,
response_data: str,
arguments: dict,
expectation,
expected_result: str,
oxylabs_client: AsyncMock,
ai_search: AsyncMock,
):
mock_result = AiSearchJob(
run_id="123",
data=[SearchResult(url="url", title="title", description="description", content=None)],
)
ai_search.search_async = AsyncMock(return_value=mock_result)
arguments = {**arguments}
if "url" in arguments:
del arguments["url"]
arguments["query"] = "Sample query"
with expectation:
result = await mcp._call_tool("ai_search", arguments=arguments)
assert result.content == [
TextContent(type="text", text=json.dumps({"data": [mock_result.data[0].model_dump()]}))
]
default_args = {
"limit": 10,
"render_javascript": False,
"return_content": False,
"geo_location": None,
}
default_args = {k: v for k, v in default_args.items() if k not in arguments}
ai_search.search_async.assert_called_once_with(**default_args, **arguments)
@pytest.mark.parametrize(
("arguments", "expectation", "response_data", "expected_result"),
[
params.AI_STUDIO_USER_PROMPT,
],
)
@pytest.mark.parametrize(
"app_name",
["ai_crawler", "ai_scraper", "browser_agent"],
)
@pytest.mark.asyncio
async def test_generate_schema(
mcp: FastMCP,
request_data: Request,
response_data: str,
arguments: dict,
expectation,
expected_result: str,
oxylabs_client: AsyncMock,
app_name: str,
ai_crawler: AsyncMock,
ai_scraper: AsyncMock,
browser_agent: AsyncMock,
mock_schema: dict,
):
arguments = {"app_name": app_name, **arguments}
with expectation:
result = await mcp._call_tool("generate_schema", arguments=arguments)
assert result.content == [TextContent(type="text", text=json.dumps({"data": mock_schema}))]
locals()[app_name].generate_schema.assert_called_once_with(prompt=arguments["user_prompt"])
@pytest.mark.parametrize(
("arguments", "expectation", "response_data", "expected_result"),
[
params.AI_STUDIO_URL_ONLY,
params.AI_STUDIO_URL_AND_RENDER_JAVASCRIPT,
params.AI_STUDIO_URL_AND_RETURN_SOURCES_LIMIT,
params.AI_STUDIO_URL_AND_GEO_LOCATION,
],
)
@pytest.mark.asyncio
async def test_ai_map(
mcp: FastMCP,
request_data: Request,
response_data: str,
arguments: dict,
expectation,
expected_result: str,
oxylabs_client: AsyncMock,
ai_map: AsyncMock,
):
mock_result = MagicMock()
mock_result.data = expected_result
ai_map.map_async = AsyncMock(return_value=mock_result)
arguments = {"user_prompt": "Scrape price and title", **arguments}
with expectation:
result = await mcp._call_tool("ai_map", arguments=arguments)
assert result.content == [
TextContent(type="text", text=json.dumps({"data": expected_result}))
]
default_args = {
"geo_location": None,
"render_javascript": False,
"return_sources_limit": 25,
}
default_args = {k: v for k, v in default_args.items() if k not in arguments}
ai_map.map_async.assert_called_once_with(**default_args, **arguments)
```
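The assertions above all reconstruct the expected SDK call the same way: start from the tool's documented defaults, drop every key the test case overrides, and merge the overrides back in. A self-contained sketch of that merge (names are illustrative):
```python
# Start from the tool's documented defaults...
defaults = {"render_javascript": False, "return_sources_limit": 25}
# ...take the arguments a test case passes explicitly...
arguments = {"render_javascript": True}
# ...keep only the defaults that were not overridden...
expected = {k: v for k, v in defaults.items() if k not in arguments}
# ...and merge: this is what assert_called_once_with(**expected, **arguments) sees.
assert {**expected, **arguments} == {
    "render_javascript": True,
    "return_sources_limit": 25,
}
```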
--------------------------------------------------------------------------------
/tests/integration/params.py:
--------------------------------------------------------------------------------
```python
from contextlib import nullcontext as does_not_raise
import pytest
from fastmcp.exceptions import ToolError
from pydantic import BaseModel
class SimpleSchema(BaseModel):
title: str
price: float
JOB_RESPONSE = {"id": "7333092420940211201", "status": "done"}
STR_RESPONSE = {
"results": [{"content": "Mocked content"}],
"job": JOB_RESPONSE,
}
JSON_RESPONSE = {
"results": [{"content": {"data": "value"}}],
"job": JOB_RESPONSE,
}
AI_STUDIO_JSON_RESPONSE = {
"results": [{"content": {"data": "value"}}],
"job": JOB_RESPONSE,
}
QUERY_ONLY = pytest.param(
{"query": "Generic query"},
does_not_raise(),
STR_RESPONSE,
"\n\nMocked content\n\n",
id="query-only-args",
)
PARSE_ENABLED = pytest.param(
{"query": "Generic query", "parse": True},
does_not_raise(),
JSON_RESPONSE,
'{"data": "value"}',
id="parse-enabled-args",
)
RENDER_HTML_WITH_QUERY = pytest.param(
{"query": "Generic query", "render": "html"},
does_not_raise(),
STR_RESPONSE,
"\n\nMocked content\n\n",
id="render-enabled-args",
)
RENDER_INVALID_WITH_QUERY = pytest.param(
{"query": "Generic query", "render": "png"},
pytest.raises(ToolError),
STR_RESPONSE,
None,
id="render-enabled-args",
)
OUTPUT_FORMATS = [
pytest.param(
{"query": "Generic query", "output_format": "links"},
does_not_raise(),
{
"results": [
{
"content": '<html><body><div><p><a href="https://example.com">link</a></p></div></body></html>'
}
],
"job": JOB_RESPONSE,
},
"[link] https://example.com",
id="links-output-format-args",
),
pytest.param(
{"query": "Generic query", "output_format": "md"},
does_not_raise(),
STR_RESPONSE,
"\n\nMocked content\n\n",
id="md-output-format-args",
),
pytest.param(
{"query": "Generic query", "output_format": "html"},
does_not_raise(),
STR_RESPONSE,
"Mocked content",
id="html-output-format-args",
),
]
USER_AGENTS_WITH_QUERY = [
pytest.param(
{"query": "Generic query", "user_agent_type": uat},
does_not_raise(),
STR_RESPONSE,
"\n\nMocked content\n\n",
id=f"{uat}-user-agent-specified-args",
)
for uat in [
"desktop",
"desktop_chrome",
"desktop_firefox",
"desktop_safari",
"desktop_edge",
"desktop_opera",
"mobile",
"mobile_ios",
"mobile_android",
"tablet",
]
]
USER_AGENTS_WITH_URL = [
pytest.param(
{"url": "https://example.com", "user_agent_type": uat},
does_not_raise(),
STR_RESPONSE,
"\n\nMocked content\n\n",
id=f"{uat}-user-agent-specified-args",
)
for uat in [
"desktop",
"desktop_chrome",
"desktop_firefox",
"desktop_safari",
"desktop_edge",
"desktop_opera",
"mobile",
"mobile_ios",
"mobile_android",
"tablet",
]
]
INVALID_USER_AGENT = pytest.param(
{"query": "Generic query", "user_agent_type": "invalid"},
pytest.raises(ToolError),
STR_RESPONSE,
"Mocked content",
id="invalid-user-agent-specified-args",
)
START_PAGE_SPECIFIED = pytest.param(
{"query": "Generic query", "start_page": 2},
does_not_raise(),
JSON_RESPONSE,
'{"data": "value"}',
id="start-page-specified-args",
)
START_PAGE_INVALID = pytest.param(
{"query": "Generic query", "start_page": -1},
pytest.raises(ToolError),
JSON_RESPONSE,
'{"data": "value"}',
id="start-page-invalid-args",
)
PAGES_SPECIFIED = pytest.param(
{"query": "Generic query", "pages": 20},
does_not_raise(),
JSON_RESPONSE,
'{"data": "value"}',
id="pages-specified-args",
)
PAGES_INVALID = pytest.param(
{"query": "Generic query", "pages": -10},
pytest.raises(ToolError),
JSON_RESPONSE,
'{"data": "value"}',
id="pages-invalid-args",
)
LIMIT_SPECIFIED = pytest.param(
{"query": "Generic query", "limit": 100},
does_not_raise(),
JSON_RESPONSE,
'{"data": "value"}',
id="limit-specified-args",
)
LIMIT_INVALID = pytest.param(
{"query": "Generic query", "limit": 0},
pytest.raises(ToolError),
JSON_RESPONSE,
'{"data": "value"}',
id="limit-invalid-args",
)
DOMAIN_SPECIFIED = pytest.param(
{"query": "Generic query", "domain": "io"},
does_not_raise(),
JSON_RESPONSE,
'{"data": "value"}',
id="domain-specified-args",
)
GEO_LOCATION_SPECIFIED_WITH_QUERY = pytest.param(
{"query": "Generic query", "geo_location": "Miami, Florida"},
does_not_raise(),
JSON_RESPONSE,
'{"data": "value"}',
id="geo-location-specified-args",
)
GEO_LOCATION_SPECIFIED_WITH_URL = pytest.param(
{"url": "https://example.com", "geo_location": "Miami, Florida"},
does_not_raise(),
STR_RESPONSE,
"\n\nMocked content\n\n",
id="geo-location-specified-args",
)
LOCALE_SPECIFIED = pytest.param(
{"query": "Generic query", "locale": "ja_JP"},
does_not_raise(),
JSON_RESPONSE,
'{"data": "value"}',
id="locale-specified-args",
)
CATEGORY_SPECIFIED = pytest.param(
{"query": "Man's T-shirt", "category_id": "QE21R9AV"},
does_not_raise(),
JSON_RESPONSE,
'{"data": "value"}',
id="category-id-specified-args",
)
MERCHANT_ID_SPECIFIED = pytest.param(
{"query": "Man's T-shirt", "merchant_id": "QE21R9AV"},
does_not_raise(),
JSON_RESPONSE,
'{"data": "value"}',
id="merchant-id-specified-args",
)
CURRENCY_SPECIFIED = pytest.param(
{"query": "Man's T-shirt", "currency": "USD"},
does_not_raise(),
JSON_RESPONSE,
'{"data": "value"}',
id="currency-specified-args",
)
AUTOSELECT_VARIANT_ENABLED = pytest.param(
{"query": "B0BVF87BST", "autoselect_variant": True},
does_not_raise(),
JSON_RESPONSE,
'{"data": "value"}',
id="autoselect-variant-enabled-args",
)
URL_ONLY = pytest.param(
{"url": "https://example.com"},
does_not_raise(),
STR_RESPONSE,
"\n\nMocked content\n\n",
id="url-only-args",
)
NO_URL = pytest.param(
{},
pytest.raises(ToolError),
STR_RESPONSE,
"\n\nMocked content\n\n",
id="no-url-args",
)
RENDER_HTML_WITH_URL = pytest.param(
{"url": "https://example.com", "render": "html"},
does_not_raise(),
STR_RESPONSE,
"\n\nMocked content\n\n",
id="render-enabled-args",
)
RENDER_INVALID_WITH_URL = pytest.param(
{"url": "https://example.com", "render": "png"},
pytest.raises(ToolError),
JSON_RESPONSE,
None,
id="render-enabled-args",
)
AI_STUDIO_URL_ONLY = pytest.param(
{"url": "https://example.com"},
does_not_raise(),
AI_STUDIO_JSON_RESPONSE,
{"data": "value"},
id="url-with-user-prompt-args",
)
AI_STUDIO_QUERY_ONLY = pytest.param(
{"query": "Generic query"},
does_not_raise(),
AI_STUDIO_JSON_RESPONSE,
{"data": "value"},
id="url-with-user-prompt-args",
)
AI_STUDIO_URL_AND_OUTPUT_FORMAT = pytest.param(
{"url": "https://example.com", "output_format": "json"},
does_not_raise(),
AI_STUDIO_JSON_RESPONSE,
{"data": "value"},
id="url-with-user-prompt-and-output-format-args",
)
AI_STUDIO_URL_AND_SCHEMA = pytest.param(
{
"url": "https://example.com",
"schema": SimpleSchema.model_json_schema(),
},
does_not_raise(),
AI_STUDIO_JSON_RESPONSE,
{"data": "value"},
id="url-with-user-prompt-and-schema-args",
)
AI_STUDIO_URL_AND_RENDER_JAVASCRIPT = pytest.param(
{
"url": "https://example.com",
"render_javascript": True,
},
does_not_raise(),
AI_STUDIO_JSON_RESPONSE,
{"data": "value"},
id="url-with-user-prompt-and-render-js-args",
)
AI_STUDIO_QUERY_AND_RETURN_CONTENT = pytest.param(
{
"url": "https://example.com",
"return_content": True,
},
does_not_raise(),
AI_STUDIO_JSON_RESPONSE,
{"data": "value"},
id="url-with-user-prompt-and-return-content-args",
)
AI_STUDIO_URL_AND_RETURN_SOURCES_LIMIT = pytest.param(
{
"url": "https://example.com",
"return_sources_limit": 10,
},
does_not_raise(),
AI_STUDIO_JSON_RESPONSE,
{"data": "value"},
id="url-with-user-prompt-and-return-sources-limit-args",
)
AI_STUDIO_URL_AND_GEO_LOCATION = pytest.param(
{
"url": "https://example.com",
"geo_location": "US",
},
does_not_raise(),
AI_STUDIO_JSON_RESPONSE,
{"data": "value"},
id="url-with-user-prompt-and-geo_location-args",
)
AI_STUDIO_URL_AND_LIMIT = pytest.param(
{
"url": "https://example.com",
"limit": 5,
},
does_not_raise(),
AI_STUDIO_JSON_RESPONSE,
{"data": "value"},
id="url-with-user-prompt-and-limit-args",
)
AI_STUDIO_USER_PROMPT = pytest.param(
{
"user_prompt": "Scrape price and title",
},
does_not_raise(),
AI_STUDIO_JSON_RESPONSE,
{"data": "value"},
id="user-prompt-args",
)
```
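Each `pytest.param` above packs the same four-tuple, `(arguments, expectation, response_data, expected_result)`, that the integration tests unpack positionally. A minimal sketch of a consumer (the assertions are illustrative; the real tests route through `mcp._call_tool` with mocked HTTP):
```python
import pytest

from tests.integration import params


@pytest.mark.parametrize(
    ("arguments", "expectation", "response_data", "expected_result"),
    [params.QUERY_ONLY, params.RENDER_INVALID_WITH_QUERY],
)
def test_param_shape(arguments, expectation, response_data, expected_result):
    # `expectation` is a context manager: nullcontext for the happy path,
    # pytest.raises(ToolError) for invalid arguments.
    assert isinstance(arguments, dict)
    assert "results" in response_data
```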
--------------------------------------------------------------------------------
/src/oxylabs_mcp/utils.py:
--------------------------------------------------------------------------------
```python
import json
import logging
import os
import re
import typing
from contextlib import asynccontextmanager
from importlib.metadata import version
from platform import architecture, python_version
from typing import AsyncIterator
from fastmcp.server.dependencies import get_context
from httpx import (
AsyncClient,
BasicAuth,
HTTPStatusError,
RequestError,
Timeout,
)
from lxml.html import defs, fromstring, tostring
from lxml.html.clean import Cleaner
from markdownify import markdownify
from mcp.server.fastmcp import Context
from mcp.shared.context import RequestContext
from oxylabs_ai_studio.utils import is_api_key_valid # type: ignore[import-untyped]
from starlette import status
from oxylabs_mcp.config import settings
from oxylabs_mcp.exceptions import MCPServerError
logger = logging.getLogger(__name__)
USERNAME_ENV = "OXYLABS_USERNAME"
PASSWORD_ENV = "OXYLABS_PASSWORD" # noqa: S105 # nosec
AI_STUDIO_API_KEY_ENV = "OXYLABS_AI_STUDIO_API_KEY"
USERNAME_HEADER = "X-Oxylabs-Username"
PASSWORD_HEADER = "X-Oxylabs-Password" # noqa: S105 # nosec
AI_STUDIO_API_KEY_HEADER = "X-Oxylabs-AI-Studio-Api-Key"
USERNAME_QUERY_PARAM = "oxylabsUsername"
PASSWORD_QUERY_PARAM = "oxylabsPassword" # noqa: S105 # nosec
AI_STUDIO_API_KEY_QUERY_PARAM = "oxylabsAiStudioApiKey"
def clean_html(html: str) -> str:
"""Clean an HTML string."""
cleaner = Cleaner(
scripts=True,
javascript=True,
style=True,
remove_tags=[],
kill_tags=["nav", "svg", "footer", "noscript", "script", "form"],
safe_attrs=list(defs.safe_attrs) + ["idx"],
comments=True,
inline_style=True,
links=True,
meta=False,
page_structure=False,
embedded=True,
frames=False,
forms=False,
annoying_tags=False,
)
return cleaner.clean_html(html) # type: ignore[no-any-return]
def strip_html(html: str) -> str:
"""Simplify an HTML string.
    Will remove unwanted elements, attributes, and redundant content.
Args:
html (str): The input HTML string.
Returns:
str: The cleaned and simplified HTML string.
"""
cleaned_html = clean_html(html)
html_tree = fromstring(cleaned_html)
for element in html_tree.iter():
# Remove style attributes.
if "style" in element.attrib:
del element.attrib["style"]
# Remove elements that have no attributes, no content and no children.
if (
(not element.attrib or (len(element.attrib) == 1 and "idx" in element.attrib))
and not element.getchildren() # type: ignore[attr-defined]
and (not element.text or not element.text.strip())
and (not element.tail or not element.tail.strip())
):
parent = element.getparent()
if parent is not None:
parent.remove(element)
# Remove elements with footer and hidden in class or id
xpath_query = (
".//*[contains(@class, 'footer') or contains(@id, 'footer') or "
"contains(@class, 'hidden') or contains(@id, 'hidden')]"
)
elements_to_remove = html_tree.xpath(xpath_query)
for element in elements_to_remove: # type: ignore[assignment, union-attr]
parent = element.getparent()
if parent is not None:
parent.remove(element)
# Serialize the HTML tree back to a string
stripped_html = tostring(html_tree, encoding="unicode")
# Previous cleaning produces empty spaces.
# Replace multiple spaces with a single one
stripped_html = re.sub(r"\s{2,}", " ", stripped_html)
# Replace consecutive newlines with an empty string
stripped_html = re.sub(r"\n{2,}", "", stripped_html)
return stripped_html
def _get_request_context(ctx: Context) -> RequestContext | None: # type: ignore[type-arg]
try:
return ctx.request_context
except ValueError:
return None
def _get_default_headers() -> dict[str, str]:
headers = {}
if request_ctx := get_context().request_context:
if client_params := request_ctx.session.client_params:
client = f"oxylabs-mcp-{client_params.clientInfo.name}"
else:
client = "oxylabs-mcp"
else:
client = "oxylabs-mcp"
bits, _ = architecture()
sdk_type = f"{client}/{version('oxylabs-mcp')} ({python_version()}; {bits})"
headers["x-oxylabs-sdk"] = sdk_type
return headers
class _OxylabsClientWrapper:
def __init__(
self,
client: AsyncClient,
) -> None:
self._client = client
self._ctx = get_context()
    async def scrape(self, payload: dict[str, typing.Any]) -> dict[str, typing.Any]:
        await self._ctx.info(f"Create job with params: {json.dumps(payload)}")
        response = await self._client.post(settings.OXYLABS_SCRAPER_URL, json=payload)
        # Raise before parsing so that non-JSON error bodies surface as HTTP errors.
        response.raise_for_status()
        response_json: dict[str, typing.Any] = response.json()
        if response.status_code == status.HTTP_201_CREATED:
            await self._ctx.info(
                f"Job info: "
                f"job_id={response_json['job']['id']} "
                f"job_status={response_json['job']['status']}"
            )
        return response_json
def get_oxylabs_auth() -> tuple[str | None, str | None]:
"""Extract the Oxylabs credentials."""
if settings.MCP_TRANSPORT == "streamable-http":
request_headers = dict(get_context().request_context.request.headers) # type: ignore[union-attr]
username = request_headers.get(USERNAME_HEADER.lower())
password = request_headers.get(PASSWORD_HEADER.lower())
if not username or not password:
query_params = get_context().request_context.request.query_params # type: ignore[union-attr]
username = query_params.get(USERNAME_QUERY_PARAM)
password = query_params.get(PASSWORD_QUERY_PARAM)
else:
username = os.environ.get(USERNAME_ENV)
password = os.environ.get(PASSWORD_ENV)
return username, password
def get_oxylabs_ai_studio_api_key() -> str | None:
"""Extract the Oxylabs AI Studio API key."""
if settings.MCP_TRANSPORT == "streamable-http":
request_headers = dict(get_context().request_context.request.headers) # type: ignore[union-attr]
ai_studio_api_key = request_headers.get(AI_STUDIO_API_KEY_HEADER.lower())
if not ai_studio_api_key:
query_params = get_context().request_context.request.query_params # type: ignore[union-attr]
ai_studio_api_key = query_params.get(AI_STUDIO_API_KEY_QUERY_PARAM)
else:
ai_studio_api_key = os.getenv(AI_STUDIO_API_KEY_ENV)
return ai_studio_api_key
@asynccontextmanager
async def oxylabs_client() -> AsyncIterator[_OxylabsClientWrapper]:
"""Async context manager for Oxylabs client that is used in MCP tools."""
headers = _get_default_headers()
username, password = get_oxylabs_auth()
if not username or not password:
raise ValueError("Oxylabs username and password must be set.")
auth = BasicAuth(username=username, password=password)
async with AsyncClient(
timeout=Timeout(settings.OXYLABS_REQUEST_TIMEOUT_S),
verify=True,
headers=headers,
auth=auth,
) as client:
try:
yield _OxylabsClientWrapper(client)
except HTTPStatusError as e:
raise MCPServerError(
f"HTTP error during POST request: {e.response.status_code} - {e.response.text}"
) from None
except RequestError as e:
raise MCPServerError(f"Request error during POST request: {e}") from None
except Exception as e:
raise MCPServerError(f"Error: {str(e) or repr(e)}") from None
def get_and_verify_oxylabs_ai_studio_api_key() -> str:
"""Extract and varify the Oxylabs AI Studio API key."""
ai_studio_api_key = get_oxylabs_ai_studio_api_key()
if ai_studio_api_key is None:
msg = "AI Studio API key is not set"
logger.warning(msg)
raise ValueError(msg)
if not is_api_key_valid(ai_studio_api_key):
raise ValueError("AI Studio API key is not valid")
return ai_studio_api_key
def extract_links_with_text(html: str, base_url: str | None = None) -> list[str]:
"""Extract links with their display text from HTML.
Args:
html (str): The input HTML string.
base_url (str | None): Base URL to use for converting relative URLs to absolute.
If None, relative URLs will remain as is.
Returns:
list[str]: List of links in format [Display Text] URL
"""
html_tree = fromstring(html)
links = []
for link in html_tree.xpath("//a[@href]"): # type: ignore[union-attr]
href = link.get("href") # type: ignore[union-attr]
text = link.text_content().strip() # type: ignore[union-attr]
        # Skip links with no href or with empty/whitespace-only text
        if href and text:
# Skip anchor links
if href.startswith("#"):
continue
# Skip javascript links
if href.startswith("javascript:"):
continue
# Make relative URLs absolute if base_url is provided
if base_url and href.startswith("/"):
# Remove trailing slash from base_url if present
base = base_url.rstrip("/")
href = f"{base}{href}"
links.append(f"[{text}] {href}")
return links
def get_content(
response_json: dict[str, typing.Any],
*,
output_format: str,
parse: bool = False,
) -> str:
"""Extract content from response and convert to a proper format."""
content = response_json["results"][0]["content"]
if parse and isinstance(content, dict):
return json.dumps(content)
if output_format == "html":
return str(content)
if output_format == "links":
links = extract_links_with_text(str(content))
return "\n".join(links)
    cleaned_html = clean_html(str(content))
    return markdownify(cleaned_html)  # type: ignore[no-any-return]
```
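A quick, credential-free sketch of the pure HTML helpers above; the snippet and URLs are illustrative:
```python
from oxylabs_mcp.utils import extract_links_with_text, get_content, strip_html

html = (
    "<html><body>"
    "<nav>menu</nav>"
    '<div><p>Hello <a href="/docs">docs</a></p></div>'
    "<footer>footer</footer>"
    "</body></html>"
)

# nav/footer are in the cleaner's kill_tags, so only the div survives.
print(strip_html(html))

# Relative hrefs are resolved against base_url when one is given.
print(extract_links_with_text(html, base_url="https://example.com"))
# ['[docs] https://example.com/docs']

# get_content mimics what the tools do with a scraper API response.
response = {"results": [{"content": html}]}
print(get_content(response, output_format="links"))
```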
--------------------------------------------------------------------------------
/src/oxylabs_mcp/tools/ai_studio.py:
--------------------------------------------------------------------------------
```python
# mypy: disable-error-code=import-untyped
import json
import logging
from typing import Annotated, Any, Literal
from fastmcp import FastMCP
from mcp.types import ToolAnnotations
from oxylabs_ai_studio.apps.ai_crawler import AiCrawler
from oxylabs_ai_studio.apps.ai_map import AiMap
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
from oxylabs_ai_studio.apps.ai_search import AiSearch
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent
from pydantic import Field
from oxylabs_mcp.tools.misc import setup
from oxylabs_mcp.utils import get_and_verify_oxylabs_ai_studio_api_key
setup()
logger = logging.getLogger(__name__)
AI_TOOLS = [
"generate_schema",
"ai_search",
"ai_scraper",
"ai_crawler",
"ai_browser_agent",
"ai_map",
]
mcp = FastMCP("ai_studio")
@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
async def ai_crawler(
url: Annotated[str, Field(description="The URL from which crawling will be started.")],
user_prompt: Annotated[
str,
Field(description="What information user wants to extract from the domain."),
],
output_format: Annotated[
Literal["json", "markdown", "csv"],
Field(
description=(
"The format of the output. If json or csv, the schema is required. "
"Markdown returns full text of the page. CSV returns data in CSV format."
)
),
] = "markdown",
schema: Annotated[
dict[str, Any] | None,
Field(
description="The schema to use for the crawl. Required if output_format is json or csv."
),
] = None,
render_javascript: Annotated[ # noqa: FBT002
bool,
Field(
description=(
"Whether to render the HTML of the page using javascript. Much slower, "
"therefore use it only for websites "
"that require javascript to render the page. "
"Unless user asks to use it, first try to crawl the page without it. "
"If results are unsatisfactory, try to use it."
)
),
] = False,
return_sources_limit: Annotated[
int, Field(description="The maximum number of sources to return.", le=50)
] = 25,
geo_location: Annotated[
str | None,
Field(description="Two letter ISO country code to use for the crawl proxy."),
] = None,
) -> str:
"""Tool useful for crawling a website from starting url and returning data in a specified format.
Schema is required only if output_format is json.
'render_javascript' is used to render javascript heavy websites.
'return_sources_limit' is used to limit the number of sources to return,
for example if you expect results from single source, you can set it to 1.
""" # noqa: E501
logger.info(
f"Calling ai_crawler with: {url=}, {user_prompt=}, "
f"{output_format=}, {schema=}, {render_javascript=}, "
f"{return_sources_limit=}"
)
crawler = AiCrawler(api_key=get_and_verify_oxylabs_ai_studio_api_key())
result = await crawler.crawl_async(
url=url,
user_prompt=user_prompt,
output_format=output_format,
schema=schema,
render_javascript=render_javascript,
return_sources_limit=return_sources_limit,
geo_location=geo_location,
)
return json.dumps({"data": result.data})
@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
async def ai_scraper(
url: Annotated[str, Field(description="The URL to scrape")],
output_format: Annotated[
Literal["json", "markdown", "csv"],
Field(
description=(
"The format of the output. If json or csv, the schema is required. "
"Markdown returns full text of the page. CSV returns data in CSV format, "
"tabular like data."
)
),
] = "markdown",
schema: Annotated[
dict[str, Any] | None,
Field(
description=(
"The schema to use for the scrape. Only required if output_format is json or csv."
)
),
] = None,
render_javascript: Annotated[ # noqa: FBT002
bool,
Field(
description=(
"Whether to render the HTML of the page using javascript. "
"Much slower, therefore use it only for websites "
"that require javascript to render the page."
"Unless user asks to use it, first try to scrape the page without it. "
"If results are unsatisfactory, try to use it."
)
),
] = False,
geo_location: Annotated[
str | None,
Field(description="Two letter ISO country code to use for the scrape proxy."),
] = None,
) -> str:
"""Scrape the contents of the web page and return the data in the specified format.
Schema is required only if output_format is json or csv.
'render_javascript' is used to render javascript heavy websites.
"""
logger.info(
f"Calling ai_scraper with: {url=}, {output_format=}, {schema=}, {render_javascript=}"
)
scraper = AiScraper(api_key=get_and_verify_oxylabs_ai_studio_api_key())
result = await scraper.scrape_async(
url=url,
output_format=output_format,
schema=schema,
render_javascript=render_javascript,
geo_location=geo_location,
)
return json.dumps({"data": result.data})
@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
async def ai_browser_agent(
url: Annotated[str, Field(description="The URL to start the browser agent navigation from.")],
task_prompt: Annotated[str, Field(description="What browser agent should do.")],
output_format: Annotated[
Literal["json", "markdown", "html", "csv"],
Field(
description=(
"The output format. "
"Markdown returns full text of the page including links. "
"If json or csv, the schema is required."
)
),
] = "markdown",
schema: Annotated[
dict[str, Any] | None,
Field(
description=(
"The schema to use for the scrape. Only required if output_format is json or csv."
)
),
] = None,
geo_location: Annotated[
str | None,
Field(description="Two letter ISO country code to use for the browser proxy."),
] = None,
) -> str:
"""Run the browser agent and return the data in the specified format.
    This tool is useful if you need to navigate around a website and perform some actions.
    It allows navigating to any URL, clicking on links, filling forms, scrolling, etc.
    Finally it returns the data in the specified format. Schema is required only if output_format is json or csv.
    'task_prompt' describes what the browser agent should achieve.
""" # noqa: E501
logger.info(
f"Calling ai_browser_agent with: {url=}, {task_prompt=}, {output_format=}, {schema=}"
)
browser_agent = BrowserAgent(api_key=get_and_verify_oxylabs_ai_studio_api_key())
result = await browser_agent.run_async(
url=url,
user_prompt=task_prompt,
output_format=output_format,
schema=schema,
geo_location=geo_location,
)
data = result.data.model_dump(mode="json") if result.data else None
return json.dumps({"data": data})
@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
async def ai_search(
query: Annotated[str, Field(description="The query to search for.")],
limit: Annotated[int, Field(description="Maximum number of results to return.", le=50)] = 10,
render_javascript: Annotated[ # noqa: FBT002
bool,
Field(
description=(
"Whether to render the HTML of the page using javascript. "
"Much slower, therefore use it only if user asks to use it."
"First try to search with setting it to False. "
)
),
] = False,
return_content: Annotated[ # noqa: FBT002
bool,
Field(description="Whether to return markdown content of the search results."),
] = False,
geo_location: Annotated[
str | None,
Field(description="Two letter ISO country code to use for the search proxy."),
] = None,
) -> str:
"""Search the web based on a provided query.
    'return_content' is used to return markdown content for each search result. If 'return_content'
    is set to True, you don't need to use ai_scraper to get the content of the search result URLs,
    because it is already included in the search results.
    If 'return_content' is set to True, prefer a lower 'limit' to reduce payload size.
""" # noqa: E501
logger.info(
f"Calling ai_search with: {query=}, {limit=}, {render_javascript=}, {return_content=}"
)
search = AiSearch(api_key=get_and_verify_oxylabs_ai_studio_api_key())
result = await search.search_async(
query=query,
limit=limit,
render_javascript=render_javascript,
return_content=return_content,
geo_location=geo_location,
)
data = result.model_dump(mode="json")["data"]
return json.dumps({"data": data})
@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
async def generate_schema(
user_prompt: str,
app_name: Literal["ai_crawler", "ai_scraper", "browser_agent"],
) -> str:
"""Generate a json schema in openapi format."""
if app_name == "ai_crawler":
crawler = AiCrawler(api_key=get_and_verify_oxylabs_ai_studio_api_key())
schema = crawler.generate_schema(prompt=user_prompt)
elif app_name == "ai_scraper":
scraper = AiScraper(api_key=get_and_verify_oxylabs_ai_studio_api_key())
schema = scraper.generate_schema(prompt=user_prompt)
elif app_name == "browser_agent":
browser_agent = BrowserAgent(api_key=get_and_verify_oxylabs_ai_studio_api_key())
schema = browser_agent.generate_schema(prompt=user_prompt)
else:
raise ValueError(f"Invalid app name: {app_name}")
return json.dumps({"data": schema})
@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
async def ai_map(
url: Annotated[str, Field(description="The URL from which URLs mapping will be started.")],
user_prompt: Annotated[
str,
Field(description="What kind of urls user wants to find."),
],
render_javascript: Annotated[ # noqa: FBT002
bool,
Field(
description=(
"Whether to render the HTML of the page using javascript. Much slower, "
"therefore use it only for websites "
"that require javascript to render the page. "
"Unless user asks to use it, first try to crawl the page without it. "
"If results are unsatisfactory, try to use it."
)
),
] = False,
return_sources_limit: Annotated[
int, Field(description="The maximum number of sources to return.", le=50)
] = 25,
geo_location: Annotated[
str | None,
Field(description="Two letter ISO country code to use for the mapping proxy."),
] = None,
) -> str:
"""Tool useful for mapping website's urls.""" # noqa: E501
logger.info(
f"Calling ai_map with: {url=}, {user_prompt=}, "
f"{render_javascript=}, "
f"{return_sources_limit=}"
)
ai_map = AiMap(api_key=get_and_verify_oxylabs_ai_studio_api_key())
result = await ai_map.map_async(
url=url,
user_prompt=user_prompt,
render_javascript=render_javascript,
return_sources_limit=return_sources_limit,
geo_location=geo_location,
)
return json.dumps({"data": result.data})
```
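A hedged sketch of the schema-first flow these tools wrap, using the AI Studio SDK directly; it assumes `OXYLABS_AI_STUDIO_API_KEY` is exported, and the URL and prompt are illustrative:
```python
import asyncio
import os

from oxylabs_ai_studio.apps.ai_scraper import AiScraper


async def main() -> None:
    scraper = AiScraper(api_key=os.environ["OXYLABS_AI_STUDIO_API_KEY"])

    # Same call the generate_schema tool makes for app_name="ai_scraper".
    schema = scraper.generate_schema(prompt="product title and price")

    # Feed the schema back in, as the ai_scraper tool does for json output.
    result = await scraper.scrape_async(
        url="https://example.com/product",
        output_format="json",
        schema=schema,
        render_javascript=False,
        geo_location=None,
    )
    print(result.data)


asyncio.run(main())
```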