# Directory Structure

```
├── README.md
└── scrape_mcp_server.py
```

# Files

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

````markdown
# Claude Web Scraper MCP

A simple Model Context Protocol (MCP) server that connects Claude for Desktop to a locally running eGet web scraper. This allows Claude to scrape website content through your local API.

## Prerequisites

- Claude for Desktop
- Python 3.10+ (the `mcp` SDK requires Python 3.10 or newer)
- uv (https://docs.astral.sh/uv/)
- eGet web scraper (from https://github.com/vishwajeetdabholkar/eGet-Crawler-for-ai)

## Setup Instructions

### 1. Set up eGet Web Scraper

First, make sure you have the eGet web scraper running:

```bash
# Clone the eGet repository
git clone https://github.com/vishwajeetdabholkar/eGet-Crawler-for-ai
cd eGet-Crawler-for-ai

# Set up and run eGet according to its instructions
# (typically using Docker or local Python installation)

# Verify the API is running (default: http://localhost:8000/api/v1/scrape)
```
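
To confirm the endpoint responds before wiring up Claude, you can send it a minimal request. A short smoke test in Python (the payload mirrors what `scrape_mcp_server.py` sends and assumes eGet's default port; adjust both to your configuration):

```python
# smoke_test.py -- checks that the eGet scrape endpoint answers.
# The endpoint and payload shape are taken from scrape_mcp_server.py;
# change them if your eGet deployment differs.
import httpx

resp = httpx.post(
    "http://localhost:8000/api/v1/scrape",
    json={"url": "https://example.com", "formats": ["markdown"]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```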

### 2. Set up the MCP Server

```bash
# Create the project (uv init writes the pyproject.toml that `uv add` needs)
uv init claude-scraper-mcp
cd claude-scraper-mcp

# Create and activate a virtual environment
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
uv add "mcp[cli]" httpx

# Create the MCP server script
touch scrape_mcp_server.py
```

Copy the `scrape_mcp_server.py` code into the file.

### 3. Configure Claude for Desktop

1. Create or edit the Claude desktop configuration:

```bash
# On macOS
mkdir -p ~/Library/Application\ Support/Claude/
```

2. Add this configuration to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
    "mcpServers": {
        "scrape-service": {
            "command": "/absolute/path/to/claude-scraper-mcp/.venv/bin/python",
            "args": [
                "/absolute/path/to/claude-scraper-mcp/scrape_mcp_server.py"
            ]
        }
    }
}
```

Replace the paths with the actual absolute paths to your virtual environment and script.
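
If you are not sure of the interpreter path, print it from inside the activated virtual environment and paste the output into `command`:

```python
# Run inside the activated venv: prints the absolute interpreter path
# to use as the "command" value in claude_desktop_config.json.
import sys
print(sys.executable)
```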

3. Restart Claude for Desktop

## Usage

Once set up, you can use Claude to scrape websites with commands like:

- "Scrape the content from https://example.com and summarize it"
- "Get information about the website at https://news.ycombinator.com"

## Troubleshooting

If you encounter issues:

1. Check that eGet scraper is running
2. Verify the API endpoint in the script matches your eGet configuration (a quick end-to-end check follows this list)
3. Make sure Claude for Desktop is using the correct Python interpreter
4. Restart Claude for Desktop after making changes to the configuration
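
For step 2, you can also exercise the tool end-to-end without Claude. A minimal sketch, assuming eGet is running and you are in the project directory with the venv activated (the official `mcp` SDK's `@mcp.tool()` decorator returns the original function, so it can be awaited directly):

```python
# check_tool.py -- sanity-check scrape_url outside Claude.
# Assumes eGet is reachable at the endpoint set in scrape_mcp_server.py.
import asyncio

from scrape_mcp_server import scrape_url

result = asyncio.run(scrape_url("https://example.com", get_full_content=False))
print(result)
```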
````

--------------------------------------------------------------------------------
/scrape_mcp_server.py:
--------------------------------------------------------------------------------

```python
from typing import Any, Dict, Optional
import httpx
from mcp.server.fastmcp import FastMCP

# Initialize FastMCP server
mcp = FastMCP("scrape-service")

# Constants
API_ENDPOINT = "http://localhost:8000/api/v1/scrape"
TIMEOUT = 30  # seconds

async def make_scrape_request(params: Dict[str, Any]) -> Dict[str, Any]:
    """Make a request to the scrape API with proper error handling."""
    async with httpx.AsyncClient() as client:
        try:
            response = await client.post(
                API_ENDPOINT, 
                json=params,
                timeout=TIMEOUT
            )
            response.raise_for_status()
            return response.json()
        except Exception as e:
            return {"success": False, "error": str(e)}

@mcp.tool()
async def scrape_url(
    url: str, 
    get_full_content: bool = True,
    only_main_content: bool = True
) -> str:
    """Scrape content from a URL and return the content.
    
    Args:
        url: The URL to scrape
        get_full_content: Whether to get full content or just metadata
        only_main_content: Whether to extract only the main content
    """
    # Configure parameters
    params = {
        "url": url,
        "formats": ["markdown", "html"],
        "onlyMainContent": only_main_content,
        "includeRawHtml": False,
        "includeScreenshot": False
    }
    
    # Make the API call
    result = await make_scrape_request(params)
    
    if not result.get("success", False):
        error_msg = result.get("error", "Unknown error")
        return f"Error scraping {url}: {error_msg}"
    
    data = result.get("data", {})
    metadata = data.get("metadata", {})
    title = metadata.get("title", "No title")
    description = metadata.get("description", "")
    
    # If only metadata is requested
    if not get_full_content:
        # Format links for display
        links = data.get("links", [])
        formatted_links = '\n'.join([f"- {link}" for link in links[:5]])
        if len(links) > 5:
            formatted_links += f"\n... and {len(links) - 5} more links"
            
        return f"""
# {title}

{description}

## Links
{formatted_links if links else "No links found."}
        """.strip()
    
    # Return full content
    markdown_content = data.get("markdown", "No content available")
    return f"""
# {title}

{description}

## Content

{markdown_content}
    """.strip()

@mcp.tool()
async def scrape_advanced(
    url: str,
    mobile: bool = False,
    include_raw_html: bool = False,
    wait_time: Optional[int] = None,
    custom_headers: Optional[Dict[str, str]] = None
) -> str:
    """Advanced web scraping with additional options.
    
    Args:
        url: The URL to scrape
        mobile: Whether to use mobile user agent
        include_raw_html: Whether to include raw HTML in response
        wait_time: Time to wait after page load in milliseconds
        custom_headers: Custom HTTP headers to send with request
    """
    # Configure parameters
    params = {
        "url": url,
        "formats": ["markdown", "html"],
        "onlyMainContent": True,
        "includeRawHtml": include_raw_html,
        "mobile": mobile
    }
    
    if wait_time is not None:
        params["waitFor"] = wait_time
        
    if custom_headers is not None:
        params["headers"] = custom_headers
    
    # Make the API call
    result = await make_scrape_request(params)
    
    if not result.get("success", False):
        error_msg = result.get("error", "Unknown error")
        return f"Error scraping {url}: {error_msg}"
    
    data = result.get("data", {})
    metadata = data.get("metadata", {})
    title = metadata.get("title", "No title")
    description = metadata.get("description", "")
    markdown_content = data.get("markdown", "No content available")
    
    return f"""
# {title}

{description}

## Content

{markdown_content}
    """.strip()

if __name__ == "__main__":
    # Initialize and run the server
    mcp.run(transport='stdio')
```