# Directory Structure

```
├── CLAUDE.md
├── package-lock.json
├── package.json
├── README.md
└── src
    └── server.js
```

# Files

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

```markdown
# MCP Firecrawl Server

This is a simple MCP server that provides tools to scrape websites and extract structured data using Firecrawl's APIs.

## Setup

1. Install dependencies:
```bash
npm install
```

2. Create a `.env` file in the root directory with the following variables:
```
FIRECRAWL_API_TOKEN=your_token_here
SENTRY_DSN=your_sentry_dsn_here
```

- `FIRECRAWL_API_TOKEN` (required): Your Firecrawl API token
- `SENTRY_DSN` (optional): Sentry DSN for error tracking and performance monitoring

Note that the server reads these values from `process.env` and does not load the `.env` file itself. Export the variables in your shell, or load the file when starting the server (for example with `node --env-file=.env src/server.js` on Node 20.6+).

3. Start the server:
```bash
npm start
```

Alternatively, you can set environment variables directly when running the server:
```bash
FIRECRAWL_API_TOKEN=your_token_here npm start
```

## Features

- **Website Scraping**: Extract content from websites in various formats
- **Structured Data Extraction**: Extract specific data points based on custom schemas
- **Error Tracking**: Integrated with Sentry for error tracking and performance monitoring

## Usage

The server exposes two tools:
1. `scrape-website`: Basic website scraping with multiple format options
2. `extract-data`: Structured data extraction based on prompts and schemas

### Tool: scrape-website

This tool scrapes a website and returns its content in the requested formats.

Parameters:
- `url` (string, required): The URL of the website to scrape
- `formats` (array of strings, optional): Array of desired output formats. Supported formats are:
  - `"markdown"` (default)
  - `"html"`
  - `"text"`

Example usage with MCP Inspector:
```bash
# Basic usage (defaults to markdown)
mcp-inspector --tool scrape-website --args '{
  "url": "https://example.com"
}'

# Multiple formats
mcp-inspector --tool scrape-website --args '{
  "url": "https://example.com",
  "formats": ["markdown", "html", "text"]
}'
```

### Tool: extract-data

This tool extracts structured data from websites based on a provided prompt and schema.

Parameters:
- `urls` (array of strings, required): Array of URLs to extract data from
- `prompt` (string, required): The prompt describing what data to extract
- `schema` (object, required): Schema definition for the data to extract

The schema definition should be an object where keys are field names and values are types. Supported types are:
- `"string"`: For text fields
- `"boolean"`: For true/false fields
- `"number"`: For numeric fields
- Arrays: Specified as `["type"]` where type is one of the above
- Objects: Nested objects with their own type definitions

Example usage with MCP Inspector:
```bash
# Basic example extracting company information
mcp-inspector --tool extract-data --args '{
  "urls": ["https://example.com"],
  "prompt": "Extract the company mission, whether it supports SSO, and whether it is open source.",
  "schema": {
    "company_mission": "string",
    "supports_sso": "boolean",
    "is_open_source": "boolean"
  }
}'

# Complex example with nested data
mcp-inspector --tool extract-data --args '{
  "urls": ["https://example.com/products", "https://example.com/pricing"],
  "prompt": "Extract product information including name, price, and features.",
  "schema": {
    "products": [{
      "name": "string",
      "price": "number",
      "features": ["string"]
    }]
  }
}'
```

If scraping or extraction fails, both tools return an error result (marked with `isError: true`) containing a descriptive message, and report the failure to Sentry when it is configured.
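
For reference, a failed call produces a result shaped like the error branch in `src/server.js` (the message text below is illustrative):

```javascript
// Shape of an error result returned by both tools (illustrative message text)
const errorResult = {
  content: [
    { type: "text", text: "Error scraping website: Request failed with status 401" }
  ],
  isError: true
};
```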

## Troubleshooting

If you encounter issues:

1. Verify your Firecrawl API token is valid
2. Check that the URLs you're trying to scrape are accessible
3. For complex schemas, ensure they follow the supported format
4. Review Sentry logs for detailed error information (if configured) 
```

--------------------------------------------------------------------------------
/CLAUDE.md:
--------------------------------------------------------------------------------

```markdown
# MCP Firecrawl Development Guide

## Commands
- Start server: `npm start` or `FIRECRAWL_API_TOKEN=your_token_here npm start`
- Tests: none configured yet (`npm test` currently exits with an error)

## Code Style Guidelines

### Structure
- ES Modules format (`type: "module"` in package.json)
- Server implementation in `/src/server.js`

### Imports & Exports
- Use ES Module imports: `import { Name } from "package"`
- Order imports: external libs first, internal modules second
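
For example, the top of `src/server.js` keeps external libraries together; the internal import shown below is hypothetical, since this repo currently has no internal modules:

```javascript
// External libraries first (as in src/server.js)
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";
import FirecrawlApp from '@mendable/firecrawl-js';

// Internal modules second (hypothetical example; none exist in this repo yet)
// import { createDynamicSchema } from "./schema.js";
```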

### Formatting
- Use 2-space indentation
- Prefer single quotes for strings (e.g. `'example'`), using double quotes only when nesting quotes requires it
- Include semicolons at end of statements
- Prefer explicit return statements in functions
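
A minimal illustration of these conventions (the helper name is hypothetical):

```javascript
// 2-space indentation, single quotes, semicolons, explicit return
function isSupportedFormat(format) {
  const supported = ['markdown', 'html', 'text'];
  return supported.includes(format);
}
```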

### Types
- Type validation using Zod schemas for API inputs
- Define API parameters with appropriate Zod validators
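
For instance, the `scrape-website` tool in `src/server.js` declares its parameters with Zod validators (the `scrapeWebsiteParams` name is only for illustration; the real file passes the object inline to `server.tool`):

```javascript
// Parameter validators for the scrape-website tool (from src/server.js)
import { z } from "zod";

const scrapeWebsiteParams = {
  url: z.string().url(),
  formats: z.array(z.enum(['markdown', 'html', 'text'])).default(['markdown'])
};
```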

### Error Handling
- Use try/catch blocks for async operations
- Return standardized error responses with `isError: true`
- Log errors to console with descriptive messages
- Include error messages in responses
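
A minimal sketch of this convention, assuming a `firecrawl` client like the one initialized in `src/server.js` (the wrapper name is hypothetical):

```javascript
// Hypothetical wrapper showing the standardized error-handling pattern
async function scrapeSafely(firecrawl, url) {
  try {
    const result = await firecrawl.scrapeUrl(url, { formats: ['markdown'] });
    return {
      content: [{ type: "text", text: result.markdown || 'No content available' }]
    };
  } catch (error) {
    // Log a descriptive message and return a standardized error response
    console.error('Error scraping website:', error.message);
    return {
      content: [{ type: "text", text: `Error scraping website: ${error.message}` }],
      isError: true
    };
  }
}
```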

### Logging
- Use `console.error` for debug information and errors
```

--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------

```json
{
  "name": "mcp-firecrawl",
  "version": "1.0.0",
  "description": "An MCP server for crawling websites and translating into model context.",
  "main": "src/server.js",
  "type": "module",
  "scripts": {
    "start": "node src/server.js",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "@mendable/firecrawl-js": "^1.18.2",
    "@modelcontextprotocol/sdk": "^1.6.0",
    "@sentry/node": "^9.2.0",
    "@sentry/profiling-node": "^9.2.0",
    "firecrawl": "^1.0.0",
    "zod": "^3.22.4"
  }
}

```

--------------------------------------------------------------------------------
/src/server.js:
--------------------------------------------------------------------------------

```javascript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import FirecrawlApp from '@mendable/firecrawl-js';
import * as Sentry from "@sentry/node";
import { nodeProfilingIntegration } from "@sentry/profiling-node";

// Initialize Sentry (no events are sent when SENTRY_DSN is unset, so it is safe to leave unconfigured)
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  integrations: [
    nodeProfilingIntegration(),
  ],
  tracesSampleRate: 1.0,
  profilesSampleRate: 1.0,
});

// Create an MCP server
const server = new McpServer({
  name: "Firecrawl MCP Server",
  version: "1.0.0"
});

// Get the Firecrawl API token from environment variable
const firecrawlToken = process.env.FIRECRAWL_API_TOKEN;
if (!firecrawlToken) {
  console.error("Error: FIRECRAWL_API_TOKEN environment variable is required");
  process.exit(1);
}

// Initialize Firecrawl client
const firecrawl = new FirecrawlApp({apiKey: firecrawlToken});

// Helper function to create a Zod schema from a schema definition
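// Example: { products: [{ name: "string", price: "number" }] } becomes
//          z.object({ products: z.array(z.object({ name: z.string(), price: z.number() })) })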
function createDynamicSchema(schemaDefinition) {
  const schemaMap = {
    string: z.string(),
    boolean: z.boolean(),
    number: z.number(),
    array: (itemType) => z.array(createDynamicSchema(itemType)),
    object: (properties) => {
      const shape = {};
      for (const [key, type] of Object.entries(properties)) {
        shape[key] = createDynamicSchema(type);
      }
      return z.object(shape);
    }
  };

  if (typeof schemaDefinition === 'string') {
    const fieldSchema = schemaMap[schemaDefinition];
    if (!fieldSchema) {
      // Reject unknown type names (e.g. "date") instead of silently returning undefined
      throw new Error(`Unsupported schema type: ${schemaDefinition}`);
    }
    return fieldSchema;
  } else if (Array.isArray(schemaDefinition)) {
    return schemaMap.array(schemaDefinition[0]);
  } else if (typeof schemaDefinition === 'object') {
    return schemaMap.object(schemaDefinition);
  }

  throw new Error(`Unsupported schema type: ${typeof schemaDefinition}`);
}

// Tool 1: Basic website scraping
server.tool(
  "scrape-website",
  { 
    url: z.string().url(),
    formats: z.array(z.enum(['markdown', 'html', 'text'])).default(['markdown'])
  },
  async ({ url, formats }) => {
    return await Sentry.startSpan(
      { name: "scrape-website" },
      async () => {
        try {
          // Debug input
          console.error('DEBUG: Scraping URL:', url, 'with formats:', formats);

          // Add Sentry breadcrumb for debugging
          Sentry.addBreadcrumb({
            category: 'scrape-website',
            message: `Scraping URL: ${url}`,
            data: { formats },
            level: 'info'
          });

          // Scrape the website
          const scrapeResult = await firecrawl.scrapeUrl(url, { 
            formats: formats 
          });

          // Debug raw response
          console.error('DEBUG: Raw scrape result:', JSON.stringify(scrapeResult, null, 2));

          if (!scrapeResult.success) {
            // Capture error in Sentry
            Sentry.captureMessage(`Failed to scrape website: ${scrapeResult.error}`, 'error');
            return {
              content: [{ 
                type: "text", 
                text: `Failed to scrape website: ${scrapeResult.error}` 
              }],
              isError: true
            };
          }

          // Return the markdown (or generic content) field; other requested formats remain on scrapeResult and are not returned by this tool
          return {
            content: [{ 
              type: "text", 
              text: scrapeResult.markdown || scrapeResult.content || 'No content available'
            }]
          };

        } catch (error) {
          console.error('DEBUG: Caught error:', error);
          // Capture exception in Sentry
          Sentry.captureException(error);
          return {
            content: [{ 
              type: "text", 
              text: `Error scraping website: ${error.message}` 
            }],
            isError: true
          };
        }
      }
    );
  }
);

// Tool 2: Structured data extraction
server.tool(
  "extract-data",
  { 
    urls: z.array(z.string().url()),
    prompt: z.string(),
    schema: z.record(z.union([
      z.literal('string'),
      z.literal('boolean'),
      z.literal('number'),
      z.array(z.any()),
      z.record(z.any())
    ]))
  },
  async ({ urls, prompt, schema }) => {
    return await Sentry.startSpan(
      { name: "extract-data" },
      async () => {
        try {
          // Add Sentry breadcrumb for debugging
          Sentry.addBreadcrumb({
            category: 'extract-data',
            message: `Extracting data from URLs`,
            data: { urlCount: urls.length, prompt },
            level: 'info'
          });

          // Create the Zod schema from the provided definition
          const zodSchema = createDynamicSchema(schema);

          // Extract data using Firecrawl
          const extractResponse = await firecrawl.extract(urls, {
            prompt: prompt,
            schema: zodSchema
          });

          if (!extractResponse.success) {
            // Capture error in Sentry
            Sentry.captureMessage(`Failed to extract data: ${extractResponse.error}`, 'error');
            return {
              content: [{ 
                type: "text", 
                text: `Failed to extract data: ${extractResponse.error}` 
              }],
              isError: true
            };
          }

          return {
            content: [{ 
              type: "text", 
              text: JSON.stringify(extractResponse.data, null, 2)
            }]
          };
        } catch (error) {
          // Capture exception in Sentry
          Sentry.captureException(error);
          return {
            content: [{ 
              type: "text", 
              text: `Error extracting data: ${error.message}` 
            }],
            isError: true
          };
        }
      }
    );
  }
);

// Start receiving messages on stdin and sending messages on stdout
const transport = new StdioServerTransport();
await server.connect(transport); 
```