# Directory Structure
```
├── CLAUDE.md
├── package-lock.json
├── package.json
├── README.md
└── src
└── server.js
```
# Files
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
1 | # MCP Firecrawl Server
2 |
3 | This is a simple MCP server that provides tools to scrape websites and extract structured data using Firecrawl's APIs.
4 |
5 | ## Setup
6 |
7 | 1. Install dependencies:
8 | ```bash
9 | npm install
10 | ```
11 |
12 | 2. Create a `.env` file in the root directory with the following variables (the server reads `process.env` directly, so load the file when starting, e.g. with `node --env-file=.env src/server.js` on Node 20.6+, or export the variables in your shell):
13 | ```
14 | FIRECRAWL_API_TOKEN=your_token_here
15 | SENTRY_DSN=your_sentry_dsn_here
16 | ```
17 |
18 | - `FIRECRAWL_API_TOKEN` (required): Your Firecrawl API token
19 | - `SENTRY_DSN` (optional): Sentry DSN for error tracking and performance monitoring
20 |
21 | 3. Start the server:
22 | ```bash
23 | npm start
24 | ```
25 |
26 | Alternatively, you can set environment variables directly when running the server:
27 | ```bash
28 | FIRECRAWL_API_TOKEN=your_token_here npm start
29 | ```
30 |
31 | ## Features
32 |
33 | - **Website Scraping**: Extract content from websites in various formats
34 | - **Structured Data Extraction**: Extract specific data points based on custom schemas
35 | - **Error Tracking**: Integrated with Sentry for error tracking and performance monitoring
36 |
37 | ## Usage
38 |
39 | The server exposes two tools:
40 | 1. `scrape-website`: Basic website scraping with multiple format options
41 | 2. `extract-data`: Structured data extraction based on prompts and schemas
42 |
43 | ### Tool: scrape-website
44 |
45 | This tool scrapes a website and returns its content in the requested formats.
46 |
47 | Parameters:
48 | - `url` (string, required): The URL of the website to scrape
49 | - `formats` (array of strings, optional): Array of desired output formats. Supported formats are:
50 | - `"markdown"` (default)
51 | - `"html"`
52 | - `"text"`
53 |
54 | Example usage with MCP Inspector:
55 | ```bash
56 | # Basic usage (defaults to markdown)
57 | mcp-inspector --tool scrape-website --args '{
58 | "url": "https://example.com"
59 | }'
60 |
61 | # Multiple formats
62 | mcp-inspector --tool scrape-website --args '{
63 | "url": "https://example.com",
64 | "formats": ["markdown", "html", "text"]
65 | }'
66 | ```
67 |
68 | ### Tool: extract-data
69 |
70 | This tool extracts structured data from websites based on a provided prompt and schema.
71 |
72 | Parameters:
73 | - `urls` (array of strings, required): Array of URLs to extract data from
74 | - `prompt` (string, required): The prompt describing what data to extract
75 | - `schema` (object, required): Schema definition for the data to extract
76 |
77 | The schema definition should be an object where keys are field names and values are types. Supported types are:
78 | - `"string"`: For text fields
79 | - `"boolean"`: For true/false fields
80 | - `"number"`: For numeric fields
81 | - Arrays: Specified as `["type"]` where type is one of the above
82 | - Objects: Nested objects with their own type definitions
83 |
84 | Example usage with MCP Inspector:
85 | ```bash
86 | # Basic example extracting company information
87 | mcp-inspector --tool extract-data --args '{
88 | "urls": ["https://example.com"],
89 | "prompt": "Extract the company mission, whether it supports SSO, and whether it is open source.",
90 | "schema": {
91 | "company_mission": "string",
92 | "supports_sso": "boolean",
93 | "is_open_source": "boolean"
94 | }
95 | }'
96 |
97 | # Complex example with nested data
98 | mcp-inspector --tool extract-data --args '{
99 | "urls": ["https://example.com/products", "https://example.com/pricing"],
100 | "prompt": "Extract product information including name, price, and features.",
101 | "schema": {
102 | "products": [{
103 | "name": "string",
104 | "price": "number",
105 | "features": ["string"]
106 | }]
107 | }
108 | }'
109 | ```
110 |
111 | Both tools return a descriptive error message if scraping or extraction fails, and automatically log the error to Sentry when it is configured.
112 |
113 | ## Troubleshooting
114 |
115 | If you encounter issues:
116 |
117 | 1. Verify your Firecrawl API token is valid
118 | 2. Check that the URLs you're trying to scrape are accessible
119 | 3. For complex schemas, ensure they follow the supported format
120 | 4. Review Sentry logs for detailed error information (if configured)
```
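For reference, a server like this is typically registered with an MCP client through the client's configuration file. The entry below is a hypothetical Claude Desktop example (the path and token are placeholders) and assumes the stdio transport used in `src/server.js`:

```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "node",
      "args": ["/absolute/path/to/src/server.js"],
      "env": {
        "FIRECRAWL_API_TOKEN": "your_token_here"
      }
    }
  }
}
```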
--------------------------------------------------------------------------------
/CLAUDE.md:
--------------------------------------------------------------------------------
```markdown
1 | # MCP Firecrawl Development Guide
2 |
3 | ## Commands
4 | - Start server: `npm start` or `FIRECRAWL_API_TOKEN=your_token_here npm start`
5 | - Tests: none yet (`npm test` currently exits with "no test specified")
6 |
7 | ## Code Style Guidelines
8 |
9 | ### Structure
10 | - ES Modules format (`type: "module"` in package.json)
11 | - Server implementation in `/src/server.js`
12 |
13 | ### Imports & Exports
14 | - Use ES Module imports: `import { Name } from "package"`
15 | - Order imports: external libs first, internal modules second
16 |
17 | ### Formatting
18 | - Use 2-space indentation
19 | - Use single quotes for strings; use double quotes only when the string itself contains a single quote
20 | - Include semicolons at end of statements
21 | - Prefer explicit return statements in functions
22 |
23 | ### Types
24 | - Type validation using Zod schemas for API inputs
25 | - Define API parameters with appropriate Zod validators
26 |
27 | ### Error Handling
28 | - Use try/catch blocks for async operations
29 | - Return standardized error responses with `isError: true`
30 | - Log errors to console with descriptive messages
31 | - Include error messages in responses
32 |
33 | ### Logging
34 | - Use `console.error` for debug information and errors
```
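The standardized error-response guideline above can be sketched as a small helper. `errorResponse` is a hypothetical name for illustration only; the server currently builds these objects inline in each catch block:

```javascript
// Hypothetical helper illustrating the standardized error shape that both
// tools return on failure (the server builds these objects inline today).
function errorResponse(message) {
  return {
    content: [{ type: "text", text: message }],
    isError: true
  };
}

// Example: the object a tool would return when scraping fails.
console.log(JSON.stringify(errorResponse("Error scraping website: timeout")));
```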
--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
```json
1 | {
2 | "name": "mcp-firecrawl",
3 | "version": "1.0.0",
4 | "description": "An MCP server for crawling websites and translating into model context.",
5 | "main": "src/server.js",
6 | "type": "module",
7 | "scripts": {
8 | "start": "node src/server.js",
9 | "test": "echo \"Error: no test specified\" && exit 1"
10 | },
11 | "author": "",
12 | "license": "ISC",
13 | "dependencies": {
14 | "@mendable/firecrawl-js": "^1.18.2",
15 | "@modelcontextprotocol/sdk": "^1.6.0",
16 | "@sentry/node": "^9.2.0",
17 | "@sentry/profiling-node": "^9.2.0",
18 |     "zod": "^3.22.4"
19 |   }
20 | }
21 |
```
--------------------------------------------------------------------------------
/src/server.js:
--------------------------------------------------------------------------------
```javascript
1 | import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
2 | import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
3 | import { z } from "zod";
4 | import FirecrawlApp from '@mendable/firecrawl-js';
5 | import * as Sentry from "@sentry/node";
6 | import { nodeProfilingIntegration } from "@sentry/profiling-node";
7 |
8 | // Initialize Sentry
9 | Sentry.init({
10 | dsn: process.env.SENTRY_DSN,
11 | integrations: [
12 | nodeProfilingIntegration(),
13 | ],
14 | tracesSampleRate: 1.0,
15 | profilesSampleRate: 1.0,
16 | });
17 |
18 | // Create an MCP server
19 | const server = new McpServer({
20 | name: "Firecrawl MCP Server",
21 | version: "1.0.0"
22 | });
23 |
24 | // Get the Firecrawl API token from environment variable
25 | const firecrawlToken = process.env.FIRECRAWL_API_TOKEN;
26 | if (!firecrawlToken) {
27 | console.error("Error: FIRECRAWL_API_TOKEN environment variable is required");
28 | process.exit(1);
29 | }
30 |
31 | // Initialize Firecrawl client
32 | const firecrawl = new FirecrawlApp({apiKey: firecrawlToken});
33 |
34 | // Helper function to create a Zod schema from a schema definition
35 | function createDynamicSchema(schemaDefinition) {
36 |   // Primitive leaf types map directly to Zod validators
37 |   const primitiveMap = {
38 |     string: z.string(),
39 |     boolean: z.boolean(),
40 |     number: z.number()
41 |   };
42 |   if (typeof schemaDefinition === 'string') {
43 |     if (!(schemaDefinition in primitiveMap)) {
44 |       throw new Error(`Unsupported schema type: ${schemaDefinition}`);
45 |     }
46 |     return primitiveMap[schemaDefinition];
47 |   } else if (Array.isArray(schemaDefinition)) {
48 |     // ["type"]: an array whose items all match the single item definition
49 |     return z.array(createDynamicSchema(schemaDefinition[0]));
50 |   } else if (schemaDefinition !== null && typeof schemaDefinition === 'object') {
51 |     // Nested object: recurse into each field definition
52 |     const shape = {};
53 |     for (const [key, type] of Object.entries(schemaDefinition)) {
54 |       shape[key] = createDynamicSchema(type);
55 |     }
56 |     return z.object(shape);
57 |   }
58 |   throw new Error(`Unsupported schema definition: ${typeof schemaDefinition}`);
59 | }
60 |
61 | // Tool 1: Basic website scraping
62 | server.tool(
63 | "scrape-website",
64 | {
65 | url: z.string().url(),
66 | formats: z.array(z.enum(['markdown', 'html', 'text'])).default(['markdown'])
67 | },
68 | async ({ url, formats }) => {
69 | return await Sentry.startSpan(
70 | { name: "scrape-website" },
71 | async () => {
72 | try {
73 | // Debug input
74 | console.error('DEBUG: Scraping URL:', url, 'with formats:', formats);
75 |
76 | // Add Sentry breadcrumb for debugging
77 | Sentry.addBreadcrumb({
78 | category: 'scrape-website',
79 | message: `Scraping URL: ${url}`,
80 | data: { formats },
81 | level: 'info'
82 | });
83 |
84 | // Scrape the website
85 | const scrapeResult = await firecrawl.scrapeUrl(url, {
86 | formats: formats
87 | });
88 |
89 | // Debug raw response
90 | console.error('DEBUG: Raw scrape result:', JSON.stringify(scrapeResult, null, 2));
91 |
92 | if (!scrapeResult.success) {
93 | // Capture error in Sentry
94 | Sentry.captureMessage(`Failed to scrape website: ${scrapeResult.error}`, 'error');
95 | return {
96 | content: [{
97 | type: "text",
98 | text: `Failed to scrape website: ${scrapeResult.error}`
99 | }],
100 | isError: true
101 | };
102 | }
103 |
104 |         // Return only the formats that were requested and returned
105 |         return {
106 |           content: [{
107 |             type: "text",
108 |             text: formats.map((f) => scrapeResult[f]).filter(Boolean).join('\n\n') || 'No content available'
109 | }]
110 | };
111 |
112 | } catch (error) {
113 | console.error('DEBUG: Caught error:', error);
114 | // Capture exception in Sentry
115 | Sentry.captureException(error);
116 | return {
117 | content: [{
118 | type: "text",
119 | text: `Error scraping website: ${error.message}`
120 | }],
121 | isError: true
122 | };
123 | }
124 | }
125 | );
126 | }
127 | );
128 |
129 | // Tool 2: Structured data extraction
130 | server.tool(
131 | "extract-data",
132 | {
133 | urls: z.array(z.string().url()),
134 | prompt: z.string(),
135 | schema: z.record(z.union([
136 | z.literal('string'),
137 | z.literal('boolean'),
138 | z.literal('number'),
139 | z.array(z.any()),
140 | z.record(z.any())
141 | ]))
142 | },
143 | async ({ urls, prompt, schema }) => {
144 | return await Sentry.startSpan(
145 | { name: "extract-data" },
146 | async () => {
147 | try {
148 | // Add Sentry breadcrumb for debugging
149 | Sentry.addBreadcrumb({
150 | category: 'extract-data',
151 | message: `Extracting data from URLs`,
152 | data: { urlCount: urls.length, prompt },
153 | level: 'info'
154 | });
155 |
156 | // Create the Zod schema from the provided definition
157 | const zodSchema = createDynamicSchema(schema);
158 |
159 | // Extract data using Firecrawl
160 | const extractResponse = await firecrawl.extract(urls, {
161 | prompt: prompt,
162 | schema: zodSchema
163 | });
164 |
165 | if (!extractResponse.success) {
166 | // Capture error in Sentry
167 | Sentry.captureMessage(`Failed to extract data: ${extractResponse.error}`, 'error');
168 | return {
169 | content: [{
170 | type: "text",
171 | text: `Failed to extract data: ${extractResponse.error}`
172 | }],
173 | isError: true
174 | };
175 | }
176 |
177 | return {
178 | content: [{
179 | type: "text",
180 | text: JSON.stringify(extractResponse.data, null, 2)
181 | }]
182 | };
183 | } catch (error) {
184 | // Capture exception in Sentry
185 | Sentry.captureException(error);
186 | return {
187 | content: [{
188 | type: "text",
189 | text: `Error extracting data: ${error.message}`
190 | }],
191 | isError: true
192 | };
193 | }
194 | }
195 | );
196 | }
197 | );
198 |
199 | // Start receiving messages on stdin and sending messages on stdout
200 | const transport = new StdioServerTransport();
201 | await server.connect(transport);
```
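The recursive schema convention that `createDynamicSchema` translates into Zod can also be illustrated as a plain validator. `matchesSchema` below is a hypothetical standalone sketch (not part of the server) that applies the same string/array/object rules directly to values:

```javascript
// Standalone illustration of the schema-definition convention used by
// createDynamicSchema: "string" | "boolean" | "number", ["type"], or a
// nested object of field -> type. Hypothetical helper, not in the server.
function matchesSchema(schemaDefinition, value) {
  if (typeof schemaDefinition === "string") {
    // Primitive leaf: the definition names the expected typeof result.
    if (!["string", "boolean", "number"].includes(schemaDefinition)) {
      throw new Error(`Unsupported schema type: ${schemaDefinition}`);
    }
    return typeof value === schemaDefinition;
  }
  if (Array.isArray(schemaDefinition)) {
    // ["type"]: every element must match the single item definition.
    return Array.isArray(value) &&
      value.every((item) => matchesSchema(schemaDefinition[0], item));
  }
  if (schemaDefinition !== null && typeof schemaDefinition === "object") {
    // Nested object: each declared field must match its own definition.
    return value !== null && typeof value === "object" &&
      Object.entries(schemaDefinition).every(
        ([key, type]) => matchesSchema(type, value[key])
      );
  }
  throw new Error(`Unsupported schema definition: ${typeof schemaDefinition}`);
}

// Example mirroring the README's extract-data schema:
const schema = { company_mission: "string", supports_sso: "boolean" };
console.log(matchesSchema(schema, { company_mission: "Crawl the web", supports_sso: true })); // true
```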