# Directory Structure
```
├── CLAUDE.md
├── package-lock.json
├── package.json
├── README.md
└── src
└── server.js
```
# Files
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
1 | # MCP Firecrawl Server
2 |
3 | This is a simple MCP server that provides tools to scrape websites and extract structured data using Firecrawl's APIs.
4 |
5 | ## Setup
6 |
7 | 1. Install dependencies:
8 | ```bash
9 | npm install
10 | ```
11 |
12 | 2. Create a `.env` file in the root directory with the following variables (the server reads `process.env` directly, so load the file when starting, e.g. with `node --env-file=.env src/server.js` on Node 20.6+, or export the variables in your shell):
13 | ```
14 | FIRECRAWL_API_TOKEN=your_token_here
15 | SENTRY_DSN=your_sentry_dsn_here
16 | ```
17 |
18 | - `FIRECRAWL_API_TOKEN` (required): Your Firecrawl API token
19 | - `SENTRY_DSN` (optional): Sentry DSN for error tracking and performance monitoring
20 |
21 | 3. Start the server:
22 | ```bash
23 | npm start
24 | ```
25 |
26 | Alternatively, you can set environment variables directly when running the server:
27 | ```bash
28 | FIRECRAWL_API_TOKEN=your_token_here npm start
29 | ```
30 |
31 | ## Features
32 |
33 | - **Website Scraping**: Extract content from websites in various formats
34 | - **Structured Data Extraction**: Extract specific data points based on custom schemas
35 | - **Error Tracking**: Integrated with Sentry for error tracking and performance monitoring
36 |
37 | ## Usage
38 |
39 | The server exposes two tools:
40 | 1. `scrape-website`: Basic website scraping with multiple format options
41 | 2. `extract-data`: Structured data extraction based on prompts and schemas
42 |
43 | ### Tool: scrape-website
44 |
45 | This tool scrapes a website and returns its content in the requested formats.
46 |
47 | Parameters:
48 | - `url` (string, required): The URL of the website to scrape
49 | - `formats` (array of strings, optional): Array of desired output formats. Supported formats are:
50 | - `"markdown"` (default)
51 | - `"html"`
52 | - `"text"`
53 |
54 | Example usage with MCP Inspector:
55 | ```bash
56 | # Basic usage (defaults to markdown)
57 | mcp-inspector --tool scrape-website --args '{
58 | "url": "https://example.com"
59 | }'
60 |
61 | # Multiple formats
62 | mcp-inspector --tool scrape-website --args '{
63 | "url": "https://example.com",
64 | "formats": ["markdown", "html", "text"]
65 | }'
66 | ```
67 |
68 | ### Tool: extract-data
69 |
70 | This tool extracts structured data from websites based on a provided prompt and schema.
71 |
72 | Parameters:
73 | - `urls` (array of strings, required): Array of URLs to extract data from
74 | - `prompt` (string, required): The prompt describing what data to extract
75 | - `schema` (object, required): Schema definition for the data to extract
76 |
77 | The schema definition should be an object where keys are field names and values are types. Supported types are:
78 | - `"string"`: For text fields
79 | - `"boolean"`: For true/false fields
80 | - `"number"`: For numeric fields
81 | - Arrays: Specified as `["type"]` where type is one of the above
82 | - Objects: Nested objects with their own type definitions
83 |
84 | Example usage with MCP Inspector:
85 | ```bash
86 | # Basic example extracting company information
87 | mcp-inspector --tool extract-data --args '{
88 | "urls": ["https://example.com"],
89 | "prompt": "Extract the company mission, whether it supports SSO, and whether it is open source.",
90 | "schema": {
91 | "company_mission": "string",
92 | "supports_sso": "boolean",
93 | "is_open_source": "boolean"
94 | }
95 | }'
96 |
97 | # Complex example with nested data
98 | mcp-inspector --tool extract-data --args '{
99 | "urls": ["https://example.com/products", "https://example.com/pricing"],
100 | "prompt": "Extract product information including name, price, and features.",
101 | "schema": {
102 | "products": [{
103 | "name": "string",
104 | "price": "number",
105 | "features": ["string"]
106 | }]
107 | }
108 | }'
109 | ```
110 |
111 | Both tools return a descriptive error message if scraping or extraction fails, and automatically log the error to Sentry when it is configured.
112 |
113 | ## Troubleshooting
114 |
115 | If you encounter issues:
116 |
117 | 1. Verify your Firecrawl API token is valid
118 | 2. Check that the URLs you're trying to scrape are accessible
119 | 3. For complex schemas, ensure they follow the supported format
120 | 4. Review Sentry logs for detailed error information (if configured)
```
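For reference, a server like this is typically registered with an MCP client through the client's configuration file. The entry below is a hypothetical Claude Desktop example (the path and token are placeholders) and assumes the stdio transport used in `src/server.js`:

```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "node",
      "args": ["/absolute/path/to/src/server.js"],
      "env": {
        "FIRECRAWL_API_TOKEN": "your_token_here"
      }
    }
  }
}
```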
--------------------------------------------------------------------------------
/CLAUDE.md:
--------------------------------------------------------------------------------
```markdown
1 | # MCP Firecrawl Development Guide
2 |
3 | ## Commands
4 | - Start server: `npm start` or `FIRECRAWL_API_TOKEN=your_token_here npm start`
5 | - Tests: none yet (`npm test` currently exits with "no test specified")
6 |
7 | ## Code Style Guidelines
8 |
9 | ### Structure
10 | - ES Modules format (`type: "module"` in package.json)
11 | - Server implementation in `/src/server.js`
12 |
13 | ### Imports & Exports
14 | - Use ES Module imports: `import { Name } from "package"`
15 | - Order imports: external libs first, internal modules second
16 |
17 | ### Formatting
18 | - Use 2-space indentation
19 | - Use single quotes for strings; use double quotes only when the string itself contains a single quote
20 | - Include semicolons at end of statements
21 | - Prefer explicit return statements in functions
22 |
23 | ### Types
24 | - Type validation using Zod schemas for API inputs
25 | - Define API parameters with appropriate Zod validators
26 |
27 | ### Error Handling
28 | - Use try/catch blocks for async operations
29 | - Return standardized error responses with `isError: true`
30 | - Log errors to console with descriptive messages
31 | - Include error messages in responses
32 |
33 | ### Logging
34 | - Use `console.error` for debug information and errors
```
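The standardized error-response guideline above can be sketched as a small helper. `errorResponse` is a hypothetical name for illustration only; the server currently builds these objects inline in each catch block:

```javascript
// Hypothetical helper illustrating the standardized error shape that both
// tools return on failure (the server builds these objects inline today).
function errorResponse(message) {
  return {
    content: [{ type: "text", text: message }],
    isError: true
  };
}

// Example: the object a tool would return when scraping fails.
console.log(JSON.stringify(errorResponse("Error scraping website: timeout")));
```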
--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
```json
1 | {
2 | "name": "mcp-firecrawl",
3 | "version": "1.0.0",
4 | "description": "An MCP server for crawling websites and translating into model context.",
5 | "main": "src/server.js",
6 | "type": "module",
7 | "scripts": {
8 | "start": "node src/server.js",
9 | "test": "echo \"Error: no test specified\" && exit 1"
10 | },
11 | "author": "",
12 | "license": "ISC",
13 | "dependencies": {
14 | "@mendable/firecrawl-js": "^1.18.2",
15 | "@modelcontextprotocol/sdk": "^1.6.0",
16 | "@sentry/node": "^9.2.0",
17 | "@sentry/profiling-node": "^9.2.0",
18 |     "zod": "^3.22.4"
19 |   }
20 | }
21 |
```
--------------------------------------------------------------------------------
/src/server.js:
--------------------------------------------------------------------------------
```javascript
1 | import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
2 | import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
3 | import { z } from "zod";
4 | import FirecrawlApp from '@mendable/firecrawl-js';
5 | import * as Sentry from "@sentry/node";
6 | import { nodeProfilingIntegration } from "@sentry/profiling-node";
7 |
8 | // Initialize Sentry
9 | Sentry.init({
10 | dsn: process.env.SENTRY_DSN,
11 | integrations: [
12 | nodeProfilingIntegration(),
13 | ],
14 | tracesSampleRate: 1.0,
15 | profilesSampleRate: 1.0,
16 | });
17 |
18 | // Create an MCP server
19 | const server = new McpServer({
20 | name: "Firecrawl MCP Server",
21 | version: "1.0.0"
22 | });
23 |
24 | // Get the Firecrawl API token from environment variable
25 | const firecrawlToken = process.env.FIRECRAWL_API_TOKEN;
26 | if (!firecrawlToken) {
27 | console.error("Error: FIRECRAWL_API_TOKEN environment variable is required");
28 | process.exit(1);
29 | }
30 |
31 | // Initialize Firecrawl client
32 | const firecrawl = new FirecrawlApp({apiKey: firecrawlToken});
33 |
34 | // Helper function to create a Zod schema from a schema definition
35 | function createDynamicSchema(schemaDefinition) {
36 |   // Primitive leaf types map directly to Zod validators
37 |   const primitiveMap = {
38 |     string: z.string(),
39 |     boolean: z.boolean(),
40 |     number: z.number()
41 |   };
42 |   if (typeof schemaDefinition === 'string') {
43 |     if (!(schemaDefinition in primitiveMap)) {
44 |       throw new Error(`Unsupported schema type: ${schemaDefinition}`);
45 |     }
46 |     return primitiveMap[schemaDefinition];
47 |   } else if (Array.isArray(schemaDefinition)) {
48 |     // ["type"]: an array whose items all match the single item definition
49 |     return z.array(createDynamicSchema(schemaDefinition[0]));
50 |   } else if (schemaDefinition !== null && typeof schemaDefinition === 'object') {
51 |     // Nested object: recurse into each field definition
52 |     const shape = {};
53 |     for (const [key, type] of Object.entries(schemaDefinition)) {
54 |       shape[key] = createDynamicSchema(type);
55 |     }
56 |     return z.object(shape);
57 |   }
58 |   throw new Error(`Unsupported schema definition: ${typeof schemaDefinition}`);
59 | }
60 |
61 | // Tool 1: Basic website scraping
62 | server.tool(
63 | "scrape-website",
64 | {
65 | url: z.string().url(),
66 | formats: z.array(z.enum(['markdown', 'html', 'text'])).default(['markdown'])
67 | },
68 | async ({ url, formats }) => {
69 | return await Sentry.startSpan(
70 | { name: "scrape-website" },
71 | async () => {
72 | try {
73 | // Debug input
74 | console.error('DEBUG: Scraping URL:', url, 'with formats:', formats);
75 |
76 | // Add Sentry breadcrumb for debugging
77 | Sentry.addBreadcrumb({
78 | category: 'scrape-website',
79 | message: `Scraping URL: ${url}`,
80 | data: { formats },
81 | level: 'info'
82 | });
83 |
84 | // Scrape the website
85 | const scrapeResult = await firecrawl.scrapeUrl(url, {
86 | formats: formats
87 | });
88 |
89 | // Debug raw response
90 | console.error('DEBUG: Raw scrape result:', JSON.stringify(scrapeResult, null, 2));
91 |
92 | if (!scrapeResult.success) {
93 | // Capture error in Sentry
94 | Sentry.captureMessage(`Failed to scrape website: ${scrapeResult.error}`, 'error');
95 | return {
96 | content: [{
97 | type: "text",
98 | text: `Failed to scrape website: ${scrapeResult.error}`
99 | }],
100 | isError: true
101 | };
102 | }
103 |
104 |         // Return only the formats that were requested and returned
105 |         return {
106 |           content: [{
107 |             type: "text",
108 |             text: formats.map((f) => scrapeResult[f]).filter(Boolean).join('\n\n') || 'No content available'
109 | }]
110 | };
111 |
112 | } catch (error) {
113 | console.error('DEBUG: Caught error:', error);
114 | // Capture exception in Sentry
115 | Sentry.captureException(error);
116 | return {
117 | content: [{
118 | type: "text",
119 | text: `Error scraping website: ${error.message}`
120 | }],
121 | isError: true
122 | };
123 | }
124 | }
125 | );
126 | }
127 | );
128 |
129 | // Tool 2: Structured data extraction
130 | server.tool(
131 | "extract-data",
132 | {
133 | urls: z.array(z.string().url()),
134 | prompt: z.string(),
135 | schema: z.record(z.union([
136 | z.literal('string'),
137 | z.literal('boolean'),
138 | z.literal('number'),
139 | z.array(z.any()),
140 | z.record(z.any())
141 | ]))
142 | },
143 | async ({ urls, prompt, schema }) => {
144 | return await Sentry.startSpan(
145 | { name: "extract-data" },
146 | async () => {
147 | try {
148 | // Add Sentry breadcrumb for debugging
149 | Sentry.addBreadcrumb({
150 | category: 'extract-data',
151 | message: `Extracting data from URLs`,
152 | data: { urlCount: urls.length, prompt },
153 | level: 'info'
154 | });
155 |
156 | // Create the Zod schema from the provided definition
157 | const zodSchema = createDynamicSchema(schema);
158 |
159 | // Extract data using Firecrawl
160 | const extractResponse = await firecrawl.extract(urls, {
161 | prompt: prompt,
162 | schema: zodSchema
163 | });
164 |
165 | if (!extractResponse.success) {
166 | // Capture error in Sentry
167 | Sentry.captureMessage(`Failed to extract data: ${extractResponse.error}`, 'error');
168 | return {
169 | content: [{
170 | type: "text",
171 | text: `Failed to extract data: ${extractResponse.error}`
172 | }],
173 | isError: true
174 | };
175 | }
176 |
177 | return {
178 | content: [{
179 | type: "text",
180 | text: JSON.stringify(extractResponse.data, null, 2)
181 | }]
182 | };
183 | } catch (error) {
184 | // Capture exception in Sentry
185 | Sentry.captureException(error);
186 | return {
187 | content: [{
188 | type: "text",
189 | text: `Error extracting data: ${error.message}`
190 | }],
191 | isError: true
192 | };
193 | }
194 | }
195 | );
196 | }
197 | );
198 |
199 | // Start receiving messages on stdin and sending messages on stdout
200 | const transport = new StdioServerTransport();
201 | await server.connect(transport);
```
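The recursive schema convention that `createDynamicSchema` translates into Zod can also be illustrated as a plain validator. `matchesSchema` below is a hypothetical standalone sketch (not part of the server) that applies the same string/array/object rules directly to values:

```javascript
// Standalone illustration of the schema-definition convention used by
// createDynamicSchema: "string" | "boolean" | "number", ["type"], or a
// nested object of field -> type. Hypothetical helper, not in the server.
function matchesSchema(schemaDefinition, value) {
  if (typeof schemaDefinition === "string") {
    // Primitive leaf: the definition names the expected typeof result.
    if (!["string", "boolean", "number"].includes(schemaDefinition)) {
      throw new Error(`Unsupported schema type: ${schemaDefinition}`);
    }
    return typeof value === schemaDefinition;
  }
  if (Array.isArray(schemaDefinition)) {
    // ["type"]: every element must match the single item definition.
    return Array.isArray(value) &&
      value.every((item) => matchesSchema(schemaDefinition[0], item));
  }
  if (schemaDefinition !== null && typeof schemaDefinition === "object") {
    // Nested object: each declared field must match its own definition.
    return value !== null && typeof value === "object" &&
      Object.entries(schemaDefinition).every(
        ([key, type]) => matchesSchema(type, value[key])
      );
  }
  throw new Error(`Unsupported schema definition: ${typeof schemaDefinition}`);
}

// Example mirroring the README's extract-data schema:
const schema = { company_mission: "string", supports_sso: "boolean" };
console.log(matchesSchema(schema, { company_mission: "Crawl the web", supports_sso: true })); // true
```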