# Directory Structure ``` ├── .eslintrc.json ├── .github │ └── workflows │ ├── ci.yml │ └── lint.yml ├── .gitignore ├── .prettierrc ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── docker-compose.yml ├── Dockerfile ├── docs │ ├── api.md │ └── configuration.md ├── examples │ ├── crawl-and-map.ts │ ├── scrape.ts │ └── search.ts ├── jest.config.js ├── LICENSE ├── package.json ├── pnpm-lock.yaml ├── README.md ├── smithery.yaml ├── src │ ├── error-handling.ts │ ├── index.ts │ ├── tools │ │ ├── crawl.ts │ │ ├── extract.ts │ │ ├── map.ts │ │ ├── scrape.ts │ │ └── search.ts │ └── types.ts ├── tests │ ├── index.test.ts │ ├── jest-setup.ts │ ├── setup.ts │ ├── tools │ │ └── scrape.test.ts │ └── types.d.ts ├── tsconfig.json └── tsconfig.test.json ``` # Files -------------------------------------------------------------------------------- /.prettierrc: -------------------------------------------------------------------------------- ``` { "semi": true, "trailingComma": "es5", "singleQuote": false, "printWidth": 80, "tabWidth": 2, "useTabs": false, "endOfLine": "lf", "arrowParens": "always", "bracketSpacing": true } ``` -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- ``` # Dependencies node_modules/ package-lock.json # Build outputs build/ dist/ *.tsbuildinfo # IDE and editor files .vscode/ .idea/ *.swp *.swo .DS_Store # Logs *.log npm-debug.log* yarn-debug.log* yarn-error.log* # Environment variables .env .env.local .env.*.local # Test coverage coverage/ # Temporary files tmp/ temp/ ``` -------------------------------------------------------------------------------- /.eslintrc.json: -------------------------------------------------------------------------------- ```json { "root": true, "parser": "@typescript-eslint/parser", "plugins": ["@typescript-eslint"], "extends": [ "eslint:recommended", "plugin:@typescript-eslint/recommended", "prettier" ], "env": { "node": true, "es2022": true }, "rules": { "@typescript-eslint/explicit-function-return-type": "off", "@typescript-eslint/no-explicit-any": "warn", "@typescript-eslint/no-unused-vars": ["error", { "argsIgnorePattern": "^_" }], "no-console": ["warn", { "allow": ["error", "warn"] }] }, "overrides": [ { "files": ["tests/**/*.ts"], "env": { "jest": true } } ] } ``` -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- ```markdown # Firecrawl MCP Server A Model Context Protocol (MCP) server for web scraping, content searching, site crawling, and data extraction using the Firecrawl API. 
## Features

- **Web Scraping**: Extract content from any webpage with customizable options
  - Mobile device emulation
  - Ad and popup blocking
  - Content filtering
  - Structured data extraction
  - Multiple output formats
- **Content Search**: Intelligent search capabilities
  - Multi-language support
  - Location-based results
  - Customizable result limits
  - Structured output formats
- **Site Crawling**: Advanced web crawling functionality
  - Depth control
  - Path filtering
  - Rate limiting
  - Progress tracking
  - Sitemap integration
- **Site Mapping**: Generate site structure maps
  - Subdomain support
  - Search filtering
  - Link analysis
  - Visual hierarchy
- **Data Extraction**: Extract structured data from multiple URLs
  - Schema validation
  - Batch processing
  - Web search enrichment
  - Custom extraction prompts

## Installation

```bash
# Global installation
npm install -g @modelcontextprotocol/mcp-server-firecrawl

# Local project installation
npm install @modelcontextprotocol/mcp-server-firecrawl
```

## Quick Start

1. Get your Firecrawl API key from the [developer portal](https://firecrawl.dev/dashboard)

2. Set your API key:

   **Unix/Linux/macOS (bash/zsh):**

   ```bash
   export FIRECRAWL_API_KEY=your-api-key
   ```

   **Windows (Command Prompt):**

   ```cmd
   set FIRECRAWL_API_KEY=your-api-key
   ```

   **Windows (PowerShell):**

   ```powershell
   $env:FIRECRAWL_API_KEY = "your-api-key"
   ```

   **Alternative: Using .env file (recommended for development):**

   ```bash
   # Install dotenv
   npm install dotenv

   # Create .env file
   echo "FIRECRAWL_API_KEY=your-api-key" > .env
   ```

   Then in your code:

   ```javascript
   import dotenv from 'dotenv';
   dotenv.config();
   ```

3. Run the server:

   ```bash
   mcp-server-firecrawl
   ```

## Integration

### Claude Desktop App

Add to your MCP settings (the `mcpServers` wrapper is required):

```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "mcp-server-firecrawl",
      "env": {
        "FIRECRAWL_API_KEY": "your-api-key"
      }
    }
  }
}
```

### Claude VSCode Extension

Add to your MCP configuration:

```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "mcp-server-firecrawl",
      "env": {
        "FIRECRAWL_API_KEY": "your-api-key"
      }
    }
  }
}
```

## Usage Examples

### Web Scraping

```typescript
// Basic scraping
{
  name: "scrape_url",
  arguments: {
    url: "https://example.com",
    formats: ["markdown"],
    onlyMainContent: true
  }
}

// Advanced extraction
{
  name: "scrape_url",
  arguments: {
    url: "https://example.com/blog",
    jsonOptions: {
      prompt: "Extract article content",
      schema: {
        title: "string",
        content: "string"
      }
    },
    mobile: true,
    blockAds: true
  }
}
```

### Site Crawling

```typescript
// Basic crawling
{
  name: "crawl",
  arguments: {
    url: "https://example.com",
    maxDepth: 2,
    limit: 100
  }
}

// Advanced crawling
{
  name: "crawl",
  arguments: {
    url: "https://example.com",
    maxDepth: 3,
    includePaths: ["/blog", "/products"],
    excludePaths: ["/admin"],
    ignoreQueryParameters: true
  }
}
```

### Site Mapping

```typescript
// Generate site map
{
  name: "map",
  arguments: {
    url: "https://example.com",
    includeSubdomains: true,
    limit: 1000
  }
}
```

### Data Extraction

```typescript
// Extract structured data
{
  name: "extract",
  arguments: {
    urls: ["https://example.com/product1", "https://example.com/product2"],
    prompt: "Extract product details",
    schema: {
      name: "string",
      price: "number",
      description: "string"
    }
  }
}
```

## Configuration

See [configuration guide](https://github.com/Msparihar/mcp-server-firecrawl/blob/main/docs/configuration.md) for detailed setup options.

## API Documentation

See [API documentation](https://github.com/Msparihar/mcp-server-firecrawl/blob/main/docs/api.md) for detailed endpoint specifications.
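## Programmatic Usage

Any MCP client can invoke these tools directly. The sketch below mirrors the pattern in [examples/search.ts](https://github.com/Msparihar/mcp-server-firecrawl/blob/main/examples/search.ts); it assumes the server has been built so that `build/index.js` exists.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function main() {
  const client = new Client({ name: "firecrawl-example", version: "1.0.0" });

  // Spawn the server as a child process and talk to it over stdio
  const transport = new StdioClientTransport({
    command: "node",
    args: ["build/index.js"],
    env: { FIRECRAWL_API_KEY: "your-api-key" },
  });
  await client.connect(transport);

  // Call a tool with the same argument shapes shown above
  const result = await client.callTool({
    name: "scrape_url",
    arguments: { url: "https://example.com", formats: ["markdown"] },
  });
  console.log(result);

  await client.close();
}

main().catch(console.error);
```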
## Development ```bash # Install dependencies npm install # Build npm run build # Run tests npm test # Start in development mode npm run dev ``` ## Examples Check the [examples](https://github.com/Msparihar/mcp-server-firecrawl/tree/main/examples) directory for more usage examples: - Basic scraping: [scrape.ts](https://github.com/Msparihar/mcp-server-firecrawl/blob/main/examples/scrape.ts) - Crawling and mapping: [crawl-and-map.ts](https://github.com/Msparihar/mcp-server-firecrawl/blob/main/examples/crawl-and-map.ts) ## Error Handling The server implements robust error handling: - Rate limiting with exponential backoff - Automatic retries - Detailed error messages - Debug logging ## Security - API key protection - Request validation - Domain allowlisting - Rate limiting - Safe error messages ## Contributing See [CONTRIBUTING.md](https://github.com/Msparihar/mcp-server-firecrawl/blob/main/CONTRIBUTING.md) for contribution guidelines. ## License MIT License - see [LICENSE](https://github.com/Msparihar/mcp-server-firecrawl/blob/main/LICENSE) for details. ``` -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- ```markdown # Contributor Covenant Code of Conduct ## Our Pledge We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation. We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community. ## Our Standards Examples of behavior that contributes to a positive environment for our community include: * Demonstrating empathy and kindness toward other people * Being respectful of differing opinions, viewpoints, and experiences * Giving and gracefully accepting constructive feedback * Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience * Focusing on what is best not just for us as individuals, but for the overall community Examples of unacceptable behavior include: * The use of sexualized language or imagery, and sexual attention or advances of any kind * Trolling, insulting or derogatory comments, and personal or political attacks * Public or private harassment * Publishing others' private information, such as a physical or email address, without their explicit permission * Other conduct which could reasonably be considered inappropriate in a professional setting ## Enforcement Responsibilities Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful. ## Scope This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. ## Enforcement Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement. All complaints will be reviewed and investigated promptly and fairly. All community leaders are obligated to respect the privacy and security of the reporter of any incident. 
## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.

[homepage]: https://www.contributor-covenant.org
```

--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------

```markdown
# Contributing to Firecrawl MCP Server

We love your input! We want to make contributing to the Firecrawl MCP server as easy and transparent as possible, whether it's:

- Reporting a bug
- Discussing the current state of the code
- Submitting a fix
- Proposing new features
- Becoming a maintainer

## Development Process

We use GitHub to host code, track issues and feature requests, and accept pull requests.

1. Fork the repo and create your branch from `main`
2. If you've added code that should be tested, add tests
3. If you've changed APIs, update the documentation
4. Ensure the test suite passes
5. Make sure your code lints
6. Issue that pull request!

## Local Development Setup

1. Install dependencies:

```bash
npm install
```

2. Set up environment variables:

```bash
export FIRECRAWL_API_KEY=your-api-key
```

3. Start development server:

```bash
npm run dev
```

### Using Docker for Development

1. Start the development container:

```bash
docker-compose up mcp-server-dev
```

2. Run tests in container:

```bash
docker-compose up mcp-server-test
```

## Testing

We use Jest for testing. Run the test suite with:

```bash
npm test
```

Make sure to:

- Write tests for new features
- Maintain test coverage above 80%
- Use meaningful test descriptions

## Code Style

We use ESLint and Prettier to maintain code quality. Before committing:

1. Run linter:

```bash
npm run lint
```

2. Format code:

```bash
npm run format
```

## Documentation

- Keep README.md updated
- Document all new tools and configuration options
- Update API documentation for changes
- Include examples for new features

## Pull Request Process

1. Update the README.md with details of changes to the interface
2. Update the API documentation if endpoints or tools change
3. Update the version numbers following [SemVer](http://semver.org/)
4. The PR will be merged once you have the sign-off of two other developers

## Any Contributions You Make Will Be Under the MIT Software License

In short, when you submit code changes, your submissions are understood to be under the same [MIT License](http://choosealicense.com/licenses/mit/) that covers the project. Feel free to contact the maintainers if that's a concern.

## Report Bugs Using GitHub's [Issue Tracker](https://github.com/Msparihar/mcp-server-firecrawl/issues)

Report a bug by [opening a new issue](https://github.com/Msparihar/mcp-server-firecrawl/issues/new); it's that easy!

## Write Bug Reports With Detail, Background, and Sample Code

**Great Bug Reports** tend to have:

- A quick summary and/or background
- Steps to reproduce
  - Be specific!
  - Give sample code if you can
- What you expected would happen
- What actually happens
- Notes (possibly including why you think this might be happening, or stuff you tried that didn't work)

## License

By contributing, you agree that your contributions will be licensed under the project's MIT License.

## References

This document was adapted from the open-source contribution guidelines for [Facebook's Draft](https://github.com/facebook/draft-js/blob/a9316a723f9e918afde44dea68b5f9f39b7d9b00/CONTRIBUTING.md).
```

--------------------------------------------------------------------------------
/tsconfig.test.json:
--------------------------------------------------------------------------------

```json
{
  "extends": "./tsconfig.json",
  "compilerOptions": {
    "rootDir": ".",
    "types": ["node", "jest", "@jest/globals"],
    "typeRoots": [
      "./node_modules/@types",
      "./src/types",
      "./tests/types"
    ]
  },
  "include": [
    "src/**/*",
    "tests/**/*"
  ],
  "exclude": [
    "node_modules",
    "build"
  ]
}
```

--------------------------------------------------------------------------------
/.github/workflows/lint.yml:
--------------------------------------------------------------------------------

```yaml
name: Lint

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  lint:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      # pnpm must be installed before setup-node can use its pnpm cache
      - name: Install pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 8

      - name: Use Node.js
        uses: actions/setup-node@v3
        with:
          node-version: 18.x
          cache: 'pnpm'

      - name: Install dependencies
        run: pnpm install

      - name: Run linting
        run: pnpm run lint
```

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------

```dockerfile
# Build stage
FROM node:18-alpine AS builder

WORKDIR /app

# Enable pnpm via corepack: the repo ships pnpm-lock.yaml and gitignores
# package-lock.json, so `npm ci` would fail here
RUN corepack enable

# Copy package files
COPY package.json pnpm-lock.yaml ./
COPY tsconfig.json ./

# Install dependencies
RUN pnpm install --frozen-lockfile

# Copy source code
COPY src/ ./src/

# Build TypeScript code
RUN pnpm run build

# Production stage
FROM node:18-alpine

WORKDIR /app

RUN corepack enable

# Copy package files and built code
COPY package.json pnpm-lock.yaml ./
COPY --from=builder /app/build ./build

# Install production dependencies only
RUN pnpm install --frozen-lockfile --prod

# Set environment variables
ENV NODE_ENV=production

# Execute MCP server
CMD ["node", "build/index.js"]
```

--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------

```yaml
# Smithery configuration file: https://smithery.ai/docs/config#smitheryyaml

startCommand:
  type: stdio
  configSchema:
    # JSON Schema defining the configuration options for the MCP.
    type: object
    required:
      - firecrawlApiKey
    properties:
      firecrawlApiKey:
        type: string
        description: The API key for the Firecrawl API.
  commandFunction:
    # A function that produces the CLI command to start the MCP on stdio.
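    # For example, a config of { "firecrawlApiKey": "..." } produces the command
    # `node build/index.js` with FIRECRAWL_API_KEY set in its environment.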
|- config => ({command: 'node', args: ['build/index.js'], env: {FIRECRAWL_API_KEY: config.firecrawlApiKey}}) ``` -------------------------------------------------------------------------------- /tests/jest-setup.ts: -------------------------------------------------------------------------------- ```typescript import { jest, expect } from "@jest/globals"; import type { ScrapeUrlArgs } from "../src/types"; declare global { namespace jest { interface Matchers<R> { toHaveBeenCalledWithUrl(url: string): R; } } } expect.extend({ toHaveBeenCalledWithUrl(received: jest.Mock, url: string) { const calls = received.mock.calls; const urlCalls = calls.some((call) => { const arg = call[0] as ScrapeUrlArgs; return arg && arg.url === url; }); return { pass: urlCalls, message: () => `expected ${received.getMockName()} to have been called with URL ${url}`, }; }, }); // Configure Jest globals global.jest = jest; ``` -------------------------------------------------------------------------------- /tsconfig.json: -------------------------------------------------------------------------------- ```json { "compilerOptions": { "target": "ES2020", "module": "ES2020", "moduleResolution": "node", "esModuleInterop": true, "allowJs": true, "checkJs": true, "declaration": true, "declarationMap": true, "sourceMap": true, "strict": true, "skipLibCheck": true, "forceConsistentCasingInFileNames": true, "outDir": "build", "rootDir": "src", "baseUrl": ".", "paths": { "*": ["node_modules/*", "src/types/*"] }, "typeRoots": [ "./node_modules/@types", "./src/types" ], "types": ["node", "jest"], "resolveJsonModule": true }, "include": [ "src/**/*" ], "exclude": [ "node_modules", "build", "tests", "**/*.test.ts" ] } ``` -------------------------------------------------------------------------------- /jest.config.js: -------------------------------------------------------------------------------- ```javascript /** @type {import('ts-jest').JestConfigWithTsJest} */ export default { preset: "ts-jest", testEnvironment: "node", extensionsToTreatAsEsm: [".ts"], moduleNameMapper: { "^(\\.{1,2}/.*)\\.js$": "$1", }, transform: { "^.+\\.tsx?$": [ "ts-jest", { useESM: true, }, ], }, setupFilesAfterEnv: ["<rootDir>/tests/jest-setup.ts"], testMatch: ["**/tests/**/*.test.ts"], collectCoverage: true, coverageDirectory: "coverage", coveragePathIgnorePatterns: ["/node_modules/", "/tests/", "/build/"], coverageThreshold: { global: { branches: 80, functions: 80, lines: 80, statements: 80, }, }, globals: { "ts-jest": { useESM: true, tsconfig: { // Override tsconfig for tests moduleResolution: "node", esModuleInterop: true, allowJs: true, checkJs: true, strict: true, types: ["node", "jest", "@jest/globals"], typeRoots: ["./node_modules/@types", "./src/types", "./tests/types"], }, }, }, }; ``` -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- ```yaml version: '3.8' services: # Production service mcp-server: build: context: . dockerfile: Dockerfile environment: - NODE_ENV=production - FIRECRAWL_API_KEY=${FIRECRAWL_API_KEY} - FIRECRAWL_TIMEOUT=30000 - FIRECRAWL_MAX_RETRIES=3 restart: unless-stopped stdin_open: true # Required for stdio transport tty: true # Development service with hot-reload mcp-server-dev: build: context: . 
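      # Building only the "builder" stage keeps dev dependencies and the
      # TypeScript toolchain available for the dev/test commands below.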
      dockerfile: Dockerfile
      target: builder
    command: npm run dev
    environment:
      - NODE_ENV=development
      - FIRECRAWL_API_KEY=${FIRECRAWL_API_KEY}
      - FIRECRAWL_TIMEOUT=30000
      - FIRECRAWL_MAX_RETRIES=3
      - DEBUG=true
    volumes:
      - ./src:/app/src
      - ./tests:/app/tests
    stdin_open: true
    tty: true

  # Test service
  mcp-server-test:
    build:
      context: .
      dockerfile: Dockerfile
      target: builder
    command: npm test
    environment:
      - NODE_ENV=test
      - FIRECRAWL_API_KEY=test-api-key
    volumes:
      - ./src:/app/src
      - ./tests:/app/tests
      - ./coverage:/app/coverage
```

--------------------------------------------------------------------------------
/.github/workflows/ci.yml:
--------------------------------------------------------------------------------

```yaml
name: CI

on:
  release:
    types: [created]

jobs:
  test:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        node-version: [18.x, 20.x]

    steps:
      - uses: actions/checkout@v3

      # pnpm must be installed before setup-node can use its pnpm cache
      - name: Install pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 8

      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'pnpm'

      - name: Install dependencies
        run: pnpm install

      - name: Run tests
        run: pnpm test
        env:
          FIRECRAWL_API_KEY: test-api-key

      - name: Upload coverage reports
        uses: codecov/codecov-action@v3
        with:
          token: ${{ secrets.CODECOV_TOKEN }}

  build:
    needs: test
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Install pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 8

      - name: Use Node.js 18.x
        uses: actions/setup-node@v3
        with:
          node-version: 18.x
          cache: 'pnpm'

      - name: Install dependencies
        run: pnpm install

      - name: Build
        run: pnpm run build
```

--------------------------------------------------------------------------------
/examples/search.ts:
--------------------------------------------------------------------------------

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function example() {
  // Create a new MCP client
  const client = new Client({
    name: "firecrawl-example",
    version: "1.0.0",
  });

  // Connect to the Firecrawl MCP server; command and args are passed
  // separately so the executable resolves correctly
  const transport = new StdioClientTransport({
    command: "node",
    args: ["build/index.js"],
    env: { FIRECRAWL_API_KEY: "your-api-key-here" },
  });
  await client.connect(transport);

  try {
    // Example 1: Basic search with default options
    const result1 = await client.callTool({
      name: "search_content",
      arguments: {
        query: "latest developments in artificial intelligence",
      },
    });
    console.log("Basic search result:", result1);

    // Example 2: Advanced search with custom options
    const result2 = await client.callTool({
      name: "search_content",
      arguments: {
        query: "machine learning tutorials",
        scrapeOptions: {
          formats: ["markdown"],
        },
        limit: 5,
      },
    });
    console.log("Advanced search result:", result2);
  } catch (error) {
    console.error("Error:", error);
  } finally {
    await client.close();
  }
}

example().catch(console.error);
```

--------------------------------------------------------------------------------
/tests/setup.ts:
--------------------------------------------------------------------------------

```typescript
/**
 * Test environment setup and configuration
 */
import { jest, beforeAll, afterAll } from "@jest/globals";

// Configure test environment variables
process.env.FIRECRAWL_API_KEY = "test-api-key";
process.env.FIRECRAWL_API_BASE_URL = "https://api.test.firecrawl.dev/v1";
process.env.FIRECRAWL_TIMEOUT = "1000";
process.env.FIRECRAWL_MAX_RETRIES = "0";
process.env.DEBUG = "false";

// Clean up function for after tests
export
function cleanupEnvironment() { delete process.env.FIRECRAWL_API_KEY; delete process.env.FIRECRAWL_API_BASE_URL; delete process.env.FIRECRAWL_TIMEOUT; delete process.env.FIRECRAWL_MAX_RETRIES; delete process.env.DEBUG; } // Store original console methods const originalConsoleError = console.error; const originalConsoleWarn = console.warn; const originalConsoleLog = console.log; // Mock console methods to reduce noise in tests const mockConsole = { error: jest.fn(), warn: jest.fn(), log: jest.fn(), }; beforeAll(() => { // Replace console methods with mocks console.error = mockConsole.error; console.warn = mockConsole.warn; console.log = mockConsole.log; }); afterAll(() => { // Restore original console methods console.error = originalConsoleError; console.warn = originalConsoleWarn; console.log = originalConsoleLog; cleanupEnvironment(); }); // Export mocks for test usage export const consoleMocks = mockConsole; ``` -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- ```json { "name": "@modelcontextprotocol/mcp-server-firecrawl", "version": "1.0.0", "description": "MCP server for web scraping, content searching, and site mapping using the Firecrawl API", "keywords": [ "mcp", "model-context-protocol", "web-scraping", "search", "crawling", "site-mapping", "data-extraction", "firecrawl", "ai", "server" ], "homepage": "https://github.com/Msparihar/mcp-server-firecrawl#readme", "bugs": { "url": "https://github.com/Msparihar/mcp-server-firecrawl/issues" }, "repository": { "type": "git", "url": "git+https://github.com/Msparihar/mcp-server-firecrawl.git" }, "license": "MIT", "author": "Msparihar", "type": "module", "main": "build/index.js", "types": "./build/index.d.ts", "bin": { "mcp-server-firecrawl": "build/index.js" }, "directories": { "doc": "docs", "example": "examples", "test": "tests" }, "files": [ "build", "README.md", "LICENSE", "docs", "CONTRIBUTING.md", "CODE_OF_CONDUCT.md" ], "scripts": { "build": "tsc", "start": "node build/index.js", "dev": "tsc --watch", "test": "jest", "lint": "eslint . 
--ext .ts", "format": "prettier --write \"src/**/*.ts\"", "prepare": "npm run build", "prepublishOnly": "npm test && npm run lint", "preversion": "npm run lint", "version": "npm run format && git add -A src", "postversion": "git push && git push --tags" }, "dependencies": { "@modelcontextprotocol/sdk": "^1.4.1", "@types/node": "^22.13.1", "axios": "^1.7.9" }, "devDependencies": { "@jest/globals": "^29.7.0", "@types/jest": "^29.5.11", "@typescript-eslint/eslint-plugin": "^6.21.0", "@typescript-eslint/parser": "^6.21.0", "cross-env": "^7.0.3", "eslint": "^8.56.0", "eslint-config-prettier": "^9.1.0", "jest": "^29.7.0", "prettier": "^3.2.5", "ts-jest": "^29.1.2", "typescript": "^5.7.3" }, "engines": { "node": ">=18" } } ``` -------------------------------------------------------------------------------- /tests/types.d.ts: -------------------------------------------------------------------------------- ```typescript import type { AxiosInstance, AxiosResponse, AxiosResponseHeaders } from "axios"; import type { ScrapeUrlArgs, SearchContentArgs, CrawlArgs, MapArgs, ExtractArgs, ToolResponse, } from "../src/types"; declare global { // Extend Jest matchers namespace jest { interface Matchers<R> { toHaveBeenCalledWithUrl(url: string): R; toHaveBeenCalledWithValidArgs( type: "scrape" | "search" | "crawl" | "map" | "extract" ): R; } } } // Test-specific types export interface TestResponse<T = any> extends Omit<AxiosResponse<T>, "headers"> { data: T; status: number; headers: AxiosResponseHeaders; } // Mock function type type MockFunction<T extends (...args: any) => any> = { (...args: Parameters<T>): ReturnType<T>; mockClear: () => void; mockReset: () => void; mockImplementation: (fn: T) => MockFunction<T>; mockImplementationOnce: (fn: T) => MockFunction<T>; mockResolvedValue: <U>(value: U) => MockFunction<T>; mockResolvedValueOnce: <U>(value: U) => MockFunction<T>; mockRejectedValue: (error: unknown) => MockFunction<T>; mockRejectedValueOnce: (error: unknown) => MockFunction<T>; }; export interface MockAxiosInstance extends Omit<AxiosInstance, "get" | "post"> { get: MockFunction< <T = any>(url: string, config?: any) => Promise<TestResponse<T>> >; post: MockFunction< <T = any>(url: string, data?: any, config?: any) => Promise<TestResponse<T>> >; } export interface TestToolResponse extends ToolResponse { error?: string; metadata?: Record<string, unknown>; } export interface TestArgs { scrape: ScrapeUrlArgs; search: SearchContentArgs; crawl: CrawlArgs; map: MapArgs; extract: ExtractArgs; } // Helper type for resolved promise types type ResolvedType<T> = T extends Promise<infer R> ? 
R : T; // Console mock types export interface ConsoleMocks { error: MockFunction<typeof console.error>; warn: MockFunction<typeof console.warn>; log: MockFunction<typeof console.log>; } // Environment variable types export interface TestEnvironment { FIRECRAWL_API_KEY: string; FIRECRAWL_API_BASE_URL: string; FIRECRAWL_TIMEOUT: string; FIRECRAWL_MAX_RETRIES: string; DEBUG: string; } // Extend the Jest namespace declare global { namespace jest { type Mock<T extends (...args: any[]) => any> = MockFunction<T>; } } ``` -------------------------------------------------------------------------------- /examples/scrape.ts: -------------------------------------------------------------------------------- ```typescript /** * Example demonstrating how to use the scrape_url tool * * This example shows different ways to configure and use the scraping functionality, * including basic scraping, structured data extraction, and mobile-optimized scraping. */ import { ScrapeTool } from "../src/tools/scrape.js"; import { DEFAULT_ERROR_CONFIG } from "../src/error-handling.js"; import axios from "axios"; async function main() { // Create a test axios instance const axiosInstance = axios.create({ baseURL: process.env.FIRECRAWL_API_BASE_URL || "https://api.firecrawl.dev/v1", headers: { Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`, "Content-Type": "application/json", }, }); // Initialize the scrape tool const scrapeTool = new ScrapeTool({ axiosInstance, errorConfig: DEFAULT_ERROR_CONFIG, }); try { // Basic scraping example console.log("Basic scraping example:"); const basicResult = await scrapeTool.execute({ url: "https://example.com", formats: ["markdown"], onlyMainContent: true, blockAds: true }); console.log(JSON.stringify(basicResult, null, 2)); // Advanced scraping with structured data extraction console.log("\nAdvanced scraping example:"); const advancedResult = await scrapeTool.execute({ url: "https://example.com/blog", jsonOptions: { prompt: "Extract the article title, author, date, and main content", schema: { type: "object", properties: { title: { type: "string" }, author: { type: "string" }, date: { type: "string" }, content: { type: "string" } }, required: ["title", "content"] } }, formats: ["markdown", "json"], mobile: true, location: { country: "US", languages: ["en-US"] }, waitFor: 2000, blockAds: true }); console.log(JSON.stringify(advancedResult, null, 2)); // Mobile-optimized scraping console.log("\nMobile scraping example:"); const mobileResult = await scrapeTool.execute({ url: "https://example.com/store", mobile: true, formats: ["markdown"], includeTags: ["article", "main", "product"], excludeTags: ["nav", "footer", "ads"], removeBase64Images: true }); console.log(JSON.stringify(mobileResult, null, 2)); } catch (error) { console.error("Error running examples:", error); process.exit(1); } } // Check for API key if (!process.env.FIRECRAWL_API_KEY) { console.error("Error: FIRECRAWL_API_KEY environment variable is required"); console.error("Please set it before running the example:"); console.error("export FIRECRAWL_API_KEY=your-api-key"); process.exit(1); } main().catch((error) => { console.error("Unhandled error:", error); process.exit(1); }); ``` -------------------------------------------------------------------------------- /tests/index.test.ts: -------------------------------------------------------------------------------- ```typescript import { Server } from "@modelcontextprotocol/sdk/server/index.js"; import { ListToolsRequestSchema } from "@modelcontextprotocol/sdk/types.js"; 
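/**
 * Structural tests for the server's tool registration.
 *
 * These tests construct a bare Server with the same name/version as the
 * entry point and inspect its registered request handlers to check that
 * the scrape_url and search_content schemas expose the expected required
 * fields and properties.
 */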
describe("Firecrawl MCP Server Structure", () => { let server: Server; beforeEach(() => { server = new Server( { name: "firecrawl", version: "1.0.0", }, { capabilities: { tools: {}, }, } ); }); describe("Tool Schema Validation", () => { it("should define required tools", async () => { const handler = server["requestHandlers"].get(ListToolsRequestSchema.name); expect(handler).toBeDefined(); if (!handler) throw new Error("Handler not found"); const result = await handler({ schema: ListToolsRequestSchema.name, params: {}, }); const tools = result.tools; expect(tools).toHaveLength(2); const scrapeUrlTool = tools.find((t) => t.name === "scrape_url"); expect(scrapeUrlTool).toBeDefined(); expect(scrapeUrlTool?.inputSchema.required).toContain("url"); const searchContentTool = tools.find((t) => t.name === "search_content"); expect(searchContentTool).toBeDefined(); expect(searchContentTool?.inputSchema.required).toContain("query"); }); it("should have valid schema for scrape_url tool", async () => { const handler = server["requestHandlers"].get(ListToolsRequestSchema.name); if (!handler) throw new Error("Handler not found"); const result = await handler({ schema: ListToolsRequestSchema.name, params: {}, }); const tool = result.tools.find((t) => t.name === "scrape_url"); expect(tool).toBeDefined(); expect(tool?.inputSchema.properties).toHaveProperty("url"); expect(tool?.inputSchema.properties).toHaveProperty("jsonOptions"); expect(tool?.inputSchema.properties).toHaveProperty("formats"); expect(tool?.inputSchema.properties).toHaveProperty("blockAds"); }); it("should have valid schema for search_content tool", async () => { const handler = server["requestHandlers"].get(ListToolsRequestSchema.name); if (!handler) throw new Error("Handler not found"); const result = await handler({ schema: ListToolsRequestSchema.name, params: {}, }); const tool = result.tools.find((t) => t.name === "search_content"); expect(tool).toBeDefined(); expect(tool?.inputSchema.properties).toHaveProperty("query"); expect(tool?.inputSchema.properties).toHaveProperty("scrapeOptions"); expect(tool?.inputSchema.properties).toHaveProperty("limit"); }); }); describe("Environment Validation", () => { it("should check for required environment variables", () => { expect(() => { process.env.FIRECRAWL_API_KEY = ""; // This will throw due to missing API key require("../src/index.js"); }).toThrow("FIRECRAWL_API_KEY environment variable is required"); }); }); }); ``` -------------------------------------------------------------------------------- /src/tools/search.ts: -------------------------------------------------------------------------------- ```typescript import { AxiosInstance } from "axios"; import { ErrorHandlingConfig, retryRequest } from "../error-handling.js"; import { SearchContentArgs } from "../types.js"; /** * Options for configuring the search tool */ export interface SearchToolOptions { /** Axios instance for making requests */ axiosInstance: AxiosInstance; /** Error handling configuration */ errorConfig: ErrorHandlingConfig; } /** * Handles content search operations */ export class SearchTool { private axiosInstance: AxiosInstance; private errorConfig: ErrorHandlingConfig; constructor(options: SearchToolOptions) { this.axiosInstance = options.axiosInstance; this.errorConfig = options.errorConfig; } /** * Get the tool definition for registration */ getDefinition() { return { name: "search_content", description: "Search content using Firecrawl API", inputSchema: { type: "object", properties: { query: { type: "string", 
description: "Search query", }, scrapeOptions: { type: "object", properties: { formats: { type: "array", items: { type: "string", enum: ["markdown"], }, description: "Output formats", }, }, }, limit: { type: "number", description: "Maximum number of results", minimum: 1, maximum: 100, }, lang: { type: "string", description: "Language code", default: "en", }, country: { type: "string", description: "Country code", default: "us", }, location: { type: "string", description: "Location parameter", }, timeout: { type: "number", description: "Request timeout in milliseconds", default: 60000, }, }, required: ["query"], }, }; } /** * Execute the search operation */ async execute(args: SearchContentArgs) { const response = await retryRequest( () => this.axiosInstance.post("/search", args), this.errorConfig ); return { content: [ { type: "text", text: JSON.stringify(response.data, null, 2), }, ], }; } /** * Validate the search operation arguments */ validate(args: unknown): args is SearchContentArgs { if (typeof args !== "object" || args === null) { return false; } const { query, scrapeOptions, limit } = args as any; if (typeof query !== "string") { return false; } if (scrapeOptions !== undefined) { if ( typeof scrapeOptions !== "object" || scrapeOptions === null || (scrapeOptions.formats !== undefined && (!Array.isArray(scrapeOptions.formats) || !scrapeOptions.formats.every((f: any) => typeof f === "string"))) ) { return false; } } if (limit !== undefined && typeof limit !== "number") { return false; } return true; } /** * Process search results with optional formatting * @private */ private processResults(results: any[], format = "markdown") { if (format === "markdown") { return results.map((result) => ({ title: result.title, url: result.url, snippet: result.snippet, content: result.content, })); } return results; } } ``` -------------------------------------------------------------------------------- /examples/crawl-and-map.ts: -------------------------------------------------------------------------------- ```typescript /** * Example demonstrating the use of crawl and map tools * * This example shows how to: * 1. Crawl a website with various configurations * 2. Map site structure * 3. Extract data from multiple pages * * To run this example: * 1. Set your API key: export FIRECRAWL_API_KEY=your-api-key * 2. Build the project: npm run build * 3. 
Run the example: ts-node examples/crawl-and-map.ts */ import { CrawlTool } from "../src/tools/crawl.js"; import { MapTool } from "../src/tools/map.js"; import { ExtractTool } from "../src/tools/extract.js"; import { DEFAULT_ERROR_CONFIG } from "../src/error-handling.js"; import axios from "axios"; async function main() { // Create a test axios instance const axiosInstance = axios.create({ baseURL: process.env.FIRECRAWL_API_BASE_URL || "https://api.firecrawl.dev/v1", headers: { Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`, "Content-Type": "application/json", }, }); // Initialize tools const toolOptions = { axiosInstance, errorConfig: DEFAULT_ERROR_CONFIG, }; const crawlTool = new CrawlTool(toolOptions); const mapTool = new MapTool(toolOptions); const extractTool = new ExtractTool(toolOptions); try { // Basic crawling example console.log("Basic crawling example:"); const basicCrawlResult = await crawlTool.execute({ url: "https://example.com", maxDepth: 2, limit: 10, ignoreSitemap: false }); console.log(JSON.stringify(basicCrawlResult, null, 2)); // Advanced crawling with filters console.log("\nAdvanced crawling example:"); const advancedCrawlResult = await crawlTool.execute({ url: "https://example.com", maxDepth: 3, excludePaths: ["/admin", "/private", "/login"], includePaths: ["/blog", "/products"], ignoreQueryParameters: true, limit: 20, allowBackwardLinks: false, allowExternalLinks: false, scrapeOptions: { formats: ["markdown"] } }); console.log(JSON.stringify(advancedCrawlResult, null, 2)); // Site mapping example console.log("\nSite mapping example:"); const mapResult = await mapTool.execute({ url: "https://example.com", includeSubdomains: true, limit: 100, ignoreSitemap: false }); console.log(JSON.stringify(mapResult, null, 2)); // Targeted mapping with search console.log("\nTargeted mapping example:"); const targetedMapResult = await mapTool.execute({ url: "https://example.com", search: "products", sitemapOnly: true, limit: 50 }); console.log(JSON.stringify(targetedMapResult, null, 2)); // Extract data from crawled pages console.log("\nBulk data extraction example:"); const extractResult = await extractTool.execute({ urls: [ "https://example.com/products/1", "https://example.com/products/2", "https://example.com/products/3" ], prompt: "Extract product information including name, price, and description", schema: { type: "object", properties: { name: { type: "string" }, price: { type: "number" }, description: { type: "string" }, specifications: { type: "object", additionalProperties: true } }, required: ["name", "price"] }, enableWebSearch: false }); console.log(JSON.stringify(extractResult, null, 2)); } catch (error) { console.error("Error running examples:", error); process.exit(1); } } // Check for API key if (!process.env.FIRECRAWL_API_KEY) { console.error("Error: FIRECRAWL_API_KEY environment variable is required"); console.error("Please set it before running the example:"); console.error("export FIRECRAWL_API_KEY=your-api-key"); process.exit(1); } main().catch((error) => { console.error("Unhandled error:", error); process.exit(1); }); ``` -------------------------------------------------------------------------------- /src/error-handling.ts: -------------------------------------------------------------------------------- ```typescript import { ErrorCode, McpError } from "@modelcontextprotocol/sdk/types.js"; import type { AxiosError } from "axios"; import axios from "axios"; /** * Types of errors that can occur in Firecrawl operations */ export enum FirecrawlErrorType { 
/** Rate limit exceeded */ RateLimit = "RATE_LIMIT", /** Invalid input parameters */ InvalidInput = "INVALID_INPUT", /** Network or connection error */ NetworkError = "NETWORK_ERROR", /** API-related error */ APIError = "API_ERROR", } /** * Configuration for error handling and retries */ export interface ErrorHandlingConfig { /** Maximum number of retry attempts */ maxRetries: number; /** Initial delay between retries in milliseconds */ retryDelay: number; /** Multiplier for exponential backoff */ backoffMultiplier: number; /** Maximum delay between retries in milliseconds */ maxBackoff: number; /** Enable debug logging */ debug: boolean; } /** * Default configuration for error handling */ export const DEFAULT_ERROR_CONFIG: ErrorHandlingConfig = { maxRetries: 3, retryDelay: 1000, backoffMultiplier: 2, maxBackoff: 8000, debug: false, }; interface ApiErrorResponse { message?: string; error?: string; code?: string; } /** * Converts API errors to standardized MCP errors */ export const handleError = (error: unknown, debug = false): McpError => { if (debug) { console.error("[Debug] Error details:", error); } if (axios.isAxiosError(error)) { const axiosError = error as AxiosError<ApiErrorResponse>; const status = axiosError.response?.status; const responseData = axiosError.response?.data; const message = responseData?.message || responseData?.error || axiosError.message; switch (status) { case 429: return new McpError( ErrorCode.InvalidRequest, `Rate limit exceeded: ${message}` ); case 401: return new McpError( ErrorCode.InvalidParams, `Invalid API key: ${message}` ); case 400: return new McpError( ErrorCode.InvalidParams, `Invalid request: ${message}` ); case 404: return new McpError( ErrorCode.InvalidRequest, `Resource not found: ${message}` ); default: return new McpError(ErrorCode.InternalError, `API error: ${message}`); } } if (error instanceof Error) { return new McpError(ErrorCode.InternalError, error.message); } return new McpError(ErrorCode.InternalError, "An unknown error occurred"); }; /** * Determines if a request should be retried based on the error */ export const shouldRetry = ( error: unknown, retryCount: number, config: ErrorHandlingConfig ): boolean => { if (retryCount >= config.maxRetries) { return false; } if (axios.isAxiosError(error)) { const status = error.response?.status; return ( status === 429 || // Rate limit status === 500 || // Server error error.code === "ECONNABORTED" || // Timeout error.code === "ECONNRESET" // Connection reset ); } return false; }; /** * Calculates the delay for the next retry attempt */ export const calculateRetryDelay = ( retryCount: number, config: ErrorHandlingConfig ): number => { const delay = config.retryDelay * Math.pow(config.backoffMultiplier, retryCount - 1); return Math.min(delay, config.maxBackoff); }; /** * Retries a failed request with exponential backoff */ export const retryRequest = async <T>( requestFn: () => Promise<T>, config: ErrorHandlingConfig = DEFAULT_ERROR_CONFIG ): Promise<T> => { let retryCount = 0; let lastError: unknown; do { try { return await requestFn(); } catch (error) { lastError = error; if (!shouldRetry(error, retryCount, config)) { throw handleError(error, config.debug); } retryCount++; const delay = calculateRetryDelay(retryCount, config); if (config.debug) { console.error( `[Debug] Retry ${retryCount}/${config.maxRetries}, waiting ${delay}ms` ); } await new Promise((resolve) => setTimeout(resolve, delay)); } } while (retryCount <= config.maxRetries); throw handleError(lastError, config.debug); }; ``` 
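As a usage sketch (not part of the source tree): every tool funnels its HTTP call through `retryRequest`, so a one-off request gets the same backoff behavior. The `baseURL` and endpoint below are illustrative.

```typescript
import axios from "axios";
import { DEFAULT_ERROR_CONFIG, retryRequest } from "./error-handling.js";

// Illustrative client; the tools receive a preconfigured instance like this.
const client = axios.create({
  baseURL: "https://api.firecrawl.dev/v1",
  headers: { Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}` },
});

async function scrapeOnce(url: string) {
  // 429/500/timeout errors are retried with exponential backoff
  // (1s, 2s, 4s with the defaults, capped at maxBackoff); anything
  // else is converted to an McpError immediately.
  const response = await retryRequest(
    () => client.post("/scrape", { url }),
    { ...DEFAULT_ERROR_CONFIG, debug: true }
  );
  return response.data;
}

scrapeOnce("https://example.com").then(console.log).catch(console.error);
```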
-------------------------------------------------------------------------------- /src/tools/scrape.ts: -------------------------------------------------------------------------------- ```typescript import { AxiosInstance } from "axios"; import { ErrorHandlingConfig, retryRequest } from "../error-handling.js"; import { ScrapeUrlArgs } from "../types.js"; /** * Options for configuring the scrape tool */ export interface ScrapeToolOptions { /** Axios instance for making requests */ axiosInstance: AxiosInstance; /** Error handling configuration */ errorConfig: ErrorHandlingConfig; } /** * Handles content scraping operations */ export class ScrapeTool { private axiosInstance: AxiosInstance; private errorConfig: ErrorHandlingConfig; constructor(options: ScrapeToolOptions) { this.axiosInstance = options.axiosInstance; this.errorConfig = options.errorConfig; } /** * Get the tool definition for registration */ getDefinition() { return { name: "scrape_url", description: "Scrape content from a URL using Firecrawl API", inputSchema: { type: "object", properties: { url: { type: "string", description: "URL to scrape", }, jsonOptions: { type: "object", properties: { prompt: { type: "string", description: "Prompt for extracting specific information", }, schema: { type: "object", description: "Schema for extraction", }, systemPrompt: { type: "string", description: "System prompt for extraction", }, }, }, formats: { type: "array", items: { type: "string", enum: [ "markdown", "html", "rawHtml", "links", "screenshot", "screenshot@fullPage", "json", ], }, description: "Output formats", }, onlyMainContent: { type: "boolean", description: "Only return main content excluding headers, navs, footers", default: true, }, includeTags: { type: "array", items: { type: "string" }, description: "Tags to include in output", }, excludeTags: { type: "array", items: { type: "string" }, description: "Tags to exclude from output", }, waitFor: { type: "number", description: "Delay in milliseconds before fetching content", default: 0, }, mobile: { type: "boolean", description: "Emulate mobile device", default: false, }, location: { type: "object", properties: { country: { type: "string", description: "ISO 3166-1 alpha-2 country code", }, languages: { type: "array", items: { type: "string" }, description: "Preferred languages/locales", }, }, }, blockAds: { type: "boolean", description: "Enable ad/cookie popup blocking", default: true, }, }, required: ["url"], }, }; } /** * Execute the scrape operation */ async execute(args: ScrapeUrlArgs) { const response = await retryRequest( () => this.axiosInstance.post("/scrape", args), this.errorConfig ); return { content: [ { type: "text", text: JSON.stringify(response.data, null, 2), }, ], }; } /** * Validate the scrape operation arguments */ validate(args: unknown): args is ScrapeUrlArgs { if (typeof args !== "object" || args === null) { return false; } const { url, jsonOptions, formats } = args as any; if (typeof url !== "string") { return false; } if (jsonOptions !== undefined) { if ( typeof jsonOptions !== "object" || jsonOptions === null || typeof jsonOptions.prompt !== "string" ) { return false; } } if (formats !== undefined) { if ( !Array.isArray(formats) || !formats.every((f) => typeof f === "string") ) { return false; } } return true; } } ``` -------------------------------------------------------------------------------- /src/tools/map.ts: -------------------------------------------------------------------------------- ```typescript import { AxiosInstance } from "axios"; import { 
ErrorHandlingConfig, retryRequest } from "../error-handling.js"; import { MapArgs } from "../types.js"; /** * Options for configuring the map tool */ export interface MapToolOptions { /** Axios instance for making requests */ axiosInstance: AxiosInstance; /** Error handling configuration */ errorConfig: ErrorHandlingConfig; } /** * Structure representing a node in the site map */ interface SiteMapNode { url: string; children?: SiteMapNode[]; } /** * Link structure for mapping */ interface SiteLink { url: string; parent?: string; } /** * Handles website structure mapping operations */ export class MapTool { private axiosInstance: AxiosInstance; private errorConfig: ErrorHandlingConfig; constructor(options: MapToolOptions) { this.axiosInstance = options.axiosInstance; this.errorConfig = options.errorConfig; } /** * Get the tool definition for registration */ getDefinition() { return { name: "map", description: "Maps a website's structure", inputSchema: { type: "object", properties: { url: { type: "string", description: "Base URL to map", }, search: { type: "string", description: "Search query for mapping", }, ignoreSitemap: { type: "boolean", description: "Ignore sitemap.xml during mapping", }, sitemapOnly: { type: "boolean", description: "Only use sitemap.xml for mapping", }, includeSubdomains: { type: "boolean", description: "Include subdomains in mapping", }, limit: { type: "number", description: "Maximum links to return", default: 5000, }, timeout: { type: "number", description: "Request timeout", }, }, required: ["url"], }, }; } /** * Execute the map operation */ async execute(args: MapArgs) { const response = await retryRequest( () => this.axiosInstance.post("/map", args), this.errorConfig ); return { content: [ { type: "text", text: JSON.stringify(response.data, null, 2), }, ], }; } /** * Validate the map operation arguments */ validate(args: unknown): args is MapArgs { if (typeof args !== "object" || args === null) { return false; } const { url, search, limit, timeout, ignoreSitemap, sitemapOnly, includeSubdomains, } = args as any; if (typeof url !== "string") { return false; } if (search !== undefined && typeof search !== "string") { return false; } if (limit !== undefined && typeof limit !== "number") { return false; } if (timeout !== undefined && typeof timeout !== "number") { return false; } if (ignoreSitemap !== undefined && typeof ignoreSitemap !== "boolean") { return false; } if (sitemapOnly !== undefined && typeof sitemapOnly !== "boolean") { return false; } if (includeSubdomains !== undefined && typeof includeSubdomains !== "boolean") { return false; } return true; } /** * Format the mapping results into a tree structure * @private */ private formatTree(links: SiteLink[]): Record<string, SiteMapNode> { const tree: Record<string, SiteMapNode> = {}; const rootNodes = links.filter((link) => !link.parent); const buildNode = (node: SiteLink): SiteMapNode => { const children = links.filter((link) => link.parent === node.url); return { url: node.url, ...(children.length > 0 && { children: children.map(buildNode), }), }; }; rootNodes.forEach((node) => { tree[node.url] = buildNode(node); }); return tree; } /** * Extract sitemap URLs if available * @private */ private async getSitemapUrls(baseUrl: string): Promise<string[]> { try { const response = await this.axiosInstance.get(`${baseUrl}/sitemap.xml`); // Simple XML parsing for demonstration const urls = response.data.match(/<loc>(.*?)<\/loc>/g) || []; return urls.map((url: string) => url.replace(/<\/?loc>/g, "").trim() ); } catch { 
return []; } } } ``` -------------------------------------------------------------------------------- /src/tools/crawl.ts: -------------------------------------------------------------------------------- ```typescript import { AxiosInstance } from "axios"; import { ErrorHandlingConfig, retryRequest } from "../error-handling.js"; import { CrawlArgs } from "../types.js"; /** * Options for configuring the crawl tool */ export interface CrawlToolOptions { /** Axios instance for making requests */ axiosInstance: AxiosInstance; /** Error handling configuration */ errorConfig: ErrorHandlingConfig; } /** * Handles web crawling operations */ export class CrawlTool { private axiosInstance: AxiosInstance; private errorConfig: ErrorHandlingConfig; constructor(options: CrawlToolOptions) { this.axiosInstance = options.axiosInstance; this.errorConfig = options.errorConfig; } /** * Get the tool definition for registration */ getDefinition() { return { name: "crawl", description: "Crawls a website starting from a base URL", inputSchema: { type: "object", properties: { url: { type: "string", description: "Base URL to start crawling from", }, maxDepth: { type: "number", description: "Maximum crawl depth", default: 2, }, excludePaths: { type: "array", items: { type: "string" }, description: "URL patterns to exclude", }, includePaths: { type: "array", items: { type: "string" }, description: "URL patterns to include", }, ignoreSitemap: { type: "boolean", description: "Ignore sitemap.xml during crawling", }, ignoreQueryParameters: { type: "boolean", description: "Ignore URL query parameters when comparing URLs", }, limit: { type: "number", description: "Maximum pages to crawl", default: 10000, }, allowBackwardLinks: { type: "boolean", description: "Allow crawling links that point to parent directories", }, allowExternalLinks: { type: "boolean", description: "Allow crawling links to external domains", }, webhook: { type: "string", description: "Webhook URL for progress notifications", }, scrapeOptions: { type: "object", description: "Options for scraping crawled pages", }, }, required: ["url"], }, }; } /** * Execute the crawl operation */ async execute(args: CrawlArgs) { const response = await retryRequest( () => this.axiosInstance.post("/crawl", args), this.errorConfig ); return { content: [ { type: "text", text: JSON.stringify(response.data, null, 2), }, ], }; } /** * Validate the crawl operation arguments */ validate(args: unknown): args is CrawlArgs { if (typeof args !== "object" || args === null) { return false; } const { url, maxDepth, excludePaths, includePaths, limit, webhook, } = args as any; if (typeof url !== "string") { return false; } if (maxDepth !== undefined && typeof maxDepth !== "number") { return false; } if ( excludePaths !== undefined && (!Array.isArray(excludePaths) || !excludePaths.every((path) => typeof path === "string")) ) { return false; } if ( includePaths !== undefined && (!Array.isArray(includePaths) || !includePaths.every((path) => typeof path === "string")) ) { return false; } if (limit !== undefined && typeof limit !== "number") { return false; } if (webhook !== undefined && typeof webhook !== "string") { return false; } return true; } /** * Process and normalize URLs for crawling * @private */ private normalizeUrl(url: string): string { try { const parsed = new URL(url); return parsed.toString(); } catch { return url; } } /** * Check if a URL should be crawled based on patterns * @private */ private shouldCrawl( url: string, includePaths?: string[], excludePaths?: string[] ): 
boolean { if (excludePaths?.some((pattern) => url.includes(pattern))) { return false; } if (includePaths?.length && !includePaths.some((pattern) => url.includes(pattern))) { return false; } return true; } } ``` -------------------------------------------------------------------------------- /tests/tools/scrape.test.ts: -------------------------------------------------------------------------------- ```typescript import { jest, describe, expect, it, beforeEach } from "@jest/globals"; import type { AxiosInstance } from "axios"; import axios from "axios"; import { ScrapeTool } from "../../src/tools/scrape.js"; import { DEFAULT_ERROR_CONFIG } from "../../src/error-handling.js"; import type { ScrapeUrlArgs } from "../../src/types.js"; jest.mock("axios"); describe("ScrapeTool", () => { let scrapeTool: ScrapeTool; let mockAxiosInstance: jest.Mocked<AxiosInstance>; beforeEach(() => { mockAxiosInstance = { post: jest.fn(), get: jest.fn(), } as unknown as jest.Mocked<AxiosInstance>; (axios.create as jest.Mock).mockReturnValue(mockAxiosInstance); scrapeTool = new ScrapeTool({ axiosInstance: mockAxiosInstance, errorConfig: DEFAULT_ERROR_CONFIG, }); }); describe("execute", () => { it("should successfully scrape a URL with basic options", async () => { const mockResponse = { data: { content: "Scraped content", format: "markdown", }, }; mockAxiosInstance.post.mockResolvedValueOnce(mockResponse); const args: ScrapeUrlArgs = { url: "https://example.com", formats: ["markdown"], onlyMainContent: true, }; const result = await scrapeTool.execute(args); expect(mockAxiosInstance.post).toHaveBeenCalledWith("/scrape", args); expect(result).toEqual({ content: [ { type: "text", text: JSON.stringify(mockResponse.data, null, 2), }, ], }); }); it("should handle structured data extraction", async () => { const mockResponse = { data: { title: "Example Title", content: "Example Content", date: "2024-02-13", }, }; mockAxiosInstance.post.mockResolvedValueOnce(mockResponse); const args: ScrapeUrlArgs = { url: "https://example.com/blog", jsonOptions: { prompt: "Extract title and content", schema: { type: "object", properties: { title: { type: "string" }, content: { type: "string" }, }, }, }, formats: ["json"], }; const result = await scrapeTool.execute(args); expect(mockAxiosInstance.post).toHaveBeenCalledWith("/scrape", args); expect(result).toEqual({ content: [ { type: "text", text: JSON.stringify(mockResponse.data, null, 2), }, ], }); }); it("should handle mobile device emulation", async () => { const mockResponse = { data: { content: "Mobile optimized content", }, }; mockAxiosInstance.post.mockResolvedValueOnce(mockResponse); const args: ScrapeUrlArgs = { url: "https://example.com", mobile: true, location: { country: "US", languages: ["en-US"], }, blockAds: true, }; const result = await scrapeTool.execute(args); expect(mockAxiosInstance.post).toHaveBeenCalledWith("/scrape", args); expect(result).toEqual({ content: [ { type: "text", text: JSON.stringify(mockResponse.data, null, 2), }, ], }); }); it("should handle API errors", async () => { const errorMessage = "API request failed"; mockAxiosInstance.post.mockRejectedValueOnce(new Error(errorMessage)); await expect( scrapeTool.execute({ url: "https://example.com", }) ).rejects.toThrow(errorMessage); }); }); describe("validate", () => { it("should validate correct scrape arguments", () => { const args: ScrapeUrlArgs = { url: "https://example.com", formats: ["markdown"], onlyMainContent: true, }; expect(scrapeTool.validate(args)).toBe(true); }); it("should reject invalid 
scrape arguments", () => { const args = { formats: ["markdown"], }; expect(scrapeTool.validate(args)).toBe(false); }); it("should validate complex scrape arguments", () => { const args: ScrapeUrlArgs = { url: "https://example.com", jsonOptions: { prompt: "Extract data", schema: { type: "object" }, }, formats: ["json", "markdown"], mobile: true, location: { country: "US", languages: ["en-US"], }, }; expect(scrapeTool.validate(args)).toBe(true); }); }); }); ``` -------------------------------------------------------------------------------- /src/tools/extract.ts: -------------------------------------------------------------------------------- ```typescript import { AxiosInstance } from "axios"; import { ErrorHandlingConfig, retryRequest } from "../error-handling.js"; import { ExtractArgs } from "../types.js"; /** * Options for configuring the extract tool */ export interface ExtractToolOptions { /** Axios instance for making requests */ axiosInstance: AxiosInstance; /** Error handling configuration */ errorConfig: ErrorHandlingConfig; } /** * Interface for extraction results */ interface ExtractionResult { url: string; data: Record<string, any>; error?: string; } /** * Handles data extraction operations */ export class ExtractTool { private axiosInstance: AxiosInstance; private errorConfig: ErrorHandlingConfig; constructor(options: ExtractToolOptions) { this.axiosInstance = options.axiosInstance; this.errorConfig = options.errorConfig; } /** * Get the tool definition for registration */ getDefinition() { return { name: "extract", description: "Extracts structured data from URLs", inputSchema: { type: "object", properties: { urls: { type: "array", items: { type: "string" }, description: "URLs to extract from", }, prompt: { type: "string", description: "Extraction guidance prompt", }, schema: { type: "object", description: "Data structure schema", }, enableWebSearch: { type: "boolean", description: "Use web search for additional data", default: false, }, ignoreSitemap: { type: "boolean", description: "Ignore sitemap.xml during processing", }, includeSubdomains: { type: "boolean", description: "Include subdomains in processing", }, }, required: ["urls"], }, }; } /** * Execute the extract operation */ async execute(args: ExtractArgs) { const response = await retryRequest( () => this.axiosInstance.post("/extract", args), this.errorConfig ); return { content: [ { type: "text", text: JSON.stringify(response.data, null, 2), }, ], }; } /** * Validate the extract operation arguments */ validate(args: unknown): args is ExtractArgs { if (typeof args !== "object" || args === null) { return false; } const { urls, prompt, schema, enableWebSearch, ignoreSitemap, includeSubdomains, } = args as any; if (!Array.isArray(urls) || !urls.every((url) => typeof url === "string")) { return false; } if (prompt !== undefined && typeof prompt !== "string") { return false; } if (schema !== undefined && (typeof schema !== "object" || schema === null)) { return false; } if (enableWebSearch !== undefined && typeof enableWebSearch !== "boolean") { return false; } if (ignoreSitemap !== undefined && typeof ignoreSitemap !== "boolean") { return false; } if (includeSubdomains !== undefined && typeof includeSubdomains !== "boolean") { return false; } return true; } /** * Process a single URL for extraction * @private */ private async processUrl( url: string, options: { prompt?: string; schema?: object; } ): Promise<ExtractionResult> { try { const response = await this.axiosInstance.post("/extract", { urls: [url], ...options, }); 
return { url, data: response.data, }; } catch (error) { return { url, data: {}, error: error instanceof Error ? error.message : "Unknown error occurred", }; } } /** * Process multiple URLs in parallel with rate limiting * @private */ private async processBatch( urls: string[], options: { prompt?: string; schema?: object; batchSize?: number; delayMs?: number; } ): Promise<ExtractionResult[]> { const { batchSize = 5, delayMs = 1000 } = options; const results: ExtractionResult[] = []; for (let i = 0; i < urls.length; i += batchSize) { const batch = urls.slice(i, i + batchSize); const batchPromises = batch.map(url => this.processUrl(url, { prompt: options.prompt, schema: options.schema, }) ); results.push(...await Promise.all(batchPromises)); if (i + batchSize < urls.length) { await new Promise(resolve => setTimeout(resolve, delayMs)); } } return results; } } ``` -------------------------------------------------------------------------------- /src/types.ts: -------------------------------------------------------------------------------- ```typescript import type { Server } from "@modelcontextprotocol/sdk/server/index.js"; import type { ErrorCode, McpError, CallToolRequestSchema, ListToolsRequestSchema, } from "@modelcontextprotocol/sdk/types.js"; // Server types export interface ServerConfig { name: string; version: string; } export interface ServerCapabilities { tools: Record<string, unknown>; } // Request/Response types export interface RequestHandler<T> { (request: T): Promise<any>; } export interface ErrorHandler { (error: Error): void; } // Tool types export interface ToolDefinition { name: string; description: string; inputSchema: { type: string; properties: Record<string, unknown>; required?: string[]; }; } export interface ToolResponse { content: Array<{ type: string; text: string; }>; } // Re-export types from SDK export type { Server, ErrorCode, McpError, CallToolRequestSchema, ListToolsRequestSchema, }; // Configuration types export interface FirecrawlConfig { apiKey: string; apiBaseUrl?: string; timeout?: number; maxRetries?: number; retryDelay?: number; backoffMultiplier?: number; maxBackoff?: number; debug?: boolean; customHeaders?: Record<string, string>; allowedDomains?: string[]; validateRequests?: boolean; logLevel?: "error" | "warn" | "info" | "debug"; logFile?: string; sandbox?: boolean; } // Tool-specific types export interface ScrapeUrlArgs { url: string; jsonOptions?: { prompt: string; schema?: object; systemPrompt?: string; }; formats?: string[]; onlyMainContent?: boolean; includeTags?: string[]; excludeTags?: string[]; waitFor?: number; mobile?: boolean; location?: { country?: string; languages?: string[]; }; blockAds?: boolean; removeBase64Images?: boolean; } export interface SearchContentArgs { query: string; scrapeOptions?: { formats?: string[]; }; limit?: number; lang?: string; country?: string; location?: string; timeout?: number; } export interface CrawlArgs { url: string; maxDepth?: number; excludePaths?: string[]; includePaths?: string[]; ignoreSitemap?: boolean; ignoreQueryParameters?: boolean; limit?: number; allowBackwardLinks?: boolean; allowExternalLinks?: boolean; webhook?: string; scrapeOptions?: Record<string, unknown>; } export interface MapArgs { url: string; search?: string; ignoreSitemap?: boolean; sitemapOnly?: boolean; includeSubdomains?: boolean; limit?: number; timeout?: number; } export interface ExtractArgs { urls: string[]; prompt?: string; schema?: object; enableWebSearch?: boolean; ignoreSitemap?: boolean; includeSubdomains?: boolean; } // 
Type guards export const isScrapeUrlArgs = (args: unknown): args is ScrapeUrlArgs => { if (typeof args !== "object" || args === null) { return false; } const { url, jsonOptions, formats } = args as ScrapeUrlArgs; if (typeof url !== "string") { return false; } if (jsonOptions !== undefined) { if ( typeof jsonOptions !== "object" || jsonOptions === null || typeof jsonOptions.prompt !== "string" ) { return false; } } if (formats !== undefined) { if ( !Array.isArray(formats) || !formats.every((f) => typeof f === "string") ) { return false; } } return true; }; export const isSearchContentArgs = ( args: unknown ): args is SearchContentArgs => { if (typeof args !== "object" || args === null) { return false; } const { query, scrapeOptions, limit } = args as SearchContentArgs; if (typeof query !== "string") { return false; } if (scrapeOptions !== undefined) { if ( typeof scrapeOptions !== "object" || scrapeOptions === null || (scrapeOptions.formats !== undefined && (!Array.isArray(scrapeOptions.formats) || !scrapeOptions.formats.every((f) => typeof f === "string"))) ) { return false; } } if (limit !== undefined && typeof limit !== "number") { return false; } return true; }; export const isCrawlArgs = (args: unknown): args is CrawlArgs => { if (typeof args !== "object" || args === null) { return false; } const { url } = args as CrawlArgs; return typeof url === "string"; }; export const isMapArgs = (args: unknown): args is MapArgs => { if (typeof args !== "object" || args === null) { return false; } const { url } = args as MapArgs; return typeof url === "string"; }; export const isExtractArgs = (args: unknown): args is ExtractArgs => { if (typeof args !== "object" || args === null) { return false; } const { urls } = args as ExtractArgs; return Array.isArray(urls) && urls.every((url) => typeof url === "string"); }; ``` -------------------------------------------------------------------------------- /docs/configuration.md: -------------------------------------------------------------------------------- ```markdown # Firecrawl MCP Server Configuration Guide This guide explains how to configure and customize the Firecrawl MCP server for different environments and use cases. 
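For orientation, the sketch below shows how the variables documented in this guide typically surface in code. The defaults mirror the server's own configuration loading; the `loadConfig` helper itself is illustrative and not part of the published API.

```typescript
// Illustrative sketch: gather the documented environment variables into a
// typed object, falling back to the server's defaults where a variable is unset.
interface EnvConfig {
  apiKey: string;
  apiBaseUrl: string;
  timeout: number;
  maxRetries: number;
  retryDelay: number;
  debug: boolean;
}

function loadConfig(): EnvConfig {
  const apiKey = process.env.FIRECRAWL_API_KEY;
  if (!apiKey) {
    // The server refuses to start without an API key.
    throw new Error("FIRECRAWL_API_KEY environment variable is required");
  }
  return {
    apiKey,
    apiBaseUrl:
      process.env.FIRECRAWL_API_BASE_URL ?? "https://api.firecrawl.dev/v1",
    timeout: parseInt(process.env.FIRECRAWL_TIMEOUT ?? "30000", 10),
    maxRetries: parseInt(process.env.FIRECRAWL_MAX_RETRIES ?? "3", 10),
    retryDelay: parseInt(process.env.FIRECRAWL_RETRY_DELAY ?? "1000", 10),
    debug: process.env.DEBUG === "true",
  };
}
```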
## Environment Variables

### Required Variables

- `FIRECRAWL_API_KEY`: Your Firecrawl API key (required)

```bash
export FIRECRAWL_API_KEY=your-api-key-here
```

### Optional Variables

#### API Configuration

- `FIRECRAWL_API_BASE_URL`: Override the default API endpoint

```bash
export FIRECRAWL_API_BASE_URL=https://custom-api.firecrawl.dev/v1
```

#### Request Configuration

- `FIRECRAWL_TIMEOUT`: Request timeout in milliseconds

```bash
export FIRECRAWL_TIMEOUT=30000 # 30 seconds
```

#### Retry Configuration

- `FIRECRAWL_MAX_RETRIES`: Maximum retry attempts for failed requests

```bash
export FIRECRAWL_MAX_RETRIES=3
```

- `FIRECRAWL_RETRY_DELAY`: Initial delay between retries (milliseconds)

```bash
export FIRECRAWL_RETRY_DELAY=1000 # 1 second
```

- `FIRECRAWL_BACKOFF_MULTIPLIER`: Multiplier for exponential backoff

```bash
export FIRECRAWL_BACKOFF_MULTIPLIER=2
```

- `FIRECRAWL_MAX_BACKOFF`: Maximum delay between retries (milliseconds)

```bash
export FIRECRAWL_MAX_BACKOFF=8000 # 8 seconds
```

#### Debugging

- `DEBUG`: Enable debug logging

```bash
export DEBUG=true
```

#### Security

- `FIRECRAWL_VALIDATE_REQUESTS`: Enable request validation

```bash
export FIRECRAWL_VALIDATE_REQUESTS=true
```

- `FIRECRAWL_ALLOWED_DOMAINS`: List of allowed domains (JSON array)

```bash
export FIRECRAWL_ALLOWED_DOMAINS='["example.com","api.example.com"]'
```

## Installation Methods

### Global Installation

Install the server globally to run it from anywhere:

```bash
npm install -g @modelcontextprotocol/mcp-server-firecrawl
```

Then run:

```bash
mcp-server-firecrawl
```

### Local Project Installation

Install as a project dependency:

```bash
npm install @modelcontextprotocol/mcp-server-firecrawl
```

Add to your package.json scripts:

```json
{
  "scripts": {
    "start-mcp": "mcp-server-firecrawl"
  }
}
```

## Integration with MCP Clients

### Claude Desktop App

1. Open the Claude desktop app settings
2. Navigate to MCP Server settings
3. Add a new server configuration:

```json
{
  "firecrawl": {
    "command": "mcp-server-firecrawl",
    "env": {
      "FIRECRAWL_API_KEY": "your-api-key",
      "DEBUG": "true"
    }
  }
}
```

### Claude VSCode Extension

1. Open VSCode settings
2. Search for "Claude MCP Settings"
3. Add the server configuration:

```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "mcp-server-firecrawl",
      "env": {
        "FIRECRAWL_API_KEY": "your-api-key",
        "DEBUG": "true"
      }
    }
  }
}
```

## Advanced Configuration

### Custom HTTP Headers

Add custom headers to API requests:

```bash
export FIRECRAWL_CUSTOM_HEADERS='{"X-Custom-Header": "value"}'
```

### Rate Limiting

The server implements rate limiting with exponential backoff. Configure the behavior with:

```bash
export FIRECRAWL_MAX_RETRIES=3
export FIRECRAWL_RETRY_DELAY=1000 # milliseconds
export FIRECRAWL_BACKOFF_MULTIPLIER=2
export FIRECRAWL_MAX_BACKOFF=8000 # milliseconds
```

### Proxy Support

Configure proxy settings using the standard Node.js environment variables:

```bash
export HTTP_PROXY=http://proxy.company.com:8080
export HTTPS_PROXY=http://proxy.company.com:8080
```

### Logging Configuration

Customize logging behavior:

```bash
# Log levels: error, warn, info, debug
export FIRECRAWL_LOG_LEVEL=debug

# Log to file
export FIRECRAWL_LOG_FILE=/path/to/firecrawl.log
```

## Development and Testing

When developing or testing, you can use the sandbox environment:

```bash
export FIRECRAWL_SANDBOX=true
export FIRECRAWL_API_KEY=test-key
```

## Security Considerations

1.
API Key Security: - Store in environment variables or secure secrets management - Never commit to version control - Rotate keys periodically 2. Request Validation: ```bash export FIRECRAWL_VALIDATE_REQUESTS=true export FIRECRAWL_ALLOWED_DOMAINS='["trusted-domain.com"]' ``` 3. Rate Limiting: - Configure appropriate retry limits - Use exponential backoff - Monitor usage patterns ## Monitoring and Error Handling Enable comprehensive logging and monitoring: ```bash # Debug logging export DEBUG=true # Detailed error logging export FIRECRAWL_LOG_LEVEL=debug # Error tracking export FIRECRAWL_ERROR_TRACKING=true ``` ## Performance Tuning Optimize performance for your use case: ```bash # Increase timeouts for large operations export FIRECRAWL_TIMEOUT=60000 # Adjust concurrent request limits export FIRECRAWL_MAX_CONCURRENT=5 # Configure batch processing export FIRECRAWL_BATCH_SIZE=10 export FIRECRAWL_BATCH_DELAY=1000 ``` ## Docker Configuration When running in Docker, configure environment variables in your docker-compose.yml: ```yaml version: '3' services: firecrawl-mcp: image: mcp-server-firecrawl environment: - FIRECRAWL_API_KEY=your-api-key - DEBUG=true - FIRECRAWL_TIMEOUT=30000 volumes: - ./logs:/app/logs ``` Or use a .env file: ```env FIRECRAWL_API_KEY=your-api-key DEBUG=true FIRECRAWL_TIMEOUT=30000 ``` -------------------------------------------------------------------------------- /src/index.ts: -------------------------------------------------------------------------------- ```typescript #!/usr/bin/env node /** * Firecrawl MCP Server * A Model Context Protocol server for web scraping and content searching using the Firecrawl API. */ import { Server } from "@modelcontextprotocol/sdk/server/index.js"; import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js"; import { CallToolRequestSchema, ErrorCode, ListToolsRequestSchema, McpError, } from "@modelcontextprotocol/sdk/types.js"; import axios from "axios"; import { DEFAULT_ERROR_CONFIG, ErrorHandlingConfig, handleError, } from "./error-handling.js"; import { ScrapeTool } from "./tools/scrape.js"; import { SearchTool } from "./tools/search.js"; import { CrawlTool } from "./tools/crawl.js"; import { MapTool } from "./tools/map.js"; import { ExtractTool } from "./tools/extract.js"; import { isScrapeUrlArgs, isSearchContentArgs, isCrawlArgs, isMapArgs, isExtractArgs, } from "./types.js"; // Load and validate configuration const config = { apiKey: process.env.FIRECRAWL_API_KEY, apiBaseUrl: process.env.FIRECRAWL_API_BASE_URL || "https://api.firecrawl.dev/v1", timeout: parseInt(process.env.FIRECRAWL_TIMEOUT || "30000"), maxRetries: parseInt(process.env.FIRECRAWL_MAX_RETRIES || "3"), retryDelay: parseInt(process.env.FIRECRAWL_RETRY_DELAY || "1000"), debug: process.env.DEBUG === "true", }; if (!config.apiKey) { throw new Error("FIRECRAWL_API_KEY environment variable is required"); } /** * Main server class for the Firecrawl MCP implementation */ class FirecrawlServer { private server: Server; private axiosInstance; private errorConfig: ErrorHandlingConfig; private tools: { scrape: ScrapeTool; search: SearchTool; crawl: CrawlTool; map: MapTool; extract: ExtractTool; }; constructor() { this.server = new Server( { name: "firecrawl", version: "1.0.0", }, { capabilities: { tools: {}, }, } ); // Configure error handling this.errorConfig = { ...DEFAULT_ERROR_CONFIG, maxRetries: config.maxRetries, retryDelay: config.retryDelay, debug: config.debug, }; // Configure axios instance this.axiosInstance = axios.create({ baseURL: 
config.apiBaseUrl, timeout: config.timeout, headers: { Authorization: `Bearer ${config.apiKey}`, "Content-Type": "application/json", }, }); // Initialize tools const toolOptions = { axiosInstance: this.axiosInstance, errorConfig: this.errorConfig, }; this.tools = { scrape: new ScrapeTool(toolOptions), search: new SearchTool(toolOptions), crawl: new CrawlTool(toolOptions), map: new MapTool(toolOptions), extract: new ExtractTool(toolOptions), }; this.setupToolHandlers(); // Error handling this.server.onerror = (error: Error) => { console.error("[MCP Error]", error); if (config.debug) { console.error("[Debug] Stack trace:", error.stack); } }; process.on("SIGINT", async () => { await this.server.close(); process.exit(0); }); } /** * Set up the tool handlers for all operations */ private setupToolHandlers() { this.server.setRequestHandler(ListToolsRequestSchema, async () => ({ tools: [ this.tools.scrape.getDefinition(), this.tools.search.getDefinition(), this.tools.crawl.getDefinition(), this.tools.map.getDefinition(), this.tools.extract.getDefinition(), ], })); this.server.setRequestHandler(CallToolRequestSchema, async (request) => { try { const { name, arguments: args } = request.params; switch (name) { case "scrape_url": { if (!isScrapeUrlArgs(args)) { throw new McpError( ErrorCode.InvalidParams, "Invalid scrape_url arguments" ); } return await this.tools.scrape.execute(args); } case "search_content": { if (!isSearchContentArgs(args)) { throw new McpError( ErrorCode.InvalidParams, "Invalid search_content arguments" ); } return await this.tools.search.execute(args); } case "crawl": { if (!isCrawlArgs(args)) { throw new McpError( ErrorCode.InvalidParams, "Invalid crawl arguments" ); } return await this.tools.crawl.execute(args); } case "map": { if (!isMapArgs(args)) { throw new McpError( ErrorCode.InvalidParams, "Invalid map arguments" ); } return await this.tools.map.execute(args); } case "extract": { if (!isExtractArgs(args)) { throw new McpError( ErrorCode.InvalidParams, "Invalid extract arguments" ); } return await this.tools.extract.execute(args); } default: throw new McpError( ErrorCode.MethodNotFound, `Unknown tool: ${name}` ); } } catch (error) { throw handleError(error); } }); } /** * Start the MCP server */ async run() { const transport = new StdioServerTransport(); await this.server.connect(transport); console.error("Firecrawl MCP server running on stdio"); } } const server = new FirecrawlServer(); server.run().catch(console.error); ``` -------------------------------------------------------------------------------- /docs/api.md: -------------------------------------------------------------------------------- ```markdown # Firecrawl MCP Server API Documentation This document provides detailed information about the Firecrawl MCP server's API and available tools. ## Available Tools ### `scrape_url` Scrapes content from a specified URL with customizable extraction options. #### Input Schema ```typescript { url: string; // URL to scrape content from jsonOptions?: { prompt: string; // Prompt for extracting specific information schema?: object; // Schema for structured data extraction systemPrompt?: string; // System prompt for extraction context }; formats?: string[]; // Output formats (markdown, html, rawHtml, links, etc.) 
onlyMainContent?: boolean;// Exclude headers, navs, footers includeTags?: string[]; // HTML tags to include excludeTags?: string[]; // HTML tags to exclude waitFor?: number; // Delay before fetching (milliseconds) mobile?: boolean; // Emulate mobile device location?: { country?: string; // ISO 3166-1 alpha-2 country code languages?: string[]; // Preferred languages/locales }; blockAds?: boolean; // Enable ad/cookie popup blocking } ``` #### Example ```typescript { name: "scrape_url", arguments: { url: "https://example.com", jsonOptions: { prompt: "Extract the article title, date, and main content", schema: { title: "string", date: "string", content: "string" } }, formats: ["markdown"], mobile: true, blockAds: true } } ``` ### `search_content` Performs intelligent content searches with customizable parameters. #### Input Schema ```typescript { query: string; // Search query string scrapeOptions?: { formats?: string[]; // Output formats for results }; limit?: number; // Maximum results (1-100) lang?: string; // Language code country?: string; // Country code location?: string; // Location string timeout?: number; // Request timeout (milliseconds) } ``` #### Example ```typescript { name: "search_content", arguments: { query: "latest developments in AI", scrapeOptions: { formats: ["markdown"] }, limit: 10, lang: "en", country: "us" } } ``` ### `crawl` Crawls websites recursively with advanced configuration options. #### Input Schema ```typescript { url: string; // Base URL to start crawling maxDepth?: number; // Maximum crawl depth excludePaths?: string[]; // URL patterns to exclude includePaths?: string[]; // URL patterns to include ignoreSitemap?: boolean; // Ignore sitemap.xml ignoreQueryParameters?: boolean; // Ignore URL parameters limit?: number; // Maximum pages to crawl allowBackwardLinks?: boolean;// Allow parent directory links allowExternalLinks?: boolean;// Allow external domain links webhook?: string; // Progress notification URL scrapeOptions?: object; // Options for scraping pages } ``` #### Example ```typescript { name: "crawl", arguments: { url: "https://example.com", maxDepth: 3, excludePaths: ["/admin", "/private"], limit: 1000, scrapeOptions: { formats: ["markdown"] } } } ``` ### `map` Maps website structure and generates site hierarchies. #### Input Schema ```typescript { url: string; // Base URL to map search?: string; // Search query for filtering ignoreSitemap?: boolean; // Ignore sitemap.xml sitemapOnly?: boolean; // Only use sitemap.xml includeSubdomains?: boolean;// Include subdomains limit?: number; // Maximum links (default: 5000) timeout?: number; // Request timeout } ``` #### Example ```typescript { name: "map", arguments: { url: "https://example.com", includeSubdomains: true, limit: 1000 } } ``` ### `extract` Extracts structured data from multiple URLs with schema validation. 
#### Input Schema ```typescript { urls: string[]; // URLs to extract from prompt?: string; // Extraction guidance schema?: object; // Data structure schema enableWebSearch?: boolean; // Use web search for context ignoreSitemap?: boolean; // Ignore sitemap.xml includeSubdomains?: boolean;// Include subdomains } ``` #### Example ```typescript { name: "extract", arguments: { urls: ["https://example.com/page1", "https://example.com/page2"], prompt: "Extract product information", schema: { name: "string", price: "number", description: "string" } } } ``` ## Response Format All tools return responses in a standard format: ```typescript { content: [ { type: "text", text: string // JSON string containing the results } ] } ``` ## Error Handling The server implements robust error handling with retry capability: - Rate limiting with exponential backoff - Configurable retry attempts and delays - Detailed error messages - Debug logging options Error responses follow the standard MCP format: ```typescript { error: { code: ErrorCode; message: string; data?: unknown; } } ``` Common error codes: - `InvalidParams`: Invalid or missing parameters - `InvalidRequest`: Invalid request (e.g., rate limit) - `InternalError`: Server or API errors - `MethodNotFound`: Unknown tool name ## Configuration Server behavior can be customized through environment variables: ```bash # Required export FIRECRAWL_API_KEY=your-api-key # Optional export FIRECRAWL_API_BASE_URL=https://custom-api.firecrawl.dev/v1 export FIRECRAWL_TIMEOUT=30000 export FIRECRAWL_MAX_RETRIES=3 export FIRECRAWL_RETRY_DELAY=1000 export DEBUG=true ``` For detailed configuration options, see the [configuration guide](configuration.md). ```