# Directory Structure ``` ├── .eslintrc.json ├── .github │ └── workflows │ ├── ci.yml │ └── lint.yml ├── .gitignore ├── .prettierrc ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── docker-compose.yml ├── Dockerfile ├── docs │ ├── api.md │ └── configuration.md ├── examples │ ├── crawl-and-map.ts │ ├── scrape.ts │ └── search.ts ├── jest.config.js ├── LICENSE ├── package.json ├── pnpm-lock.yaml ├── README.md ├── smithery.yaml ├── src │ ├── error-handling.ts │ ├── index.ts │ ├── tools │ │ ├── crawl.ts │ │ ├── extract.ts │ │ ├── map.ts │ │ ├── scrape.ts │ │ └── search.ts │ └── types.ts ├── tests │ ├── index.test.ts │ ├── jest-setup.ts │ ├── setup.ts │ ├── tools │ │ └── scrape.test.ts │ └── types.d.ts ├── tsconfig.json └── tsconfig.test.json ``` # Files -------------------------------------------------------------------------------- /.prettierrc: -------------------------------------------------------------------------------- ``` 1 | { 2 | "semi": true, 3 | "trailingComma": "es5", 4 | "singleQuote": false, 5 | "printWidth": 80, 6 | "tabWidth": 2, 7 | "useTabs": false, 8 | "endOfLine": "lf", 9 | "arrowParens": "always", 10 | "bracketSpacing": true 11 | } 12 | ``` -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- ``` 1 | # Dependencies 2 | node_modules/ 3 | package-lock.json 4 | 5 | # Build outputs 6 | build/ 7 | dist/ 8 | *.tsbuildinfo 9 | 10 | # IDE and editor files 11 | .vscode/ 12 | .idea/ 13 | *.swp 14 | *.swo 15 | .DS_Store 16 | 17 | # Logs 18 | *.log 19 | npm-debug.log* 20 | yarn-debug.log* 21 | yarn-error.log* 22 | 23 | # Environment variables 24 | .env 25 | .env.local 26 | .env.*.local 27 | 28 | # Test coverage 29 | coverage/ 30 | 31 | # Temporary files 32 | tmp/ 33 | temp/ 34 | ``` -------------------------------------------------------------------------------- /.eslintrc.json: -------------------------------------------------------------------------------- ```json 1 | { 2 | "root": true, 3 | "parser": "@typescript-eslint/parser", 4 | "plugins": ["@typescript-eslint"], 5 | "extends": [ 6 | "eslint:recommended", 7 | "plugin:@typescript-eslint/recommended", 8 | "prettier" 9 | ], 10 | "env": { 11 | "node": true, 12 | "es2022": true 13 | }, 14 | "rules": { 15 | "@typescript-eslint/explicit-function-return-type": "off", 16 | "@typescript-eslint/no-explicit-any": "warn", 17 | "@typescript-eslint/no-unused-vars": ["error", { "argsIgnorePattern": "^_" }], 18 | "no-console": ["warn", { "allow": ["error", "warn"] }] 19 | }, 20 | "overrides": [ 21 | { 22 | "files": ["tests/**/*.ts"], 23 | "env": { 24 | "jest": true 25 | } 26 | } 27 | ] 28 | } 29 | ``` -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- ```markdown 1 | # Firecrawl MCP Server 2 | 3 | A Model Context Protocol (MCP) server for web scraping, content searching, site crawling, and data extraction using the Firecrawl API. 
4 | 5 | ## Features 6 | 7 | - **Web Scraping**: Extract content from any webpage with customizable options 8 | - Mobile device emulation 9 | - Ad and popup blocking 10 | - Content filtering 11 | - Structured data extraction 12 | - Multiple output formats 13 | 14 | - **Content Search**: Intelligent search capabilities 15 | - Multi-language support 16 | - Location-based results 17 | - Customizable result limits 18 | - Structured output formats 19 | 20 | - **Site Crawling**: Advanced web crawling functionality 21 | - Depth control 22 | - Path filtering 23 | - Rate limiting 24 | - Progress tracking 25 | - Sitemap integration 26 | 27 | - **Site Mapping**: Generate site structure maps 28 | - Subdomain support 29 | - Search filtering 30 | - Link analysis 31 | - Visual hierarchy 32 | 33 | - **Data Extraction**: Extract structured data from multiple URLs 34 | - Schema validation 35 | - Batch processing 36 | - Web search enrichment 37 | - Custom extraction prompts 38 | 39 | ## Installation 40 | 41 | ```bash 42 | # Global installation 43 | npm install -g @modelcontextprotocol/mcp-server-firecrawl 44 | 45 | # Local project installation 46 | npm install @modelcontextprotocol/mcp-server-firecrawl 47 | ``` 48 | 49 | ## Quick Start 50 | 51 | 1. Get your Firecrawl API key from the [developer portal](https://firecrawl.dev/dashboard) 52 | 53 | 2. Set your API key: 54 | 55 | **Unix/Linux/macOS (bash/zsh):** 56 | 57 | ```bash 58 | export FIRECRAWL_API_KEY=your-api-key 59 | ``` 60 | 61 | **Windows (Command Prompt):** 62 | 63 | ```cmd 64 | set FIRECRAWL_API_KEY=your-api-key 65 | ``` 66 | 67 | **Windows (PowerShell):** 68 | 69 | ```powershell 70 | $env:FIRECRAWL_API_KEY = "your-api-key" 71 | ``` 72 | 73 | **Alternative: Using .env file (recommended for development):** 74 | 75 | ```bash 76 | # Install dotenv 77 | npm install dotenv 78 | 79 | # Create .env file 80 | echo "FIRECRAWL_API_KEY=your-api-key" > .env 81 | ``` 82 | 83 | Then in your code: 84 | 85 | ```javascript 86 | import dotenv from 'dotenv'; 87 | dotenv.config(); 88 | ``` 89 | 90 | 3. 
Run the server: 91 | 92 | ```bash 93 | mcp-server-firecrawl 94 | ``` 95 | 96 | ## Integration 97 | 98 | ### Claude Desktop App 99 | 100 | Add to your MCP settings: 101 | 102 | ```json 103 | { 104 | "firecrawl": { 105 | "command": "mcp-server-firecrawl", 106 | "env": { 107 | "FIRECRAWL_API_KEY": "your-api-key" 108 | } 109 | } 110 | } 111 | ``` 112 | 113 | ### Claude VSCode Extension 114 | 115 | Add to your MCP configuration: 116 | 117 | ```json 118 | { 119 | "mcpServers": { 120 | "firecrawl": { 121 | "command": "mcp-server-firecrawl", 122 | "env": { 123 | "FIRECRAWL_API_KEY": "your-api-key" 124 | } 125 | } 126 | } 127 | } 128 | ``` 129 | 130 | ## Usage Examples 131 | 132 | ### Web Scraping 133 | 134 | ```typescript 135 | // Basic scraping 136 | { 137 | name: "scrape_url", 138 | arguments: { 139 | url: "https://example.com", 140 | formats: ["markdown"], 141 | onlyMainContent: true 142 | } 143 | } 144 | 145 | // Advanced extraction 146 | { 147 | name: "scrape_url", 148 | arguments: { 149 | url: "https://example.com/blog", 150 | jsonOptions: { 151 | prompt: "Extract article content", 152 | schema: { 153 | title: "string", 154 | content: "string" 155 | } 156 | }, 157 | mobile: true, 158 | blockAds: true 159 | } 160 | } 161 | ``` 162 | 163 | ### Site Crawling 164 | 165 | ```typescript 166 | // Basic crawling 167 | { 168 | name: "crawl", 169 | arguments: { 170 | url: "https://example.com", 171 | maxDepth: 2, 172 | limit: 100 173 | } 174 | } 175 | 176 | // Advanced crawling 177 | { 178 | name: "crawl", 179 | arguments: { 180 | url: "https://example.com", 181 | maxDepth: 3, 182 | includePaths: ["/blog", "/products"], 183 | excludePaths: ["/admin"], 184 | ignoreQueryParameters: true 185 | } 186 | } 187 | ``` 188 | 189 | ### Site Mapping 190 | 191 | ```typescript 192 | // Generate site map 193 | { 194 | name: "map", 195 | arguments: { 196 | url: "https://example.com", 197 | includeSubdomains: true, 198 | limit: 1000 199 | } 200 | } 201 | ``` 202 | 203 | ### Data Extraction 204 | 205 | ```typescript 206 | // Extract structured data 207 | { 208 | name: "extract", 209 | arguments: { 210 | urls: ["https://example.com/product1", "https://example.com/product2"], 211 | prompt: "Extract product details", 212 | schema: { 213 | name: "string", 214 | price: "number", 215 | description: "string" 216 | } 217 | } 218 | } 219 | ``` 220 | 221 | ## Configuration 222 | 223 | See [configuration guide](https://github.com/Msparihar/mcp-server-firecrawl/blob/main/docs/configuration.md) for detailed setup options. 224 | 225 | ## API Documentation 226 | 227 | See [API documentation](https://github.com/Msparihar/mcp-server-firecrawl/blob/main/docs/api.md) for detailed endpoint specifications. 
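
For example, a `search_content` call takes the same shape as the tool invocations above; the parameters here follow the input schema defined in `src/tools/search.ts`:

```typescript
// Content search
{
  name: "search_content",
  arguments: {
    query: "firecrawl web scraping",
    limit: 5,
    lang: "en",
    country: "us"
  }
}
```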
228 | 
229 | ## Development
230 | 
231 | ```bash
232 | # Install dependencies
233 | npm install
234 | 
235 | # Build
236 | npm run build
237 | 
238 | # Run tests
239 | npm test
240 | 
241 | # Start in development mode
242 | npm run dev
243 | ```
244 | 
245 | ## Examples
246 | 
247 | Check the [examples](https://github.com/Msparihar/mcp-server-firecrawl/tree/main/examples) directory for more usage examples:
248 | 
249 | - Basic scraping: [scrape.ts](https://github.com/Msparihar/mcp-server-firecrawl/blob/main/examples/scrape.ts)
250 | - Content search: [search.ts](https://github.com/Msparihar/mcp-server-firecrawl/blob/main/examples/search.ts)
251 | - Crawling and mapping: [crawl-and-map.ts](https://github.com/Msparihar/mcp-server-firecrawl/blob/main/examples/crawl-and-map.ts)
252 | 
253 | ## Error Handling
254 | 
255 | The server implements robust error handling:
256 | 
257 | - Rate limiting with exponential backoff
258 | - Automatic retries
259 | - Detailed error messages
260 | - Debug logging
261 | 
262 | ## Security
263 | 
264 | - API key protection
265 | - Request validation
266 | - Domain allowlisting
267 | - Rate limiting
268 | - Safe error messages
269 | 
270 | ## Contributing
271 | 
272 | See [CONTRIBUTING.md](https://github.com/Msparihar/mcp-server-firecrawl/blob/main/CONTRIBUTING.md) for contribution guidelines.
273 | 
274 | ## License
275 | 
276 | MIT License - see [LICENSE](https://github.com/Msparihar/mcp-server-firecrawl/blob/main/LICENSE) for details.
277 | 
```

--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------

```markdown
1 | # Contributor Covenant Code of Conduct
2 | 
3 | ## Our Pledge
4 | 
5 | We as members, contributors, and leaders pledge to make participation in our
6 | community a harassment-free experience for everyone, regardless of age, body
7 | size, visible or invisible disability, ethnicity, sex characteristics, gender
8 | identity and expression, level of experience, education, socio-economic status,
9 | nationality, personal appearance, race, religion, or sexual identity
10 | and orientation.
11 | 
12 | We pledge to act and interact in ways that contribute to an open, welcoming,
13 | diverse, inclusive, and healthy community.
14 | 15 | ## Our Standards 16 | 17 | Examples of behavior that contributes to a positive environment for our 18 | community include: 19 | 20 | * Demonstrating empathy and kindness toward other people 21 | * Being respectful of differing opinions, viewpoints, and experiences 22 | * Giving and gracefully accepting constructive feedback 23 | * Accepting responsibility and apologizing to those affected by our mistakes, 24 | and learning from the experience 25 | * Focusing on what is best not just for us as individuals, but for the 26 | overall community 27 | 28 | Examples of unacceptable behavior include: 29 | 30 | * The use of sexualized language or imagery, and sexual attention or 31 | advances of any kind 32 | * Trolling, insulting or derogatory comments, and personal or political attacks 33 | * Public or private harassment 34 | * Publishing others' private information, such as a physical or email 35 | address, without their explicit permission 36 | * Other conduct which could reasonably be considered inappropriate in a 37 | professional setting 38 | 39 | ## Enforcement Responsibilities 40 | 41 | Community leaders are responsible for clarifying and enforcing our standards of 42 | acceptable behavior and will take appropriate and fair corrective action in 43 | response to any behavior that they deem inappropriate, threatening, offensive, 44 | or harmful. 45 | 46 | ## Scope 47 | 48 | This Code of Conduct applies within all community spaces, and also applies when 49 | an individual is officially representing the community in public spaces. 50 | 51 | ## Enforcement 52 | 53 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 54 | reported to the community leaders responsible for enforcement. 55 | All complaints will be reviewed and investigated promptly and fairly. 56 | 57 | All community leaders are obligated to respect the privacy and security of the 58 | reporter of any incident. 59 | 60 | ## Attribution 61 | 62 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], 63 | version 2.0, available at 64 | https://www.contributor-covenant.org/version/2/0/code_of_conduct.html. 65 | 66 | [homepage]: https://www.contributor-covenant.org 67 | ``` -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- ```markdown 1 | # Contributing to Firecrawl MCP Server 2 | 3 | We love your input! We want to make contributing to the Firecrawl MCP server as easy and transparent as possible, whether it's: 4 | 5 | - Reporting a bug 6 | - Discussing the current state of the code 7 | - Submitting a fix 8 | - Proposing new features 9 | - Becoming a maintainer 10 | 11 | ## Development Process 12 | 13 | We use GitHub to host code, to track issues and feature requests, as well as accept pull requests. 14 | 15 | 1. Fork the repo and create your branch from `main` 16 | 2. If you've added code that should be tested, add tests 17 | 3. If you've changed APIs, update the documentation 18 | 4. Ensure the test suite passes 19 | 5. Make sure your code lints 20 | 6. Issue that pull request! 21 | 22 | ## Local Development Setup 23 | 24 | 1. Install dependencies: 25 | 26 | ```bash 27 | npm install 28 | ``` 29 | 30 | 2. Set up environment variables: 31 | 32 | ```bash 33 | export FIRECRAWL_API_KEY=your-api-key 34 | ``` 35 | 36 | 3. Start development server: 37 | 38 | ```bash 39 | npm run dev 40 | ``` 41 | 42 | ### Using Docker for Development 43 | 44 | 1. 
Start the development container:
45 | 
46 | ```bash
47 | docker-compose up mcp-server-dev
48 | ```
49 | 
50 | 2. Run tests in the container:
51 | 
52 | ```bash
53 | docker-compose up mcp-server-test
54 | ```
55 | 
56 | ## Testing
57 | 
58 | We use Jest for testing. Run the test suite with:
59 | 
60 | ```bash
61 | npm test
62 | ```
63 | 
64 | Make sure to:
65 | 
66 | - Write tests for new features
67 | - Maintain test coverage above 80%
68 | - Use meaningful test descriptions
69 | 
70 | ## Code Style
71 | 
72 | We use ESLint and Prettier to maintain code quality. Before committing:
73 | 
74 | 1. Run the linter:
75 | 
76 | ```bash
77 | npm run lint
78 | ```
79 | 
80 | 2. Format the code:
81 | 
82 | ```bash
83 | npm run format
84 | ```
85 | 
86 | ## Documentation
87 | 
88 | - Keep README.md updated
89 | - Document all new tools and configuration options
90 | - Update API documentation for changes
91 | - Include examples for new features
92 | 
93 | ## Pull Request Process
94 | 
95 | 1. Update the README.md with details of changes to the interface
96 | 2. Update the API documentation if endpoints or tools change
97 | 3. Update the version numbers following [SemVer](http://semver.org/)
98 | 4. The PR will be merged once you have the sign-off of two other developers
99 | 
100 | ## Any Contributions You Make Will Be Under the MIT Software License
101 | 
102 | In short, when you submit code changes, your submissions are understood to be under the same [MIT License](http://choosealicense.com/licenses/mit/) that covers the project. Feel free to contact the maintainers if that's a concern.
103 | 
104 | ## Report Bugs Using GitHub's [Issue Tracker](https://github.com/Msparihar/mcp-server-firecrawl/issues)
105 | 
106 | Report a bug by [opening a new issue](https://github.com/Msparihar/mcp-server-firecrawl/issues/new); it's that easy!
107 | 
108 | ## Write Bug Reports With Detail, Background, and Sample Code
109 | 
110 | **Great Bug Reports** tend to have:
111 | 
112 | - A quick summary and/or background
113 | - Steps to reproduce
114 |   - Be specific!
115 |   - Give sample code if you can
116 | - What you expected would happen
117 | - What actually happens
118 | - Notes (possibly including why you think this might be happening, or stuff you tried that didn't work)
119 | 
120 | ## License
121 | 
122 | By contributing, you agree that your contributions will be licensed under the project's MIT License.
123 | 
124 | ## References
125 | 
126 | This document was adapted from the open-source contribution guidelines for [Facebook's Draft](https://github.com/facebook/draft-js/blob/a9316a723f9e918afde44dea68b5f9f39b7d9b00/CONTRIBUTING.md).
127 | 
```

--------------------------------------------------------------------------------
/tsconfig.test.json:
--------------------------------------------------------------------------------

```json
1 | {
2 |   "extends": "./tsconfig.json",
3 |   "compilerOptions": {
4 |     "rootDir": ".",
5 |     "types": ["node", "jest", "@jest/globals"],
6 |     "typeRoots": [
7 |       "./node_modules/@types",
8 |       "./src/types",
9 |       "./tests/types"
10 |     ]
11 |   },
12 |   "include": [
13 |     "src/**/*",
14 |     "tests/**/*"
15 |   ],
16 |   "exclude": [
17 |     "node_modules",
18 |     "build"
19 |   ]
20 | }
21 | 
```

--------------------------------------------------------------------------------
/.github/workflows/lint.yml:
--------------------------------------------------------------------------------

```yaml
1 | name: Lint
2 | 
3 | on:
4 |   push:
5 |     branches: [ main ]
6 |   pull_request:
7 |     branches: [ main ]
8 | 
9 | jobs:
10 |   lint:
11 |     runs-on: ubuntu-latest
12 | 
13 |     steps:
14 |       - uses: actions/checkout@v3
15 | 
16 |       # pnpm must be installed before setup-node so the pnpm cache can be resolved
17 |       - name: Install pnpm
18 |         uses: pnpm/action-setup@v2
19 |         with:
20 |           version: 8
21 | 
22 |       - name: Use Node.js
23 |         uses: actions/setup-node@v3
24 |         with:
25 |           node-version: 18.x
26 |           cache: 'pnpm'
27 | 
28 |       - name: Install dependencies
29 |         run: pnpm install
30 | 
31 |       - name: Run linting
32 |         run: pnpm run lint
33 | 
```

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------

```dockerfile
1 | # Build stage
2 | FROM node:18-alpine AS builder
3 | WORKDIR /app
4 | 
5 | # Copy package files
6 | COPY package*.json ./
7 | COPY tsconfig.json ./
8 | 
9 | # Install dependencies
10 | RUN npm ci
11 | 
12 | # Copy source code
13 | COPY src/ ./src/
14 | 
15 | # Build TypeScript code
16 | RUN npm run build
17 | 
18 | # Production stage
19 | FROM node:18-alpine
20 | WORKDIR /app
21 | 
22 | # Copy package files and built code
23 | COPY package*.json ./
24 | COPY --from=builder /app/build ./build
25 | 
26 | # Install production dependencies only
27 | RUN npm ci --omit=dev
28 | 
29 | # Set environment variables
30 | ENV NODE_ENV=production
31 | 
32 | # Execute MCP server
33 | CMD ["node", "build/index.js"]
34 | 
```

--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------

```yaml
1 | # Smithery configuration file: https://smithery.ai/docs/config#smitheryyaml
2 | 
3 | startCommand:
4 |   type: stdio
5 |   configSchema:
6 |     # JSON Schema defining the configuration options for the MCP.
7 |     type: object
8 |     required:
9 |       - firecrawlApiKey
10 |     properties:
11 |       firecrawlApiKey:
12 |         type: string
13 |         description: The API key for the Firecrawl API.
14 |   commandFunction:
15 |     # A function that produces the CLI command to start the MCP on stdio.
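    # It receives the validated config object; the key is forwarded to the server process as FIRECRAWL_API_KEY.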
16 | |- 17 | config => ({command: 'node', args: ['build/index.js'], env: {FIRECRAWL_API_KEY: config.firecrawlApiKey}}) 18 | ``` -------------------------------------------------------------------------------- /tests/jest-setup.ts: -------------------------------------------------------------------------------- ```typescript 1 | import { jest, expect } from "@jest/globals"; 2 | import type { ScrapeUrlArgs } from "../src/types"; 3 | 4 | declare global { 5 | namespace jest { 6 | interface Matchers<R> { 7 | toHaveBeenCalledWithUrl(url: string): R; 8 | } 9 | } 10 | } 11 | 12 | expect.extend({ 13 | toHaveBeenCalledWithUrl(received: jest.Mock, url: string) { 14 | const calls = received.mock.calls; 15 | const urlCalls = calls.some((call) => { 16 | const arg = call[0] as ScrapeUrlArgs; 17 | return arg && arg.url === url; 18 | }); 19 | 20 | return { 21 | pass: urlCalls, 22 | message: () => 23 | `expected ${received.getMockName()} to have been called with URL ${url}`, 24 | }; 25 | }, 26 | }); 27 | 28 | // Configure Jest globals 29 | global.jest = jest; 30 | ``` -------------------------------------------------------------------------------- /tsconfig.json: -------------------------------------------------------------------------------- ```json 1 | { 2 | "compilerOptions": { 3 | "target": "ES2020", 4 | "module": "ES2020", 5 | "moduleResolution": "node", 6 | "esModuleInterop": true, 7 | "allowJs": true, 8 | "checkJs": true, 9 | "declaration": true, 10 | "declarationMap": true, 11 | "sourceMap": true, 12 | "strict": true, 13 | "skipLibCheck": true, 14 | "forceConsistentCasingInFileNames": true, 15 | "outDir": "build", 16 | "rootDir": "src", 17 | "baseUrl": ".", 18 | "paths": { 19 | "*": ["node_modules/*", "src/types/*"] 20 | }, 21 | "typeRoots": [ 22 | "./node_modules/@types", 23 | "./src/types" 24 | ], 25 | "types": ["node", "jest"], 26 | "resolveJsonModule": true 27 | }, 28 | "include": [ 29 | "src/**/*" 30 | ], 31 | "exclude": [ 32 | "node_modules", 33 | "build", 34 | "tests", 35 | "**/*.test.ts" 36 | ] 37 | } 38 | ``` -------------------------------------------------------------------------------- /jest.config.js: -------------------------------------------------------------------------------- ```javascript 1 | /** @type {import('ts-jest').JestConfigWithTsJest} */ 2 | export default { 3 | preset: "ts-jest", 4 | testEnvironment: "node", 5 | extensionsToTreatAsEsm: [".ts"], 6 | moduleNameMapper: { 7 | "^(\\.{1,2}/.*)\\.js$": "$1", 8 | }, 9 | transform: { 10 | "^.+\\.tsx?$": [ 11 | "ts-jest", 12 | { 13 | useESM: true, 14 | }, 15 | ], 16 | }, 17 | setupFilesAfterEnv: ["<rootDir>/tests/jest-setup.ts"], 18 | testMatch: ["**/tests/**/*.test.ts"], 19 | collectCoverage: true, 20 | coverageDirectory: "coverage", 21 | coveragePathIgnorePatterns: ["/node_modules/", "/tests/", "/build/"], 22 | coverageThreshold: { 23 | global: { 24 | branches: 80, 25 | functions: 80, 26 | lines: 80, 27 | statements: 80, 28 | }, 29 | }, 30 | globals: { 31 | "ts-jest": { 32 | useESM: true, 33 | tsconfig: { 34 | // Override tsconfig for tests 35 | moduleResolution: "node", 36 | esModuleInterop: true, 37 | allowJs: true, 38 | checkJs: true, 39 | strict: true, 40 | types: ["node", "jest", "@jest/globals"], 41 | typeRoots: ["./node_modules/@types", "./src/types", "./tests/types"], 42 | }, 43 | }, 44 | }, 45 | }; 46 | ``` -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- ```yaml 1 | version: '3.8' 2 | 3 | 
services:
4 |   # Production service
5 |   mcp-server:
6 |     build:
7 |       context: .
8 |       dockerfile: Dockerfile
9 |     environment:
10 |       - NODE_ENV=production
11 |       - FIRECRAWL_API_KEY=${FIRECRAWL_API_KEY}
12 |       - FIRECRAWL_TIMEOUT=30000
13 |       - FIRECRAWL_MAX_RETRIES=3
14 |     restart: unless-stopped
15 |     stdin_open: true # Required for stdio transport
16 |     tty: true
17 | 
18 |   # Development service with hot-reload
19 |   mcp-server-dev:
20 |     build:
21 |       context: .
22 |       dockerfile: Dockerfile
23 |       target: builder
24 |     command: npm run dev
25 |     environment:
26 |       - NODE_ENV=development
27 |       - FIRECRAWL_API_KEY=${FIRECRAWL_API_KEY}
28 |       - FIRECRAWL_TIMEOUT=30000
29 |       - FIRECRAWL_MAX_RETRIES=3
30 |       - DEBUG=true
31 |     volumes:
32 |       - ./src:/app/src
33 |       - ./tests:/app/tests
34 |     stdin_open: true
35 |     tty: true
36 | 
37 |   # Test service
38 |   mcp-server-test:
39 |     build:
40 |       context: .
41 |       dockerfile: Dockerfile
42 |       target: builder
43 |     command: npm test
44 |     environment:
45 |       - NODE_ENV=test
46 |       - FIRECRAWL_API_KEY=test-api-key
47 |     volumes:
48 |       - ./src:/app/src
49 |       - ./tests:/app/tests
50 |       - ./coverage:/app/coverage
51 | 
```

--------------------------------------------------------------------------------
/.github/workflows/ci.yml:
--------------------------------------------------------------------------------

```yaml
1 | name: CI
2 | 
3 | on:
4 |   release:
5 |     types: [created]
6 | 
7 | jobs:
8 |   test:
9 |     runs-on: ubuntu-latest
10 |     strategy:
11 |       matrix:
12 |         node-version: [18.x, 20.x]
13 | 
14 |     steps:
15 |       - uses: actions/checkout@v3
16 | 
17 |       # pnpm must be installed before setup-node so the pnpm cache can be resolved
18 |       - name: Install pnpm
19 |         uses: pnpm/action-setup@v2
20 |         with:
21 |           version: 8
22 | 
23 |       - name: Use Node.js ${{ matrix.node-version }}
24 |         uses: actions/setup-node@v3
25 |         with:
26 |           node-version: ${{ matrix.node-version }}
27 |           cache: 'pnpm'
28 | 
29 |       - name: Install dependencies
30 |         run: pnpm install
31 | 
32 |       - name: Run tests
33 |         run: pnpm test
34 |         env:
35 |           FIRECRAWL_API_KEY: test-api-key
36 | 
37 |       - name: Upload coverage reports
38 |         uses: codecov/codecov-action@v3
39 |         with:
40 |           token: ${{ secrets.CODECOV_TOKEN }}
41 | 
42 |   build:
43 |     needs: test
44 |     runs-on: ubuntu-latest
45 | 
46 |     steps:
47 |       - uses: actions/checkout@v3
48 | 
49 |       - name: Install pnpm
50 |         uses: pnpm/action-setup@v2
51 |         with:
52 |           version: 8
53 | 
54 |       - name: Use Node.js 18.x
55 |         uses: actions/setup-node@v3
56 |         with:
57 |           node-version: 18.x
58 |           cache: 'pnpm'
59 | 
60 |       - name: Install dependencies
61 |         run: pnpm install
62 | 
63 |       - name: Build
64 |         run: pnpm run build
65 | 
```

--------------------------------------------------------------------------------
/examples/search.ts:
--------------------------------------------------------------------------------

```typescript
1 | import { Client } from "@modelcontextprotocol/sdk/client/index.js";
2 | import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
3 | 
4 | async function example() {
5 |   // Create a new MCP client
6 |   const client = new Client({
7 |     name: "firecrawl-example",
8 |     version: "1.0.0",
9 |   });
10 | 
11 |   // Connect to the Firecrawl MCP server (spawns the built entry point)
12 |   const transport = new StdioClientTransport({
13 |     command: "node", args: ["build/index.js"],
14 |     env: { FIRECRAWL_API_KEY: "your-api-key-here" },
15 |   });
16 |   await client.connect(transport);
17 | 
18 |   try {
19 |     // Example 1: Basic search with default options
20 |     const result1 = await client.callTool({
21 |       name: "search_content",
22 |       arguments: {
23 |         query: "latest developments in artificial intelligence",
24 |       },
25 |     });
26 |     console.log("Basic search result:",
result1); 27 | 28 | // Example 2: Advanced search with custom options 29 | const result2 = await client.callTool({ 30 | name: "search_content", 31 | arguments: { 32 | query: "machine learning tutorials", 33 | scrapeOptions: { 34 | formats: ["markdown"], 35 | }, 36 | limit: 5, 37 | }, 38 | }); 39 | console.log("Advanced search result:", result2); 40 | } catch (error) { 41 | console.error("Error:", error); 42 | } finally { 43 | await client.close(); 44 | } 45 | } 46 | 47 | example().catch(console.error); 48 | ``` -------------------------------------------------------------------------------- /tests/setup.ts: -------------------------------------------------------------------------------- ```typescript 1 | /** 2 | * Test environment setup and configuration 3 | */ 4 | import { jest, beforeAll, afterAll } from "@jest/globals"; 5 | 6 | // Configure test environment variables 7 | process.env.FIRECRAWL_API_KEY = "test-api-key"; 8 | process.env.FIRECRAWL_API_BASE_URL = "https://api.test.firecrawl.dev/v1"; 9 | process.env.FIRECRAWL_TIMEOUT = "1000"; 10 | process.env.FIRECRAWL_MAX_RETRIES = "0"; 11 | process.env.DEBUG = "false"; 12 | 13 | // Clean up function for after tests 14 | export function cleanupEnvironment() { 15 | delete process.env.FIRECRAWL_API_KEY; 16 | delete process.env.FIRECRAWL_API_BASE_URL; 17 | delete process.env.FIRECRAWL_TIMEOUT; 18 | delete process.env.FIRECRAWL_MAX_RETRIES; 19 | delete process.env.DEBUG; 20 | } 21 | 22 | // Store original console methods 23 | const originalConsoleError = console.error; 24 | const originalConsoleWarn = console.warn; 25 | const originalConsoleLog = console.log; 26 | 27 | // Mock console methods to reduce noise in tests 28 | const mockConsole = { 29 | error: jest.fn(), 30 | warn: jest.fn(), 31 | log: jest.fn(), 32 | }; 33 | 34 | beforeAll(() => { 35 | // Replace console methods with mocks 36 | console.error = mockConsole.error; 37 | console.warn = mockConsole.warn; 38 | console.log = mockConsole.log; 39 | }); 40 | 41 | afterAll(() => { 42 | // Restore original console methods 43 | console.error = originalConsoleError; 44 | console.warn = originalConsoleWarn; 45 | console.log = originalConsoleLog; 46 | cleanupEnvironment(); 47 | }); 48 | 49 | // Export mocks for test usage 50 | export const consoleMocks = mockConsole; 51 | ``` -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- ```json 1 | { 2 | "name": "@modelcontextprotocol/mcp-server-firecrawl", 3 | "version": "1.0.0", 4 | "description": "MCP server for web scraping, content searching, and site mapping using the Firecrawl API", 5 | "keywords": [ 6 | "mcp", 7 | "model-context-protocol", 8 | "web-scraping", 9 | "search", 10 | "crawling", 11 | "site-mapping", 12 | "data-extraction", 13 | "firecrawl", 14 | "ai", 15 | "server" 16 | ], 17 | "homepage": "https://github.com/Msparihar/mcp-server-firecrawl#readme", 18 | "bugs": { 19 | "url": "https://github.com/Msparihar/mcp-server-firecrawl/issues" 20 | }, 21 | "repository": { 22 | "type": "git", 23 | "url": "git+https://github.com/Msparihar/mcp-server-firecrawl.git" 24 | }, 25 | "license": "MIT", 26 | "author": "Msparihar", 27 | "type": "module", 28 | "main": "build/index.js", 29 | "types": "./build/index.d.ts", 30 | "bin": { 31 | "mcp-server-firecrawl": "build/index.js" 32 | }, 33 | "directories": { 34 | "doc": "docs", 35 | "example": "examples", 36 | "test": "tests" 37 | }, 38 | "files": [ 39 | "build", 40 | "README.md", 41 | 
"LICENSE", 42 | "docs", 43 | "CONTRIBUTING.md", 44 | "CODE_OF_CONDUCT.md" 45 | ], 46 | "scripts": { 47 | "build": "tsc", 48 | "start": "node build/index.js", 49 | "dev": "tsc --watch", 50 | "test": "jest", 51 | "lint": "eslint . --ext .ts", 52 | "format": "prettier --write \"src/**/*.ts\"", 53 | "prepare": "npm run build", 54 | "prepublishOnly": "npm test && npm run lint", 55 | "preversion": "npm run lint", 56 | "version": "npm run format && git add -A src", 57 | "postversion": "git push && git push --tags" 58 | }, 59 | "dependencies": { 60 | "@modelcontextprotocol/sdk": "^1.4.1", 61 | "@types/node": "^22.13.1", 62 | "axios": "^1.7.9" 63 | }, 64 | "devDependencies": { 65 | "@jest/globals": "^29.7.0", 66 | "@types/jest": "^29.5.11", 67 | "@typescript-eslint/eslint-plugin": "^6.21.0", 68 | "@typescript-eslint/parser": "^6.21.0", 69 | "cross-env": "^7.0.3", 70 | "eslint": "^8.56.0", 71 | "eslint-config-prettier": "^9.1.0", 72 | "jest": "^29.7.0", 73 | "prettier": "^3.2.5", 74 | "ts-jest": "^29.1.2", 75 | "typescript": "^5.7.3" 76 | }, 77 | "engines": { 78 | "node": ">=18" 79 | } 80 | } 81 | ``` -------------------------------------------------------------------------------- /tests/types.d.ts: -------------------------------------------------------------------------------- ```typescript 1 | import type { AxiosInstance, AxiosResponse, AxiosResponseHeaders } from "axios"; 2 | import type { 3 | ScrapeUrlArgs, 4 | SearchContentArgs, 5 | CrawlArgs, 6 | MapArgs, 7 | ExtractArgs, 8 | ToolResponse, 9 | } from "../src/types"; 10 | 11 | declare global { 12 | // Extend Jest matchers 13 | namespace jest { 14 | interface Matchers<R> { 15 | toHaveBeenCalledWithUrl(url: string): R; 16 | toHaveBeenCalledWithValidArgs( 17 | type: "scrape" | "search" | "crawl" | "map" | "extract" 18 | ): R; 19 | } 20 | } 21 | } 22 | 23 | // Test-specific types 24 | export interface TestResponse<T = any> 25 | extends Omit<AxiosResponse<T>, "headers"> { 26 | data: T; 27 | status: number; 28 | headers: AxiosResponseHeaders; 29 | } 30 | 31 | // Mock function type 32 | type MockFunction<T extends (...args: any) => any> = { 33 | (...args: Parameters<T>): ReturnType<T>; 34 | mockClear: () => void; 35 | mockReset: () => void; 36 | mockImplementation: (fn: T) => MockFunction<T>; 37 | mockImplementationOnce: (fn: T) => MockFunction<T>; 38 | mockResolvedValue: <U>(value: U) => MockFunction<T>; 39 | mockResolvedValueOnce: <U>(value: U) => MockFunction<T>; 40 | mockRejectedValue: (error: unknown) => MockFunction<T>; 41 | mockRejectedValueOnce: (error: unknown) => MockFunction<T>; 42 | }; 43 | 44 | export interface MockAxiosInstance extends Omit<AxiosInstance, "get" | "post"> { 45 | get: MockFunction< 46 | <T = any>(url: string, config?: any) => Promise<TestResponse<T>> 47 | >; 48 | post: MockFunction< 49 | <T = any>(url: string, data?: any, config?: any) => Promise<TestResponse<T>> 50 | >; 51 | } 52 | 53 | export interface TestToolResponse extends ToolResponse { 54 | error?: string; 55 | metadata?: Record<string, unknown>; 56 | } 57 | 58 | export interface TestArgs { 59 | scrape: ScrapeUrlArgs; 60 | search: SearchContentArgs; 61 | crawl: CrawlArgs; 62 | map: MapArgs; 63 | extract: ExtractArgs; 64 | } 65 | 66 | // Helper type for resolved promise types 67 | type ResolvedType<T> = T extends Promise<infer R> ? 
R : T; 68 | 69 | // Console mock types 70 | export interface ConsoleMocks { 71 | error: MockFunction<typeof console.error>; 72 | warn: MockFunction<typeof console.warn>; 73 | log: MockFunction<typeof console.log>; 74 | } 75 | 76 | // Environment variable types 77 | export interface TestEnvironment { 78 | FIRECRAWL_API_KEY: string; 79 | FIRECRAWL_API_BASE_URL: string; 80 | FIRECRAWL_TIMEOUT: string; 81 | FIRECRAWL_MAX_RETRIES: string; 82 | DEBUG: string; 83 | } 84 | 85 | // Extend the Jest namespace 86 | declare global { 87 | namespace jest { 88 | type Mock<T extends (...args: any[]) => any> = MockFunction<T>; 89 | } 90 | } 91 | ``` -------------------------------------------------------------------------------- /examples/scrape.ts: -------------------------------------------------------------------------------- ```typescript 1 | /** 2 | * Example demonstrating how to use the scrape_url tool 3 | * 4 | * This example shows different ways to configure and use the scraping functionality, 5 | * including basic scraping, structured data extraction, and mobile-optimized scraping. 6 | */ 7 | 8 | import { ScrapeTool } from "../src/tools/scrape.js"; 9 | import { DEFAULT_ERROR_CONFIG } from "../src/error-handling.js"; 10 | import axios from "axios"; 11 | 12 | async function main() { 13 | // Create a test axios instance 14 | const axiosInstance = axios.create({ 15 | baseURL: process.env.FIRECRAWL_API_BASE_URL || "https://api.firecrawl.dev/v1", 16 | headers: { 17 | Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`, 18 | "Content-Type": "application/json", 19 | }, 20 | }); 21 | 22 | // Initialize the scrape tool 23 | const scrapeTool = new ScrapeTool({ 24 | axiosInstance, 25 | errorConfig: DEFAULT_ERROR_CONFIG, 26 | }); 27 | 28 | try { 29 | // Basic scraping example 30 | console.log("Basic scraping example:"); 31 | const basicResult = await scrapeTool.execute({ 32 | url: "https://example.com", 33 | formats: ["markdown"], 34 | onlyMainContent: true, 35 | blockAds: true 36 | }); 37 | console.log(JSON.stringify(basicResult, null, 2)); 38 | 39 | // Advanced scraping with structured data extraction 40 | console.log("\nAdvanced scraping example:"); 41 | const advancedResult = await scrapeTool.execute({ 42 | url: "https://example.com/blog", 43 | jsonOptions: { 44 | prompt: "Extract the article title, author, date, and main content", 45 | schema: { 46 | type: "object", 47 | properties: { 48 | title: { type: "string" }, 49 | author: { type: "string" }, 50 | date: { type: "string" }, 51 | content: { type: "string" } 52 | }, 53 | required: ["title", "content"] 54 | } 55 | }, 56 | formats: ["markdown", "json"], 57 | mobile: true, 58 | location: { 59 | country: "US", 60 | languages: ["en-US"] 61 | }, 62 | waitFor: 2000, 63 | blockAds: true 64 | }); 65 | console.log(JSON.stringify(advancedResult, null, 2)); 66 | 67 | // Mobile-optimized scraping 68 | console.log("\nMobile scraping example:"); 69 | const mobileResult = await scrapeTool.execute({ 70 | url: "https://example.com/store", 71 | mobile: true, 72 | formats: ["markdown"], 73 | includeTags: ["article", "main", "product"], 74 | excludeTags: ["nav", "footer", "ads"], 75 | removeBase64Images: true 76 | }); 77 | console.log(JSON.stringify(mobileResult, null, 2)); 78 | 79 | } catch (error) { 80 | console.error("Error running examples:", error); 81 | process.exit(1); 82 | } 83 | } 84 | 85 | // Check for API key 86 | if (!process.env.FIRECRAWL_API_KEY) { 87 | console.error("Error: FIRECRAWL_API_KEY environment variable is required"); 88 | console.error("Please 
set it before running the example:"); 89 | console.error("export FIRECRAWL_API_KEY=your-api-key"); 90 | process.exit(1); 91 | } 92 | 93 | main().catch((error) => { 94 | console.error("Unhandled error:", error); 95 | process.exit(1); 96 | }); 97 | ``` -------------------------------------------------------------------------------- /tests/index.test.ts: -------------------------------------------------------------------------------- ```typescript 1 | import { Server } from "@modelcontextprotocol/sdk/server/index.js"; 2 | import { ListToolsRequestSchema } from "@modelcontextprotocol/sdk/types.js"; 3 | 4 | describe("Firecrawl MCP Server Structure", () => { 5 | let server: Server; 6 | 7 | beforeEach(() => { 8 | server = new Server( 9 | { 10 | name: "firecrawl", 11 | version: "1.0.0", 12 | }, 13 | { 14 | capabilities: { 15 | tools: {}, 16 | }, 17 | } 18 | ); 19 | }); 20 | 21 | describe("Tool Schema Validation", () => { 22 | it("should define required tools", async () => { 23 | const handler = server["requestHandlers"].get(ListToolsRequestSchema.name); 24 | expect(handler).toBeDefined(); 25 | if (!handler) throw new Error("Handler not found"); 26 | 27 | const result = await handler({ 28 | schema: ListToolsRequestSchema.name, 29 | params: {}, 30 | }); 31 | 32 | const tools = result.tools; 33 | expect(tools).toHaveLength(2); 34 | 35 | const scrapeUrlTool = tools.find((t) => t.name === "scrape_url"); 36 | expect(scrapeUrlTool).toBeDefined(); 37 | expect(scrapeUrlTool?.inputSchema.required).toContain("url"); 38 | 39 | const searchContentTool = tools.find((t) => t.name === "search_content"); 40 | expect(searchContentTool).toBeDefined(); 41 | expect(searchContentTool?.inputSchema.required).toContain("query"); 42 | }); 43 | 44 | it("should have valid schema for scrape_url tool", async () => { 45 | const handler = server["requestHandlers"].get(ListToolsRequestSchema.name); 46 | if (!handler) throw new Error("Handler not found"); 47 | 48 | const result = await handler({ 49 | schema: ListToolsRequestSchema.name, 50 | params: {}, 51 | }); 52 | 53 | const tool = result.tools.find((t) => t.name === "scrape_url"); 54 | expect(tool).toBeDefined(); 55 | expect(tool?.inputSchema.properties).toHaveProperty("url"); 56 | expect(tool?.inputSchema.properties).toHaveProperty("jsonOptions"); 57 | expect(tool?.inputSchema.properties).toHaveProperty("formats"); 58 | expect(tool?.inputSchema.properties).toHaveProperty("blockAds"); 59 | }); 60 | 61 | it("should have valid schema for search_content tool", async () => { 62 | const handler = server["requestHandlers"].get(ListToolsRequestSchema.name); 63 | if (!handler) throw new Error("Handler not found"); 64 | 65 | const result = await handler({ 66 | schema: ListToolsRequestSchema.name, 67 | params: {}, 68 | }); 69 | 70 | const tool = result.tools.find((t) => t.name === "search_content"); 71 | expect(tool).toBeDefined(); 72 | expect(tool?.inputSchema.properties).toHaveProperty("query"); 73 | expect(tool?.inputSchema.properties).toHaveProperty("scrapeOptions"); 74 | expect(tool?.inputSchema.properties).toHaveProperty("limit"); 75 | }); 76 | }); 77 | 78 | describe("Environment Validation", () => { 79 | it("should check for required environment variables", () => { 80 | expect(() => { 81 | process.env.FIRECRAWL_API_KEY = ""; 82 | // This will throw due to missing API key 83 | require("../src/index.js"); 84 | }).toThrow("FIRECRAWL_API_KEY environment variable is required"); 85 | }); 86 | }); 87 | }); 88 | ``` 
-------------------------------------------------------------------------------- /src/tools/search.ts: -------------------------------------------------------------------------------- ```typescript 1 | import { AxiosInstance } from "axios"; 2 | import { ErrorHandlingConfig, retryRequest } from "../error-handling.js"; 3 | import { SearchContentArgs } from "../types.js"; 4 | 5 | /** 6 | * Options for configuring the search tool 7 | */ 8 | export interface SearchToolOptions { 9 | /** Axios instance for making requests */ 10 | axiosInstance: AxiosInstance; 11 | /** Error handling configuration */ 12 | errorConfig: ErrorHandlingConfig; 13 | } 14 | 15 | /** 16 | * Handles content search operations 17 | */ 18 | export class SearchTool { 19 | private axiosInstance: AxiosInstance; 20 | private errorConfig: ErrorHandlingConfig; 21 | 22 | constructor(options: SearchToolOptions) { 23 | this.axiosInstance = options.axiosInstance; 24 | this.errorConfig = options.errorConfig; 25 | } 26 | 27 | /** 28 | * Get the tool definition for registration 29 | */ 30 | getDefinition() { 31 | return { 32 | name: "search_content", 33 | description: "Search content using Firecrawl API", 34 | inputSchema: { 35 | type: "object", 36 | properties: { 37 | query: { 38 | type: "string", 39 | description: "Search query", 40 | }, 41 | scrapeOptions: { 42 | type: "object", 43 | properties: { 44 | formats: { 45 | type: "array", 46 | items: { 47 | type: "string", 48 | enum: ["markdown"], 49 | }, 50 | description: "Output formats", 51 | }, 52 | }, 53 | }, 54 | limit: { 55 | type: "number", 56 | description: "Maximum number of results", 57 | minimum: 1, 58 | maximum: 100, 59 | }, 60 | lang: { 61 | type: "string", 62 | description: "Language code", 63 | default: "en", 64 | }, 65 | country: { 66 | type: "string", 67 | description: "Country code", 68 | default: "us", 69 | }, 70 | location: { 71 | type: "string", 72 | description: "Location parameter", 73 | }, 74 | timeout: { 75 | type: "number", 76 | description: "Request timeout in milliseconds", 77 | default: 60000, 78 | }, 79 | }, 80 | required: ["query"], 81 | }, 82 | }; 83 | } 84 | 85 | /** 86 | * Execute the search operation 87 | */ 88 | async execute(args: SearchContentArgs) { 89 | const response = await retryRequest( 90 | () => this.axiosInstance.post("/search", args), 91 | this.errorConfig 92 | ); 93 | 94 | return { 95 | content: [ 96 | { 97 | type: "text", 98 | text: JSON.stringify(response.data, null, 2), 99 | }, 100 | ], 101 | }; 102 | } 103 | 104 | /** 105 | * Validate the search operation arguments 106 | */ 107 | validate(args: unknown): args is SearchContentArgs { 108 | if (typeof args !== "object" || args === null) { 109 | return false; 110 | } 111 | 112 | const { query, scrapeOptions, limit } = args as any; 113 | 114 | if (typeof query !== "string") { 115 | return false; 116 | } 117 | 118 | if (scrapeOptions !== undefined) { 119 | if ( 120 | typeof scrapeOptions !== "object" || 121 | scrapeOptions === null || 122 | (scrapeOptions.formats !== undefined && 123 | (!Array.isArray(scrapeOptions.formats) || 124 | !scrapeOptions.formats.every((f: any) => typeof f === "string"))) 125 | ) { 126 | return false; 127 | } 128 | } 129 | 130 | if (limit !== undefined && typeof limit !== "number") { 131 | return false; 132 | } 133 | 134 | return true; 135 | } 136 | 137 | /** 138 | * Process search results with optional formatting 139 | * @private 140 | */ 141 | private processResults(results: any[], format = "markdown") { 142 | if (format === "markdown") { 143 | return 
results.map((result) => ({ 144 | title: result.title, 145 | url: result.url, 146 | snippet: result.snippet, 147 | content: result.content, 148 | })); 149 | } 150 | return results; 151 | } 152 | } 153 | ``` -------------------------------------------------------------------------------- /examples/crawl-and-map.ts: -------------------------------------------------------------------------------- ```typescript 1 | /** 2 | * Example demonstrating the use of crawl and map tools 3 | * 4 | * This example shows how to: 5 | * 1. Crawl a website with various configurations 6 | * 2. Map site structure 7 | * 3. Extract data from multiple pages 8 | * 9 | * To run this example: 10 | * 1. Set your API key: export FIRECRAWL_API_KEY=your-api-key 11 | * 2. Build the project: npm run build 12 | * 3. Run the example: ts-node examples/crawl-and-map.ts 13 | */ 14 | 15 | import { CrawlTool } from "../src/tools/crawl.js"; 16 | import { MapTool } from "../src/tools/map.js"; 17 | import { ExtractTool } from "../src/tools/extract.js"; 18 | import { DEFAULT_ERROR_CONFIG } from "../src/error-handling.js"; 19 | import axios from "axios"; 20 | 21 | async function main() { 22 | // Create a test axios instance 23 | const axiosInstance = axios.create({ 24 | baseURL: process.env.FIRECRAWL_API_BASE_URL || "https://api.firecrawl.dev/v1", 25 | headers: { 26 | Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`, 27 | "Content-Type": "application/json", 28 | }, 29 | }); 30 | 31 | // Initialize tools 32 | const toolOptions = { 33 | axiosInstance, 34 | errorConfig: DEFAULT_ERROR_CONFIG, 35 | }; 36 | 37 | const crawlTool = new CrawlTool(toolOptions); 38 | const mapTool = new MapTool(toolOptions); 39 | const extractTool = new ExtractTool(toolOptions); 40 | 41 | try { 42 | // Basic crawling example 43 | console.log("Basic crawling example:"); 44 | const basicCrawlResult = await crawlTool.execute({ 45 | url: "https://example.com", 46 | maxDepth: 2, 47 | limit: 10, 48 | ignoreSitemap: false 49 | }); 50 | console.log(JSON.stringify(basicCrawlResult, null, 2)); 51 | 52 | // Advanced crawling with filters 53 | console.log("\nAdvanced crawling example:"); 54 | const advancedCrawlResult = await crawlTool.execute({ 55 | url: "https://example.com", 56 | maxDepth: 3, 57 | excludePaths: ["/admin", "/private", "/login"], 58 | includePaths: ["/blog", "/products"], 59 | ignoreQueryParameters: true, 60 | limit: 20, 61 | allowBackwardLinks: false, 62 | allowExternalLinks: false, 63 | scrapeOptions: { 64 | formats: ["markdown"] 65 | } 66 | }); 67 | console.log(JSON.stringify(advancedCrawlResult, null, 2)); 68 | 69 | // Site mapping example 70 | console.log("\nSite mapping example:"); 71 | const mapResult = await mapTool.execute({ 72 | url: "https://example.com", 73 | includeSubdomains: true, 74 | limit: 100, 75 | ignoreSitemap: false 76 | }); 77 | console.log(JSON.stringify(mapResult, null, 2)); 78 | 79 | // Targeted mapping with search 80 | console.log("\nTargeted mapping example:"); 81 | const targetedMapResult = await mapTool.execute({ 82 | url: "https://example.com", 83 | search: "products", 84 | sitemapOnly: true, 85 | limit: 50 86 | }); 87 | console.log(JSON.stringify(targetedMapResult, null, 2)); 88 | 89 | // Extract data from crawled pages 90 | console.log("\nBulk data extraction example:"); 91 | const extractResult = await extractTool.execute({ 92 | urls: [ 93 | "https://example.com/products/1", 94 | "https://example.com/products/2", 95 | "https://example.com/products/3" 96 | ], 97 | prompt: "Extract product information including name, 
price, and description", 98 | schema: { 99 | type: "object", 100 | properties: { 101 | name: { type: "string" }, 102 | price: { type: "number" }, 103 | description: { type: "string" }, 104 | specifications: { 105 | type: "object", 106 | additionalProperties: true 107 | } 108 | }, 109 | required: ["name", "price"] 110 | }, 111 | enableWebSearch: false 112 | }); 113 | console.log(JSON.stringify(extractResult, null, 2)); 114 | 115 | } catch (error) { 116 | console.error("Error running examples:", error); 117 | process.exit(1); 118 | } 119 | } 120 | 121 | // Check for API key 122 | if (!process.env.FIRECRAWL_API_KEY) { 123 | console.error("Error: FIRECRAWL_API_KEY environment variable is required"); 124 | console.error("Please set it before running the example:"); 125 | console.error("export FIRECRAWL_API_KEY=your-api-key"); 126 | process.exit(1); 127 | } 128 | 129 | main().catch((error) => { 130 | console.error("Unhandled error:", error); 131 | process.exit(1); 132 | }); 133 | ``` -------------------------------------------------------------------------------- /src/error-handling.ts: -------------------------------------------------------------------------------- ```typescript 1 | import { ErrorCode, McpError } from "@modelcontextprotocol/sdk/types.js"; 2 | import type { AxiosError } from "axios"; 3 | import axios from "axios"; 4 | 5 | /** 6 | * Types of errors that can occur in Firecrawl operations 7 | */ 8 | export enum FirecrawlErrorType { 9 | /** Rate limit exceeded */ 10 | RateLimit = "RATE_LIMIT", 11 | /** Invalid input parameters */ 12 | InvalidInput = "INVALID_INPUT", 13 | /** Network or connection error */ 14 | NetworkError = "NETWORK_ERROR", 15 | /** API-related error */ 16 | APIError = "API_ERROR", 17 | } 18 | 19 | /** 20 | * Configuration for error handling and retries 21 | */ 22 | export interface ErrorHandlingConfig { 23 | /** Maximum number of retry attempts */ 24 | maxRetries: number; 25 | /** Initial delay between retries in milliseconds */ 26 | retryDelay: number; 27 | /** Multiplier for exponential backoff */ 28 | backoffMultiplier: number; 29 | /** Maximum delay between retries in milliseconds */ 30 | maxBackoff: number; 31 | /** Enable debug logging */ 32 | debug: boolean; 33 | } 34 | 35 | /** 36 | * Default configuration for error handling 37 | */ 38 | export const DEFAULT_ERROR_CONFIG: ErrorHandlingConfig = { 39 | maxRetries: 3, 40 | retryDelay: 1000, 41 | backoffMultiplier: 2, 42 | maxBackoff: 8000, 43 | debug: false, 44 | }; 45 | 46 | interface ApiErrorResponse { 47 | message?: string; 48 | error?: string; 49 | code?: string; 50 | } 51 | 52 | /** 53 | * Converts API errors to standardized MCP errors 54 | */ 55 | export const handleError = (error: unknown, debug = false): McpError => { 56 | if (debug) { 57 | console.error("[Debug] Error details:", error); 58 | } 59 | 60 | if (axios.isAxiosError(error)) { 61 | const axiosError = error as AxiosError<ApiErrorResponse>; 62 | const status = axiosError.response?.status; 63 | const responseData = axiosError.response?.data; 64 | const message = 65 | responseData?.message || responseData?.error || axiosError.message; 66 | 67 | switch (status) { 68 | case 429: 69 | return new McpError( 70 | ErrorCode.InvalidRequest, 71 | `Rate limit exceeded: ${message}` 72 | ); 73 | case 401: 74 | return new McpError( 75 | ErrorCode.InvalidParams, 76 | `Invalid API key: ${message}` 77 | ); 78 | case 400: 79 | return new McpError( 80 | ErrorCode.InvalidParams, 81 | `Invalid request: ${message}` 82 | ); 83 | case 404: 84 | return new McpError( 85 
| ErrorCode.InvalidRequest, 86 | `Resource not found: ${message}` 87 | ); 88 | default: 89 | return new McpError(ErrorCode.InternalError, `API error: ${message}`); 90 | } 91 | } 92 | 93 | if (error instanceof Error) { 94 | return new McpError(ErrorCode.InternalError, error.message); 95 | } 96 | 97 | return new McpError(ErrorCode.InternalError, "An unknown error occurred"); 98 | }; 99 | 100 | /** 101 | * Determines if a request should be retried based on the error 102 | */ 103 | export const shouldRetry = ( 104 | error: unknown, 105 | retryCount: number, 106 | config: ErrorHandlingConfig 107 | ): boolean => { 108 | if (retryCount >= config.maxRetries) { 109 | return false; 110 | } 111 | 112 | if (axios.isAxiosError(error)) { 113 | const status = error.response?.status; 114 | return ( 115 | status === 429 || // Rate limit 116 | status === 500 || // Server error 117 | error.code === "ECONNABORTED" || // Timeout 118 | error.code === "ECONNRESET" // Connection reset 119 | ); 120 | } 121 | 122 | return false; 123 | }; 124 | 125 | /** 126 | * Calculates the delay for the next retry attempt 127 | */ 128 | export const calculateRetryDelay = ( 129 | retryCount: number, 130 | config: ErrorHandlingConfig 131 | ): number => { 132 | const delay = 133 | config.retryDelay * Math.pow(config.backoffMultiplier, retryCount - 1); 134 | return Math.min(delay, config.maxBackoff); 135 | }; 136 | 137 | /** 138 | * Retries a failed request with exponential backoff 139 | */ 140 | export const retryRequest = async <T>( 141 | requestFn: () => Promise<T>, 142 | config: ErrorHandlingConfig = DEFAULT_ERROR_CONFIG 143 | ): Promise<T> => { 144 | let retryCount = 0; 145 | let lastError: unknown; 146 | 147 | do { 148 | try { 149 | return await requestFn(); 150 | } catch (error) { 151 | lastError = error; 152 | 153 | if (!shouldRetry(error, retryCount, config)) { 154 | throw handleError(error, config.debug); 155 | } 156 | 157 | retryCount++; 158 | const delay = calculateRetryDelay(retryCount, config); 159 | 160 | if (config.debug) { 161 | console.error( 162 | `[Debug] Retry ${retryCount}/${config.maxRetries}, waiting ${delay}ms` 163 | ); 164 | } 165 | 166 | await new Promise((resolve) => setTimeout(resolve, delay)); 167 | } 168 | } while (retryCount <= config.maxRetries); 169 | 170 | throw handleError(lastError, config.debug); 171 | }; 172 | ``` -------------------------------------------------------------------------------- /src/tools/scrape.ts: -------------------------------------------------------------------------------- ```typescript 1 | import { AxiosInstance } from "axios"; 2 | import { ErrorHandlingConfig, retryRequest } from "../error-handling.js"; 3 | import { ScrapeUrlArgs } from "../types.js"; 4 | 5 | /** 6 | * Options for configuring the scrape tool 7 | */ 8 | export interface ScrapeToolOptions { 9 | /** Axios instance for making requests */ 10 | axiosInstance: AxiosInstance; 11 | /** Error handling configuration */ 12 | errorConfig: ErrorHandlingConfig; 13 | } 14 | 15 | /** 16 | * Handles content scraping operations 17 | */ 18 | export class ScrapeTool { 19 | private axiosInstance: AxiosInstance; 20 | private errorConfig: ErrorHandlingConfig; 21 | 22 | constructor(options: ScrapeToolOptions) { 23 | this.axiosInstance = options.axiosInstance; 24 | this.errorConfig = options.errorConfig; 25 | } 26 | 27 | /** 28 | * Get the tool definition for registration 29 | */ 30 | getDefinition() { 31 | return { 32 | name: "scrape_url", 33 | description: "Scrape content from a URL using Firecrawl API", 34 | inputSchema: { 35 
| type: "object", 36 | properties: { 37 | url: { 38 | type: "string", 39 | description: "URL to scrape", 40 | }, 41 | jsonOptions: { 42 | type: "object", 43 | properties: { 44 | prompt: { 45 | type: "string", 46 | description: "Prompt for extracting specific information", 47 | }, 48 | schema: { 49 | type: "object", 50 | description: "Schema for extraction", 51 | }, 52 | systemPrompt: { 53 | type: "string", 54 | description: "System prompt for extraction", 55 | }, 56 | }, 57 | }, 58 | formats: { 59 | type: "array", 60 | items: { 61 | type: "string", 62 | enum: [ 63 | "markdown", 64 | "html", 65 | "rawHtml", 66 | "links", 67 | "screenshot", 68 | "screenshot@fullPage", 69 | "json", 70 | ], 71 | }, 72 | description: "Output formats", 73 | }, 74 | onlyMainContent: { 75 | type: "boolean", 76 | description: 77 | "Only return main content excluding headers, navs, footers", 78 | default: true, 79 | }, 80 | includeTags: { 81 | type: "array", 82 | items: { type: "string" }, 83 | description: "Tags to include in output", 84 | }, 85 | excludeTags: { 86 | type: "array", 87 | items: { type: "string" }, 88 | description: "Tags to exclude from output", 89 | }, 90 | waitFor: { 91 | type: "number", 92 | description: "Delay in milliseconds before fetching content", 93 | default: 0, 94 | }, 95 | mobile: { 96 | type: "boolean", 97 | description: "Emulate mobile device", 98 | default: false, 99 | }, 100 | location: { 101 | type: "object", 102 | properties: { 103 | country: { 104 | type: "string", 105 | description: "ISO 3166-1 alpha-2 country code", 106 | }, 107 | languages: { 108 | type: "array", 109 | items: { type: "string" }, 110 | description: "Preferred languages/locales", 111 | }, 112 | }, 113 | }, 114 | blockAds: { 115 | type: "boolean", 116 | description: "Enable ad/cookie popup blocking", 117 | default: true, 118 | }, 119 | }, 120 | required: ["url"], 121 | }, 122 | }; 123 | } 124 | 125 | /** 126 | * Execute the scrape operation 127 | */ 128 | async execute(args: ScrapeUrlArgs) { 129 | const response = await retryRequest( 130 | () => this.axiosInstance.post("/scrape", args), 131 | this.errorConfig 132 | ); 133 | 134 | return { 135 | content: [ 136 | { 137 | type: "text", 138 | text: JSON.stringify(response.data, null, 2), 139 | }, 140 | ], 141 | }; 142 | } 143 | 144 | /** 145 | * Validate the scrape operation arguments 146 | */ 147 | validate(args: unknown): args is ScrapeUrlArgs { 148 | if (typeof args !== "object" || args === null) { 149 | return false; 150 | } 151 | 152 | const { url, jsonOptions, formats } = args as any; 153 | 154 | if (typeof url !== "string") { 155 | return false; 156 | } 157 | 158 | if (jsonOptions !== undefined) { 159 | if ( 160 | typeof jsonOptions !== "object" || 161 | jsonOptions === null || 162 | typeof jsonOptions.prompt !== "string" 163 | ) { 164 | return false; 165 | } 166 | } 167 | 168 | if (formats !== undefined) { 169 | if ( 170 | !Array.isArray(formats) || 171 | !formats.every((f) => typeof f === "string") 172 | ) { 173 | return false; 174 | } 175 | } 176 | 177 | return true; 178 | } 179 | } 180 | ``` -------------------------------------------------------------------------------- /src/tools/map.ts: -------------------------------------------------------------------------------- ```typescript 1 | import { AxiosInstance } from "axios"; 2 | import { ErrorHandlingConfig, retryRequest } from "../error-handling.js"; 3 | import { MapArgs } from "../types.js"; 4 | 5 | /** 6 | * Options for configuring the map tool 7 | */ 8 | export interface MapToolOptions { 9 | /** Axios 
instance for making requests */ 10 | axiosInstance: AxiosInstance; 11 | /** Error handling configuration */ 12 | errorConfig: ErrorHandlingConfig; 13 | } 14 | 15 | /** 16 | * Structure representing a node in the site map 17 | */ 18 | interface SiteMapNode { 19 | url: string; 20 | children?: SiteMapNode[]; 21 | } 22 | 23 | /** 24 | * Link structure for mapping 25 | */ 26 | interface SiteLink { 27 | url: string; 28 | parent?: string; 29 | } 30 | 31 | /** 32 | * Handles website structure mapping operations 33 | */ 34 | export class MapTool { 35 | private axiosInstance: AxiosInstance; 36 | private errorConfig: ErrorHandlingConfig; 37 | 38 | constructor(options: MapToolOptions) { 39 | this.axiosInstance = options.axiosInstance; 40 | this.errorConfig = options.errorConfig; 41 | } 42 | 43 | /** 44 | * Get the tool definition for registration 45 | */ 46 | getDefinition() { 47 | return { 48 | name: "map", 49 | description: "Maps a website's structure", 50 | inputSchema: { 51 | type: "object", 52 | properties: { 53 | url: { 54 | type: "string", 55 | description: "Base URL to map", 56 | }, 57 | search: { 58 | type: "string", 59 | description: "Search query for mapping", 60 | }, 61 | ignoreSitemap: { 62 | type: "boolean", 63 | description: "Ignore sitemap.xml during mapping", 64 | }, 65 | sitemapOnly: { 66 | type: "boolean", 67 | description: "Only use sitemap.xml for mapping", 68 | }, 69 | includeSubdomains: { 70 | type: "boolean", 71 | description: "Include subdomains in mapping", 72 | }, 73 | limit: { 74 | type: "number", 75 | description: "Maximum links to return", 76 | default: 5000, 77 | }, 78 | timeout: { 79 | type: "number", 80 | description: "Request timeout", 81 | }, 82 | }, 83 | required: ["url"], 84 | }, 85 | }; 86 | } 87 | 88 | /** 89 | * Execute the map operation 90 | */ 91 | async execute(args: MapArgs) { 92 | const response = await retryRequest( 93 | () => this.axiosInstance.post("/map", args), 94 | this.errorConfig 95 | ); 96 | 97 | return { 98 | content: [ 99 | { 100 | type: "text", 101 | text: JSON.stringify(response.data, null, 2), 102 | }, 103 | ], 104 | }; 105 | } 106 | 107 | /** 108 | * Validate the map operation arguments 109 | */ 110 | validate(args: unknown): args is MapArgs { 111 | if (typeof args !== "object" || args === null) { 112 | return false; 113 | } 114 | 115 | const { 116 | url, 117 | search, 118 | limit, 119 | timeout, 120 | ignoreSitemap, 121 | sitemapOnly, 122 | includeSubdomains, 123 | } = args as any; 124 | 125 | if (typeof url !== "string") { 126 | return false; 127 | } 128 | 129 | if (search !== undefined && typeof search !== "string") { 130 | return false; 131 | } 132 | 133 | if (limit !== undefined && typeof limit !== "number") { 134 | return false; 135 | } 136 | 137 | if (timeout !== undefined && typeof timeout !== "number") { 138 | return false; 139 | } 140 | 141 | if (ignoreSitemap !== undefined && typeof ignoreSitemap !== "boolean") { 142 | return false; 143 | } 144 | 145 | if (sitemapOnly !== undefined && typeof sitemapOnly !== "boolean") { 146 | return false; 147 | } 148 | 149 | if (includeSubdomains !== undefined && typeof includeSubdomains !== "boolean") { 150 | return false; 151 | } 152 | 153 | return true; 154 | } 155 | 156 | /** 157 | * Format the mapping results into a tree structure 158 | * @private 159 | */ 160 | private formatTree(links: SiteLink[]): Record<string, SiteMapNode> { 161 | const tree: Record<string, SiteMapNode> = {}; 162 | const rootNodes = links.filter((link) => !link.parent); 163 | 164 | const buildNode = (node: SiteLink): 
SiteMapNode => { 165 | const children = links.filter((link) => link.parent === node.url); 166 | return { 167 | url: node.url, 168 | ...(children.length > 0 && { 169 | children: children.map(buildNode), 170 | }), 171 | }; 172 | }; 173 | 174 | rootNodes.forEach((node) => { 175 | tree[node.url] = buildNode(node); 176 | }); 177 | 178 | return tree; 179 | } 180 | 181 | /** 182 | * Extract sitemap URLs if available 183 | * @private 184 | */ 185 | private async getSitemapUrls(baseUrl: string): Promise<string[]> { 186 | try { 187 | const response = await this.axiosInstance.get(`${baseUrl}/sitemap.xml`); 188 | // Simple XML parsing for demonstration 189 | const urls = response.data.match(/<loc>(.*?)<\/loc>/g) || []; 190 | return urls.map((url: string) => 191 | url.replace(/<\/?loc>/g, "").trim() 192 | ); 193 | } catch { 194 | return []; 195 | } 196 | } 197 | } 198 | ``` -------------------------------------------------------------------------------- /src/tools/crawl.ts: -------------------------------------------------------------------------------- ```typescript 1 | import { AxiosInstance } from "axios"; 2 | import { ErrorHandlingConfig, retryRequest } from "../error-handling.js"; 3 | import { CrawlArgs } from "../types.js"; 4 | 5 | /** 6 | * Options for configuring the crawl tool 7 | */ 8 | export interface CrawlToolOptions { 9 | /** Axios instance for making requests */ 10 | axiosInstance: AxiosInstance; 11 | /** Error handling configuration */ 12 | errorConfig: ErrorHandlingConfig; 13 | } 14 | 15 | /** 16 | * Handles web crawling operations 17 | */ 18 | export class CrawlTool { 19 | private axiosInstance: AxiosInstance; 20 | private errorConfig: ErrorHandlingConfig; 21 | 22 | constructor(options: CrawlToolOptions) { 23 | this.axiosInstance = options.axiosInstance; 24 | this.errorConfig = options.errorConfig; 25 | } 26 | 27 | /** 28 | * Get the tool definition for registration 29 | */ 30 | getDefinition() { 31 | return { 32 | name: "crawl", 33 | description: "Crawls a website starting from a base URL", 34 | inputSchema: { 35 | type: "object", 36 | properties: { 37 | url: { 38 | type: "string", 39 | description: "Base URL to start crawling from", 40 | }, 41 | maxDepth: { 42 | type: "number", 43 | description: "Maximum crawl depth", 44 | default: 2, 45 | }, 46 | excludePaths: { 47 | type: "array", 48 | items: { type: "string" }, 49 | description: "URL patterns to exclude", 50 | }, 51 | includePaths: { 52 | type: "array", 53 | items: { type: "string" }, 54 | description: "URL patterns to include", 55 | }, 56 | ignoreSitemap: { 57 | type: "boolean", 58 | description: "Ignore sitemap.xml during crawling", 59 | }, 60 | ignoreQueryParameters: { 61 | type: "boolean", 62 | description: "Ignore URL query parameters when comparing URLs", 63 | }, 64 | limit: { 65 | type: "number", 66 | description: "Maximum pages to crawl", 67 | default: 10000, 68 | }, 69 | allowBackwardLinks: { 70 | type: "boolean", 71 | description: "Allow crawling links that point to parent directories", 72 | }, 73 | allowExternalLinks: { 74 | type: "boolean", 75 | description: "Allow crawling links to external domains", 76 | }, 77 | webhook: { 78 | type: "string", 79 | description: "Webhook URL for progress notifications", 80 | }, 81 | scrapeOptions: { 82 | type: "object", 83 | description: "Options for scraping crawled pages", 84 | }, 85 | }, 86 | required: ["url"], 87 | }, 88 | }; 89 | } 90 | 91 | /** 92 | * Execute the crawl operation 93 | */ 94 | async execute(args: CrawlArgs) { 95 | const response = await retryRequest( 96 | () => 
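        // This call is wrapped by retryRequest (src/error-handling.ts), which
        // retries 429/500 responses and connection timeouts/resets with
        // exponential backoff before surfacing an McpError.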
this.axiosInstance.post("/crawl", args), 97 | this.errorConfig 98 | ); 99 | 100 | return { 101 | content: [ 102 | { 103 | type: "text", 104 | text: JSON.stringify(response.data, null, 2), 105 | }, 106 | ], 107 | }; 108 | } 109 | 110 | /** 111 | * Validate the crawl operation arguments 112 | */ 113 | validate(args: unknown): args is CrawlArgs { 114 | if (typeof args !== "object" || args === null) { 115 | return false; 116 | } 117 | 118 | const { 119 | url, 120 | maxDepth, 121 | excludePaths, 122 | includePaths, 123 | limit, 124 | webhook, 125 | } = args as any; 126 | 127 | if (typeof url !== "string") { 128 | return false; 129 | } 130 | 131 | if (maxDepth !== undefined && typeof maxDepth !== "number") { 132 | return false; 133 | } 134 | 135 | if ( 136 | excludePaths !== undefined && 137 | (!Array.isArray(excludePaths) || 138 | !excludePaths.every((path) => typeof path === "string")) 139 | ) { 140 | return false; 141 | } 142 | 143 | if ( 144 | includePaths !== undefined && 145 | (!Array.isArray(includePaths) || 146 | !includePaths.every((path) => typeof path === "string")) 147 | ) { 148 | return false; 149 | } 150 | 151 | if (limit !== undefined && typeof limit !== "number") { 152 | return false; 153 | } 154 | 155 | if (webhook !== undefined && typeof webhook !== "string") { 156 | return false; 157 | } 158 | 159 | return true; 160 | } 161 | 162 | /** 163 | * Process and normalize URLs for crawling 164 | * @private 165 | */ 166 | private normalizeUrl(url: string): string { 167 | try { 168 | const parsed = new URL(url); 169 | return parsed.toString(); 170 | } catch { 171 | return url; 172 | } 173 | } 174 | 175 | /** 176 | * Check if a URL should be crawled based on patterns 177 | * @private 178 | */ 179 | private shouldCrawl( 180 | url: string, 181 | includePaths?: string[], 182 | excludePaths?: string[] 183 | ): boolean { 184 | if (excludePaths?.some((pattern) => url.includes(pattern))) { 185 | return false; 186 | } 187 | 188 | if (includePaths?.length && !includePaths.some((pattern) => url.includes(pattern))) { 189 | return false; 190 | } 191 | 192 | return true; 193 | } 194 | } 195 | ``` -------------------------------------------------------------------------------- /tests/tools/scrape.test.ts: -------------------------------------------------------------------------------- ```typescript 1 | import { jest, describe, expect, it, beforeEach } from "@jest/globals"; 2 | import type { AxiosInstance } from "axios"; 3 | import axios from "axios"; 4 | import { ScrapeTool } from "../../src/tools/scrape.js"; 5 | import { DEFAULT_ERROR_CONFIG } from "../../src/error-handling.js"; 6 | import type { ScrapeUrlArgs } from "../../src/types.js"; 7 | 8 | jest.mock("axios"); 9 | 10 | describe("ScrapeTool", () => { 11 | let scrapeTool: ScrapeTool; 12 | let mockAxiosInstance: jest.Mocked<AxiosInstance>; 13 | 14 | beforeEach(() => { 15 | mockAxiosInstance = { 16 | post: jest.fn(), 17 | get: jest.fn(), 18 | } as unknown as jest.Mocked<AxiosInstance>; 19 | 20 | (axios.create as jest.Mock).mockReturnValue(mockAxiosInstance); 21 | 22 | scrapeTool = new ScrapeTool({ 23 | axiosInstance: mockAxiosInstance, 24 | errorConfig: DEFAULT_ERROR_CONFIG, 25 | }); 26 | }); 27 | 28 | describe("execute", () => { 29 | it("should successfully scrape a URL with basic options", async () => { 30 | const mockResponse = { 31 | data: { 32 | content: "Scraped content", 33 | format: "markdown", 34 | }, 35 | }; 36 | 37 | mockAxiosInstance.post.mockResolvedValueOnce(mockResponse); 38 | 39 | const args: ScrapeUrlArgs = { 40 | url: 
"https://example.com", 41 | formats: ["markdown"], 42 | onlyMainContent: true, 43 | }; 44 | 45 | const result = await scrapeTool.execute(args); 46 | 47 | expect(mockAxiosInstance.post).toHaveBeenCalledWith("/scrape", args); 48 | expect(result).toEqual({ 49 | content: [ 50 | { 51 | type: "text", 52 | text: JSON.stringify(mockResponse.data, null, 2), 53 | }, 54 | ], 55 | }); 56 | }); 57 | 58 | it("should handle structured data extraction", async () => { 59 | const mockResponse = { 60 | data: { 61 | title: "Example Title", 62 | content: "Example Content", 63 | date: "2024-02-13", 64 | }, 65 | }; 66 | 67 | mockAxiosInstance.post.mockResolvedValueOnce(mockResponse); 68 | 69 | const args: ScrapeUrlArgs = { 70 | url: "https://example.com/blog", 71 | jsonOptions: { 72 | prompt: "Extract title and content", 73 | schema: { 74 | type: "object", 75 | properties: { 76 | title: { type: "string" }, 77 | content: { type: "string" }, 78 | }, 79 | }, 80 | }, 81 | formats: ["json"], 82 | }; 83 | 84 | const result = await scrapeTool.execute(args); 85 | 86 | expect(mockAxiosInstance.post).toHaveBeenCalledWith("/scrape", args); 87 | expect(result).toEqual({ 88 | content: [ 89 | { 90 | type: "text", 91 | text: JSON.stringify(mockResponse.data, null, 2), 92 | }, 93 | ], 94 | }); 95 | }); 96 | 97 | it("should handle mobile device emulation", async () => { 98 | const mockResponse = { 99 | data: { 100 | content: "Mobile optimized content", 101 | }, 102 | }; 103 | 104 | mockAxiosInstance.post.mockResolvedValueOnce(mockResponse); 105 | 106 | const args: ScrapeUrlArgs = { 107 | url: "https://example.com", 108 | mobile: true, 109 | location: { 110 | country: "US", 111 | languages: ["en-US"], 112 | }, 113 | blockAds: true, 114 | }; 115 | 116 | const result = await scrapeTool.execute(args); 117 | 118 | expect(mockAxiosInstance.post).toHaveBeenCalledWith("/scrape", args); 119 | expect(result).toEqual({ 120 | content: [ 121 | { 122 | type: "text", 123 | text: JSON.stringify(mockResponse.data, null, 2), 124 | }, 125 | ], 126 | }); 127 | }); 128 | 129 | it("should handle API errors", async () => { 130 | const errorMessage = "API request failed"; 131 | mockAxiosInstance.post.mockRejectedValueOnce(new Error(errorMessage)); 132 | 133 | await expect( 134 | scrapeTool.execute({ 135 | url: "https://example.com", 136 | }) 137 | ).rejects.toThrow(errorMessage); 138 | }); 139 | }); 140 | 141 | describe("validate", () => { 142 | it("should validate correct scrape arguments", () => { 143 | const args: ScrapeUrlArgs = { 144 | url: "https://example.com", 145 | formats: ["markdown"], 146 | onlyMainContent: true, 147 | }; 148 | 149 | expect(scrapeTool.validate(args)).toBe(true); 150 | }); 151 | 152 | it("should reject invalid scrape arguments", () => { 153 | const args = { 154 | formats: ["markdown"], 155 | }; 156 | 157 | expect(scrapeTool.validate(args)).toBe(false); 158 | }); 159 | 160 | it("should validate complex scrape arguments", () => { 161 | const args: ScrapeUrlArgs = { 162 | url: "https://example.com", 163 | jsonOptions: { 164 | prompt: "Extract data", 165 | schema: { type: "object" }, 166 | }, 167 | formats: ["json", "markdown"], 168 | mobile: true, 169 | location: { 170 | country: "US", 171 | languages: ["en-US"], 172 | }, 173 | }; 174 | 175 | expect(scrapeTool.validate(args)).toBe(true); 176 | }); 177 | }); 178 | }); 179 | ``` -------------------------------------------------------------------------------- /src/tools/extract.ts: -------------------------------------------------------------------------------- ```typescript 1 | 
import { AxiosInstance } from "axios"; 2 | import { ErrorHandlingConfig, retryRequest } from "../error-handling.js"; 3 | import { ExtractArgs } from "../types.js"; 4 | 5 | /** 6 | * Options for configuring the extract tool 7 | */ 8 | export interface ExtractToolOptions { 9 | /** Axios instance for making requests */ 10 | axiosInstance: AxiosInstance; 11 | /** Error handling configuration */ 12 | errorConfig: ErrorHandlingConfig; 13 | } 14 | 15 | /** 16 | * Interface for extraction results 17 | */ 18 | interface ExtractionResult { 19 | url: string; 20 | data: Record<string, any>; 21 | error?: string; 22 | } 23 | 24 | /** 25 | * Handles data extraction operations 26 | */ 27 | export class ExtractTool { 28 | private axiosInstance: AxiosInstance; 29 | private errorConfig: ErrorHandlingConfig; 30 | 31 | constructor(options: ExtractToolOptions) { 32 | this.axiosInstance = options.axiosInstance; 33 | this.errorConfig = options.errorConfig; 34 | } 35 | 36 | /** 37 | * Get the tool definition for registration 38 | */ 39 | getDefinition() { 40 | return { 41 | name: "extract", 42 | description: "Extracts structured data from URLs", 43 | inputSchema: { 44 | type: "object", 45 | properties: { 46 | urls: { 47 | type: "array", 48 | items: { type: "string" }, 49 | description: "URLs to extract from", 50 | }, 51 | prompt: { 52 | type: "string", 53 | description: "Extraction guidance prompt", 54 | }, 55 | schema: { 56 | type: "object", 57 | description: "Data structure schema", 58 | }, 59 | enableWebSearch: { 60 | type: "boolean", 61 | description: "Use web search for additional data", 62 | default: false, 63 | }, 64 | ignoreSitemap: { 65 | type: "boolean", 66 | description: "Ignore sitemap.xml during processing", 67 | }, 68 | includeSubdomains: { 69 | type: "boolean", 70 | description: "Include subdomains in processing", 71 | }, 72 | }, 73 | required: ["urls"], 74 | }, 75 | }; 76 | } 77 | 78 | /** 79 | * Execute the extract operation 80 | */ 81 | async execute(args: ExtractArgs) { 82 | const response = await retryRequest( 83 | () => this.axiosInstance.post("/extract", args), 84 | this.errorConfig 85 | ); 86 | 87 | return { 88 | content: [ 89 | { 90 | type: "text", 91 | text: JSON.stringify(response.data, null, 2), 92 | }, 93 | ], 94 | }; 95 | } 96 | 97 | /** 98 | * Validate the extract operation arguments 99 | */ 100 | validate(args: unknown): args is ExtractArgs { 101 | if (typeof args !== "object" || args === null) { 102 | return false; 103 | } 104 | 105 | const { 106 | urls, 107 | prompt, 108 | schema, 109 | enableWebSearch, 110 | ignoreSitemap, 111 | includeSubdomains, 112 | } = args as any; 113 | 114 | if (!Array.isArray(urls) || !urls.every((url) => typeof url === "string")) { 115 | return false; 116 | } 117 | 118 | if (prompt !== undefined && typeof prompt !== "string") { 119 | return false; 120 | } 121 | 122 | if (schema !== undefined && (typeof schema !== "object" || schema === null)) { 123 | return false; 124 | } 125 | 126 | if (enableWebSearch !== undefined && typeof enableWebSearch !== "boolean") { 127 | return false; 128 | } 129 | 130 | if (ignoreSitemap !== undefined && typeof ignoreSitemap !== "boolean") { 131 | return false; 132 | } 133 | 134 | if (includeSubdomains !== undefined && typeof includeSubdomains !== "boolean") { 135 | return false; 136 | } 137 | 138 | return true; 139 | } 140 | 141 | /** 142 | * Process a single URL for extraction 143 | * @private 144 | */ 145 | private async processUrl( 146 | url: string, 147 | options: { 148 | prompt?: string; 149 | schema?: object; 150 | } 
151 | ): Promise<ExtractionResult> { 152 | try { 153 | const response = await this.axiosInstance.post("/extract", { 154 | urls: [url], 155 | ...options, 156 | }); 157 | 158 | return { 159 | url, 160 | data: response.data, 161 | }; 162 | } catch (error) { 163 | return { 164 | url, 165 | data: {}, 166 | error: error instanceof Error ? error.message : "Unknown error occurred", 167 | }; 168 | } 169 | } 170 | 171 | /** 172 | * Process multiple URLs in parallel with rate limiting 173 | * @private 174 | */ 175 | private async processBatch( 176 | urls: string[], 177 | options: { 178 | prompt?: string; 179 | schema?: object; 180 | batchSize?: number; 181 | delayMs?: number; 182 | } 183 | ): Promise<ExtractionResult[]> { 184 | const { batchSize = 5, delayMs = 1000 } = options; 185 | const results: ExtractionResult[] = []; 186 | 187 | for (let i = 0; i < urls.length; i += batchSize) { 188 | const batch = urls.slice(i, i + batchSize); 189 | const batchPromises = batch.map(url => 190 | this.processUrl(url, { 191 | prompt: options.prompt, 192 | schema: options.schema, 193 | }) 194 | ); 195 | 196 | results.push(...await Promise.all(batchPromises)); 197 | 198 | if (i + batchSize < urls.length) { 199 | await new Promise(resolve => setTimeout(resolve, delayMs)); 200 | } 201 | } 202 | 203 | return results; 204 | } 205 | } 206 | ``` -------------------------------------------------------------------------------- /src/types.ts: -------------------------------------------------------------------------------- ```typescript 1 | import type { Server } from "@modelcontextprotocol/sdk/server/index.js"; 2 | import type { 3 | ErrorCode, 4 | McpError, 5 | CallToolRequestSchema, 6 | ListToolsRequestSchema, 7 | } from "@modelcontextprotocol/sdk/types.js"; 8 | 9 | // Server types 10 | export interface ServerConfig { 11 | name: string; 12 | version: string; 13 | } 14 | 15 | export interface ServerCapabilities { 16 | tools: Record<string, unknown>; 17 | } 18 | 19 | // Request/Response types 20 | export interface RequestHandler<T> { 21 | (request: T): Promise<any>; 22 | } 23 | 24 | export interface ErrorHandler { 25 | (error: Error): void; 26 | } 27 | 28 | // Tool types 29 | export interface ToolDefinition { 30 | name: string; 31 | description: string; 32 | inputSchema: { 33 | type: string; 34 | properties: Record<string, unknown>; 35 | required?: string[]; 36 | }; 37 | } 38 | 39 | export interface ToolResponse { 40 | content: Array<{ 41 | type: string; 42 | text: string; 43 | }>; 44 | } 45 | 46 | // Re-export types from SDK 47 | export type { 48 | Server, 49 | ErrorCode, 50 | McpError, 51 | CallToolRequestSchema, 52 | ListToolsRequestSchema, 53 | }; 54 | 55 | // Configuration types 56 | export interface FirecrawlConfig { 57 | apiKey: string; 58 | apiBaseUrl?: string; 59 | timeout?: number; 60 | maxRetries?: number; 61 | retryDelay?: number; 62 | backoffMultiplier?: number; 63 | maxBackoff?: number; 64 | debug?: boolean; 65 | customHeaders?: Record<string, string>; 66 | allowedDomains?: string[]; 67 | validateRequests?: boolean; 68 | logLevel?: "error" | "warn" | "info" | "debug"; 69 | logFile?: string; 70 | sandbox?: boolean; 71 | } 72 | 73 | // Tool-specific types 74 | export interface ScrapeUrlArgs { 75 | url: string; 76 | jsonOptions?: { 77 | prompt: string; 78 | schema?: object; 79 | systemPrompt?: string; 80 | }; 81 | formats?: string[]; 82 | onlyMainContent?: boolean; 83 | includeTags?: string[]; 84 | excludeTags?: string[]; 85 | waitFor?: number; 86 | mobile?: boolean; 87 | location?: { 88 | country?: string; 89 | 
languages?: string[]; 90 | }; 91 | blockAds?: boolean; 92 | removeBase64Images?: boolean; 93 | } 94 | 95 | export interface SearchContentArgs { 96 | query: string; 97 | scrapeOptions?: { 98 | formats?: string[]; 99 | }; 100 | limit?: number; 101 | lang?: string; 102 | country?: string; 103 | location?: string; 104 | timeout?: number; 105 | } 106 | 107 | export interface CrawlArgs { 108 | url: string; 109 | maxDepth?: number; 110 | excludePaths?: string[]; 111 | includePaths?: string[]; 112 | ignoreSitemap?: boolean; 113 | ignoreQueryParameters?: boolean; 114 | limit?: number; 115 | allowBackwardLinks?: boolean; 116 | allowExternalLinks?: boolean; 117 | webhook?: string; 118 | scrapeOptions?: Record<string, unknown>; 119 | } 120 | 121 | export interface MapArgs { 122 | url: string; 123 | search?: string; 124 | ignoreSitemap?: boolean; 125 | sitemapOnly?: boolean; 126 | includeSubdomains?: boolean; 127 | limit?: number; 128 | timeout?: number; 129 | } 130 | 131 | export interface ExtractArgs { 132 | urls: string[]; 133 | prompt?: string; 134 | schema?: object; 135 | enableWebSearch?: boolean; 136 | ignoreSitemap?: boolean; 137 | includeSubdomains?: boolean; 138 | } 139 | 140 | // Type guards 141 | export const isScrapeUrlArgs = (args: unknown): args is ScrapeUrlArgs => { 142 | if (typeof args !== "object" || args === null) { 143 | return false; 144 | } 145 | 146 | const { url, jsonOptions, formats } = args as ScrapeUrlArgs; 147 | 148 | if (typeof url !== "string") { 149 | return false; 150 | } 151 | 152 | if (jsonOptions !== undefined) { 153 | if ( 154 | typeof jsonOptions !== "object" || 155 | jsonOptions === null || 156 | typeof jsonOptions.prompt !== "string" 157 | ) { 158 | return false; 159 | } 160 | } 161 | 162 | if (formats !== undefined) { 163 | if ( 164 | !Array.isArray(formats) || 165 | !formats.every((f) => typeof f === "string") 166 | ) { 167 | return false; 168 | } 169 | } 170 | 171 | return true; 172 | }; 173 | 174 | export const isSearchContentArgs = ( 175 | args: unknown 176 | ): args is SearchContentArgs => { 177 | if (typeof args !== "object" || args === null) { 178 | return false; 179 | } 180 | 181 | const { query, scrapeOptions, limit } = args as SearchContentArgs; 182 | 183 | if (typeof query !== "string") { 184 | return false; 185 | } 186 | 187 | if (scrapeOptions !== undefined) { 188 | if ( 189 | typeof scrapeOptions !== "object" || 190 | scrapeOptions === null || 191 | (scrapeOptions.formats !== undefined && 192 | (!Array.isArray(scrapeOptions.formats) || 193 | !scrapeOptions.formats.every((f) => typeof f === "string"))) 194 | ) { 195 | return false; 196 | } 197 | } 198 | 199 | if (limit !== undefined && typeof limit !== "number") { 200 | return false; 201 | } 202 | 203 | return true; 204 | }; 205 | 206 | export const isCrawlArgs = (args: unknown): args is CrawlArgs => { 207 | if (typeof args !== "object" || args === null) { 208 | return false; 209 | } 210 | 211 | const { url } = args as CrawlArgs; 212 | return typeof url === "string"; 213 | }; 214 | 215 | export const isMapArgs = (args: unknown): args is MapArgs => { 216 | if (typeof args !== "object" || args === null) { 217 | return false; 218 | } 219 | 220 | const { url } = args as MapArgs; 221 | return typeof url === "string"; 222 | }; 223 | 224 | export const isExtractArgs = (args: unknown): args is ExtractArgs => { 225 | if (typeof args !== "object" || args === null) { 226 | return false; 227 | } 228 | 229 | const { urls } = args as ExtractArgs; 230 | return Array.isArray(urls) && urls.every((url) => typeof url 
=== "string");
};

```
--------------------------------------------------------------------------------
/docs/configuration.md:
--------------------------------------------------------------------------------
```markdown
# Firecrawl MCP Server Configuration Guide

This guide explains how to configure and customize the Firecrawl MCP server for different environments and use cases.

## Environment Variables

### Required Variables

- `FIRECRAWL_API_KEY`: Your Firecrawl API key (required)

```bash
export FIRECRAWL_API_KEY=your-api-key-here
```

### Optional Variables

#### API Configuration

- `FIRECRAWL_API_BASE_URL`: Override the default API endpoint

```bash
export FIRECRAWL_API_BASE_URL=https://custom-api.firecrawl.dev/v1
```

#### Request Configuration

- `FIRECRAWL_TIMEOUT`: Request timeout in milliseconds

```bash
export FIRECRAWL_TIMEOUT=30000 # 30 seconds
```

#### Retry Configuration

- `FIRECRAWL_MAX_RETRIES`: Maximum retry attempts for failed requests

```bash
export FIRECRAWL_MAX_RETRIES=3
```

- `FIRECRAWL_RETRY_DELAY`: Initial delay between retries (milliseconds)

```bash
export FIRECRAWL_RETRY_DELAY=1000 # 1 second
```

- `FIRECRAWL_BACKOFF_MULTIPLIER`: Multiplier for exponential backoff

```bash
export FIRECRAWL_BACKOFF_MULTIPLIER=2
```

- `FIRECRAWL_MAX_BACKOFF`: Maximum delay between retries (milliseconds)

```bash
export FIRECRAWL_MAX_BACKOFF=8000 # 8 seconds
```

#### Debugging

- `DEBUG`: Enable debug logging

```bash
export DEBUG=true
```

#### Security

- `FIRECRAWL_VALIDATE_REQUESTS`: Enable request validation

```bash
export FIRECRAWL_VALIDATE_REQUESTS=true
```

- `FIRECRAWL_ALLOWED_DOMAINS`: List of allowed domains (JSON array)

```bash
export FIRECRAWL_ALLOWED_DOMAINS='["example.com","api.example.com"]'
```

## Installation Methods

### Global Installation

Install the server globally to run it from anywhere:

```bash
npm install -g @modelcontextprotocol/mcp-server-firecrawl
```

Then run:

```bash
mcp-server-firecrawl
```

### Local Project Installation

Install as a project dependency:

```bash
npm install @modelcontextprotocol/mcp-server-firecrawl
```

Add to your package.json scripts:

```json
{
  "scripts": {
    "start-mcp": "mcp-server-firecrawl"
  }
}
```

## Integration with MCP Clients

### Claude Desktop App

1. Open Claude desktop app settings
2. Navigate to MCP Server settings
3. Add new server configuration:

```json
{
  "firecrawl": {
    "command": "mcp-server-firecrawl",
    "env": {
      "FIRECRAWL_API_KEY": "your-api-key",
      "DEBUG": "true"
    }
  }
}
```

### Claude VSCode Extension

1. Open VSCode settings
2. Search for "Claude MCP Settings"
3. Add server configuration:

```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "mcp-server-firecrawl",
      "env": {
        "FIRECRAWL_API_KEY": "your-api-key",
        "DEBUG": "true"
      }
    }
  }
}
```

## Advanced Configuration

### Custom HTTP Headers

Add custom headers to API requests:

```bash
export FIRECRAWL_CUSTOM_HEADERS='{"X-Custom-Header": "value"}'
```

### Rate Limiting

The server retries rate-limited and failed requests with exponential backoff. Configure this behavior with:

```bash
export FIRECRAWL_MAX_RETRIES=3
export FIRECRAWL_RETRY_DELAY=1000 # milliseconds
export FIRECRAWL_BACKOFF_MULTIPLIER=2
export FIRECRAWL_MAX_BACKOFF=8000 # milliseconds
```

With these defaults, successive retries wait 1, 2, then 4 seconds, and no delay ever exceeds 8 seconds.

### Proxy Support

Configure proxy settings using standard Node.js environment variables:

```bash
export HTTP_PROXY=http://proxy.company.com:8080
export HTTPS_PROXY=http://proxy.company.com:8080
```

### Logging Configuration

Customize logging behavior:

```bash
# Log levels: error, warn, info, debug
export FIRECRAWL_LOG_LEVEL=debug

# Log to file
export FIRECRAWL_LOG_FILE=/path/to/firecrawl.log
```

## Development and Testing

When developing or testing, you can use the sandbox environment:

```bash
export FIRECRAWL_SANDBOX=true
export FIRECRAWL_API_KEY=test-key
```

## Security Considerations

1. API Key Security:
   - Store in environment variables or secure secrets management
   - Never commit to version control
   - Rotate keys periodically

2. Request Validation:

```bash
export FIRECRAWL_VALIDATE_REQUESTS=true
export FIRECRAWL_ALLOWED_DOMAINS='["trusted-domain.com"]'
```

3.
Rate Limiting: 221 | - Configure appropriate retry limits 222 | - Use exponential backoff 223 | - Monitor usage patterns 224 | 225 | ## Monitoring and Error Handling 226 | 227 | Enable comprehensive logging and monitoring: 228 | 229 | ```bash 230 | # Debug logging 231 | export DEBUG=true 232 | 233 | # Detailed error logging 234 | export FIRECRAWL_LOG_LEVEL=debug 235 | 236 | # Error tracking 237 | export FIRECRAWL_ERROR_TRACKING=true 238 | ``` 239 | 240 | ## Performance Tuning 241 | 242 | Optimize performance for your use case: 243 | 244 | ```bash 245 | # Increase timeouts for large operations 246 | export FIRECRAWL_TIMEOUT=60000 247 | 248 | # Adjust concurrent request limits 249 | export FIRECRAWL_MAX_CONCURRENT=5 250 | 251 | # Configure batch processing 252 | export FIRECRAWL_BATCH_SIZE=10 253 | export FIRECRAWL_BATCH_DELAY=1000 254 | ``` 255 | 256 | ## Docker Configuration 257 | 258 | When running in Docker, configure environment variables in your docker-compose.yml: 259 | 260 | ```yaml 261 | version: '3' 262 | services: 263 | firecrawl-mcp: 264 | image: mcp-server-firecrawl 265 | environment: 266 | - FIRECRAWL_API_KEY=your-api-key 267 | - DEBUG=true 268 | - FIRECRAWL_TIMEOUT=30000 269 | volumes: 270 | - ./logs:/app/logs 271 | ``` 272 | 273 | Or use a .env file: 274 | 275 | ```env 276 | FIRECRAWL_API_KEY=your-api-key 277 | DEBUG=true 278 | FIRECRAWL_TIMEOUT=30000 279 | ``` -------------------------------------------------------------------------------- /src/index.ts: -------------------------------------------------------------------------------- ```typescript 1 | #!/usr/bin/env node 2 | 3 | /** 4 | * Firecrawl MCP Server 5 | * A Model Context Protocol server for web scraping and content searching using the Firecrawl API. 6 | */ 7 | 8 | import { Server } from "@modelcontextprotocol/sdk/server/index.js"; 9 | import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js"; 10 | import { 11 | CallToolRequestSchema, 12 | ErrorCode, 13 | ListToolsRequestSchema, 14 | McpError, 15 | } from "@modelcontextprotocol/sdk/types.js"; 16 | import axios from "axios"; 17 | 18 | import { 19 | DEFAULT_ERROR_CONFIG, 20 | ErrorHandlingConfig, 21 | handleError, 22 | } from "./error-handling.js"; 23 | import { ScrapeTool } from "./tools/scrape.js"; 24 | import { SearchTool } from "./tools/search.js"; 25 | import { CrawlTool } from "./tools/crawl.js"; 26 | import { MapTool } from "./tools/map.js"; 27 | import { ExtractTool } from "./tools/extract.js"; 28 | import { 29 | isScrapeUrlArgs, 30 | isSearchContentArgs, 31 | isCrawlArgs, 32 | isMapArgs, 33 | isExtractArgs, 34 | } from "./types.js"; 35 | 36 | // Load and validate configuration 37 | const config = { 38 | apiKey: process.env.FIRECRAWL_API_KEY, 39 | apiBaseUrl: 40 | process.env.FIRECRAWL_API_BASE_URL || "https://api.firecrawl.dev/v1", 41 | timeout: parseInt(process.env.FIRECRAWL_TIMEOUT || "30000"), 42 | maxRetries: parseInt(process.env.FIRECRAWL_MAX_RETRIES || "3"), 43 | retryDelay: parseInt(process.env.FIRECRAWL_RETRY_DELAY || "1000"), 44 | debug: process.env.DEBUG === "true", 45 | }; 46 | 47 | if (!config.apiKey) { 48 | throw new Error("FIRECRAWL_API_KEY environment variable is required"); 49 | } 50 | 51 | /** 52 | * Main server class for the Firecrawl MCP implementation 53 | */ 54 | class FirecrawlServer { 55 | private server: Server; 56 | private axiosInstance; 57 | private errorConfig: ErrorHandlingConfig; 58 | private tools: { 59 | scrape: ScrapeTool; 60 | search: SearchTool; 61 | crawl: CrawlTool; 62 | map: MapTool; 63 | extract: 
ExtractTool; 64 | }; 65 | 66 | constructor() { 67 | this.server = new Server( 68 | { 69 | name: "firecrawl", 70 | version: "1.0.0", 71 | }, 72 | { 73 | capabilities: { 74 | tools: {}, 75 | }, 76 | } 77 | ); 78 | 79 | // Configure error handling 80 | this.errorConfig = { 81 | ...DEFAULT_ERROR_CONFIG, 82 | maxRetries: config.maxRetries, 83 | retryDelay: config.retryDelay, 84 | debug: config.debug, 85 | }; 86 | 87 | // Configure axios instance 88 | this.axiosInstance = axios.create({ 89 | baseURL: config.apiBaseUrl, 90 | timeout: config.timeout, 91 | headers: { 92 | Authorization: `Bearer ${config.apiKey}`, 93 | "Content-Type": "application/json", 94 | }, 95 | }); 96 | 97 | // Initialize tools 98 | const toolOptions = { 99 | axiosInstance: this.axiosInstance, 100 | errorConfig: this.errorConfig, 101 | }; 102 | 103 | this.tools = { 104 | scrape: new ScrapeTool(toolOptions), 105 | search: new SearchTool(toolOptions), 106 | crawl: new CrawlTool(toolOptions), 107 | map: new MapTool(toolOptions), 108 | extract: new ExtractTool(toolOptions), 109 | }; 110 | 111 | this.setupToolHandlers(); 112 | 113 | // Error handling 114 | this.server.onerror = (error: Error) => { 115 | console.error("[MCP Error]", error); 116 | if (config.debug) { 117 | console.error("[Debug] Stack trace:", error.stack); 118 | } 119 | }; 120 | 121 | process.on("SIGINT", async () => { 122 | await this.server.close(); 123 | process.exit(0); 124 | }); 125 | } 126 | 127 | /** 128 | * Set up the tool handlers for all operations 129 | */ 130 | private setupToolHandlers() { 131 | this.server.setRequestHandler(ListToolsRequestSchema, async () => ({ 132 | tools: [ 133 | this.tools.scrape.getDefinition(), 134 | this.tools.search.getDefinition(), 135 | this.tools.crawl.getDefinition(), 136 | this.tools.map.getDefinition(), 137 | this.tools.extract.getDefinition(), 138 | ], 139 | })); 140 | 141 | this.server.setRequestHandler(CallToolRequestSchema, async (request) => { 142 | try { 143 | const { name, arguments: args } = request.params; 144 | 145 | switch (name) { 146 | case "scrape_url": { 147 | if (!isScrapeUrlArgs(args)) { 148 | throw new McpError( 149 | ErrorCode.InvalidParams, 150 | "Invalid scrape_url arguments" 151 | ); 152 | } 153 | return await this.tools.scrape.execute(args); 154 | } 155 | 156 | case "search_content": { 157 | if (!isSearchContentArgs(args)) { 158 | throw new McpError( 159 | ErrorCode.InvalidParams, 160 | "Invalid search_content arguments" 161 | ); 162 | } 163 | return await this.tools.search.execute(args); 164 | } 165 | 166 | case "crawl": { 167 | if (!isCrawlArgs(args)) { 168 | throw new McpError( 169 | ErrorCode.InvalidParams, 170 | "Invalid crawl arguments" 171 | ); 172 | } 173 | return await this.tools.crawl.execute(args); 174 | } 175 | 176 | case "map": { 177 | if (!isMapArgs(args)) { 178 | throw new McpError( 179 | ErrorCode.InvalidParams, 180 | "Invalid map arguments" 181 | ); 182 | } 183 | return await this.tools.map.execute(args); 184 | } 185 | 186 | case "extract": { 187 | if (!isExtractArgs(args)) { 188 | throw new McpError( 189 | ErrorCode.InvalidParams, 190 | "Invalid extract arguments" 191 | ); 192 | } 193 | return await this.tools.extract.execute(args); 194 | } 195 | 196 | default: 197 | throw new McpError( 198 | ErrorCode.MethodNotFound, 199 | `Unknown tool: ${name}` 200 | ); 201 | } 202 | } catch (error) { 203 | throw handleError(error); 204 | } 205 | }); 206 | } 207 | 208 | /** 209 | * Start the MCP server 210 | */ 211 | async run() { 212 | const transport = new StdioServerTransport(); 213 | await 
this.server.connect(transport); 214 | console.error("Firecrawl MCP server running on stdio"); 215 | } 216 | } 217 | 218 | const server = new FirecrawlServer(); 219 | server.run().catch(console.error); 220 | ``` -------------------------------------------------------------------------------- /docs/api.md: -------------------------------------------------------------------------------- ```markdown 1 | # Firecrawl MCP Server API Documentation 2 | 3 | This document provides detailed information about the Firecrawl MCP server's API and available tools. 4 | 5 | ## Available Tools 6 | 7 | ### `scrape_url` 8 | 9 | Scrapes content from a specified URL with customizable extraction options. 10 | 11 | #### Input Schema 12 | 13 | ```typescript 14 | { 15 | url: string; // URL to scrape content from 16 | jsonOptions?: { 17 | prompt: string; // Prompt for extracting specific information 18 | schema?: object; // Schema for structured data extraction 19 | systemPrompt?: string; // System prompt for extraction context 20 | }; 21 | formats?: string[]; // Output formats (markdown, html, rawHtml, links, etc.) 22 | onlyMainContent?: boolean;// Exclude headers, navs, footers 23 | includeTags?: string[]; // HTML tags to include 24 | excludeTags?: string[]; // HTML tags to exclude 25 | waitFor?: number; // Delay before fetching (milliseconds) 26 | mobile?: boolean; // Emulate mobile device 27 | location?: { 28 | country?: string; // ISO 3166-1 alpha-2 country code 29 | languages?: string[]; // Preferred languages/locales 30 | }; 31 | blockAds?: boolean; // Enable ad/cookie popup blocking 32 | } 33 | ``` 34 | 35 | #### Example 36 | 37 | ```typescript 38 | { 39 | name: "scrape_url", 40 | arguments: { 41 | url: "https://example.com", 42 | jsonOptions: { 43 | prompt: "Extract the article title, date, and main content", 44 | schema: { 45 | title: "string", 46 | date: "string", 47 | content: "string" 48 | } 49 | }, 50 | formats: ["markdown"], 51 | mobile: true, 52 | blockAds: true 53 | } 54 | } 55 | ``` 56 | 57 | ### `search_content` 58 | 59 | Performs intelligent content searches with customizable parameters. 60 | 61 | #### Input Schema 62 | 63 | ```typescript 64 | { 65 | query: string; // Search query string 66 | scrapeOptions?: { 67 | formats?: string[]; // Output formats for results 68 | }; 69 | limit?: number; // Maximum results (1-100) 70 | lang?: string; // Language code 71 | country?: string; // Country code 72 | location?: string; // Location string 73 | timeout?: number; // Request timeout (milliseconds) 74 | } 75 | ``` 76 | 77 | #### Example 78 | 79 | ```typescript 80 | { 81 | name: "search_content", 82 | arguments: { 83 | query: "latest developments in AI", 84 | scrapeOptions: { 85 | formats: ["markdown"] 86 | }, 87 | limit: 10, 88 | lang: "en", 89 | country: "us" 90 | } 91 | } 92 | ``` 93 | 94 | ### `crawl` 95 | 96 | Crawls websites recursively with advanced configuration options. 
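
Note: the crawl itself is executed by the Firecrawl API, which receives these arguments verbatim. Locally, `src/tools/crawl.ts` also contains a private `shouldCrawl` helper that treats `includePaths`/`excludePaths` as plain substring patterns rather than globs or regular expressions; a minimal sketch of that matching logic (the example URLs are illustrative):

```typescript
// Substring-based URL filtering, mirroring the private shouldCrawl helper in
// src/tools/crawl.ts: exclusions win, then inclusions (when given) must match.
const shouldCrawl = (
  url: string,
  includePaths?: string[],
  excludePaths?: string[]
): boolean => {
  if (excludePaths?.some((pattern) => url.includes(pattern))) {
    return false;
  }
  if (
    includePaths?.length &&
    !includePaths.some((pattern) => url.includes(pattern))
  ) {
    return false;
  }
  return true;
};

shouldCrawl("https://example.com/admin/users", undefined, ["/admin"]); // false
shouldCrawl("https://example.com/blog/post", ["/blog"], ["/admin"]); // true
```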
97 | 98 | #### Input Schema 99 | 100 | ```typescript 101 | { 102 | url: string; // Base URL to start crawling 103 | maxDepth?: number; // Maximum crawl depth 104 | excludePaths?: string[]; // URL patterns to exclude 105 | includePaths?: string[]; // URL patterns to include 106 | ignoreSitemap?: boolean; // Ignore sitemap.xml 107 | ignoreQueryParameters?: boolean; // Ignore URL parameters 108 | limit?: number; // Maximum pages to crawl 109 | allowBackwardLinks?: boolean;// Allow parent directory links 110 | allowExternalLinks?: boolean;// Allow external domain links 111 | webhook?: string; // Progress notification URL 112 | scrapeOptions?: object; // Options for scraping pages 113 | } 114 | ``` 115 | 116 | #### Example 117 | 118 | ```typescript 119 | { 120 | name: "crawl", 121 | arguments: { 122 | url: "https://example.com", 123 | maxDepth: 3, 124 | excludePaths: ["/admin", "/private"], 125 | limit: 1000, 126 | scrapeOptions: { 127 | formats: ["markdown"] 128 | } 129 | } 130 | } 131 | ``` 132 | 133 | ### `map` 134 | 135 | Maps website structure and generates site hierarchies. 136 | 137 | #### Input Schema 138 | 139 | ```typescript 140 | { 141 | url: string; // Base URL to map 142 | search?: string; // Search query for filtering 143 | ignoreSitemap?: boolean; // Ignore sitemap.xml 144 | sitemapOnly?: boolean; // Only use sitemap.xml 145 | includeSubdomains?: boolean;// Include subdomains 146 | limit?: number; // Maximum links (default: 5000) 147 | timeout?: number; // Request timeout 148 | } 149 | ``` 150 | 151 | #### Example 152 | 153 | ```typescript 154 | { 155 | name: "map", 156 | arguments: { 157 | url: "https://example.com", 158 | includeSubdomains: true, 159 | limit: 1000 160 | } 161 | } 162 | ``` 163 | 164 | ### `extract` 165 | 166 | Extracts structured data from multiple URLs with schema validation. 
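
All `urls` are forwarded to the `/extract` endpoint in a single request. `src/tools/extract.ts` also ships a private `processBatch` helper that can process URLs in rate-limited batches (by default 5 URLs per batch, with a 1-second pause between batches); a sketch of that batching pattern, with `worker` standing in for the per-URL request:

```typescript
// Batched parallel processing with a fixed inter-batch delay, mirroring
// processBatch in src/tools/extract.ts. `worker` is a placeholder for the
// per-URL call; batches run sequentially, URLs within a batch in parallel.
async function processInBatches<T>(
  urls: string[],
  worker: (url: string) => Promise<T>,
  batchSize = 5,
  delayMs = 1000
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map(worker))));
    if (i + batchSize < urls.length) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return results;
}
```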
167 | 168 | #### Input Schema 169 | 170 | ```typescript 171 | { 172 | urls: string[]; // URLs to extract from 173 | prompt?: string; // Extraction guidance 174 | schema?: object; // Data structure schema 175 | enableWebSearch?: boolean; // Use web search for context 176 | ignoreSitemap?: boolean; // Ignore sitemap.xml 177 | includeSubdomains?: boolean;// Include subdomains 178 | } 179 | ``` 180 | 181 | #### Example 182 | 183 | ```typescript 184 | { 185 | name: "extract", 186 | arguments: { 187 | urls: ["https://example.com/page1", "https://example.com/page2"], 188 | prompt: "Extract product information", 189 | schema: { 190 | name: "string", 191 | price: "number", 192 | description: "string" 193 | } 194 | } 195 | } 196 | ``` 197 | 198 | ## Response Format 199 | 200 | All tools return responses in a standard format: 201 | 202 | ```typescript 203 | { 204 | content: [ 205 | { 206 | type: "text", 207 | text: string // JSON string containing the results 208 | } 209 | ] 210 | } 211 | ``` 212 | 213 | ## Error Handling 214 | 215 | The server implements robust error handling with retry capability: 216 | 217 | - Rate limiting with exponential backoff 218 | - Configurable retry attempts and delays 219 | - Detailed error messages 220 | - Debug logging options 221 | 222 | Error responses follow the standard MCP format: 223 | 224 | ```typescript 225 | { 226 | error: { 227 | code: ErrorCode; 228 | message: string; 229 | data?: unknown; 230 | } 231 | } 232 | ``` 233 | 234 | Common error codes: 235 | 236 | - `InvalidParams`: Invalid or missing parameters 237 | - `InvalidRequest`: Invalid request (e.g., rate limit) 238 | - `InternalError`: Server or API errors 239 | - `MethodNotFound`: Unknown tool name 240 | 241 | ## Configuration 242 | 243 | Server behavior can be customized through environment variables: 244 | 245 | ```bash 246 | # Required 247 | export FIRECRAWL_API_KEY=your-api-key 248 | 249 | # Optional 250 | export FIRECRAWL_API_BASE_URL=https://custom-api.firecrawl.dev/v1 251 | export FIRECRAWL_TIMEOUT=30000 252 | export FIRECRAWL_MAX_RETRIES=3 253 | export FIRECRAWL_RETRY_DELAY=1000 254 | export DEBUG=true 255 | ``` 256 | 257 | For detailed configuration options, see the [configuration guide](configuration.md). 258 | ```