This is page 1 of 4. Use http://codebase.md/omgwtfwow/mcp-crawl4ai-ts?lines=true&page={x} to view the full context. # Directory Structure ``` ├── .env.example ├── .github │ ├── CI.md │ ├── copilot-instructions.md │ └── workflows │ └── ci.yml ├── .gitignore ├── .prettierignore ├── .prettierrc.json ├── CHANGELOG.md ├── eslint.config.mjs ├── jest.config.cjs ├── jest.setup.cjs ├── LICENSE ├── package-lock.json ├── package.json ├── README.md ├── src │ ├── __tests__ │ │ ├── crawl.test.ts │ │ ├── crawl4ai-service.network.test.ts │ │ ├── crawl4ai-service.test.ts │ │ ├── handlers │ │ │ ├── crawl-handlers.test.ts │ │ │ ├── parameter-combinations.test.ts │ │ │ ├── screenshot-saving.test.ts │ │ │ ├── session-handlers.test.ts │ │ │ └── utility-handlers.test.ts │ │ ├── index.cli.test.ts │ │ ├── index.npx.test.ts │ │ ├── index.server.test.ts │ │ ├── index.test.ts │ │ ├── integration │ │ │ ├── batch-crawl.integration.test.ts │ │ │ ├── capture-screenshot.integration.test.ts │ │ │ ├── crawl-advanced.integration.test.ts │ │ │ ├── crawl-handlers.integration.test.ts │ │ │ ├── crawl-recursive.integration.test.ts │ │ │ ├── crawl.integration.test.ts │ │ │ ├── execute-js.integration.test.ts │ │ │ ├── extract-links.integration.test.ts │ │ │ ├── extract-with-llm.integration.test.ts │ │ │ ├── generate-pdf.integration.test.ts │ │ │ ├── get-html.integration.test.ts │ │ │ ├── get-markdown.integration.test.ts │ │ │ ├── parse-sitemap.integration.test.ts │ │ │ ├── session-management.integration.test.ts │ │ │ ├── smart-crawl.integration.test.ts │ │ │ └── test-utils.ts │ │ ├── request-handler.test.ts │ │ ├── schemas │ │ │ └── validation-edge-cases.test.ts │ │ ├── types │ │ │ └── mocks.ts │ │ └── utils │ │ └── javascript-validation.test.ts │ ├── crawl4ai-service.ts │ ├── handlers │ │ ├── base-handler.ts │ │ ├── content-handlers.ts │ │ ├── crawl-handlers.ts │ │ ├── session-handlers.ts │ │ └── utility-handlers.ts │ ├── index.ts │ ├── schemas │ │ ├── helpers.ts │ │ └── validation-schemas.ts │ ├── server.ts │ └── types.ts ├── tsconfig.build.json └── tsconfig.json ``` # Files -------------------------------------------------------------------------------- /.prettierignore: -------------------------------------------------------------------------------- ``` 1 | dist 2 | node_modules 3 | *.md 4 | *.json 5 | .env 6 | .env.* 7 | coverage 8 | .nyc_output ``` -------------------------------------------------------------------------------- /.prettierrc.json: -------------------------------------------------------------------------------- ```json 1 | { 2 | "semi": true, 3 | "trailingComma": "all", 4 | "singleQuote": true, 5 | "printWidth": 120, 6 | "tabWidth": 2, 7 | "useTabs": false, 8 | "bracketSpacing": true, 9 | "arrowParens": "always", 10 | "endOfLine": "lf" 11 | } ``` -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- ``` 1 | # Dependencies 2 | node_modules/ 3 | npm-debug.log* 4 | yarn-debug.log* 5 | yarn-error.log* 6 | 7 | # Build output 8 | dist/ 9 | build/ 10 | *.js 11 | *.js.map 12 | *.d.ts 13 | *.d.ts.map 14 | 15 | # Environment 16 | .env 17 | .env.local 18 | .env.*.local 19 | 20 | # IDE 21 | .vscode/ 22 | .idea/ 23 | *.swp 24 | *.swo 25 | *~ 26 | 27 | # OS 28 | .DS_Store 29 | Thumbs.db 30 | 31 | # Logs 32 | logs/ 33 | *.log 34 | 35 | # Testing 36 | coverage/ 37 | .nyc_output/ 38 | src/__tests__/mock-responses.json 39 | 40 | # Temporary files 41 | tmp/ 42 | temp/ 43 | 44 | add-to-claude.sh ``` 
--------------------------------------------------------------------------------
/.env.example:
--------------------------------------------------------------------------------

```
1 | # Required: URL of your Crawl4AI server
2 | CRAWL4AI_BASE_URL=http://localhost:11235
3 | 
4 | # Optional: API key for authentication (if your server requires it)
5 | CRAWL4AI_API_KEY=
6 | 
7 | # Optional: Custom server name and version
8 | SERVER_NAME=crawl4ai-mcp
9 | SERVER_VERSION=0.7.4
10 | 
11 | # Optional: For LLM extraction tests
12 | LLM_PROVIDER=openai/gpt-4o-mini
13 | LLM_API_TOKEN=your-llm-api-key
14 | LLM_BASE_URL=https://api.openai.com/v1 # If using custom endpoint
15 | 
```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

```markdown
1 | # MCP Server for Crawl4AI
2 | 
3 | > **Note:** Tested with Crawl4AI version 0.7.4
4 | 
5 | [npm package](https://www.npmjs.com/package/mcp-crawl4ai-ts)
6 | [MIT license](https://opensource.org/licenses/MIT)
7 | [Node.js](https://nodejs.org/)
8 | [Coverage report](https://omgwtfwow.github.io/mcp-crawl4ai-ts/coverage/)
9 | 
10 | TypeScript implementation of an MCP server for Crawl4AI. Provides tools for web crawling, content extraction, and browser automation.
11 | 
12 | ## Table of Contents
13 | 
14 | - [Prerequisites](#prerequisites)
15 | - [Quick Start](#quick-start)
16 | - [Configuration](#configuration)
17 | - [Client-Specific Instructions](#client-specific-instructions)
18 | - [Available Tools](#available-tools)
19 |   - [1. get_markdown](#1-get_markdown---extract-content-as-markdown-with-filtering)
20 |   - [2. capture_screenshot](#2-capture_screenshot---capture-webpage-screenshot)
21 |   - [3. generate_pdf](#3-generate_pdf---convert-webpage-to-pdf)
22 |   - [4. execute_js](#4-execute_js---execute-javascript-and-get-return-values)
23 |   - [5. batch_crawl](#5-batch_crawl---crawl-multiple-urls-concurrently)
24 |   - [6. smart_crawl](#6-smart_crawl---auto-detect-and-handle-different-content-types)
25 |   - [7. get_html](#7-get_html---get-sanitized-html-for-analysis)
26 |   - [8. extract_links](#8-extract_links---extract-and-categorize-page-links)
27 |   - [9. crawl_recursive](#9-crawl_recursive---deep-crawl-website-following-links)
28 |   - [10. parse_sitemap](#10-parse_sitemap---extract-urls-from-xml-sitemaps)
29 |   - [11. crawl](#11-crawl---advanced-web-crawling-with-full-configuration)
30 |   - [12. manage_session](#12-manage_session---unified-session-management)
31 |   - [13. extract_with_llm](#13-extract_with_llm---extract-structured-data-using-ai)
32 | - [Advanced Configuration](#advanced-configuration)
33 | - [Development](#development)
34 | - [License](#license)
35 | 
36 | ## Prerequisites
37 | 
38 | - Node.js 18+ and npm
39 | - A running Crawl4AI server
40 | 
41 | ## Quick Start
42 | 
43 | ### 1. Start the Crawl4AI server (for example, local Docker)
44 | 
45 | ```bash
46 | docker run -d -p 11235:11235 --name crawl4ai --shm-size=1g unclecode/crawl4ai:0.7.4
47 | ```
48 | 
49 | ### 2. Add to your MCP client
50 | 
51 | This MCP server works with any MCP-compatible client (Claude Desktop, Claude Code, Cursor, LM Studio, etc.).
52 | 53 | #### Using npx (Recommended) 54 | 55 | ```json 56 | { 57 | "mcpServers": { 58 | "crawl4ai": { 59 | "command": "npx", 60 | "args": ["mcp-crawl4ai-ts"], 61 | "env": { 62 | "CRAWL4AI_BASE_URL": "http://localhost:11235" 63 | } 64 | } 65 | } 66 | } 67 | ``` 68 | 69 | #### Using local installation 70 | 71 | ```json 72 | { 73 | "mcpServers": { 74 | "crawl4ai": { 75 | "command": "node", 76 | "args": ["/path/to/mcp-crawl4ai-ts/dist/index.js"], 77 | "env": { 78 | "CRAWL4AI_BASE_URL": "http://localhost:11235" 79 | } 80 | } 81 | } 82 | } 83 | ``` 84 | 85 | #### With all optional variables 86 | 87 | ```json 88 | { 89 | "mcpServers": { 90 | "crawl4ai": { 91 | "command": "npx", 92 | "args": ["mcp-crawl4ai-ts"], 93 | "env": { 94 | "CRAWL4AI_BASE_URL": "http://localhost:11235", 95 | "CRAWL4AI_API_KEY": "your-api-key", 96 | "SERVER_NAME": "custom-name", 97 | "SERVER_VERSION": "1.0.0" 98 | } 99 | } 100 | } 101 | } 102 | ``` 103 | 104 | ## Configuration 105 | 106 | ### Environment Variables 107 | 108 | ```env 109 | # Required 110 | CRAWL4AI_BASE_URL=http://localhost:11235 111 | 112 | # Optional - Server Configuration 113 | CRAWL4AI_API_KEY= # If your server requires auth 114 | SERVER_NAME=crawl4ai-mcp # Custom name for the MCP server 115 | SERVER_VERSION=1.0.0 # Custom version 116 | ``` 117 | 118 | ## Client-Specific Instructions 119 | 120 | ### Claude Desktop 121 | 122 | Add to `~/Library/Application Support/Claude/claude_desktop_config.json` 123 | 124 | ### Claude Code 125 | 126 | ```bash 127 | claude mcp add crawl4ai -e CRAWL4AI_BASE_URL=http://localhost:11235 -- npx mcp-crawl4ai-ts 128 | ``` 129 | 130 | ### Other MCP Clients 131 | 132 | Consult your client's documentation for MCP server configuration. The key details: 133 | 134 | - **Command**: `npx mcp-crawl4ai-ts` or `node /path/to/dist/index.js` 135 | - **Required env**: `CRAWL4AI_BASE_URL` 136 | - **Optional env**: `CRAWL4AI_API_KEY`, `SERVER_NAME`, `SERVER_VERSION` 137 | 138 | ## Available Tools 139 | 140 | ### 1. `get_markdown` - Extract content as markdown with filtering 141 | 142 | ```typescript 143 | { 144 | url: string, // Required: URL to extract markdown from 145 | filter?: 'raw'|'fit'|'bm25'|'llm', // Filter type (default: 'fit') 146 | query?: string, // Query for bm25/llm filters 147 | cache?: string // Cache-bust parameter (default: '0') 148 | } 149 | ``` 150 | 151 | Extracts content as markdown with various filtering options. Use 'bm25' or 'llm' filters with a query for specific content extraction. 152 | 153 | ### 2. `capture_screenshot` - Capture webpage screenshot 154 | 155 | ```typescript 156 | { 157 | url: string, // Required: URL to capture 158 | screenshot_wait_for?: number // Seconds to wait before screenshot (default: 2) 159 | } 160 | ``` 161 | 162 | Returns base64-encoded PNG. Note: This is stateless - for screenshots after JS execution, use `crawl` with `screenshot: true`. 163 | 164 | ### 3. `generate_pdf` - Convert webpage to PDF 165 | 166 | ```typescript 167 | { 168 | url: string // Required: URL to convert to PDF 169 | } 170 | ``` 171 | 172 | Returns base64-encoded PDF. Stateless tool - for PDFs after JS execution, use `crawl` with `pdf: true`. 173 | 174 | ### 4. `execute_js` - Execute JavaScript and get return values 175 | 176 | ```typescript 177 | { 178 | url: string, // Required: URL to load 179 | scripts: string | string[] // Required: JavaScript to execute 180 | } 181 | ``` 182 | 183 | Executes JavaScript and returns results. Each script can use 'return' to get values back. 
Stateless - for persistent JS execution use `crawl` with `js_code`. 184 | 185 | ### 5. `batch_crawl` - Crawl multiple URLs concurrently 186 | 187 | ```typescript 188 | { 189 | urls: string[], // Required: List of URLs to crawl 190 | max_concurrent?: number, // Parallel request limit (default: 5) 191 | remove_images?: boolean, // Remove images from output (default: false) 192 | bypass_cache?: boolean, // Bypass cache for all URLs (default: false) 193 | configs?: Array<{ // Optional: Per-URL configurations (v3.0.0+) 194 | url: string, 195 | [key: string]: any // Any crawl parameters for this specific URL 196 | }> 197 | } 198 | ``` 199 | 200 | Efficiently crawls multiple URLs in parallel. Each URL gets a fresh browser instance. With `configs` array, you can specify different parameters for each URL. 201 | 202 | ### 6. `smart_crawl` - Auto-detect and handle different content types 203 | 204 | ```typescript 205 | { 206 | url: string, // Required: URL to crawl 207 | max_depth?: number, // Maximum depth for recursive crawling (default: 2) 208 | follow_links?: boolean, // Follow links in content (default: true) 209 | bypass_cache?: boolean // Bypass cache (default: false) 210 | } 211 | ``` 212 | 213 | Intelligently detects content type (HTML/sitemap/RSS) and processes accordingly. 214 | 215 | ### 7. `get_html` - Get sanitized HTML for analysis 216 | 217 | ```typescript 218 | { 219 | url: string // Required: URL to extract HTML from 220 | } 221 | ``` 222 | 223 | Returns preprocessed HTML optimized for structure analysis. Use for building schemas or analyzing patterns. 224 | 225 | ### 8. `extract_links` - Extract and categorize page links 226 | 227 | ```typescript 228 | { 229 | url: string, // Required: URL to extract links from 230 | categorize?: boolean // Group by type (default: true) 231 | } 232 | ``` 233 | 234 | Extracts all links and groups them by type: internal, external, social media, documents, images. 235 | 236 | ### 9. `crawl_recursive` - Deep crawl website following links 237 | 238 | ```typescript 239 | { 240 | url: string, // Required: Starting URL 241 | max_depth?: number, // Maximum depth to crawl (default: 3) 242 | max_pages?: number, // Maximum pages to crawl (default: 50) 243 | include_pattern?: string, // Regex pattern for URLs to include 244 | exclude_pattern?: string // Regex pattern for URLs to exclude 245 | } 246 | ``` 247 | 248 | Crawls a website following internal links up to specified depth. Returns content from all discovered pages. 249 | 250 | ### 10. `parse_sitemap` - Extract URLs from XML sitemaps 251 | 252 | ```typescript 253 | { 254 | url: string, // Required: Sitemap URL (e.g., /sitemap.xml) 255 | filter_pattern?: string // Optional: Regex pattern to filter URLs 256 | } 257 | ``` 258 | 259 | Extracts all URLs from XML sitemaps. Supports regex filtering for specific URL patterns. 260 | 261 | ### 11. 
`crawl` - Advanced web crawling with full configuration 262 | 263 | ```typescript 264 | { 265 | url: string, // URL to crawl 266 | // Browser Configuration 267 | browser_type?: 'chromium'|'firefox'|'webkit'|'undetected', // Browser engine (undetected = stealth mode) 268 | viewport_width?: number, // Browser width (default: 1080) 269 | viewport_height?: number, // Browser height (default: 600) 270 | user_agent?: string, // Custom user agent 271 | proxy_server?: string | { // Proxy URL (string or object format) 272 | server: string, 273 | username?: string, 274 | password?: string 275 | }, 276 | proxy_username?: string, // Proxy auth (if using string format) 277 | proxy_password?: string, // Proxy password (if using string format) 278 | cookies?: Array<{name, value, domain}>, // Pre-set cookies 279 | headers?: Record<string,string>, // Custom headers 280 | 281 | // Crawler Configuration 282 | word_count_threshold?: number, // Min words per block (default: 200) 283 | excluded_tags?: string[], // HTML tags to exclude 284 | remove_overlay_elements?: boolean, // Remove popups/modals 285 | js_code?: string | string[], // JavaScript to execute 286 | wait_for?: string, // Wait condition (selector or JS) 287 | wait_for_timeout?: number, // Wait timeout (default: 30000) 288 | delay_before_scroll?: number, // Pre-scroll delay 289 | scroll_delay?: number, // Between-scroll delay 290 | process_iframes?: boolean, // Include iframe content 291 | exclude_external_links?: boolean, // Remove external links 292 | screenshot?: boolean, // Capture screenshot 293 | pdf?: boolean, // Generate PDF 294 | session_id?: string, // Reuse browser session (only works with crawl tool) 295 | cache_mode?: 'ENABLED'|'BYPASS'|'DISABLED', // Cache control 296 | 297 | // New in v3.0.0 (Crawl4AI 0.7.3/0.7.4) 298 | css_selector?: string, // CSS selector to filter content 299 | delay_before_return_html?: number, // Delay in seconds before returning HTML 300 | include_links?: boolean, // Include extracted links in response 301 | resolve_absolute_urls?: boolean, // Convert relative URLs to absolute 302 | 303 | // LLM Extraction (REST API only supports 'llm' type) 304 | extraction_type?: 'llm', // Only 'llm' extraction is supported via REST API 305 | extraction_schema?: object, // Schema for structured extraction 306 | extraction_instruction?: string, // Natural language extraction prompt 307 | extraction_strategy?: { // Advanced extraction configuration 308 | provider?: string, 309 | api_key?: string, 310 | model?: string, 311 | [key: string]: any 312 | }, 313 | table_extraction_strategy?: { // Table extraction configuration 314 | enable_chunking?: boolean, 315 | thresholds?: object, 316 | [key: string]: any 317 | }, 318 | markdown_generator_options?: { // Markdown generation options 319 | include_links?: boolean, 320 | preserve_formatting?: boolean, 321 | [key: string]: any 322 | }, 323 | 324 | timeout?: number, // Overall timeout (default: 60000) 325 | verbose?: boolean // Detailed logging 326 | } 327 | ``` 328 | 329 | ### 12. `manage_session` - Unified session management 330 | 331 | ```typescript 332 | { 333 | action: 'create' | 'clear' | 'list', // Required: Action to perform 334 | session_id?: string, // For 'create' and 'clear' actions 335 | initial_url?: string, // For 'create' action: URL to load 336 | browser_type?: 'chromium' | 'firefox' | 'webkit' | 'undetected' // For 'create' action 337 | } 338 | ``` 339 | 340 | Unified tool for managing browser sessions. 
Supports three actions:
341 | - **create**: Start a persistent browser session
342 | - **clear**: Remove a session from local tracking
343 | - **list**: Show all active sessions
344 | 
345 | Examples:
346 | ```typescript
347 | // Create a new session
348 | { action: 'create', session_id: 'my-session', initial_url: 'https://example.com' }
349 | 
350 | // Clear a session
351 | { action: 'clear', session_id: 'my-session' }
352 | 
353 | // List all sessions
354 | { action: 'list' }
355 | ```
356 | 
357 | ### 13. `extract_with_llm` - Extract structured data using AI
358 | 
359 | ```typescript
360 | {
361 |   url: string,  // URL to extract data from
362 |   query: string // Natural language extraction instructions
363 | }
364 | ```
365 | 
366 | Uses AI to extract structured data from webpages. Returns results immediately without any polling or job management. This is the recommended way to extract specific information since CSS/XPath extraction is not supported via the REST API.
367 | 
368 | ## Advanced Configuration
369 | 
370 | For detailed information about all available configuration options, extraction strategies, and advanced features, please refer to the official Crawl4AI documentation:
371 | 
372 | - [Crawl4AI Documentation](https://docs.crawl4ai.com/)
373 | - [Crawl4AI GitHub Repository](https://github.com/unclecode/crawl4ai)
374 | 
375 | ## Changelog
376 | 
377 | See [CHANGELOG.md](CHANGELOG.md) for detailed version history and recent updates.
378 | 
379 | ## Development
380 | 
381 | ### Setup
382 | 
383 | ```bash
384 | # 1. Start the Crawl4AI server
385 | docker run -d -p 11235:11235 --name crawl4ai --shm-size=1g unclecode/crawl4ai:latest
386 | 
387 | # 2. Install MCP server
388 | git clone https://github.com/omgwtfwow/mcp-crawl4ai-ts.git
389 | cd mcp-crawl4ai-ts
390 | npm install
391 | cp .env.example .env
392 | 
393 | # 3. Development commands
394 | npm run dev # Development mode
395 | npm test # Run tests
396 | npm run lint # Check code quality
397 | npm run build # Production build
398 | 
399 | # 4. Add to your MCP client (See "Using local installation")
400 | ```
401 | 
402 | ### Running Integration Tests
403 | 
404 | Integration tests require a running Crawl4AI server. Configure your environment:
405 | 
406 | ```bash
407 | # Required for integration tests
408 | export CRAWL4AI_BASE_URL=http://localhost:11235
409 | export CRAWL4AI_API_KEY=your-api-key # If authentication is required
410 | 
411 | # Optional: For LLM extraction tests
412 | export LLM_PROVIDER=openai/gpt-4o-mini
413 | export LLM_API_TOKEN=your-llm-api-key
414 | export LLM_BASE_URL=https://api.openai.com/v1 # If using custom endpoint
415 | 
416 | # Run integration tests (ALWAYS use the npm script; don't call `jest` directly)
417 | npm run test:integration
418 | 
419 | # Run a single integration test file
420 | npm run test:integration -- src/__tests__/integration/extract-links.integration.test.ts
421 | 
422 | # IMPORTANT: Do NOT run `npx jest` directly for integration tests. The npm script injects `NODE_OPTIONS=--experimental-vm-modules` which is required for ESM + ts-jest. Running Jest directly will produce `SyntaxError: Cannot use import statement outside a module` and hang.
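# Illustrative extra (not from the original docs): any standard Jest CLI flag can
# also be passed through after `--`, e.g. to stop on the first failure with verbose output:
npm run test:integration -- --bail --verbose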
423 | ```
424 | 
425 | Integration tests cover:
426 | 
427 | - Dynamic content and JavaScript execution
428 | - Session management and cookies
429 | - Content extraction (LLM-based only)
430 | - Media handling (screenshots, PDFs)
431 | - Performance and caching
432 | - Content filtering
433 | - Bot detection avoidance
434 | - Error handling
435 | 
436 | ### Integration Test Checklist
437 | 1. Docker container healthy:
438 | ```bash
439 | docker ps --filter name=crawl4ai --format '{{.Names}} {{.Status}}'
440 | curl -sf http://localhost:11235/health || echo "Health check failed"
441 | ```
442 | 2. Env vars loaded (either exported or in `.env`): `CRAWL4AI_BASE_URL` (required), optional: `CRAWL4AI_API_KEY`, `LLM_PROVIDER`, `LLM_API_TOKEN`, `LLM_BASE_URL`.
443 | 3. Use `npm run test:integration` (never raw `jest`).
444 | 4. To target one file, add it after `--` (see example above).
445 | 5. Expect a total runtime of ~2–3 minutes; a longer run or an immediate hang usually means a missing `NODE_OPTIONS` flag or the wrong Jest version.
446 | 
447 | ### Troubleshooting
448 | | Symptom | Likely Cause | Fix |
449 | |---------|--------------|-----|
450 | | `SyntaxError: Cannot use import statement outside a module` | Ran `jest` directly without script flags | Re-run with `npm run test:integration` |
451 | | Hangs on first test (`RUNS ...`) | Missing experimental VM modules flag | Use the npm script / ensure `NODE_OPTIONS=--experimental-vm-modules` |
452 | | Network timeouts | Crawl4AI container not healthy / DNS blocked | Restart the container: `docker restart <name>` |
453 | | LLM tests skipped | Missing `LLM_PROVIDER` or `LLM_API_TOKEN` | Export the required LLM vars |
454 | | New Jest major upgrade breaks tests | Version mismatch with `ts-jest` | Keep Jest 29.x unless `ts-jest` is upgraded accordingly |
455 | 
456 | ### Version Compatibility Note
457 | Current stack: `jest@29.7.0` + `ts-jest@29.4.0` + ESM (`"type": "module"`). Updating Jest to 30+ requires upgrading `ts-jest` and revisiting `jest.config.cjs`. Keep versions aligned to avoid parse errors.
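A quick way to confirm the installed pair before debugging parse errors (a minimal check, assuming npm is the package manager in use):

```bash
# Print the resolved jest and ts-jest versions; both should report 29.x
npm ls jest ts-jest
```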
458 | 459 | ## License 460 | 461 | MIT ``` -------------------------------------------------------------------------------- /tsconfig.build.json: -------------------------------------------------------------------------------- ```json 1 | { 2 | "extends": "./tsconfig.json", 3 | "exclude": [ 4 | "node_modules", 5 | "dist", 6 | "src/**/*.test.ts", 7 | "src/__tests__/**/*" 8 | ] 9 | } ``` -------------------------------------------------------------------------------- /tsconfig.json: -------------------------------------------------------------------------------- ```json 1 | { 2 | "compilerOptions": { 3 | "target": "ES2022", 4 | "module": "NodeNext", 5 | "moduleResolution": "NodeNext", 6 | "lib": ["ES2022"], 7 | "outDir": "./dist", 8 | "rootDir": "./src", 9 | "strict": true, 10 | "esModuleInterop": true, 11 | "skipLibCheck": true, 12 | "forceConsistentCasingInFileNames": true, 13 | "resolveJsonModule": true, 14 | "declaration": true, 15 | "declarationMap": true, 16 | "sourceMap": true, 17 | "isolatedModules": true 18 | }, 19 | "include": ["src/**/*"], 20 | "exclude": ["node_modules", "dist"] 21 | } ``` -------------------------------------------------------------------------------- /jest.setup.cjs: -------------------------------------------------------------------------------- ``` 1 | // Load dotenv for integration tests 2 | const dotenv = require('dotenv'); 3 | const path = require('path'); 4 | 5 | // The npm script sets an env var to identify integration tests 6 | const isIntegrationTest = process.env.JEST_TEST_TYPE === 'integration'; 7 | 8 | if (isIntegrationTest) { 9 | // For integration tests, load from .env file 10 | dotenv.config({ path: path.resolve(__dirname, '.env') }); 11 | 12 | // For integration tests, we MUST have proper environment variables 13 | // No fallback to localhost - tests should fail if not configured 14 | } else { 15 | // For unit tests, always use localhost 16 | process.env.CRAWL4AI_BASE_URL = 'http://localhost:11235'; 17 | process.env.CRAWL4AI_API_KEY = 'test-api-key'; 18 | } ``` -------------------------------------------------------------------------------- /jest.config.cjs: -------------------------------------------------------------------------------- ``` 1 | /** @type {import('jest').Config} */ 2 | module.exports = { 3 | preset: 'ts-jest/presets/default-esm', 4 | testEnvironment: 'node', 5 | roots: ['<rootDir>/src'], 6 | testMatch: ['**/__tests__/**/*.test.ts'], 7 | setupFiles: ['<rootDir>/jest.setup.cjs'], 8 | collectCoverageFrom: [ 9 | 'src/**/*.ts', 10 | '!src/**/__tests__/**', 11 | '!src/**/*.test.ts', 12 | '!src/**/types/**', 13 | ], 14 | coverageDirectory: 'coverage', 15 | coverageReporters: ['text', 'lcov', 'html', 'json'], 16 | moduleNameMapper: { 17 | '^(\\.{1,2}/.*)\\.js$': '$1', 18 | }, 19 | transform: { 20 | '^.+\\.tsx?$': [ 21 | 'ts-jest', 22 | { 23 | useESM: true, 24 | }, 25 | ], 26 | }, 27 | extensionsToTreatAsEsm: ['.ts'], 28 | clearMocks: true, 29 | // Limit parallelization for integration tests to avoid overwhelming the server 30 | ...(process.env.NODE_ENV === 'test' && process.argv.some(arg => arg.includes('integration')) ? 
{ maxWorkers: 1 } : {}),
31 | };
```

--------------------------------------------------------------------------------
/src/__tests__/types/mocks.ts:
--------------------------------------------------------------------------------

```typescript
1 | /* eslint-env jest */
2 | import type { AxiosResponse } from 'axios';
3 | 
4 | /**
5 |  * Mock axios instance for testing HTTP client behavior
6 |  */
7 | export interface MockAxiosInstance {
8 |   post: jest.Mock<Promise<AxiosResponse>>;
9 |   get: jest.Mock<Promise<AxiosResponse>>;
10 |   head: jest.Mock<Promise<AxiosResponse>>;
11 |   put?: jest.Mock<Promise<AxiosResponse>>;
12 |   delete?: jest.Mock<Promise<AxiosResponse>>;
13 |   patch?: jest.Mock<Promise<AxiosResponse>>;
14 | }
15 | 
16 | /**
17 |  * Mock function type that returns a promise with content array
18 |  */
19 | type MockFunction = jest.Mock<Promise<{ content: TestContent }>>;
20 | 
21 | /**
22 |  * Mock server interface for MCP server testing
23 |  */
24 | export interface MockMCPServer {
25 |   listTools: MockFunction;
26 |   callTool: MockFunction;
27 |   listResources?: MockFunction;
28 |   readResource?: MockFunction;
29 |   listPrompts?: MockFunction;
30 |   getPrompt?: MockFunction;
31 | }
32 | 
33 | /**
34 |  * Type for test content arrays used in MCP responses
35 |  */
36 | export type TestContent = Array<{
37 |   type: string;
38 |   text?: string;
39 |   resource?: {
40 |     uri: string;
41 |     mimeType: string;
42 |     blob?: string;
43 |   };
44 | }>;
45 | 
46 | /**
47 |  * Generic test response type
48 |  */
49 | export interface TestResponse<T = unknown> {
50 |   content: TestContent;
51 |   data?: T;
52 |   error?: string;
53 | }
54 | 
```

--------------------------------------------------------------------------------
/src/index.ts:
--------------------------------------------------------------------------------

```typescript
1 | #!/usr/bin/env node
2 | 
3 | import { Crawl4AIServer } from './server.js';
4 | 
5 | // Try to load dotenv only in development
6 | // In production (via npx), env vars come from the MCP client
7 | try {
8 |   // Only try to load dotenv if CRAWL4AI_BASE_URL is not set
9 |   if (!process.env.CRAWL4AI_BASE_URL) {
10 |     const dotenv = await import('dotenv');
11 |     dotenv.config();
12 |   }
13 | } catch {
14 |   // dotenv is not available in production, which is expected
15 | }
16 | 
17 | const CRAWL4AI_BASE_URL = process.env.CRAWL4AI_BASE_URL;
18 | const CRAWL4AI_API_KEY = process.env.CRAWL4AI_API_KEY || '';
19 | const SERVER_NAME = process.env.SERVER_NAME || 'crawl4ai-mcp';
20 | const SERVER_VERSION = process.env.SERVER_VERSION || '1.0.0';
21 | 
22 | if (!CRAWL4AI_BASE_URL) {
23 |   console.error('Error: CRAWL4AI_BASE_URL environment variable is required');
24 |   console.error('Please set it to your Crawl4AI server URL (e.g., http://localhost:11235)');
25 |   process.exit(1);
26 | }
27 | 
28 | // Always start the server when this script is executed
29 | // This script is meant to be run as an MCP server
30 | const server = new Crawl4AIServer(CRAWL4AI_BASE_URL, CRAWL4AI_API_KEY, SERVER_NAME, SERVER_VERSION);
31 | server.start().catch((err) => {
32 |   console.error('Server failed to start:', err);
33 |   process.exit(1);
34 | });
35 | 
```

--------------------------------------------------------------------------------
/.github/CI.md:
--------------------------------------------------------------------------------

```markdown
1 | # GitHub Actions CI/CD
2 | 
3 | This project uses GitHub Actions for continuous integration.
4 | 
5 | ## Workflows
6 | 
7 | ### CI (`ci.yml`)
8 | Runs on every push to main and on pull requests:
9 | - Linting (ESLint)
10 | - Code formatting check (Prettier)
11 | - Build (TypeScript compilation)
12 | - Unit tests (with nock mocks)
13 | - Test coverage report
14 | 
15 | Tests run on Node.js 18.x, 20.x, and 22.x.
16 | 
17 | ## Mock Maintenance
18 | 
19 | The unit tests use [nock](https://github.com/nock/nock) for HTTP mocking. This provides:
20 | - Fast test execution (~1 second)
21 | - Predictable test results
22 | - No external dependencies during CI
23 | 
24 | **How to update mocks:**
25 | 
26 | Option 1 - Generate mock code from real API:
27 | ```bash
28 | # This will call the real API and generate nock mock code
29 | CRAWL4AI_API_KEY=your-key npm run generate-mocks
30 | ```
31 | 
32 | Option 2 - View API responses as JSON:
33 | ```bash
34 | # This will save responses to mock-responses.json for inspection
35 | CRAWL4AI_API_KEY=your-key npm run view-mocks
36 | ```
37 | 
38 | Option 3 - Manual update:
39 | 1. Run integration tests to see current API behavior: `npm run test:integration`
40 | 2. Update the mock responses in `src/__tests__/crawl4ai-service.test.ts`
41 | 3. Ensure unit tests pass: `npm run test:unit`
42 | 
43 | The mocks are intentionally simple and focus on testing our code's behavior, not the API's exact responses.
44 | 
45 | ## Running Tests Locally
46 | 
47 | ```bash
48 | # Run all tests
49 | npm test
50 | 
51 | # Run only unit tests (fast, with mocks)
52 | npm run test:unit
53 | 
54 | # Run only integration tests (slow, real API)
55 | npm run test:integration
56 | 
57 | # Run with coverage
58 | npm run test:coverage
59 | ```
```

--------------------------------------------------------------------------------
/src/handlers/base-handler.ts:
--------------------------------------------------------------------------------

```typescript
1 | import { Crawl4AIService } from '../crawl4ai-service.js';
2 | import { AxiosInstance } from 'axios';
3 | 
4 | // Error handling types
5 | export interface ErrorWithResponse {
6 |   response?: {
7 |     data?:
8 |       | {
9 |           detail?: string;
10 |         }
11 |       | string
12 |       | unknown;
13 |   };
14 |   message?: string;
15 | }
16 | 
17 | export interface SessionInfo {
18 |   id: string;
19 |   created_at: Date;
20 |   last_used: Date;
21 |   initial_url?: string;
22 |   metadata?: Record<string, unknown>;
23 | }
24 | 
25 | export abstract class BaseHandler {
26 |   protected service: Crawl4AIService;
27 |   protected axiosClient: AxiosInstance;
28 |   protected sessions: Map<string, SessionInfo>;
29 | 
30 |   constructor(service: Crawl4AIService, axiosClient: AxiosInstance, sessions: Map<string, SessionInfo>) {
31 |     this.service = service;
32 |     this.axiosClient = axiosClient;
33 |     this.sessions = sessions;
34 |   }
35 | 
36 |   protected formatError(error: unknown, operation: string): Error {
37 |     const errorWithResponse = error as ErrorWithResponse;
38 |     let errorMessage = '';
39 | 
40 |     const data = errorWithResponse.response?.data;
41 |     if (typeof data === 'object' && data && 'detail' in data) {
42 |       errorMessage = (data as { detail: string }).detail;
43 |     } else if (data) {
44 |       // If data is an object, stringify it
45 |       errorMessage = typeof data === 'object' ?
JSON.stringify(data) : String(data);
46 |     } else if (error instanceof Error) {
47 |       errorMessage = error.message;
48 |     } else {
49 |       errorMessage = String(error);
50 |     }
51 | 
52 |     return new Error(`Failed to ${operation}: ${errorMessage}`);
53 |   }
54 | }
55 | 
```

--------------------------------------------------------------------------------
/eslint.config.mjs:
--------------------------------------------------------------------------------

```
1 | import eslint from '@eslint/js';
2 | import tseslint from '@typescript-eslint/eslint-plugin';
3 | import tsparser from '@typescript-eslint/parser';
4 | import prettier from 'eslint-config-prettier';
5 | import prettierPlugin from 'eslint-plugin-prettier';
6 | 
7 | export default [
8 |   eslint.configs.recommended,
9 |   prettier,
10 |   {
11 |     files: ['src/**/*.ts'],
12 |     languageOptions: {
13 |       parser: tsparser,
14 |       parserOptions: {
15 |         project: './tsconfig.json',
16 |         ecmaVersion: 'latest',
17 |         sourceType: 'module',
18 |       },
19 |       globals: {
20 |         console: 'readonly',
21 |         process: 'readonly',
22 |         Buffer: 'readonly',
23 |         __dirname: 'readonly',
24 |         __filename: 'readonly',
25 |         setTimeout: 'readonly',
26 |         clearTimeout: 'readonly',
27 |         setInterval: 'readonly',
28 |         clearInterval: 'readonly',
29 |         URL: 'readonly',
30 |       },
31 |     },
32 |     plugins: {
33 |       '@typescript-eslint': tseslint,
34 |       prettier: prettierPlugin,
35 |     },
36 |     rules: {
37 |       ...tseslint.configs.recommended.rules,
38 |       '@typescript-eslint/explicit-function-return-type': 'off',
39 |       '@typescript-eslint/explicit-module-boundary-types': 'off',
40 |       '@typescript-eslint/no-explicit-any': 'warn',
41 |       '@typescript-eslint/no-unused-vars': [
42 |         'error',
43 |         {
44 |           argsIgnorePattern: '^_',
45 |           varsIgnorePattern: '^_',
46 |         },
47 |       ],
48 |       '@typescript-eslint/no-misused-promises': [
49 |         'error',
50 |         {
51 |           checksVoidReturn: false,
52 |         },
53 |       ],
54 |       'prettier/prettier': 'error',
55 |     },
56 |   },
57 |   {
58 |     files: ['src/**/*.test.ts', 'src/**/*.integration.test.ts', 'src/**/test-utils.ts', 'src/__tests__/types/*.ts'],
59 |     languageOptions: {
60 |       globals: {
61 |         describe: 'readonly',
62 |         it: 'readonly',
63 |         expect: 'readonly',
64 |         beforeEach: 'readonly',
65 |         afterEach: 'readonly',
66 |         beforeAll: 'readonly',
67 |         afterAll: 'readonly',
68 |         jest: 'readonly',
69 |       },
70 |     },
71 |   },
72 |   {
73 |     ignores: ['dist/**', 'node_modules/**', '*.js', '*.mjs', '*.cjs', 'coverage/**'],
74 |   },
75 | ];
```

--------------------------------------------------------------------------------
/src/schemas/helpers.ts:
--------------------------------------------------------------------------------

```typescript
1 | import { z } from 'zod';
2 | 
3 | // Helper to validate JavaScript code
4 | export const validateJavaScriptCode = (code: string): boolean => {
5 |   // Check for common HTML entities that shouldn't be in JS
6 |   if (/&quot;|&amp;|&lt;|&gt;|&#\d+;|&\w+;/.test(code)) {
7 |     return false;
8 |   }
9 | 
10 |   // Basic check to ensure it's not HTML
11 |   if (/<(!DOCTYPE|html|body|head|script|style)\b/i.test(code)) {
12 |     return false;
13 |   }
14 | 
15 |   // Check for literal \n, \t, \r outside of strings (common LLM mistake)
16 |   // This is tricky - we'll check if the code has these patterns in a way that suggests
17 |   // they're meant to be actual newlines/tabs rather than escape sequences in strings
18 |   // Look for patterns like: ;\n or }\n or )\n which suggest literal newlines
19 |   if (/[;})]\s*\\n|\\n\s*[{(/]/.test(code)) {
20 |     return false;
21 |   }
22 | 
23 |   // Check for obvious cases of literal \n between statements
24 | 
if (/[;})]\s*\\n\s*\w/.test(code)) { 25 | return false; 26 | } 27 | 28 | return true; 29 | }; 30 | 31 | // Helper to create schema that rejects session_id 32 | export const createStatelessSchema = <T extends z.ZodObject<z.ZodRawShape>>(schema: T, toolName: string) => { 33 | // Tool-specific guidance for common scenarios 34 | const toolGuidance: Record<string, string> = { 35 | capture_screenshot: 'To capture screenshots with sessions, use crawl(session_id, screenshot: true)', 36 | generate_pdf: 'To generate PDFs with sessions, use crawl(session_id, pdf: true)', 37 | execute_js: 'To run JavaScript with sessions, use crawl(session_id, js_code: [...])', 38 | get_html: 'To get HTML with sessions, use crawl(session_id)', 39 | extract_with_llm: 'To extract data with sessions, first use crawl(session_id) then extract from the response', 40 | }; 41 | 42 | const message = `${toolName} does not support session_id. This tool is stateless - each call creates a new browser. ${ 43 | toolGuidance[toolName] || 'For persistent operations, use crawl with session_id.' 44 | }`; 45 | 46 | return z 47 | .object({ 48 | session_id: z.never({ message }).optional(), 49 | }) 50 | .passthrough() 51 | .and(schema) 52 | .transform((data) => { 53 | const { session_id, ...rest } = data; 54 | if (session_id !== undefined) { 55 | throw new Error(message); 56 | } 57 | return rest as z.infer<T>; 58 | }); 59 | }; 60 | ``` -------------------------------------------------------------------------------- /.github/workflows/ci.yml: -------------------------------------------------------------------------------- ```yaml 1 | name: CI 2 | 3 | on: 4 | push: 5 | branches: [ main ] 6 | pull_request: 7 | branches: [ main ] 8 | 9 | permissions: 10 | contents: write 11 | pages: write 12 | id-token: write 13 | 14 | jobs: 15 | test: 16 | runs-on: ubuntu-latest 17 | 18 | strategy: 19 | matrix: 20 | node-version: [18.x, 20.x, 22.x] 21 | 22 | steps: 23 | - uses: actions/checkout@v4 24 | 25 | - name: Use Node.js ${{ matrix.node-version }} 26 | uses: actions/setup-node@v4 27 | with: 28 | node-version: ${{ matrix.node-version }} 29 | cache: 'npm' 30 | 31 | - name: Install dependencies 32 | run: npm ci 33 | 34 | - name: Run linter 35 | run: npm run lint 36 | 37 | - name: Check formatting 38 | run: npm run format:check 39 | 40 | - name: Build 41 | run: npm run build 42 | 43 | - name: Run unit tests 44 | run: npm run test:unit 45 | 46 | - name: Generate coverage report 47 | if: matrix.node-version == '18.x' 48 | run: npm run test:coverage -- --testPathIgnorePatterns=integration --testPathIgnorePatterns=examples 49 | 50 | - name: Upload coverage reports 51 | if: matrix.node-version == '18.x' 52 | uses: actions/upload-artifact@v4 53 | with: 54 | name: coverage-report 55 | path: coverage/ 56 | 57 | - name: Update coverage gist 58 | if: matrix.node-version == '18.x' 59 | env: 60 | GIST_SECRET: ${{ secrets.GIST_SECRET }} 61 | run: | 62 | # Extract coverage percentage from lcov.info 63 | COVERAGE=$(awk -F: '/^SF:/{files++} /^LF:/{lines+=$2} /^LH:/{hits+=$2} END {printf "%.0f", (hits/lines)*100}' coverage/lcov.info) 64 | 65 | # Determine color based on coverage 66 | if [ $COVERAGE -ge 90 ]; then COLOR="brightgreen" 67 | elif [ $COVERAGE -ge 70 ]; then COLOR="green" 68 | elif [ $COVERAGE -ge 50 ]; then COLOR="yellow" 69 | elif [ $COVERAGE -ge 30 ]; then COLOR="orange" 70 | else COLOR="red"; fi 71 | 72 | # Update gist 73 | echo "{\"schemaVersion\":1,\"label\":\"coverage\",\"message\":\"${COVERAGE}%\",\"color\":\"${COLOR}\"}" > coverage.json 74 | gh auth login 
--with-token <<< "$GIST_SECRET" 75 | gh gist edit e2abffb0deb25afa2bf9185f440dae81 coverage.json 76 | 77 | - name: Deploy coverage to GitHub Pages 78 | if: matrix.node-version == '18.x' && github.ref == 'refs/heads/main' 79 | uses: peaceiris/actions-gh-pages@v4 80 | with: 81 | github_token: ${{ secrets.GITHUB_TOKEN }} 82 | publish_dir: ./coverage/lcov-report 83 | destination_dir: coverage ``` -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- ```json 1 | { 2 | "name": "mcp-crawl4ai-ts", 3 | "version": "3.0.2", 4 | "description": "TypeScript MCP server for Crawl4AI - web crawling and content extraction", 5 | "main": "dist/index.js", 6 | "bin": { 7 | "mcp-crawl4ai-ts": "dist/index.js" 8 | }, 9 | "type": "module", 10 | "engines": { 11 | "node": ">=18.0.0" 12 | }, 13 | "scripts": { 14 | "build": "tsc -p tsconfig.build.json", 15 | "start": "node dist/index.js", 16 | "dev": "tsx src/index.ts", 17 | "test": "NODE_OPTIONS=--experimental-vm-modules jest", 18 | "test:watch": "NODE_OPTIONS=--experimental-vm-modules jest --watch", 19 | "test:coverage": "NODE_OPTIONS=--experimental-vm-modules jest --coverage", 20 | "test:unit": "NODE_OPTIONS=--experimental-vm-modules jest --testPathIgnorePatterns=integration --testPathIgnorePatterns=examples", 21 | "test:integration": "JEST_TEST_TYPE=integration NODE_OPTIONS=--experimental-vm-modules jest src/__tests__/integration", 22 | "test:ci": "NODE_OPTIONS=--experimental-vm-modules jest --coverage --maxWorkers=2", 23 | "lint": "eslint src --ext .ts", 24 | "lint:fix": "eslint src --ext .ts --fix", 25 | "format": "prettier --write \"src/**/*.ts\"", 26 | "format:check": "prettier --check \"src/**/*.ts\"", 27 | "check": "npm run lint && npm run format:check && npm run build" 28 | }, 29 | "keywords": [ 30 | "mcp", 31 | "crawl4ai", 32 | "web-scraping", 33 | "markdown", 34 | "pdf", 35 | "screenshot" 36 | ], 37 | "author": "Juan González Cano", 38 | "license": "MIT", 39 | "repository": { 40 | "type": "git", 41 | "url": "git+https://github.com/omgwtfwow/mcp-crawl4ai-ts.git" 42 | }, 43 | "bugs": { 44 | "url": "https://github.com/omgwtfwow/mcp-crawl4ai-ts/issues" 45 | }, 46 | "homepage": "https://github.com/omgwtfwow/mcp-crawl4ai-ts#readme", 47 | "files": [ 48 | "dist/**/*", 49 | "README.md", 50 | "LICENSE" 51 | ], 52 | "dependencies": { 53 | "@modelcontextprotocol/sdk": "^1.0.4", 54 | "axios": "^1.7.9", 55 | "dotenv": "^16.4.7", 56 | "zod": "^3.25.76" 57 | }, 58 | "devDependencies": { 59 | "@eslint/js": "^9.32.0", 60 | "@jest/globals": "^29.7.0", 61 | "@types/jest": "^29.5.12", 62 | "@types/nock": "^10.0.3", 63 | "@types/node": "^22.10.6", 64 | "@typescript-eslint/eslint-plugin": "^8.38.0", 65 | "@typescript-eslint/parser": "^8.38.0", 66 | "diff": "^8.0.2", 67 | "eslint": "^9.32.0", 68 | "eslint-config-prettier": "^10.1.8", 69 | "eslint-plugin-prettier": "^5.5.3", 70 | "jest": "^29.7.0", 71 | "nock": "^14.0.8", 72 | "prettier": "^3.6.2", 73 | "ts-jest": "^29.4.0", 74 | "tsx": "^4.19.2", 75 | "typescript": "^5.7.3" 76 | } 77 | } 78 | ``` -------------------------------------------------------------------------------- /src/__tests__/handlers/session-handlers.test.ts: -------------------------------------------------------------------------------- ```typescript 1 | /* eslint-env jest */ 2 | import { jest } from '@jest/globals'; 3 | import { AxiosError } from 'axios'; 4 | import type { SessionHandlers as SessionHandlersType } from 
'../../handlers/session-handlers.js'; 5 | 6 | // Mock axios before importing SessionHandlers 7 | const mockPost = jest.fn(); 8 | const mockAxiosClient = { 9 | post: mockPost, 10 | }; 11 | 12 | // Mock the service 13 | const mockService = {} as unknown; 14 | 15 | // Import after setting up mocks 16 | const { SessionHandlers } = await import('../../handlers/session-handlers.js'); 17 | 18 | describe('SessionHandlers', () => { 19 | let handler: SessionHandlersType; 20 | let sessions: Map<string, unknown>; 21 | 22 | beforeEach(() => { 23 | jest.clearAllMocks(); 24 | sessions = new Map(); 25 | handler = new SessionHandlers(mockService, mockAxiosClient as unknown, sessions); 26 | }); 27 | 28 | describe('createSession', () => { 29 | it('should handle initial crawl failure gracefully', async () => { 30 | // Mock failed crawl 31 | mockPost.mockRejectedValue( 32 | new AxiosError('Request failed with status code 500', 'ERR_BAD_RESPONSE', undefined, undefined, { 33 | status: 500, 34 | statusText: 'Internal Server Error', 35 | data: 'Internal Server Error', 36 | headers: {}, 37 | config: {} as unknown, 38 | } as unknown), 39 | ); 40 | 41 | const options = { 42 | initial_url: 'https://this-domain-definitely-does-not-exist-12345.com', 43 | browser_type: 'chromium' as const, 44 | }; 45 | 46 | // Create session with initial_url that will fail 47 | const result = await handler.createSession(options); 48 | 49 | // Session should still be created 50 | expect(result.content[0].type).toBe('text'); 51 | expect(result.content[0].text).toContain('Session created successfully'); 52 | expect(result.content[0].text).toContain( 53 | 'Pre-warmed with: https://this-domain-definitely-does-not-exist-12345.com', 54 | ); 55 | expect(result.session_id).toBeDefined(); 56 | expect(result.browser_type).toBe('chromium'); 57 | 58 | // Verify crawl was attempted 59 | expect(mockPost).toHaveBeenCalledWith( 60 | '/crawl', 61 | { 62 | urls: ['https://this-domain-definitely-does-not-exist-12345.com'], 63 | browser_config: { 64 | headless: true, 65 | browser_type: 'chromium', 66 | }, 67 | crawler_config: { 68 | session_id: expect.stringMatching(/^session-/), 69 | cache_mode: 'BYPASS', 70 | }, 71 | }, 72 | { 73 | timeout: 30000, 74 | }, 75 | ); 76 | 77 | // Verify session was stored locally 78 | expect(sessions.size).toBe(1); 79 | const session = sessions.get(result.session_id); 80 | expect(session).toBeDefined(); 81 | expect(session.initial_url).toBe('https://this-domain-definitely-does-not-exist-12345.com'); 82 | }); 83 | 84 | it('should not attempt crawl when no initial_url provided', async () => { 85 | const result = await handler.createSession({}); 86 | 87 | // Session should be created without crawl 88 | expect(result.content[0].text).toContain('Session created successfully'); 89 | expect(result.content[0].text).toContain('Ready for use'); 90 | expect(result.content[0].text).not.toContain('Pre-warmed'); 91 | 92 | // Verify no crawl was attempted 93 | expect(mockPost).not.toHaveBeenCalled(); 94 | }); 95 | }); 96 | }); 97 | ``` -------------------------------------------------------------------------------- /src/__tests__/integration/generate-pdf.integration.test.ts: -------------------------------------------------------------------------------- ```typescript 1 | /* eslint-env jest */ 2 | import { Client } from '@modelcontextprotocol/sdk/client/index.js'; 3 | import { createTestClient, cleanupTestClient, TEST_TIMEOUTS } from './test-utils.js'; 4 | 5 | interface ToolResult { 6 | content: Array<{ 7 | type: string; 8 | text?: string; 9 
| resource?: { 10 | uri: string; 11 | mimeType?: string; 12 | blob?: string; 13 | }; 14 | }>; 15 | } 16 | 17 | describe('generate_pdf Integration Tests', () => { 18 | let client: Client; 19 | 20 | beforeAll(async () => { 21 | client = await createTestClient(); 22 | }, TEST_TIMEOUTS.medium); 23 | 24 | afterAll(async () => { 25 | if (client) { 26 | await cleanupTestClient(client); 27 | } 28 | }); 29 | 30 | describe('PDF generation', () => { 31 | it( 32 | 'should generate PDF from URL', 33 | async () => { 34 | const result = await client.callTool({ 35 | name: 'generate_pdf', 36 | arguments: { 37 | url: 'https://httpbin.org/html', 38 | }, 39 | }); 40 | 41 | expect(result).toBeDefined(); 42 | const content = (result as ToolResult).content; 43 | expect(content).toHaveLength(2); 44 | 45 | // First item should be the PDF as embedded resource 46 | expect(content[0].type).toBe('resource'); 47 | expect(content[0].resource).toBeDefined(); 48 | expect(content[0].resource?.mimeType).toBe('application/pdf'); 49 | expect(content[0].resource?.blob).toBeTruthy(); 50 | expect(content[0].resource?.blob?.length).toBeGreaterThan(1000); // Should be a substantial base64 string 51 | expect(content[0].resource?.uri).toContain('data:application/pdf'); 52 | 53 | // Second item should be text description 54 | expect(content[1].type).toBe('text'); 55 | expect(content[1].text).toContain('PDF generated for: https://httpbin.org/html'); 56 | }, 57 | TEST_TIMEOUTS.long, 58 | ); 59 | 60 | it( 61 | 'should reject session_id parameter', 62 | async () => { 63 | const result = await client.callTool({ 64 | name: 'generate_pdf', 65 | arguments: { 66 | url: 'https://httpbin.org/html', 67 | session_id: 'test-session', 68 | }, 69 | }); 70 | 71 | const content = (result as ToolResult).content; 72 | expect(content).toHaveLength(1); 73 | expect(content[0].type).toBe('text'); 74 | expect(content[0].text).toContain('session_id'); 75 | expect(content[0].text).toContain('does not support'); 76 | expect(content[0].text).toContain('stateless'); 77 | }, 78 | TEST_TIMEOUTS.short, 79 | ); 80 | 81 | it( 82 | 'should handle invalid URLs gracefully', 83 | async () => { 84 | const result = await client.callTool({ 85 | name: 'generate_pdf', 86 | arguments: { 87 | url: 'not-a-valid-url', 88 | }, 89 | }); 90 | 91 | const content = (result as ToolResult).content; 92 | expect(content).toHaveLength(1); 93 | expect(content[0].type).toBe('text'); 94 | expect(content[0].text).toContain('Error'); 95 | expect(content[0].text?.toLowerCase()).toContain('invalid'); 96 | }, 97 | TEST_TIMEOUTS.short, 98 | ); 99 | 100 | it( 101 | 'should handle non-existent domains', 102 | async () => { 103 | const result = await client.callTool({ 104 | name: 'generate_pdf', 105 | arguments: { 106 | url: 'https://this-domain-definitely-does-not-exist-123456789.com', 107 | }, 108 | }); 109 | 110 | const content = (result as ToolResult).content; 111 | expect(content).toHaveLength(1); 112 | expect(content[0].type).toBe('text'); 113 | expect(content[0].text).toContain('Error'); 114 | }, 115 | TEST_TIMEOUTS.short, 116 | ); 117 | }); 118 | }); 119 | ``` -------------------------------------------------------------------------------- /src/__tests__/integration/session-management.integration.test.ts: -------------------------------------------------------------------------------- ```typescript 1 | import { createTestClient, cleanupTestClient, TEST_TIMEOUTS } from './test-utils.js'; 2 | import { Client } from '@modelcontextprotocol/sdk/client/index.js'; 3 | 4 | interface ToolResult { 5 | 
content: Array<{ 6 | type: string; 7 | text?: string; 8 | }>; 9 | session_id?: string; 10 | browser_type?: string; 11 | initial_url?: string; 12 | created_at?: string; 13 | } 14 | 15 | describe('Session Management Integration Tests', () => { 16 | let client: Client; 17 | const createdSessions: string[] = []; 18 | 19 | beforeAll(async () => { 20 | client = await createTestClient(); 21 | }, TEST_TIMEOUTS.medium); 22 | 23 | afterEach(async () => { 24 | // Clean up any sessions created during tests 25 | for (const sessionId of createdSessions) { 26 | try { 27 | await client.callTool({ 28 | name: 'manage_session', 29 | arguments: { action: 'clear', session_id: sessionId }, 30 | }); 31 | } catch (e) { 32 | // Ignore errors during cleanup 33 | console.debug('Cleanup error:', e); 34 | } 35 | } 36 | createdSessions.length = 0; 37 | }); 38 | 39 | afterAll(async () => { 40 | if (client) { 41 | await cleanupTestClient(client); 42 | } 43 | }); 44 | 45 | describe('manage_session', () => { 46 | it( 47 | 'should create session with auto-generated ID using manage_session', 48 | async () => { 49 | const result = await client.callTool({ 50 | name: 'manage_session', 51 | arguments: { action: 'create' }, 52 | }); 53 | 54 | expect(result).toBeDefined(); 55 | const typedResult = result as ToolResult; 56 | expect(typedResult.content).toBeDefined(); 57 | expect(Array.isArray(typedResult.content)).toBe(true); 58 | 59 | const textContent = typedResult.content.find((c) => c.type === 'text'); 60 | expect(textContent).toBeDefined(); 61 | expect(textContent?.text).toContain('Session created successfully'); 62 | 63 | // Check returned parameters 64 | expect(typedResult.session_id).toBeDefined(); 65 | expect(typedResult.session_id).toMatch(/^session-/); 66 | expect(typedResult.browser_type).toBe('chromium'); 67 | 68 | // Track for cleanup 69 | createdSessions.push(typedResult.session_id!); 70 | }, 71 | TEST_TIMEOUTS.short, 72 | ); 73 | 74 | it( 75 | 'should clear session using manage_session', 76 | async () => { 77 | // First create a session 78 | const createResult = await client.callTool({ 79 | name: 'manage_session', 80 | arguments: { 81 | action: 'create', 82 | session_id: 'test-to-clear', 83 | }, 84 | }); 85 | 86 | const typedCreateResult = createResult as ToolResult; 87 | createdSessions.push(typedCreateResult.session_id!); 88 | 89 | // Then clear it 90 | const clearResult = await client.callTool({ 91 | name: 'manage_session', 92 | arguments: { 93 | action: 'clear', 94 | session_id: 'test-to-clear', 95 | }, 96 | }); 97 | 98 | const typedClearResult = clearResult as ToolResult; 99 | expect(typedClearResult.content[0].text).toContain('Session cleared successfully'); 100 | }, 101 | TEST_TIMEOUTS.short, 102 | ); 103 | 104 | it( 105 | 'should list sessions using manage_session', 106 | async () => { 107 | // Create a session first 108 | const createResult = await client.callTool({ 109 | name: 'manage_session', 110 | arguments: { 111 | action: 'create', 112 | session_id: 'test-list-session', 113 | }, 114 | }); 115 | 116 | const typedCreateResult = createResult as ToolResult; 117 | createdSessions.push(typedCreateResult.session_id!); 118 | 119 | // List sessions 120 | const listResult = await client.callTool({ 121 | name: 'manage_session', 122 | arguments: { action: 'list' }, 123 | }); 124 | 125 | const typedListResult = listResult as ToolResult; 126 | expect(typedListResult.content[0].text).toContain('Active sessions'); 127 | expect(typedListResult.content[0].text).toContain('test-list-session'); 128 | }, 129 | 
TEST_TIMEOUTS.short, 130 | ); 131 | }); 132 | }); 133 | ``` -------------------------------------------------------------------------------- /src/__tests__/integration/get-html.integration.test.ts: -------------------------------------------------------------------------------- ```typescript 1 | /* eslint-env jest */ 2 | import { Client } from '@modelcontextprotocol/sdk/client/index.js'; 3 | import { createTestClient, cleanupTestClient, TEST_TIMEOUTS } from './test-utils.js'; 4 | 5 | interface ToolResult { 6 | content: Array<{ 7 | type: string; 8 | text?: string; 9 | }>; 10 | } 11 | 12 | describe('get_html Integration Tests', () => { 13 | let client: Client; 14 | 15 | beforeAll(async () => { 16 | client = await createTestClient(); 17 | }, TEST_TIMEOUTS.medium); 18 | 19 | afterAll(async () => { 20 | if (client) { 21 | await cleanupTestClient(client); 22 | } 23 | }); 24 | 25 | describe('HTML extraction', () => { 26 | it( 27 | 'should extract HTML from URL', 28 | async () => { 29 | const result = await client.callTool({ 30 | name: 'get_html', 31 | arguments: { 32 | url: 'https://httpbin.org/html', 33 | }, 34 | }); 35 | 36 | expect(result).toBeDefined(); 37 | const content = (result as ToolResult).content; 38 | expect(content).toHaveLength(1); 39 | expect(content[0].type).toBe('text'); 40 | 41 | // Should contain processed HTML 42 | const html = content[0].text || ''; 43 | expect(html).toBeTruthy(); 44 | // The HTML endpoint returns sanitized/processed HTML 45 | // It might be truncated with "..." 46 | expect(html.length).toBeGreaterThan(0); 47 | }, 48 | TEST_TIMEOUTS.medium, 49 | ); 50 | 51 | it( 52 | 'should reject session_id parameter', 53 | async () => { 54 | const result = await client.callTool({ 55 | name: 'get_html', 56 | arguments: { 57 | url: 'https://example.com', 58 | session_id: 'test-session', 59 | }, 60 | }); 61 | 62 | const content = (result as ToolResult).content; 63 | expect(content).toHaveLength(1); 64 | expect(content[0].type).toBe('text'); 65 | expect(content[0].text).toContain('session_id'); 66 | expect(content[0].text).toContain('does not support'); 67 | expect(content[0].text).toContain('stateless'); 68 | }, 69 | TEST_TIMEOUTS.short, 70 | ); 71 | 72 | it( 73 | 'should handle invalid URLs gracefully', 74 | async () => { 75 | const result = await client.callTool({ 76 | name: 'get_html', 77 | arguments: { 78 | url: 'not-a-valid-url', 79 | }, 80 | }); 81 | 82 | const content = (result as ToolResult).content; 83 | expect(content).toHaveLength(1); 84 | expect(content[0].type).toBe('text'); 85 | expect(content[0].text).toContain('Error'); 86 | expect(content[0].text?.toLowerCase()).toContain('invalid'); 87 | }, 88 | TEST_TIMEOUTS.short, 89 | ); 90 | 91 | it( 92 | 'should handle non-existent domains', 93 | async () => { 94 | const result = await client.callTool({ 95 | name: 'get_html', 96 | arguments: { 97 | url: 'https://this-domain-definitely-does-not-exist-123456789.com', 98 | }, 99 | }); 100 | 101 | const content = (result as ToolResult).content; 102 | expect(content).toHaveLength(1); 103 | expect(content[0].type).toBe('text'); 104 | 105 | // According to spec, returns success: true with empty HTML for invalid URLs 106 | const html = content[0].text || ''; 107 | // Could be empty or contain an error message 108 | expect(typeof html).toBe('string'); 109 | }, 110 | TEST_TIMEOUTS.short, 111 | ); 112 | 113 | it( 114 | 'should ignore extra parameters', 115 | async () => { 116 | const result = await client.callTool({ 117 | name: 'get_html', 118 | arguments: { 119 | url: 
'https://example.com', 120 | wait_for: '.some-selector', // Should be ignored 121 | bypass_cache: true, // Should be ignored 122 | }, 123 | }); 124 | 125 | const content = (result as ToolResult).content; 126 | expect(content).toHaveLength(1); 127 | expect(content[0].type).toBe('text'); 128 | 129 | // Should still work, ignoring extra params 130 | const html = content[0].text || ''; 131 | expect(html.length).toBeGreaterThan(0); 132 | }, 133 | TEST_TIMEOUTS.long, 134 | ); 135 | }); 136 | }); 137 | ``` -------------------------------------------------------------------------------- /src/__tests__/integration/capture-screenshot.integration.test.ts: -------------------------------------------------------------------------------- ```typescript 1 | /* eslint-env jest */ 2 | import { Client } from '@modelcontextprotocol/sdk/client/index.js'; 3 | import { createTestClient, cleanupTestClient, TEST_TIMEOUTS } from './test-utils.js'; 4 | 5 | interface ToolResult { 6 | content: Array<{ 7 | type: string; 8 | text?: string; 9 | data?: string; 10 | mimeType?: string; 11 | }>; 12 | } 13 | 14 | describe('capture_screenshot Integration Tests', () => { 15 | let client: Client; 16 | 17 | beforeAll(async () => { 18 | client = await createTestClient(); 19 | }, TEST_TIMEOUTS.medium); 20 | 21 | afterAll(async () => { 22 | if (client) { 23 | await cleanupTestClient(client); 24 | } 25 | }); 26 | 27 | describe('Screenshot capture', () => { 28 | it( 29 | 'should capture screenshot with default wait time', 30 | async () => { 31 | const result = await client.callTool({ 32 | name: 'capture_screenshot', 33 | arguments: { 34 | url: 'https://httpbin.org/html', 35 | }, 36 | }); 37 | 38 | expect(result).toBeDefined(); 39 | const content = (result as ToolResult).content; 40 | expect(content).toHaveLength(2); 41 | 42 | // First item should be the image 43 | expect(content[0].type).toBe('image'); 44 | expect(content[0].mimeType).toBe('image/png'); 45 | expect(content[0].data).toBeTruthy(); 46 | expect(content[0].data?.length).toBeGreaterThan(1000); // Should be a substantial base64 string 47 | 48 | // Second item should be text description 49 | expect(content[1].type).toBe('text'); 50 | expect(content[1].text).toContain('Screenshot captured for: https://httpbin.org/html'); 51 | }, 52 | TEST_TIMEOUTS.short, 53 | ); 54 | 55 | it( 56 | 'should capture screenshot with custom wait time', 57 | async () => { 58 | const result = await client.callTool({ 59 | name: 'capture_screenshot', 60 | arguments: { 61 | url: 'https://httpbin.org/html', 62 | screenshot_wait_for: 0.5, // Reduced from 3 seconds 63 | }, 64 | }); 65 | 66 | expect(result).toBeDefined(); 67 | const content = (result as ToolResult).content; 68 | expect(content).toHaveLength(2); 69 | 70 | // First item should be the image 71 | expect(content[0].type).toBe('image'); 72 | expect(content[0].mimeType).toBe('image/png'); 73 | expect(content[0].data).toBeTruthy(); 74 | 75 | // Second item should be text description 76 | expect(content[1].type).toBe('text'); 77 | expect(content[1].text).toContain('Screenshot captured for: https://httpbin.org/html'); 78 | }, 79 | TEST_TIMEOUTS.medium, 80 | ); 81 | 82 | it( 83 | 'should reject session_id parameter', 84 | async () => { 85 | const result = await client.callTool({ 86 | name: 'capture_screenshot', 87 | arguments: { 88 | url: 'https://example.com', 89 | session_id: 'test-session', 90 | }, 91 | }); 92 | 93 | const content = (result as ToolResult).content; 94 | expect(content).toHaveLength(1); 95 | expect(content[0].type).toBe('text'); 96 | 
expect(content[0].text).toContain('session_id'); 97 | expect(content[0].text).toContain('does not support'); 98 | expect(content[0].text).toContain('stateless'); 99 | }, 100 | TEST_TIMEOUTS.short, 101 | ); 102 | 103 | it( 104 | 'should handle invalid URLs gracefully', 105 | async () => { 106 | const result = await client.callTool({ 107 | name: 'capture_screenshot', 108 | arguments: { 109 | url: 'not-a-valid-url', 110 | }, 111 | }); 112 | 113 | const content = (result as ToolResult).content; 114 | expect(content).toHaveLength(1); 115 | expect(content[0].type).toBe('text'); 116 | expect(content[0].text).toContain('Error'); 117 | expect(content[0].text?.toLowerCase()).toContain('invalid'); 118 | }, 119 | TEST_TIMEOUTS.short, 120 | ); 121 | 122 | it( 123 | 'should handle non-existent domains', 124 | async () => { 125 | const result = await client.callTool({ 126 | name: 'capture_screenshot', 127 | arguments: { 128 | url: 'https://this-domain-definitely-does-not-exist-123456789.com', 129 | }, 130 | }); 131 | 132 | const content = (result as ToolResult).content; 133 | expect(content).toHaveLength(1); 134 | expect(content[0].type).toBe('text'); 135 | expect(content[0].text).toContain('Error'); 136 | }, 137 | TEST_TIMEOUTS.short, 138 | ); 139 | }); 140 | }); 141 | ``` -------------------------------------------------------------------------------- /src/__tests__/integration/test-utils.ts: -------------------------------------------------------------------------------- ```typescript 1 | /* eslint-env jest */ 2 | import { Client } from '@modelcontextprotocol/sdk/client/index.js'; 3 | import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js'; 4 | import dotenv from 'dotenv'; 5 | 6 | // Load environment variables 7 | dotenv.config(); 8 | 9 | export interface IntegrationTestConfig { 10 | baseUrl: string; 11 | apiKey: string; 12 | llmProvider?: string; 13 | llmApiToken?: string; 14 | llmBaseUrl?: string; 15 | } 16 | 17 | export function getTestConfig(): IntegrationTestConfig { 18 | const config: IntegrationTestConfig = { 19 | baseUrl: process.env.CRAWL4AI_BASE_URL || '', 20 | apiKey: process.env.CRAWL4AI_API_KEY || '', 21 | llmProvider: process.env.LLM_PROVIDER, 22 | llmApiToken: process.env.LLM_API_TOKEN, 23 | llmBaseUrl: process.env.LLM_BASE_URL, 24 | }; 25 | 26 | if (!config.baseUrl) { 27 | throw new Error( 28 | 'CRAWL4AI_BASE_URL is required for integration tests. 
Please set it in .env file or environment variable.', 29 | ); 30 | } 31 | 32 | return config; 33 | } 34 | 35 | export function hasLLMConfig(): boolean { 36 | const config = getTestConfig(); 37 | return !!(config.llmProvider && config.llmApiToken); 38 | } 39 | 40 | export async function createTestClient(): Promise<Client> { 41 | const transport = new StdioClientTransport({ 42 | command: 'tsx', 43 | args: ['src/index.ts'], 44 | env: { 45 | ...process.env, 46 | NODE_ENV: 'test', 47 | }, 48 | cwd: process.cwd(), // Ensure the child process runs in the correct directory 49 | }); 50 | 51 | const client = new Client( 52 | { 53 | name: 'integration-test-client', 54 | version: '1.0.0', 55 | }, 56 | { 57 | capabilities: {}, 58 | }, 59 | ); 60 | 61 | await client.connect(transport); 62 | return client; 63 | } 64 | 65 | export async function cleanupTestClient(client: Client): Promise<void> { 66 | await client.close(); 67 | } 68 | 69 | // Test data generators 70 | export function generateSessionId(): string { 71 | return `test-session-${Date.now()}-${Math.random().toString(36).substring(2, 9)}`; 72 | } 73 | 74 | export function generateTestUrl(type: 'simple' | 'dynamic' | 'infinite-scroll' | 'auth' = 'simple'): string { 75 | const urls = { 76 | simple: 'https://example.com', 77 | dynamic: 'https://github.com', 78 | 'infinite-scroll': 'https://twitter.com', 79 | auth: 'https://github.com/login', 80 | }; 81 | return urls[type]; 82 | } 83 | 84 | // Test result types 85 | export interface TestContentItem { 86 | type: string; 87 | text?: string; 88 | data?: string; 89 | mimeType?: string; 90 | } 91 | 92 | export interface TestResult { 93 | content: TestContentItem[]; 94 | } 95 | 96 | export interface ToolResult { 97 | content: TestContentItem[]; 98 | isError?: boolean; 99 | } 100 | 101 | // Assertion helpers 102 | export async function expectSuccessfulCrawl(result: unknown): Promise<void> { 103 | expect(result).toBeDefined(); 104 | 105 | // Type guard to check if result has content property 106 | const typedResult = result as { content?: unknown }; 107 | expect(typedResult.content).toBeDefined(); 108 | expect(typedResult.content).toBeInstanceOf(Array); 109 | 110 | const contentArray = typedResult.content as TestContentItem[]; 111 | expect(contentArray.length).toBeGreaterThan(0); 112 | 113 | const textContent = contentArray.find((c) => c.type === 'text'); 114 | expect(textContent).toBeDefined(); 115 | expect(textContent?.text).toBeTruthy(); 116 | } 117 | 118 | export async function expectScreenshot(result: unknown): Promise<void> { 119 | const typedResult = result as { content?: TestContentItem[] }; 120 | expect(typedResult.content).toBeDefined(); 121 | 122 | const imageContent = typedResult.content?.find((c) => c.type === 'image'); 123 | expect(imageContent).toBeDefined(); 124 | expect(imageContent?.data).toBeTruthy(); 125 | expect(imageContent?.mimeType).toBe('image/png'); 126 | } 127 | 128 | export async function expectExtractedData(result: unknown, expectedKeys: string[]): Promise<void> { 129 | const typedResult = result as { content?: TestContentItem[] }; 130 | expect(typedResult.content).toBeDefined(); 131 | 132 | const textContent = typedResult.content?.find((c) => c.type === 'text'); 133 | expect(textContent).toBeDefined(); 134 | 135 | // Check if extracted data contains expected keys 136 | for (const key of expectedKeys) { 137 | expect(textContent?.text).toContain(key); 138 | } 139 | } 140 | 141 | // Delay helper for tests 142 | export function delay(ms: number): Promise<void> { 143 | return new 
Promise((resolve) => setTimeout(resolve, ms));
144 | }
145 | 
146 | // Rate limiter for integration tests
147 | let lastRequestTime = 0;
148 | export async function rateLimit(minDelayMs: number = 500): Promise<void> {
149 |   const now = Date.now();
150 |   const timeSinceLastRequest = now - lastRequestTime;
151 | 
152 |   if (timeSinceLastRequest < minDelayMs) {
153 |     await delay(minDelayMs - timeSinceLastRequest);
154 |   }
155 | 
156 |   lastRequestTime = Date.now();
157 | }
158 | 
159 | // Skip test if condition is not met
160 | export function skipIf(condition: boolean, message: string) {
161 |   if (condition) {
162 |     console.log(`⚠️ Skipping test: ${message}`);
163 |     return true;
164 |   }
165 |   return false;
166 | }
167 | 
168 | // Test timeout helper
169 | export const TEST_TIMEOUTS = {
170 |   short: 30000, // 30 seconds
171 |   medium: 60000, // 1 minute
172 |   long: 120000, // 2 minutes
173 |   extraLong: 180000, // 3 minutes
174 | };
175 | 
```
--------------------------------------------------------------------------------
/.github/copilot-instructions.md:
--------------------------------------------------------------------------------
```markdown
1 | # Copilot Instructions: `mcp-crawl4ai-ts`
2 | 
3 | Concise, project-specific guidance for AI coding agents. Optimize for correctness, safety, and existing test expectations.
4 | 
5 | ## Architecture & Flow
6 | - Entrypoint `src/index.ts`: loads dotenv only if `CRAWL4AI_BASE_URL` unset; fails fast if missing. Passes env + version into `Crawl4AIServer`.
7 | - `src/server.ts`: registers MCP tools, keeps a `Map<string, SessionInfo>` for persistent browser sessions, and uses `validateAndExecute` (Zod parse + invariant error message format). Do NOT alter error text pattern: `Invalid parameters for <tool>: ...` (tests & LLM reliability depend on it).
8 | - Service layer `src/crawl4ai-service.ts`: pure HTTP wrapper around Crawl4AI endpoints; centralizes axios timeout & error translation (preserve wording like `Request timed out`, `Request failed with status <code>:` — tests rely on these substrings).
9 | - Handlers (`src/handlers/*.ts`): orchestration & response shaping (text content arrays). No direct business logic inside server class beyond wiring.
10 | - Validation schemas (`src/schemas/validation-schemas.ts` + helpers): all tool inputs defined here. Use `createStatelessSchema` for stateless tools; session/persistent tools have discriminated unions.
11 | 
12 | ## Tool Model
13 | - Stateless tools (e.g. `get_markdown`, `capture_screenshot`, `execute_js`) spin up a fresh browser each call.
14 | - Session-based operations use `manage_session` (create/list/clear) + `crawl` for persistent state, allowing chained JS + screenshot/pdf in ONE call. Never try to chain separate stateless calls to reflect JS mutations.
15 | - Output always returned as base64/text blocks; do not add file system side-effects unless explicitly using a save path param already supported (screenshots: optional local save dir).
16 | 
17 | ## JS & Input Validation Nuances
18 | - JS code schema rejects: HTML entities (&quot;), literal `\n` tokens outside strings, embedded HTML tags. Reuse `JsCodeSchema`—do not duplicate logic.
19 | - For `get_markdown`: if filter is `bm25` or `llm`, `query` becomes required (enforced via `.refine`). Keep this logic centralized.
20 | 
21 | ## Sessions
22 | - `SessionInfo` tracks `created_at` & `last_used`. Update `last_used` whenever a session-based action runs. Don't leak sessions: `clear` must delete map entry (see the sketch below).
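- A minimal bookkeeping sketch (illustrative; mirrors the shape used in `src/handlers/session-handlers.ts`, not the verbatim type):

      interface SessionInfo {
        id: string;
        created_at: Date;
        last_used: Date;
        initial_url?: string;
        metadata?: { browser_type?: string };
      }

      // Update last_used on every session-based action.
      function touchSession(sessions: Map<string, SessionInfo>, sessionId: string): void {
        const s = sessions.get(sessionId);
        if (s) s.last_used = new Date();
      }

      // Clear must delete the map entry so sessions never leak.
      function clearSession(sessions: Map<string, SessionInfo>, sessionId: string): boolean {
        return sessions.delete(sessionId);
      }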
23 | 24 | ## Error Handling Pattern 25 | - Handlers wrap service calls; on failure use `this.formatError(error, '<operation>')` (see `BaseHandler`). Preserve format: `Failed to <operation>: <detail>`. 26 | - Zod validation errors: keep exact join pattern of `path: message` segments. 27 | 28 | ## Adding / Modifying a Tool (Checklist) 29 | 1. Define or extend schema in `validation-schemas.ts` (prefer composing existing small schemas; wrap with `createStatelessSchema` if ephemeral). 30 | 2. Add service method if it maps to a new Crawl4AI endpoint (pure HTTP + validation of URL / JS content; reuse existing validators). 31 | 3. Implement handler method (assemble request body, post-process response to `content: [{ type: 'text', text }]`). 32 | 4. Register in `setupHandlers()` list (tool description should mirror README style & clarify stateless vs session). 33 | 5. Write tests: unit (schema + handler success/failure), integration (happy path with mocked or real endpoint). Place under matching folder in `src/__tests__/`. 34 | 6. Update README tool table if user-facing, and CHANGELOG + version bump. 35 | 36 | ## Commands & Workflows 37 | - Install: `npm install` 38 | - Build: `npm run build` (tsconfig.build.json) 39 | - Dev (watch): `npm run dev` 40 | - Tests: `npm run test` | unit only: `npm run test:unit` | integration: `npm run test:integration` | coverage: `npm run test:coverage` 41 | - Lint/Format: `npm run lint`, `npm run lint:fix`, `npm run format:check` 42 | - Pre-flight composite: `npm run check` 43 | 44 | ### Testing Invariants 45 | - NEVER invoke `jest` directly for integration tests; rely on `npm run test:integration` (injects `NODE_OPTIONS=--experimental-vm-modules` + `JEST_TEST_TYPE=integration`). 46 | - Unit tests auto-set `CRAWL4AI_BASE_URL` in `jest.setup.cjs`; integration tests require real env vars (`CRAWL4AI_BASE_URL`, optional `CRAWL4AI_API_KEY`, LLM vars) via `.env` or exported. 47 | - To run a single integration file: `npm run test:integration -- path/to/file.test.ts`. 48 | - Jest pinned at 29.x with `ts-jest@29`; do not upgrade one without the other. 49 | - Symptom mapping: import syntax error or hang at first test => you bypassed the npm script. 50 | 51 | ## Conventions & Invariants 52 | - No `any`; prefer `unknown` + narrowing. 53 | - Keep responses minimal & textual; do not introduce new top-level fields in tool results without updating all tests. 54 | - Timeout remains 120s in axios clients—changing requires test updates. 55 | - Commit style: conventional commits; no emojis, AI signoffs, or verbose bodies. 56 | 57 | ## References 58 | - README (tools & examples), CLAUDE.md (contrib rules), CHANGELOG (release notes), coverage report for quality gates. 59 | 60 | If something is ambiguous, inspect existing handlers first and mirror the closest established pattern before inventing a new one. 
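For reference, the two invariant error formats side by side (example values only, not real output):

    // Zod failure surfaced by validateAndExecute:
    //   "Invalid parameters for get_markdown: url: Invalid url"
    // Handler failure wrapped by BaseHandler.formatError:
    //   "Failed to get markdown: Request failed with status 500: Internal Server Error"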
61 | ``` -------------------------------------------------------------------------------- /src/__tests__/integration/extract-with-llm.integration.test.ts: -------------------------------------------------------------------------------- ```typescript 1 | /* eslint-env jest */ 2 | import { Client } from '@modelcontextprotocol/sdk/client/index.js'; 3 | import { createTestClient, cleanupTestClient, TEST_TIMEOUTS } from './test-utils.js'; 4 | 5 | interface ToolResult { 6 | content: Array<{ 7 | type: string; 8 | text?: string; 9 | }>; 10 | } 11 | 12 | describe('extract_with_llm Integration Tests', () => { 13 | let client: Client; 14 | 15 | beforeAll(async () => { 16 | client = await createTestClient(); 17 | }, TEST_TIMEOUTS.medium); 18 | 19 | afterAll(async () => { 20 | if (client) { 21 | await cleanupTestClient(client); 22 | } 23 | }); 24 | 25 | describe('LLM extraction', () => { 26 | it( 27 | 'should extract information about a webpage', 28 | async () => { 29 | const result = await client.callTool({ 30 | name: 'extract_with_llm', 31 | arguments: { 32 | url: 'https://httpbin.org/html', 33 | query: 'What is the main topic of this page?', 34 | }, 35 | }); 36 | 37 | expect(result).toBeTruthy(); 38 | const typedResult = result as ToolResult; 39 | expect(typedResult.content).toBeDefined(); 40 | expect(typedResult.content.length).toBeGreaterThan(0); 41 | 42 | const textContent = (result as ToolResult).content.find((c) => c.type === 'text'); 43 | expect(textContent?.text).toBeTruthy(); 44 | // Should return a meaningful response (LLM responses are non-deterministic) 45 | expect(textContent?.text?.length || 0).toBeGreaterThan(10); 46 | }, 47 | TEST_TIMEOUTS.long, 48 | ); 49 | 50 | it( 51 | 'should answer specific questions about content', 52 | async () => { 53 | const result = await client.callTool({ 54 | name: 'extract_with_llm', 55 | arguments: { 56 | url: 'https://httpbin.org/json', 57 | query: 'What is the slideshow title?', 58 | }, 59 | }); 60 | 61 | expect(result).toBeTruthy(); 62 | expect(result.content).toBeDefined(); 63 | 64 | const textContent = (result as ToolResult).content.find((c) => c.type === 'text'); 65 | expect(textContent?.text).toBeTruthy(); 66 | // Should provide an answer about the content 67 | expect(textContent?.text?.length || 0).toBeGreaterThan(5); 68 | }, 69 | TEST_TIMEOUTS.long, 70 | ); 71 | 72 | it( 73 | 'should handle complex queries', 74 | async () => { 75 | const result = await client.callTool({ 76 | name: 'extract_with_llm', 77 | arguments: { 78 | url: 'https://httpbin.org/html', 79 | query: 'List all the links found on this page', 80 | }, 81 | }); 82 | 83 | expect(result).toBeTruthy(); 84 | const textContent = (result as ToolResult).content.find((c) => c.type === 'text'); 85 | expect(textContent?.text).toBeTruthy(); 86 | // Should provide a response about links (content may vary) 87 | expect(textContent?.text?.length || 0).toBeGreaterThan(10); 88 | }, 89 | TEST_TIMEOUTS.long, 90 | ); 91 | }); 92 | 93 | describe('Error handling', () => { 94 | it( 95 | 'should handle server without API key configured', 96 | async () => { 97 | // Note: This test may pass if the server has OPENAI_API_KEY configured 98 | // It's here to document the expected behavior 99 | const result = await client.callTool({ 100 | name: 'extract_with_llm', 101 | arguments: { 102 | url: 'https://httpbin.org/status/200', 103 | query: 'What is on this page?', 104 | }, 105 | }); 106 | 107 | const typedResult = result as ToolResult; 108 | // If it succeeds, we have API key configured 109 | if 
(typedResult.content && typedResult.content.length > 0) { 110 | expect(result).toBeTruthy(); 111 | } 112 | // If it fails, we should get a proper error message 113 | else if (typedResult.content[0]?.text?.includes('LLM provider')) { 114 | expect(typedResult.content[0].text).toContain('LLM provider'); 115 | } 116 | }, 117 | TEST_TIMEOUTS.medium, 118 | ); 119 | 120 | it( 121 | 'should handle invalid URLs', 122 | async () => { 123 | const result = await client.callTool({ 124 | name: 'extract_with_llm', 125 | arguments: { 126 | url: 'not-a-url', 127 | query: 'What is this?', 128 | }, 129 | }); 130 | 131 | expect(result).toBeDefined(); 132 | const content = (result as ToolResult).content; 133 | const textContent = content.find((c) => c.type === 'text'); 134 | expect(textContent).toBeDefined(); 135 | expect(textContent?.text).toContain('Error'); 136 | expect(textContent?.text?.toLowerCase()).toContain('invalid'); 137 | }, 138 | TEST_TIMEOUTS.short, 139 | ); 140 | 141 | it( 142 | 'should handle empty query gracefully', 143 | async () => { 144 | const result = await client.callTool({ 145 | name: 'extract_with_llm', 146 | arguments: { 147 | url: 'https://example.com', 148 | query: '', 149 | }, 150 | }); 151 | 152 | expect(result).toBeDefined(); 153 | const content = (result as ToolResult).content; 154 | const textContent = content.find((c) => c.type === 'text'); 155 | expect(textContent).toBeDefined(); 156 | expect(textContent?.text).toContain('Error'); 157 | }, 158 | TEST_TIMEOUTS.short, 159 | ); 160 | }); 161 | }); 162 | ``` -------------------------------------------------------------------------------- /src/__tests__/integration/extract-links.integration.test.ts: -------------------------------------------------------------------------------- ```typescript 1 | /* eslint-env jest */ 2 | import { Client } from '@modelcontextprotocol/sdk/client/index.js'; 3 | import { createTestClient, cleanupTestClient, TEST_TIMEOUTS } from './test-utils.js'; 4 | 5 | interface ToolResult { 6 | content: Array<{ 7 | type: string; 8 | text?: string; 9 | }>; 10 | } 11 | 12 | describe('extract_links Integration Tests', () => { 13 | let client: Client; 14 | 15 | beforeAll(async () => { 16 | client = await createTestClient(); 17 | }, TEST_TIMEOUTS.medium); 18 | 19 | afterAll(async () => { 20 | if (client) { 21 | await cleanupTestClient(client); 22 | } 23 | }); 24 | 25 | describe('Basic functionality', () => { 26 | it( 27 | 'should extract links with categorization (default)', 28 | async () => { 29 | const result = await client.callTool({ 30 | name: 'extract_links', 31 | arguments: { 32 | url: 'https://webscraper.io/test-sites', 33 | }, 34 | }); 35 | 36 | expect(result).toBeDefined(); 37 | const content = (result as ToolResult).content; 38 | expect(content).toBeDefined(); 39 | expect(Array.isArray(content)).toBe(true); 40 | expect(content.length).toBeGreaterThan(0); 41 | 42 | const textContent = content.find((c) => c.type === 'text'); 43 | expect(textContent).toBeDefined(); 44 | expect(textContent?.text).toContain('Link analysis for https://webscraper.io/test-sites'); 45 | // Should show categorized output 46 | expect(textContent?.text).toMatch(/internal \(\d+\)/); 47 | expect(textContent?.text).toMatch(/external \(\d+\)/); 48 | }, 49 | TEST_TIMEOUTS.medium, 50 | ); 51 | 52 | it( 53 | 'should extract links without categorization', 54 | async () => { 55 | const result = await client.callTool({ 56 | name: 'extract_links', 57 | arguments: { 58 | url: 'https://webscraper.io/test-sites', 59 | categorize: false, 60 | }, 61 | 
}); 62 | 63 | expect(result).toBeDefined(); 64 | const content = (result as ToolResult).content; 65 | expect(content).toBeDefined(); 66 | expect(Array.isArray(content)).toBe(true); 67 | expect(content.length).toBeGreaterThan(0); 68 | 69 | const textContent = content.find((c) => c.type === 'text'); 70 | expect(textContent).toBeDefined(); 71 | expect(textContent?.text).toContain('All links from https://webscraper.io/test-sites'); 72 | // Should NOT show categorized output 73 | expect(textContent?.text).not.toMatch(/internal \(\d+\)/); 74 | expect(textContent?.text).not.toMatch(/external \(\d+\)/); 75 | }, 76 | TEST_TIMEOUTS.medium, 77 | ); 78 | 79 | it( 80 | 'should handle sites with no links', 81 | async () => { 82 | // Test with a simple status page 83 | const result = await client.callTool({ 84 | name: 'extract_links', 85 | arguments: { 86 | url: 'https://httpstat.us/200', 87 | }, 88 | }); 89 | 90 | expect(result).toBeDefined(); 91 | const content = (result as ToolResult).content; 92 | expect(content).toBeDefined(); 93 | const textContent = content.find((c) => c.type === 'text'); 94 | expect(textContent).toBeDefined(); 95 | }, 96 | TEST_TIMEOUTS.medium, 97 | ); 98 | 99 | it( 100 | 'should detect JSON endpoints', 101 | async () => { 102 | const result = await client.callTool({ 103 | name: 'extract_links', 104 | arguments: { 105 | url: 'https://httpbin.org/json', 106 | }, 107 | }); 108 | 109 | expect(result).toBeDefined(); 110 | const content = (result as ToolResult).content; 111 | expect(content).toBeDefined(); 112 | const textContent = content.find((c) => c.type === 'text'); 113 | expect(textContent).toBeDefined(); 114 | // Should show link analysis (even if empty) 115 | expect(textContent?.text).toContain('Link analysis for https://httpbin.org/json'); 116 | }, 117 | TEST_TIMEOUTS.medium, 118 | ); 119 | }); 120 | 121 | describe('Error handling', () => { 122 | it( 123 | 'should handle invalid URLs', 124 | async () => { 125 | const result = await client.callTool({ 126 | name: 'extract_links', 127 | arguments: { 128 | url: 'not-a-url', 129 | }, 130 | }); 131 | 132 | expect(result).toBeDefined(); 133 | const content = (result as ToolResult).content; 134 | expect(content).toBeDefined(); 135 | const textContent = content.find((c) => c.type === 'text'); 136 | expect(textContent).toBeDefined(); 137 | expect(textContent?.text).toContain('Error'); 138 | expect(textContent?.text?.toLowerCase()).toContain('invalid'); 139 | }, 140 | TEST_TIMEOUTS.short, 141 | ); 142 | 143 | it( 144 | 'should handle non-existent domains', 145 | async () => { 146 | const result = await client.callTool({ 147 | name: 'extract_links', 148 | arguments: { 149 | url: 'https://this-domain-definitely-does-not-exist-12345.com', 150 | }, 151 | }); 152 | 153 | expect(result).toBeDefined(); 154 | const content = (result as ToolResult).content; 155 | expect(content).toBeDefined(); 156 | const textContent = content.find((c) => c.type === 'text'); 157 | expect(textContent).toBeDefined(); 158 | expect(textContent?.text).toContain('Error'); 159 | // Could be various error messages: connection error, DNS error, etc. 
160 | expect(textContent?.text?.toLowerCase()).toMatch(/error|failed/); 161 | }, 162 | TEST_TIMEOUTS.medium, 163 | ); 164 | }); 165 | }); 166 | ``` -------------------------------------------------------------------------------- /src/__tests__/integration/smart-crawl.integration.test.ts: -------------------------------------------------------------------------------- ```typescript 1 | /* eslint-env jest */ 2 | import { Client } from '@modelcontextprotocol/sdk/client/index.js'; 3 | import { createTestClient, cleanupTestClient, TEST_TIMEOUTS } from './test-utils.js'; 4 | 5 | interface ToolResult { 6 | content: Array<{ 7 | type: string; 8 | text?: string; 9 | }>; 10 | } 11 | 12 | describe('smart_crawl Integration Tests', () => { 13 | let client: Client; 14 | 15 | beforeAll(async () => { 16 | client = await createTestClient(); 17 | }, TEST_TIMEOUTS.medium); 18 | 19 | afterAll(async () => { 20 | if (client) { 21 | await cleanupTestClient(client); 22 | } 23 | }); 24 | 25 | describe('Smart crawling', () => { 26 | it( 27 | 'should auto-detect HTML content', 28 | async () => { 29 | const result = await client.callTool({ 30 | name: 'smart_crawl', 31 | arguments: { 32 | url: 'https://httpbin.org/html', 33 | }, 34 | }); 35 | 36 | expect(result).toBeDefined(); 37 | const content = (result as ToolResult).content; 38 | expect(content.length).toBeGreaterThanOrEqual(1); 39 | expect(content[0].type).toBe('text'); 40 | 41 | const text = content[0].text || ''; 42 | expect(text).toContain('Smart crawl detected content type:'); 43 | expect(text).toContain('html'); 44 | }, 45 | TEST_TIMEOUTS.medium, 46 | ); 47 | 48 | it( 49 | 'should handle sitemap URLs', 50 | async () => { 51 | const result = await client.callTool({ 52 | name: 'smart_crawl', 53 | arguments: { 54 | url: 'https://httpbingo.org/xml', 55 | max_depth: 1, 56 | }, 57 | }); 58 | 59 | const content = (result as ToolResult).content; 60 | expect(content.length).toBeGreaterThanOrEqual(1); 61 | expect(content[0].type).toBe('text'); 62 | 63 | const text = content[0].text || ''; 64 | expect(text).toContain('Smart crawl detected content type:'); 65 | expect(text.toLowerCase()).toMatch(/xml|sitemap/); 66 | }, 67 | TEST_TIMEOUTS.medium, 68 | ); 69 | 70 | it( 71 | 'should handle follow_links parameter', 72 | async () => { 73 | const result = await client.callTool({ 74 | name: 'smart_crawl', 75 | arguments: { 76 | url: 'https://httpbingo.org/xml', 77 | follow_links: true, 78 | max_depth: 1, 79 | }, 80 | }); 81 | 82 | const content = (result as ToolResult).content; 83 | expect(content.length).toBeGreaterThanOrEqual(1); 84 | expect(content[0].type).toBe('text'); 85 | 86 | const text = content[0].text || ''; 87 | expect(text).toContain('Smart crawl detected content type:'); 88 | }, 89 | TEST_TIMEOUTS.long, 90 | ); 91 | 92 | it( 93 | 'should detect JSON content', 94 | async () => { 95 | const result = await client.callTool({ 96 | name: 'smart_crawl', 97 | arguments: { 98 | url: 'https://httpbin.org/json', 99 | }, 100 | }); 101 | 102 | const content = (result as ToolResult).content; 103 | expect(content.length).toBeGreaterThanOrEqual(1); 104 | expect(content[0].type).toBe('text'); 105 | 106 | const text = content[0].text || ''; 107 | expect(text).toContain('Smart crawl detected content type:'); 108 | }, 109 | TEST_TIMEOUTS.medium, 110 | ); 111 | 112 | it( 113 | 'should bypass cache when requested', 114 | async () => { 115 | const result = await client.callTool({ 116 | name: 'smart_crawl', 117 | arguments: { 118 | url: 'https://httpbin.org/html', 119 | bypass_cache: 
true, 120 | }, 121 | }); 122 | 123 | const content = (result as ToolResult).content; 124 | expect(content.length).toBeGreaterThanOrEqual(1); 125 | expect(content[0].type).toBe('text'); 126 | 127 | const text = content[0].text || ''; 128 | expect(text).toContain('Smart crawl detected content type:'); 129 | }, 130 | TEST_TIMEOUTS.medium, 131 | ); 132 | 133 | it( 134 | 'should handle invalid URLs gracefully', 135 | async () => { 136 | const result = await client.callTool({ 137 | name: 'smart_crawl', 138 | arguments: { 139 | url: 'not-a-valid-url', 140 | }, 141 | }); 142 | 143 | const content = (result as ToolResult).content; 144 | expect(content.length).toBeGreaterThanOrEqual(1); 145 | expect(content[0].text).toContain('Error'); 146 | }, 147 | TEST_TIMEOUTS.short, 148 | ); 149 | 150 | it( 151 | 'should handle non-existent domains', 152 | async () => { 153 | const result = await client.callTool({ 154 | name: 'smart_crawl', 155 | arguments: { 156 | url: 'https://this-domain-definitely-does-not-exist-123456789.com', 157 | }, 158 | }); 159 | 160 | const content = (result as ToolResult).content; 161 | expect(content.length).toBeGreaterThanOrEqual(1); 162 | expect(content[0].type).toBe('text'); 163 | 164 | const text = content[0].text || ''; 165 | // Non-existent domains cause 500 errors 166 | expect(text).toContain('Error'); 167 | }, 168 | TEST_TIMEOUTS.short, 169 | ); 170 | 171 | it( 172 | 'should reject session_id parameter', 173 | async () => { 174 | const result = await client.callTool({ 175 | name: 'smart_crawl', 176 | arguments: { 177 | url: 'https://httpbin.org/html', 178 | session_id: 'test-session', 179 | }, 180 | }); 181 | 182 | const content = (result as ToolResult).content; 183 | expect(content.length).toBeGreaterThanOrEqual(1); 184 | expect(content[0].type).toBe('text'); 185 | expect(content[0].text).toContain('session_id'); 186 | expect(content[0].text).toContain('does not support'); 187 | expect(content[0].text).toContain('stateless'); 188 | }, 189 | TEST_TIMEOUTS.short, 190 | ); 191 | }); 192 | }); 193 | ``` -------------------------------------------------------------------------------- /src/handlers/session-handlers.ts: -------------------------------------------------------------------------------- ```typescript 1 | import { BaseHandler } from './base-handler.js'; 2 | 3 | export class SessionHandlers extends BaseHandler { 4 | async manageSession(options: { 5 | action: 'create' | 'clear' | 'list'; 6 | session_id?: string; 7 | initial_url?: string; 8 | browser_type?: string; 9 | }) { 10 | switch (options.action) { 11 | case 'create': 12 | return this.createSession({ 13 | session_id: options.session_id, 14 | initial_url: options.initial_url, 15 | browser_type: options.browser_type, 16 | }); 17 | case 'clear': 18 | if (!options.session_id) { 19 | throw new Error('session_id is required for clear action'); 20 | } 21 | return this.clearSession({ session_id: options.session_id }); 22 | case 'list': 23 | return this.listSessions(); 24 | default: 25 | // This should never happen due to TypeScript types, but handle it for runtime safety 26 | throw new Error(`Invalid action: ${(options as { action: string }).action}`); 27 | } 28 | } 29 | 30 | private async createSession(options: { session_id?: string; initial_url?: string; browser_type?: string }) { 31 | try { 32 | // Generate session ID if not provided 33 | const sessionId = options.session_id || `session-${Date.now()}-${Math.random().toString(36).substring(2, 11)}`; 34 | 35 | // Store session info locally 36 | 
this.sessions.set(sessionId, { 37 | id: sessionId, 38 | created_at: new Date(), 39 | last_used: new Date(), 40 | initial_url: options.initial_url, 41 | metadata: { 42 | browser_type: options.browser_type || 'chromium', 43 | }, 44 | }); 45 | 46 | // If initial_url provided, make first crawl to establish session 47 | if (options.initial_url) { 48 | try { 49 | await this.axiosClient.post( 50 | '/crawl', 51 | { 52 | urls: [options.initial_url], 53 | browser_config: { 54 | headless: true, 55 | browser_type: options.browser_type || 'chromium', 56 | }, 57 | crawler_config: { 58 | session_id: sessionId, 59 | cache_mode: 'BYPASS', 60 | }, 61 | }, 62 | { 63 | timeout: 30000, // 30 second timeout for initial crawl 64 | }, 65 | ); 66 | 67 | // Update last_used 68 | const session = this.sessions.get(sessionId); 69 | if (session) { 70 | session.last_used = new Date(); 71 | } 72 | } catch (error) { 73 | // Session created but initial crawl failed - still return success 74 | console.error(`Initial crawl failed for session ${sessionId}:`, error); 75 | } 76 | } 77 | 78 | return { 79 | content: [ 80 | { 81 | type: 'text', 82 | text: `Session created successfully:\nSession ID: ${sessionId}\nBrowser: ${options.browser_type || 'chromium'}\n${options.initial_url ? `Pre-warmed with: ${options.initial_url}` : 'Ready for use'}\n\nUse this session_id with the crawl tool to maintain state across requests.`, 83 | }, 84 | ], 85 | // Include all session parameters for easier programmatic access 86 | session_id: sessionId, 87 | browser_type: options.browser_type || 'chromium', 88 | initial_url: options.initial_url, 89 | created_at: this.sessions.get(sessionId)?.created_at.toISOString(), 90 | }; 91 | } catch (error) { 92 | throw this.formatError(error, 'create session'); 93 | } 94 | } 95 | 96 | private async clearSession(options: { session_id: string }) { 97 | try { 98 | // Remove from local store 99 | const deleted = this.sessions.delete(options.session_id); 100 | 101 | // Note: The actual browser session in Crawl4AI will be cleaned up 102 | // automatically after inactivity or when the server restarts 103 | 104 | return { 105 | content: [ 106 | { 107 | type: 'text', 108 | text: deleted 109 | ? 
`Session cleared successfully: ${options.session_id}` 110 | : `Session not found: ${options.session_id}`, 111 | }, 112 | ], 113 | }; 114 | } catch (error) { 115 | throw this.formatError(error, 'clear session'); 116 | } 117 | } 118 | 119 | private async listSessions() { 120 | try { 121 | // Return locally stored sessions 122 | const sessions = Array.from(this.sessions.entries()).map(([id, info]) => { 123 | const ageMinutes = Math.floor((Date.now() - info.created_at.getTime()) / 60000); 124 | const lastUsedMinutes = Math.floor((Date.now() - info.last_used.getTime()) / 60000); 125 | 126 | return { 127 | session_id: id, 128 | created_at: info.created_at.toISOString(), 129 | last_used: info.last_used.toISOString(), 130 | age_minutes: ageMinutes, 131 | last_used_minutes_ago: lastUsedMinutes, 132 | initial_url: info.initial_url, 133 | browser_type: info.metadata?.browser_type || 'chromium', 134 | }; 135 | }); 136 | 137 | if (sessions.length === 0) { 138 | return { 139 | content: [ 140 | { 141 | type: 'text', 142 | text: 'No active sessions found.', 143 | }, 144 | ], 145 | }; 146 | } 147 | 148 | const sessionList = sessions 149 | .map( 150 | (session) => 151 | `- ${session.session_id} (${session.browser_type}, created ${session.age_minutes}m ago, last used ${session.last_used_minutes_ago}m ago)`, 152 | ) 153 | .join('\n'); 154 | 155 | return { 156 | content: [ 157 | { 158 | type: 'text', 159 | text: `Active sessions (${sessions.length}):\n${sessionList}`, 160 | }, 161 | ], 162 | }; 163 | } catch (error) { 164 | throw this.formatError(error, 'list sessions'); 165 | } 166 | } 167 | } 168 | ``` -------------------------------------------------------------------------------- /src/__tests__/utils/javascript-validation.test.ts: -------------------------------------------------------------------------------- ```typescript 1 | /* eslint-env jest */ 2 | import { describe, it, expect } from '@jest/globals'; 3 | import { validateJavaScriptCode } from '../../schemas/helpers.js'; 4 | 5 | describe('JavaScript Code Validation', () => { 6 | describe('Valid JavaScript', () => { 7 | it('should accept simple JavaScript code', () => { 8 | expect(validateJavaScriptCode('console.log("Hello world")')).toBe(true); 9 | expect(validateJavaScriptCode('return document.title')).toBe(true); 10 | expect(validateJavaScriptCode('const x = 5; return x * 2;')).toBe(true); 11 | }); 12 | 13 | it('should accept JavaScript with real newlines', () => { 14 | expect(validateJavaScriptCode('console.log("Hello");\nconsole.log("World");')).toBe(true); 15 | expect(validateJavaScriptCode('function test() {\n return true;\n}')).toBe(true); 16 | }); 17 | 18 | it('should accept JavaScript with escape sequences in strings', () => { 19 | expect(validateJavaScriptCode('console.log("Line 1\\nLine 2")')).toBe(true); 20 | expect(validateJavaScriptCode('const msg = "Tab\\there\\tand\\tthere"')).toBe(true); 21 | expect(validateJavaScriptCode('return "Quote: \\"Hello\\""')).toBe(true); 22 | }); 23 | 24 | it('should accept complex JavaScript patterns', () => { 25 | const complexCode = ` 26 | const elements = document.querySelectorAll('.item'); 27 | elements.forEach((el, i) => { 28 | el.textContent = \`Item \${i + 1}\`; 29 | }); 30 | return elements.length; 31 | `; 32 | expect(validateJavaScriptCode(complexCode)).toBe(true); 33 | }); 34 | 35 | it('should accept JavaScript with regex patterns', () => { 36 | expect(validateJavaScriptCode('return /test\\d+/.test(str)')).toBe(true); 37 | expect(validateJavaScriptCode('const pattern = 
/\\w+@\\w+\\.\\w+/')).toBe(true);
38 |     });
39 |   });
40 | 
41 |   describe('Invalid JavaScript - HTML Entities', () => {
42 |     it('should reject code with HTML entities', () => {
43 |       expect(validateJavaScriptCode('console.log(&quot;Hello&quot;)')).toBe(false);
44 |       expect(validateJavaScriptCode('const x = &amp;&amp; true')).toBe(false);
45 |       expect(validateJavaScriptCode('if (x &lt; 5) return')).toBe(false);
46 |       expect(validateJavaScriptCode('if (x &gt; 5) return')).toBe(false);
47 |     });
48 | 
49 |     it('should reject code with numeric HTML entities', () => {
50 |       expect(validateJavaScriptCode('const char = &#65;')).toBe(false);
51 |       // Note: hex entities like &#x41; are not caught by the current regex
52 |     });
53 | 
54 |     it('should reject code with named HTML entities', () => {
55 |       expect(validateJavaScriptCode('const copy = &copy;')).toBe(false);
56 |       expect(validateJavaScriptCode('const nbsp = &nbsp;')).toBe(false);
57 |     });
58 |   });
59 | 
60 |   describe('Invalid JavaScript - HTML Tags', () => {
61 |     it('should reject HTML markup', () => {
62 |       expect(validateJavaScriptCode('<!DOCTYPE html>')).toBe(false);
63 |       expect(validateJavaScriptCode('<html><body>test</body></html>')).toBe(false);
64 |       expect(validateJavaScriptCode('<script>alert("test")</script>')).toBe(false);
65 |       expect(validateJavaScriptCode('<style>body { color: red; }</style>')).toBe(false);
66 |     });
67 | 
68 |     it('should reject mixed HTML and JavaScript', () => {
69 |       expect(validateJavaScriptCode('<head>\nconst x = 5;\n</head>')).toBe(false);
70 |       expect(validateJavaScriptCode('console.log("test");\n<body>')).toBe(false);
71 |     });
72 |   });
73 | 
74 |   describe('Invalid JavaScript - Literal Escape Sequences', () => {
75 |     it('should reject literal \\n outside of strings', () => {
76 |       expect(validateJavaScriptCode('console.log("Hello");\\nconsole.log("World");')).toBe(false);
77 |       expect(validateJavaScriptCode('const x = 5;\\nreturn x;')).toBe(false);
78 |       expect(validateJavaScriptCode('if (true) {\\n return;\\n}')).toBe(false);
79 |     });
80 | 
81 |     it('should reject literal \\n in various positions', () => {
82 |       expect(validateJavaScriptCode('}\\nfunction')).toBe(false);
83 |       expect(validateJavaScriptCode(');\\nconst')).toBe(false);
84 |       expect(validateJavaScriptCode('\\n{')).toBe(false);
85 |       expect(validateJavaScriptCode('\\n(')).toBe(false);
86 |     });
87 | 
88 |     it('should reject literal \\n between statements', () => {
89 |       expect(validateJavaScriptCode('const x = 5;\\nconst y = 10;')).toBe(false);
90 |       expect(validateJavaScriptCode('doSomething();\\ndoAnother();')).toBe(false);
91 |     });
92 |   });
93 | 
94 |   describe('Edge Cases', () => {
95 |     it('should handle empty strings', () => {
96 |       expect(validateJavaScriptCode('')).toBe(true);
97 |     });
98 | 
99 |     it('should handle whitespace-only strings', () => {
100 |       expect(validateJavaScriptCode(' ')).toBe(true);
101 |       expect(validateJavaScriptCode('\n\n\n')).toBe(true);
102 |       expect(validateJavaScriptCode('\t\t')).toBe(true);
103 |     });
104 | 
105 |     it('should handle single-line comments', () => {
106 |       expect(validateJavaScriptCode('// This is a comment')).toBe(true);
107 |       expect(validateJavaScriptCode('return 5; // Comment here')).toBe(true);
108 |     });
109 | 
110 |     it('should handle multi-line comments', () => {
111 |       expect(validateJavaScriptCode('/* Multi\nline\ncomment */')).toBe(true);
112 |       expect(validateJavaScriptCode('/* Comment */ return 5;')).toBe(true);
113 |     });
114 | 
115 |     it('should reject HTML tags even in what looks like strings', () => {
116 |       // The current validation is quite strict and rejects HTML tags even
if they appear to be in strings 117 | // This is by design to prevent malformed JavaScript that contains actual HTML 118 | expect(validateJavaScriptCode('const html = "<div>Hello</div>"')).toBe(true); // <div> is ok 119 | expect(validateJavaScriptCode("return '<style>body{}</style>'")).toBe(false); // <style> is rejected 120 | }); 121 | }); 122 | }); 123 | ``` -------------------------------------------------------------------------------- /src/__tests__/handlers/utility-handlers.test.ts: -------------------------------------------------------------------------------- ```typescript 1 | /* eslint-env jest */ 2 | import { jest } from '@jest/globals'; 3 | import type { UtilityHandlers } from '../../handlers/utility-handlers.js'; 4 | import type { Crawl4AIService } from '../../crawl4ai-service.js'; 5 | 6 | // Mock the service 7 | const mockCrawl = jest.fn(); 8 | const mockService = { 9 | crawl: mockCrawl, 10 | } as unknown as Crawl4AIService; 11 | 12 | // Mock axios client 13 | const mockPost = jest.fn(); 14 | const mockAxiosClient = { 15 | post: mockPost, 16 | } as unknown; 17 | 18 | // Import after setting up mocks 19 | const { UtilityHandlers: UtilityHandlersClass } = await import('../../handlers/utility-handlers.js'); 20 | 21 | describe('UtilityHandlers', () => { 22 | let handler: UtilityHandlers; 23 | let sessions: Map<string, unknown>; 24 | 25 | beforeEach(() => { 26 | jest.clearAllMocks(); 27 | sessions = new Map(); 28 | handler = new UtilityHandlersClass(mockService, mockAxiosClient, sessions); 29 | }); 30 | 31 | describe('extractLinks', () => { 32 | it('should manually extract links from markdown when API returns empty links', async () => { 33 | // Mock crawl response with empty links but markdown containing href attributes 34 | mockPost.mockResolvedValue({ 35 | data: { 36 | results: [ 37 | { 38 | success: true, 39 | links: { 40 | internal: [], 41 | external: [], 42 | }, 43 | markdown: { 44 | raw_markdown: ` 45 | # Test Page 46 | 47 | Here are some links: 48 | <a href="https://example.com/page1">Internal Link</a> 49 | <a href="https://external.com/page">External Link</a> 50 | <a href="/relative/path">Relative Link</a> 51 | <a href='https://example.com/page2'>Another Internal</a> 52 | `, 53 | }, 54 | }, 55 | ], 56 | }, 57 | }); 58 | 59 | const result = await handler.extractLinks({ 60 | url: 'https://example.com', 61 | categorize: true, 62 | }); 63 | 64 | // Should have manually extracted and categorized links 65 | expect(result.content[0].type).toBe('text'); 66 | expect(result.content[0].text).toContain('Link analysis for https://example.com'); 67 | expect(result.content[0].text).toContain('internal (3)'); 68 | expect(result.content[0].text).toContain('https://example.com/page1'); 69 | expect(result.content[0].text).toContain('https://example.com/page2'); 70 | expect(result.content[0].text).toContain('https://example.com/relative/path'); 71 | expect(result.content[0].text).toContain('external (1)'); 72 | expect(result.content[0].text).toContain('https://external.com/page'); 73 | }); 74 | 75 | it('should handle manual extraction without categorization', async () => { 76 | // Mock crawl response with empty links 77 | mockPost.mockResolvedValue({ 78 | data: { 79 | results: [ 80 | { 81 | success: true, 82 | links: { 83 | internal: [], 84 | external: [], 85 | }, 86 | markdown: { 87 | raw_markdown: `<a href="https://example.com/page1">Link 1</a> 88 | <a href="https://external.com/page">Link 2</a>`, 89 | }, 90 | }, 91 | ], 92 | }, 93 | }); 94 | 95 | const result = await handler.extractLinks({ 
96 | url: 'https://example.com', 97 | categorize: false, 98 | }); 99 | 100 | // Should show all links without categorization 101 | expect(result.content[0].text).toContain('All links from https://example.com'); 102 | expect(result.content[0].text).toContain('https://example.com/page1'); 103 | expect(result.content[0].text).toContain('https://external.com/page'); 104 | expect(result.content[0].text).not.toContain('Internal links:'); 105 | }); 106 | 107 | it('should handle malformed URLs during manual extraction', async () => { 108 | // Mock crawl response with a malformed URL in href 109 | mockPost.mockResolvedValue({ 110 | data: { 111 | results: [ 112 | { 113 | success: true, 114 | links: { 115 | internal: [], 116 | external: [], 117 | }, 118 | markdown: { 119 | raw_markdown: `<a href="javascript:void(0)">JS Link</a> 120 | <a href="https://example.com/valid">Valid Link</a> 121 | <a href="not-a-url">Invalid URL</a>`, 122 | }, 123 | }, 124 | ], 125 | }, 126 | }); 127 | 128 | const result = await handler.extractLinks({ 129 | url: 'https://example.com', 130 | categorize: true, 131 | }); 132 | 133 | // Should handle invalid URLs gracefully 134 | expect(result.content[0].text).toContain('https://example.com/valid'); 135 | // Invalid URLs should be treated as relative links 136 | expect(result.content[0].text).toContain('not-a-url'); 137 | expect(result.content[0].text).toContain('javascript:void(0)'); 138 | }); 139 | 140 | it('should return empty results when no links found', async () => { 141 | // Mock crawl response with no links 142 | mockPost.mockResolvedValue({ 143 | data: { 144 | results: [ 145 | { 146 | success: true, 147 | links: { 148 | internal: [], 149 | external: [], 150 | }, 151 | markdown: { 152 | raw_markdown: 'Just plain text without any links', 153 | }, 154 | }, 155 | ], 156 | }, 157 | }); 158 | 159 | const result = await handler.extractLinks({ 160 | url: 'https://example.com', 161 | categorize: true, 162 | }); 163 | 164 | // Should show empty categories 165 | expect(result.content[0].text).toContain('Link analysis for https://example.com'); 166 | expect(result.content[0].text).toContain('internal (0)'); 167 | expect(result.content[0].text).toContain('external (0)'); 168 | }); 169 | }); 170 | }); 171 | ``` -------------------------------------------------------------------------------- /src/__tests__/index.cli.test.ts: -------------------------------------------------------------------------------- ```typescript 1 | // import { jest } from '@jest/globals'; 2 | import { spawn } from 'child_process'; 3 | import * as path from 'path'; 4 | import * as url from 'url'; 5 | 6 | const __dirname = url.fileURLToPath(new URL('.', import.meta.url)); 7 | 8 | describe('CLI Entry Point', () => { 9 | const cliPath = path.join(__dirname, '..', '..', 'src', 'index.ts'); 10 | 11 | // Helper to run CLI with given env vars 12 | const runCLI = ( 13 | env: Record<string, string> = {}, 14 | ): Promise<{ code: number | null; stdout: string; stderr: string }> => { 15 | return new Promise((resolve) => { 16 | const child = spawn('tsx', [cliPath], { 17 | env: { ...process.env, ...env }, 18 | stdio: 'pipe', 19 | }); 20 | 21 | let stdout = ''; 22 | let stderr = ''; 23 | 24 | child.stdout.on('data', (data) => { 25 | stdout += data.toString(); 26 | }); 27 | 28 | child.stderr.on('data', (data) => { 29 | stderr += data.toString(); 30 | }); 31 | 32 | child.on('close', (code) => { 33 | resolve({ code, stdout, stderr }); 34 | }); 35 | 36 | // Kill after 2 seconds to prevent hanging 37 | setTimeout(() => { 38 | 
child.kill(); 39 | }, 2000); 40 | }); 41 | }; 42 | 43 | describe('Environment Variable Validation', () => { 44 | it('should exit with code 1 when CRAWL4AI_BASE_URL is missing', async () => { 45 | const { code, stderr } = await runCLI({ 46 | CRAWL4AI_BASE_URL: '', 47 | }); 48 | 49 | expect(code).toBe(1); 50 | expect(stderr).toContain('Error: CRAWL4AI_BASE_URL environment variable is required'); 51 | expect(stderr).toContain('Please set it to your Crawl4AI server URL'); 52 | }); 53 | 54 | it('should start successfully with valid CRAWL4AI_BASE_URL', async () => { 55 | const { code, stderr } = await runCLI({ 56 | CRAWL4AI_BASE_URL: 'http://localhost:11235', 57 | CRAWL4AI_API_KEY: 'test-key', 58 | }); 59 | 60 | // Process should be killed by timeout, not exit with error 61 | expect(code).not.toBe(1); 62 | // MCP servers output to stderr 63 | expect(stderr).toContain('crawl4ai-mcp'); 64 | }); 65 | 66 | it('should use default values for optional env vars', async () => { 67 | const { stderr } = await runCLI({ 68 | CRAWL4AI_BASE_URL: 'http://localhost:11235', 69 | // No API_KEY, SERVER_NAME, or SERVER_VERSION 70 | }); 71 | 72 | expect(stderr).toContain('crawl4ai-mcp'); // default server name 73 | expect(stderr).toContain('1.0.0'); // default version 74 | }); 75 | 76 | it('should use custom SERVER_NAME and SERVER_VERSION when provided', async () => { 77 | const { stderr } = await runCLI({ 78 | CRAWL4AI_BASE_URL: 'http://localhost:11235', 79 | SERVER_NAME: 'custom-server', 80 | SERVER_VERSION: '2.0.0', 81 | }); 82 | 83 | expect(stderr).toContain('custom-server'); 84 | expect(stderr).toContain('2.0.0'); 85 | }); 86 | }); 87 | 88 | describe('Signal Handling', () => { 89 | it('should handle SIGTERM gracefully', async () => { 90 | const child = spawn('tsx', [cliPath], { 91 | env: { 92 | ...process.env, 93 | CRAWL4AI_BASE_URL: 'http://localhost:11235', 94 | }, 95 | stdio: 'pipe', 96 | }); 97 | 98 | // Wait for startup 99 | await new Promise((resolve) => setTimeout(resolve, 500)); 100 | 101 | // Send SIGTERM 102 | child.kill('SIGTERM'); 103 | 104 | const code = await new Promise<number | null>((resolve, reject) => { 105 | const timeout = setTimeout(() => { 106 | child.kill('SIGKILL'); 107 | reject(new Error('Process did not exit in time')); 108 | }, 5000); 109 | 110 | child.on('close', (exitCode) => { 111 | clearTimeout(timeout); 112 | resolve(exitCode); 113 | }); 114 | }); 115 | 116 | // Should exit with signal code 117 | expect(code).toBe(143); // 128 + 15 (SIGTERM) 118 | 119 | // Ensure cleanup 120 | child.kill(); 121 | }, 10000); 122 | 123 | it('should handle SIGINT gracefully', async () => { 124 | const child = spawn('tsx', [cliPath], { 125 | env: { 126 | ...process.env, 127 | CRAWL4AI_BASE_URL: 'http://localhost:11235', 128 | }, 129 | stdio: 'pipe', 130 | }); 131 | 132 | // Wait for startup 133 | await new Promise((resolve) => setTimeout(resolve, 500)); 134 | 135 | // Send SIGINT (Ctrl+C) 136 | child.kill('SIGINT'); 137 | 138 | const code = await new Promise<number | null>((resolve, reject) => { 139 | const timeout = setTimeout(() => { 140 | child.kill('SIGKILL'); 141 | reject(new Error('Process did not exit in time')); 142 | }, 5000); 143 | 144 | child.on('close', (exitCode) => { 145 | clearTimeout(timeout); 146 | resolve(exitCode); 147 | }); 148 | }); 149 | 150 | // Should exit with signal code 151 | expect(code).toBe(130); // 128 + 2 (SIGINT) 152 | 153 | // Ensure cleanup 154 | child.kill(); 155 | }, 10000); 156 | }); 157 | 158 | describe('Error Handling', () => { 159 | it('should handle server 
startup errors', async () => { 160 | // This will be tricky to test without mocking, but we can at least 161 | // verify the process starts and attempts to connect 162 | const { code, stdout, stderr } = await runCLI({ 163 | CRAWL4AI_BASE_URL: 'http://invalid-host-that-does-not-exist:99999', 164 | }); 165 | 166 | // Should not exit with code 1 (that's for missing env vars) 167 | expect(code).not.toBe(1); 168 | // But might log connection errors 169 | const output = stdout + stderr; 170 | expect(output).toBeTruthy(); 171 | }); 172 | }); 173 | 174 | describe('dotenv Loading', () => { 175 | it('should load .env file if present', async () => { 176 | // Create a temporary .env file 177 | const fs = await import('fs/promises'); 178 | const envPath = path.join(__dirname, '..', '..', '.env.test'); 179 | 180 | await fs.writeFile(envPath, 'TEST_ENV_VAR=loaded_from_file\n'); 181 | 182 | try { 183 | const { stderr } = await runCLI({ 184 | CRAWL4AI_BASE_URL: 'http://localhost:11235', 185 | NODE_ENV: 'test', 186 | DOTENV_CONFIG_PATH: envPath, 187 | }); 188 | 189 | // Verify the server starts (dotenv loaded successfully) 190 | expect(stderr).toContain('crawl4ai-mcp'); 191 | } finally { 192 | // Clean up 193 | await fs.unlink(envPath).catch(() => {}); 194 | } 195 | }); 196 | }); 197 | }); 198 | ``` -------------------------------------------------------------------------------- /src/__tests__/integration/crawl-recursive.integration.test.ts: -------------------------------------------------------------------------------- ```typescript 1 | /* eslint-env jest */ 2 | import { Client } from '@modelcontextprotocol/sdk/client/index.js'; 3 | import { createTestClient, cleanupTestClient, TEST_TIMEOUTS } from './test-utils.js'; 4 | 5 | interface ToolResult { 6 | content: Array<{ 7 | type: string; 8 | text?: string; 9 | }>; 10 | } 11 | 12 | describe('crawl_recursive Integration Tests', () => { 13 | let client: Client; 14 | 15 | beforeAll(async () => { 16 | client = await createTestClient(); 17 | }, TEST_TIMEOUTS.medium); 18 | 19 | afterAll(async () => { 20 | if (client) { 21 | await cleanupTestClient(client); 22 | } 23 | }); 24 | 25 | describe('Basic functionality', () => { 26 | it( 27 | 'should crawl a site recursively with default settings', 28 | async () => { 29 | const result = await client.callTool({ 30 | name: 'crawl_recursive', 31 | arguments: { 32 | url: 'https://httpbin.org/links/5/0', 33 | }, 34 | }); 35 | 36 | expect(result).toBeDefined(); 37 | const content = (result as ToolResult).content; 38 | expect(content).toBeDefined(); 39 | expect(Array.isArray(content)).toBe(true); 40 | expect(content.length).toBeGreaterThan(0); 41 | 42 | const textContent = content.find((c) => c.type === 'text'); 43 | expect(textContent).toBeDefined(); 44 | expect(textContent?.text).toContain('Recursive crawl completed'); 45 | expect(textContent?.text).toContain('Pages crawled:'); 46 | expect(textContent?.text).toContain('Max depth reached:'); 47 | expect(textContent?.text).toContain('Only internal links'); 48 | // Should have found multiple pages since httpbin.org/links/5/0 has internal links 49 | expect(textContent?.text).toMatch(/Pages crawled: [2-9]|[1-9][0-9]/); 50 | }, 51 | TEST_TIMEOUTS.long, 52 | ); 53 | 54 | it( 55 | 'should respect max_depth parameter', 56 | async () => { 57 | const result = await client.callTool({ 58 | name: 'crawl_recursive', 59 | arguments: { 60 | url: 'https://httpbin.org/links/10/0', 61 | max_depth: 1, 62 | max_pages: 5, 63 | }, 64 | }); 65 | 66 | expect(result).toBeDefined(); 67 | const content = 
(result as ToolResult).content;
68 |         const textContent = content.find((c) => c.type === 'text');
69 |         expect(textContent).toBeDefined();
70 |         expect(textContent?.text).toContain('Max depth reached: ');
71 |         expect(textContent?.text).toMatch(/Max depth reached: [0-1] \(limit: 1\)/);
72 |         // With max_depth=1, should find some pages but not go too deep
73 |         expect(textContent?.text).toMatch(/Pages crawled: [1-5]/);
74 |       },
75 |       TEST_TIMEOUTS.long,
76 |     );
77 | 
78 |     it(
79 |       'should apply include pattern filter',
80 |       async () => {
81 |         const result = await client.callTool({
82 |           name: 'crawl_recursive',
83 |           arguments: {
84 |             url: 'https://httpbin.org/links/10/0',
85 |             max_depth: 1,
86 |             max_pages: 5,
87 |             include_pattern: '.*/links/[0-9]+/[0-4]$', // Only include links ending with 0-4
88 |           },
89 |         });
90 | 
91 |         expect(result).toBeDefined();
92 |         const content = (result as ToolResult).content;
93 |         const textContent = content.find((c) => c.type === 'text');
94 |         expect(textContent).toBeDefined();
95 | 
96 |         // Check that we have some results
97 |         expect(textContent?.text).toContain('Pages crawled:');
98 | 
99 |         // If we crawled pages, they should match our pattern
100 |         if (textContent?.text && textContent.text.includes('Pages found:')) {
101 |           const pagesSection = textContent.text.split('Pages found:')[1];
102 |           if (pagesSection && pagesSection.trim()) {
103 |             // All URLs should end with /0, /1, /2, /3, or /4
104 |             expect(pagesSection).toMatch(/\/[0-4]\b/);
105 |             // Should NOT have URLs ending with /5, /6, /7, /8, /9
106 |             expect(pagesSection).not.toMatch(/\/[5-9]\b/);
107 |           }
108 |         }
109 |       },
110 |       TEST_TIMEOUTS.long,
111 |     );
112 | 
113 |     it(
114 |       'should apply exclude pattern filter',
115 |       async () => {
116 |         const result = await client.callTool({
117 |           name: 'crawl_recursive',
118 |           arguments: {
119 |             url: 'https://example.com',
120 |             max_depth: 2,
121 |             max_pages: 10,
122 |             exclude_pattern: '.*\\.(pdf|zip|exe)$',
123 |           },
124 |         });
125 | 
126 |         expect(result).toBeDefined();
127 |         const content = (result as ToolResult).content;
128 |         const textContent = content.find((c) => c.type === 'text');
129 |         expect(textContent).toBeDefined();
130 | 
131 |         // Should not have crawled any PDF, ZIP, or EXE files
132 |         expect(textContent?.text).not.toMatch(/\.(pdf|zip|exe)/i);
133 |       },
134 |       TEST_TIMEOUTS.long,
135 |     );
136 |   });
137 | 
138 |   describe('Error handling', () => {
139 |     it(
140 |       'should handle invalid URLs',
141 |       async () => {
142 |         const result = await client.callTool({
143 |           name: 'crawl_recursive',
144 |           arguments: {
145 |             url: 'not-a-url',
146 |           },
147 |         });
148 | 
149 |         expect(result).toBeDefined();
150 |         const content = (result as ToolResult).content;
151 |         expect(content).toBeDefined();
152 |         const textContent = content.find((c) => c.type === 'text');
153 |         expect(textContent).toBeDefined();
154 |         expect(textContent?.text).toContain('Error');
155 |         expect(textContent?.text?.toLowerCase()).toContain('invalid');
156 |       },
157 |       TEST_TIMEOUTS.short,
158 |     );
159 | 
160 |     it(
161 |       'should handle sites with internal links',
162 |       async () => {
163 |         const result = await client.callTool({
164 |           name: 'crawl_recursive',
165 |           arguments: {
166 |             url: 'https://httpbin.org/links/5/0',
167 |             max_depth: 2,
168 |             max_pages: 10,
169 |           },
170 |         });
171 | 
172 |         expect(result).toBeDefined();
173 |         const content = (result as ToolResult).content;
174 |         const textContent = content.find((c) => c.type === 'text');
175 |         expect(textContent).toBeDefined();
176 |         expect(textContent?.text).toContain('Pages crawled:');
177 |         // Should crawl multiple pages since httpbin.org/links/5/0 has 5 internal links
178 |         expect(textContent?.text).toMatch(/Pages crawled: ([2-9]|1[0-9])/);
179 |         expect(textContent?.text).toContain('Internal links found:');
180 |       },
181 |       TEST_TIMEOUTS.medium,
182 |     );
183 |   });
184 | });
185 | 
```

--------------------------------------------------------------------------------
/src/handlers/content-handlers.ts:
--------------------------------------------------------------------------------

```typescript
1 | import { BaseHandler } from './base-handler.js';
2 | import {
3 |   MarkdownEndpointOptions,
4 |   MarkdownEndpointResponse,
5 |   ScreenshotEndpointOptions,
6 |   ScreenshotEndpointResponse,
7 |   PDFEndpointOptions,
8 |   PDFEndpointResponse,
9 |   HTMLEndpointOptions,
10 |   HTMLEndpointResponse,
11 |   FilterType,
12 | } from '../types.js';
13 | import * as fs from 'fs/promises';
14 | import * as path from 'path';
15 | import * as os from 'os';
16 | 
17 | export class ContentHandlers extends BaseHandler {
18 |   async getMarkdown(
19 |     options: Omit<MarkdownEndpointOptions, 'f' | 'q' | 'c'> & { filter?: string; query?: string; cache?: string },
20 |   ) {
21 |     try {
22 |       // Map from schema property names to API parameter names
23 |       const result: MarkdownEndpointResponse = await this.service.getMarkdown({
24 |         url: options.url,
25 |         f: options.filter as FilterType | undefined, // Schema provides 'filter', API expects 'f'
26 |         q: options.query, // Schema provides 'query', API expects 'q'
27 |         c: options.cache, // Schema provides 'cache', API expects 'c'
28 |       });
29 | 
30 |       // Format the response
31 |       let formattedText = `URL: ${result.url}\nFilter: ${result.filter}`;
32 | 
33 |       if (result.query) {
34 |         formattedText += `\nQuery: ${result.query}`;
35 |       }
36 | 
37 |       formattedText += `\nCache: ${result.cache}\n\nMarkdown:\n${result.markdown || 'No content found.'}`;
38 | 
39 |       return {
40 |         content: [
41 |           {
42 |             type: 'text',
43 |             text: formattedText,
44 |           },
45 |         ],
46 |       };
47 |     } catch (error) {
48 |       throw this.formatError(error, 'get markdown');
49 |     }
50 |   }
51 | 
52 |   async captureScreenshot(options: ScreenshotEndpointOptions) {
53 |     try {
54 |       const result: ScreenshotEndpointResponse = await this.service.captureScreenshot(options);
55 | 
56 |       // Response has { success: true, screenshot: "base64string" }
57 |       if (!result.success || !result.screenshot) {
58 |         throw new Error('Screenshot capture failed - no screenshot data in response');
59 |       }
60 | 
61 |       let savedFilePath: string | undefined;
62 | 
63 |       // Save to local directory if requested
64 |       if (options.save_to_directory) {
65 |         try {
66 |           // Resolve home directory path
67 |           let resolvedPath = options.save_to_directory;
68 |           if (resolvedPath.startsWith('~')) {
69 |             const homedir = os.homedir();
70 |             resolvedPath = path.join(homedir, resolvedPath.slice(1));
71 |           }
72 | 
73 |           // Check if user provided a file path instead of directory
74 |           if (resolvedPath.endsWith('.png') || resolvedPath.endsWith('.jpg')) {
75 |             console.warn(
76 |               `Warning: save_to_directory should be a directory path, not a file path. Using parent directory.`,
77 |             );
78 |             resolvedPath = path.dirname(resolvedPath);
79 |           }
80 | 
81 |           // Ensure directory exists
82 |           await fs.mkdir(resolvedPath, { recursive: true });
83 | 
84 |           // Generate filename from URL and timestamp
85 |           const url = new URL(options.url);
86 |           const hostname = url.hostname.replace(/[^a-z0-9]/gi, '-');
87 |           const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, -5);
88 |           const filename = `${hostname}-${timestamp}.png`;
89 | 
90 |           savedFilePath = path.join(resolvedPath, filename);
91 | 
92 |           // Convert base64 to buffer and save
93 |           const buffer = Buffer.from(result.screenshot, 'base64');
94 |           await fs.writeFile(savedFilePath, buffer);
95 |         } catch (saveError) {
96 |           // Log error but don't fail the operation
97 |           console.error('Failed to save screenshot locally:', saveError);
98 |         }
99 |       }
100 | 
101 |       const textContent = savedFilePath
102 |         ? `Screenshot captured for: ${options.url}\nSaved to: ${savedFilePath}`
103 |         : `Screenshot captured for: ${options.url}`;
104 | 
105 |       // If saved locally and screenshot is large (>800KB), don't return the base64 data
106 |       const screenshotSize = Buffer.from(result.screenshot, 'base64').length;
107 |       const shouldReturnImage = !savedFilePath || screenshotSize < 800 * 1024; // 800KB threshold
108 | 
109 |       const content = [];
110 | 
111 |       if (shouldReturnImage) {
112 |         content.push({
113 |           type: 'image',
114 |           data: result.screenshot,
115 |           mimeType: 'image/png',
116 |         });
117 |       }
118 | 
119 |       content.push({
120 |         type: 'text',
121 |         text: shouldReturnImage
122 |           ? textContent
123 |           : `${textContent}\n\nNote: Screenshot data not returned due to size (${Math.round(screenshotSize / 1024)}KB). View the saved file instead.`,
124 |       });
125 | 
126 |       return { content };
127 |     } catch (error) {
128 |       throw this.formatError(error, 'capture screenshot');
129 |     }
130 |   }
131 | 
132 |   async generatePDF(options: PDFEndpointOptions) {
133 |     try {
134 |       const result: PDFEndpointResponse = await this.service.generatePDF(options);
135 | 
136 |       // Response has { success: true, pdf: "base64string" }
137 |       if (!result.success || !result.pdf) {
138 |         throw new Error('PDF generation failed - no PDF data in response');
139 |       }
140 | 
141 |       return {
142 |         content: [
143 |           {
144 |             type: 'resource',
145 |             resource: {
146 |               uri: `data:application/pdf;name=${encodeURIComponent(new URL(String(options.url)).hostname)}.pdf;base64,${result.pdf}`,
147 |               mimeType: 'application/pdf',
148 |               blob: result.pdf,
149 |             },
150 |           },
151 |           {
152 |             type: 'text',
153 |             text: `PDF generated for: ${options.url}`,
154 |           },
155 |         ],
156 |       };
157 |     } catch (error) {
158 |       throw this.formatError(error, 'generate PDF');
159 |     }
160 |   }
161 | 
162 |   async getHTML(options: HTMLEndpointOptions) {
163 |     try {
164 |       const result: HTMLEndpointResponse = await this.service.getHTML(options);
165 | 
166 |       // Response has { html: string, url: string, success: true }
167 |       return {
168 |         content: [
169 |           {
170 |             type: 'text',
171 |             text: result.html || '',
172 |           },
173 |         ],
174 |       };
175 |     } catch (error) {
176 |       throw this.formatError(error, 'get HTML');
177 |     }
178 |   }
179 | 
180 |   async extractWithLLM(options: { url: string; query: string }) {
181 |     try {
182 |       const result = await this.service.extractWithLLM(options);
183 | 
184 |       return {
185 |         content: [
186 |           {
187 |             type: 'text',
188 |             text: result.answer,
189 |           },
190 |         ],
191 |       };
192 |     } catch (error) {
193 |       throw this.formatError(error, 'extract with LLM');
194 |     }
195 |   }
196 | }
197 | 
```
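
Note: `getMarkdown` above renames the MCP-facing arguments before calling the Crawl4AI markdown endpoint. The sketch below is illustrative only and not part of the repository; the `f`/`q`/`c` parameter names come from the inline comments in `content-handlers.ts`, while `GetMarkdownArgs` and `toMarkdownEndpointParams` are hypothetical names for this example.

```typescript
// Illustrative sketch (not repo code): the schema-to-API mapping that
// ContentHandlers.getMarkdown performs before hitting the Crawl4AI API.
type GetMarkdownArgs = { url: string; filter?: string; query?: string; cache?: string };

function toMarkdownEndpointParams(args: GetMarkdownArgs) {
  return {
    url: args.url,
    f: args.filter, // 'filter' in the MCP schema -> 'f' for the API (fit | raw | bm25 | llm)
    q: args.query, // 'query' in the MCP schema -> 'q' for the API (used by bm25/llm filters)
    c: args.cache, // 'cache' in the MCP schema -> 'c' for the API
  };
}
```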
--------------------------------------------------------------------------------
/src/__tests__/integration/get-markdown.integration.test.ts:
--------------------------------------------------------------------------------

```typescript
1 | /* eslint-env jest */
2 | import { Client } from '@modelcontextprotocol/sdk/client/index.js';
3 | import { createTestClient, cleanupTestClient, TEST_TIMEOUTS } from './test-utils.js';
4 | 
5 | interface ToolResult {
6 |   content: Array<{
7 |     type: string;
8 |     text?: string;
9 |   }>;
10 | }
11 | 
12 | describe('get_markdown Integration Tests', () => {
13 |   let client: Client;
14 | 
15 |   beforeAll(async () => {
16 |     client = await createTestClient();
17 |   }, TEST_TIMEOUTS.medium);
18 | 
19 |   afterAll(async () => {
20 |     if (client) {
21 |       await cleanupTestClient(client);
22 |     }
23 |   });
24 | 
25 |   describe('Markdown extraction', () => {
26 |     it(
27 |       'should extract markdown with default fit filter',
28 |       async () => {
29 |         const result = await client.callTool({
30 |           name: 'get_markdown',
31 |           arguments: {
32 |             url: 'https://httpbin.org/html',
33 |           },
34 |         });
35 | 
36 |         expect(result).toBeDefined();
37 |         const content = (result as ToolResult).content;
38 |         expect(content).toHaveLength(1);
39 |         expect(content[0].type).toBe('text');
40 | 
41 |         const text = content[0].text || '';
42 |         expect(text).toContain('URL: https://httpbin.org/html');
43 |         expect(text).toContain('Filter: fit');
44 |         expect(text).toContain('Markdown:');
45 |       },
46 |       TEST_TIMEOUTS.medium,
47 |     );
48 | 
49 |     it(
50 |       'should extract markdown with raw filter',
51 |       async () => {
52 |         const result = await client.callTool({
53 |           name: 'get_markdown',
54 |           arguments: {
55 |             url: 'https://httpbin.org/html',
56 |             filter: 'raw',
57 |           },
58 |         });
59 | 
60 |         const content = (result as ToolResult).content;
61 |         expect(content).toHaveLength(1);
62 |         expect(content[0].type).toBe('text');
63 | 
64 |         const text = content[0].text || '';
65 |         expect(text).toContain('Filter: raw');
66 |       },
67 |       TEST_TIMEOUTS.medium,
68 |     );
69 | 
70 |     it(
71 |       'should extract markdown with bm25 filter and query',
72 |       async () => {
73 |         const result = await client.callTool({
74 |           name: 'get_markdown',
75 |           arguments: {
76 |             url: 'https://httpbin.org/html',
77 |             filter: 'bm25',
78 |             query: 'Herman Melville',
79 |           },
80 |         });
81 | 
82 |         const content = (result as ToolResult).content;
83 |         expect(content).toHaveLength(1);
84 |         expect(content[0].type).toBe('text');
85 | 
86 |         const text = content[0].text || '';
87 |         expect(text).toContain('Filter: bm25');
88 |         expect(text).toContain('Query: Herman Melville');
89 |       },
90 |       TEST_TIMEOUTS.medium,
91 |     );
92 | 
93 |     it(
94 |       'should extract markdown with llm filter and query',
95 |       async () => {
96 |         const result = await client.callTool({
97 |           name: 'get_markdown',
98 |           arguments: {
99 |             url: 'https://httpbin.org/html',
100 |             filter: 'llm',
101 |             query: 'What is this page about?',
102 |           },
103 |         });
104 | 
105 |         const content = (result as ToolResult).content;
106 |         expect(content).toHaveLength(1);
107 |         expect(content[0].type).toBe('text');
108 | 
109 |         const text = content[0].text || '';
110 |         expect(text).toContain('Filter: llm');
111 |         expect(text).toContain('Query: What is this page about?');
112 |       },
113 |       TEST_TIMEOUTS.medium,
114 |     );
115 | 
116 |     it(
117 |       'should use cache parameter',
118 |       async () => {
119 |         const result = await client.callTool({
120 |           name: 'get_markdown',
121 |           arguments: {
122 |             url: 'https://httpbin.org/html',
123 |             cache: '1',
124 |           },
125 |         });
126 | 
127 |         const content = (result as ToolResult).content;
128 |         expect(content).toHaveLength(1);
129 |         expect(content[0].type).toBe('text');
130 | 
131 |         const text = content[0].text || '';
132 |         expect(text).toContain('Cache: 1');
133 |       },
134 |       TEST_TIMEOUTS.medium,
135 |     );
136 | 
137 |     it(
138 |       'should reject session_id parameter',
139 |       async () => {
140 |         const result = await client.callTool({
141 |           name: 'get_markdown',
142 |           arguments: {
143 |             url: 'https://httpbin.org/html',
144 |             session_id: 'test-session',
145 |           },
146 |         });
147 | 
148 |         const content = (result as ToolResult).content;
149 |         expect(content).toHaveLength(1);
150 |         expect(content[0].type).toBe('text');
151 |         expect(content[0].text).toContain('session_id');
152 |         expect(content[0].text).toContain('does not support');
153 |         expect(content[0].text).toContain('stateless');
154 |       },
155 |       TEST_TIMEOUTS.short,
156 |     );
157 | 
158 |     it(
159 |       'should handle invalid URLs gracefully',
160 |       async () => {
161 |         const result = await client.callTool({
162 |           name: 'get_markdown',
163 |           arguments: {
164 |             url: 'not-a-valid-url',
165 |           },
166 |         });
167 | 
168 |         const content = (result as ToolResult).content;
169 |         expect(content).toHaveLength(1);
170 |         expect(content[0].type).toBe('text');
171 |         expect(content[0].text).toContain('Error');
172 |         expect(content[0].text?.toLowerCase()).toContain('invalid');
173 |       },
174 |       TEST_TIMEOUTS.short,
175 |     );
176 | 
177 |     it(
178 |       'should handle non-existent domains',
179 |       async () => {
180 |         const result = await client.callTool({
181 |           name: 'get_markdown',
182 |           arguments: {
183 |             url: 'https://this-domain-definitely-does-not-exist-123456789.com',
184 |           },
185 |         });
186 | 
187 |         const content = (result as ToolResult).content;
188 |         expect(content).toHaveLength(1);
189 |         expect(content[0].type).toBe('text');
190 | 
191 |         // According to the pattern from other tests, might return success with empty content
192 |         const text = content[0].text || '';
193 |         expect(typeof text).toBe('string');
194 |       },
195 |       TEST_TIMEOUTS.short,
196 |     );
197 | 
198 |     it(
199 |       'should ignore extra parameters',
200 |       async () => {
201 |         const result = await client.callTool({
202 |           name: 'get_markdown',
203 |           arguments: {
204 |             url: 'https://httpbin.org/html',
205 |             filter: 'fit',
206 |             // These should be ignored
207 |             remove_images: true,
208 |             bypass_cache: true,
209 |             screenshot: true,
210 |           },
211 |         });
212 | 
213 |         const content = (result as ToolResult).content;
214 |         expect(content).toHaveLength(1);
215 |         expect(content[0].type).toBe('text');
216 | 
217 |         // Should still work, ignoring extra params
218 |         const text = content[0].text || '';
219 |         expect(text).toContain('Filter: fit');
220 |       },
221 |       TEST_TIMEOUTS.medium,
222 |     );
223 |   });
224 | });
225 | 
```

--------------------------------------------------------------------------------
/src/__tests__/integration/execute-js.integration.test.ts:
--------------------------------------------------------------------------------

```typescript
1 | /* eslint-env jest */
2 | import { Client } from '@modelcontextprotocol/sdk/client/index.js';
3 | import { createTestClient, cleanupTestClient, TEST_TIMEOUTS } from './test-utils.js';
4 | 
5 | interface ToolResult {
6 |   content: Array<{
7 |     type: string;
8 |     text?: string;
9 |   }>;
10 | }
11 | 
12 | describe('execute_js Integration Tests', () => {
13 |   let client: Client;
14 | 
15 |   beforeAll(async () => {
16 |     client = await createTestClient();
17 |   }, TEST_TIMEOUTS.medium);
18 | 
19 |   afterAll(async () => {
20 |     if (client) {
21 |       await cleanupTestClient(client);
22 |     }
23 |   });
24 | 
25 |   describe('JavaScript execution', () => {
26 |     it(
27 |       'should execute JavaScript and return results',
28 |       async () => {
29 |         const result = await client.callTool({
30 |           name: 'execute_js',
31 |           arguments: {
32 |             url: 'https://httpbin.org/html',
33 |             scripts: ['return document.title', 'return document.querySelectorAll("h1").length'],
34 |           },
35 |         });
36 | 
37 |         expect(result).toBeDefined();
38 |         const content = (result as ToolResult).content;
39 |         expect(content).toHaveLength(1);
40 |         expect(content[0].type).toBe('text');
41 | 
42 |         // Should contain JavaScript execution results
43 |         expect(content[0].text).toContain('JavaScript executed on: https://httpbin.org/html');
44 |         expect(content[0].text).toContain('Results:');
45 |         expect(content[0].text).toContain('Script: return document.title');
46 |         expect(content[0].text).toMatch(/Returned: .*/); // Title may be empty or no return value
47 |         expect(content[0].text).toContain('Script: return document.querySelectorAll("h1").length');
48 |         expect(content[0].text).toContain('Returned: 1'); // Should have 1 h1 element
49 |       },
50 |       TEST_TIMEOUTS.medium,
51 |     );
52 | 
53 |     it(
54 |       'should execute single script as string',
55 |       async () => {
56 |         console.log('Starting execute_js test...');
57 |         const result = await client.callTool({
58 |           name: 'execute_js',
59 |           arguments: {
60 |             url: 'https://httpbin.org/html',
61 |             scripts: 'return window.location.href',
62 |           },
63 |         });
64 |         console.log('Got result:', result);
65 | 
66 |         expect(result).toBeDefined();
67 |         const content = (result as ToolResult).content;
68 |         expect(content).toHaveLength(1);
69 | 
70 |         expect(content[0].text).toContain('JavaScript executed on: https://httpbin.org/html');
71 |         expect(content[0].text).toContain('Script: return window.location.href');
72 |         expect(content[0].text).toContain('Returned: "https://httpbin.org/html');
73 |       },
74 |       TEST_TIMEOUTS.long, // Increase timeout to 120s
75 |     );
76 | 
77 |     it(
78 |       'should reject session_id parameter',
79 |       async () => {
80 |         const result = await client.callTool({
81 |           name: 'execute_js',
82 |           arguments: {
83 |             url: 'https://httpbin.org/html',
84 |             scripts: 'return true',
85 |             session_id: 'test-session',
86 |           },
87 |         });
88 | 
89 |         const content = (result as ToolResult).content;
90 |         expect(content).toHaveLength(1);
91 |         expect(content[0].type).toBe('text');
92 |         expect(content[0].text).toContain('session_id');
93 |         expect(content[0].text).toContain('does not support');
94 |         expect(content[0].text).toContain('stateless');
95 |       },
96 |       TEST_TIMEOUTS.short,
97 |     );
98 | 
99 |     it(
100 |       'should reject invalid JavaScript with HTML entities',
101 |       async () => {
102 |         const result = await client.callTool({
103 |           name: 'execute_js',
104 |           arguments: {
105 |             url: 'https://httpbin.org/html',
106 |             scripts: 'return &quot;test&quot;',
107 |           },
108 |         });
109 | 
110 |         const content = (result as ToolResult).content;
111 |         expect(content).toHaveLength(1);
112 |         expect(content[0].text).toContain('Error');
113 |         expect(content[0].text).toContain('Invalid JavaScript');
114 |         expect(content[0].text).toContain('HTML entities');
115 |       },
116 |       TEST_TIMEOUTS.short,
117 |     );
118 | 
119 |     it(
120 |       'should accept JavaScript with newlines in strings',
121 |       async () => {
122 |         const result = await client.callTool({
123 |           name: 'execute_js',
124 |           arguments: {
125 |             url: 'https://httpbin.org/html',
126 |             scripts: 'const text = "line1\\nline2"; return text',
127 |           },
128 |         });
129 | 
130 |         const content = (result as ToolResult).content;
131 |         expect(content).toHaveLength(1);
132 |         expect(content[0].text).toContain('JavaScript executed on: https://httpbin.org/html');
133 |         expect(content[0].text).toContain('Returned: "line1\\nline2"');
134 |       },
135 |       TEST_TIMEOUTS.medium, // Increase from short to medium
136 |     );
137 | 
138 |     it(
139 |       'should handle JavaScript execution errors',
140 |       async () => {
141 |         const result = await client.callTool({
142 |           name: 'execute_js',
143 |           arguments: {
144 |             url: 'https://httpbin.org/html',
145 |             scripts: [
146 |               'return "This works"',
147 |               'throw new Error("This is a test error")',
148 |               'nonExistentVariable.someMethod()',
149 |             ],
150 |           },
151 |         });
152 | 
153 |         const content = (result as ToolResult).content;
154 |         expect(content).toHaveLength(1);
155 |         expect(content[0].text).toContain('JavaScript executed on: https://httpbin.org/html');
156 | 
157 |         // First script should succeed
158 |         expect(content[0].text).toContain('Script: return "This works"');
159 |         expect(content[0].text).toContain('Returned: "This works"');
160 | 
161 |         // Second script should show error
162 |         expect(content[0].text).toContain('Script: throw new Error("This is a test error")');
163 |         expect(content[0].text).toContain('Returned: Error: Error: This is a test error');
164 | 
165 |         // Third script should show reference error
166 |         expect(content[0].text).toContain('Script: nonExistentVariable.someMethod()');
167 |         expect(content[0].text).toContain('Returned: Error: ReferenceError: nonExistentVariable is not defined');
168 |       },
169 |       TEST_TIMEOUTS.medium,
170 |     );
171 | 
172 |     it(
173 |       'should handle invalid URLs gracefully',
174 |       async () => {
175 |         const result = await client.callTool({
176 |           name: 'execute_js',
177 |           arguments: {
178 |             url: 'not-a-valid-url',
179 |             scripts: 'return true',
180 |           },
181 |         });
182 | 
183 |         const content = (result as ToolResult).content;
184 |         expect(content).toHaveLength(1);
185 |         expect(content[0].text).toContain('Error');
186 |         expect(content[0].text?.toLowerCase()).toContain('invalid');
187 |       },
188 |       TEST_TIMEOUTS.short,
189 |     );
190 |   });
191 | });
192 | 
```

--------------------------------------------------------------------------------
/src/__tests__/integration/batch-crawl.integration.test.ts:
--------------------------------------------------------------------------------

```typescript
1 | /* eslint-env jest */
2 | import { Client } from '@modelcontextprotocol/sdk/client/index.js';
3 | import { createTestClient, cleanupTestClient, TEST_TIMEOUTS } from './test-utils.js';
4 | 
5 | interface ToolResult {
6 |   content: Array<{
7 |     type: string;
8 |     text?: string;
9 |   }>;
10 | }
11 | 
12 | describe('batch_crawl Integration Tests', () => {
13 |   let client: Client;
14 | 
15 |   beforeAll(async () => {
16 |     client = await createTestClient();
17 |   }, TEST_TIMEOUTS.medium);
18 | 
19 |   afterAll(async () => {
20 |     if (client) {
21 |       await cleanupTestClient(client);
22 |     }
23 |   });
24 | 
25 |   describe('Batch crawling', () => {
26 |     it(
27 |       'should crawl multiple URLs',
28 |       async () => {
29 |         const result = await client.callTool({
30 |           name: 'batch_crawl',
31 |           arguments: {
32 |             urls: ['https://httpbingo.org/html', 'https://httpbingo.org/json'],
33 |           },
34 |         });
35 | 
36 |         expect(result).toBeDefined();
37 |         const content = (result as ToolResult).content;
38 |         expect(content).toHaveLength(1);
39 |         expect(content[0].type).toBe('text');
40 | 
41 |         const text = content[0].text || '';
42 |         expect(text).toContain('Batch crawl completed');
43 |         expect(text).toContain('Processed 2 URLs');
44 |         expect(text).toContain('https://httpbingo.org/html: Success');
45 |         expect(text).toContain('https://httpbingo.org/json: Success');
46 |       },
47 |       TEST_TIMEOUTS.medium,
48 |     );
49 | 
50 |     it(
51 |       'should handle max_concurrent parameter',
52 |       async () => {
53 |         const result = await client.callTool({
54 |           name: 'batch_crawl',
55 |           arguments: {
56 |             urls: ['https://httpbingo.org/html', 'https://httpbingo.org/xml', 'https://httpbingo.org/json'],
57 |             max_concurrent: 1,
58 |           },
59 |         });
60 | 
61 |         const content = (result as ToolResult).content;
62 |         expect(content).toHaveLength(1);
63 |         expect(content[0].type).toBe('text');
64 | 
65 |         const text = content[0].text || '';
66 |         expect(text).toContain('Processed 3 URLs');
67 |         expect(text).toContain(': Success');
68 |       },
69 |       TEST_TIMEOUTS.long,
70 |     );
71 | 
72 |     it(
73 |       'should remove images when requested',
74 |       async () => {
75 |         const result = await client.callTool({
76 |           name: 'batch_crawl',
77 |           arguments: {
78 |             urls: ['https://httpbingo.org/html'],
79 |             remove_images: true,
80 |           },
81 |         });
82 | 
83 |         const content = (result as ToolResult).content;
84 |         expect(content).toHaveLength(1);
85 |         expect(content[0].type).toBe('text');
86 | 
87 |         const text = content[0].text || '';
88 |         expect(text).toContain('Batch crawl completed');
89 |         expect(text).toContain('https://httpbingo.org/html: Success');
90 |       },
91 |       TEST_TIMEOUTS.medium,
92 |     );
93 | 
94 |     it(
95 |       'should bypass cache when requested',
96 |       async () => {
97 |         const result = await client.callTool({
98 |           name: 'batch_crawl',
99 |           arguments: {
100 |             urls: ['https://httpbingo.org/html'],
101 |             bypass_cache: true,
102 |           },
103 |         });
104 | 
105 |         const content = (result as ToolResult).content;
106 |         expect(content).toHaveLength(1);
107 |         expect(content[0].type).toBe('text');
108 | 
109 |         const text = content[0].text || '';
110 |         expect(text).toContain('Batch crawl completed');
111 |         expect(text).toContain('https://httpbingo.org/html: Success');
112 |       },
113 |       TEST_TIMEOUTS.medium,
114 |     );
115 | 
116 |     it(
117 |       'should handle mixed content types',
118 |       async () => {
119 |         const result = await client.callTool({
120 |           name: 'batch_crawl',
121 |           arguments: {
122 |             urls: ['https://httpbin.org/html', 'https://httpbin.org/json', 'https://httpbin.org/xml'],
123 |           },
124 |         });
125 | 
126 |         const content = (result as ToolResult).content;
127 |         expect(content).toHaveLength(1);
128 |         expect(content[0].type).toBe('text');
129 | 
130 |         const text = content[0].text || '';
131 |         expect(text).toContain('Processed 3 URLs');
132 |         expect(text).toContain('https://httpbin.org/html: Success');
133 |         expect(text).toContain('https://httpbin.org/json: Success');
134 |         expect(text).toContain('https://httpbin.org/xml: Success');
135 |       },
136 |       TEST_TIMEOUTS.medium,
137 |     );
138 | 
139 |     it(
140 |       'should handle empty URL list',
141 |       async () => {
142 |         const result = await client.callTool({
143 |           name: 'batch_crawl',
144 |           arguments: {
145 |             urls: [],
146 |           },
147 |         });
148 | 
149 |         const content = (result as ToolResult).content;
150 |         expect(content).toHaveLength(1);
151 |         expect(content[0].text).toContain('Error');
152 |         // Just check that it's an error about invalid parameters
153 |         expect(content[0].text?.toLowerCase()).toMatch(/error|invalid|failed/);
154 |       },
155 |       TEST_TIMEOUTS.short,
156 |     );
157 | 
158 |     it(
159 |       'should reject session_id parameter',
160 |       async () => {
161 |         const result = await client.callTool({
162 |           name: 'batch_crawl',
163 |           arguments: {
164 |             urls: ['https://httpbingo.org/html'],
165 |             session_id: 'test-session',
166 |           },
167 |         });
168 | 
169 |         const content = (result as ToolResult).content;
170 |         expect(content).toHaveLength(1);
171 |         expect(content[0].type).toBe('text');
172 |         expect(content[0].text).toContain('session_id');
173 |         expect(content[0].text).toContain('does not support');
174 |         expect(content[0].text).toContain('stateless');
175 |       },
176 |       TEST_TIMEOUTS.short,
177 |     );
178 | 
179 |     it(
180 |       'should handle per-URL configs array',
181 |       async () => {
182 |         const result = await client.callTool({
183 |           name: 'batch_crawl',
184 |           arguments: {
185 |             urls: ['https://httpbingo.org/html', 'https://httpbingo.org/json'],
186 |             configs: [
187 |               {
188 |                 url: 'https://httpbingo.org/html',
189 |                 browser_config: { browser_type: 'chromium' },
190 |                 crawler_config: { word_count_threshold: 10 },
191 |               },
192 |               {
193 |                 url: 'https://httpbingo.org/json',
194 |                 browser_config: { browser_type: 'firefox' },
195 |                 crawler_config: { word_count_threshold: 20 },
196 |               },
197 |             ],
198 |             max_concurrent: 2,
199 |           },
200 |         });
201 | 
202 |         const content = (result as ToolResult).content;
203 |         expect(content).toHaveLength(1);
204 |         expect(content[0].type).toBe('text');
205 | 
206 |         const text = content[0].text || '';
207 |         expect(text).toContain('Batch crawl completed');
208 |         expect(text).toContain('Processed 2 URLs');
209 |         // Both should succeed regardless of different configs
210 |         expect(text).toContain('https://httpbingo.org/html: Success');
211 |         expect(text).toContain('https://httpbingo.org/json: Success');
212 |       },
213 |       TEST_TIMEOUTS.medium,
214 |     );
215 |   });
216 | });
217 | 
```

--------------------------------------------------------------------------------
/src/__tests__/integration/parse-sitemap.integration.test.ts:
--------------------------------------------------------------------------------

```typescript
1 | /* eslint-env jest */
2 | import { Client } from '@modelcontextprotocol/sdk/client/index.js';
3 | import { createTestClient, cleanupTestClient, TEST_TIMEOUTS } from './test-utils.js';
4 | 
5 | interface ToolResult {
6 |   content: Array<{
7 |     type: string;
8 |     text?: string;
9 |   }>;
10 | }
11 | 
12 | describe('parse_sitemap Integration Tests', () => {
13 |   let client: Client;
14 | 
15 |   beforeAll(async () => {
16 |     client = await createTestClient();
17 |   }, TEST_TIMEOUTS.medium);
18 | 
19 |   afterAll(async () => {
20 |     if (client) {
21 |       await cleanupTestClient(client);
22 |     }
23 |   });
24 | 
25 |   describe('Basic functionality', () => {
26 |     it(
27 |       'should parse nodejs.org sitemap successfully',
28 |       async () => {
29 |         const result = await client.callTool({
30 |           name: 'parse_sitemap',
31 |           arguments: {
32 |             url: 'https://nodejs.org/sitemap.xml',
33 |           },
34 |         });
35 | 
36 |         expect(result).toBeDefined();
37 |         const content = (result as ToolResult).content;
38 |         expect(content).toBeDefined();
39 |         expect(Array.isArray(content)).toBe(true);
40 |         expect(content.length).toBeGreaterThan(0);
41 | 
42 |         const textContent = content.find((c) => c.type === 'text');
43 |         expect(textContent).toBeDefined();
44 |         expect(textContent?.text).toContain('Sitemap parsed successfully');
45 |         expect(textContent?.text).toContain('Total URLs found:');
46 |         expect(textContent?.text).toContain('https://nodejs.org');
47 | 
48 |         // Should find many URLs in the nodejs sitemap
49 |         expect(textContent?.text).toMatch(/Total URLs found: [1-9][0-9]+/);
50 |       },
51 |       TEST_TIMEOUTS.medium,
52 |     );
53 | 
54 |     it(
55 |       'should filter URLs with regex pattern',
56 |       async () => {
57 |         const result = await client.callTool({
58 |           name: 'parse_sitemap',
59 |           arguments: {
60 |             url: 'https://nodejs.org/sitemap.xml',
61 |             filter_pattern: '.*/learn/.*', // Only URLs containing /learn/
62 |           },
63 |         });
64 | 
65 |         expect(result).toBeDefined();
66 |         const content = (result as ToolResult).content;
67 |         const textContent = content.find((c) => c.type === 'text');
68 |         expect(textContent).toBeDefined();
69 | 
70 |         // Check that filtering worked
71 |         expect(textContent?.text).toContain('Filtered URLs:');
72 | 
73 |         // All URLs in the result should contain /learn/
74 |         const urlsSection = textContent?.text?.split('URLs:\n')[1];
75 |         if (urlsSection) {
76 |           const urls = urlsSection.split('\n').filter((url) => url.trim());
77 |           urls.forEach((url) => {
78 |             if (url && !url.includes('... and')) {
79 |               expect(url).toContain('/learn/');
80 |             }
81 |           });
82 |         }
83 |       },
84 |       TEST_TIMEOUTS.medium,
85 |     );
86 | 
87 |     it(
88 |       'should handle empty sitemaps',
89 |       async () => {
90 |         // Using a URL that returns valid XML but not a sitemap
91 |         const result = await client.callTool({
92 |           name: 'parse_sitemap',
93 |           arguments: {
94 |             url: 'https://www.w3schools.com/xml/note.xml',
95 |           },
96 |         });
97 | 
98 |         expect(result).toBeDefined();
99 |         const content = (result as ToolResult).content;
100 |         const textContent = content.find((c) => c.type === 'text');
101 |         expect(textContent).toBeDefined();
102 |         expect(textContent?.text).toContain('Total URLs found: 0');
103 |       },
104 |       TEST_TIMEOUTS.medium,
105 |     );
106 | 
107 |     it(
108 |       'should handle large sitemaps with truncation',
109 |       async () => {
110 |         const result = await client.callTool({
111 |           name: 'parse_sitemap',
112 |           arguments: {
113 |             url: 'https://nodejs.org/sitemap.xml',
114 |             filter_pattern: '.*', // Match all to test truncation
115 |           },
116 |         });
117 | 
118 |         expect(result).toBeDefined();
119 |         const content = (result as ToolResult).content;
120 |         const textContent = content.find((c) => c.type === 'text');
121 |         expect(textContent).toBeDefined();
122 | 
123 |         // Should show max 100 URLs and indicate there are more
124 |         if (textContent?.text && textContent.text.includes('... and')) {
125 |           expect(textContent.text).toMatch(/\.\.\. and \d+ more/);
126 |         }
127 |       },
128 |       TEST_TIMEOUTS.medium,
129 |     );
130 |   });
131 | 
132 |   describe('Error handling', () => {
133 |     it(
134 |       'should handle invalid URLs',
135 |       async () => {
136 |         const result = await client.callTool({
137 |           name: 'parse_sitemap',
138 |           arguments: {
139 |             url: 'not-a-url',
140 |           },
141 |         });
142 | 
143 |         expect(result).toBeDefined();
144 |         const content = (result as ToolResult).content;
145 |         expect(content).toBeDefined();
146 |         const textContent = content.find((c) => c.type === 'text');
147 |         expect(textContent).toBeDefined();
148 |         expect(textContent?.text).toContain('Error');
149 |         expect(textContent?.text?.toLowerCase()).toContain('invalid');
150 |       },
151 |       TEST_TIMEOUTS.short,
152 |     );
153 | 
154 |     it(
155 |       'should handle non-existent URLs',
156 |       async () => {
157 |         const result = await client.callTool({
158 |           name: 'parse_sitemap',
159 |           arguments: {
160 |             url: 'https://this-domain-definitely-does-not-exist-12345.com/sitemap.xml',
161 |           },
162 |         });
163 | 
164 |         expect(result).toBeDefined();
165 |         const content = (result as ToolResult).content;
166 |         const textContent = content.find((c) => c.type === 'text');
167 |         expect(textContent).toBeDefined();
168 |         expect(textContent?.text).toContain('Error');
169 |       },
170 |       TEST_TIMEOUTS.medium,
171 |     );
172 | 
173 |     it(
174 |       'should handle non-XML content',
175 |       async () => {
176 |         const result = await client.callTool({
177 |           name: 'parse_sitemap',
178 |           arguments: {
179 |             url: 'https://example.com', // HTML page, not XML
180 |           },
181 |         });
182 | 
183 |         expect(result).toBeDefined();
184 |         const content = (result as ToolResult).content;
185 |         const textContent = content.find((c) => c.type === 'text');
186 |         expect(textContent).toBeDefined();
187 |         // Should still parse but likely find 0 URLs since it's not a sitemap
188 |         expect(textContent?.text).toContain('Total URLs found:');
189 |       },
190 |       TEST_TIMEOUTS.medium,
191 |     );
192 | 
193 |     it(
194 |       'should handle invalid regex patterns',
195 |       async () => {
196 |         const result = await client.callTool({
197 |           name: 'parse_sitemap',
198 |           arguments: {
199 |             url: 'https://nodejs.org/sitemap.xml',
200 |             filter_pattern: '[invalid(regex', // Invalid regex
201 |           },
202 |         });
203 | 
204 |         expect(result).toBeDefined();
205 |         const content = (result as ToolResult).content;
206 |         const textContent = content.find((c) => c.type === 'text');
207 |         expect(textContent).toBeDefined();
208 |         expect(textContent?.text).toContain('Error');
209 |         expect(textContent?.text?.toLowerCase()).toMatch(/failed|error|invalid/);
210 |       },
211 |       TEST_TIMEOUTS.medium,
212 |     );
213 |   });
214 | });
215 | 
```

--------------------------------------------------------------------------------
/src/__tests__/integration/crawl-handlers.integration.test.ts:
--------------------------------------------------------------------------------

```typescript
1 | /* eslint-env jest */
2 | import { Client } from '@modelcontextprotocol/sdk/client/index.js';
3 | import { createTestClient, cleanupTestClient, TEST_TIMEOUTS } from './test-utils.js';
4 | 
5 | interface ToolResult {
6 |   content: Array<{
7 |     type: string;
8 |     text?: string;
9 |   }>;
10 | }
11 | 
12 | describe('Crawl Handlers Integration Tests', () => {
13 |   let client: Client;
14 | 
15 |   beforeAll(async () => {
16 |     client = await createTestClient();
17 |   }, TEST_TIMEOUTS.medium);
18 | 
19 |   afterAll(async () => {
20 |     if (client) {
21 |       await cleanupTestClient(client);
22 |     }
23 |   });
24 | 
25 |   describe('batch_crawl error handling', () => {
26 |     it(
27 |       'should handle batch crawl with invalid URLs',
28 |       async () => {
29 |         const result = await client.callTool({
30 |           name: 'batch_crawl',
31 |           arguments: {
32 |             urls: ['not-a-valid-url', 'https://this-domain-does-not-exist-12345.com'],
33 |             max_concurrent: 2,
34 |           },
35 |         });
36 | 
37 |         expect(result).toBeDefined();
38 |         const content = (result as ToolResult).content;
39 |         expect(content[0].type).toBe('text');
40 |         // Zod validation will catch the invalid URL format
41 |         expect(content[0].text).toContain('Invalid parameters');
42 |       },
43 |       TEST_TIMEOUTS.medium,
44 |     );
45 |   });
46 | 
47 |   describe('smart_crawl edge cases', () => {
48 |     it(
49 |       'should detect XML content type for XML URLs',
50 |       async () => {
51 |         const result = await client.callTool({
52 |           name: 'smart_crawl',
53 |           arguments: {
54 |             url: 'https://httpbin.org/xml',
55 |             bypass_cache: true,
56 |           },
57 |         });
58 | 
59 |         expect(result).toBeDefined();
60 |         const content = (result as ToolResult).content;
61 |         expect(content[0].text).toContain('Smart crawl detected content type:');
62 |         // Should detect as XML based on content-type header
63 |         expect(content[0].text?.toLowerCase()).toMatch(/xml|json/); // httpbin.org/xml actually returns JSON
64 |       },
65 |       TEST_TIMEOUTS.medium,
66 |     );
67 | 
68 |     it(
69 |       'should handle follow_links with sitemap URLs',
70 |       async () => {
71 |         // Note: Most sites don't have accessible sitemaps, so this tests the logic
72 |         const result = await client.callTool({
73 |           name: 'smart_crawl',
74 |           arguments: {
75 |             url: 'https://example.com/sitemap.xml',
76 |             follow_links: true,
77 |             max_depth: 2,
78 |             bypass_cache: true,
79 |           },
80 |         });
81 | 
82 |         expect(result).toBeDefined();
83 |         const content = (result as ToolResult).content;
84 |         expect(content[0].text).toContain('Smart crawl detected content type:');
85 |       },
86 |       TEST_TIMEOUTS.long, // Increase timeout for sitemap processing
87 |     );
88 |   });
89 | 
90 |   describe('crawl_recursive edge cases', () => {
91 |     it(
92 |       'should respect max_depth limit of 0',
93 |       async () => {
94 |         const result = await client.callTool({
95 |           name: 'crawl_recursive',
96 |           arguments: {
97 |             url: 'https://httpbin.org/links/5/0',
98 |             max_depth: 0, // Should only crawl the initial page
99 |           },
100 |         });
101 | 
102 |         expect(result).toBeDefined();
103 |         const content = (result as ToolResult).content;
104 |         // The test might show 0 pages if the URL fails, or 1 page if it succeeds
105 |         expect(content[0].text).toMatch(/Pages crawled: [01]/);
106 |         // If pages were crawled, check for max depth message
107 |         if (content[0].text?.includes('Pages crawled: 1')) {
108 |           expect(content[0].text).toContain('Max depth reached: 0');
109 |         }
110 |       },
111 |       TEST_TIMEOUTS.medium,
112 |     );
113 | 
114 |     it(
115 |       'should handle sites with no internal links',
116 |       async () => {
117 |         const result = await client.callTool({
118 |           name: 'crawl_recursive',
119 |           arguments: {
120 |             url: 'https://httpbin.org/json', // JSON endpoint has no links
121 |             max_depth: 2,
122 |           },
123 |         });
124 | 
125 |         expect(result).toBeDefined();
126 |         const content = (result as ToolResult).content;
127 |         expect(content[0].text).toContain('Pages crawled: 1');
128 |         expect(content[0].text).toContain('Internal links found: 0');
129 |       },
130 |       TEST_TIMEOUTS.medium,
131 |     );
132 |   });
133 | 
134 |   describe('parse_sitemap error handling', () => {
135 |     it(
136 |       'should handle non-existent sitemap URLs',
137 |       async () => {
138 |         const result = await client.callTool({
139 |           name: 'parse_sitemap',
140 |           arguments: {
141 |             url: 'https://this-domain-does-not-exist-12345.com/sitemap.xml',
142 |           },
143 |         });
144 | 
145 |         expect(result).toBeDefined();
146 |         const content = (result as ToolResult).content;
147 |         expect(content[0].text).toContain('Error');
148 |         expect(content[0].text?.toLowerCase()).toMatch(/failed|error|not found/);
149 |       },
150 |       TEST_TIMEOUTS.medium,
151 |     );
152 |   });
153 | 
154 |   describe('crawl method edge cases', () => {
155 |     it(
156 |       'should handle crawl with all image and filtering parameters',
157 |       async () => {
158 |         const result = await client.callTool({
159 |           name: 'crawl',
160 |           arguments: {
161 |             url: 'https://example.com',
162 |             word_count_threshold: 50,
163 |             image_description_min_word_threshold: 10,
164 |             image_score_threshold: 0.5,
165 |             exclude_social_media_links: true,
166 |             cache_mode: 'BYPASS',
167 |           },
168 |         });
169 | 
170 |         expect(result).toBeDefined();
171 |         const content = (result as ToolResult).content;
172 |         expect(content[0].type).toBe('text');
173 |         // Should successfully crawl with these parameters
174 |         expect(content[0].text).not.toContain('Error');
175 |       },
176 |       TEST_TIMEOUTS.medium,
177 |     );
178 | 
179 |     it(
180 |       'should handle js_code as null with validation error',
181 |       async () => {
182 |         const result = await client.callTool({
183 |           name: 'crawl',
184 |           arguments: {
185 |             url: 'https://example.com',
186 |             js_code: null as unknown as string, // Intentionally pass null
187 |           },
188 |         });
189 | 
190 |         expect(result).toBeDefined();
191 |         const content = (result as ToolResult).content;
192 |         expect(content[0].text).toContain('Invalid parameters for crawl');
193 |         expect(content[0].text).toContain('js_code');
194 |       },
195 |       TEST_TIMEOUTS.short,
196 |     );
197 | 
198 |     it(
199 |       'should work with session_id parameter using manage_session',
200 |       async () => {
201 |         // First create a session using manage_session
202 |         const sessionResult = await client.callTool({
203 |           name: 'manage_session',
204 |           arguments: {
205 |             action: 'create',
206 |             session_id: 'test-crawl-session-new',
207 |           },
208 |         });
209 | 
210 |         expect(sessionResult).toBeDefined();
211 | 
212 |         // Then use it for crawling
213 |         const crawlResult = await client.callTool({
214 |           name: 'crawl',
215 |           arguments: {
216 |             url: 'https://example.com',
217 |             session_id: 'test-crawl-session-new',
218 |           },
219 |         });
220 | 
221 |         expect(crawlResult).toBeDefined();
222 |         const content = (crawlResult as ToolResult).content;
223 |         expect(content[0].type).toBe('text');
224 |         expect(content[0].text).not.toContain('Error');
225 | 
226 |         // Clean up using manage_session
227 |         await client.callTool({
228 |           name: 'manage_session',
229 |           arguments: {
230 |             action: 'clear',
231 |             session_id: 'test-crawl-session-new',
232 |           },
233 |         });
234 |       },
235 |       TEST_TIMEOUTS.medium,
236 |     );
237 |   });
238 | });
239 | 
```

--------------------------------------------------------------------------------
/src/__tests__/integration/crawl-advanced.integration.test.ts:
--------------------------------------------------------------------------------

```typescript
1 | /* eslint-env jest */
2 | import { Client } from '@modelcontextprotocol/sdk/client/index.js';
3 | import { createTestClient, cleanupTestClient, expectSuccessfulCrawl, TEST_TIMEOUTS } from './test-utils.js';
4 | 
5 | interface ToolResult {
6 |   content: Array<{
7 |     type: string;
8 |     text?: string;
9 |     data?: string;
10 |     mimeType?: string;
11 |   }>;
12 | }
13 | 
14 | describe('crawl Advanced Features Integration Tests', () => {
15 |   let client: Client;
16 | 
17 |   beforeAll(async () => {
18 |     client = await createTestClient();
19 |   }, TEST_TIMEOUTS.medium);
20 | 
21 |   afterAll(async () => {
22 |     if (client) {
23 |       await cleanupTestClient(client);
24 |     }
25 |   });
26 | 
27 |   describe('Media and Content Extraction', () => {
28 |     it(
29 |       'should extract images with scoring',
30 |       async () => {
31 |         const result = await client.callTool({
32 |           name: 'crawl',
33 |           arguments: {
34 |             url: 'https://example.com',
35 |             image_score_threshold: 3,
36 |             exclude_external_images: false,
37 |             cache_mode: 'BYPASS',
38 |           },
39 |         });
40 | 
41 |         await expectSuccessfulCrawl(result);
42 |         const textContent = (result as ToolResult).content.find((c) => c.type === 'text');
43 |         expect(textContent?.text).toBeTruthy();
44 |         // Should have extracted content
45 |         expect(textContent?.text).toContain('Example Domain');
46 |       },
47 |       TEST_TIMEOUTS.medium,
48 |     );
49 | 
50 |     it(
51 |       'should capture MHTML',
52 |       async () => {
53 |         const result = await client.callTool({
54 |           name: 'crawl',
55 |           arguments: {
56 |             url: 'https://example.com',
57 |             capture_mhtml: true,
58 |             cache_mode: 'BYPASS',
59 |           },
60 |         });
61 | 
62 |         await expectSuccessfulCrawl(result);
63 |         const textContent = (result as ToolResult).content.find((c) => c.type === 'text');
64 |         expect(textContent?.text).toBeTruthy();
65 |         // MHTML should be captured but not in the text output
66 |         expect(textContent?.text).toContain('Example Domain');
67 |       },
68 |       TEST_TIMEOUTS.long,
69 |     );
70 | 
71 |     it(
72 |       'should extract tables from Wikipedia',
73 |       async () => {
74 |         const result = await client.callTool({
75 |           name: 'crawl',
76 |           arguments: {
77 |             url: 'https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)',
78 |             word_count_threshold: 10,
79 |             cache_mode: 'BYPASS',
80 |           },
81 |         });
82 | 
83 |         await expectSuccessfulCrawl(result);
84 |         const textContent = (result as ToolResult).content.find((c) => c.type === 'text');
85 |         expect(textContent?.text).toBeTruthy();
86 |         // Should contain country data
87 |         expect(textContent?.text).toMatch(/China|India|United States/);
88 |       },
89 |       TEST_TIMEOUTS.long,
90 |     );
91 |   });
92 | 
93 |   describe('Link and Content Filtering', () => {
94 |     it(
95 |       'should exclude social media links',
96 |       async () => {
97 |         const result = await client.callTool({
98 |           name: 'crawl',
99 |           arguments: {
100 |             url: 'https://www.bbc.com/news',
101 |             exclude_social_media_links: true,
102 |             exclude_domains: ['twitter.com', 'facebook.com', 'instagram.com'],
103 |             cache_mode: 'BYPASS',
104 |             word_count_threshold: 50,
105 |           },
106 |         });
107 | 
108 |         await expectSuccessfulCrawl(result);
109 |         const textContent = (result as ToolResult).content.find((c) => c.type === 'text');
110 |         expect(textContent?.text).toBeTruthy();
111 |         // Should have news content but no social media references in extracted links
112 |         expect(textContent?.text).toContain('BBC');
113 |       },
114 |       TEST_TIMEOUTS.long,
115 |     );
116 | 
117 |     it(
118 |       'should remove excluded selectors',
119 |       async () => {
120 |         const result = await client.callTool({
121 |           name: 'crawl',
122 |           arguments: {
123 |             url: 'https://httpbin.org/html',
124 |             excluded_selector: 'div:first-child',
125 |             cache_mode: 'BYPASS',
126 |           },
127 |         });
128 | 
129 |         await expectSuccessfulCrawl(result);
130 |       },
131 |       TEST_TIMEOUTS.medium,
132 |     );
133 |   });
134 | 
135 |   describe('Page Navigation Options', () => {
136 |     it(
137 |       'should wait for images to load',
138 |       async () => {
139 |         const result = await client.callTool({
140 |           name: 'crawl',
141 |           arguments: {
142 |             url: 'https://httpbin.org/image/png',
143 |             wait_for_images: true,
144 |             wait_until: 'load',
145 |             page_timeout: 30000,
146 |             cache_mode: 'BYPASS',
147 |           },
148 |         });
149 | 
150 |         await expectSuccessfulCrawl(result);
151 |       },
152 |       TEST_TIMEOUTS.medium,
153 |     );
154 | 
155 |     it(
156 |       'should scan full page',
157 |       async () => {
158 |         const result = await client.callTool({
159 |           name: 'crawl',
160 |           arguments: {
161 |             url: 'https://httpbin.org/html',
162 |             scan_full_page: true,
163 |             delay_before_scroll: 0.5,
164 |             scroll_delay: 0.2,
165 |             cache_mode: 'BYPASS',
166 |           },
167 |         });
168 | 
169 |         await expectSuccessfulCrawl(result);
170 |       },
171 |       TEST_TIMEOUTS.medium,
172 |     );
173 |   });
174 | 
175 |   describe('Stealth and Bot Detection', () => {
176 |     it(
177 |       'should use magic mode',
178 |       async () => {
179 |         const result = await client.callTool({
180 |           name: 'crawl',
181 |           arguments: {
182 |             url: 'https://httpbin.org/headers',
183 |             magic: true,
184 |             simulate_user: true,
185 |             override_navigator: true,
186 |             cache_mode: 'BYPASS',
187 |           },
188 |         });
189 | 
190 |         await expectSuccessfulCrawl(result);
191 |       },
192 |       TEST_TIMEOUTS.long,
193 |     );
194 |   });
195 | 
196 |   describe('Extraction Strategies (0.7.3/0.7.4)', () => {
197 |     it(
198 |       'should accept extraction_strategy parameter',
199 |       async () => {
200 |         const result = await client.callTool({
201 |           name: 'crawl',
202 |           arguments: {
203 |             url: 'https://httpbin.org/html',
204 |             extraction_strategy: {
205 |               type: 'custom',
206 |               provider: 'openai',
207 |               api_key: 'test-key',
208 |               model: 'gpt-4',
209 |             },
210 |             cache_mode: 'BYPASS',
211 |           },
212 |         });
213 | 
214 |         // The parameter should be accepted even if not fully processed
215 |         await expectSuccessfulCrawl(result);
216 |       },
217 |       TEST_TIMEOUTS.short,
218 |     );
219 | 
220 |     it(
221 |       'should accept table_extraction_strategy parameter',
222 |       async () => {
223 |         const result = await client.callTool({
224 |           name: 'crawl',
225 |           arguments: {
226 |             url: 'https://httpbin.org/html',
227 |             table_extraction_strategy: {
228 |               enable_chunking: true,
229 |               thresholds: {
230 |                 min_rows: 5,
231 |                 max_columns: 20,
232 |               },
233 |             },
234 |             cache_mode: 'BYPASS',
235 |           },
236 |         });
237 | 
238 |         await expectSuccessfulCrawl(result);
239 |       },
240 |       TEST_TIMEOUTS.short,
241 |     );
242 | 
243 |     it(
244 |       'should accept markdown_generator_options parameter',
245 |       async () => {
246 |         const result = await client.callTool({
247 |           name: 'crawl',
248 |           arguments: {
249 |             url: 'https://httpbin.org/html',
250 |             markdown_generator_options: {
251 |               include_links: true,
252 |               preserve_formatting: true,
253 |             },
254 |             cache_mode: 'BYPASS',
255 |           },
256 |         });
257 | 
258 |         await expectSuccessfulCrawl(result);
259 |       },
260 |       TEST_TIMEOUTS.short,
261 |     );
262 |   });
263 | 
264 |   describe('Virtual Scroll', () => {
265 |     it(
266 |       'should handle virtual scroll configuration',
267 |       async () => {
268 |         const result = await client.callTool({
269 |           name: 'crawl',
270 |           arguments: {
271 |             url: 'https://httpbin.org/html',
272 |             virtual_scroll_config: {
273 |               container_selector: 'body',
274 |               scroll_count: 3,
275 |               scroll_by: 'container_height',
276 |               wait_after_scroll: 0.5,
277 |             },
278 |             cache_mode: 'BYPASS',
279 |           },
280 |         });
281 | 
282 |         await expectSuccessfulCrawl(result);
283 |       },
284 |       TEST_TIMEOUTS.medium,
285 |     );
286 |   });
287 | });
288 | 
```
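
The advanced-crawl tests above lean on the `expectSuccessfulCrawl` helper imported from `test-utils.ts`, which is not shown on this page. The following is a minimal sketch of what such a helper plausibly asserts, based on the `ToolResult` shape and the error conventions used throughout these tests; the real implementation may differ.

```typescript
// Hypothetical sketch of an expectSuccessfulCrawl-style helper; the actual
// implementation lives in src/__tests__/integration/test-utils.ts.
import { expect } from '@jest/globals';

interface ToolResult {
  content: Array<{ type: string; text?: string }>;
}

export async function expectSuccessfulCrawlSketch(result: unknown): Promise<void> {
  const content = (result as ToolResult).content;
  expect(content).toBeDefined();
  const textContent = content.find((c) => c.type === 'text');
  expect(textContent).toBeDefined();
  // Handler errors surface as text content rather than thrown exceptions.
  expect(textContent?.text).not.toContain('Error');
}
```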