# Directory Structure
```
├── .github
│   └── workflows
│       └── npm-publish.yml
├── .gitignore
├── .npmignore
├── docker-compose.yml
├── Dockerfile
├── package-lock.json
├── package.json
├── README.md
├── smithery.yaml
├── src
│   ├── index.ts
│   └── test-client.ts
├── swagger.json
└── tsconfig.json
```
# Files
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
# Dependency directories
node_modules/
# Build outputs
dist/
build/
*.tsbuildinfo
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
lerna-debug.log*
# Environment variables
.env
.env.local
.env.development
.env.test
.env.production
# IDE specific files
.idea/
.vscode/
*.swp
*.swo
.history/
# Cache files
.npm
.eslintcache
# Test coverage
coverage/
# Temporary files
tmp/
temp/
# Docker volumes from testing
.docker-volumes/
# Mac files
.DS_Store
# Windows files
Thumbs.db
ehthumbs.db
Desktop.ini
$RECYCLE.BIN/
```
--------------------------------------------------------------------------------
/.npmignore:
--------------------------------------------------------------------------------
```
# Source files (we only want to publish the dist folder with compiled code)
src/
# TypeScript config files
tsconfig.json
tsconfig.*.json
# Test files
test/
*.test.ts
*.spec.ts
__tests__/
__mocks__/
# Git files
.git/
.github/
.gitignore
.gitattributes
# CI/CD
.travis.yml
.circleci/
.gitlab-ci.yml
.azure-pipelines.yml
# IDE specific files
.idea/
.vscode/
*.swp
*.swo
.DS_Store
# Docker files
docker-compose.yml
Dockerfile
.docker-volumes/
# Development tools and configs
.prettierrc
.eslintrc
.eslintignore
.editorconfig
jest.config.js
rollup.config.js
webpack.config.js
babel.config.js
.babelrc
.nvmrc
# Documentation that's not necessary for npm
CONTRIBUTING.md
CHANGELOG.md
swagger.json
docs/
# Examples not needed for npm
examples/
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
lerna-debug.log*
# Misc
*.env
coverage/
tmp/
.nyc_output/
# Don't ignore these files
!dist/
!LICENSE
!README.md
!package.json
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
# WebSearch-MCP
[![smithery badge](https://smithery.ai/badge/@mnhlt/WebSearch-MCP)](https://smithery.ai/server/@mnhlt/WebSearch-MCP)
A Model Context Protocol (MCP) server implementation that provides a web search capability over stdio transport. This server integrates with a WebSearch Crawler API to retrieve search results.
## Table of Contents
- [About](#about)
- [Installation](#installation)
- [Configuration](#configuration)
- [Setup & Integration](#setup--integration)
  - [Setting Up the Crawler Service](#setting-up-the-crawler-service)
    - [Prerequisites](#prerequisites)
    - [Starting the Crawler Service](#starting-the-crawler-service)
  - [Testing the Crawler API](#testing-the-crawler-api)
  - [Custom Configuration](#custom-configuration)
- [Integrating with MCP Clients](#integrating-with-mcp-clients)
  - [Quick Reference: MCP Configuration](#quick-reference-mcp-configuration)
- [Usage](#usage)
  - [Parameters](#parameters)
  - [Example Search Response](#example-search-response)
  - [Testing Locally](#testing-locally)
  - [As a Library](#as-a-library)
- [Troubleshooting](#troubleshooting)
  - [Crawler Service Issues](#crawler-service-issues)
  - [MCP Server Issues](#mcp-server-issues)
- [Development](#development)
  - [Project Structure](#project-structure)
  - [Publishing to npm](#publishing-to-npm)
- [Contributing](#contributing)
- [License](#license)
## About
WebSearch-MCP is a Model Context Protocol server that provides web search capabilities to AI assistants that support MCP. It allows AI models like Claude to search the web in real-time, retrieving up-to-date information about any topic.
The server integrates with a Crawler API service that handles the actual web searches, and communicates with AI assistants using the standardized Model Context Protocol.
## Installation
### Installing via Smithery
To install WebSearch for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@mnhlt/WebSearch-MCP):
```bash
npx -y @smithery/cli install @mnhlt/WebSearch-MCP --client claude
```
### Manual Installation
```bash
npm install -g websearch-mcp
```
Or use without installing:
```bash
npx websearch-mcp
```
## Configuration
The WebSearch MCP server can be configured using environment variables:
- `API_URL`: The URL of the WebSearch Crawler API (default: `http://localhost:3001`)
- `MAX_SEARCH_RESULT`: Maximum number of search results to return when not specified in the request (default: `5`)
Examples:
```bash
# Configure API URL
API_URL=https://crawler.example.com npx websearch-mcp
# Configure maximum search results
MAX_SEARCH_RESULT=10 npx websearch-mcp
# Configure both
API_URL=https://crawler.example.com MAX_SEARCH_RESULT=10 npx websearch-mcp
```
## Setup & Integration
Setting up WebSearch-MCP involves two main parts: configuring the crawler service that performs the actual web searches, and integrating the MCP server with your AI client applications.
### Setting Up the Crawler Service
The WebSearch MCP server requires a crawler service to perform the actual web searches. You can easily set up the crawler service using Docker Compose.
### Prerequisites
- [Docker](https://www.docker.com/get-started) and [Docker Compose](https://docs.docker.com/compose/install/)
### Starting the Crawler Service
1. Create a file named `docker-compose.yml` with the following content:
```yaml
version: '3.8'

services:
  crawler:
    image: laituanmanh/websearch-crawler:latest
    container_name: websearch-api
    restart: unless-stopped
    ports:
      - "3001:3001"
    environment:
      - NODE_ENV=production
      - PORT=3001
      - LOG_LEVEL=info
      - FLARESOLVERR_URL=http://flaresolverr:8191/v1
    depends_on:
      - flaresolverr
    volumes:
      - crawler_storage:/app/storage

  flaresolverr:
    image: 21hsmw/flaresolverr:nodriver
    container_name: flaresolverr
    restart: unless-stopped
    environment:
      - LOG_LEVEL=info
      - TZ=UTC

volumes:
  crawler_storage:
```
Workaround for Mac Apple Silicon (pin each image to a platform it supports):

```yaml
version: '3.8'

services:
  crawler:
    image: laituanmanh/websearch-crawler:latest
    container_name: websearch-api
    platform: "linux/amd64"
    restart: unless-stopped
    ports:
      - "3001:3001"
    environment:
      - NODE_ENV=production
      - PORT=3001
      - LOG_LEVEL=info
      - FLARESOLVERR_URL=http://flaresolverr:8191/v1
    depends_on:
      - flaresolverr
    volumes:
      - crawler_storage:/app/storage

  flaresolverr:
    image: 21hsmw/flaresolverr:nodriver
    platform: "linux/arm64"
    container_name: flaresolverr
    restart: unless-stopped
    environment:
      - LOG_LEVEL=info
      - TZ=UTC

volumes:
  crawler_storage:
```
2. Start the services:
```bash
docker-compose up -d
```
3. Verify that the services are running:
```bash
docker-compose ps
```
4. Test the crawler API health endpoint:
```bash
curl http://localhost:3001/health
```
Expected response:
```json
{
  "status": "ok",
  "details": {
    "status": "ok",
    "flaresolverr": true,
    "google": true,
    "message": null
  }
}
```
The crawler API will be available at `http://localhost:3001`.
### Testing the Crawler API
You can test the crawler API directly using curl:
```bash
curl -X POST http://localhost:3001/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "query": "typescript best practices",
    "numResults": 2,
    "language": "en",
    "filters": {
      "excludeDomains": ["youtube.com"],
      "resultType": "all"
    }
  }'
```
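A successful call returns a `CrawlResponse` as defined in `swagger.json`: the original query, an array of page results, and a top-level `error` field. A trimmed sketch of that shape (values are illustrative):

```json
{
  "query": "typescript best practices",
  "results": [
    {
      "url": "https://example.com/typescript-best-practices",
      "title": "TypeScript Best Practices",
      "excerpt": "A short overview of recommended TypeScript patterns...",
      "text": "Full extracted article text...",
      "siteName": "Example.com",
      "byline": "Jane Doe",
      "error": null
    }
  ],
  "error": null
}
```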
### Custom Configuration
You can customize the crawler service by modifying the environment variables in the `docker-compose.yml` file:
- `PORT`: The port on which the crawler API listens (default: 3001)
- `LOG_LEVEL`: Logging level (options: debug, info, warn, error)
- `FLARESOLVERR_URL`: URL of the FlareSolverr service (for bypassing Cloudflare protection)
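For example, to expose the API on host port 8080 with more verbose logging, you could adjust the `crawler` service like this (a sketch; keep the remaining settings from the compose file above):

```yaml
services:
  crawler:
    ports:
      - "8080:3001"   # host port 8080 -> container port 3001
    environment:
      - NODE_ENV=production
      - PORT=3001
      - LOG_LEVEL=debug
      - FLARESOLVERR_URL=http://flaresolverr:8191/v1
```

Remember to point the MCP server at the new address, e.g. `API_URL=http://localhost:8080`.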
## Integrating with MCP Clients
### Quick Reference: MCP Configuration
Here's a quick reference for MCP configuration across different clients:
```json
{
  "mcpServers": {
    "websearch": {
      "command": "npx",
      "args": ["websearch-mcp"],
      "env": {
        "API_URL": "http://localhost:3001",
        "MAX_SEARCH_RESULT": "5"
      }
    }
  }
}
```

Keep `MAX_SEARCH_RESULT` low to save tokens, or increase it for wider information gain.
Workaround for Windows, due to [this issue](https://github.com/smithery-ai/mcp-obsidian/issues/19):

```json
{
  "mcpServers": {
    "websearch": {
      "command": "cmd",
      "args": ["/c", "npx", "websearch-mcp"],
      "env": {
        "API_URL": "http://localhost:3001",
        "MAX_SEARCH_RESULT": "1"
      }
    }
  }
}
```
## Usage
This package implements an MCP server using stdio transport that exposes a `web_search` tool with the following parameters:
### Parameters
- `query` (required): The search query to look up
- `numResults` (optional): Number of results to return (default: 5)
- `language` (optional): Language code for search results (e.g., 'en')
- `region` (optional): Region code for search results (e.g., 'us')
- `excludeDomains` (optional): Domains to exclude from results
- `includeDomains` (optional): Only include these domains in results
- `excludeTerms` (optional): Terms to exclude from results
- `resultType` (optional): Type of results to return ('all', 'news', or 'blogs')
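Put together, a client invokes the tool with a payload like the following (illustrative values):

```json
{
  "name": "web_search",
  "arguments": {
    "query": "typescript best practices",
    "numResults": 3,
    "language": "en",
    "resultType": "blogs",
    "excludeDomains": ["youtube.com"]
  }
}
```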
### Example Search Response
Here's an example of a search response:
```json
{
  "query": "machine learning trends",
  "results": [
    {
      "title": "Top Machine Learning Trends in 2025",
      "snippet": "The key machine learning trends for 2025 include multimodal AI, generative models, and quantum machine learning applications in enterprise...",
      "url": "https://example.com/machine-learning-trends-2025",
      "siteName": "AI Research Today",
      "byline": "Dr. Jane Smith"
    },
    {
      "title": "The Evolution of Machine Learning: 2020-2025",
      "snippet": "Over the past five years, machine learning has evolved from primarily supervised learning approaches to more sophisticated self-supervised and reinforcement learning paradigms...",
      "url": "https://example.com/ml-evolution",
      "siteName": "Tech Insights",
      "byline": "John Doe"
    }
  ]
}
```
### Testing Locally
To test the WebSearch MCP server locally, you can use the included test client:
```bash
npm run test-client
```
This will start the MCP server and a simple command-line interface that allows you to enter search queries and see the results.
You can also configure the API_URL for the test client:
```bash
API_URL=https://crawler.example.com npm run test-client
```
### As a Library
You can use this package programmatically via the MCP TypeScript SDK client (a minimal sketch; the client name and version are arbitrary):

```typescript
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Spawn the server as a subprocess and connect over stdio
const transport = new StdioClientTransport({
  command: 'npx',
  args: ['websearch-mcp']
});

const client = new Client({ name: 'example-client', version: '1.0.0' });
await client.connect(transport);

// Execute a web search
const result = await client.callTool({
  name: 'web_search',
  arguments: {
    query: 'your search query',
    numResults: 5,
    language: 'en'
  }
});

console.log(result.content);
```
## Troubleshooting
### Crawler Service Issues
- **API Unreachable**: Ensure that the crawler service is running and accessible at the configured API_URL.
- **Search Results Not Available**: Check the logs of the crawler service to see if there are any errors:
```bash
docker-compose logs crawler
```
- **FlareSolverr Issues**: Some websites use Cloudflare protection. If you see errors related to this, check if FlareSolverr is working:
```bash
docker-compose logs flaresolverr
```
### MCP Server Issues
- **Import Errors**: Ensure you have the latest version of the MCP SDK:
```bash
npm install -g @modelcontextprotocol/sdk@latest
```
- **Connection Issues**: Make sure the stdio transport is properly configured for your client: the server reads JSON-RPC messages on stdin and writes responses on stdout, while logs go to stderr. A quick probe is shown below.
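You can smoke-test the server outside any client by piping a raw MCP `initialize` request into it (a one-line sketch; the exact `protocolVersion` is negotiated with the server):

```bash
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"probe","version":"0.0.0"}}}' | npx websearch-mcp
```

A JSON-RPC response on stdout confirms that the stdio transport is working.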
## Development
To work on this project:
1. Clone the repository
2. Install dependencies: `npm install`
3. Build the project: `npm run build`
4. Run in development mode: `npm run dev`
The server expects a WebSearch Crawler API as defined in the included swagger.json file. Make sure the API is running at the configured API_URL.
### Project Structure
- `.gitignore`: Specifies files that Git should ignore (node_modules, dist, logs, etc.)
- `.npmignore`: Specifies files that shouldn't be included when publishing to npm
- `package.json`: Project metadata and dependencies
- `src/`: Source TypeScript files
- `dist/`: Compiled JavaScript files (generated when building)
### Publishing to npm
To publish this package to npm:
1. Make sure you have an npm account and are logged in (`npm login`)
2. Update the version in package.json (`npm version patch|minor|major`)
3. Run `npm publish`
The `.npmignore` file ensures that only the necessary files are included in the published package:
- The compiled code in `dist/`
- README.md and LICENSE files
- package.json
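To double-check what will actually be uploaded before publishing, you can preview the package contents:

```bash
npm publish --dry-run
```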
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
ISC
```
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
```dockerfile
# Generated by https://smithery.ai. See: https://smithery.ai/docs/config#dockerfile
FROM node:lts-alpine
# Create app directory
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install dependencies without running scripts
RUN npm install --ignore-scripts
# Copy the rest of the application
COPY . ./
# Build the project
RUN npm run build
# Expose port if necessary (not strictly required for stdio MCP server)
# Command to run the server
CMD ["node", "dist/index.js"]
```
--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
```yaml
version: '3.8'

services:
  crawler:
    image: laituanmanh/websearch-crawler:latest
    container_name: websearch-api
    restart: unless-stopped
    ports:
      - "3001:3001"
    environment:
      - NODE_ENV=production
      - PORT=3001
      - LOG_LEVEL=info
      - FLARESOLVERR_URL=http://flaresolverr:8191/v1
    depends_on:
      - flaresolverr
    volumes:
      - crawler_storage:/app/storage

  flaresolverr:
    image: 21hsmw/flaresolverr:nodriver
    container_name: flaresolverr
    restart: unless-stopped
    environment:
      - LOG_LEVEL=info
      - TZ=UTC

volumes:
  crawler_storage:
```
--------------------------------------------------------------------------------
/.github/workflows/npm-publish.yml:
--------------------------------------------------------------------------------
```yaml
# This workflow will run tests using node and then publish a package to npm when a release is created
# For more information see: https://docs.github.com/en/actions/publishing-packages/publishing-nodejs-packages

name: Node.js Package

on:
  release:
    types: [published]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # - run: npm test

  publish-npm:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          registry-url: https://registry.npmjs.org/
      - run: npm ci
      - run: npm publish
        env:
          NODE_AUTH_TOKEN: ${{secrets.NPM_TOKEN}}
```
--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------
```yaml
# Smithery configuration file: https://smithery.ai/docs/config#smitheryyaml

startCommand:
  type: stdio
  configSchema:
    # JSON Schema defining the configuration options for the MCP.
    type: object
    required: []
    properties:
      apiUrl:
        type: string
        default: http://localhost:3001
        description: The URL of the WebSearch Crawler API.
      maxSearchResult:
        type: number
        default: 5
        description: Maximum number of search results to return.
  commandFunction:
    # A JS function that produces the CLI command based on the given config to start the MCP on stdio.
    |-
    (config) => ({
      command: 'node',
      args: ['dist/index.js'],
      env: {
        API_URL: config.apiUrl,
        MAX_SEARCH_RESULT: String(config.maxSearchResult)
      }
    })
  exampleConfig:
    apiUrl: http://localhost:3001
    maxSearchResult: 5
```
--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
```json
{
  "name": "websearch-mcp",
  "version": "1.0.2",
  "description": "A Model Context Protocol (MCP) server implementation that provides real-time web search capabilities through a simple API",
  "main": "dist/index.js",
  "bin": {
    "websearch-mcp": "./dist/index.js"
  },
  "scripts": {
    "build": "tsc",
    "start": "node dist/index.js",
    "dev": "ts-node src/index.ts",
    "test-client": "ts-node src/test-client.ts",
    "test": "echo \"Error: no test specified\" && exit 1",
    "prepare": "npm run build",
    "prepublishOnly": "npm run build"
  },
  "author": {
    "name": "Manh La",
    "url": "https://github.com/mnhlt"
  },
  "license": "ISC",
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.7.0",
    "axios": "^1.6.7",
    "zod": "^3.24.2"
  },
  "devDependencies": {
    "@types/node": "^20.10.5",
    "ts-node": "^10.9.2",
    "typescript": "^5.3.3"
  },
  "keywords": [
    "mcp",
    "websearch",
    "claude",
    "ai",
    "llm",
    "model-context-protocol",
    "search",
    "web-search",
    "crawler",
    "anthropic",
    "cursor"
  ],
  "files": [
    "dist/**/*",
    "README.md",
    "LICENSE"
  ],
  "repository": {
    "type": "git",
    "url": "https://github.com/mnhlt/WebSearch-MCP.git"
  },
  "homepage": "https://github.com/mnhlt/WebSearch-MCP",
  "bugs": {
    "url": "https://github.com/mnhlt/WebSearch-MCP/issues"
  },
  "engines": {
    "node": ">=16.0.0"
  }
}
```
--------------------------------------------------------------------------------
/src/test-client.ts:
--------------------------------------------------------------------------------
```typescript
#!/usr/bin/env node
import { spawn } from 'child_process';
import { createInterface } from 'readline';
import * as path from 'path';

interface MCPMessage {
  jsonrpc: string;
  id?: number;
  method?: string;
  params?: any;
  result?: any;
  error?: {
    code: number;
    message: string;
  };
}

async function main() {
  // Start the MCP server as a child process
  const serverProcess = spawn('ts-node', [path.join(__dirname, 'index.ts')], {
    stdio: ['pipe', 'pipe', process.stderr],
    env: {
      ...process.env,
      API_URL: process.env.API_URL || 'http://localhost:3001',
      MAX_SEARCH_RESULT: process.env.MAX_SEARCH_RESULT || '5'
    }
  });

  // Set up readline for nice formatting
  const readline = createInterface({
    input: process.stdin,
    output: process.stdout
  });

  let messageId = 1;
  let initialized = false;
  const initializeId = messageId++;

  // Handle server output
  serverProcess.stdout.on('data', (data) => {
    try {
      const message = JSON.parse(data.toString()) as MCPMessage;

      // Complete the handshake once the initialize response arrives
      if (!initialized && message.id === initializeId) {
        initialized = true;
        // Acknowledge the handshake so the server will accept tool calls
        serverProcess.stdin.write(
          JSON.stringify({ jsonrpc: '2.0', method: 'notifications/initialized' }) + '\n'
        );
        readline.prompt();
        return;
      }

      console.log('\nReceived from server:');
      console.log(JSON.stringify(message, null, 2));

      if (message.result) {
        try {
          const resultContent = JSON.parse(message.result.content[0].text);
          console.log('\nSearch Results:');
          console.log(JSON.stringify(resultContent, null, 2));
        } catch (e) {
          console.log('\nResult Content:');
          console.log(message.result.content[0].text);
        }
      }
      readline.prompt();
    } catch (error) {
      console.error('Error parsing server response:', error);
      console.log('Raw response:', data.toString());
      readline.prompt();
    }
  });

  // Initial prompt
  console.log('WebSearch MCP Test Client');
  console.log('------------------------');
  console.log('Type a search query and press Enter to search.');
  console.log('Type \'exit\' to quit.');
  console.log('');
  readline.setPrompt('Search> ');

  // MCP requires an initialize handshake before any tool calls
  const initializeRequest: MCPMessage = {
    jsonrpc: '2.0',
    id: initializeId,
    method: 'initialize',
    params: {
      protocolVersion: '2024-11-05',
      capabilities: {},
      clientInfo: { name: 'websearch-test-client', version: '1.0.0' }
    }
  };
  serverProcess.stdin.write(JSON.stringify(initializeRequest) + '\n');

  // Handle user input
  readline.on('line', (line) => {
    const input = line.trim();
    if (input.toLowerCase() === 'exit') {
      console.log('Exiting...');
      serverProcess.kill();
      process.exit(0);
    }

    // Create a tools/call request for the web_search tool
    const request: MCPMessage = {
      jsonrpc: '2.0',
      id: messageId++,
      method: 'tools/call',
      params: {
        name: 'web_search',
        arguments: {
          query: input,
          numResults: parseInt(process.env.MAX_SEARCH_RESULT || '3', 10)
        }
      }
    };

    console.log('\nSending request:');
    console.log(JSON.stringify(request, null, 2));

    // Send the request to the server
    serverProcess.stdin.write(JSON.stringify(request) + '\n');
  });

  // Handle exit
  readline.on('close', () => {
    console.log('Exiting...');
    serverProcess.kill();
    process.exit(0);
  });

  // Handle server exit
  serverProcess.on('exit', (code) => {
    console.log(`Server exited with code ${code}`);
    process.exit(code || 0);
  });
}

main().catch(error => {
  console.error('Error running test client:', error);
  process.exit(1);
});
```
--------------------------------------------------------------------------------
/src/index.ts:
--------------------------------------------------------------------------------
```typescript
#!/usr/bin/env node
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import axios from "axios";

// Configuration
const API_URL = process.env.API_URL || "http://localhost:3001";
const MAX_SEARCH_RESULT = parseInt(process.env.MAX_SEARCH_RESULT || "5", 10);

// Interface definitions based on swagger.json
interface CrawlRequest {
  query: string;
  numResults?: number;
  language?: string;
  region?: string;
  filters?: {
    excludeDomains?: string[];
    includeDomains?: string[];
    excludeTerms?: string[];
    resultType?: "all" | "news" | "blogs";
  };
}

interface CrawlResult {
  url: string;
  title: string;
  excerpt: string;
  text?: string;
  html?: string;
  siteName?: string;
  byline?: string;
  error?: string | null;
}

interface CrawlResponse {
  query: string;
  results: CrawlResult[];
  error: string | null;
}

// Main function to set up and run the MCP server
async function main() {
  // Create an MCP server
  const server = new McpServer({
    name: "WebSearch-MCP",
    version: "1.0.0",
  });

  // Add a web_search tool
  server.tool(
    "web_search",
    "Search the web for information.\n" +
      "Use this tool whenever you need up-to-date information from the web: " +
      "news, blogs, or details about a specific company, product, person, event, or location.\n" +
      "If a search with a small number of results fails, retry with a larger number.",
    {
      query: z.string().describe("The search query to look up"),
      numResults: z
        .number()
        .optional()
        .describe(`Number of results to return (default: ${MAX_SEARCH_RESULT})`),
      language: z
        .string()
        .optional()
        .describe("Language code for search results (e.g., 'en')"),
      region: z
        .string()
        .optional()
        .describe("Region code for search results (e.g., 'us')"),
      excludeDomains: z
        .array(z.string())
        .optional()
        .describe("Domains to exclude from results"),
      includeDomains: z
        .array(z.string())
        .optional()
        .describe("Only include these domains in results"),
      excludeTerms: z
        .array(z.string())
        .optional()
        .describe("Terms to exclude from results"),
      resultType: z
        .enum(["all", "news", "blogs"])
        .optional()
        .describe("Type of results to return"),
    },
    async (params) => {
      try {
        // Log to stderr: stdout is reserved for the MCP transport
        console.error(`Performing web search for: ${params.query}`);

        // Prepare request payload for crawler API
        const requestPayload: CrawlRequest = {
          query: params.query,
          numResults: params.numResults ?? MAX_SEARCH_RESULT,
          language: params.language,
          region: params.region,
          filters: {
            excludeDomains: params.excludeDomains,
            includeDomains: params.includeDomains,
            excludeTerms: params.excludeTerms,
            resultType: params.resultType,
          },
        };

        // Call the crawler API
        console.error(`Sending request to ${API_URL}/crawl`);
        const response = await axios.post<CrawlResponse>(
          `${API_URL}/crawl`,
          requestPayload
        );

        // Format the response for the MCP client
        const results = response.data.results.map((result) => ({
          title: result.title,
          snippet: result.excerpt,
          text: result.text,
          url: result.url,
          siteName: result.siteName || "",
          byline: result.byline || "",
        }));

        return {
          content: [
            {
              type: "text",
              text: JSON.stringify(
                {
                  query: response.data.query,
                  results: results,
                },
                null,
                2
              ),
            },
          ],
        };
      } catch (error) {
        console.error("Error performing web search:", error);
        if (axios.isAxiosError(error)) {
          const errorMessage = error.response?.data?.error || error.message;
          return {
            content: [{ type: "text", text: `Error: ${errorMessage}` }],
            isError: true,
          };
        }
        return {
          content: [
            {
              type: "text",
              text: `Error: ${
                error instanceof Error ? error.message : "Unknown error"
              }`,
            },
          ],
          isError: true,
        };
      }
    }
  );

  // Start receiving messages on stdin and sending messages on stdout
  console.error("Starting WebSearch MCP server...");
  console.error(`Using API_URL: ${API_URL}`);
  const transport = new StdioServerTransport();
  await server.connect(transport);
  console.error("WebSearch MCP server started");
}

// Start the server
main().catch((error) => {
  console.error("Failed to start WebSearch MCP server:", error);
  process.exit(1);
});
```
--------------------------------------------------------------------------------
/swagger.json:
--------------------------------------------------------------------------------
```json
{
  "openapi": "3.0.0",
  "info": {
    "title": "WebSearch API - Crawler Service",
    "version": "1.0.0",
    "description": "API documentation for the WebSearch API crawler service",
    "license": {
      "name": "ISC",
      "url": "https://opensource.org/licenses/ISC"
    },
    "contact": {
      "name": "WebSearch API Support",
      "url": "https://github.com/yourusername/WebSearchAPI",
      "email": "[email protected]"
    }
  },
  "servers": [
    { "url": "/", "description": "Development server" },
    { "url": "https://crawler.example.com", "description": "Production server" }
  ],
  "components": {
    "schemas": {
      "Error": {
        "type": "object",
        "properties": {
          "error": { "type": "string", "example": "Error message" }
        }
      },
      "CrawlRequest": {
        "type": "object",
        "required": ["query"],
        "properties": {
          "query": { "type": "string", "example": "artificial intelligence" },
          "numResults": {
            "type": "integer",
            "example": 5,
            "description": "Maximum number of results to return"
          },
          "language": {
            "type": "string",
            "example": "en",
            "description": "Language code for search results"
          },
          "region": {
            "type": "string",
            "example": "us",
            "description": "Region code for search results"
          },
          "filters": {
            "type": "object",
            "properties": {
              "excludeDomains": {
                "type": "array",
                "items": { "type": "string" },
                "example": ["youtube.com", "facebook.com"],
                "description": "Domains to exclude from results"
              },
              "includeDomains": {
                "type": "array",
                "items": { "type": "string" },
                "example": ["example.com", "blog.example.com"],
                "description": "Only include these domains in results"
              },
              "excludeTerms": {
                "type": "array",
                "items": { "type": "string" },
                "example": ["video", "course"],
                "description": "Terms to exclude from results"
              },
              "resultType": {
                "type": "string",
                "enum": ["all", "news", "blogs"],
                "example": "all",
                "description": "Type of results to return"
              }
            }
          }
        }
      },
      "CrawlResponse": {
        "type": "object",
        "properties": {
          "query": { "type": "string", "example": "artificial intelligence" },
          "results": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "url": { "type": "string", "example": "https://example.com/article" },
                "html": { "type": "string", "example": "<div>Article content...</div>" },
                "text": { "type": "string", "example": "Artificial intelligence (AI) is intelligence—perceiving..." },
                "title": { "type": "string", "example": "Understanding AI" },
                "excerpt": { "type": "string", "example": "A brief overview of artificial intelligence..." },
                "siteName": { "type": "string", "example": "Example.com" },
                "byline": { "type": "string", "example": "John Doe" },
                "error": { "type": "string", "nullable": true, "example": null }
              }
            }
          },
          "error": { "type": "string", "nullable": true, "example": null }
        }
      },
      "HealthCheckResponse": {
        "type": "object",
        "properties": {
          "status": {
            "type": "string",
            "enum": ["ok", "degraded", "down"],
            "example": "ok"
          },
          "details": {
            "type": "object",
            "properties": {
              "status": {
                "type": "string",
                "enum": ["ok", "degraded", "down"],
                "example": "ok"
              },
              "flaresolverr": { "type": "boolean", "example": true },
              "google": { "type": "boolean", "example": true },
              "message": { "type": "string", "nullable": true, "example": null }
            }
          }
        }
      },
      "QuotaResponse": {
        "type": "object",
        "properties": {
          "dailyQuota": { "type": "integer", "example": 1000 },
          "usedToday": { "type": "integer", "example": 150 },
          "remaining": { "type": "integer", "example": 850 },
          "resetTime": {
            "type": "string",
            "format": "date-time",
            "example": "2023-01-02T00:00:00Z"
          }
        }
      }
    }
  },
  "tags": [
    {
      "name": "Crawling",
      "description": "API endpoints for web crawling operations"
    },
    {
      "name": "Health",
      "description": "Health check and monitoring endpoints"
    }
  ],
  "paths": {
    "/crawl": {
      "post": {
        "summary": "Crawl web pages based on a search query",
        "description": "Searches the web for results matching the given query and returns the content of those pages",
        "tags": ["Crawling"],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "allOf": [
                  { "$ref": "#/components/schemas/CrawlRequest" },
                  {
                    "type": "object",
                    "properties": {
                      "query": { "type": "string", "description": "The search query to crawl for" },
                      "numResults": { "type": "integer", "description": "Number of results to return", "default": 5 },
                      "debug": { "type": "boolean", "description": "When true, include HTML content in the response", "default": true },
                      "language": { "type": "string", "description": "Language code for search results" },
                      "region": { "type": "string", "description": "Region code for search results" }
                    }
                  }
                ]
              }
            }
          }
        },
        "responses": {
          "200": {
            "description": "Successfully crawled web pages",
            "content": {
              "application/json": {
                "schema": { "$ref": "#/components/schemas/CrawlResponse" }
              }
            }
          },
          "400": {
            "description": "Missing query parameter",
            "content": {
              "application/json": {
                "schema": { "$ref": "#/components/schemas/Error" }
              }
            }
          },
          "500": {
            "description": "Server error",
            "content": {
              "application/json": {
                "schema": { "$ref": "#/components/schemas/Error" }
              }
            }
          }
        }
      }
    },
    "/health": {
      "get": {
        "summary": "Check crawler service health",
        "description": "Returns the health status of the crawler service and its dependencies",
        "tags": ["Health"],
        "responses": {
          "200": {
            "description": "Health status (OK or degraded)",
            "content": {
              "application/json": {
                "schema": { "$ref": "#/components/schemas/HealthCheckResponse" }
              }
            }
          },
          "503": {
            "description": "Service unavailable",
            "content": {
              "application/json": {
                "schema": { "$ref": "#/components/schemas/HealthCheckResponse" }
              }
            }
          }
        }
      }
    },
    "/quota": {
      "get": {
        "summary": "Get crawling quota status",
        "description": "Returns information about the current usage and limits of the crawling quota",
        "tags": ["Crawling"],
        "responses": {
          "200": {
            "description": "Quota information",
            "content": {
              "application/json": {
                "schema": { "$ref": "#/components/schemas/QuotaResponse" }
              }
            }
          },
          "500": {
            "description": "Server error",
            "content": {
              "application/json": {
                "schema": { "$ref": "#/components/schemas/Error" }
              }
            }
          }
        }
      }
    }
  }
}
```
--------------------------------------------------------------------------------
/tsconfig.json:
--------------------------------------------------------------------------------
```json
{
  "compilerOptions": {
    /* Visit https://aka.ms/tsconfig to read more about this file */

    /* Projects */
    // "incremental": true,                              /* Save .tsbuildinfo files to allow for incremental compilation of projects. */
    // "composite": true,                                /* Enable constraints that allow a TypeScript project to be used with project references. */
    // "tsBuildInfoFile": "./.tsbuildinfo",              /* Specify the path to .tsbuildinfo incremental compilation file. */
    // "disableSourceOfProjectReferenceRedirect": true,  /* Disable preferring source files instead of declaration files when referencing composite projects. */
    // "disableSolutionSearching": true,                 /* Opt a project out of multi-project reference checking when editing. */
    // "disableReferencedProjectLoad": true,             /* Reduce the number of projects loaded automatically by TypeScript. */

    /* Language and Environment */
    "target": "es2016",                                  /* Set the JavaScript language version for emitted JavaScript and include compatible library declarations. */
    // "lib": [],                                        /* Specify a set of bundled library declaration files that describe the target runtime environment. */
    // "jsx": "preserve",                                /* Specify what JSX code is generated. */
    // "libReplacement": true,                           /* Enable lib replacement. */
    // "experimentalDecorators": true,                   /* Enable experimental support for legacy experimental decorators. */
    // "emitDecoratorMetadata": true,                    /* Emit design-type metadata for decorated declarations in source files. */
    // "jsxFactory": "",                                 /* Specify the JSX factory function used when targeting React JSX emit, e.g. 'React.createElement' or 'h'. */
    // "jsxFragmentFactory": "",                         /* Specify the JSX Fragment reference used for fragments when targeting React JSX emit e.g. 'React.Fragment' or 'Fragment'. */
    // "jsxImportSource": "",                            /* Specify module specifier used to import the JSX factory functions when using 'jsx: react-jsx*'. */
    // "reactNamespace": "",                             /* Specify the object invoked for 'createElement'. This only applies when targeting 'react' JSX emit. */
    // "noLib": true,                                    /* Disable including any library files, including the default lib.d.ts. */
    // "useDefineForClassFields": true,                  /* Emit ECMAScript-standard-compliant class fields. */
    // "moduleDetection": "auto",                        /* Control what method is used to detect module-format JS files. */

    /* Modules */
    "module": "commonjs",                                /* Specify what module code is generated. */
    "outDir": "./dist",
    "rootDir": "./src",
    // "moduleResolution": "node10",                     /* Specify how TypeScript looks up a file from a given module specifier. */
    // "baseUrl": "./",                                  /* Specify the base directory to resolve non-relative module names. */
    // "paths": {},                                      /* Specify a set of entries that re-map imports to additional lookup locations. */
    // "rootDirs": [],                                   /* Allow multiple folders to be treated as one when resolving modules. */
    // "typeRoots": [],                                  /* Specify multiple folders that act like './node_modules/@types'. */
    // "types": [],                                      /* Specify type package names to be included without being referenced in a source file. */
    // "allowUmdGlobalAccess": true,                     /* Allow accessing UMD globals from modules. */
    // "moduleSuffixes": [],                             /* List of file name suffixes to search when resolving a module. */
    // "allowImportingTsExtensions": true,               /* Allow imports to include TypeScript file extensions. Requires '--moduleResolution bundler' and either '--noEmit' or '--emitDeclarationOnly' to be set. */
    // "rewriteRelativeImportExtensions": true,          /* Rewrite '.ts', '.tsx', '.mts', and '.cts' file extensions in relative import paths to their JavaScript equivalent in output files. */
    // "resolvePackageJsonExports": true,                /* Use the package.json 'exports' field when resolving package imports. */
    // "resolvePackageJsonImports": true,                /* Use the package.json 'imports' field when resolving imports. */
    // "customConditions": [],                           /* Conditions to set in addition to the resolver-specific defaults when resolving imports. */
    // "noUncheckedSideEffectImports": true,             /* Check side effect imports. */
    // "resolveJsonModule": true,                        /* Enable importing .json files. */
    // "allowArbitraryExtensions": true,                 /* Enable importing files with any extension, provided a declaration file is present. */
    // "noResolve": true,                                /* Disallow 'import's, 'require's or '<reference>'s from expanding the number of files TypeScript should add to a project. */

    /* JavaScript Support */
    // "allowJs": true,                                  /* Allow JavaScript files to be a part of your program. Use the 'checkJS' option to get errors from these files. */
    // "checkJs": true,                                  /* Enable error reporting in type-checked JavaScript files. */
    // "maxNodeModuleJsDepth": 1,                        /* Specify the maximum folder depth used for checking JavaScript files from 'node_modules'. Only applicable with 'allowJs'. */

    /* Emit */
    // "declaration": true,                              /* Generate .d.ts files from TypeScript and JavaScript files in your project. */
    // "declarationMap": true,                           /* Create sourcemaps for d.ts files. */
    // "emitDeclarationOnly": true,                      /* Only output d.ts files and not JavaScript files. */
    // "sourceMap": true,                                /* Create source map files for emitted JavaScript files. */
    // "inlineSourceMap": true,                          /* Include sourcemap files inside the emitted JavaScript. */
    // "noEmit": true,                                   /* Disable emitting files from a compilation. */
    // "outFile": "./",                                  /* Specify a file that bundles all outputs into one JavaScript file. If 'declaration' is true, also designates a file that bundles all .d.ts output. */
    // "removeComments": true,                           /* Disable emitting comments. */
    // "importHelpers": true,                            /* Allow importing helper functions from tslib once per project, instead of including them per-file. */
    // "downlevelIteration": true,                       /* Emit more compliant, but verbose and less performant JavaScript for iteration. */
    // "sourceRoot": "",                                 /* Specify the root path for debuggers to find the reference source code. */
    // "mapRoot": "",                                    /* Specify the location where debugger should locate map files instead of generated locations. */
    // "inlineSources": true,                            /* Include source code in the sourcemaps inside the emitted JavaScript. */
    // "emitBOM": true,                                  /* Emit a UTF-8 Byte Order Mark (BOM) in the beginning of output files. */
    // "newLine": "crlf",                                /* Set the newline character for emitting files. */
    // "stripInternal": true,                            /* Disable emitting declarations that have '@internal' in their JSDoc comments. */
    // "noEmitHelpers": true,                            /* Disable generating custom helper functions like '__extends' in compiled output. */
    // "noEmitOnError": true,                            /* Disable emitting files if any type checking errors are reported. */
    // "preserveConstEnums": true,                       /* Disable erasing 'const enum' declarations in generated code. */
    // "declarationDir": "./",                           /* Specify the output directory for generated declaration files. */

    /* Interop Constraints */
    // "isolatedModules": true,                          /* Ensure that each file can be safely transpiled without relying on other imports. */
    // "verbatimModuleSyntax": true,                     /* Do not transform or elide any imports or exports not marked as type-only, ensuring they are written in the output file's format based on the 'module' setting. */
    // "isolatedDeclarations": true,                     /* Require sufficient annotation on exports so other tools can trivially generate declaration files. */
    // "erasableSyntaxOnly": true,                       /* Do not allow runtime constructs that are not part of ECMAScript. */
    // "allowSyntheticDefaultImports": true,             /* Allow 'import x from y' when a module doesn't have a default export. */
    "esModuleInterop": true,                             /* Emit additional JavaScript to ease support for importing CommonJS modules. This enables 'allowSyntheticDefaultImports' for type compatibility. */
    // "preserveSymlinks": true,                         /* Disable resolving symlinks to their realpath. This correlates to the same flag in node. */
    "forceConsistentCasingInFileNames": true,            /* Ensure that casing is correct in imports. */

    /* Type Checking */
    "strict": true,                                      /* Enable all strict type-checking options. */
    // "noImplicitAny": true,                            /* Enable error reporting for expressions and declarations with an implied 'any' type. */
    // "strictNullChecks": true,                         /* When type checking, take into account 'null' and 'undefined'. */
    // "strictFunctionTypes": true,                      /* When assigning functions, check to ensure parameters and the return values are subtype-compatible. */
    // "strictBindCallApply": true,                      /* Check that the arguments for 'bind', 'call', and 'apply' methods match the original function. */
    // "strictPropertyInitialization": true,             /* Check for class properties that are declared but not set in the constructor. */
    // "strictBuiltinIteratorReturn": true,              /* Built-in iterators are instantiated with a 'TReturn' type of 'undefined' instead of 'any'. */
    // "noImplicitThis": true,                           /* Enable error reporting when 'this' is given the type 'any'. */
    // "useUnknownInCatchVariables": true,               /* Default catch clause variables as 'unknown' instead of 'any'. */
    // "alwaysStrict": true,                             /* Ensure 'use strict' is always emitted. */
    // "noUnusedLocals": true,                           /* Enable error reporting when local variables aren't read. */
    // "noUnusedParameters": true,                       /* Raise an error when a function parameter isn't read. */
    // "exactOptionalPropertyTypes": true,               /* Interpret optional property types as written, rather than adding 'undefined'. */
    // "noImplicitReturns": true,                        /* Enable error reporting for codepaths that do not explicitly return in a function. */
    // "noFallthroughCasesInSwitch": true,               /* Enable error reporting for fallthrough cases in switch statements. */
    // "noUncheckedIndexedAccess": true,                 /* Add 'undefined' to a type when accessed using an index. */
    // "noImplicitOverride": true,                       /* Ensure overriding members in derived classes are marked with an override modifier. */
    // "noPropertyAccessFromIndexSignature": true,       /* Enforces using indexed accessors for keys declared using an indexed type. */
    // "allowUnusedLabels": true,                        /* Disable error reporting for unused labels. */
    // "allowUnreachableCode": true,                     /* Disable error reporting for unreachable code. */

    /* Completeness */
    // "skipDefaultLibCheck": true,                      /* Skip type checking .d.ts files that are included with TypeScript. */
    "skipLibCheck": true                                 /* Skip type checking all .d.ts files. */
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules", "dist"]
}
```