thedaviddias/mcp-llms-txt-explorer # codebase.md

# Directory Structure

```
├── .github
│   ├── actions
│   │   └── install
│   │       └── action.yml
│   ├── CODE_OF_CONDUCT.md
│   ├── CODEOWNERS
│   ├── CONTRIBUTING.md
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE
│   │   ├── BUG_REPORT.yml
│   │   ├── config.yml
│   │   └── FEATURE_REQUEST.yml
│   ├── labeler.yml
│   ├── SECURITY.md
│   └── workflows
│       └── release.yml
├── .gitignore
├── .nvmrc
├── Dockerfile
├── LICENSE
├── package.json
├── pnpm-lock.yaml
├── README.md
├── smithery.yaml
├── src
│   └── index.ts
└── tsconfig.json
```

# Files

--------------------------------------------------------------------------------
/.nvmrc:
--------------------------------------------------------------------------------

```
1 | 22.14.0
2 | 
```

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------

```
1 | node_modules/
2 | build/
3 | *.log
4 | .env*
```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

```markdown
  1 | # MCP LLMS.txt Explorer
  2 | 
  3 | <a href="https://glama.ai/mcp/servers/lhyj3pva0z">
  4 |   <img width="380" height="200" src="https://glama.ai/mcp/servers/lhyj3pva0z/badge" alt="LLMS.txt Explorer MCP server" />
  5 | </a>
  6 | 
  7 | [![smithery badge](https://smithery.ai/badge/@thedaviddias/mcp-llms-txt-explorer)](https://smithery.ai/server/@thedaviddias/mcp-llms-txt-explorer)
  8 | 
  9 | A Model Context Protocol server for exploring websites with llms.txt files. This server helps you discover and analyze websites that implement the llms.txt standard.
 10 | 
 11 | ## Features
 12 | 
 13 | ### Resources
 14 | - Check websites for llms.txt and llms-full.txt files
 15 | - Parse and validate llms.txt file contents
 16 | - Access structured data about compliant websites
 17 | 
 18 | ### Tools
 19 | - `check_website` - Check if a website has llms.txt files
 20 |   - Takes a domain or URL as input
 21 |   - Returns file locations and validation status
 22 | - `list_websites` - List known websites with llms.txt files
 23 |   - Returns structured data about compliant websites
 24 |   - Supports filtering by file type (llms.txt/llms-full.txt)
 25 | 
 26 | ## Development
 27 | 
 28 | Install dependencies:
 29 | ```bash
 30 | pnpm install
 31 | ```
 32 | 
 33 | Build the server:
 34 | ```bash
 35 | pnpm run build
 36 | ```
 37 | 
 38 | For development with auto-rebuild:
 39 | ```bash
 40 | pnpm run watch
 41 | ```
 42 | 
 43 | ## Installation
 44 | 
 45 | ### Installing via Smithery
 46 | 
 47 | To install mcp-llms-txt-explorer for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@thedaviddias/mcp-llms-txt-explorer):
 48 | 
 49 | ```bash
 50 | npx -y @smithery/cli install @thedaviddias/mcp-llms-txt-explorer --client claude
 51 | ```
 52 | 
 53 | ### Installing Manually
 54 | To build the server from source:
 55 | 
 56 | ```bash
 57 | # Clone the repository
 58 | git clone https://github.com/thedaviddias/mcp-llms-txt-explorer.git
 59 | cd mcp-llms-txt-explorer
 60 | 
 61 | # Install dependencies
 62 | pnpm install
 63 | 
 64 | # Build the server
 65 | pnpm run build
 66 | ```
 67 | 
 68 | ### Configuration with Claude Desktop
 69 | 
 70 | To use with Claude Desktop, add the server config:
 71 | 
 72 | - On macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
 73 | - On Windows: `%APPDATA%/Claude/claude_desktop_config.json`
 74 | 
 75 | ```json
 76 | {
 77 |   "mcpServers": {
 78 |     "llms-txt-explorer": {
 79 |       "command": "node",
 80 |       "args": ["/path/to/mcp-llms-txt-explorer/build/index.js"]
 81 |     }
 82 |   }
 83 | }
 84 | ```
 85 | 
 86 | To run the published npm package with npx instead:
 87 | ```json
 88 | {
 89 |   "mcpServers": {
 90 |     "llms-txt-explorer": {
 91 |       "command": "npx",
 92 |       "args": ["-y", "@thedaviddias/mcp-llms-txt-explorer"]
 93 |     }
 94 |   }
 95 | }
 96 | ```
 97 | 
 98 | ### Debugging
 99 | 
100 | Since MCP servers communicate over stdio, debugging can be challenging. We recommend using the [MCP Inspector](https://github.com/modelcontextprotocol/inspector), which is available as a package script:
101 | 
102 | ```bash
103 | pnpm run inspector
104 | ```
105 | 
106 | The Inspector will provide a URL to access debugging tools in your browser.
107 | 
108 | ## License
109 | 
110 | This project is licensed under the MIT License. See the LICENSE file for details.
```
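
The `check_website` tool accepts either a bare domain or a full URL. In `src/index.ts`, `checkWebsite` prepends `https://` when no protocol is given, parses the result with `URL`, and probes `${origin}/llms.txt`. The sketch below reproduces that normalization as a standalone function. The name `resolveLlmsTxtUrls` is hypothetical, and placing `llms-full.txt` at the same origin is an assumption, since that part of the source is not shown here.

```typescript
// Hypothetical helper that mirrors the URL handling in checkWebsite (src/index.ts).
// Assumption: llms-full.txt is probed at the site root, next to llms.txt.
interface LlmsTxtCandidates {
  origin: string;
  llmsTxtUrl: string;
  llmsFullTxtUrl: string;
}

function resolveLlmsTxtUrls(input: string): LlmsTxtCandidates {
  // Default to https:// when the caller passes a bare domain such as "supabase.com".
  const withProtocol =
    input.startsWith("http://") || input.startsWith("https://")
      ? input
      : `https://${input}`;

  let url: URL;
  try {
    url = new URL(withProtocol);
  } catch {
    throw new Error(`Invalid URL format: ${input}`);
  }

  // Only the origin (scheme + host + port) is kept; any path in the input is dropped.
  const { origin } = url;
  return {
    origin,
    llmsTxtUrl: `${origin}/llms.txt`,
    llmsFullTxtUrl: `${origin}/llms-full.txt`,
  };
}

console.log(resolveLlmsTxtUrls("supabase.com").llmsTxtUrl);
// https://supabase.com/llms.txt
console.log(resolveLlmsTxtUrls("https://supabase.com/docs/guides").llmsFullTxtUrl);
// https://supabase.com/llms-full.txt
```

Because `origin` is normalized, it would also make a more stable cache key than the raw input: `websiteCheckCache` in `src/index.ts` is keyed by `domain` exactly as passed in, so `supabase.com` and `https://supabase.com` are checked and cached separately.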

--------------------------------------------------------------------------------
/.github/SECURITY.md:
--------------------------------------------------------------------------------

```markdown
1 | # Security Policy
2 | 
3 | ## Reporting a Vulnerability
4 | 
5 | If you believe you have found a security vulnerability in this project, please disclose it responsibly rather than opening a public issue. Report it through the [GitHub Security Advisory](https://github.com/thedaviddias/mcp-llms-txt-explorer/security/advisories/new) tool so the details stay confidential.
6 | 
7 | We'll review it as soon as possible and publish a fix accordingly.
8 | 
```

--------------------------------------------------------------------------------
/.github/CONTRIBUTING.md:
--------------------------------------------------------------------------------

```markdown
 1 | # Contributing to This Project
 2 | 
 3 | Thank you for your interest in contributing! This document outlines the process for contributing to our project.
 4 | 
 5 | ## Getting Started
 6 | 
 7 | There are two ways to contribute a new website to our list:
 8 | 
 9 | ### Option 1: Web Interface (Recommended)
10 | 
11 | The easiest way to contribute is through our web interface:
12 | 
13 | 1. Visit our website
14 | 2. Log in with your GitHub account
15 | 3. Submit your website through our user-friendly form
16 | 4. Your submission will be automatically validated and processed
17 | 
18 | ### Option 2: Manual Pull Request
19 | 
20 | If you prefer to contribute directly through GitHub:
21 | 
22 | 1. Fork the repository
23 | 2. Create a new branch for your addition: `git checkout -b add/your-website-name`
24 | 3. Create a new MDX file in the content/websites directory
25 | 4. Add your website information following our template format
26 | 5. Test your changes thoroughly
27 | 6. Commit your changes with clear, descriptive commit messages
28 | 7. Push to your fork
29 | 8. Submit a Pull Request
30 | 
31 | ## Pull Request Guidelines
32 | 
33 | - Ensure your PR addresses a specific issue or adds value to the project
34 | - Include a clear description of the changes
35 | - Keep changes focused and atomic
36 | - Follow existing code style and conventions
37 | - Include tests if applicable
38 | - Update documentation as needed
39 | 
40 | ## Code Style
41 | 
42 | - Follow the existing code formatting in the project (ensure you have Biome installed)
43 | - Write clear, self-documenting code
44 | - Add comments only when necessary to explain complex logic
45 | - Use meaningful variable and function names
46 | 
47 | ## Reporting Issues
48 | 
49 | - Use the GitHub issue tracker
50 | - Check if the issue already exists before creating a new one
51 | - Provide a clear description of the issue
52 | - Include steps to reproduce if applicable
53 | - Add relevant labels
54 | 
55 | ## Questions or Need Help?
56 | 
57 | Feel free to open an issue for questions or join our discussions. We're here to help!
58 | 
59 | ## Code of Conduct
60 | 
61 | Please note that this project follows a Code of Conduct. By participating, you are expected to uphold this code.
62 | 
63 | Thank you for contributing!
64 | 
```

--------------------------------------------------------------------------------
/.github/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------

```markdown
  1 | # Contributor Covenant Code of Conduct
  2 | 
  3 | ## Our Pledge
  4 | 
  5 | We as members, contributors, and leaders pledge to make participation in our
  6 | community a harassment-free experience for everyone, regardless of age, body
  7 | size, visible or invisible disability, ethnicity, sex characteristics, gender
  8 | identity and expression, level of experience, education, socio-economic status,
  9 | nationality, personal appearance, race, religion, or sexual identity
 10 | and orientation.
 11 | 
 12 | We pledge to act and interact in ways that contribute to an open, welcoming,
 13 | diverse, inclusive, and healthy community.
 14 | 
 15 | ## Our Standards
 16 | 
 17 | Examples of behavior that contributes to a positive environment for our
 18 | community include:
 19 | 
 20 | * Demonstrating empathy and kindness toward other people
 21 | * Being respectful of differing opinions, viewpoints, and experiences
 22 | * Giving and gracefully accepting constructive feedback
 23 | * Accepting responsibility and apologizing to those affected by our mistakes,
 24 |   and learning from the experience
 25 | * Focusing on what is best not just for us as individuals, but for the
 26 |   overall community
 27 | 
 28 | Examples of unacceptable behavior include:
 29 | 
 30 | * The use of sexualized language or imagery, and sexual attention or
 31 |   advances of any kind
 32 | * Trolling, insulting or derogatory comments, and personal or political attacks
 33 | * Public or private harassment
 34 | * Publishing others' private information, such as a physical or email
 35 |   address, without their explicit permission
 36 | * Other conduct which could reasonably be considered inappropriate in a
 37 |   professional setting
 38 | 
 39 | ## Enforcement Responsibilities
 40 | 
 41 | Community leaders are responsible for clarifying and enforcing our standards of
 42 | acceptable behavior and will take appropriate and fair corrective action in
 43 | response to any behavior that they deem inappropriate, threatening, offensive,
 44 | or harmful.
 45 | 
 46 | Community leaders have the right and responsibility to remove, edit, or reject
 47 | comments, commits, code, wiki edits, issues, and other contributions that are
 48 | not aligned to this Code of Conduct, and will communicate reasons for moderation
 49 | decisions when appropriate.
 50 | 
 51 | ## Scope
 52 | 
 53 | This Code of Conduct applies within all community spaces, and also applies when
 54 | an individual is officially representing the community in public spaces.
 55 | Examples of representing our community include using an official e-mail address,
 56 | posting via an official social media account, or acting as an appointed
 57 | representative at an online or offline event.
 58 | 
 59 | ## Enforcement
 60 | 
 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be
 62 | reported to the community leaders responsible for enforcement at
 63 | haydenbleasel.com/contact.
 64 | All complaints will be reviewed and investigated promptly and fairly.
 65 | 
 66 | All community leaders are obligated to respect the privacy and security of the
 67 | reporter of any incident.
 68 | 
 69 | ## Enforcement Guidelines
 70 | 
 71 | Community leaders will follow these Community Impact Guidelines in determining
 72 | the consequences for any action they deem in violation of this Code of Conduct:
 73 | 
 74 | ### 1. Correction
 75 | 
 76 | **Community Impact**: Use of inappropriate language or other behavior deemed
 77 | unprofessional or unwelcome in the community.
 78 | 
 79 | **Consequence**: A private, written warning from community leaders, providing
 80 | clarity around the nature of the violation and an explanation of why the
 81 | behavior was inappropriate. A public apology may be requested.
 82 | 
 83 | ### 2. Warning
 84 | 
 85 | **Community Impact**: A violation through a single incident or series
 86 | of actions.
 87 | 
 88 | **Consequence**: A warning with consequences for continued behavior. No
 89 | interaction with the people involved, including unsolicited interaction with
 90 | those enforcing the Code of Conduct, for a specified period of time. This
 91 | includes avoiding interactions in community spaces as well as external channels
 92 | like social media. Violating these terms may lead to a temporary or
 93 | permanent ban.
 94 | 
 95 | ### 3. Temporary Ban
 96 | 
 97 | **Community Impact**: A serious violation of community standards, including
 98 | sustained inappropriate behavior.
 99 | 
100 | **Consequence**: A temporary ban from any sort of interaction or public
101 | communication with the community for a specified period of time. No public or
102 | private interaction with the people involved, including unsolicited interaction
103 | with those enforcing the Code of Conduct, is allowed during this period.
104 | Violating these terms may lead to a permanent ban.
105 | 
106 | ### 4. Permanent Ban
107 | 
108 | **Community Impact**: Demonstrating a pattern of violation of community
109 | standards, including sustained inappropriate behavior, harassment of an
110 | individual, or aggression toward or disparagement of classes of individuals.
111 | 
112 | **Consequence**: A permanent ban from any sort of public interaction within
113 | the community.
114 | 
115 | ## Attribution
116 | 
117 | This Code of Conduct is adapted from the [Contributor Covenant][homepage],
118 | version 2.0, available at
119 | https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
120 | 
121 | Community Impact Guidelines were inspired by [Mozilla's code of conduct
122 | enforcement ladder](https://github.com/mozilla/diversity).
123 | 
124 | [homepage]: https://www.contributor-covenant.org
125 | 
126 | For answers to common questions about this code of conduct, see the FAQ at
127 | https://www.contributor-covenant.org/faq. Translations are available at
128 | https://www.contributor-covenant.org/translations.
129 | 
```

--------------------------------------------------------------------------------
/.github/labeler.yml:
--------------------------------------------------------------------------------

```yaml
1 | 
```

--------------------------------------------------------------------------------
/.github/FUNDING.yml:
--------------------------------------------------------------------------------

```yaml
1 | github: thedaviddias
2 | custom: ["https://thanks.dev/u/thedaviddias"]
3 | 
```

--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/config.yml:
--------------------------------------------------------------------------------

```yaml
1 | blank_issues_enabled: false
2 | contact_links:
3 |   - name: Questions & Discussions
4 |     url: https://github.com/thedaviddias/ai-templates/discussions
5 |     about: Please ask and answer questions here
6 | 
```

--------------------------------------------------------------------------------
/tsconfig.json:
--------------------------------------------------------------------------------

```json
 1 | {
 2 |   "compilerOptions": {
 3 |     "target": "ES2022",
 4 |     "module": "NodeNext",
 5 |     "moduleResolution": "NodeNext",
 6 |     "outDir": "./build",
 7 |     "rootDir": "./src",
 8 |     "strict": true,
 9 |     "esModuleInterop": true,
10 |     "skipLibCheck": true,
11 |     "forceConsistentCasingInFileNames": true
12 |   },
13 |   "include": ["src/**/*"],
14 |   "exclude": ["node_modules"]
15 | }
16 | 
```

--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------

```yaml
 1 | # Smithery configuration file: https://smithery.ai/docs/config#smitheryyaml
 2 | 
 3 | startCommand:
 4 |   type: stdio
 5 |   configSchema:
 6 |     # JSON Schema defining the configuration options for the MCP.
 7 |     type: object
 8 |     properties: {}
 9 |   commandFunction:
10 |     # A JS function that produces the CLI command based on the given config to start the MCP on stdio.
11 |     |-
12 |     (config) => ({
13 |       command: 'node',
14 |       args: ['build/index.js'],
15 |       env: {}
16 |     })
17 |   exampleConfig: {}
18 | 
```
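
Smithery calls `commandFunction` with the user's configuration, whose shape `configSchema` describes (no properties here), and uses the returned object to start the server on stdio. A typed sketch of the same function follows; the `SmitheryConfig` and `StdioCommand` names are illustrative and are not taken from Smithery's own types.

```typescript
// Illustrative types only; they are not imported from a Smithery package.
type SmitheryConfig = Record<string, never>; // configSchema declares no properties

interface StdioCommand {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Same mapping as commandFunction in smithery.yaml: the config is unused,
// and the compiled entry point is started with node over stdio.
const commandFunction = (_config: SmitheryConfig): StdioCommand => ({
  command: "node",
  args: ["build/index.js"],
  env: {},
});

const spec = commandFunction({});
console.log(`${spec.command} ${spec.args.join(" ")}`); // node build/index.js
```

Because the function ignores its argument, every user gets the same `node build/index.js` command, which matches the `CMD` in the Dockerfile.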

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------

```dockerfile
 1 | # Generated by https://smithery.ai. See: https://smithery.ai/docs/config#dockerfile
 2 | FROM node:lts-alpine
 3 | 
 4 | # Create app directory
 5 | WORKDIR /app
 6 | 
 7 | # Install pnpm
 8 | RUN npm install -g pnpm
 9 | 
10 | # Copy package files
11 | COPY package.json pnpm-lock.yaml ./
12 | 
13 | # Install dependencies without running prepare scripts (we run build explicitly)
14 | RUN pnpm install --frozen-lockfile --ignore-scripts
15 | 
16 | # Copy rest of the source
17 | COPY . .
18 | 
19 | # Build the project
20 | RUN pnpm run build
21 | 
22 | # No port needs to be exposed: MCP communicates over stdio
23 | 
24 | # Set the entry point
25 | CMD [ "node", "build/index.js" ]
26 | 
```

--------------------------------------------------------------------------------
/.github/actions/install/action.yml:
--------------------------------------------------------------------------------

```yaml
 1 | name: Setup project dependencies
 2 | 
 3 | description: "Setup project dependencies"
 4 | 
 5 | runs:
 6 |   using: "composite"
 7 |   steps:
 8 |     - name: Install pnpm
 9 |       uses: pnpm/[email protected]
10 |       id: pnpm-install
11 |       with:
12 |         run_install: false
13 | 
14 |     - name: Setup Node.js
15 |       uses: actions/setup-node@v4
16 |       with:
17 |         node-version-file: ".nvmrc"
18 |         registry-url: "https://registry.npmjs.org"
19 |         cache: "pnpm"
20 | 
21 |     - name: Cache dependencies
22 |       id: cache_dependencies
23 |       uses: actions/cache@v4
24 |       with:
25 |         path: |
26 |           ~/.pnpm
27 |           ${{ github.workspace }}/.next/cache
28 |         key: ${{ runner.os }}-nextjs-${{ hashFiles('**/pnpm-lock.yaml') }}-${{ hashFiles('**/*.js', '**/*.jsx', '**/*.ts', '**/*.tsx') }}
29 |         restore-keys: |
30 |           ${{ runner.os }}-nextjs-${{ hashFiles('**/pnpm-lock.yaml') }}-
31 | 
32 |     - name: Install dependencies
33 |       shell: bash
34 |       # if: steps.cache_dependencies.outputs.cache-hit != 'true'
35 |       run: pnpm install
36 | 
```

--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/BUG_REPORT.yml:
--------------------------------------------------------------------------------

```yaml
 1 | name: Bug Report
 2 | description: Report a bug to help us improve
 3 | title: "[BUG] "
 4 | labels: ["bug"]
 5 | body:
 6 |   - type: markdown
 7 |     attributes:
 8 |       value: |
 9 |         Thanks for taking the time to report this bug!
10 | 
11 |   - type: textarea
12 |     id: description
13 |     attributes:
14 |       label: What happened?
15 |       description: A clear and concise description of the bug
16 |       placeholder: When I click X, Y happens instead of Z
17 |     validations:
18 |       required: true
19 | 
20 |   - type: textarea
21 |     id: reproduction
22 |     attributes:
23 |       label: Steps to reproduce
24 |       description: How can we reproduce this issue?
25 |       placeholder: |
26 |         1. Go to '...'
27 |         2. Click on '...'
28 |         3. See error
29 |     validations:
30 |       required: true
31 | 
32 |   - type: dropdown
33 |     id: environment
34 |     attributes:
35 |       label: Environment
36 |       description: Where does this occur?
37 |       options:
38 |         - Development
39 |         - Staging
40 |         - Production
41 |     validations:
42 |       required: true
43 | 
44 |   - type: textarea
45 |     id: additional
46 |     attributes:
47 |       label: Additional context
48 |       description: Add any other context, screenshots, or error messages
49 | 
```

--------------------------------------------------------------------------------
/.github/workflows/release.yml:
--------------------------------------------------------------------------------

```yaml
 1 | name: Release to npm
 2 | 
 3 | on:
 4 |   workflow_dispatch:
 5 | 
 6 | permissions:
 7 |   contents: write
 8 | 
 9 | jobs:
10 |   release:
11 |     runs-on: ubuntu-latest
12 |     steps:
13 |       - name: Checkout
14 |         uses: actions/checkout@v4
15 |         with:
16 |           fetch-depth: 0
17 | 
18 |       - name: Install
19 |         uses: ./.github/actions/install
20 |         with:
21 |           registry-url: 'https://registry.npmjs.org'
22 |           scope: '@thedaviddias'
23 | 
24 |       - name: Setup npm auth
25 |         run: |
26 |           echo "//registry.npmjs.org/:_authToken=${{ secrets.NPM_TOKEN }}" > ~/.npmrc
27 |           echo "@thedaviddias:registry=https://registry.npmjs.org/" >> ~/.npmrc
28 |           npm whoami || echo "Not logged in to npm"
29 | 
30 |       - name: Build
31 |         run: pnpm build
32 | 
33 |       - name: Publish
34 |         run: |
35 |           echo "Attempting to publish with npm..."
36 |           npm publish --access public || {
37 |             echo "npm publish failed, trying with pnpm..."
38 |             pnpm publish --no-git-checks --access public
39 |           }
40 |         env:
41 |           NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
42 |           NPM_CONFIG_REGISTRY: 'https://registry.npmjs.org'
43 |           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
44 | 
```

--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------

```json
 1 | {
 2 |   "name": "@thedaviddias/mcp-llms-txt-explorer",
 3 |   "version": "0.2.0",
 4 |   "description": "A Model Context Protocol server for exploring websites with llms.txt files",
 5 |   "type": "module",
 6 |   "exports": "./build/index.js",
 7 |   "bin": {
 8 |     "mcp-llms-txt-explorer": "./build/index.js"
 9 |   },
10 |   "files": [
11 |     "build"
12 |   ],
13 |   "scripts": {
14 |     "build": "tsc && node -e \"require('fs').chmodSync('build/index.js', '755')\"",
15 |     "prepare": "npm run build",
16 |     "watch": "tsc --watch",
17 |     "inspector": "npx @modelcontextprotocol/inspector build/index.js",
18 |     "test": "tsc --noEmit",
19 |     "prepublishOnly": "pnpm run build"
20 |   },
21 |   "dependencies": {
22 |     "@modelcontextprotocol/sdk": "0.6.0",
23 |     "@types/node-fetch": "^2.6.12",
24 |     "node-fetch": "^2.7.0"
25 |   },
26 |   "devDependencies": {
27 |     "@types/node": "^20.11.24",
28 |     "typescript": "^5.3.3"
29 |   },
30 |   "keywords": [
31 |     "mcp",
32 |     "llms-txt",
33 |     "model-context-protocol",
34 |     "claude"
35 |   ],
36 |   "packageManager": "[email protected]",
37 |   "engines": {
38 |     "node": ">=18"
39 |   },
40 |   "author": "David Dias",
41 |   "license": "MIT",
42 |   "repository": {
43 |     "type": "git",
44 |     "url": "git+https://github.com/thedaviddias/mcp-llms-txt-explorer.git"
45 |   }
46 | }
47 | 
```

--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/FEATURE_REQUEST.yml:
--------------------------------------------------------------------------------

```yaml
 1 | name: Feature Request
 2 | description: Suggest an idea for this project
 3 | title: "[FEATURE] "
 4 | labels: ["enhancement"]
 5 | body:
 6 |   - type: markdown
 7 |     attributes:
 8 |       value: |
 9 |         Thanks for taking the time to suggest a new feature!
10 | 
11 |   - type: textarea
12 |     id: problem
13 |     attributes:
14 |       label: Problem Statement
15 |       description: What problem are you trying to solve?
16 |       placeholder: I'm always frustrated when...
17 |     validations:
18 |       required: true
19 | 
20 |   - type: textarea
21 |     id: solution
22 |     attributes:
23 |       label: Proposed Solution
24 |       description: What solution would you like to see?
25 |       placeholder: It would be great if...
26 |     validations:
27 |       required: true
28 | 
29 |   - type: dropdown
30 |     id: scope
31 |     attributes:
32 |       label: Scope
33 |       description: Which part of the project does this affect?
34 |       options:
35 |         - Frontend
36 |         - Backend
37 |         - Infrastructure
38 |         - Documentation
39 |         - Other
40 |     validations:
41 |       required: true
42 | 
43 |   - type: textarea
44 |     id: alternatives
45 |     attributes:
46 |       label: Alternatives Considered
47 |       description: What alternative solutions have you considered?
48 |       placeholder: I also thought about...
49 | 
```

--------------------------------------------------------------------------------
/src/index.ts:
--------------------------------------------------------------------------------

```typescript
  1 | #!/usr/bin/env node
  2 | 
  3 | import { Server } from "@modelcontextprotocol/sdk/server/index.js";
  4 | import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
  5 | import {
  6 |   CallToolRequestSchema,
  7 |   ListResourcesRequestSchema,
  8 |   ListToolsRequestSchema,
  9 |   ReadResourceRequestSchema,
 10 | } from "@modelcontextprotocol/sdk/types.js";
 11 | import fetch from "node-fetch";
 12 | import { createRequire } from 'node:module';
 13 | 
 14 | const require = createRequire(import.meta.url);
 15 | const { version } = require('../package.json');
 16 | 
 17 | const websites = 'https://raw.githubusercontent.com/thedaviddias/llms-txt-hub/main/data/websites.json'
 18 | 
 19 | /**
 20 |  * Type for a website with llms.txt information
 21 |  */
 22 | interface Website {
 23 |   name: string;
 24 |   domain: string;
 25 |   description: string;
 26 |   llmsTxtUrl?: string;
 27 |   llmsFullTxtUrl?: string;
 28 |   category?: string;
 29 |   favicon?: string;
 30 | }
 31 | 
 32 | /**
 33 |  * Type for a linked content from llms.txt
 34 |  */
 35 | interface LinkedContent {
 36 |   url: string;
 37 |   content?: string;
 38 |   error?: string;
 39 | }
 40 | 
 41 | /**
 42 |  * Type for the check website result
 43 |  */
 44 | interface WebsiteCheckResult {
 45 |   hasLlmsTxt: boolean;
 46 |   hasLlmsFullTxt: boolean;
 47 |   llmsTxtUrl?: string;
 48 |   llmsFullTxtUrl?: string;
 49 |   llmsTxtContent?: string;
 50 |   llmsFullTxtContent?: string;
 51 |   linkedContents?: LinkedContent[];
 52 |   error?: string;
 53 | }
 54 | 
 55 | /**
 56 |  * Known websites with llms.txt files
 57 |  * Initial data from llms-txt-hub
 58 |  */
 59 | let knownWebsites: Website[] = [];
 60 | 
 61 | /**
 62 |  * Cache for website check results
 63 |  */
 64 | const websiteCheckCache: { [domain: string]: WebsiteCheckResult } = {};
 65 | 
 66 | /**
 67 |  * Create an MCP server for exploring llms.txt files
 68 |  */
 69 | const server = new Server(
 70 |   {
 71 |     name: "LLMS.txt Explorer",
 72 |     version,
 73 |   },
 74 |   {
 75 |     capabilities: {
 76 |       resources: {},
 77 |       tools: {},
 78 |     },
 79 |   }
 80 | );
 81 | 
 82 | /**
 83 |  * Validate website data
 84 |  */
 85 | function isValidWebsite(website: unknown): website is Website {
 86 |   if (!website || typeof website !== 'object') return false;
 87 |   const w = website as Record<string, unknown>;
 88 |   return (
 89 |     typeof w.name === 'string' &&
 90 |     typeof w.domain === 'string' &&
 91 |     typeof w.description === 'string' &&
 92 |     (w.llmsTxtUrl === undefined || typeof w.llmsTxtUrl === 'string') &&
 93 |     (w.llmsFullTxtUrl === undefined || typeof w.llmsFullTxtUrl === 'string') &&
 94 |     (w.category === undefined || typeof w.category === 'string') &&
 95 |     (w.favicon === undefined || typeof w.favicon === 'string')
 96 |   );
 97 | }
 98 | 
 99 | /**
100 |  * Fetch websites list from GitHub
101 |  */
102 | async function fetchWebsitesList() {
103 |   try {
104 |     console.error('Fetching websites list from GitHub...');
105 |     const response = await fetch(websites);
106 | 
107 |     if (!response.ok) {
108 |       throw new Error(`Failed to fetch websites list: ${response.status}`);
109 |     }
110 | 
111 |     const data = await response.json();
112 | 
113 |     if (!Array.isArray(data)) {
114 |       throw new Error('Invalid data format: expected an array');
115 |     }
116 | 
117 |     const validWebsites = data.filter(isValidWebsite);
118 |     console.error(`Fetched ${validWebsites.length} valid websites`);
119 |     knownWebsites = validWebsites;
120 |   } catch (error) {
121 |     console.error('Error fetching websites list:', error);
122 |     // Fallback to default website if fetch fails
123 |     knownWebsites = [{
124 |       name: "Supabase",
125 |       domain: "https://supabase.com",
126 |       description: "Build production-grade applications with Postgres",
127 |       llmsTxtUrl: "https://supabase.com/llms.txt",
128 |       category: "developer-tools"
129 |     }];
130 |   }
131 | }
132 | 
133 | /**
134 |  * Extract linked URLs from llms.txt content
135 |  */
136 | function extractLinkedUrls(content: string): string[] {
137 |   const urls: string[] = [];
138 |   const lines = content.split('\n');
139 | 
140 |   for (const line of lines) {
141 |     const trimmedLine = line.trim();
142 |     if (trimmedLine.startsWith('@')) {
143 |       const url = trimmedLine.slice(1).trim();
144 |       if (url) {
145 |         urls.push(url);
146 |       }
147 |     }
148 |   }
149 | 
150 |   return urls;
151 | }
152 | 
153 | /**
154 |  * Check if a website has llms.txt files
155 |  */
156 | async function checkWebsite(domain: string): Promise<WebsiteCheckResult> {
157 |   console.error('Starting website check for:', domain);
158 | 
159 |   // Return cached result if available
160 |   if (websiteCheckCache[domain]) {
161 |     console.error('Returning cached result for:', domain);
162 |     return websiteCheckCache[domain];
163 |   }
164 | 
165 |   const result: WebsiteCheckResult = {
166 |     hasLlmsTxt: false,
167 |     hasLlmsFullTxt: false
168 |   };
169 | 
170 |   // Create an overall timeout for the entire operation
171 |   const globalTimeout = new Promise<never>((_, reject) => {
172 |     setTimeout(() => {
173 |       reject(new Error('Global timeout exceeded'));
174 |     }, 15000); // 15 second global timeout
175 |   });
176 | 
177 |   try {
178 |     // Normalize domain and add protocol if missing
179 |     let normalizedDomain = domain;
180 |     if (!domain.startsWith('http://') && !domain.startsWith('https://')) {
181 |       normalizedDomain = `https://${domain}`;
182 |     }
183 |     console.error('Normalized domain:', normalizedDomain);
184 | 
185 |     // Validate URL format
186 |     let url: URL;
187 |     try {
188 |       url = new URL(normalizedDomain);
189 |     } catch (e) {
190 |       console.error('Invalid URL:', domain);
191 |       throw new Error(`Invalid URL format: ${domain}`);
192 |     }
193 | 
194 |     // Use the normalized URL
195 |     const baseUrl = url.origin;
196 |     console.error('Base URL:', baseUrl);
197 | 
198 |     // Helper function to fetch with timeout
199 |     async function fetchWithTimeout(url: string, timeout = 5000) { // 5-second per-request timeout
200 |       console.error(`Fetching ${url} with ${timeout}ms timeout`);
201 |       const controller = new AbortController();
202 |       const timeoutId = setTimeout(() => {
203 |         controller.abort();
204 |         console.error(`Timeout after ${timeout}ms for ${url}`);
205 |       }, timeout);
206 | 
207 |       try {
208 |         const startTime = Date.now();
209 |         const response = await fetch(url, {
210 |           signal: controller.signal,
211 |           headers: {
212 |             'User-Agent': `llms-txt-explorer/${version}`
213 |           }
214 |         });
215 |         const endTime = Date.now();
216 |         console.error(`Fetch completed in ${endTime - startTime}ms for ${url}`);
217 |         clearTimeout(timeoutId);
218 |         return response;
219 |       } catch (error) {
220 |         clearTimeout(timeoutId);
221 |         console.error(`Fetch error for ${url}:`, error);
222 |         throw error;
223 |       }
224 |     }
225 | 
226 |     const checkPromise = (async () => {
227 |       // Check for llms.txt
228 |       try {
229 |         const llmsTxtUrl = `${baseUrl}/llms.txt`;
230 |         console.error('Fetching llms.txt from:', llmsTxtUrl);
231 |         const llmsTxtRes = await fetchWithTimeout(llmsTxtUrl);
232 |         console.error('llms.txt response status:', llmsTxtRes.status);
233 | 
234 |         if (llmsTxtRes.ok) {
235 |           result.hasLlmsTxt = true;
236 |           result.llmsTxtUrl = llmsTxtUrl;
237 |           const content = await llmsTxtRes.text();
238 |           console.error(`llms.txt content length: ${content.length} bytes`);
239 |           result.llmsTxtContent = content;
240 |           console.error('Successfully fetched llms.txt');
241 | 
242 |           // Fetch up to 3 linked documents in parallel, each with its own per-request timeout
243 |           const linkedUrls = extractLinkedUrls(content).slice(0, 3);
244 |           if (linkedUrls.length > 0) {
245 |             console.error(`Fetching ${linkedUrls.length} linked URL(s) from llms.txt (max 3)`);
246 |             result.linkedContents = [];
247 | 
248 |             const fetchPromises = linkedUrls.map(async (url) => {
249 |               console.error(`Fetching linked content from: ${url}`);
250 |               try {
251 |                 const linkedRes = await fetchWithTimeout(url);
252 |                 if (!linkedRes.ok) {
253 |                   throw new Error(`Failed to fetch content: ${linkedRes.status}`);
254 |                 }
255 |                 const linkedContent = await linkedRes.text();
256 |                 console.error(`Linked content length: ${linkedContent.length} bytes`);
257 |                 return {
258 |                   url,
259 |                   content: linkedContent
260 |                 };
261 |               } catch (error) {
262 |                 console.error(`Error fetching linked content from ${url}:`, error);
263 |                 return {
264 |                   url,
265 |                   error: error instanceof Error ? error.message : 'Unknown error'
266 |                 };
267 |               }
268 |             });
269 | 
270 |             // Wait for all fetches to complete with a 10 second timeout
271 |             const linkedContentTimeout = new Promise<never>((_, reject) => {
272 |               setTimeout(() => {
273 |                 reject(new Error('Linked content fetch timeout'));
274 |               }, 10000);
275 |             });
276 | 
277 |             try {
278 |               result.linkedContents = await Promise.race([
279 |                 Promise.all(fetchPromises),
280 |                 linkedContentTimeout
281 |               ]);
282 |             } catch (error) {
283 |               console.error('Error fetching linked contents:', error);
284 |               result.linkedContents = linkedUrls.map(url => ({
285 |                 url,
286 |                 error: 'Timeout fetching linked contents'
287 |               }));
288 |             }
289 |           }
290 |         }
291 |       } catch (error: unknown) {
292 |         console.error('Error in main llms.txt fetch:', error);
293 |         if (error instanceof Error) {
294 |           result.error = error.message;
295 |         } else {
296 |           result.error = 'Unknown error fetching llms.txt';
297 |         }
298 |       }
299 | 
300 |       // Only check llms-full.txt if llms.txt was successful
301 |       if (result.hasLlmsTxt && !result.error) {
302 |         try {
303 |           const llmsFullTxtUrl = `${baseUrl}/llms-full.txt`;
304 |           console.error('Fetching llms-full.txt from:', llmsFullTxtUrl);
305 |           const llmsFullTxtRes = await fetchWithTimeout(llmsFullTxtUrl);
306 |           console.error('llms-full.txt response status:', llmsFullTxtRes.status);
307 | 
308 |           if (llmsFullTxtRes.ok) {
309 |             result.hasLlmsFullTxt = true;
310 |             result.llmsFullTxtUrl = llmsFullTxtUrl;
311 |             const content = await llmsFullTxtRes.text();
312 |             console.error(`llms-full.txt content length: ${content.length} bytes`);
313 |             result.llmsFullTxtContent = content;
314 |             console.error('Successfully fetched llms-full.txt');
315 |           }
316 |         } catch (error) {
317 |           console.error('Error fetching llms-full.txt:', error);
318 |           // Don't fail the whole operation for llms-full.txt errors
319 |         }
320 |       }
321 | 
322 |       return result;
323 |     })();
324 | 
325 |     // Race between the check operation and the global timeout
326 |     const finalResult = await Promise.race([checkPromise, globalTimeout]);
327 | 
328 |     // Cache only error-free results (a site that simply has no llms.txt is cached too)
329 |     if (!finalResult.error) {
330 |       websiteCheckCache[domain] = finalResult;
331 |     }
332 | 
333 |     console.error('Final result:', JSON.stringify(finalResult, null, 2));
334 |     return finalResult;
335 |   } catch (error) {
336 |     const errorMessage = error instanceof Error ? error.message : 'Unknown error';
337 |     console.error('Error checking website:', errorMessage);
338 |     return {
339 |       hasLlmsTxt: false,
340 |       hasLlmsFullTxt: false,
341 |       error: errorMessage
342 |     };
343 |   }
344 | }
345 | 
346 | /**
347 |  * Handler for listing available websites as resources
348 |  */
349 | server.setRequestHandler(ListResourcesRequestSchema, async () => {
350 |   return {
351 |     resources: knownWebsites.map(site => ({
352 |       uri: `website://${site.domain}`,
353 |       mimeType: "application/json",
354 |       name: site.name,
355 |       description: site.description
356 |     }))
357 |   };
358 | });
359 | 
360 | /**
361 |  * Handler for reading website information
362 |  */
363 | server.setRequestHandler(ReadResourceRequestSchema, async (request) => {
364 |   // Resource URIs are built as `website://${site.domain}`, so strip the scheme and compare directly
365 |   const domain = request.params.uri.replace(/^website:\/\//, '');
366 | 
367 |   const website = knownWebsites.find(site => site.domain === domain);
368 |   if (!website) {
369 |     throw new Error(`Website ${domain} not found in known websites`);
370 |   }
371 | 
372 |   const checkResult = await checkWebsite(website.domain);
373 | 
374 |   return {
375 |     contents: [{
376 |       uri: request.params.uri,
377 |       mimeType: "application/json",
378 |       text: JSON.stringify({ ...website, ...checkResult }, null, 2)
379 |     }]
380 |   };
381 | });
382 | 
383 | /**
384 |  * Handler that lists available tools
385 |  */
386 | server.setRequestHandler(ListToolsRequestSchema, async () => {
387 |   return {
388 |     tools: [
389 |       {
390 |         name: "check_website",
391 |         description: "Check if a website has llms.txt files",
392 |         inputSchema: {
393 |           type: "object",
394 |           properties: {
395 |             url: {
396 |               type: "string",
397 |               description: "URL of the website to check"
398 |             }
399 |           },
400 |           required: ["url"]
401 |         }
402 |       },
403 |       {
404 |         name: "list_websites",
405 |         description: "List known websites with llms.txt files",
406 |         inputSchema: {
407 |           type: "object",
408 |           properties: {
409 |             filter_llms_txt: {
410 |               type: "boolean",
411 |               description: "Only show websites with llms.txt"
412 |             },
413 |             filter_llms_full_txt: {
414 |               type: "boolean",
415 |               description: "Only show websites with llms-full.txt"
416 |             }
417 |           }
418 |         }
419 |       }
420 |     ]
421 |   };
422 | });
423 | 
424 | /**
425 |  * Handler for tool calls
426 |  */
427 | server.setRequestHandler(CallToolRequestSchema, async (request) => {
428 |   console.error('Received tool request:', request.params.name);
429 | 
430 |   switch (request.params.name) {
431 |     case "check_website": {
432 |       const rawUrl = request.params.arguments?.url;
433 |       const url = typeof rawUrl === 'string' ? rawUrl.trim() : ''; // String(undefined) would be "undefined"
434 |       console.error('Checking website:', url);
435 |       if (!url) {
436 |         console.error('URL is required');
437 |         return {
438 |           content: [{
439 |             type: "text",
440 |             text: JSON.stringify({ error: "URL is required" }, null, 2)
441 |           }]
442 |         };
443 |       }
444 | 
445 |       try {
446 |         const result = await checkWebsite(url);
447 |         console.error('Tool returning result:', JSON.stringify(result, null, 2));
448 |         return {
449 |           content: [{
450 |             type: "text",
451 |             text: JSON.stringify(result, null, 2)
452 |           }]
453 |         };
454 |       } catch (error) {
455 |         const errorMessage = error instanceof Error ? error.message : 'Unknown error';
456 |         console.error('Tool returning error:', errorMessage);
457 |         return {
458 |           content: [{
459 |             type: "text",
460 |             text: JSON.stringify({ error: errorMessage }, null, 2)
461 |           }]
462 |         };
463 |       }
464 |     }
465 | 
466 |     case "list_websites": {
467 |       const filterLlmsTxt = Boolean(request.params.arguments?.filter_llms_txt);
468 |       const filterLlmsFullTxt = Boolean(request.params.arguments?.filter_llms_full_txt);
469 | 
470 |       let websites = knownWebsites;
471 | 
472 |       if (filterLlmsTxt) {
473 |         websites = websites.filter(site => site.llmsTxtUrl);
474 |       }
475 |       if (filterLlmsFullTxt) {
476 |         websites = websites.filter(site => site.llmsFullTxtUrl);
477 |       }
478 | 
479 |       return {
480 |         content: [{
481 |           type: "text",
482 |           text: JSON.stringify(websites, null, 2)
483 |         }]
484 |       };
485 |     }
486 | 
487 |     default:
488 |       throw new Error(`Unknown tool: ${request.params.name}`);
489 |   }
490 | });
491 | 
492 | /**
493 |  * Start the server using stdio transport
494 |  */
495 | async function main() {
496 |   // Fetch websites list before starting the server
497 |   await fetchWebsitesList();
498 | 
499 |   const transport = new StdioServerTransport();
500 |   await server.connect(transport);
501 | }
502 | 
503 | main().catch((error) => {
504 |   console.error("Server error:", error);
505 |   process.exit(1);
506 | });
507 | 
```
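The timeouts in `checkWebsite` are built from bare `setTimeout` promises whose timers are never cleared. Each call therefore leaves pending timers behind, even when the check finishes in milliseconds. The sketch below is one reusable alternative. It assumes a hypothetical helper named `withTimeout`, which is not part of this repository, that clears its timer in a `finally` block once the race settles:

```typescript
// Hypothetical helper, not part of src/index.ts: race `promise` against a timer
// and always clear the timer once either side settles.
async function withTimeout<T>(
  promise: Promise<T>,
  ms: number,
  message = `Timed out after ${ms}ms`
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  try {
    // Whichever promise settles first decides the result; the other is ignored.
    return await Promise.race([promise, timeout]);
  } finally {
    // Runs after success, failure, or timeout, so no timer is left pending.
    clearTimeout(timer);
  }
}
```

Under that assumption, the global race in `checkWebsite` could be written as `await withTimeout(checkPromise, 15000, 'Global timeout exceeded')`. A timeout only stops the caller from waiting. The underlying work keeps running unless it is also cancelled, for example through an `AbortController` as `fetchWithTimeout` does.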
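`checkWebsite` reduces whatever the caller passes to an origin and then probes two fixed paths on it: `/llms.txt` and `/llms-full.txt`. That step can be isolated into a pure function for testing. The sketch below is hypothetical (`llmsUrlsFor` and `LlmsUrls` are illustrative names, not exports of `src/index.ts`) and uses a case-insensitive scheme check:

```typescript
// Hypothetical helper, not part of src/index.ts: derive the two probe URLs
// from user input the same way checkWebsite normalizes it.
interface LlmsUrls {
  origin: string;
  llmsTxtUrl: string;
  llmsFullTxtUrl: string;
}

function llmsUrlsFor(input: string): LlmsUrls {
  const trimmed = input.trim();
  // Default to https:// when no http(s) scheme is present (case-insensitive).
  const withScheme = /^https?:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;
  // new URL() throws a TypeError for malformed input such as "https://".
  const { origin } = new URL(withScheme);
  return {
    origin,
    llmsTxtUrl: `${origin}/llms.txt`,
    llmsFullTxtUrl: `${origin}/llms-full.txt`,
  };
}
```

`new URL()` lowercases the scheme and host and omits default ports. As a result, inputs such as `Example.com`, `https://example.com:443/` and `HTTPS://EXAMPLE.COM/docs` all map to the same origin, `https://example.com`.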