# Directory Structure
```
├── .github
│   ├── actions
│   │   └── install
│   │       └── action.yml
│   ├── CODE_OF_CONDUCT.md
│   ├── CODEOWNERS
│   ├── CONTRIBUTING.md
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE
│   │   ├── BUG_REPORT.yml
│   │   ├── config.yml
│   │   └── FEATURE_REQUEST.yml
│   ├── labeler.yml
│   ├── SECURITY.md
│   └── workflows
│       └── release.yml
├── .gitignore
├── .nvmrc
├── Dockerfile
├── LICENSE
├── package.json
├── pnpm-lock.yaml
├── README.md
├── smithery.yaml
├── src
│   └── index.ts
└── tsconfig.json
```
# Files
--------------------------------------------------------------------------------
/.nvmrc:
--------------------------------------------------------------------------------
```
1 | 22.14.0
2 |
```
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
1 | node_modules/
2 | build/
3 | *.log
4 | .env*
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
1 | # MCP LLMS.txt Explorer
2 |
3 | <a href="https://glama.ai/mcp/servers/lhyj3pva0z">
4 | <img width="380" height="200" src="https://glama.ai/mcp/servers/lhyj3pva0z/badge" alt="LLMS.txt Explorer MCP server" />
5 | </a>
6 |
7 | [Smithery listing](https://smithery.ai/server/@thedaviddias/mcp-llms-txt-explorer)
8 |
9 | A Model Context Protocol server for exploring websites with llms.txt files. This server helps you discover and analyze websites that implement the llms.txt standard.
10 |
11 | ## Features
12 |
13 | ### Resources
14 | - Check websites for llms.txt and llms-full.txt files
15 | - Parse and validate llms.txt file contents
16 | - Access structured data about compliant websites
17 |
18 | ### Tools
19 | - `check_website` - Check if a website has llms.txt files
20 |   - Takes domain URL as input
21 |   - Returns file locations and validation status
22 | - `list_websites` - List known websites with llms.txt files
23 |   - Returns structured data about compliant websites
24 |   - Supports filtering by file type (llms.txt/llms-full.txt)
25 |
26 | ## Development
27 |
28 | Install dependencies:
29 | ```bash
30 | pnpm install
31 | ```
32 |
33 | Build the server:
34 | ```bash
35 | pnpm run build
36 | ```
37 |
38 | For development with auto-rebuild:
39 | ```bash
40 | pnpm run watch
41 | ```
42 |
43 | ## Installation
44 |
45 | ### Installing via Smithery
46 |
47 | To install mcp-llms-txt-explorer for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@thedaviddias/mcp-llms-txt-explorer):
48 |
49 | ```bash
50 | npx -y @smithery/cli install @thedaviddias/mcp-llms-txt-explorer --client claude
51 | ```
52 |
53 | ### Installing Manually
54 | To use this server:
55 |
56 | ```bash
57 | # Clone the repository
58 | git clone https://github.com/thedaviddias/mcp-llms-txt-explorer.git
59 | cd mcp-llms-txt-explorer
60 |
61 | # Install dependencies
62 | pnpm install
63 |
64 | # Build the server
65 | pnpm run build
66 | ```
67 |
68 | ### Configuration with Claude Desktop
69 |
70 | To use with Claude Desktop, add the server config:
71 |
72 | - On macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
73 | - On Windows: `%APPDATA%/Claude/claude_desktop_config.json`
74 |
75 | ```json
76 | {
77 |   "mcpServers": {
78 |     "llms-txt-explorer": {
79 |       "command": "node",
80 |       "args": ["/path/to/llms-txt-explorer/build/index.js"]
81 |     }
82 |   }
83 | }
84 | ```
85 |
86 | For npx usage, you can use:
87 | ```json
88 | {
89 |   "mcpServers": {
90 |     "llms-txt-explorer": {
91 |       "command": "npx",
92 |       "args": ["-y", "@thedaviddias/mcp-llms-txt-explorer"]
93 |     }
94 |   }
95 | }
96 | ```
97 |
98 | ### Debugging
99 |
100 | Since MCP servers communicate over stdio, debugging can be challenging. We recommend using the [MCP Inspector](https://github.com/modelcontextprotocol/inspector), which is available as a package script:
101 |
102 | ```bash
103 | pnpm run inspector
104 | ```
105 |
106 | The Inspector will provide a URL to access debugging tools in your browser.
107 |
108 | ## License
109 |
110 | This project is licensed under the MIT License—see the LICENSE file for details.
```
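The two tools described in the README can be illustrated with a minimal, self-contained sketch. It assumes that `check_website` adds `https://` when no protocol is given and probes `/llms.txt` and `/llms-full.txt` at the site origin, and that `list_websites` filters on the presence of the matching URL field. The helper names `candidateUrls` and `filterByFileType` and the `FileType` type are hypothetical and are not taken from the project.

```typescript
// Hypothetical helpers for illustration only; not the project's actual API.
interface Website {
  name: string;
  domain: string;
  description: string;
  llmsTxtUrl?: string;
  llmsFullTxtUrl?: string;
}

type FileType = "llms.txt" | "llms-full.txt";

/** Add https:// when no protocol is given, then derive both candidate file URLs. */
function candidateUrls(input: string): { llmsTxtUrl: string; llmsFullTxtUrl: string } {
  const withProtocol = /^https?:\/\//.test(input) ? input : `https://${input}`;
  // new URL() throws a TypeError on malformed input; only the origin is kept.
  const origin = new URL(withProtocol).origin;
  return {
    llmsTxtUrl: `${origin}/llms.txt`,
    llmsFullTxtUrl: `${origin}/llms-full.txt`,
  };
}

/** Keep only the sites that advertise the requested file type (assumed filter rule). */
function filterByFileType(sites: Website[], fileType?: FileType): Website[] {
  if (!fileType) return sites;
  return sites.filter((site) =>
    fileType === "llms.txt" ? Boolean(site.llmsTxtUrl) : Boolean(site.llmsFullTxtUrl)
  );
}

const sampleSites: Website[] = [
  { name: "A", domain: "https://a.example", description: "both files",
    llmsTxtUrl: "https://a.example/llms.txt", llmsFullTxtUrl: "https://a.example/llms-full.txt" },
  { name: "B", domain: "https://b.example", description: "llms.txt only",
    llmsTxtUrl: "https://b.example/llms.txt" },
];

console.log(candidateUrls("supabase.com/docs").llmsTxtUrl); // https://supabase.com/llms.txt
console.log(filterByFileType(sampleSites, "llms-full.txt").map((s) => s.name)); // [ 'A' ]
```

Reducing the input to its origin means a path such as `/docs` is ignored, which matches the convention of serving both files from the site root.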
--------------------------------------------------------------------------------
/.github/SECURITY.md:
--------------------------------------------------------------------------------
```markdown
1 | # Security Policy
2 |
3 | ## Reporting a Vulnerability
4 |
5 | If you believe you have found a security vulnerability in this project, please disclose it responsibly rather than opening a public issue. Report it using the [GitHub Security Advisory](https://github.com/thedaviddias/llms-txt-hub/security/advisories/new) tool to ensure confidentiality and security.
6 |
7 | We'll review it as soon as possible and publish a fix accordingly.
8 |
```
--------------------------------------------------------------------------------
/.github/CONTRIBUTING.md:
--------------------------------------------------------------------------------
```markdown
1 | # Contributing to This Project
2 |
3 | Thank you for your interest in contributing! This document outlines the process for contributing to our project.
4 |
5 | ## Getting Started
6 |
7 | There are two ways to contribute a new website to our list:
8 |
9 | ### Option 1: Web Interface (Recommended)
10 |
11 | The easiest way to contribute is through our web interface:
12 |
13 | 1. Visit our website
14 | 2. Log in with your GitHub account
15 | 3. Submit your website through our user-friendly form
16 | 4. Your submission will be automatically validated and processed
17 |
18 | ### Option 2: Manual Pull Request
19 |
20 | If you prefer to contribute directly through GitHub:
21 |
22 | 1. Fork the repository
23 | 2. Create a new branch for your addition: `git checkout -b add/your-website-name`
24 | 3. Create a new MDX file in the content/websites directory
25 | 4. Add your website information following our template format
26 | 5. Test your changes thoroughly
27 | 6. Commit your changes with clear, descriptive commit messages
28 | 7. Push to your fork
29 | 8. Submit a Pull Request
30 |
31 | ## Pull Request Guidelines
32 |
33 | - Ensure your PR addresses a specific issue or adds value to the project
34 | - Include a clear description of the changes
35 | - Keep changes focused and atomic
36 | - Follow existing code style and conventions
37 | - Include tests if applicable
38 | - Update documentation as needed
39 |
40 | ## Code Style
41 |
42 | - Follow the existing code formatting in the project (ensure you have Biome installed)
43 | - Write clear, self-documenting code
44 | - Add comments only when necessary to explain complex logic
45 | - Use meaningful variable and function names
46 |
47 | ## Reporting Issues
48 |
49 | - Use the GitHub issue tracker
50 | - Check if the issue already exists before creating a new one
51 | - Provide a clear description of the issue
52 | - Include steps to reproduce if applicable
53 | - Add relevant labels
54 |
55 | ## Questions or Need Help?
56 |
57 | Feel free to open an issue for questions or join our discussions. We're here to help!
58 |
59 | ## Code of Conduct
60 |
61 | Please note that this project follows a Code of Conduct. By participating, you are expected to uphold this code.
62 |
63 | Thank you for contributing!
64 |
```
--------------------------------------------------------------------------------
/.github/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
```markdown
1 | # Contributor Covenant Code of Conduct
2 |
3 | ## Our Pledge
4 |
5 | We as members, contributors, and leaders pledge to make participation in our
6 | community a harassment-free experience for everyone, regardless of age, body
7 | size, visible or invisible disability, ethnicity, sex characteristics, gender
8 | identity and expression, level of experience, education, socio-economic status,
9 | nationality, personal appearance, race, religion, or sexual identity
10 | and orientation.
11 |
12 | We pledge to act and interact in ways that contribute to an open, welcoming,
13 | diverse, inclusive, and healthy community.
14 |
15 | ## Our Standards
16 |
17 | Examples of behavior that contributes to a positive environment for our
18 | community include:
19 |
20 | * Demonstrating empathy and kindness toward other people
21 | * Being respectful of differing opinions, viewpoints, and experiences
22 | * Giving and gracefully accepting constructive feedback
23 | * Accepting responsibility and apologizing to those affected by our mistakes,
24 | and learning from the experience
25 | * Focusing on what is best not just for us as individuals, but for the
26 | overall community
27 |
28 | Examples of unacceptable behavior include:
29 |
30 | * The use of sexualized language or imagery, and sexual attention or
31 | advances of any kind
32 | * Trolling, insulting or derogatory comments, and personal or political attacks
33 | * Public or private harassment
34 | * Publishing others' private information, such as a physical or email
35 | address, without their explicit permission
36 | * Other conduct which could reasonably be considered inappropriate in a
37 | professional setting
38 |
39 | ## Enforcement Responsibilities
40 |
41 | Community leaders are responsible for clarifying and enforcing our standards of
42 | acceptable behavior and will take appropriate and fair corrective action in
43 | response to any behavior that they deem inappropriate, threatening, offensive,
44 | or harmful.
45 |
46 | Community leaders have the right and responsibility to remove, edit, or reject
47 | comments, commits, code, wiki edits, issues, and other contributions that are
48 | not aligned to this Code of Conduct, and will communicate reasons for moderation
49 | decisions when appropriate.
50 |
51 | ## Scope
52 |
53 | This Code of Conduct applies within all community spaces, and also applies when
54 | an individual is officially representing the community in public spaces.
55 | Examples of representing our community include using an official e-mail address,
56 | posting via an official social media account, or acting as an appointed
57 | representative at an online or offline event.
58 |
59 | ## Enforcement
60 |
61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be
62 | reported to the community leaders responsible for enforcement at
63 | haydenbleasel.com/contact.
64 | All complaints will be reviewed and investigated promptly and fairly.
65 |
66 | All community leaders are obligated to respect the privacy and security of the
67 | reporter of any incident.
68 |
69 | ## Enforcement Guidelines
70 |
71 | Community leaders will follow these Community Impact Guidelines in determining
72 | the consequences for any action they deem in violation of this Code of Conduct:
73 |
74 | ### 1. Correction
75 |
76 | **Community Impact**: Use of inappropriate language or other behavior deemed
77 | unprofessional or unwelcome in the community.
78 |
79 | **Consequence**: A private, written warning from community leaders, providing
80 | clarity around the nature of the violation and an explanation of why the
81 | behavior was inappropriate. A public apology may be requested.
82 |
83 | ### 2. Warning
84 |
85 | **Community Impact**: A violation through a single incident or series
86 | of actions.
87 |
88 | **Consequence**: A warning with consequences for continued behavior. No
89 | interaction with the people involved, including unsolicited interaction with
90 | those enforcing the Code of Conduct, for a specified period of time. This
91 | includes avoiding interactions in community spaces as well as external channels
92 | like social media. Violating these terms may lead to a temporary or
93 | permanent ban.
94 |
95 | ### 3. Temporary Ban
96 |
97 | **Community Impact**: A serious violation of community standards, including
98 | sustained inappropriate behavior.
99 |
100 | **Consequence**: A temporary ban from any sort of interaction or public
101 | communication with the community for a specified period of time. No public or
102 | private interaction with the people involved, including unsolicited interaction
103 | with those enforcing the Code of Conduct, is allowed during this period.
104 | Violating these terms may lead to a permanent ban.
105 |
106 | ### 4. Permanent Ban
107 |
108 | **Community Impact**: Demonstrating a pattern of violation of community
109 | standards, including sustained inappropriate behavior, harassment of an
110 | individual, or aggression toward or disparagement of classes of individuals.
111 |
112 | **Consequence**: A permanent ban from any sort of public interaction within
113 | the community.
114 |
115 | ## Attribution
116 |
117 | This Code of Conduct is adapted from the [Contributor Covenant][homepage],
118 | version 2.0, available at
119 | https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
120 |
121 | Community Impact Guidelines were inspired by [Mozilla's code of conduct
122 | enforcement ladder](https://github.com/mozilla/diversity).
123 |
124 | [homepage]: https://www.contributor-covenant.org
125 |
126 | For answers to common questions about this code of conduct, see the FAQ at
127 | https://www.contributor-covenant.org/faq. Translations are available at
128 | https://www.contributor-covenant.org/translations.
129 |
```
--------------------------------------------------------------------------------
/.github/labeler.yml:
--------------------------------------------------------------------------------
```yaml
1 |
```
--------------------------------------------------------------------------------
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
```yaml
1 | github: thedaviddias
2 | custom: ["https://thanks.dev/u/thedaviddias"]
3 |
```
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/config.yml:
--------------------------------------------------------------------------------
```yaml
1 | blank_issues_enabled: false
2 | contact_links:
3 |   - name: Questions & Discussions
4 |     url: https://github.com/thedaviddias/ai-templates/discussions
5 |     about: Please ask and answer questions here
6 |
```
--------------------------------------------------------------------------------
/tsconfig.json:
--------------------------------------------------------------------------------
```json
1 | {
2 |   "compilerOptions": {
3 |     "target": "ES2022",
4 |     "module": "NodeNext",
5 |     "moduleResolution": "NodeNext",
6 |     "outDir": "./build",
7 |     "rootDir": "./src",
8 |     "strict": true,
9 |     "esModuleInterop": true,
10 |     "skipLibCheck": true,
11 |     "forceConsistentCasingInFileNames": true
12 |   },
13 |   "include": ["src/**/*"],
14 |   "exclude": ["node_modules"]
15 | }
16 |
```
--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------
```yaml
1 | # Smithery configuration file: https://smithery.ai/docs/config#smitheryyaml
2 |
3 | startCommand:
4 |   type: stdio
5 |   configSchema:
6 |     # JSON Schema defining the configuration options for the MCP.
7 |     type: object
8 |     properties: {}
9 |   commandFunction:
10 |     # A JS function that produces the CLI command based on the given config to start the MCP on stdio.
11 |     |-
12 |     (config) => ({
13 |       command: 'node',
14 |       args: ['build/index.js'],
15 |       env: {}
16 |     })
17 |   exampleConfig: {}
18 |
```
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
```dockerfile
1 | # Generated by https://smithery.ai. See: https://smithery.ai/docs/config#dockerfile
2 | FROM node:lts-alpine
3 |
4 | # Create app directory
5 | WORKDIR /app
6 |
7 | # Install pnpm
8 | RUN npm install -g pnpm
9 |
10 | # Copy package files
11 | COPY package.json pnpm-lock.yaml ./
12 |
13 | # Install dependencies without running prepare scripts (we run build explicitly)
14 | RUN pnpm install --frozen-lockfile --ignore-scripts
15 |
16 | # Copy rest of the source
17 | COPY . .
18 |
19 | # Build the project
20 | RUN pnpm run build
21 |
22 | # Expose any necessary port if needed, but MCP uses stdio so not required
23 |
24 | # Set the entry point
25 | CMD [ "node", "build/index.js" ]
26 |
```
--------------------------------------------------------------------------------
/.github/actions/install/action.yml:
--------------------------------------------------------------------------------
```yaml
1 | name: Setup project dependencies
2 |
3 | description: "Setup project dependencies"
4 |
5 | runs:
6 |   using: "composite"
7 |   steps:
8 |     - name: Install pnpm
9 |       uses: pnpm/[email protected]
10 |       id: pnpm-install
11 |       with:
12 |         run_install: false
13 |
14 |     - name: Setup Node.js
15 |       uses: actions/setup-node@v4
16 |       with:
17 |         node-version-file: ".nvmrc"
18 |         registry-url: "https://registry.npmjs.org"
19 |         cache: "pnpm"
20 |
21 |     - name: Cache dependencies
22 |       id: cache_dependencies
23 |       uses: actions/cache@v4
24 |       with:
25 |         path: |
26 |           ~/.pnpm
27 |           ${{ github.workspace }}/.next/cache
28 |         key: ${{ runner.os }}-nextjs-${{ hashFiles('**/pnpm-lock.yaml') }}-${{ hashFiles('**/*.js', '**/*.jsx', '**/*.ts', '**/*.tsx') }}
29 |         restore-keys: |
30 |           ${{ runner.os }}-nextjs-${{ hashFiles('**/pnpm-lock.yaml') }}-
31 |
32 |     - name: Install dependencies
33 |       shell: bash
34 |       # if: steps.cache_dependencies.outputs.cache-hit != 'true'
35 |       run: pnpm install
36 |
```
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/BUG_REPORT.yml:
--------------------------------------------------------------------------------
```yaml
1 | name: Bug Report
2 | description: Report a bug to help us improve
3 | title: "[BUG] "
4 | labels: ["bug"]
5 | body:
6 |   - type: markdown
7 |     attributes:
8 |       value: |
9 |         Thanks for taking the time to report this bug!
10 |
11 |   - type: textarea
12 |     id: description
13 |     attributes:
14 |       label: What happened?
15 |       description: A clear and concise description of the bug
16 |       placeholder: When I click X, Y happens instead of Z
17 |     validations:
18 |       required: true
19 |
20 |   - type: textarea
21 |     id: reproduction
22 |     attributes:
23 |       label: Steps to reproduce
24 |       description: How can we reproduce this issue?
25 |       placeholder: |
26 |         1. Go to '...'
27 |         2. Click on '...'
28 |         3. See error
29 |     validations:
30 |       required: true
31 |
32 |   - type: dropdown
33 |     id: environment
34 |     attributes:
35 |       label: Environment
36 |       description: Where does this occur?
37 |       options:
38 |         - Development
39 |         - Staging
40 |         - Production
41 |     validations:
42 |       required: true
43 |
44 |   - type: textarea
45 |     id: additional
46 |     attributes:
47 |       label: Additional context
48 |       description: Add any other context, screenshots, or error messages
49 |
```
--------------------------------------------------------------------------------
/.github/workflows/release.yml:
--------------------------------------------------------------------------------
```yaml
1 | name: Release to npm
2 |
3 | on:
4 |   workflow_dispatch:
5 |
6 | permissions:
7 |   contents: write
8 |
9 | jobs:
10 |   release:
11 |     runs-on: ubuntu-latest
12 |     steps:
13 |       - name: Checkout
14 |         uses: actions/checkout@v4
15 |         with:
16 |           fetch-depth: 0
17 |
18 |       - name: Install
19 |         uses: ./.github/actions/install
20 |         with:
21 |           registry-url: 'https://registry.npmjs.org'
22 |           scope: '@thedaviddias'
23 |
24 |       - name: Setup npm auth
25 |         run: |
26 |           echo "//registry.npmjs.org/:_authToken=${{ secrets.NPM_TOKEN }}" > ~/.npmrc
27 |           echo "@thedaviddias:registry=https://registry.npmjs.org/" >> ~/.npmrc
28 |           npm whoami || echo "Not logged in to npm"
29 |
30 |       - name: Build
31 |         run: pnpm build
32 |
33 |       - name: Publish
34 |         run: |
35 |           echo "Attempting to publish with npm..."
36 |           npm publish --access public || {
37 |             echo "npm publish failed, trying with pnpm..."
38 |             pnpm publish --no-git-checks --access public
39 |           }
40 |         env:
41 |           NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
42 |           NPM_CONFIG_REGISTRY: 'https://registry.npmjs.org'
43 |           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
44 |
```
--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
```json
1 | {
2 |   "name": "@thedaviddias/mcp-llms-txt-explorer",
3 |   "version": "0.2.0",
4 |   "description": "A Model Context Protocol server for exploring websites with llms.txt files",
5 |   "type": "module",
6 |   "exports": "./build/index.js",
7 |   "bin": {
8 |     "mcp-llms-txt-explorer": "./build/index.js"
9 |   },
10 |   "files": [
11 |     "build"
12 |   ],
13 |   "scripts": {
14 |     "build": "tsc && node -e \"require('fs').chmodSync('build/index.js', '755')\"",
15 |     "prepare": "npm run build",
16 |     "watch": "tsc --watch",
17 |     "inspector": "npx @modelcontextprotocol/inspector build/index.js",
18 |     "test": "tsc --noEmit",
19 |     "prepublishOnly": "pnpm run build"
20 |   },
21 |   "dependencies": {
22 |     "@modelcontextprotocol/sdk": "0.6.0",
23 |     "@types/node-fetch": "^2.6.12",
24 |     "node-fetch": "^2.7.0"
25 |   },
26 |   "devDependencies": {
27 |     "@types/node": "^20.11.24",
28 |     "typescript": "^5.3.3"
29 |   },
30 |   "keywords": [
31 |     "mcp",
32 |     "llms-txt",
33 |     "model-context-protocol",
34 |     "claude"
35 |   ],
36 |   "packageManager": "[email protected]",
37 |   "engines": {
38 |     "node": ">=18"
39 |   },
40 |   "author": "David Dias",
41 |   "license": "MIT",
42 |   "repository": {
43 |     "type": "git",
44 |     "url": "git+https://github.com/thedaviddias/mcp-llms-txt-explorer.git"
45 |   }
46 | }
47 |
```
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/FEATURE_REQUEST.yml:
--------------------------------------------------------------------------------
```yaml
1 | name: Feature Request
2 | description: Suggest an idea for this project
3 | title: "[FEATURE] "
4 | labels: ["enhancement"]
5 | body:
6 |   - type: markdown
7 |     attributes:
8 |       value: |
9 |         Thanks for taking the time to suggest a new feature!
10 |
11 |   - type: textarea
12 |     id: problem
13 |     attributes:
14 |       label: Problem Statement
15 |       description: What problem are you trying to solve?
16 |       placeholder: I'm always frustrated when...
17 |     validations:
18 |       required: true
19 |
20 |   - type: textarea
21 |     id: solution
22 |     attributes:
23 |       label: Proposed Solution
24 |       description: What solution would you like to see?
25 |       placeholder: It would be great if...
26 |     validations:
27 |       required: true
28 |
29 |   - type: dropdown
30 |     id: scope
31 |     attributes:
32 |       label: Scope
33 |       description: Which part of the project does this affect?
34 |       options:
35 |         - Frontend
36 |         - Backend
37 |         - Infrastructure
38 |         - Documentation
39 |         - Other
40 |     validations:
41 |       required: true
42 |
43 |   - type: textarea
44 |     id: alternatives
45 |     attributes:
46 |       label: Alternatives Considered
47 |       description: What alternative solutions have you considered?
48 |       placeholder: I also thought about...
49 |
```
--------------------------------------------------------------------------------
/src/index.ts:
--------------------------------------------------------------------------------
```typescript
1 | #!/usr/bin/env node
2 |
3 | import { Server } from "@modelcontextprotocol/sdk/server/index.js";
4 | import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
5 | import {
6 | CallToolRequestSchema,
7 | ListResourcesRequestSchema,
8 | ListToolsRequestSchema,
9 | ReadResourceRequestSchema,
10 | } from "@modelcontextprotocol/sdk/types.js";
11 | import fetch from "node-fetch";
12 | import { createRequire } from 'node:module';
13 |
14 | const require = createRequire(import.meta.url);
15 | const { version } = require('../package.json');
16 |
17 | const websites = 'https://raw.githubusercontent.com/thedaviddias/llms-txt-hub/main/data/websites.json'
18 |
19 | /**
20 | * Type for a website with llms.txt information
21 | */
22 | interface Website {
23 | name: string;
24 | domain: string;
25 | description: string;
26 | llmsTxtUrl?: string;
27 | llmsFullTxtUrl?: string;
28 | category?: string;
29 | favicon?: string;
30 | }
31 |
32 | /**
33 | * Type for content linked from an llms.txt file
34 | */
35 | interface LinkedContent {
36 | url: string;
37 | content?: string;
38 | error?: string;
39 | }
40 |
41 | /**
42 | * Type for the check website result
43 | */
44 | interface WebsiteCheckResult {
45 | hasLlmsTxt: boolean;
46 | hasLlmsFullTxt: boolean;
47 | llmsTxtUrl?: string;
48 | llmsFullTxtUrl?: string;
49 | llmsTxtContent?: string;
50 | llmsFullTxtContent?: string;
51 | linkedContents?: LinkedContent[];
52 | error?: string;
53 | }
54 |
55 | /**
56 | * Known websites with llms.txt files
57 | * Initial data from llms-txt-hub
58 | */
59 | let knownWebsites: Website[] = [];
60 |
61 | /**
62 | * Cache for website check results
63 | */
64 | const websiteCheckCache: { [domain: string]: WebsiteCheckResult } = {};
65 |
66 | /**
67 | * Create an MCP server for exploring llms.txt files
68 | */
69 | const server = new Server(
70 | {
71 | name: "LLMS.txt Explorer",
72 | version,
73 | },
74 | {
75 | capabilities: {
76 | resources: {},
77 | tools: {},
78 | },
79 | }
80 | );
81 |
82 | /**
83 | * Validate website data
84 | */
85 | function isValidWebsite(website: unknown): website is Website {
86 | if (!website || typeof website !== 'object') return false;
87 | const w = website as Record<string, unknown>;
88 | return (
89 | typeof w.name === 'string' &&
90 | typeof w.domain === 'string' &&
91 | typeof w.description === 'string' &&
92 | (w.llmsTxtUrl === undefined || typeof w.llmsTxtUrl === 'string') &&
93 | (w.llmsFullTxtUrl === undefined || typeof w.llmsFullTxtUrl === 'string') &&
94 | (w.category === undefined || typeof w.category === 'string') &&
95 | (w.favicon === undefined || typeof w.favicon === 'string')
96 | );
97 | }
98 |
99 | /**
100 | * Fetch websites list from GitHub
101 | */
102 | async function fetchWebsitesList() {
103 | try {
104 | console.error('Fetching websites list from GitHub...');
105 | const response = await fetch(websites);
106 |
107 | if (!response.ok) {
108 | throw new Error(`Failed to fetch websites list: ${response.status}`);
109 | }
110 |
111 | const data = await response.json();
112 |
113 | if (!Array.isArray(data)) {
114 | throw new Error('Invalid data format: expected an array');
115 | }
116 |
117 | const validWebsites = data.filter(isValidWebsite);
118 | console.error(`Fetched ${validWebsites.length} valid websites`);
119 | knownWebsites = validWebsites;
120 | } catch (error) {
121 | console.error('Error fetching websites list:', error);
122 | // Fallback to default website if fetch fails
123 | knownWebsites = [{
124 | name: "Supabase",
125 | domain: "https://supabase.com",
126 | description: "Build production-grade applications with Postgres",
127 | llmsTxtUrl: "https://supabase.com/llms.txt",
128 | category: "developer-tools"
129 | }];
130 | }
131 | }
132 |
133 | /**
134 | * Extract linked URLs from llms.txt content
135 | */
136 | function extractLinkedUrls(content: string): string[] {
137 | const urls: string[] = [];
138 | const lines = content.split('\n');
139 |
140 | for (const line of lines) {
141 | const trimmedLine = line.trim();
142 | if (trimmedLine.startsWith('@')) {
143 | const url = trimmedLine.slice(1).trim();
144 | if (url) {
145 | urls.push(url);
146 | }
147 | }
148 | }
149 |
150 | return urls;
151 | }
152 |
153 | /**
154 | * Check if a website has llms.txt files
155 | */
156 | async function checkWebsite(domain: string): Promise<WebsiteCheckResult> {
157 | console.error('Starting website check for:', domain);
158 |
159 | // Return cached result if available
160 | if (websiteCheckCache[domain]) {
161 | console.error('Returning cached result for:', domain);
162 | return websiteCheckCache[domain];
163 | }
164 |
165 | const result: WebsiteCheckResult = {
166 | hasLlmsTxt: false,
167 | hasLlmsFullTxt: false
168 | };
169 |
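170 | // Create an overall timeout for the entire operation
171 | const globalTimeout = new Promise<never>((_, reject) => {
172 | setTimeout(() => reject(new Error('Global timeout exceeded')), 15000); // 15 second global timeout
173 | });
174 | // Mark the rejection as handled: if an error is thrown before the race below, the timer would otherwise cause an unhandled rejection
175 | globalTimeout.catch(() => {});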
176 |
177 | try {
178 | // Normalize domain and add protocol if missing
179 | let normalizedDomain = domain;
180 | if (!domain.startsWith('http://') && !domain.startsWith('https://')) {
181 | normalizedDomain = `https://${domain}`;
182 | }
183 | console.error('Normalized domain:', normalizedDomain);
184 |
185 | // Validate URL format
186 | let url: URL;
187 | try {
188 | url = new URL(normalizedDomain);
189 | } catch (e) {
190 | console.error('Invalid URL:', domain);
191 | throw new Error(`Invalid URL format: ${domain}`);
192 | }
193 |
194 | // Use the normalized URL
195 | const baseUrl = url.origin;
196 | console.error('Base URL:', baseUrl);
197 |
198 | // Helper function to fetch with timeout
199 | async function fetchWithTimeout(url: string, timeout = 5000) { // Reduced to 5 seconds
200 | console.error(`Fetching ${url} with ${timeout}ms timeout`);
201 | const controller = new AbortController();
202 | const timeoutId = setTimeout(() => {
203 | controller.abort();
204 | console.error(`Timeout after ${timeout}ms for ${url}`);
205 | }, timeout);
206 |
207 | try {
208 | const startTime = Date.now();
209 | const response = await fetch(url, {
210 | signal: controller.signal,
211 | headers: {
212 | 'User-Agent': `llms-txt-explorer/${version}`
213 | }
214 | });
215 | const endTime = Date.now();
216 | console.error(`Fetch completed in ${endTime - startTime}ms for ${url}`);
217 | clearTimeout(timeoutId);
218 | return response;
219 | } catch (error) {
220 | clearTimeout(timeoutId);
221 | console.error(`Fetch error for ${url}:`, error);
222 | throw error;
223 | }
224 | }
225 |
226 | const checkPromise = (async () => {
227 | // Check for llms.txt
228 | try {
229 | const llmsTxtUrl = `${baseUrl}/llms.txt`;
230 | console.error('Fetching llms.txt from:', llmsTxtUrl);
231 | const llmsTxtRes = await fetchWithTimeout(llmsTxtUrl);
232 | console.error('llms.txt response status:', llmsTxtRes.status);
233 |
234 | if (llmsTxtRes.ok) {
235 | result.hasLlmsTxt = true;
236 | result.llmsTxtUrl = llmsTxtUrl;
237 | const content = await llmsTxtRes.text();
238 | console.error(`llms.txt content length: ${content.length} characters`);
239 | result.llmsTxtContent = content;
240 | console.error('Successfully fetched llms.txt');
241 |
242 | // Extract and fetch linked contents in parallel with timeout
243 | const linkedUrls = extractLinkedUrls(content).slice(0, 3); // Reduced to 3 linked contents
244 | if (linkedUrls.length > 0) {
245 | console.error(`Found ${linkedUrls.length} linked URLs in llms.txt (limited to 3)`);
246 | result.linkedContents = [];
247 |
248 | const fetchPromises = linkedUrls.map(async (url) => {
249 | console.error(`Fetching linked content from: ${url}`);
250 | try {
251 | const linkedRes = await fetchWithTimeout(url);
252 | if (!linkedRes.ok) {
253 | throw new Error(`Failed to fetch content: ${linkedRes.status}`);
254 | }
255 | const linkedContent = await linkedRes.text();
256 | console.error(`Linked content length: ${linkedContent.length} characters`);
257 | return {
258 | url,
259 | content: linkedContent
260 | };
261 | } catch (error) {
262 | console.error(`Error fetching linked content from ${url}:`, error);
263 | return {
264 | url,
265 | error: error instanceof Error ? error.message : 'Unknown error'
266 | };
267 | }
268 | });
269 |
270 | // Wait for all fetches to complete with a 10 second timeout
271 | const linkedContentTimeout = new Promise<never>((_, reject) => {
272 | setTimeout(() => {
273 | reject(new Error('Linked content fetch timeout'));
274 | }, 10000);
275 | });
276 |
277 | try {
278 | result.linkedContents = await Promise.race([
279 | Promise.all(fetchPromises),
280 | linkedContentTimeout
281 | ]);
282 | } catch (error) {
283 | console.error('Error fetching linked contents:', error);
284 | result.linkedContents = linkedUrls.map(url => ({
285 | url,
286 | error: 'Timeout fetching linked contents'
287 | }));
288 | }
289 | }
290 | }
291 | } catch (error: unknown) {
292 | console.error('Error in main llms.txt fetch:', error);
293 | if (error instanceof Error) {
294 | result.error = error.message;
295 | } else {
296 | result.error = 'Unknown error fetching llms.txt';
297 | }
298 | }
299 |
300 | // Only check llms-full.txt if llms.txt was successful
301 | if (result.hasLlmsTxt && !result.error) {
302 | try {
303 | const llmsFullTxtUrl = `${baseUrl}/llms-full.txt`;
304 | console.error('Fetching llms-full.txt from:', llmsFullTxtUrl);
305 | const llmsFullTxtRes = await fetchWithTimeout(llmsFullTxtUrl);
306 | console.error('llms-full.txt response status:', llmsFullTxtRes.status);
307 |
308 | if (llmsFullTxtRes.ok) {
309 | result.hasLlmsFullTxt = true;
310 | result.llmsFullTxtUrl = llmsFullTxtUrl;
311 | const content = await llmsFullTxtRes.text();
312 | console.error(`llms-full.txt content length: ${content.length} characters`);
313 | result.llmsFullTxtContent = content;
314 | console.error('Successfully fetched llms-full.txt');
315 | }
316 | } catch (error) {
317 | console.error('Error fetching llms-full.txt:', error);
318 | // Don't fail the whole operation for llms-full.txt errors
319 | }
320 | }
321 |
322 | return result;
323 | })();
324 |
325 | // Race between the check operation and the global timeout
326 | const finalResult = await Promise.race([checkPromise, globalTimeout]);
327 |
328 | // Cache successful results only
329 | if (!finalResult.error) {
330 | websiteCheckCache[domain] = finalResult;
331 | }
332 |
333 | console.error('Final result:', JSON.stringify(finalResult, null, 2));
334 | return finalResult;
335 | } catch (error) {
336 | const errorMessage = error instanceof Error ? error.message : 'Unknown error';
337 | console.error('Error checking website:', errorMessage);
338 | return {
339 | hasLlmsTxt: false,
340 | hasLlmsFullTxt: false,
341 | error: errorMessage
342 | };
343 | }
344 | }
345 |
346 | /**
347 | * Handler for listing available websites as resources
348 | */
349 | server.setRequestHandler(ListResourcesRequestSchema, async () => {
350 | return {
351 | resources: knownWebsites.map(site => ({
352 | uri: `website://${site.domain}`,
353 | mimeType: "application/json",
354 | name: site.name,
355 | description: site.description
356 | }))
357 | };
358 | });
359 |
360 | /**
361 | * Handler for reading website information
362 | */
363 | server.setRequestHandler(ReadResourceRequestSchema, async (request) => {
364 | const url = new URL(request.params.uri);
365 | const domain = url.hostname;
366 |
367 | const website = knownWebsites.find(site => new URL(site.domain).hostname === domain);
368 | if (!website) {
369 | throw new Error(`Website ${domain} not found in known websites`);
370 | }
371 |
372 | const checkResult = await checkWebsite(website.domain);
373 |
374 | return {
375 | contents: [{
376 | uri: request.params.uri,
377 | mimeType: "application/json",
378 | text: JSON.stringify({ ...website, ...checkResult }, null, 2)
379 | }]
380 | };
381 | });
382 |
383 | /**
384 | * Handler that lists available tools
385 | */
386 | server.setRequestHandler(ListToolsRequestSchema, async () => {
387 | return {
388 | tools: [
389 | {
390 | name: "check_website",
391 | description: "Check if a website has llms.txt files",
392 | inputSchema: {
393 | type: "object",
394 | properties: {
395 | url: {
396 | type: "string",
397 | description: "URL of the website to check"
398 | }
399 | },
400 | required: ["url"]
401 | }
402 | },
403 | {
404 | name: "list_websites",
405 | description: "List known websites with llms.txt files",
406 | inputSchema: {
407 | type: "object",
408 | properties: {
409 | filter_llms_txt: {
410 | type: "boolean",
411 | description: "Only show websites with llms.txt"
412 | },
413 | filter_llms_full_txt: {
414 | type: "boolean",
415 | description: "Only show websites with llms-full.txt"
416 | }
417 | }
418 | }
419 | }
420 | ]
421 | };
422 | });
423 |
424 | /**
425 | * Handler for tool calls
426 | */
427 | server.setRequestHandler(CallToolRequestSchema, async (request) => {
428 | console.error('Received tool request:', request.params.name);
429 |
430 | switch (request.params.name) {
431 | case "check_website": {
432 | const url = String(request.params.arguments?.url ?? '').trim(); // String(undefined) would be the truthy "undefined"
433 | console.error('Checking website:', url);
434 |
435 | if (!url) {
436 | console.error('URL is required');
437 | return {
438 | content: [{
439 | type: "text",
440 | text: JSON.stringify({ error: "URL is required" }, null, 2)
441 | }]
442 | };
443 | }
444 |
445 | try {
446 | const result = await checkWebsite(url);
447 | console.error('Tool returning result:', JSON.stringify(result, null, 2));
448 | return {
449 | content: [{
450 | type: "text",
451 | text: JSON.stringify(result, null, 2)
452 | }]
453 | };
454 | } catch (error) {
455 | const errorMessage = error instanceof Error ? error.message : 'Unknown error';
456 | console.error('Tool returning error:', errorMessage);
457 | return {
458 | content: [{
459 | type: "text",
460 | text: JSON.stringify({ error: errorMessage }, null, 2)
461 | }]
462 | };
463 | }
464 | }
465 |
466 | case "list_websites": {
467 | const filterLlmsTxt = Boolean(request.params.arguments?.filter_llms_txt);
468 | const filterLlmsFullTxt = Boolean(request.params.arguments?.filter_llms_full_txt);
469 |
470 | let websites = knownWebsites;
471 |
472 | if (filterLlmsTxt) {
473 | websites = websites.filter(site => site.llmsTxtUrl);
474 | }
475 | if (filterLlmsFullTxt) {
476 | websites = websites.filter(site => site.llmsFullTxtUrl);
477 | }
478 |
479 | return {
480 | content: [{
481 | type: "text",
482 | text: JSON.stringify(websites, null, 2)
483 | }]
484 | };
485 | }
486 |
487 | default:
488 | throw new Error(`Unknown tool: ${request.params.name}`);
489 | }
490 | });
491 |
492 | /**
493 | * Start the server using stdio transport
494 | */
495 | async function main() {
496 | // Fetch websites list before starting the server
497 | await fetchWebsitesList();
498 |
499 | const transport = new StdioServerTransport();
500 | await server.connect(transport);
501 | }
502 |
503 | main().catch((error) => {
504 | console.error("Server error:", error);
505 | process.exit(1);
506 | });
507 |
```
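
The `check_website` handler reads `url` from untyped tool arguments, and that kind of input is easy to mis-validate: `String(undefined)` evaluates to the non-empty string `"undefined"`, which passes a plain truthiness check. The following is a minimal sketch of stricter validation, not the server's actual code; `readUrlArgument` and `ToolArguments` are hypothetical names introduced here for illustration.

```typescript
// Hypothetical shape of the tool arguments (an assumption for this sketch).
type ToolArguments = Record<string, unknown> | undefined;

// Hypothetical helper: returns a trimmed, non-empty URL string,
// or null when the argument is missing, empty, or not a string.
function readUrlArgument(args: ToolArguments): string | null {
  const value = args?.url;
  if (typeof value !== "string") {
    return null;
  }
  const trimmed = value.trim();
  return trimmed.length > 0 ? trimmed : null;
}

// Coercion alone does not detect a missing argument:
console.log(String(undefined) === "undefined"); // true

console.log(readUrlArgument(undefined)); // null
console.log(readUrlArgument({ url: "  https://example.com " })); // "https://example.com"
```

A handler could call `readUrlArgument(request.params.arguments)` and return the `URL is required` error whenever the result is `null`.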