# Directory Structure
```
├── .github
│   └── workflows
│       └── ci.yaml
├── .gitignore
├── .python-version
├── client.py
├── Dockerfile
├── main.py
├── pyproject.toml
├── README.md
├── requirements.txt
├── smithery.yaml
└── uv.lock
```
# Files
--------------------------------------------------------------------------------
/.python-version:
--------------------------------------------------------------------------------
```
1 | 3.10
2 |
```
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
1 | # Python-generated files
2 | __pycache__/
3 | *.py[oc]
4 | build/
5 | dist/
6 | wheels/
7 | *.egg-info
8 |
9 | # Virtual environments
10 | .venv
11 |
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
1 | # 🤖 Browser Automation Agent
2 |
3 | A browser automation tool built with MCP (Model Context Protocol) that combines web scraping with LLM-powered processing. This agent can search Google, navigate to webpages, and scrape content from various websites including GitHub, Stack Overflow, and documentation sites.
4 |
5 | ## 🚀 Features
6 |
7 | - **🔍 Google Search Integration**: Finds and retrieves top search results for any query
8 | - **🕸️ Intelligent Web Scraping**: Tailored scraping strategies for different website types:
9 | - 📂 GitHub repositories
10 | - 💬 Stack Overflow questions and answers
11 | - 📚 Documentation pages
12 | - 🌐 Generic websites
13 | - **🧠 AI-Powered Processing**: Uses Mistral AI for understanding and processing scraped content
14 | - **🥷 Stealth Mode**: Implements browser fingerprint protection to avoid detection
15 | - **💾 Content Saving**: Automatically saves both screenshots and text content from scraped pages
16 |
17 | ## 🏗️ Architecture
18 |
19 | This project uses a client-server architecture powered by MCP:
20 |
21 | - **🖥️ Server**: Handles browser automation and web scraping tasks
22 | - **👤 Client**: Provides the AI interface using Mistral AI and LangGraph
23 | - **📡 Communication**: Uses stdio for client-server communication
24 |
25 | ## ⚙️ Requirements
26 |
27 | - 🐍 Python 3.10+
28 | - 🎭 Playwright
29 | - 🧩 MCP (Model Context Protocol)
30 | - 🔑 Mistral AI API key
31 |
32 | ## 📥 Installation
33 |
34 | 1. Clone the repository:
35 |
36 | ```bash
37 | git clone https://github.com/yourusername/browser-automation-agent.git
38 | cd browser-automation-agent
39 | ```
40 |
41 | 2. Install dependencies:
42 |
43 | ```bash
44 | pip install -r requirements.txt
45 | ```
46 |
47 | 3. Install Playwright browsers:
48 |
49 | ```bash
50 | playwright install
51 | ```
52 |
53 | 4. Create a `.env` file in the project root and add your Mistral AI API key:
54 |
55 | ```
56 | MISTRAL_API_KEY=your_api_key_here
57 | ```
58 |
59 | ## 📋 Usage
60 |
61 | ### Running the Server
62 |
63 | ```bash
64 | python main.py
65 | ```
66 |
67 | ### Running the Client
68 |
69 | ```bash
70 | python client.py
71 | ```
72 |
73 | ### Sample Interaction
74 |
75 | Once the client is running (it starts the server as a subprocess over stdio):
76 |
77 | 1. Enter your query when prompted
78 | 2. The agent will:
79 | - 🔍 Search Google for relevant results
80 | - 🧭 Navigate to the top result
81 | - 📊 Scrape content based on the website type
82 | - 📸 Save screenshots and content to files
83 | - 📤 Return processed information
84 |
85 | ## 🛠️ Tool Functions
86 |
87 | ### `get_top_google_url`
88 | 🔍 Searches Google and returns the top result URL for a given query.
89 |
90 | ### `browse_and_scrape`
91 | 🌐 Navigates to a URL and scrapes content based on the website type.
92 |
93 | ### `scrape_github`
94 | 📂 Specializes in extracting README content and code blocks from GitHub repositories.
95 |
96 | ### `scrape_stackoverflow`
97 | 💬 Extracts questions, answers, comments, and code blocks from Stack Overflow pages.
98 |
99 | ### `scrape_documentation`
100 | 📚 Optimized for extracting documentation content and code examples.
101 |
102 | ### `scrape_generic`
103 | 🌐 Extracts paragraph text and code blocks from generic websites.
104 |
105 | ## 📁 File Structure
106 |
107 | ```
108 | browser-automation-agent/
109 | ├── main.py # MCP server implementation
110 | ├── client.py # Mistral AI client implementation
111 | ├── requirements.txt # Project dependencies
112 | ├── .env # Environment variables (API keys)
113 | └── README.md # Project documentation
114 | ```
115 |
116 | ## 📤 Output Files
117 |
118 | The agent generates two types of output files with timestamps:
119 |
120 | - 📸 `final_page_YYYYMMDD_HHMMSS.png`: Screenshot of the final page state
121 | - 📄 `scraped_content_YYYYMMDD_HHMMSS.txt`: Extracted text content from the page
122 |
123 | ## ⚙️ Customization
124 |
125 | You can modify the following parameters in the code:
126 |
127 | - 🖥️ Browser window size: Adjust `width` and `height` in `browse_and_scrape`
128 | - 👻 Headless mode: Set `headless=True` for invisible browser operation
129 | - 🔢 Number of Google results: Change `num_results` in `get_top_google_url`
130 |
131 | ## ❓ Troubleshooting
132 |
133 | - **🔌 Connection Issues**: `client.py` launches `main.py` itself over stdio, so confirm that `python main.py` starts without errors from the same directory
134 | - **🎭 Playwright Errors**: Make sure browsers are installed with `playwright install`
135 | - **🔑 API Key Errors**: Verify your Mistral API key is correctly set in the `.env` file
136 | - **🛣️ Path Errors**: Update the path to `main.py` in `client.py` if needed
137 |
138 | ## 📜 License
139 |
140 | [MIT License](LICENSE)
141 |
142 | ## 🤝 Contributing
143 |
144 | Contributions are welcome! Please feel free to submit a Pull Request.
145 |
146 | ---
147 |
148 | Built with 🧩 MCP, 🎭 Playwright, and 🧠 Mistral AI
149 |
```
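The timestamped output names described in the README's "Output Files" section can be produced with `datetime.strftime`. The following is a minimal sketch under that assumption; the helper name `output_paths` is hypothetical and does not appear in the repository.

```python
from datetime import datetime


def output_paths(now: datetime) -> tuple[str, str]:
    """Build the screenshot and text file names for one scrape (hypothetical helper)."""
    # %Y%m%d_%H%M%S yields the YYYYMMDD_HHMMSS pattern named in the README.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return f"final_page_{stamp}.png", f"scraped_content_{stamp}.txt"


if __name__ == "__main__":
    print(output_paths(datetime.now()))
```

Passing the timestamp in as an argument, rather than calling `datetime.now()` inside the helper, keeps both file names on the same stamp and makes the function easy to test.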
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
```
1 | playwright
2 | playwright-stealth
3 | langchain_mistralai
4 | python-dotenv
5 | mcp
6 | langchain-mcp-adapters
7 | langgraph
8 |
```
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
```dockerfile
1 | FROM python:3.10-slim
2 |
3 | # Set working directory
4 | WORKDIR /app
5 |
6 | # Copy files
7 | COPY . /app
8 |
9 | # Install dependencies
10 | RUN pip install --no-cache-dir \
11 | fastmcp \
12 | firecrawl \
13 | tavily-python \
14 | rich \
15 | beautifulsoup4 \
16 | python-dotenv \
17 | requests
18 |
19 | # Expose the port if needed (optional)
20 | EXPOSE 8080
21 |
22 | # Default command
23 | CMD ["python", "main.py"]
24 |
```
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
```toml
1 | [project]
2 | name = "browsing-mcp"
3 | version = "0.1.0"
4 | description = "Add your description here"
5 | readme = "README.md"
6 | requires-python = ">=3.10"
7 | dependencies = [
8 | "googlesearch-python>=1.3.0",
9 | "langchain-mcp-adapters>=0.0.9",
10 | "langchain-openai>=0.3.14",
11 | "playwright-stealth>=1.0.6",
12 | "playwright>=1.51.0",
13 | "python-dotenv>=1.1.0",
14 | "setuptools>=78.1.0",
15 | ]
16 |
```
--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------
```yaml
1 | build:
2 | dockerfile: Dockerfile
3 | dockerBuildPath: .
4 |
5 | startCommand:
6 | type: stdio
7 | configSchema:
8 | type: object
9 | required:
10 | - MISTRAL_API_KEY
11 | - FIRECRAWL_API_KEY
12 | - TAVILY_SEARCH_API
13 | properties:
14 | MISTRAL_API_KEY:
15 | type: string
16 | description: API key for Mistral AI
17 | FIRECRAWL_API_KEY:
18 | type: string
19 | description: API key for Firecrawl service
20 | TAVILY_SEARCH_API:
21 | type: string
22 | description: API key for Tavily search service
23 | commandFunction: |
24 | (config) => ({
25 | command: 'python',
26 | args: ['main.py'],
27 | env: {
28 | MISTRAL_API_KEY: config.MISTRAL_API_KEY,
29 | FIRECRAWL_API_KEY: config.FIRECRAWL_API_KEY,
30 | TAVILY_SEARCH_API: config.TAVILY_SEARCH_API
31 | }
32 | })
33 |
```
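The three keys listed under `required` in `smithery.yaml` could also be checked on the Python side before the server starts. This is a sketch only: `missing_keys` and `REQUIRED_KEYS` are hypothetical names, and nothing in the repository performs this check.

```python
import os

# Names copied from the `required` list in smithery.yaml.
REQUIRED_KEYS = ("MISTRAL_API_KEY", "FIRECRAWL_API_KEY", "TAVILY_SEARCH_API")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # Report the gaps instead of failing later inside an API client.
    print("Missing environment variables:", missing_keys())
```

Accepting any mapping as `env` means the check can be exercised with a plain dict instead of the real process environment.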
--------------------------------------------------------------------------------
/client.py:
--------------------------------------------------------------------------------
```python
1 | # Create server parameters for stdio connection
2 | from mcp import ClientSession, StdioServerParameters
3 | from mcp.client.stdio import stdio_client
4 | import asyncio
5 | from langchain_mcp_adapters.tools import load_mcp_tools
6 | from langgraph.prebuilt import create_react_agent
7 | from langchain_mistralai import ChatMistralAI
8 | import os
9 | from dotenv import load_dotenv
10 |
11 | # Load environment variables
12 | load_dotenv()
13 |
14 | # Initialize the Mistral AI model
15 | model = ChatMistralAI(
16 | model="mistral-small-latest",
17 | temperature=0.4,
18 | api_key=os.getenv("MISTRAL_API_KEY") # Ensure the API key is loaded
19 | )
20 |
21 | # Define server parameters
22 | server_params = StdioServerParameters(
23 | command="python",
24 | args=["main.py"], # Path to your server script
25 | )
26 |
27 | async def run_agent():
28 | try:
29 | # Connect to the server
30 | async with stdio_client(server_params) as (read, write):
31 | async with ClientSession(read, write) as session:
32 | await session.initialize()
33 | print("Client session initialized successfully.")
34 |
35 | # Load tools from the server
36 | tools = await load_mcp_tools(session)
37 | print("Tools loaded successfully.")
38 |
39 | # Create the agent
40 | agent = create_react_agent(model, tools)
41 |
42 | # Main loop for user interaction
43 | while True:
44 | query = input("Enter the query (or type 'exit' to quit): ")
45 | if query.lower() == 'exit':
46 | print("Exiting...")
47 | break
48 |
49 | # Invoke the agent with the user's query
50 | agent_response = await agent.ainvoke({"messages": query})
51 | print("Agent response:", agent_response["messages"][-1].content) # the last message holds the final answer
52 | except Exception as e:
53 | print(f"Error during client execution: {e}")
54 | raise
55 | finally:
56 | print("Client execution complete.")
57 |
58 | if __name__ == "__main__":
59 | # Run the agent in an asyncio event loop
60 | asyncio.run(run_agent())
61 |
62 |
```
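The number of messages in the agent's response depends on how many tool calls it made, so a fixed index into `agent_response["messages"]` is fragile. Below is a minimal sketch of reading the final reply by position from the end. `Message` is a stand-in for a LangChain message object and `final_answer` is a hypothetical helper; neither is part of `client.py`.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Stand-in for a LangChain message; only `content` matters here."""
    content: str


def final_answer(response: dict) -> str:
    """Return the content of the last message, or an empty string if there is none."""
    messages = response.get("messages", [])
    return messages[-1].content if messages else ""


if __name__ == "__main__":
    demo = {"messages": [Message("question"), Message("tool output"), Message("answer")]}
    print(final_answer(demo))
```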
--------------------------------------------------------------------------------
/.github/workflows/ci.yaml:
--------------------------------------------------------------------------------
```yaml
1 | name: CI/CD Pipeline
2 |
3 | on:
4 | push:
5 | branches: [ main, develop ]
6 | pull_request:
7 | branches: [ main, develop ]
8 | workflow_dispatch: # Allows manual triggering
9 |
10 | jobs:
11 | lint:
12 | name: Code Linting
13 | runs-on: ubuntu-latest
14 | steps:
15 | - uses: actions/checkout@v3
16 |
17 | - name: Set up Python
18 | uses: actions/setup-python@v4
19 | with:
20 | python-version: '3.10'
21 | cache: 'pip'
22 |
23 | - name: Install dependencies
24 | run: |
25 | python -m pip install --upgrade pip
26 | pip install flake8 black isort
27 | pip install -r requirements.txt
28 |
29 | - name: Check formatting with Black
30 | run: black --check .
31 |
32 | - name: Check imports with isort
33 | run: isort --check-only --profile black .
34 |
35 | - name: Lint with flake8
36 | run: flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
37 |
38 | test:
39 | name: Run Tests
40 | runs-on: ubuntu-latest
41 | needs: lint
42 | steps:
43 | - uses: actions/checkout@v3
44 |
45 | - name: Set up Python
46 | uses: actions/setup-python@v4
47 | with:
48 | python-version: '3.10'
49 | cache: 'pip'
50 |
51 | - name: Install dependencies
52 | run: |
53 | python -m pip install --upgrade pip
54 | pip install pytest pytest-asyncio pytest-cov
55 | pip install -r requirements.txt
56 |
57 | - name: Install Playwright browsers
58 | run: playwright install --with-deps chromium
59 |
60 | - name: Create .env file
61 | run: |
62 | echo "MISTRAL_API_KEY=${{ secrets.MISTRAL_API_KEY }}" > .env
63 |
64 | - name: Run tests
65 | run: pytest --cov=. --cov-report=xml
66 |
67 | - name: Upload coverage to Codecov
68 | uses: codecov/codecov-action@v3
69 | with:
70 | file: ./coverage.xml
71 | fail_ci_if_error: false
72 |
73 | build:
74 | name: Build Docker Image
75 | runs-on: ubuntu-latest
76 | if: github.event_name == 'push' && github.ref == 'refs/heads/main'
77 | needs: test
78 | steps:
79 | - uses: actions/checkout@v3
80 |
81 | - name: Set up Docker Buildx
82 | uses: docker/setup-buildx-action@v2
83 |
84 | - name: Login to GitHub Container Registry
85 | uses: docker/login-action@v2
86 | with:
87 | registry: ghcr.io
88 | username: ${{ github.actor }}
89 | password: ${{ secrets.GITHUB_TOKEN }}
90 |
91 | - name: Extract metadata
92 | id: meta
93 | uses: docker/metadata-action@v4
94 | with:
95 | images: ghcr.io/${{ github.repository }}
96 | tags: |
97 | type=sha,format=long
98 | type=ref,event=branch
99 | type=semver,pattern={{version}}
100 | latest
101 |
102 | - name: Build and push
103 | uses: docker/build-push-action@v4
104 | with:
105 | context: .
106 | push: true
107 | tags: ${{ steps.meta.outputs.tags }}
108 | labels: ${{ steps.meta.outputs.labels }}
109 | cache-from: type=gha
110 | cache-to: type=gha,mode=max
111 |
112 | deploy:
113 | name: Deploy to Dev Environment
114 | runs-on: ubuntu-latest
115 | if: github.event_name == 'push' && github.ref == 'refs/heads/main'
116 | needs: build
117 | environment: development
118 | steps:
119 | - name: Install SSH key
120 | uses: shimataro/ssh-key-action@v2
121 | with:
122 | key: ${{ secrets.SSH_PRIVATE_KEY }}
123 | known_hosts: ${{ secrets.KNOWN_HOSTS }}
124 |
125 | - name: Deploy to server
126 | run: |
127 | ssh ${{ secrets.SSH_USER }}@${{ secrets.SSH_HOST }} << 'EOF'
128 | cd /path/to/deployment
129 | docker pull ghcr.io/${{ github.repository }}:latest
130 | docker-compose down
131 | docker-compose up -d
132 | docker system prune -af
133 | EOF
134 |
```
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
```python
1 | from mcp.server.fastmcp import FastMCP
2 | from firecrawl import FirecrawlApp
3 | from tavily import TavilyClient
4 | import re
5 | import requests
6 | from bs4 import BeautifulSoup
7 | from rich.markdown import Markdown
8 | from rich.console import Console
9 | from dotenv import load_dotenv
10 | import os
11 | import io
12 |
13 | load_dotenv()
14 |
15 | mcp = FastMCP("Framework Summarizer")
16 |
17 | app = FirecrawlApp(api_key=os.getenv('FIRECRAWL_API_KEY'))
18 |
19 | tavily_client = TavilyClient(api_key=os.getenv("TAVILY_SEARCH_API"))
20 |
21 | def render_markdown(markdown_text: str) -> str:
22 | """Render markdown text into formatted output.
23 |
24 | This function uses the rich library to render markdown content with proper formatting.
25 | It captures the output in a string and handles various markdown elements.
26 |
27 | Args:
28 | markdown_text (str): The markdown text to render.
29 |
30 | Returns:
31 | str: The rendered markdown content with proper formatting.
32 |
33 | Example:
34 | >>> content = "# Hello World\n\nThis is **bold** text."
35 | >>> rendered = render_markdown(content)
36 | >>> print(rendered)
37 | """
38 | try:
39 | # Create a console that writes to a string buffer
40 | console = Console(file=io.StringIO())
41 |
42 | # Create and render the markdown
43 | md = Markdown(markdown_text)
44 | console.print(md)
45 |
46 | # Get the rendered content from the buffer
47 | rendered = console.file.getvalue()
48 | console.file.close()
49 |
50 | return rendered
51 | except Exception as e:
52 | return f"Error rendering markdown: {str(e)}"
53 |
54 | @mcp.tool()
55 | def search_and_scrape(query:str):
56 | """Search for content using Tavily and scrape the most relevant result.
57 |
58 | This function performs a two-step process:
59 | 1. Uses Tavily search API to find the most relevant URLs for a given query
60 | 2. Scrapes the content from the top-ranked URL using Firecrawl
61 |
62 | Args:
63 | query (str): The search query to find relevant content. This query will be used
64 | to search for and retrieve the most relevant webpage content.
65 |
66 | Returns:
67 | str: The scraped content in markdown format from the most relevant webpage.
68 |
69 | Example:
70 | >>> content = search_and_scrape("What is Python programming language?")
71 | >>> print(content)
72 |
73 | Raises:
74 | Exception: If the search fails or if the scraping process fails.
75 | """
76 | response = tavily_client.search(query, max_results=5)
77 | top_5_urls = [result['url'] for result in response.get('results', [])]
78 | url = top_5_urls[0]
79 | response = app.scrape_url(url=url, params={
80 | 'formats': [ 'markdown' ],
81 | })
82 | return response['markdown']
83 |
84 | @mcp.tool()
85 | def list_directory(path: str = ".") -> list:
86 | """List contents of a directory.
87 |
88 | This tool lists all files and directories in the specified path.
89 | If no path is provided, it lists the current directory.
90 |
91 | Args:
92 | path (str, optional): The directory path to list. Defaults to current directory (".").
93 |
94 | Returns:
95 | list: A list of dictionaries containing information about each item:
96 | - name: The name of the file/directory
97 | - type: Either "file" or "directory"
98 | - size: File size in bytes (for files only)
99 | - modified: Last modification timestamp
100 |
101 | Example:
102 | >>> contents = list_directory("/path/to/directory")
103 | >>> print(contents)
104 | """
105 | try:
106 | items = []
107 | for item in os.listdir(path):
108 | full_path = os.path.join(path, item)
109 | item_info = {
110 | "name": item,
111 | "type": "directory" if os.path.isdir(full_path) else "file",
112 | "modified": os.path.getmtime(full_path)
113 | }
114 | if item_info["type"] == "file":
115 | item_info["size"] = os.path.getsize(full_path)
116 | items.append(item_info)
117 | return items
118 | except Exception as e:
119 | return {"error": str(e)}
120 |
121 | @mcp.tool()
122 | def get_current_directory() -> str:
123 | """Get the current working directory.
124 |
125 | Returns:
126 | str: The absolute path of the current working directory.
127 |
128 | Example:
129 | >>> current_dir = get_current_directory()
130 | >>> print(current_dir)
131 | """
132 | return os.getcwd()
133 |
134 | @mcp.tool()
135 | def change_directory(path: str) -> str:
136 | """Change the current working directory.
137 |
138 | Args:
139 | path (str): The directory path to change to.
140 |
141 | Returns:
142 | str: The new current working directory path.
143 |
144 | Note:
145 | Errors are caught rather than raised: if the directory doesn't exist or is not accessible, a dict with an "error" key is returned.
146 |
147 | Example:
148 | >>> new_dir = change_directory("/path/to/directory")
149 | >>> print(new_dir)
150 | """
151 | try:
152 | os.chdir(path)
153 | return os.getcwd()
154 | except Exception as e:
155 | return {"error": str(e)}
156 |
157 | @mcp.tool()
158 | def file_info(path: str) -> dict:
159 | """Get detailed information about a file or directory.
160 |
161 | Args:
162 | path (str): The path to the file or directory. Can be obtained from list_all_files()["files"][i]["path"].
163 |
164 | Returns:
165 | dict: A dictionary containing:
166 | - exists: Whether the path exists
167 | - type: "file" or "directory"
168 | - size: Size in bytes (for files)
169 | - created: Creation timestamp
170 | - modified: Last modification timestamp
171 | - accessed: Last access timestamp
172 | - absolute_path: Full absolute path
173 |
174 | Example:
175 | >>> # Get all files first
176 | >>> all_files = list_all_files()
177 | >>> # Get info for first file
178 | >>> info = file_info(all_files["files"][0]["path"])
179 | >>> print(info)
180 | """
181 | try:
182 | info = {
183 | "exists": os.path.exists(path),
184 | "absolute_path": os.path.abspath(path)
185 | }
186 |
187 | if info["exists"]:
188 | info.update({
189 | "type": "directory" if os.path.isdir(path) else "file",
190 | "created": os.path.getctime(path),
191 | "modified": os.path.getmtime(path),
192 | "accessed": os.path.getatime(path)
193 | })
194 |
195 | if info["type"] == "file":
196 | info["size"] = os.path.getsize(path)
197 |
198 | return info
199 | except Exception as e:
200 | return {"error": str(e)}
201 |
202 | @mcp.tool()
203 | def create_directory(path: str) -> dict:
204 | """Create a new directory.
205 |
206 | Args:
207 | path (str): The path where the directory should be created.
208 |
209 | Returns:
210 | dict: A dictionary containing:
211 | - success: Boolean indicating if creation was successful
212 | - path: The created directory path
213 | - error: Error message if creation failed
214 |
215 | Example:
216 | >>> result = create_directory("/path/to/new/directory")
217 | >>> print(result)
218 | """
219 | try:
220 | os.makedirs(path, exist_ok=True)
221 | return {
222 | "success": True,
223 | "path": os.path.abspath(path)
224 | }
225 | except Exception as e:
226 | return {
227 | "success": False,
228 | "error": str(e)
229 | }
230 |
231 | @mcp.tool()
232 | def scrape_content(url):
233 | """Scrape paragraph text from a given URL.
234 |
235 | This tool fetches the page with requests and parses it with BeautifulSoup.
236 | It strips links, scripts, and styles, then returns the text of the remaining <p> elements.
237 |
238 | Args:
239 | url (str): The URL of the webpage to scrape. Must be a valid HTTP/HTTPS URL.
240 |
241 | Returns:
242 | str: The paragraph text joined by newlines, or an error message if the status code is not 200.
243 |
244 | Example:
245 | >>> content = scrape_content("https://example.com")
246 | >>> print(content)
247 |
248 | Raises:
249 | Exception: If the URL is invalid or if the scraping process fails.
250 | """
251 | headers = {"User-Agent": "Mozilla/5.0"} # Bypass simple bot detection
252 | response = requests.get(url, headers=headers,timeout=10)
253 |
254 | if response.status_code == 200:
255 | soup = BeautifulSoup(response.text, "html.parser")
256 |
257 | # Remove <a> (link), <script>, <style>, and <noscript> tags
258 | for tag in soup(["a", "script", "style", "noscript"]):
259 | tag.decompose()
260 |
261 | # Extract clean text from <p> tags
262 | paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
263 |
264 | return "\n".join(paragraphs)
265 |
266 | else:
267 | return f"Error: Unable to scrape. Status code {response.status_code}"
268 |
269 | @mcp.tool()
270 | def read_file_content(file_path: str, start_line: int = 1, end_line: int = None) -> dict:
271 | """Read and display the contents of a file with proper formatting.
272 |
273 | This tool reads a file and returns its contents with metadata. For text files,
274 | it can optionally return specific line ranges. For markdown files, it includes
275 | rendered content.
276 |
277 | Args:
278 | file_path (str): The path to the file to read. Can be obtained from list_all_files()["files"][i]["path"].
279 | start_line (int, optional): Starting line number to read. Defaults to 1.
280 | end_line (int, optional): Ending line number to read. If None, reads entire file.
281 |
282 | Returns:
283 | dict: A dictionary containing:
284 | - content: The file contents
285 | - rendered_content: Rendered markdown if applicable
286 | - metadata: File information (size, type, etc.)
287 | - error: Error message if reading fails
288 |
289 | Example:
290 | >>> # Get all files first
291 | >>> all_files = list_all_files()
292 | >>> # Read content of first file
293 | >>> result = read_file_content(all_files["files"][0]["path"])
294 | >>> print(result["content"])
295 | """
296 | try:
297 | # Get file information
298 | info = file_info(file_path)
299 | if not info["exists"]:
300 | return {"error": f"File not found: {file_path}"}
301 |
302 | # Read file content
303 | with open(file_path, 'r', encoding='utf-8') as file:
304 | if end_line is None:
305 | content = file.read()
306 | else:
307 | lines = file.readlines()
308 | content = ''.join(lines[start_line-1:end_line])
309 |
310 | result = {
311 | "content": content,
312 | "metadata": info
313 | }
314 |
315 | # If it's a markdown file, add rendered content
316 | if file_path.lower().endswith(('.md', '.markdown')):
317 | result["rendered_content"] = render_markdown(content)
318 |
319 | return result
320 |
321 | except Exception as e:
322 | return {"error": f"Error reading file: {str(e)}"}
323 |
324 | @mcp.tool()
325 | def preview_file(file_path: str, num_lines: int = 10) -> dict:
326 | """Preview the beginning of a file.
327 |
328 | This tool reads and displays the first few lines of a file, useful for
329 | quick file content inspection.
330 |
331 | Args:
332 | file_path (str): The path to the file to preview. Can be obtained from list_all_files()["files"][i]["path"].
333 | num_lines (int, optional): Number of lines to preview. Defaults to 10.
334 |
335 | Returns:
336 | dict: A dictionary containing:
337 | - preview: The first few lines of the file
338 | - total_lines: Total number of lines in the file
339 | - metadata: File information
340 | - error: Error message if reading fails
341 |
342 | Example:
343 | >>> # Get all files first
344 | >>> all_files = list_all_files()
345 | >>> # Preview first file
346 | >>> preview = preview_file(all_files["files"][0]["path"], num_lines=5)
347 | >>> print(preview["preview"])
348 | """
349 | try:
350 | # Get file information
351 | info = file_info(file_path)
352 | if not info["exists"]:
353 | return {"error": f"File not found: {file_path}"}
354 |
355 | # Read first few lines
356 | with open(file_path, 'r', encoding='utf-8') as file:
357 | lines = file.readlines()
358 | preview = ''.join(lines[:num_lines])
359 |
360 | return {
361 | "preview": preview,
362 | "total_lines": len(lines),
363 | "metadata": info
364 | }
365 |
366 | except Exception as e:
367 | return {"error": f"Error previewing file: {str(e)}"}
368 |
369 | @mcp.tool()
370 | def list_all_files(path: str = ".", exclude_dirs: list = None) -> dict:
371 | """Recursively list all files in a directory and its subdirectories.
372 |
373 | This tool walks through all directories and subdirectories to find all files,
374 | with an option to exclude specific directories by name.
375 |
376 | Args:
377 | path (str, optional): The root directory to start from. Defaults to current directory (".").
378 | exclude_dirs (list, optional): List of directory names to exclude (e.g., ['node_modules', '.git']).
379 |
380 | Returns:
381 | dict: A dictionary containing:
382 | - files: List of dictionaries with file information:
383 | - path: Full path to the file
384 | - name: File name
385 | - size: File size in bytes
386 | - type: File type (extension)
387 | - modified: Last modification timestamp
388 | - total_files: Total number of files found
389 | - total_size: Total size of all files in bytes
390 | - error: Error message if operation fails
391 |
392 | Example:
393 | >>> result = list_all_files("/path/to/directory", exclude_dirs=['node_modules'])
394 | >>> print(result["files"])
395 | """
396 | try:
397 | if exclude_dirs is None:
398 | exclude_dirs = ['.git', 'node_modules', '__pycache__', '.venv', 'venv']
399 |
400 | files = []
401 | total_size = 0
402 |
403 | for root, dirs, files_in_dir in os.walk(path):
404 | # Skip excluded directories
405 | dirs[:] = [d for d in dirs if d not in exclude_dirs]
406 |
407 | for file in files_in_dir:
408 | file_path = os.path.join(root, file)
409 | file_info = {
410 | "path": file_path,
411 | "name": file,
412 | "size": os.path.getsize(file_path),
413 | "type": os.path.splitext(file)[1],
414 | "modified": os.path.getmtime(file_path)
415 | }
416 | files.append(file_info)
417 | total_size += file_info["size"]
418 |
419 | return {
420 | "files": files,
421 | "total_files": len(files),
422 | "total_size": total_size,
423 | "excluded_dirs": exclude_dirs
424 | }
425 |
426 | except Exception as e:
427 | return {"error": f"Error listing files: {str(e)}"}
428 |
429 | @mcp.tool()
430 | def find_files_by_type(path: str = ".", file_type: str = None) -> dict:
431 | """Find all files of a specific type in a directory and its subdirectories.
432 |
433 | Args:
434 | path (str, optional): The root directory to start from. Defaults to current directory (".").
435 | file_type (str, optional): The file extension to search for (e.g., '.py', '.js', '.md').
436 |
437 | Returns:
438 | dict: A dictionary containing:
439 | - files: List of matching files with their details
440 | - total_matches: Number of files found
441 | - file_type: The type of files searched for
442 |
443 | Example:
444 | >>> result = find_files_by_type("/path/to/directory", file_type=".py")
445 | >>> print(result["files"])
446 | """
447 | try:
448 | all_files = list_all_files(path)
449 | if "error" in all_files:
450 | return all_files
451 |
452 | if file_type:
453 | if not file_type.startswith('.'):
454 | file_type = '.' + file_type
455 |
456 | matching_files = [
457 | file for file in all_files["files"]
458 | if file["type"].lower() == file_type.lower()
459 | ]
460 | else:
461 | matching_files = all_files["files"]
462 |
463 | return {
464 | "files": matching_files,
465 | "total_matches": len(matching_files),
466 | "file_type": file_type
467 | }
468 |
469 | except Exception as e:
470 | return {"error": f"Error finding files: {str(e)}"}
471 |
472 | if __name__ == "__main__":
473 | import sys
474 | print("Starting MCP server...", file=sys.stderr) # stdout carries the stdio transport's protocol messages
475 | mcp.run(transport='stdio')
476 |
```
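`list_all_files` skips excluded directories by assigning to `dirs[:]` inside the `os.walk` loop. The sketch below isolates that technique on a temporary directory tree. The names `walk_files` and `demo` are illustrative and not part of `main.py`.

```python
import os
import tempfile


def walk_files(root, exclude_dirs=(".git", "node_modules")):
    """Return files under root, as paths relative to root, skipping excluded directory names."""
    found = []
    for current, dirs, files in os.walk(root):  # top-down traversal is the default
        # Slice assignment mutates the list that os.walk itself holds, so the
        # removed directories are never visited. Rebinding with `dirs = [...]`
        # would leave os.walk's list untouched and prune nothing.
        dirs[:] = [d for d in dirs if d not in exclude_dirs]
        for name in files:
            found.append(os.path.relpath(os.path.join(current, name), root))
    return sorted(found)


def demo():
    """Build a small tree containing a node_modules directory and walk it."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "src"))
        os.makedirs(os.path.join(root, "node_modules", "pkg"))
        for rel in ("src/app.py", "node_modules/pkg/index.js", "README.md"):
            with open(os.path.join(root, *rel.split("/")), "w") as fh:
                fh.write("x")
        return walk_files(root)


if __name__ == "__main__":
    print(demo())
```

Pruning only works in top-down mode. With `topdown=False`, subdirectories are visited before their parent is yielded, so editing `dirs` has no effect on the traversal.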