# Directory Structure ``` ├── demonstration.gif ├── Dockerfile ├── icon.png ├── LICENSE ├── MANIFEST.in ├── pyproject.toml ├── README.md ├── smithery.yaml ├── src │ ├── computer_control_mcp │ │ ├── __init__.py │ │ ├── __main__.py │ │ ├── cli.py │ │ ├── core.py │ │ ├── FZYTK.TTF │ │ ├── gui.py │ │ ├── server.py │ │ ├── test_image.png │ │ └── test.py │ └── README.md ├── tests │ ├── conftest.py │ ├── rapidocr_test.py │ ├── README.md │ ├── run_cli.py │ ├── run_server.py │ ├── setup.py │ ├── test_computer_control.py │ └── test_screenshot.py └── uv.lock ``` # Files -------------------------------------------------------------------------------- /tests/README.md: -------------------------------------------------------------------------------- ```markdown # Computer Control MCP Tests This directory contains the tests for the Computer Control MCP package. ## Running Tests To run the tests, use pytest: ```bash pytest ``` Or with specific test: ```bash pytest tests/test_computer_control.py ``` ## Test Structure - `conftest.py`: Pytest configuration - `test_computer_control.py`: Tests for the core functionality ``` -------------------------------------------------------------------------------- /src/README.md: -------------------------------------------------------------------------------- ```markdown # Computer Control MCP Source Code This directory contains the source code for the Computer Control MCP package. ## Structure - `computer_control_mcp/`: Main package directory - `__init__.py`: Package initialization - `__main__.py`: Entry point for running as a module - `core.py`: Core functionality - `cli.py`: Command-line interface - `gui.py`: Graphical user interface for testing ``` -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- ```markdown # Computer Control MCP ### MCP server that provides computer control capabilities, like mouse, keyboard, OCR, etc. using PyAutoGUI, RapidOCR, ONNXRuntime. Similar to 'computer-use' by Anthropic. With Zero External Dependencies. <div align="center" style="text-align:center;font-family: monospace; display: flex; align-items: center; justify-content: center; width: 100%; gap: 10px"> <a href="https://nextjs-boilerplate-ashy-nine-64.vercel.app/demo-computer-control"><img src="https://komarev.com/ghpvc/?username=AB498&label=DEMO&style=for-the-badge&color=CC0000" /></a> <a href="https://discord.gg/ZeeqSBpjU2"><img src="https://img.shields.io/discord/1095854826786668545?style=for-the-badge&color=0000CC" alt="Discord"></a> <a href="https://img.shields.io/badge/License-MIT-yellow.svg"><img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge&color=00CC00" alt="License: MIT"></a> <a href="https://pypi.org/project/computer-control-mcp"><img src="https://img.shields.io/pypi/v/computer-control-mcp?style=for-the-badge" alt="PyPi"></a> </div> ---  ## Quick Usage (MCP Setup Using `uvx`) ***Note:** Running `uvx computer-control-mcp@latest` for the first time will download python dependencies (around 70MB) which may take some time. Recommended to run this in a terminal before using it as MCP. Subsequent runs will be instant.* ```json { "mcpServers": { "computer-control-mcp": { "command": "uvx", "args": ["computer-control-mcp@latest"] } } } ``` OR install globally with `pip`: ```bash pip install computer-control-mcp ``` Then run the server with: ```bash computer-control-mcp # instead of uvx computer-control-mcp, so you can use the latest version, also you can `uv cache clean` to clear the cache and `uvx` again to use latest version. ``` ## Features - Control mouse movements and clicks - Type text at the current cursor position - Take screenshots of the entire screen or specific windows with optional saving to downloads directory - Extract text from screenshots using OCR (Optical Character Recognition) - List and activate windows - Press keyboard keys - Drag and drop operations ## Available Tools ### Mouse Control - `click_screen(x: int, y: int)`: Click at specified screen coordinates - `move_mouse(x: int, y: int)`: Move mouse cursor to specified coordinates - `drag_mouse(from_x: int, from_y: int, to_x: int, to_y: int, duration: float = 0.5)`: Drag mouse from one position to another - `mouse_down(button: str = "left")`: Hold down a mouse button ('left', 'right', 'middle') - `mouse_up(button: str = "left")`: Release a mouse button ('left', 'right', 'middle') ### Keyboard Control - `type_text(text: str)`: Type the specified text at current cursor position - `press_key(key: str)`: Press a specified keyboard key - `key_down(key: str)`: Hold down a specific keyboard key until released - `key_up(key: str)`: Release a specific keyboard key - `press_keys(keys: Union[str, List[Union[str, List[str]]]])`: Press keyboard keys (supports single keys, sequences, and combinations) ### Screen and Window Management - `take_screenshot(title_pattern: str = None, use_regex: bool = False, threshold: int = 60, scale_percent_for_ocr: int = None, save_to_downloads: bool = False)`: Capture screen or window - `take_screenshot_with_ocr(title_pattern: str = None, use_regex: bool = False, threshold: int = 10, scale_percent_for_ocr: int = None, save_to_downloads: bool = False)`: Extract adn return text with coordinates using OCR from screen or window - `get_screen_size()`: Get current screen resolution - `list_windows()`: List all open windows - `activate_window(title_pattern: str, use_regex: bool = False, threshold: int = 60)`: Bring specified window to foreground ## Development ### Setting up the Development Environment ```bash # Clone the repository git clone https://github.com/AB498/computer-control-mcp.git cd computer-control-mcp # Install in development mode pip install -e . # Start server python -m computer_control_mcp.core # -- OR -- # Build hatch build # Non-windows pip install dist/*.whl --upgrade # Windows $latest = Get-ChildItem .\dist\*.whl | Sort-Object LastWriteTime -Descending | Select-Object -First 1 pip install $latest.FullName --upgrade # Run computer-control-mcp ``` ### Running Tests ```bash python -m pytest ``` ## API Reference See the [API Reference](docs/api.md) for detailed information about the available functions and classes. ## License MIT ## For more information or help - [Email ([email protected])](mailto:[email protected]) - [Discord (CodePlayground)](https://discord.gg/ZeeqSBpjU2) ``` -------------------------------------------------------------------------------- /tests/run_cli.py: -------------------------------------------------------------------------------- ```python #!/usr/bin/env python """ Simple script to run the Computer Control MCP CLI. """ from computer_control_mcp.cli import main if __name__ == "__main__": main() ``` -------------------------------------------------------------------------------- /tests/conftest.py: -------------------------------------------------------------------------------- ```python """ Pytest configuration file. """ import pytest import sys import os # Add the src directory to the Python path sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "src"))) ``` -------------------------------------------------------------------------------- /src/computer_control_mcp/__init__.py: -------------------------------------------------------------------------------- ```python """ Computer Control MCP - Python package for computer control via MCP. This package provides computer control capabilities using PyAutoGUI through a Model Context Protocol (MCP) server. """ from computer_control_mcp.core import mcp, main __version__ = "0.1.2" __all__ = ["mcp", "main"] ``` -------------------------------------------------------------------------------- /src/computer_control_mcp/server.py: -------------------------------------------------------------------------------- ```python """ Server module for Computer Control MCP. This module provides a simple way to run the MCP server. """ from computer_control_mcp.core import main as run_server def main(): """Run the MCP server.""" print("Starting Computer Control MCP server...") run_server() if __name__ == "__main__": main() ``` -------------------------------------------------------------------------------- /tests/run_server.py: -------------------------------------------------------------------------------- ```python #!/usr/bin/env python """ Simple script to run the Computer Control MCP server. """ # import sys # import os # sys.path.insert(0, os.path.join(os.path.dirname(__file__), "src")) # from computer_control_mcp.core import main from computer_control_mcp.core import main if __name__ == "__main__": print("Starting Computer Control MCP server...") main() ``` -------------------------------------------------------------------------------- /smithery.yaml: -------------------------------------------------------------------------------- ```yaml # Smithery configuration file: https://smithery.ai/docs/config#smitheryyaml startCommand: type: stdio configSchema: # JSON Schema defining the configuration options for the MCP. type: object description: Empty config commandFunction: # A JS function that produces the CLI command based on the given config to start the MCP on stdio. |- (config) => ({ command: 'python', args: ['src/computer_control_mcp/core.py'] }) exampleConfig: {} ``` -------------------------------------------------------------------------------- /tests/setup.py: -------------------------------------------------------------------------------- ```python #!/usr/bin/env python """ Backward compatibility setup.py file for Computer Control MCP. This file is provided for backward compatibility with tools that don't support pyproject.toml. """ import setuptools if __name__ == "__main__": try: setuptools.setup() except Exception as e: print(f"Error: {e}") print("\nThis package uses pyproject.toml for configuration.") print("Please use a PEP 517 compatible build tool like pip or build.") print("For example: pip install .") ``` -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- ```dockerfile # Use a lightweight Python base image FROM python:3.12-slim # Set environment variables for Python ENV PYTHONDONTWRITEBYTECODE=1 \ PYTHONUNBUFFERED=1 # Set working directory WORKDIR /app # Copy dependency file(s) COPY pyproject.toml . COPY src/ src/ COPY README.md README.md # Install build backend (Hatchling) RUN pip install --upgrade pip && \ pip install hatchling && \ pip install -e . # Copy any additional files (e.g. configs, CLI, entrypoints) COPY . . # Default command (can be overridden) CMD ["python", "-m", "computer_control_mcp"] ``` -------------------------------------------------------------------------------- /src/computer_control_mcp/__main__.py: -------------------------------------------------------------------------------- ```python """ Entry point for running the Computer Control MCP as a module. This module serves as the main entry point for the package. When executed directly (e.g., with `python -m computer_control_mcp`), it will run the CLI interface. For CLI functionality, use: computer-control-mcp <command> python -m computer_control_mcp <command> """ from computer_control_mcp.cli import main as cli_main def main(): """Main entry point for the package.""" # Run the CLI when the module is executed directly cli_main() if __name__ == "__main__": main() ``` -------------------------------------------------------------------------------- /tests/test_screenshot.py: -------------------------------------------------------------------------------- ```python import sys sys.path.append('src') from computer_control_mcp.core import take_screenshot # Test with save_to_downloads=False result = take_screenshot(mode='whole_screen', save_to_downloads=False) print('Base64 image included:', 'base64_image' in result) print('MCP Image included:', 'image' in result) # Test with save_to_downloads=True result = take_screenshot(mode='whole_screen', save_to_downloads=True) print('Base64 image included:', 'base64_image' in result) print('MCP Image included:', 'image' in result) print('File path included:', 'file_path' in result) ``` -------------------------------------------------------------------------------- /tests/rapidocr_test.py: -------------------------------------------------------------------------------- ```python import cv2 from rapidocr import RapidOCR from rapidocr_onnxruntime import VisRes image_path = r"C:\Users\Admin\AppData\Local\Temp\tmpdw2d8r14\screenshot_20250815_033153_f99a8396.png" img = cv2.imread(image_path) if img is None: print(f"Failed to load img: {image_path}") else: print(f"Loaded img: {image_path}, shape: {img.shape}") engine = RapidOCR() vis = VisRes() output = engine(img) # Separate into boxes, texts, and scores boxes = output.boxes txts = output.txts scores = output.scores zipped_results = list(zip(boxes, txts, scores)) print(f"Found {len(zipped_results)} text items in OCR result.") print(f"First 10 items: {str(zipped_results).encode("utf-8", errors="ignore")}") ``` -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- ```toml [build-system] requires = ["hatchling"] build-backend = "hatchling.build" [project] name = "computer-control-mcp" version = "0.3.6" description = "MCP server that provides computer control capabilities, like mouse, keyboard, OCR, etc. using PyAutoGUI, RapidOCR, ONNXRuntime. Similar to 'computer-use' by Anthropic. With Zero External Dependencies." readme = "README.md" requires-python = ">=3.8" license = {text = "MIT"} authors = [{name = "AB498", email = "[email protected]"}] classifiers = [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.12", "Topic :: Software Development :: Libraries", "Topic :: Utilities" ] dependencies = [ "pyautogui==0.9.54", "mcp[cli]==1.13.0", "pillow==11.3.0", "pygetwindow==0.0.9", "pywinctl==0.4.1", "fuzzywuzzy==0.18.0", "rapidocr==3.3.1", "onnxruntime==1.22.0", "rapidocr_onnxruntime==1.2.3", "opencv-python==4.12.0.88", "python-Levenshtein>=0.20.9", "mss>=7.0.0" ] [project.urls] Homepage = "https://github.com/AB498/computer-control-mcp" Issues = "https://github.com/AB498/computer-control-mcp/issues" Documentation = "https://github.com/AB498/computer-control-mcp#readme" [project.scripts] computer-control-mcp = "computer_control_mcp.cli:main" computer-control-mcp-server = "computer_control_mcp.server:main" [tool.hatch.build] sources = ["src"] packages = ["src/computer_control_mcp"] [tool.hatch.build.targets.wheel] packages = ["src/computer_control_mcp"] ``` -------------------------------------------------------------------------------- /src/computer_control_mcp/gui.py: -------------------------------------------------------------------------------- ```python """ GUI Test Harness for Computer Control MCP. This module provides a graphical user interface for testing the Computer Control MCP functionality. """ import tkinter as tk from tkinter import ttk, scrolledtext from PIL import Image, ImageTk import pyautogui import json import io from computer_control_mcp.core import mcp class TestHarnessGUI: def __init__(self, root): self.root = root self.root.title("Computer Control Test Harness") self.root.geometry("800x600") # Create main frame with scrollbar self.main_frame = ttk.Frame(root) self.main_frame.pack(fill=tk.BOTH, expand=True, padx=10, pady=10) # Create test sections self.create_click_test_section() self.create_type_text_section() self.create_screenshot_section() self.create_output_section() # Initialize test results self.test_results = {} def create_click_test_section(self): frame = ttk.LabelFrame(self.main_frame, text="Mouse Click Test") frame.pack(fill=tk.X, padx=5, pady=5) # Coordinates input coord_frame = ttk.Frame(frame) coord_frame.pack(fill=tk.X, padx=5, pady=5) ttk.Label(coord_frame, text="X:").pack(side=tk.LEFT) self.x_entry = ttk.Entry(coord_frame, width=10) self.x_entry.pack(side=tk.LEFT, padx=5) ttk.Label(coord_frame, text="Y:").pack(side=tk.LEFT) self.y_entry = ttk.Entry(coord_frame, width=10) self.y_entry.pack(side=tk.LEFT, padx=5) ttk.Button(frame, text="Test Click", command=self.test_click).pack(pady=5) def create_type_text_section(self): frame = ttk.LabelFrame(self.main_frame, text="Type Text Test") frame.pack(fill=tk.X, padx=5, pady=5) ttk.Label(frame, text="Text to type:").pack(pady=2) self.text_entry = ttk.Entry(frame) self.text_entry.pack(fill=tk.X, padx=5, pady=2) ttk.Button(frame, text="Test Type Text", command=self.test_type_text).pack(pady=5) def create_screenshot_section(self): frame = ttk.LabelFrame(self.main_frame, text="Screenshot Test") frame.pack(fill=tk.X, padx=5, pady=5) ttk.Button(frame, text="Take Screenshot", command=self.test_screenshot).pack(pady=5) # Canvas for screenshot preview self.screenshot_canvas = tk.Canvas(frame, width=200, height=150) self.screenshot_canvas.pack(pady=5) def create_output_section(self): frame = ttk.LabelFrame(self.main_frame, text="Test Output") frame.pack(fill=tk.BOTH, expand=True, padx=5, pady=5) self.output_text = scrolledtext.ScrolledText(frame, height=10) self.output_text.pack(fill=tk.BOTH, expand=True, padx=5, pady=5) def log_output(self, test_name, request_data, response_data): self.output_text.insert(tk.END, f"\n===== TEST: {test_name} =====\n") self.output_text.insert(tk.END, f"REQUEST: {json.dumps(request_data, indent=2)}\n") self.output_text.insert(tk.END, f"RESPONSE: {response_data}\n") self.output_text.insert(tk.END, "======================\n") self.output_text.see(tk.END) def test_click(self): try: x = int(self.x_entry.get()) y = int(self.y_entry.get()) request_data = {"x": x, "y": y} result = mcp.click_screen(**request_data) self.log_output("click_screen", request_data, result) except Exception as e: self.log_output("click_screen", request_data, f"Error: {str(e)}") def test_type_text(self): try: text = self.text_entry.get() request_data = {"text": text} result = mcp.type_text(**request_data) self.log_output("type_text", request_data, result) except Exception as e: self.log_output("type_text", request_data, f"Error: {str(e)}") def test_screenshot(self): try: result = mcp.take_screenshot() # Convert bytes to image for preview image = Image.open(io.BytesIO(result.data)) # Resize for preview image.thumbnail((200, 150)) photo = ImageTk.PhotoImage(image) self.screenshot_canvas.create_image(100, 75, image=photo) self.screenshot_canvas.image = photo # Keep reference self.log_output("take_screenshot", {}, "Screenshot taken successfully") except Exception as e: self.log_output("take_screenshot", {}, f"Error: {str(e)}") def main(): root = tk.Tk() app = TestHarnessGUI(root) root.mainloop() if __name__ == "__main__": main() ``` -------------------------------------------------------------------------------- /src/computer_control_mcp/cli.py: -------------------------------------------------------------------------------- ```python """ Command-line interface for Computer Control MCP. This module provides a command-line interface for interacting with the Computer Control MCP. """ import argparse import sys from computer_control_mcp.core import mcp, main as run_server def parse_args(): """Parse command-line arguments.""" parser = argparse.ArgumentParser(description="Computer Control MCP CLI") subparsers = parser.add_subparsers(dest="command", help="Command to run") # Server command server_parser = subparsers.add_parser("server", help="Run the MCP server") # Click command click_parser = subparsers.add_parser("click", help="Click at specified coordinates") click_parser.add_argument("x", type=int, help="X coordinate") click_parser.add_argument("y", type=int, help="Y coordinate") # Type text command type_parser = subparsers.add_parser("type", help="Type text at current cursor position") type_parser.add_argument("text", help="Text to type") # Screenshot command screenshot_parser = subparsers.add_parser("screenshot", help="Take a screenshot") screenshot_parser.add_argument("--mode", choices=["all_windows", "single_window", "whole_screen"], default="whole_screen", help="Screenshot mode") screenshot_parser.add_argument("--title", help="Window title pattern (for single_window mode)") screenshot_parser.add_argument("--regex", action="store_true", help="Use regex for title matching") screenshot_parser.add_argument("--output", help="Output file path (if not provided, saves to downloads directory)") screenshot_parser.add_argument("--no-save", action="store_true", help="Don't save images to downloads directory") # List windows command subparsers.add_parser("list-windows", help="List all open windows") # GUI command subparsers.add_parser("gui", help="Launch the GUI test harness") return parser.parse_args() def main(): """Main entry point for the CLI.""" args = parse_args() if args.command == "server": run_server() elif args.command == "click": # Call the tool using the call_tool method import asyncio result = asyncio.run(mcp.call_tool("click_screen", {"x": args.x, "y": args.y})) print(result) elif args.command == "type": # Call the tool using the call_tool method import asyncio result = asyncio.run(mcp.call_tool("type_text", {"text": args.text})) print(result) elif args.command == "screenshot": if args.mode == "single_window" and not args.title: print("Error: --title is required for single_window mode") sys.exit(1) # Call the tool using the call_tool method import asyncio result = asyncio.run(mcp.call_tool("take_screenshot", { "mode": args.mode, "title_pattern": args.title, "use_regex": args.regex, "save_to_downloads": not args.no_save })) if args.output: # Save the screenshot to a specific file path provided by user with open(args.output, "wb") as f: f.write(result.image.data) print(f"Screenshot saved to {args.output}") elif hasattr(result, 'file_path'): # If image was saved to downloads, show the path print(f"Screenshot saved to {result.file_path}") else: print("Screenshot taken successfully") # If we have multiple results (all_windows mode) if args.mode == "all_windows" and isinstance(result, list): print("\nAll screenshots:") for i, item in enumerate(result): if hasattr(item, 'file_path'): window_title = item.window_info.title if hasattr(item, 'window_info') else f"Window {i+1}" print(f"{i+1}. {window_title}: {item.file_path}") elif args.command == "list-windows": # Call the tool using the call_tool method import asyncio result = asyncio.run(mcp.call_tool("list_windows", {})) # Parse the result windows = [] for item in result: if hasattr(item, 'text'): try: import json window_info = json.loads(item.text) windows.append(window_info) except json.JSONDecodeError: print(f"Failed to parse window info: {item.text}") # Display the windows for i, window in enumerate(windows): print(f"{i+1}. {window.get('title')} ({window.get('width')}x{window.get('height')})") elif args.command == "gui": from computer_control_mcp.gui import main as run_gui run_gui() else: # When no command is specified, run the server by default print("MCP server started!") run_server() if __name__ == "__main__": main() ``` -------------------------------------------------------------------------------- /tests/test_computer_control.py: -------------------------------------------------------------------------------- ```python """ Tests for the Computer Control MCP package. """ import pytest from unittest.mock import Mock, patch import json import sys import tkinter as tk from tkinter import ttk import asyncio import os import ast from computer_control_mcp.core import mcp # Helper function to print request/response JSON, skipping non-serializable properties def print_json_data(name, request_data=None, response_data=None): def serialize(obj): try: json.dumps(obj) return obj except (TypeError, OverflowError): return str(obj) print(f"\n===== TEST: {name} =====", file=sys.stderr) if isinstance(request_data, dict): serializable_request = {k: serialize(v) for k, v in request_data.items()} print(f"REQUEST: {json.dumps(serializable_request, indent=2)}", file=sys.stderr) elif request_data is not None: print(f"REQUEST: {serialize(request_data)}", file=sys.stderr) if response_data is not None: if isinstance(response_data, dict): serializable_response = {k: serialize(v) for k, v in response_data.items()} print( f"RESPONSE: {json.dumps(serializable_response, indent=2)}", file=sys.stderr, ) else: print(f"RESPONSE: {serialize(response_data)}", file=sys.stderr) print("======================\n", file=sys.stderr) # Test drag_mouse tool @pytest.mark.asyncio async def test_drag_mouse(): # Test data test_window = tk.Tk() test_window.title("Test Drag Mouse") test_window.geometry("400x400") # Update the window to ensure coordinates are calculated test_window.update_idletasks() test_window.update() # Window title coordinates window_x = test_window.winfo_x() window_y = test_window.winfo_y() screen_width = test_window.winfo_screenwidth() screen_height = test_window.winfo_screenheight() center_x = screen_width // 2 center_y = screen_height // 2 request_data = { "from_x": window_x + 55, "from_y": window_y + 15, "to_x": center_x, "to_y": center_y, "duration": 1.0, } print(f"starting coordinates: x={window_x}, y={window_y}", file=sys.stderr) # Create an event to track completion drag_complete = asyncio.Event() async def perform_drag(): try: result = await mcp.call_tool("drag_mouse", request_data) print(f"Result: {result}", file=sys.stderr) finally: drag_complete.set() # Start the drag operation drag_task = asyncio.create_task(perform_drag()) # Keep updating the window while waiting for drag to complete while not drag_complete.is_set(): test_window.update() await asyncio.sleep(0.01) # Small delay to prevent high CPU usage # Wait for drag operation to complete await drag_task window_x_end = test_window.winfo_x() window_y_end = test_window.winfo_y() print(f'ending coordinates: x={window_x_end}, y={window_y_end}', file=sys.stderr) assert window_y_end != window_y and window_x_end != window_x test_window.destroy() # Test list_windows tool @pytest.mark.asyncio async def test_list_windows(): # open tkinter test_window = tk.Tk() test_window.title("Test Window") test_window.geometry("400x400") # Update the window to ensure coordinates are calculated test_window.update_idletasks() test_window.update() # list all windows result = await mcp.call_tool("list_windows", {}) # check if "Test Window" is in the list # Parse the TextContent objects to extract the JSON data window_data = [] for item in result: if hasattr(item, 'text'): try: window_info = json.loads(item.text) window_data.append(window_info) except json.JSONDecodeError: print(f"Failed to parse JSON: {item.text}", file=sys.stderr) print(f"Result: {window_data}") assert any(window.get("title") == "Test Window" for window in window_data) test_window.destroy() # Test screenshot with downloads @pytest.mark.asyncio async def test_take_screenshot(): # Take a screenshot of the whole screen and save to downloads results = await mcp.call_tool("take_screenshot", {'save_to_downloads': True, 'mode': 'whole_screen'}) for result in results: # Check if file_path is in the result if hasattr(result, 'text'): try: result_dict = json.loads(result.text) print(f"Screenshot result: {result_dict['title']}", file=sys.stderr) assert 'file_path' in result_dict, "file_path should be in the result" file_path = result_dict['file_path'] # Check if the file exists assert os.path.exists(file_path), f"File {file_path} should exist" print(f"Screenshot saved to: {file_path}", file=sys.stderr) # Clean up - remove the file os.remove(file_path) print(f"Removed test file: {file_path}", file=sys.stderr) except (ValueError, SyntaxError, AttributeError) as e: print(f"Error processing result: {e}", file=sys.stderr) assert False, f"Error processing result: {e}" assert True, "Successfully tested screenshot with downloads" ``` -------------------------------------------------------------------------------- /src/computer_control_mcp/test.py: -------------------------------------------------------------------------------- ```python import shutil import sys import os from typing import Dict, Any, List, Optional, Tuple from io import BytesIO import re import asyncio import uuid import datetime from pathlib import Path import tempfile # --- Auto-install dependencies if needed --- import pyautogui from mcp.server.fastmcp import FastMCP, Image import mss from PIL import Image as PILImage import pygetwindow as gw from fuzzywuzzy import fuzz, process import cv2 from rapidocr_onnxruntime import RapidOCR, VisRes def log(message: str) -> None: """Log a message to stderr.""" print(f"STDOUT: {message}", file=sys.stderr) def get_downloads_dir() -> Path: """Get the OS downloads directory.""" if os.name == "nt": # Windows import winreg sub_key = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders" downloads_guid = "{374DE290-123F-4565-9164-39C4925E467B}" with winreg.OpenKey(winreg.HKEY_CURRENT_USER, sub_key) as key: downloads_dir = winreg.QueryValueEx(key, downloads_guid)[0] return Path(downloads_dir) else: # macOS, Linux, etc. return Path.home() / "Downloads" def _mss_screenshot(region=None): """Take a screenshot using mss and return PIL Image. Args: region: Optional tuple (left, top, width, height) for region capture Returns: PIL Image object """ with mss.mss() as sct: if region is None: # Full screen screenshot monitor = sct.monitors[0] # All monitors combined else: # Region screenshot left, top, width, height = region monitor = { "left": left, "top": top, "width": width, "height": height, } screenshot = sct.grab(monitor) # Convert to PIL Image return PILImage.frombytes("RGB", screenshot.size, screenshot.bgra, "raw", "BGRX") def save_image_to_downloads( image, prefix: str = "screenshot", directory: Path = None ) -> Tuple[str, bytes]: """Save an image to the downloads directory and return its absolute path. Args: image: Either a PIL Image object or MCP Image object prefix: Prefix for the filename (default: 'screenshot') directory: Optional directory to save the image to Returns: Tuple of (absolute_path, image_data_bytes) """ # Create a unique filename with timestamp timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S") unique_id = str(uuid.uuid4())[:8] filename = f"{prefix}_{timestamp}_{unique_id}.png" # Get downloads directory downloads_dir = directory or get_downloads_dir() filepath = downloads_dir / filename # Handle different image types if hasattr(image, "save"): # PIL Image image.save(filepath) # Also get the bytes for returning img_byte_arr = BytesIO() image.save(img_byte_arr, format="PNG") img_bytes = img_byte_arr.getvalue() elif hasattr(image, "data"): # MCP Image img_bytes = image.data with open(filepath, "wb") as f: f.write(img_bytes) else: raise TypeError("Unsupported image type") log(f"Saved image to {filepath}") return str(filepath.absolute()), img_bytes def _find_matching_window( windows: any, title_pattern: str = None, use_regex: bool = False, threshold: int = 60, ) -> Optional[Dict[str, Any]]: """Helper function to find a matching window based on title pattern. Args: windows: List of window dictionaries title_pattern: Pattern to match window title use_regex: If True, treat the pattern as a regex, otherwise use fuzzy matching threshold: Minimum score (0-100) required for a fuzzy match Returns: The best matching window or None if no match found """ if not title_pattern: log("No title pattern provided, returning None") return None # For regex matching if use_regex: for window in windows: if re.search(title_pattern, window["title"], re.IGNORECASE): log(f"Regex match found: {window['title']}") return window return None # For fuzzy matching using fuzzywuzzy # Extract all window titles window_titles = [window["title"] for window in windows] # Use process.extractOne to find the best match best_match_title, score = process.extractOne( title_pattern, window_titles, scorer=fuzz.partial_ratio ) log(f"Best fuzzy match: '{best_match_title}' with score {score}") # Only return if the score is above the threshold if score >= threshold: # Find the window with the matching title for window in windows: if window["title"] == best_match_title: return window return None def take_screenshot( title_pattern: str = None, use_regex: bool = False, threshold: int = 60, save_to_downloads: bool = False, ) -> Image: """ Take screenshots based on the specified title pattern and save them to the downloads directory with absolute paths returned. If no title pattern is provided, take screenshot of entire screen. Args: title_pattern: Pattern to match window title, if None, take screenshot of entire screen use_regex: If True, treat the pattern as a regex, otherwise best match with fuzzy matching save_to_downloads: If True, save the screenshot to the downloads directory and return the absolute path threshold: Minimum score (0-100) required for a fuzzy match Returns: Always returns a single screenshot as MCP Image object, content type image not supported means preview isnt supported but Image object is there. """ try: all_windows = gw.getAllWindows() # Convert to list of dictionaries for _find_matching_window windows = [] for window in all_windows: if window.title: # Only include windows with titles windows.append( { "title": window.title, "window_obj": window, # Store the actual window object } ) print(f"Found {len(windows)} windows") window = _find_matching_window(windows, title_pattern, use_regex, threshold) window = window["window_obj"] if window else None # Store the currently active window current_active_window = gw.getActiveWindow() # Take the screenshot if not window: print("No matching window found, taking screenshot of entire screen") screenshot = _mss_screenshot() else: print(f"Taking screenshot of window: {window.title}") # Activate the window and wait for it to be fully in focus window.activate() pyautogui.sleep(0.5) # Wait for 0.5 seconds to ensure window is active screenshot = _mss_screenshot( region=(window.left, window.top, window.width, window.height) ) # Restore the previously active window if current_active_window: current_active_window.activate() pyautogui.sleep(0.2) # Wait a bit to ensure previous window is restored # Create temp directory temp_dir = Path(tempfile.mkdtemp()) # Save screenshot and get filepath filepath, _ = save_image_to_downloads( screenshot, prefix="screenshot", directory=temp_dir ) # Create Image object from filepath image = Image(filepath) # Copy from temp to downloads if save_to_downloads: print("Copying screenshot from temp to downloads") shutil.copy(filepath, get_downloads_dir()) return image # MCP Image object except Exception as e: print(f"Error taking screenshot: {str(e)}") return f"Error taking screenshot: {str(e)}" def get_ocr_from_screenshot( title_pattern: str = None, use_regex: bool = False, threshold: int = 60, scale_percent: int = 100, ) -> any: """ Get OCR text from the specified title pattern and save them to the downloads directory with absolute paths returned. If no title pattern is provided, get all Text on the screen. Args: title_pattern: Pattern to match window title, if None, get all UI elements on the screen use_regex: If True, treat the pattern as a regex, otherwise best match with fuzzy matching save_to_downloads: If True, save the screenshot to the downloads directory and return the absolute path threshold: Minimum score (0-100) required for a fuzzy match Returns: List of UI elements as MCP Image objects """ try: all_windows = gw.getAllWindows() # Convert to list of dictionaries for _find_matching_window windows = [] for window in all_windows: if window.title: # Only include windows with titles windows.append( { "title": window.title, "window_obj": window, # Store the actual window object } ) log(f"Found {len(windows)} windows") window = _find_matching_window(windows, title_pattern, use_regex, threshold) window = window["window_obj"] if window else None # Store the currently active window current_active_window = gw.getActiveWindow() # Take the screenshot if not window: log("No matching window found, taking screenshot of entire screen") screenshot = _mss_screenshot() else: log(f"Taking screenshot of window: {window.title}") # Activate the window and wait for it to be fully in focus window.activate() pyautogui.sleep(0.5) # Wait for 0.5 seconds to ensure window is active screenshot = _mss_screenshot( region=(window.left, window.top, window.width, window.height) ) # Restore the previously active window if current_active_window: current_active_window.activate() pyautogui.sleep(0.2) # Wait a bit to ensure previous window is restored # Create temp directory temp_dir = Path(tempfile.mkdtemp()) # Save screenshot and get filepath filepath, _ = save_image_to_downloads( screenshot, prefix="screenshot", directory=temp_dir ) # Create Image object from filepath image = Image(filepath) # Copy from temp to downloads if False: log("Copying screenshot from temp to downloads") shutil.copy(filepath, get_downloads_dir()) image_path = image.path img = cv2.imread(image_path) # Lower down resolution before processing width = int(img.shape[1] * scale_percent / 100) height = int(img.shape[0] * scale_percent / 100) dim = (width, height) resized_img = cv2.resize(img, dim, interpolation=cv2.INTER_AREA) # save resized image to pwd # cv2.imwrite("resized_img.png", resized_img) engine = RapidOCR() vis = VisRes() result, elapse_list = engine(resized_img) boxes, txts, scores = list(zip(*result)) boxes = [[[x + window.left, y + window.top] for x, y in box] for box in boxes] zipped_results = list(zip(boxes, txts, scores)) return zipped_results except Exception as e: log(f"Error getting UI elements: {str(e)}") import traceback stack_trace = traceback.format_exc() log(f"Stack trace:\n{stack_trace}") return f"Error getting UI elements: {str(e)}\nStack trace:\n{stack_trace}" import json print(json.dumps(get_ocr_from_screenshot("chrome"))) ``` -------------------------------------------------------------------------------- /src/computer_control_mcp/core.py: -------------------------------------------------------------------------------- ```python #!/usr/bin/env python3 """ Computer Control MCP - Core Implementation A compact ModelContextProtocol server that provides computer control capabilities using PyAutoGUI for mouse/keyboard control. """ import json import shutil import sys import os from typing import Dict, Any, List, Optional, Tuple from io import BytesIO import re import asyncio import uuid import datetime from pathlib import Path import tempfile from typing import Union # --- Auto-install dependencies if needed --- import pyautogui from mcp.server.fastmcp import FastMCP, Image import mss from PIL import Image as PILImage try: import pywinctl as gw except (NotImplementedError, ImportError): import pygetwindow as gw from fuzzywuzzy import fuzz, process import cv2 from rapidocr import RapidOCR from pydantic import BaseModel BaseModel.model_config = {"arbitrary_types_allowed": True} engine = RapidOCR() DEBUG = True # Set to False in production RELOAD_ENABLED = True # Set to False to disable auto-reload # Create FastMCP server instance at module level mcp = FastMCP("ComputerControlMCP") # Determine mode automatically IS_DEVELOPMENT = os.getenv("ENV") == "development" def log(message: str) -> None: """Log to stderr in dev, to stdout or file in production.""" if IS_DEVELOPMENT: # In dev, write to stderr print(f"[DEV] {message}", file=sys.stderr) else: # In production, write to stdout or a file print(f"[PROD] {message}", file=sys.stdout) # or append to a file: open("app.log", "a").write(message+"\n") def get_downloads_dir() -> Path: """Get the OS downloads directory.""" if os.name == "nt": # Windows import winreg sub_key = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders" downloads_guid = "{374DE290-123F-4565-9164-39C4925E467B}" with winreg.OpenKey(winreg.HKEY_CURRENT_USER, sub_key) as key: downloads_dir = winreg.QueryValueEx(key, downloads_guid)[0] return Path(downloads_dir) else: # macOS, Linux, etc. return Path.home() / "Downloads" def _mss_screenshot(region=None): """Take a screenshot using mss and return PIL Image. Args: region: Optional tuple (left, top, width, height) for region capture Returns: PIL Image object """ with mss.mss() as sct: if region is None: # Full screen screenshot monitor = sct.monitors[0] # All monitors combined else: # Region screenshot left, top, width, height = region monitor = { "left": left, "top": top, "width": width, "height": height, } screenshot = sct.grab(monitor) # Convert to PIL Image return PILImage.frombytes( "RGB", screenshot.size, screenshot.bgra, "raw", "BGRX" ) def save_image_to_downloads( image, prefix: str = "screenshot", directory: Path = None ) -> Tuple[str, bytes]: """Save an image to the downloads directory and return its absolute path. Args: image: Either a PIL Image object or MCP Image object prefix: Prefix for the filename (default: 'screenshot') directory: Optional directory to save the image to Returns: Tuple of (absolute_path, image_data_bytes) """ # Create a unique filename with timestamp timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S") unique_id = str(uuid.uuid4())[:8] filename = f"{prefix}_{timestamp}_{unique_id}.png" # Get downloads directory downloads_dir = directory or get_downloads_dir() filepath = downloads_dir / filename # Handle different image types if hasattr(image, "save"): # PIL Image image.save(filepath) # Also get the bytes for returning img_byte_arr = BytesIO() image.save(img_byte_arr, format="PNG") img_bytes = img_byte_arr.getvalue() elif hasattr(image, "data"): # MCP Image img_bytes = image.data with open(filepath, "wb") as f: f.write(img_bytes) else: raise TypeError("Unsupported image type") log(f"Saved image to {filepath}") return str(filepath.absolute()), img_bytes def _find_matching_window( windows: any, title_pattern: str = None, use_regex: bool = False, threshold: int = 10, ) -> Optional[Dict[str, Any]]: """Helper function to find a matching window based on title pattern. Args: windows: List of window dictionaries title_pattern: Pattern to match window title use_regex: If True, treat the pattern as a regex, otherwise use fuzzy matching threshold: Minimum score (0-100) required for a fuzzy match Returns: The best matching window or None if no match found """ if not title_pattern: log("No title pattern provided, returning None") return None # For regex matching if use_regex: for window in windows: if re.search(title_pattern, window["title"], re.IGNORECASE): log(f"Regex match found: {window['title']}") return window return None # For fuzzy matching using fuzzywuzzy # Extract all window titles window_titles = [window["title"] for window in windows] # Use process.extractOne to find the best match best_match_title, score = process.extractOne( title_pattern, window_titles, scorer=fuzz.partial_ratio ) log(f"Best fuzzy match: '{best_match_title}' with score {score}") # Only return if the score is above the threshold if score >= threshold: # Find the window with the matching title for window in windows: if window["title"] == best_match_title: return window return None # --- MCP Function Handlers --- @mcp.tool() def click_screen(x: int, y: int) -> str: """Click at the specified screen coordinates.""" try: pyautogui.click(x=x, y=y) return f"Successfully clicked at coordinates ({x}, {y})" except Exception as e: return f"Error clicking at coordinates ({x}, {y}): {str(e)}" @mcp.tool() def get_screen_size() -> Dict[str, Any]: """Get the current screen resolution.""" try: width, height = pyautogui.size() return { "width": width, "height": height, "message": f"Screen size: {width}x{height}", } except Exception as e: return {"error": str(e), "message": f"Error getting screen size: {str(e)}"} @mcp.tool() def type_text(text: str) -> str: """Type the specified text at the current cursor position.""" try: pyautogui.typewrite(text) return f"Successfully typed text: {text}" except Exception as e: return f"Error typing text: {str(e)}" @mcp.tool() def take_screenshot( title_pattern: str = None, use_regex: bool = False, threshold: int = 10, scale_percent_for_ocr: int = None, save_to_downloads: bool = False, ) -> Image: """ Get screenshot Image as MCP Image object. If no title pattern is provided, get screenshot of entire screen and all text on the screen. Args: title_pattern: Pattern to match window title, if None, take screenshot of entire screen use_regex: If True, treat the pattern as a regex, otherwise best match with fuzzy matching threshold: Minimum score (0-100) required for a fuzzy match scale_percent_for_ocr: Percentage to scale the image down before processing, you wont need this most of the time unless your pc is extremely old or slow save_to_downloads: If True, save the screenshot to the downloads directory and return the absolute path Returns: Returns a single screenshot as MCP Image object. "content type image not supported" means preview isnt supported but Image object is there and returned successfully. """ try: all_windows = gw.getAllWindows() # Convert to list of dictionaries for _find_matching_window windows = [] for window in all_windows: if window.title: # Only include windows with titles windows.append( { "title": window.title, "window_obj": window, # Store the actual window object } ) log(f"Found {len(windows)} windows") window = _find_matching_window(windows, title_pattern, use_regex, threshold) window = window["window_obj"] if window else None import ctypes import time def force_activate(window): """Force a window to the foreground on Windows.""" try: hwnd = window._hWnd # pywinctl window handle # Restore if minimized if window.isMinimized: window.restore() time.sleep(0.1) # Bring to top and set foreground ctypes.windll.user32.SetForegroundWindow(hwnd) ctypes.windll.user32.BringWindowToTop(hwnd) window.activate() # fallback time.sleep(0.3) # wait for OS to update except Exception as e: print(f"Warning: Could not force window: {e}", file=sys.stderr) # Take the screenshot if not window: log("No matching window found, taking screenshot of entire screen") screenshot = _mss_screenshot() else: try: # Re-fetch window handle to ensure it's valid window = gw.getWindowsWithTitle(window.title)[0] current_active_window = gw.getActiveWindow() log(f"Taking screenshot of window: {window.title}") if sys.platform == "win32": force_activate(window) else: window.activate() pyautogui.sleep(0.5) # Give Windows time to focus screen_width, screen_height = pyautogui.size() screenshot = _mss_screenshot( region=( max(window.left, 0), max(window.top, 0), min(window.width, screen_width), min(window.height, screen_height), ) ) # Restore previously active window if current_active_window and current_active_window != window: try: if sys.platform == "win32": force_activate(current_active_window) else: current_active_window.activate() pyautogui.sleep(0.2) except Exception as e: log(f"Error restoring previous window: {str(e)}") except Exception as e: log(f"Error taking screenshot of window: {str(e)}") screenshot = _mss_screenshot() # fallback to full screen # Create temp directory temp_dir = Path(tempfile.mkdtemp()) # Save screenshot and get filepath filepath, _ = save_image_to_downloads( screenshot, prefix="screenshot", directory=temp_dir ) # Create Image object from filepath image = Image(filepath) if save_to_downloads: log("Copying screenshot from temp to downloads") shutil.copy(filepath, get_downloads_dir()) return image # MCP Image object except Exception as e: log(f"Error in screenshot or getting UI elements: {str(e)}") import traceback stack_trace = traceback.format_exc() log(f"Stack trace:\n{stack_trace}") return f"Error in screenshot or getting UI elements: {str(e)}\nStack trace:\n{stack_trace}" def is_low_spec_pc() -> bool: try: import psutil cpu_low = psutil.cpu_count(logical=False) < 4 ram_low = psutil.virtual_memory().total < 8 * 1024**3 return cpu_low or ram_low except Exception: # Fallback if psutil not available or info unavailable return False @mcp.tool() def take_screenshot_with_ocr( title_pattern: str = None, use_regex: bool = False, threshold: int = 10, scale_percent_for_ocr: int = None, save_to_downloads: bool = False, ) -> str: """ Get OCR text from screenshot with absolute coordinates as JSON string of List[Tuple[List[List[int]], str, float]] (returned after adding the window offset from true (0, 0) of screen to the OCR coordinates, so clicking is on-point. Recommended to click in the middle of OCR Box) and using confidence from window with the specified title pattern. If no title pattern is provided, get screenshot of entire screen and all text on the screen. Know that OCR takes around 20 seconds on an mid-spec pc at 1080p resolution. Args: title_pattern: Pattern to match window title, if None, take screenshot of entire screen use_regex: If True, treat the pattern as a regex, otherwise best match with fuzzy matching threshold: Minimum score (0-100) required for a fuzzy match scale_percent_for_ocr: Percentage to scale the image down before processing, you wont need this most of the time unless your pc is extremely old or slow save_to_downloads: If True, save the screenshot to the downloads directory and return the absolute path Returns: Returns a list of UI elements as List[Tuple[List[List[int]], str, float]] where each tuple is [[4 corners of box], text, confidence], "content type image not supported" means preview isnt supported but Image object is there. """ try: all_windows = gw.getAllWindows() # Convert to list of dictionaries for _find_matching_window windows = [] for window in all_windows: if window.title: # Only include windows with titles windows.append( { "title": window.title, "window_obj": window, # Store the actual window object } ) log(f"Found {len(windows)} windows") window = _find_matching_window(windows, title_pattern, use_regex, threshold) window = window["window_obj"] if window else None # Store the currently active window # Take the screenshot if not window: log("No matching window found, taking screenshot of entire screen") screenshot = _mss_screenshot() else: current_active_window = gw.getActiveWindow() log(f"Taking screenshot of window: {window.title}") # Activate the window and wait for it to be fully in focus try: window.activate() pyautogui.sleep(0.5) # Wait for 0.5 seconds to ensure window is active screenshot = _mss_screenshot( region=(window.left, window.top, window.width, window.height) ) # Restore the previously active window if current_active_window: try: current_active_window.activate() pyautogui.sleep( 0.2 ) # Wait a bit to ensure previous window is restored except Exception as e: log(f"Error restoring previous window: {str(e)}") except Exception as e: log(f"Error taking screenshot of window: {str(e)}") return f"Error taking screenshot of window: {str(e)}" # Create temp directory temp_dir = Path(tempfile.mkdtemp()) # Save screenshot and get filepath filepath, _ = save_image_to_downloads( screenshot, prefix="screenshot", directory=temp_dir ) # Create Image object from filepath image = Image(filepath) # Copy from temp to downloads if save_to_downloads: log("Copying screenshot from temp to downloads") shutil.copy(filepath, get_downloads_dir()) image_path = image.path img = cv2.imread(image_path) if scale_percent_for_ocr is None: # Calculate percent to scale height to 360 pixels scale_percent_for_ocr = 100 # 360 / img.shape[0] * 100 # Lower down resolution before processing width = int(img.shape[1] * scale_percent_for_ocr / 100) height = int(img.shape[0] * scale_percent_for_ocr / 100) dim = (width, height) resized_img = cv2.resize(img, dim, interpolation=cv2.INTER_AREA) # save resized image to pwd # cv2.imwrite("resized_img.png", resized_img) output = engine(resized_img) boxes = output.boxes txts = output.txts scores = output.scores zipped_results = list(zip(boxes, txts, scores)) zipped_results = [ ( box.tolist(), text, float(score), ) # convert np.array -> list, ensure score is float for box, text, score in zipped_results ] log(f"Found {len(zipped_results)} text items in OCR result.") log(f"First 5 items: {zipped_results[:5]}") return ( ",\n".join([str(item) for item in zipped_results]) if zipped_results else "No text found" ) except Exception as e: log(f"Error in screenshot or getting UI elements: {str(e)}") import traceback stack_trace = traceback.format_exc() log(f"Stack trace:\n{stack_trace}") return f"Error in screenshot or getting UI elements: {str(e)}\nStack trace:\n{stack_trace}" @mcp.tool() def move_mouse(x: int, y: int) -> str: """Move the mouse to the specified screen coordinates.""" try: pyautogui.moveTo(x=x, y=y) return f"Successfully moved mouse to coordinates ({x}, {y})" except Exception as e: return f"Error moving mouse to coordinates ({x}, {y}): {str(e)}" @mcp.tool() def mouse_down(button: str = "left") -> str: """Hold down a mouse button ('left', 'right', 'middle').""" try: pyautogui.mouseDown(button=button) return f"Held down {button} mouse button" except Exception as e: return f"Error holding {button} mouse button: {str(e)}" @mcp.tool() def mouse_up(button: str = "left") -> str: """Release a mouse button ('left', 'right', 'middle').""" try: pyautogui.mouseUp(button=button) return f"Released {button} mouse button" except Exception as e: return f"Error releasing {button} mouse button: {str(e)}" @mcp.tool() async def drag_mouse( from_x: int, from_y: int, to_x: int, to_y: int, duration: float = 0.5 ) -> str: """ Drag the mouse from one position to another. Args: from_x: Starting X coordinate from_y: Starting Y coordinate to_x: Ending X coordinate to_y: Ending Y coordinate duration: Duration of the drag in seconds (default: 0.5) Returns: Success or error message """ try: # First move to the starting position pyautogui.moveTo(x=from_x, y=from_y) # Then drag to the destination log("starting drag") await asyncio.to_thread(pyautogui.dragTo, x=to_x, y=to_y, duration=duration) log("done drag") return f"Successfully dragged from ({from_x}, {from_y}) to ({to_x}, {to_y})" except Exception as e: return f"Error dragging from ({from_x}, {from_y}) to ({to_x}, {to_y}): {str(e)}" import pyautogui from typing import Union, List @mcp.tool() def key_down(key: str) -> str: """Hold down a specific keyboard key until released.""" try: pyautogui.keyDown(key) return f"Held down key: {key}" except Exception as e: return f"Error holding key {key}: {str(e)}" @mcp.tool() def key_up(key: str) -> str: """Release a specific keyboard key.""" try: pyautogui.keyUp(key) return f"Released key: {key}" except Exception as e: return f"Error releasing key {key}: {str(e)}" @mcp.tool() def press_keys(keys: Union[str, List[Union[str, List[str]]]]) -> str: """ Press keyboard keys. Args: keys: - Single key as string (e.g., "enter") - Sequence of keys as list (e.g., ["a", "b", "c"]) - Key combinations as nested list (e.g., [["ctrl", "c"], ["alt", "tab"]]) Examples: press_keys("enter") press_keys(["a", "b", "c"]) press_keys([["ctrl", "c"], ["alt", "tab"]]) """ try: if isinstance(keys, str): # Single key pyautogui.press(keys) return f"Pressed single key: {keys}" elif isinstance(keys, list): for item in keys: if isinstance(item, str): # Sequential key press pyautogui.press(item) elif isinstance(item, list): # Key combination (e.g., ctrl+c) pyautogui.hotkey(*item) else: return f"Invalid key format: {item}" return f"Successfully pressed keys sequence: {keys}" else: return "Invalid input: must be str or list" except Exception as e: return f"Error pressing keys {keys}: {str(e)}" @mcp.tool() def list_windows() -> List[Dict[str, Any]]: """List all open windows on the system.""" try: windows = gw.getAllWindows() result = [] for window in windows: if window.title: # Only include windows with titles result.append( { "title": window.title, "left": window.left, "top": window.top, "width": window.width, "height": window.height, "is_active": window.isActive, "is_visible": window.visible, "is_minimized": window.isMinimized, "is_maximized": window.isMaximized, # "screenshot": pyautogui.screenshot( # region=( # window.left, # window.top, # window.width, # window.height, # ) # ), } ) return result except Exception as e: log(f"Error listing windows: {str(e)}") return [{"error": str(e)}] @mcp.tool() def activate_window( title_pattern: str, use_regex: bool = False, threshold: int = 60 ) -> str: """ Activate a window (bring it to the foreground) by matching its title. Args: title_pattern: Pattern to match window title use_regex: If True, treat the pattern as a regex, otherwise use fuzzy matching threshold: Minimum score (0-100) required for a fuzzy match Returns: Success or error message """ try: # Get all windows all_windows = gw.getAllWindows() # Convert to list of dictionaries for _find_matching_window windows = [] for window in all_windows: if window.title: # Only include windows with titles windows.append( { "title": window.title, "window_obj": window, # Store the actual window object } ) # Find matching window using our improved function matched_window_dict = _find_matching_window( windows, title_pattern, use_regex, threshold ) if not matched_window_dict: log(f"No window found matching pattern: {title_pattern}") return f"Error: No window found matching pattern: {title_pattern}" # Get the actual window object matched_window = matched_window_dict["window_obj"] # Activate the window matched_window.activate() return f"Successfully activated window: '{matched_window.title}'" except Exception as e: log(f"Error activating window: {str(e)}") return f"Error activating window: {str(e)}" def main(): """Main entry point for the MCP server.""" pyautogui.FAILSAFE = True try: # Run the server log("Computer Control MCP Server Started...") mcp.run() except KeyboardInterrupt: log("Server shutting down...") except Exception as e: log(f"Error: {str(e)}") if __name__ == "__main__": main() ```