ab498/computer-control-mcp # codebase.md

# Directory Structure

```
├── demonstration.gif
├── Dockerfile
├── icon.png
├── LICENSE
├── MANIFEST.in
├── pyproject.toml
├── README.md
├── smithery.yaml
├── src
│   ├── computer_control_mcp
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   ├── cli.py
│   │   ├── core.py
│   │   ├── FZYTK.TTF
│   │   ├── gui.py
│   │   ├── server.py
│   │   ├── test_image.png
│   │   └── test.py
│   └── README.md
├── tests
│   ├── conftest.py
│   ├── rapidocr_test.py
│   ├── README.md
│   ├── run_cli.py
│   ├── run_server.py
│   ├── setup.py
│   ├── test_computer_control.py
│   └── test_screenshot.py
└── uv.lock
```

# Files

--------------------------------------------------------------------------------
/tests/README.md:
--------------------------------------------------------------------------------

```markdown
# Computer Control MCP Tests

This directory contains the tests for the Computer Control MCP package.

## Running Tests

To run the tests, use pytest:

```bash
pytest
```

Or with specific test:

```bash
pytest tests/test_computer_control.py
```

## Test Structure

- `conftest.py`: Pytest configuration
- `test_computer_control.py`: Tests for the core functionality

```

--------------------------------------------------------------------------------
/src/README.md:
--------------------------------------------------------------------------------

```markdown
# Computer Control MCP Source Code

This directory contains the source code for the Computer Control MCP package.

## Structure

- `computer_control_mcp/`: Main package directory
  - `__init__.py`: Package initialization
  - `__main__.py`: Entry point for running as a module
  - `core.py`: Core functionality
  - `cli.py`: Command-line interface
  - `gui.py`: Graphical user interface for testing

```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

```markdown
# Computer Control MCP

### MCP server that provides computer control capabilities, like mouse, keyboard, OCR, etc. using PyAutoGUI, RapidOCR, ONNXRuntime. Similar to 'computer-use' by Anthropic. With Zero External Dependencies.

<div align="center" style="text-align:center;font-family: monospace; display: flex; align-items: center; justify-content: center; width: 100%; gap: 10px">
    <a href="https://nextjs-boilerplate-ashy-nine-64.vercel.app/demo-computer-control"><img
            src="https://komarev.com/ghpvc/?username=AB498&label=DEMO&style=for-the-badge&color=CC0000" /></a>
    <a href="https://discord.gg/ZeeqSBpjU2"><img
            src="https://img.shields.io/discord/1095854826786668545?style=for-the-badge&color=0000CC" alt="Discord"></a>
    <a href="https://img.shields.io/badge/License-MIT-yellow.svg"><img
            src="https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge&color=00CC00" alt="License: MIT"></a>
    <a href="https://pypi.org/project/computer-control-mcp"><img
            src="https://img.shields.io/pypi/v/computer-control-mcp?style=for-the-badge" alt="PyPi"></a>
</div>

---

![MCP Computer Control Demo](https://github.com/AB498/computer-control-mcp/blob/main/demonstration.gif?raw=true)

## Quick Usage (MCP Setup Using `uvx`)

***Note:** Running `uvx computer-control-mcp@latest` for the first time will download python dependencies (around 70MB) which may take some time. Recommended to run this in a terminal before using it as MCP. Subsequent runs will be instant.* 

```json
{
  "mcpServers": {
    "computer-control-mcp": {
      "command": "uvx",
      "args": ["computer-control-mcp@latest"]
    }
  }
}
```

OR install globally with `pip`:
```bash
pip install computer-control-mcp
```
Then run the server with:
```bash
computer-control-mcp # instead of uvx computer-control-mcp, so you can use the latest version, also you can `uv cache clean` to clear the cache and `uvx` again to use latest version.
```

## Features

- Control mouse movements and clicks
- Type text at the current cursor position
- Take screenshots of the entire screen or specific windows with optional saving to downloads directory
- Extract text from screenshots using OCR (Optical Character Recognition)
- List and activate windows
- Press keyboard keys
- Drag and drop operations

## Available Tools

### Mouse Control
- `click_screen(x: int, y: int)`: Click at specified screen coordinates
- `move_mouse(x: int, y: int)`: Move mouse cursor to specified coordinates
- `drag_mouse(from_x: int, from_y: int, to_x: int, to_y: int, duration: float = 0.5)`: Drag mouse from one position to another
- `mouse_down(button: str = "left")`: Hold down a mouse button ('left', 'right', 'middle')
- `mouse_up(button: str = "left")`: Release a mouse button ('left', 'right', 'middle')

### Keyboard Control
- `type_text(text: str)`: Type the specified text at current cursor position
- `press_key(key: str)`: Press a specified keyboard key
- `key_down(key: str)`: Hold down a specific keyboard key until released
- `key_up(key: str)`: Release a specific keyboard key
- `press_keys(keys: Union[str, List[Union[str, List[str]]]])`: Press keyboard keys (supports single keys, sequences, and combinations)

### Screen and Window Management
- `take_screenshot(title_pattern: str = None, use_regex: bool = False, threshold: int = 60, scale_percent_for_ocr: int = None, save_to_downloads: bool = False)`: Capture screen or window
- `take_screenshot_with_ocr(title_pattern: str = None, use_regex: bool = False, threshold: int = 10, scale_percent_for_ocr: int = None, save_to_downloads: bool = False)`: Extract adn return text with coordinates using OCR from screen or window
- `get_screen_size()`: Get current screen resolution
- `list_windows()`: List all open windows
- `activate_window(title_pattern: str, use_regex: bool = False, threshold: int = 60)`: Bring specified window to foreground

## Development

### Setting up the Development Environment

```bash
# Clone the repository
git clone https://github.com/AB498/computer-control-mcp.git
cd computer-control-mcp

# Install in development mode
pip install -e .

# Start server
python -m computer_control_mcp.core

# -- OR --

# Build
hatch build

# Non-windows
pip install dist/*.whl --upgrade

# Windows
$latest = Get-ChildItem .\dist\*.whl | Sort-Object LastWriteTime -Descending | Select-Object -First 1
pip install $latest.FullName --upgrade 

# Run
computer-control-mcp
```

### Running Tests

```bash
python -m pytest
```

## API Reference

See the [API Reference](docs/api.md) for detailed information about the available functions and classes.

## License

MIT

## For more information or help

- [Email ([email protected])](mailto:[email protected])
- [Discord (CodePlayground)](https://discord.gg/ZeeqSBpjU2)

```

--------------------------------------------------------------------------------
/tests/run_cli.py:
--------------------------------------------------------------------------------

```python
#!/usr/bin/env python
"""
Simple script to run the Computer Control MCP CLI.
"""

from computer_control_mcp.cli import main

if __name__ == "__main__":
    main()

```

--------------------------------------------------------------------------------
/tests/conftest.py:
--------------------------------------------------------------------------------

```python
"""
Pytest configuration file.
"""

import pytest
import sys
import os

# Add the src directory to the Python path
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "src")))

```

--------------------------------------------------------------------------------
/src/computer_control_mcp/__init__.py:
--------------------------------------------------------------------------------

```python
"""
Computer Control MCP - Python package for computer control via MCP.

This package provides computer control capabilities using PyAutoGUI through a
Model Context Protocol (MCP) server.
"""

from computer_control_mcp.core import mcp, main

__version__ = "0.1.2"
__all__ = ["mcp", "main"]

```

--------------------------------------------------------------------------------
/src/computer_control_mcp/server.py:
--------------------------------------------------------------------------------

```python
"""
Server module for Computer Control MCP.

This module provides a simple way to run the MCP server.
"""

from computer_control_mcp.core import main as run_server

def main():
    """Run the MCP server."""
    print("Starting Computer Control MCP server...")
    run_server()

if __name__ == "__main__":
    main()

```

--------------------------------------------------------------------------------
/tests/run_server.py:
--------------------------------------------------------------------------------

```python
#!/usr/bin/env python
"""
Simple script to run the Computer Control MCP server.
"""

# import sys
# import os
# sys.path.insert(0, os.path.join(os.path.dirname(__file__), "src"))
# from computer_control_mcp.core import main

from computer_control_mcp.core import main

if __name__ == "__main__":
    print("Starting Computer Control MCP server...")
    main()

```

--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------

```yaml
# Smithery configuration file: https://smithery.ai/docs/config#smitheryyaml

startCommand:
  type: stdio
  configSchema:
    # JSON Schema defining the configuration options for the MCP.
    type: object
    description: Empty config
  commandFunction:
    # A JS function that produces the CLI command based on the given config to start the MCP on stdio.
    |-
    (config) => ({ command: 'python', args: ['src/computer_control_mcp/core.py'] })
  exampleConfig: {}

```

--------------------------------------------------------------------------------
/tests/setup.py:
--------------------------------------------------------------------------------

```python
#!/usr/bin/env python
"""
Backward compatibility setup.py file for Computer Control MCP.
This file is provided for backward compatibility with tools that don't support pyproject.toml.
"""

import setuptools

if __name__ == "__main__":
    try:
        setuptools.setup()
    except Exception as e:
        print(f"Error: {e}")
        print("\nThis package uses pyproject.toml for configuration.")
        print("Please use a PEP 517 compatible build tool like pip or build.")
        print("For example: pip install .")

```

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------

```dockerfile
# Use a lightweight Python base image
FROM python:3.12-slim

# Set environment variables for Python
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

# Set working directory
WORKDIR /app

# Copy dependency file(s)
COPY pyproject.toml .
COPY src/ src/
COPY README.md README.md

# Install build backend (Hatchling)
RUN pip install --upgrade pip && \
    pip install hatchling && \
    pip install -e .

# Copy any additional files (e.g. configs, CLI, entrypoints)
COPY . .

# Default command (can be overridden)
CMD ["python", "-m", "computer_control_mcp"]

```

--------------------------------------------------------------------------------
/src/computer_control_mcp/__main__.py:
--------------------------------------------------------------------------------

```python
"""
Entry point for running the Computer Control MCP as a module.

This module serves as the main entry point for the package.
When executed directly (e.g., with `python -m computer_control_mcp`),
it will run the CLI interface.

For CLI functionality, use:
    computer-control-mcp <command>
    python -m computer_control_mcp <command>
"""

from computer_control_mcp.cli import main as cli_main

def main():
    """Main entry point for the package."""
    # Run the CLI when the module is executed directly
    cli_main()

if __name__ == "__main__":
    main()

```

--------------------------------------------------------------------------------
/tests/test_screenshot.py:
--------------------------------------------------------------------------------

```python
import sys
sys.path.append('src')
from computer_control_mcp.core import take_screenshot

# Test with save_to_downloads=False
result = take_screenshot(mode='whole_screen', save_to_downloads=False)
print('Base64 image included:', 'base64_image' in result)
print('MCP Image included:', 'image' in result)

# Test with save_to_downloads=True
result = take_screenshot(mode='whole_screen', save_to_downloads=True)
print('Base64 image included:', 'base64_image' in result)
print('MCP Image included:', 'image' in result)
print('File path included:', 'file_path' in result)

```

--------------------------------------------------------------------------------
/tests/rapidocr_test.py:
--------------------------------------------------------------------------------

```python
import cv2
from rapidocr import RapidOCR
from rapidocr_onnxruntime import VisRes

image_path = r"C:\Users\Admin\AppData\Local\Temp\tmpdw2d8r14\screenshot_20250815_033153_f99a8396.png"
img = cv2.imread(image_path)
if img is None:
    print(f"Failed to load img: {image_path}")
else:
    print(f"Loaded img: {image_path}, shape: {img.shape}")
    engine = RapidOCR()
    vis = VisRes()
    output = engine(img)

    # Separate into boxes, texts, and scores
    boxes  = output.boxes
    txts   = output.txts
    scores = output.scores
    zipped_results = list(zip(boxes, txts, scores))
    print(f"Found {len(zipped_results)} text items in OCR result.")
    print(f"First 10 items: {str(zipped_results).encode("utf-8", errors="ignore")}")

```

--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------

```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "computer-control-mcp"
version = "0.3.6"
description = "MCP server that provides computer control capabilities, like mouse, keyboard, OCR, etc. using PyAutoGUI, RapidOCR, ONNXRuntime. Similar to 'computer-use' by Anthropic. With Zero External Dependencies."
readme = "README.md"
requires-python = ">=3.8"
license = {text = "MIT"}
authors = [{name = "AB498", email = "[email protected]"}]
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.12",
    "Topic :: Software Development :: Libraries",
    "Topic :: Utilities"
]
dependencies = [
    "pyautogui==0.9.54",
    "mcp[cli]==1.13.0",
    "pillow==11.3.0",
    "pygetwindow==0.0.9",
    "pywinctl==0.4.1",
    "fuzzywuzzy==0.18.0",
    "rapidocr==3.3.1",
    "onnxruntime==1.22.0",
    "rapidocr_onnxruntime==1.2.3",
    "opencv-python==4.12.0.88",
    "python-Levenshtein>=0.20.9",
    "mss>=7.0.0"
]

[project.urls]
Homepage = "https://github.com/AB498/computer-control-mcp"
Issues = "https://github.com/AB498/computer-control-mcp/issues"
Documentation = "https://github.com/AB498/computer-control-mcp#readme"

[project.scripts]
computer-control-mcp = "computer_control_mcp.cli:main"
computer-control-mcp-server = "computer_control_mcp.server:main"

[tool.hatch.build]
sources = ["src"]
packages = ["src/computer_control_mcp"]

[tool.hatch.build.targets.wheel]
packages = ["src/computer_control_mcp"]

```

--------------------------------------------------------------------------------
/src/computer_control_mcp/gui.py:
--------------------------------------------------------------------------------

```python
"""
GUI Test Harness for Computer Control MCP.

This module provides a graphical user interface for testing the Computer Control MCP functionality.
"""

import tkinter as tk
from tkinter import ttk, scrolledtext
from PIL import Image, ImageTk
import pyautogui
import json
import io

from computer_control_mcp.core import mcp

class TestHarnessGUI:
    def __init__(self, root):
        self.root = root
        self.root.title("Computer Control Test Harness")
        self.root.geometry("800x600")
        
        # Create main frame with scrollbar
        self.main_frame = ttk.Frame(root)
        self.main_frame.pack(fill=tk.BOTH, expand=True, padx=10, pady=10)
        
        # Create test sections
        self.create_click_test_section()
        self.create_type_text_section()
        self.create_screenshot_section()
        self.create_output_section()
        
        # Initialize test results
        self.test_results = {}
    
    def create_click_test_section(self):
        frame = ttk.LabelFrame(self.main_frame, text="Mouse Click Test")
        frame.pack(fill=tk.X, padx=5, pady=5)
        
        # Coordinates input
        coord_frame = ttk.Frame(frame)
        coord_frame.pack(fill=tk.X, padx=5, pady=5)
        
        ttk.Label(coord_frame, text="X:").pack(side=tk.LEFT)
        self.x_entry = ttk.Entry(coord_frame, width=10)
        self.x_entry.pack(side=tk.LEFT, padx=5)
        
        ttk.Label(coord_frame, text="Y:").pack(side=tk.LEFT)
        self.y_entry = ttk.Entry(coord_frame, width=10)
        self.y_entry.pack(side=tk.LEFT, padx=5)
        
        ttk.Button(frame, text="Test Click", command=self.test_click).pack(pady=5)
    
    def create_type_text_section(self):
        frame = ttk.LabelFrame(self.main_frame, text="Type Text Test")
        frame.pack(fill=tk.X, padx=5, pady=5)
        
        ttk.Label(frame, text="Text to type:").pack(pady=2)
        self.text_entry = ttk.Entry(frame)
        self.text_entry.pack(fill=tk.X, padx=5, pady=2)
        
        ttk.Button(frame, text="Test Type Text", command=self.test_type_text).pack(pady=5)
    
    def create_screenshot_section(self):
        frame = ttk.LabelFrame(self.main_frame, text="Screenshot Test")
        frame.pack(fill=tk.X, padx=5, pady=5)
        
        ttk.Button(frame, text="Take Screenshot", command=self.test_screenshot).pack(pady=5)
        
        # Canvas for screenshot preview
        self.screenshot_canvas = tk.Canvas(frame, width=200, height=150)
        self.screenshot_canvas.pack(pady=5)
    
    def create_output_section(self):
        frame = ttk.LabelFrame(self.main_frame, text="Test Output")
        frame.pack(fill=tk.BOTH, expand=True, padx=5, pady=5)
        
        self.output_text = scrolledtext.ScrolledText(frame, height=10)
        self.output_text.pack(fill=tk.BOTH, expand=True, padx=5, pady=5)
    
    def log_output(self, test_name, request_data, response_data):
        self.output_text.insert(tk.END, f"\n===== TEST: {test_name} =====\n")
        self.output_text.insert(tk.END, f"REQUEST: {json.dumps(request_data, indent=2)}\n")
        self.output_text.insert(tk.END, f"RESPONSE: {response_data}\n")
        self.output_text.insert(tk.END, "======================\n")
        self.output_text.see(tk.END)
    
    def test_click(self):
        try:
            x = int(self.x_entry.get())
            y = int(self.y_entry.get())
            request_data = {"x": x, "y": y}
            result = mcp.click_screen(**request_data)
            self.log_output("click_screen", request_data, result)
        except Exception as e:
            self.log_output("click_screen", request_data, f"Error: {str(e)}")
    
    def test_type_text(self):
        try:
            text = self.text_entry.get()
            request_data = {"text": text}
            result = mcp.type_text(**request_data)
            self.log_output("type_text", request_data, result)
        except Exception as e:
            self.log_output("type_text", request_data, f"Error: {str(e)}")
    
    def test_screenshot(self):
        try:
            result = mcp.take_screenshot()
            # Convert bytes to image for preview
            image = Image.open(io.BytesIO(result.data))
            # Resize for preview
            image.thumbnail((200, 150))
            photo = ImageTk.PhotoImage(image)
            self.screenshot_canvas.create_image(100, 75, image=photo)
            self.screenshot_canvas.image = photo  # Keep reference
            self.log_output("take_screenshot", {}, "Screenshot taken successfully")
        except Exception as e:
            self.log_output("take_screenshot", {}, f"Error: {str(e)}")

def main():
    root = tk.Tk()
    app = TestHarnessGUI(root)
    root.mainloop()

if __name__ == "__main__":
    main()

```

--------------------------------------------------------------------------------
/src/computer_control_mcp/cli.py:
--------------------------------------------------------------------------------

```python
"""
Command-line interface for Computer Control MCP.

This module provides a command-line interface for interacting with the Computer Control MCP.
"""

import argparse
import sys
from computer_control_mcp.core import mcp, main as run_server

def parse_args():
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(description="Computer Control MCP CLI")

    subparsers = parser.add_subparsers(dest="command", help="Command to run")

    # Server command
    server_parser = subparsers.add_parser("server", help="Run the MCP server")

    # Click command
    click_parser = subparsers.add_parser("click", help="Click at specified coordinates")
    click_parser.add_argument("x", type=int, help="X coordinate")
    click_parser.add_argument("y", type=int, help="Y coordinate")

    # Type text command
    type_parser = subparsers.add_parser("type", help="Type text at current cursor position")
    type_parser.add_argument("text", help="Text to type")

    # Screenshot command
    screenshot_parser = subparsers.add_parser("screenshot", help="Take a screenshot")
    screenshot_parser.add_argument("--mode", choices=["all_windows", "single_window", "whole_screen"],
                                  default="whole_screen", help="Screenshot mode")
    screenshot_parser.add_argument("--title", help="Window title pattern (for single_window mode)")
    screenshot_parser.add_argument("--regex", action="store_true", help="Use regex for title matching")
    screenshot_parser.add_argument("--output", help="Output file path (if not provided, saves to downloads directory)")
    screenshot_parser.add_argument("--no-save", action="store_true", help="Don't save images to downloads directory")

    # List windows command
    subparsers.add_parser("list-windows", help="List all open windows")

    # GUI command
    subparsers.add_parser("gui", help="Launch the GUI test harness")

    return parser.parse_args()

def main():
    """Main entry point for the CLI."""
    args = parse_args()

    if args.command == "server":
        run_server()

    elif args.command == "click":
        # Call the tool using the call_tool method
        import asyncio
        result = asyncio.run(mcp.call_tool("click_screen", {"x": args.x, "y": args.y}))
        print(result)

    elif args.command == "type":
        # Call the tool using the call_tool method
        import asyncio
        result = asyncio.run(mcp.call_tool("type_text", {"text": args.text}))
        print(result)

    elif args.command == "screenshot":
        if args.mode == "single_window" and not args.title:
            print("Error: --title is required for single_window mode")
            sys.exit(1)

        # Call the tool using the call_tool method
        import asyncio
        result = asyncio.run(mcp.call_tool("take_screenshot", {
            "mode": args.mode,
            "title_pattern": args.title,
            "use_regex": args.regex,
            "save_to_downloads": not args.no_save
        }))

        if args.output:
            # Save the screenshot to a specific file path provided by user
            with open(args.output, "wb") as f:
                f.write(result.image.data)
            print(f"Screenshot saved to {args.output}")
        elif hasattr(result, 'file_path'):
            # If image was saved to downloads, show the path
            print(f"Screenshot saved to {result.file_path}")
        else:
            print("Screenshot taken successfully")

        # If we have multiple results (all_windows mode)
        if args.mode == "all_windows" and isinstance(result, list):
            print("\nAll screenshots:")
            for i, item in enumerate(result):
                if hasattr(item, 'file_path'):
                    window_title = item.window_info.title if hasattr(item, 'window_info') else f"Window {i+1}"
                    print(f"{i+1}. {window_title}: {item.file_path}")

    elif args.command == "list-windows":
        # Call the tool using the call_tool method
        import asyncio
        result = asyncio.run(mcp.call_tool("list_windows", {}))

        # Parse the result
        windows = []
        for item in result:
            if hasattr(item, 'text'):
                try:
                    import json
                    window_info = json.loads(item.text)
                    windows.append(window_info)
                except json.JSONDecodeError:
                    print(f"Failed to parse window info: {item.text}")

        # Display the windows
        for i, window in enumerate(windows):
            print(f"{i+1}. {window.get('title')} ({window.get('width')}x{window.get('height')})")

    elif args.command == "gui":
        from computer_control_mcp.gui import main as run_gui
        run_gui()

    else:
        # When no command is specified, run the server by default
        print("MCP server started!")
        run_server()

if __name__ == "__main__":
    main()

```

--------------------------------------------------------------------------------
/tests/test_computer_control.py:
--------------------------------------------------------------------------------

```python
"""
Tests for the Computer Control MCP package.
"""

import pytest
from unittest.mock import Mock, patch
import json
import sys
import tkinter as tk
from tkinter import ttk
import asyncio
import os
import ast
from computer_control_mcp.core import mcp

# Helper function to print request/response JSON, skipping non-serializable properties
def print_json_data(name, request_data=None, response_data=None):
    def serialize(obj):
        try:
            json.dumps(obj)
            return obj
        except (TypeError, OverflowError):
            return str(obj)

    print(f"\n===== TEST: {name} =====", file=sys.stderr)
    if isinstance(request_data, dict):
        serializable_request = {k: serialize(v) for k, v in request_data.items()}
        print(f"REQUEST: {json.dumps(serializable_request, indent=2)}", file=sys.stderr)
    elif request_data is not None:
        print(f"REQUEST: {serialize(request_data)}", file=sys.stderr)
    if response_data is not None:
        if isinstance(response_data, dict):
            serializable_response = {k: serialize(v) for k, v in response_data.items()}
            print(
                f"RESPONSE: {json.dumps(serializable_response, indent=2)}",
                file=sys.stderr,
            )
        else:
            print(f"RESPONSE: {serialize(response_data)}", file=sys.stderr)
    print("======================\n", file=sys.stderr)


# Test drag_mouse tool
@pytest.mark.asyncio
async def test_drag_mouse():
    # Test data
    test_window = tk.Tk()
    test_window.title("Test Drag Mouse")
    test_window.geometry("400x400")

    # Update the window to ensure coordinates are calculated
    test_window.update_idletasks()
    test_window.update()

    # Window title coordinates
    window_x = test_window.winfo_x()
    window_y = test_window.winfo_y()

    screen_width = test_window.winfo_screenwidth()
    screen_height = test_window.winfo_screenheight()
    center_x = screen_width // 2
    center_y = screen_height // 2
    request_data = {
        "from_x": window_x + 55,
        "from_y": window_y + 15,
        "to_x": center_x,
        "to_y": center_y,
        "duration": 1.0,
    }

    print(f"starting coordinates: x={window_x}, y={window_y}", file=sys.stderr)

    # Create an event to track completion
    drag_complete = asyncio.Event()

    async def perform_drag():
        try:
            result = await mcp.call_tool("drag_mouse", request_data)
            print(f"Result: {result}", file=sys.stderr)
        finally:
            drag_complete.set()

    # Start the drag operation
    drag_task = asyncio.create_task(perform_drag())

    # Keep updating the window while waiting for drag to complete
    while not drag_complete.is_set():
        test_window.update()
        await asyncio.sleep(0.01)  # Small delay to prevent high CPU usage

    # Wait for drag operation to complete
    await drag_task

    window_x_end = test_window.winfo_x()
    window_y_end = test_window.winfo_y()
    print(f'ending coordinates: x={window_x_end}, y={window_y_end}', file=sys.stderr)

    assert window_y_end != window_y and window_x_end != window_x

    test_window.destroy()


# Test list_windows tool
@pytest.mark.asyncio
async def test_list_windows():
    # open tkinter
    test_window = tk.Tk()
    test_window.title("Test Window")
    test_window.geometry("400x400")

    # Update the window to ensure coordinates are calculated
    test_window.update_idletasks()
    test_window.update()

    # list all windows
    result = await mcp.call_tool("list_windows", {})

    # check if "Test Window" is in the list
    # Parse the TextContent objects to extract the JSON data
    window_data = []
    for item in result:
        if hasattr(item, 'text'):
            try:
                window_info = json.loads(item.text)
                window_data.append(window_info)
            except json.JSONDecodeError:
                print(f"Failed to parse JSON: {item.text}", file=sys.stderr)

    print(f"Result: {window_data}")

    assert any(window.get("title") == "Test Window" for window in window_data)

    test_window.destroy()

# Test screenshot with downloads
@pytest.mark.asyncio
async def test_take_screenshot():
    # Take a screenshot of the whole screen and save to downloads
    results = await mcp.call_tool("take_screenshot", {'save_to_downloads': True, 'mode': 'whole_screen'})

    for result in results:
        # Check if file_path is in the result
        if hasattr(result, 'text'):
            try:
                result_dict = json.loads(result.text)
                print(f"Screenshot result: {result_dict['title']}", file=sys.stderr)
                assert 'file_path' in result_dict, "file_path should be in the result"
                file_path = result_dict['file_path']

                # Check if the file exists
                assert os.path.exists(file_path), f"File {file_path} should exist"
                print(f"Screenshot saved to: {file_path}", file=sys.stderr)

                # Clean up - remove the file
                os.remove(file_path)
                print(f"Removed test file: {file_path}", file=sys.stderr)
            except (ValueError, SyntaxError, AttributeError) as e:
                print(f"Error processing result: {e}", file=sys.stderr)
                assert False, f"Error processing result: {e}"

    assert True, "Successfully tested screenshot with downloads"

```

--------------------------------------------------------------------------------
/src/computer_control_mcp/test.py:
--------------------------------------------------------------------------------

```python
import shutil
import sys
import os
from typing import Dict, Any, List, Optional, Tuple
from io import BytesIO
import re
import asyncio
import uuid
import datetime
from pathlib import Path
import tempfile

# --- Auto-install dependencies if needed ---
import pyautogui
from mcp.server.fastmcp import FastMCP, Image
import mss
from PIL import Image as PILImage
import pygetwindow as gw
from fuzzywuzzy import fuzz, process

import cv2
from rapidocr_onnxruntime import RapidOCR, VisRes


def log(message: str) -> None:
    """Log a message to stderr."""
    print(f"STDOUT: {message}", file=sys.stderr)


def get_downloads_dir() -> Path:
    """Get the OS downloads directory."""
    if os.name == "nt":  # Windows
        import winreg

        sub_key = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders"
        downloads_guid = "{374DE290-123F-4565-9164-39C4925E467B}"
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, sub_key) as key:
            downloads_dir = winreg.QueryValueEx(key, downloads_guid)[0]
        return Path(downloads_dir)
    else:  # macOS, Linux, etc.
        return Path.home() / "Downloads"


def _mss_screenshot(region=None):
    """Take a screenshot using mss and return PIL Image.
    
    Args:
        region: Optional tuple (left, top, width, height) for region capture
        
    Returns:
        PIL Image object
    """
    with mss.mss() as sct:
        if region is None:
            # Full screen screenshot
            monitor = sct.monitors[0]  # All monitors combined
        else:
            # Region screenshot
            left, top, width, height = region
            monitor = {
                "left": left,
                "top": top,
                "width": width,
                "height": height,
            }
        
        screenshot = sct.grab(monitor)
        # Convert to PIL Image
        return PILImage.frombytes("RGB", screenshot.size, screenshot.bgra, "raw", "BGRX")


def save_image_to_downloads(
    image, prefix: str = "screenshot", directory: Path = None
) -> Tuple[str, bytes]:
    """Save an image to the downloads directory and return its absolute path.

    Args:
        image: Either a PIL Image object or MCP Image object
        prefix: Prefix for the filename (default: 'screenshot')
        directory: Optional directory to save the image to

    Returns:
        Tuple of (absolute_path, image_data_bytes)
    """
    # Create a unique filename with timestamp
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    unique_id = str(uuid.uuid4())[:8]
    filename = f"{prefix}_{timestamp}_{unique_id}.png"

    # Get downloads directory
    downloads_dir = directory or get_downloads_dir()
    filepath = downloads_dir / filename

    # Handle different image types
    if hasattr(image, "save"):  # PIL Image
        image.save(filepath)
        # Also get the bytes for returning
        img_byte_arr = BytesIO()
        image.save(img_byte_arr, format="PNG")
        img_bytes = img_byte_arr.getvalue()
    elif hasattr(image, "data"):  # MCP Image
        img_bytes = image.data
        with open(filepath, "wb") as f:
            f.write(img_bytes)
    else:
        raise TypeError("Unsupported image type")

    log(f"Saved image to {filepath}")
    return str(filepath.absolute()), img_bytes


def _find_matching_window(
    windows: any,
    title_pattern: str = None,
    use_regex: bool = False,
    threshold: int = 60,
) -> Optional[Dict[str, Any]]:
    """Helper function to find a matching window based on title pattern.

    Args:
        windows: List of window dictionaries
        title_pattern: Pattern to match window title
        use_regex: If True, treat the pattern as a regex, otherwise use fuzzy matching
        threshold: Minimum score (0-100) required for a fuzzy match

    Returns:
        The best matching window or None if no match found
    """
    if not title_pattern:
        log("No title pattern provided, returning None")
        return None

    # For regex matching
    if use_regex:
        for window in windows:
            if re.search(title_pattern, window["title"], re.IGNORECASE):
                log(f"Regex match found: {window['title']}")
                return window
        return None

    # For fuzzy matching using fuzzywuzzy
    # Extract all window titles
    window_titles = [window["title"] for window in windows]

    # Use process.extractOne to find the best match
    best_match_title, score = process.extractOne(
        title_pattern, window_titles, scorer=fuzz.partial_ratio
    )
    log(f"Best fuzzy match: '{best_match_title}' with score {score}")

    # Only return if the score is above the threshold
    if score >= threshold:
        # Find the window with the matching title
        for window in windows:
            if window["title"] == best_match_title:
                return window

    return None


def take_screenshot(
    title_pattern: str = None,
    use_regex: bool = False,
    threshold: int = 60,
    save_to_downloads: bool = False,
) -> Image:
    """
    Take screenshots based on the specified title pattern and save them to the downloads directory with absolute paths returned.
    If no title pattern is provided, take screenshot of entire screen.

    Args:
        title_pattern: Pattern to match window title, if None, take screenshot of entire screen
        use_regex: If True, treat the pattern as a regex, otherwise best match with fuzzy matching
        save_to_downloads: If True, save the screenshot to the downloads directory and return the absolute path
        threshold: Minimum score (0-100) required for a fuzzy match

    Returns:
        Always returns a single screenshot as MCP Image object, content type image not supported means preview isnt supported but Image object is there.
    """
    try:
        all_windows = gw.getAllWindows()

        # Convert to list of dictionaries for _find_matching_window
        windows = []
        for window in all_windows:
            if window.title:  # Only include windows with titles
                windows.append(
                    {
                        "title": window.title,
                        "window_obj": window,  # Store the actual window object
                    }
                )

        print(f"Found {len(windows)} windows")
        window = _find_matching_window(windows, title_pattern, use_regex, threshold)
        window = window["window_obj"] if window else None

        # Store the currently active window
        current_active_window = gw.getActiveWindow()

        # Take the screenshot
        if not window:
            print("No matching window found, taking screenshot of entire screen")
            screenshot = _mss_screenshot()
        else:
            print(f"Taking screenshot of window: {window.title}")
            # Activate the window and wait for it to be fully in focus
            window.activate()
            pyautogui.sleep(0.5)  # Wait for 0.5 seconds to ensure window is active
            screenshot = _mss_screenshot(
                region=(window.left, window.top, window.width, window.height)
            )
            # Restore the previously active window
            if current_active_window:
                current_active_window.activate()
                pyautogui.sleep(0.2)  # Wait a bit to ensure previous window is restored

        # Create temp directory
        temp_dir = Path(tempfile.mkdtemp())

        # Save screenshot and get filepath
        filepath, _ = save_image_to_downloads(
            screenshot, prefix="screenshot", directory=temp_dir
        )

        # Create Image object from filepath
        image = Image(filepath)

        # Copy from temp to downloads
        if save_to_downloads:
            print("Copying screenshot from temp to downloads")
            shutil.copy(filepath, get_downloads_dir())

        return image  # MCP Image object

    except Exception as e:
        print(f"Error taking screenshot: {str(e)}")
        return f"Error taking screenshot: {str(e)}"


def get_ocr_from_screenshot(
    title_pattern: str = None,
    use_regex: bool = False,
    threshold: int = 60,
    scale_percent: int = 100,
) -> any:
    """
    Get OCR text from the specified title pattern and save them to the downloads directory with absolute paths returned.
    If no title pattern is provided, get all Text on the screen.

    Args:
        title_pattern: Pattern to match window title, if None, get all UI elements on the screen
        use_regex: If True, treat the pattern as a regex, otherwise best match with fuzzy matching
        save_to_downloads: If True, save the screenshot to the downloads directory and return the absolute path
        threshold: Minimum score (0-100) required for a fuzzy match

    Returns:
        List of UI elements as MCP Image objects
    """
    try:

        all_windows = gw.getAllWindows()

        # Convert to list of dictionaries for _find_matching_window
        windows = []
        for window in all_windows:
            if window.title:  # Only include windows with titles
                windows.append(
                    {
                        "title": window.title,
                        "window_obj": window,  # Store the actual window object
                    }
                )

        log(f"Found {len(windows)} windows")
        window = _find_matching_window(windows, title_pattern, use_regex, threshold)
        window = window["window_obj"] if window else None

        # Store the currently active window
        current_active_window = gw.getActiveWindow()

        # Take the screenshot
        if not window:
            log("No matching window found, taking screenshot of entire screen")
            screenshot = _mss_screenshot()
        else:
            log(f"Taking screenshot of window: {window.title}")
            # Activate the window and wait for it to be fully in focus
            window.activate()
            pyautogui.sleep(0.5)  # Wait for 0.5 seconds to ensure window is active
            screenshot = _mss_screenshot(
                region=(window.left, window.top, window.width, window.height)
            )
            # Restore the previously active window
            if current_active_window:
                current_active_window.activate()
                pyautogui.sleep(0.2)  # Wait a bit to ensure previous window is restored

        # Create temp directory
        temp_dir = Path(tempfile.mkdtemp())

        # Save screenshot and get filepath
        filepath, _ = save_image_to_downloads(
            screenshot, prefix="screenshot", directory=temp_dir
        )

        # Create Image object from filepath
        image = Image(filepath)

        # Copy from temp to downloads
        if False:
            log("Copying screenshot from temp to downloads")
            shutil.copy(filepath, get_downloads_dir())

        image_path = image.path
        img = cv2.imread(image_path)

        # Lower down resolution before processing
        width = int(img.shape[1] * scale_percent / 100)
        height = int(img.shape[0] * scale_percent / 100)
        dim = (width, height)
        resized_img = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)
        # save resized image to pwd
        # cv2.imwrite("resized_img.png", resized_img)
        engine = RapidOCR()
        vis = VisRes()

        result, elapse_list = engine(resized_img)
        boxes, txts, scores = list(zip(*result))
        boxes = [[[x + window.left, y + window.top] for x, y in box] for box in boxes]
        zipped_results = list(zip(boxes, txts, scores))

        return zipped_results

    except Exception as e:
        log(f"Error getting UI elements: {str(e)}")
        import traceback

        stack_trace = traceback.format_exc()
        log(f"Stack trace:\n{stack_trace}")
        return f"Error getting UI elements: {str(e)}\nStack trace:\n{stack_trace}"


import json

print(json.dumps(get_ocr_from_screenshot("chrome")))

```

--------------------------------------------------------------------------------
/src/computer_control_mcp/core.py:
--------------------------------------------------------------------------------

```python
#!/usr/bin/env python3
"""
Computer Control MCP - Core Implementation
A compact ModelContextProtocol server that provides computer control capabilities
using PyAutoGUI for mouse/keyboard control.
"""

import json
import shutil
import sys
import os
from typing import Dict, Any, List, Optional, Tuple
from io import BytesIO
import re
import asyncio
import uuid
import datetime
from pathlib import Path
import tempfile
from typing import Union

# --- Auto-install dependencies if needed ---
import pyautogui
from mcp.server.fastmcp import FastMCP, Image
import mss
from PIL import Image as PILImage

try:
    import pywinctl as gw
except (NotImplementedError, ImportError):
    import pygetwindow as gw
from fuzzywuzzy import fuzz, process

import cv2
from rapidocr import RapidOCR

from pydantic import BaseModel

BaseModel.model_config = {"arbitrary_types_allowed": True}

engine = RapidOCR()


DEBUG = True  # Set to False in production
RELOAD_ENABLED = True  # Set to False to disable auto-reload

# Create FastMCP server instance at module level
mcp = FastMCP("ComputerControlMCP")


# Determine mode automatically
IS_DEVELOPMENT = os.getenv("ENV") == "development"


def log(message: str) -> None:
    """Log to stderr in dev, to stdout or file in production."""
    if IS_DEVELOPMENT:
        # In dev, write to stderr
        print(f"[DEV] {message}", file=sys.stderr)
    else:
        # In production, write to stdout or a file
        print(f"[PROD] {message}", file=sys.stdout)
        # or append to a file: open("app.log", "a").write(message+"\n")


def get_downloads_dir() -> Path:
    """Get the OS downloads directory."""
    if os.name == "nt":  # Windows
        import winreg

        sub_key = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders"
        downloads_guid = "{374DE290-123F-4565-9164-39C4925E467B}"
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, sub_key) as key:
            downloads_dir = winreg.QueryValueEx(key, downloads_guid)[0]
        return Path(downloads_dir)
    else:  # macOS, Linux, etc.
        return Path.home() / "Downloads"


def _mss_screenshot(region=None):
    """Take a screenshot using mss and return PIL Image.

    Args:
        region: Optional tuple (left, top, width, height) for region capture

    Returns:
        PIL Image object
    """
    with mss.mss() as sct:
        if region is None:
            # Full screen screenshot
            monitor = sct.monitors[0]  # All monitors combined
        else:
            # Region screenshot
            left, top, width, height = region
            monitor = {
                "left": left,
                "top": top,
                "width": width,
                "height": height,
            }

        screenshot = sct.grab(monitor)
        # Convert to PIL Image
        return PILImage.frombytes(
            "RGB", screenshot.size, screenshot.bgra, "raw", "BGRX"
        )


def save_image_to_downloads(
    image, prefix: str = "screenshot", directory: Path = None
) -> Tuple[str, bytes]:
    """Save an image to the downloads directory and return its absolute path.

    Args:
        image: Either a PIL Image object or MCP Image object
        prefix: Prefix for the filename (default: 'screenshot')
        directory: Optional directory to save the image to

    Returns:
        Tuple of (absolute_path, image_data_bytes)
    """
    # Create a unique filename with timestamp
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    unique_id = str(uuid.uuid4())[:8]
    filename = f"{prefix}_{timestamp}_{unique_id}.png"

    # Get downloads directory
    downloads_dir = directory or get_downloads_dir()
    filepath = downloads_dir / filename

    # Handle different image types
    if hasattr(image, "save"):  # PIL Image
        image.save(filepath)
        # Also get the bytes for returning
        img_byte_arr = BytesIO()
        image.save(img_byte_arr, format="PNG")
        img_bytes = img_byte_arr.getvalue()
    elif hasattr(image, "data"):  # MCP Image
        img_bytes = image.data
        with open(filepath, "wb") as f:
            f.write(img_bytes)
    else:
        raise TypeError("Unsupported image type")

    log(f"Saved image to {filepath}")
    return str(filepath.absolute()), img_bytes


def _find_matching_window(
    windows: any,
    title_pattern: str = None,
    use_regex: bool = False,
    threshold: int = 10,
) -> Optional[Dict[str, Any]]:
    """Helper function to find a matching window based on title pattern.

    Args:
        windows: List of window dictionaries
        title_pattern: Pattern to match window title
        use_regex: If True, treat the pattern as a regex, otherwise use fuzzy matching
        threshold: Minimum score (0-100) required for a fuzzy match

    Returns:
        The best matching window or None if no match found
    """
    if not title_pattern:
        log("No title pattern provided, returning None")
        return None

    # For regex matching
    if use_regex:
        for window in windows:
            if re.search(title_pattern, window["title"], re.IGNORECASE):
                log(f"Regex match found: {window['title']}")
                return window
        return None

    # For fuzzy matching using fuzzywuzzy
    # Extract all window titles
    window_titles = [window["title"] for window in windows]

    # Use process.extractOne to find the best match
    best_match_title, score = process.extractOne(
        title_pattern, window_titles, scorer=fuzz.partial_ratio
    )
    log(f"Best fuzzy match: '{best_match_title}' with score {score}")

    # Only return if the score is above the threshold
    if score >= threshold:
        # Find the window with the matching title
        for window in windows:
            if window["title"] == best_match_title:
                return window

    return None


# --- MCP Function Handlers ---


@mcp.tool()
def click_screen(x: int, y: int) -> str:
    """Click at the specified screen coordinates."""
    try:
        pyautogui.click(x=x, y=y)
        return f"Successfully clicked at coordinates ({x}, {y})"
    except Exception as e:
        return f"Error clicking at coordinates ({x}, {y}): {str(e)}"


@mcp.tool()
def get_screen_size() -> Dict[str, Any]:
    """Get the current screen resolution."""
    try:
        width, height = pyautogui.size()
        return {
            "width": width,
            "height": height,
            "message": f"Screen size: {width}x{height}",
        }
    except Exception as e:
        return {"error": str(e), "message": f"Error getting screen size: {str(e)}"}


@mcp.tool()
def type_text(text: str) -> str:
    """Type the specified text at the current cursor position."""
    try:
        pyautogui.typewrite(text)
        return f"Successfully typed text: {text}"
    except Exception as e:
        return f"Error typing text: {str(e)}"


@mcp.tool()
def take_screenshot(
    title_pattern: str = None,
    use_regex: bool = False,
    threshold: int = 10,
    scale_percent_for_ocr: int = None,
    save_to_downloads: bool = False,
) -> Image:
    """
    Get screenshot Image as MCP Image object. If no title pattern is provided, get screenshot of entire screen and all text on the screen.

    Args:
        title_pattern: Pattern to match window title, if None, take screenshot of entire screen
        use_regex: If True, treat the pattern as a regex, otherwise best match with fuzzy matching
        threshold: Minimum score (0-100) required for a fuzzy match
        scale_percent_for_ocr: Percentage to scale the image down before processing, you wont need this most of the time unless your pc is extremely old or slow
        save_to_downloads: If True, save the screenshot to the downloads directory and return the absolute path

    Returns:
        Returns a single screenshot as MCP Image object. "content type image not supported" means preview isnt supported but Image object is there and returned successfully.
    """
    try:
        all_windows = gw.getAllWindows()

        # Convert to list of dictionaries for _find_matching_window
        windows = []
        for window in all_windows:
            if window.title:  # Only include windows with titles
                windows.append(
                    {
                        "title": window.title,
                        "window_obj": window,  # Store the actual window object
                    }
                )

        log(f"Found {len(windows)} windows")
        window = _find_matching_window(windows, title_pattern, use_regex, threshold)
        window = window["window_obj"] if window else None

        import ctypes
        import time

        def force_activate(window):
            """Force a window to the foreground on Windows."""
            try:
                hwnd = window._hWnd  # pywinctl window handle

                # Restore if minimized
                if window.isMinimized:
                    window.restore()
                    time.sleep(0.1)

                # Bring to top and set foreground
                ctypes.windll.user32.SetForegroundWindow(hwnd)
                ctypes.windll.user32.BringWindowToTop(hwnd)
                window.activate()  # fallback
                time.sleep(0.3)  # wait for OS to update

            except Exception as e:
                print(f"Warning: Could not force window: {e}", file=sys.stderr)

        # Take the screenshot
        if not window:
            log("No matching window found, taking screenshot of entire screen")
            screenshot = _mss_screenshot()
        else:
            try:
                # Re-fetch window handle to ensure it's valid
                window = gw.getWindowsWithTitle(window.title)[0]
                current_active_window = gw.getActiveWindow()
                log(f"Taking screenshot of window: {window.title}")

                if sys.platform == "win32":
                    force_activate(window)
                else:
                    window.activate()
                pyautogui.sleep(0.5)  # Give Windows time to focus

                screen_width, screen_height = pyautogui.size()

                screenshot = _mss_screenshot(
                    region=(
                        max(window.left, 0),
                        max(window.top, 0),
                        min(window.width, screen_width),
                        min(window.height, screen_height),
                    )
                )

                # Restore previously active window
                if current_active_window and current_active_window != window:
                    try:
                        if sys.platform == "win32":
                            force_activate(current_active_window)
                        else:
                            current_active_window.activate()
                        pyautogui.sleep(0.2)
                    except Exception as e:
                        log(f"Error restoring previous window: {str(e)}")
            except Exception as e:
                log(f"Error taking screenshot of window: {str(e)}")
                screenshot = _mss_screenshot()  # fallback to full screen

        # Create temp directory
        temp_dir = Path(tempfile.mkdtemp())

        # Save screenshot and get filepath
        filepath, _ = save_image_to_downloads(
            screenshot, prefix="screenshot", directory=temp_dir
        )

        # Create Image object from filepath
        image = Image(filepath)

        if save_to_downloads:
            log("Copying screenshot from temp to downloads")
            shutil.copy(filepath, get_downloads_dir())

        return image  # MCP Image object

    except Exception as e:
        log(f"Error in screenshot or getting UI elements: {str(e)}")
        import traceback

        stack_trace = traceback.format_exc()
        log(f"Stack trace:\n{stack_trace}")
        return f"Error in screenshot or getting UI elements: {str(e)}\nStack trace:\n{stack_trace}"


def is_low_spec_pc() -> bool:
    try:
        import psutil

        cpu_low = psutil.cpu_count(logical=False) < 4
        ram_low = psutil.virtual_memory().total < 8 * 1024**3
        return cpu_low or ram_low
    except Exception:
        # Fallback if psutil not available or info unavailable
        return False


@mcp.tool()
def take_screenshot_with_ocr(
    title_pattern: str = None,
    use_regex: bool = False,
    threshold: int = 10,
    scale_percent_for_ocr: int = None,
    save_to_downloads: bool = False,
) -> str:
    """
    Get OCR text from screenshot with absolute coordinates as JSON string of List[Tuple[List[List[int]], str, float]] (returned after adding the window offset from true (0, 0) of screen to the OCR coordinates, so clicking is on-point. Recommended to click in the middle of OCR Box) and using confidence from window with the specified title pattern. If no title pattern is provided, get screenshot of entire screen and all text on the screen. Know that OCR takes around 20 seconds on an mid-spec pc at 1080p resolution.

    Args:
        title_pattern: Pattern to match window title, if None, take screenshot of entire screen
        use_regex: If True, treat the pattern as a regex, otherwise best match with fuzzy matching
        threshold: Minimum score (0-100) required for a fuzzy match
        scale_percent_for_ocr: Percentage to scale the image down before processing, you wont need this most of the time unless your pc is extremely old or slow
        save_to_downloads: If True, save the screenshot to the downloads directory and return the absolute path

    Returns:
        Returns a list of UI elements as List[Tuple[List[List[int]], str, float]] where each tuple is [[4 corners of box], text, confidence], "content type image not supported" means preview isnt supported but Image object is there.
    """
    try:
        all_windows = gw.getAllWindows()

        # Convert to list of dictionaries for _find_matching_window
        windows = []
        for window in all_windows:
            if window.title:  # Only include windows with titles
                windows.append(
                    {
                        "title": window.title,
                        "window_obj": window,  # Store the actual window object
                    }
                )

        log(f"Found {len(windows)} windows")
        window = _find_matching_window(windows, title_pattern, use_regex, threshold)
        window = window["window_obj"] if window else None

        # Store the currently active window

        # Take the screenshot
        if not window:
            log("No matching window found, taking screenshot of entire screen")
            screenshot = _mss_screenshot()
        else:
            current_active_window = gw.getActiveWindow()
            log(f"Taking screenshot of window: {window.title}")
            # Activate the window and wait for it to be fully in focus
            try:
                window.activate()
                pyautogui.sleep(0.5)  # Wait for 0.5 seconds to ensure window is active
                screenshot = _mss_screenshot(
                    region=(window.left, window.top, window.width, window.height)
                )
                # Restore the previously active window
                if current_active_window:
                    try:
                        current_active_window.activate()
                        pyautogui.sleep(
                            0.2
                        )  # Wait a bit to ensure previous window is restored
                    except Exception as e:
                        log(f"Error restoring previous window: {str(e)}")
            except Exception as e:
                log(f"Error taking screenshot of window: {str(e)}")
                return f"Error taking screenshot of window: {str(e)}"

        # Create temp directory
        temp_dir = Path(tempfile.mkdtemp())

        # Save screenshot and get filepath
        filepath, _ = save_image_to_downloads(
            screenshot, prefix="screenshot", directory=temp_dir
        )

        # Create Image object from filepath
        image = Image(filepath)

        # Copy from temp to downloads
        if save_to_downloads:
            log("Copying screenshot from temp to downloads")
            shutil.copy(filepath, get_downloads_dir())

        image_path = image.path
        img = cv2.imread(image_path)

        if scale_percent_for_ocr is None:
            # Calculate percent to scale height to 360 pixels
            scale_percent_for_ocr = 100  # 360 / img.shape[0] * 100

        # Lower down resolution before processing
        width = int(img.shape[1] * scale_percent_for_ocr / 100)
        height = int(img.shape[0] * scale_percent_for_ocr / 100)
        dim = (width, height)
        resized_img = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)
        # save resized image to pwd
        # cv2.imwrite("resized_img.png", resized_img)

        output = engine(resized_img)
        boxes = output.boxes
        txts = output.txts
        scores = output.scores
        zipped_results = list(zip(boxes, txts, scores))
        zipped_results = [
            (
                box.tolist(),
                text,
                float(score),
            )  # convert np.array -> list, ensure score is float
            for box, text, score in zipped_results
        ]
        log(f"Found {len(zipped_results)} text items in OCR result.")
        log(f"First 5 items: {zipped_results[:5]}")
        return (
            ",\n".join([str(item) for item in zipped_results])
            if zipped_results
            else "No text found"
        )

    except Exception as e:
        log(f"Error in screenshot or getting UI elements: {str(e)}")
        import traceback

        stack_trace = traceback.format_exc()
        log(f"Stack trace:\n{stack_trace}")
        return f"Error in screenshot or getting UI elements: {str(e)}\nStack trace:\n{stack_trace}"


@mcp.tool()
def move_mouse(x: int, y: int) -> str:
    """Move the mouse to the specified screen coordinates."""
    try:
        pyautogui.moveTo(x=x, y=y)
        return f"Successfully moved mouse to coordinates ({x}, {y})"
    except Exception as e:
        return f"Error moving mouse to coordinates ({x}, {y}): {str(e)}"


@mcp.tool()
def mouse_down(button: str = "left") -> str:
    """Hold down a mouse button ('left', 'right', 'middle')."""
    try:
        pyautogui.mouseDown(button=button)
        return f"Held down {button} mouse button"
    except Exception as e:
        return f"Error holding {button} mouse button: {str(e)}"


@mcp.tool()
def mouse_up(button: str = "left") -> str:
    """Release a mouse button ('left', 'right', 'middle')."""
    try:
        pyautogui.mouseUp(button=button)
        return f"Released {button} mouse button"
    except Exception as e:
        return f"Error releasing {button} mouse button: {str(e)}"


@mcp.tool()
async def drag_mouse(
    from_x: int, from_y: int, to_x: int, to_y: int, duration: float = 0.5
) -> str:
    """
    Drag the mouse from one position to another.

    Args:
        from_x: Starting X coordinate
        from_y: Starting Y coordinate
        to_x: Ending X coordinate
        to_y: Ending Y coordinate
        duration: Duration of the drag in seconds (default: 0.5)

    Returns:
        Success or error message
    """
    try:
        # First move to the starting position
        pyautogui.moveTo(x=from_x, y=from_y)
        # Then drag to the destination
        log("starting drag")
        await asyncio.to_thread(pyautogui.dragTo, x=to_x, y=to_y, duration=duration)
        log("done drag")
        return f"Successfully dragged from ({from_x}, {from_y}) to ({to_x}, {to_y})"
    except Exception as e:
        return f"Error dragging from ({from_x}, {from_y}) to ({to_x}, {to_y}): {str(e)}"


import pyautogui
from typing import Union, List


@mcp.tool()
def key_down(key: str) -> str:
    """Hold down a specific keyboard key until released."""
    try:
        pyautogui.keyDown(key)
        return f"Held down key: {key}"
    except Exception as e:
        return f"Error holding key {key}: {str(e)}"


@mcp.tool()
def key_up(key: str) -> str:
    """Release a specific keyboard key."""
    try:
        pyautogui.keyUp(key)
        return f"Released key: {key}"
    except Exception as e:
        return f"Error releasing key {key}: {str(e)}"


@mcp.tool()
def press_keys(keys: Union[str, List[Union[str, List[str]]]]) -> str:
    """
    Press keyboard keys.

    Args:
        keys:
            - Single key as string (e.g., "enter")
            - Sequence of keys as list (e.g., ["a", "b", "c"])
            - Key combinations as nested list (e.g., [["ctrl", "c"], ["alt", "tab"]])

    Examples:
        press_keys("enter")
        press_keys(["a", "b", "c"])
        press_keys([["ctrl", "c"], ["alt", "tab"]])
    """
    try:
        if isinstance(keys, str):
            # Single key
            pyautogui.press(keys)
            return f"Pressed single key: {keys}"

        elif isinstance(keys, list):
            for item in keys:
                if isinstance(item, str):
                    # Sequential key press
                    pyautogui.press(item)
                elif isinstance(item, list):
                    # Key combination (e.g., ctrl+c)
                    pyautogui.hotkey(*item)
                else:
                    return f"Invalid key format: {item}"
            return f"Successfully pressed keys sequence: {keys}"

        else:
            return "Invalid input: must be str or list"

    except Exception as e:
        return f"Error pressing keys {keys}: {str(e)}"


@mcp.tool()
def list_windows() -> List[Dict[str, Any]]:
    """List all open windows on the system."""
    try:
        windows = gw.getAllWindows()
        result = []
        for window in windows:
            if window.title:  # Only include windows with titles
                result.append(
                    {
                        "title": window.title,
                        "left": window.left,
                        "top": window.top,
                        "width": window.width,
                        "height": window.height,
                        "is_active": window.isActive,
                        "is_visible": window.visible,
                        "is_minimized": window.isMinimized,
                        "is_maximized": window.isMaximized,
                        # "screenshot": pyautogui.screenshot(
                        #     region=(
                        #         window.left,
                        #         window.top,
                        #         window.width,
                        #         window.height,
                        #     )
                        # ),
                    }
                )
        return result
    except Exception as e:
        log(f"Error listing windows: {str(e)}")
        return [{"error": str(e)}]


@mcp.tool()
def activate_window(
    title_pattern: str, use_regex: bool = False, threshold: int = 60
) -> str:
    """
    Activate a window (bring it to the foreground) by matching its title.

    Args:
        title_pattern: Pattern to match window title
        use_regex: If True, treat the pattern as a regex, otherwise use fuzzy matching
        threshold: Minimum score (0-100) required for a fuzzy match

    Returns:
        Success or error message
    """
    try:
        # Get all windows
        all_windows = gw.getAllWindows()

        # Convert to list of dictionaries for _find_matching_window
        windows = []
        for window in all_windows:
            if window.title:  # Only include windows with titles
                windows.append(
                    {
                        "title": window.title,
                        "window_obj": window,  # Store the actual window object
                    }
                )

        # Find matching window using our improved function
        matched_window_dict = _find_matching_window(
            windows, title_pattern, use_regex, threshold
        )

        if not matched_window_dict:
            log(f"No window found matching pattern: {title_pattern}")
            return f"Error: No window found matching pattern: {title_pattern}"

        # Get the actual window object
        matched_window = matched_window_dict["window_obj"]

        # Activate the window
        matched_window.activate()

        return f"Successfully activated window: '{matched_window.title}'"
    except Exception as e:
        log(f"Error activating window: {str(e)}")
        return f"Error activating window: {str(e)}"


def main():
    """Main entry point for the MCP server."""
    pyautogui.FAILSAFE = True

    try:
        # Run the server
        log("Computer Control MCP Server Started...")
        mcp.run()

    except KeyboardInterrupt:
        log("Server shutting down...")
    except Exception as e:
        log(f"Error: {str(e)}")


if __name__ == "__main__":
    main()

```