# Directory Structure

```
├── .gitignore
├── Dockerfile
├── MANIFEST.in
├── mcp_server_stats
│   ├── __init__.py
│   ├── __main__.py
│   └── server.py
├── README.md
├── rebuild-docker.ps1
├── requirements.txt
└── setup.py
```

# Files

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------

```
  1 | # Byte-compiled / optimized / DLL files
  2 | __pycache__/
  3 | *.py[cod]
  4 | *$py.class
  5 | 
  6 | # C extensions
  7 | *.so
  8 | 
  9 | # Distribution / packaging
 10 | .Python
 11 | build/
 12 | develop-eggs/
 13 | dist/
 14 | downloads/
 15 | eggs/
 16 | .eggs/
 17 | lib/
 18 | lib64/
 19 | parts/
 20 | sdist/
 21 | var/
 22 | wheels/
 23 | share/python-wheels/
 24 | *.egg-info/
 25 | .installed.cfg
 26 | *.egg
 27 | MANIFEST
 28 | 
 29 | # PyInstaller
 30 | #  Usually these files are written by a python script from a template
 31 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 32 | *.manifest
 33 | *.spec
 34 | 
 35 | # Installer logs
 36 | pip-log.txt
 37 | pip-delete-this-directory.txt
 38 | 
 39 | # Unit test / coverage reports
 40 | htmlcov/
 41 | .tox/
 42 | .nox/
 43 | .coverage
 44 | .coverage.*
 45 | .cache
 46 | nosetests.xml
 47 | coverage.xml
 48 | *.cover
 49 | *.py,cover
 50 | .hypothesis/
 51 | .pytest_cache/
 52 | cover/
 53 | 
 54 | # Translations
 55 | *.mo
 56 | *.pot
 57 | 
 58 | # Django stuff:
 59 | *.log
 60 | local_settings.py
 61 | db.sqlite3
 62 | db.sqlite3-journal
 63 | 
 64 | # Flask stuff:
 65 | instance/
 66 | .webassets-cache
 67 | 
 68 | # Scrapy stuff:
 69 | .scrapy
 70 | 
 71 | # Sphinx documentation
 72 | docs/_build/
 73 | 
 74 | # PyBuilder
 75 | .pybuilder/
 76 | target/
 77 | 
 78 | # Jupyter Notebook
 79 | .ipynb_checkpoints
 80 | 
 81 | # IPython
 82 | profile_default/
 83 | ipython_config.py
 84 | 
 85 | # pyenv
 86 | #   For a library or package, you might want to ignore these files since the code is
 87 | #   intended to run in multiple environments; otherwise, check them in:
 88 | # .python-version
 89 | 
 90 | # pipenv
 91 | #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
 92 | #   However, in case of collaboration, if having platform-specific dependencies or dependencies
 93 | #   having no cross-platform support, pipenv may install dependencies that don't work, or not
 94 | #   install all needed dependencies.
 95 | #Pipfile.lock
 96 | 
 97 | # UV
 98 | #   Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
 99 | #   This is especially recommended for binary packages to ensure reproducibility, and is more
100 | #   commonly ignored for libraries.
101 | #uv.lock
102 | 
103 | # poetry
104 | #   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
105 | #   This is especially recommended for binary packages to ensure reproducibility, and is more
106 | #   commonly ignored for libraries.
107 | #   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
108 | #poetry.lock
109 | 
110 | # pdm
111 | #   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
112 | #pdm.lock
113 | #   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
114 | #   in version control.
115 | #   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
116 | .pdm.toml
117 | .pdm-python
118 | .pdm-build/
119 | 
120 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
121 | __pypackages__/
122 | 
123 | # Celery stuff
124 | celerybeat-schedule
125 | celerybeat.pid
126 | 
127 | # SageMath parsed files
128 | *.sage.py
129 | 
130 | # Environments
131 | .env
132 | .venv
133 | env/
134 | venv/
135 | ENV/
136 | env.bak/
137 | venv.bak/
138 | 
139 | # Spyder project settings
140 | .spyderproject
141 | .spyproject
142 | 
143 | # Rope project settings
144 | .ropeproject
145 | 
146 | # mkdocs documentation
147 | /site
148 | 
149 | # mypy
150 | .mypy_cache/
151 | .dmypy.json
152 | dmypy.json
153 | 
154 | # Pyre type checker
155 | .pyre/
156 | 
157 | # pytype static type analyzer
158 | .pytype/
159 | 
160 | # Cython debug symbols
161 | cython_debug/
162 | 
163 | # PyCharm
164 | #  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
165 | #  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
166 | #  and can be added to the global gitignore or merged into this file.  For a more nuclear
167 | #  option (not recommended) you can uncomment the following to ignore the entire idea folder.
168 | #.idea/
169 | 
170 | # Ruff stuff:
171 | .ruff_cache/
172 | 
173 | # PyPI configuration file
174 | .pypirc
175 | 
```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

```markdown
  1 | # Statsource MCP Server
  2 | 
  3 | A Model Context Protocol server that provides statistical analysis capabilities. This server enables LLMs to analyze data from various sources, calculate statistics, and generate predictions.
  4 | 
  5 | The statistics tool connects to our analytics API and allows AI models to perform statistical analysis and generate ML predictions based on user data, whether it's in a PostgreSQL database or a CSV file.
  6 | 
  7 | ## Available Tools
  8 | 
  9 | ### get_statistics
 10 | 
 11 | Analyze data and calculate statistics or generate ML predictions based on provided parameters.
 12 | 
 13 | **Arguments:**
 14 | 
 15 | - `columns` (list of strings, required): List of column names to analyze or predict (Ask user for exact column names).
 16 | - `data_source` (string, optional): Path to data file (uploaded to statsource.me), database connection string (ask user for exact string), or API endpoint. If not provided, uses `DB_CONNECTION_STRING` from env config if set.
 17 | - `source_type` (string, optional): Type of data source ("csv", "database", or "api"). If not provided, uses `DB_SOURCE_TYPE` from env config if set.
 18 | - `table_name` (string, optional but **required** if `source_type` is "database"): Name of the database table to use (Ask user for exact table name).
 19 | - `statistics` (list of strings, optional): List of statistics to calculate (required for `query_type="statistics"`). Valid options include: 'mean', 'median', 'std', 'sum', 'count', 'min', 'max', 'describe', 'correlation', 'missing', 'unique', 'boxplot'.
 20 | - `query_type` (string, optional, default="statistics"): Type of query ("statistics" or "ml_prediction").
 21 | - `periods` (integer, optional): Number of future periods to predict (required for `query_type="ml_prediction"`).
 22 | - `filters` (dict, optional): Dictionary of column-value pairs to filter data (e.g., `{"status": "completed", "region": ["North", "East"]}`).
 23 | - `groupby` (list of strings, optional): List of column names to group data by before calculating statistics (e.g., `["region", "product_category"]`).
 24 | - `options` (dict, optional): Dictionary of additional options for specific operations.
 25 | - `date_column` (string, optional): Column name containing date/timestamp information for filtering and time-series analysis.
 26 | - `start_date` (string or datetime, optional): Inclusive start date for filtering (ISO 8601 format, e.g., "2023-01-01").
 27 | - `end_date` (string or datetime, optional): Inclusive end date for filtering (ISO 8601 format, e.g., "2023-12-31").
 28 | 
 29 | **Key Usage Notes:**
 30 | 
 31 | - **Data Sources:** For CSV, the user must upload the file to statsource.me first and provide the filename. For databases, ask the user for the _exact_ connection string and table name. Never guess or invent connection details.
 32 | - **Configuration:** If `data_source` and `source_type` are not provided, the tool will attempt to use `DB_CONNECTION_STRING` and `DB_SOURCE_TYPE` from the environment configuration (see below).
 33 | - **Filtering/Grouping:** Use `filters`, `groupby`, `date_column`, `start_date`, and `end_date` to analyze specific subsets of data.
 34 | 
 35 | ### suggest_feature
 36 | 
 37 | Suggest a new feature or improvement for the StatSource analytics platform.
 38 | 
 39 | **Arguments:**
 40 | 
 41 | - `description` (string, required): A clear, detailed description of the suggested feature
 42 | - `use_case` (string, required): Explanation of how and why users would use this feature
 43 | - `priority` (string, optional): Suggested priority level ("low", "medium", "high")
 44 | 
 45 | ## Installation
 46 | 
 47 | ### Using uv (recommended)
 48 | 
 49 | When using uv, no specific installation is needed; we use uvx to run mcp-server-stats directly.
 50 | 
 51 | ### Docker Support
 52 | 
 53 | A pre-built Docker image is available on Docker Hub, which simplifies running the server. You can use this image directly without needing to build it yourself.
 54 | 
 55 | Pull the image (optional, as `docker run` will do this automatically if the image isn't present locally):
 56 | 
 57 | ```bash
 58 | docker pull jamie78933/statsource-mcp
 59 | ```
 60 | 
 61 | To run the server using the Docker image:
 62 | 
 63 | ```bash
 64 | docker run -i --rm jamie78933/statsource-mcp
 65 | ```
 66 | 
 67 | Note: For actual usage within applications like Claude.app, refer to the Configuration section below for passing necessary environment variables like API keys and database connection strings.
 68 | 
 69 | ### Using PIP
 70 | 
 71 | Alternatively, you can install mcp-server-stats via pip:
 72 | 
 73 | ```bash
 74 | pip install mcp-server-stats
 75 | ```
 76 | 
 77 | After installation, you can run it as a script using:
 78 | 
 79 | ```bash
 80 | python -m mcp_server_stats
 81 | ```
 82 | 
 83 | Or use the console script:
 84 | 
 85 | ```bash
 86 | mcp-server-stats
 87 | ```
 88 | 
 89 | ## Configuration
 90 | 
 91 | ### Configure for Claude.app
 92 | 
 93 | Add to your Claude settings:
 94 | 
 95 | **Using uvx**
 96 | 
 97 | ```json
 98 | "mcpServers": {
 99 |   "statsource": {
100 |     "command": "uvx",
101 |     "args": ["mcp-server-stats"]
102 |   }
103 | }
104 | ```
105 | 
106 | **Using docker**
107 | 
108 | ```json
109 | {
110 |   "mcpServers": {
111 |     "statsource": {
112 |       "command": "docker",
113 |       "args": [
114 |         "run",
115 |         "-i",
116 |         "--rm",
117 |         "-e",
118 |         "API_KEY=YOUR_STATSOURCE_API_KEY",
119 |         "-e",
120 |         "DB_CONNECTION_STRING=postgresql://your_db_user:your_db_password@your_db_host:5432/your_db_name",
121 |         "-e",
122 |         "DB_SOURCE_TYPE=database",
123 |         "jamie78933/statsource-mcp"
124 |       ],
125 |       "protocolVersion": "2024-11-05"
126 |     }
127 |   }
128 | }
129 | ```
130 | 
131 | **Using pip installation**
132 | 
133 | ```json
134 | "mcpServers": {
135 |   "statsource": {
136 |     "command": "python",
137 |     "args": ["-m", "mcp_server_stats"]
138 |   }
139 | }
140 | ```
141 | 
142 | ### Environment Variables
143 | 
144 | You can configure the server using environment variables in your Claude.app configuration:
145 | 
146 | ```json
147 | "mcpServers": {
148 |   "statsource": {
149 |     "command": "python",
150 |     "args": ["-m", "mcp_server_stats"],
151 |     "env": {
152 |       "API_KEY": "your_api_key",
153 |       "DB_CONNECTION_STRING": "postgresql://username:password@localhost:5432/your_db",
154 |       "DB_SOURCE_TYPE": "database"
155 |     }
156 |   }
157 | }
158 | ```
159 | 
160 | Available environment variables:
161 | 
162 | - `API_KEY`: Your API key for authentication with statsource.me
163 | - `DB_CONNECTION_STRING`: Default database connection string
164 | - `DB_SOURCE_TYPE`: Default data source type (usually "database")
165 | 
166 | ## Debugging
167 | 
168 | You can use the MCP inspector to debug the server. For uvx installations:
169 | 
170 | ```bash
171 | npx @modelcontextprotocol/inspector uvx mcp-server-stats
172 | ```
173 | 
174 | Or if you've installed the package in a specific directory or are developing on it:
175 | 
176 | ```bash
177 | cd path/to/servers/
178 | npx @modelcontextprotocol/inspector python -m mcp_server_stats
179 | ```
180 | 
181 | ## Contributing
182 | 
183 | We encourage contributions to help expand and improve mcp-server-stats. Whether you want to add new tools, enhance existing functionality, or improve documentation, your input is valuable.
184 | 
185 | Pull requests are welcome! Feel free to contribute new ideas, bug fixes, or enhancements to make mcp-server-stats even more powerful and useful.
186 | 
187 | ## License
188 | 
189 | mcp-server-stats is licensed under the MIT License. This means you are free to use, modify, and distribute the software, subject to the terms and conditions of the MIT License. For more details, please see the LICENSE file in the project repository.
190 | 
```
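
The README above describes the `get_statistics` arguments in prose; as a quick illustration, the following sketch shows what a filtered, grouped statistics request could look like as keyword arguments. The column names, filename, and filter values are purely illustrative and not taken from any real dataset.

```python
# Hypothetical arguments for a statistics query against a previously uploaded CSV.
# Column names, the filename, and filter values are made up for illustration only.
example_arguments = {
    "columns": ["revenue", "quantity"],          # exact column names supplied by the user
    "statistics": ["mean", "sum", "max"],        # subset of the valid statistics listed above
    "data_source": "sales_2023.csv",             # filename of a file already uploaded to statsource.me
    "source_type": "csv",
    "filters": {"region": ["North", "East"]},    # applied before statistics are calculated
    "groupby": ["product_category"],
    "date_column": "order_date",
    "start_date": "2023-01-01",
    "end_date": "2023-12-31",
}
```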

--------------------------------------------------------------------------------
/mcp_server_stats/__init__.py:
--------------------------------------------------------------------------------

```python
1 | """
2 | MCP Server Stats - A Model Context Protocol server for statistical analysis
3 | """
4 | 
5 | __version__ = "0.2.2" 
6 | 
```

--------------------------------------------------------------------------------
/mcp_server_stats/__main__.py:
--------------------------------------------------------------------------------

```python
1 | """
2 | Main entry point for the MCP server stats package.
3 | """
4 | 
5 | from .server import run_server
6 | 
7 | if __name__ == "__main__":
8 |     run_server() 
```

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------

```
 1 | # Core dependencies
 2 | mcp>=1.4.1,<2.0.0  # Official Model Context Protocol Python SDK with FastMCP (kept in sync with setup.py)
 3 | requests>=2.31.0,<3.0.0
 4 | pydantic>=2.4.2,<3.0.0
 5 | python-dotenv>=1.0.0,<2.0.0  # For reading .env files (optional)
 6 | 
 7 | # Logging
 8 | python-json-logger>=2.0.7,<3.0.0  # For structured logging (optional)
 9 | 
10 | # Development
11 | # Use this for local development:
12 | # -e .
13 | 
14 | # Production
15 | # For production, install the package:
16 | # mcp-server-stats 
```

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------

```dockerfile
 1 | # Use an official Python runtime as a parent image
 2 | FROM python:3.11-slim
 3 | 
 4 | # Set the working directory in the container
 5 | WORKDIR /app
 6 | 
 7 | # Copy the requirements file into the container at /app
 8 | COPY requirements.txt ./
 9 | 
10 | # Install any needed packages specified in requirements.txt
11 | # Use --no-cache-dir to reduce image size
12 | RUN pip install --no-cache-dir -r requirements.txt
13 | 
14 | # Copy the rest of the application code into the container at /app
15 | # Copy setup.py and MANIFEST.in for installation
16 | COPY setup.py MANIFEST.in ./
17 | COPY README.md ./
18 | # Copy the main package directory
19 | COPY mcp_server_stats ./mcp_server_stats
20 | 
21 | # Install the project itself
22 | RUN pip install --no-cache-dir .
23 | 
24 | # Make port 80 available to the world outside this container (if your server listens on a port - adjust if needed)
25 | # EXPOSE 80 # MCP usually communicates over stdin/stdout, so this might not be necessary
26 | 
27 | # Define environment variables (optional, can be overridden at runtime)
28 | # ENV API_KEY="your_default_api_key"
29 | # ENV DB_CONNECTION_STRING="your_default_db_string"
30 | # ENV DB_SOURCE_TYPE="database"
31 | 
32 | # Run the application when the container launches
33 | CMD ["python", "-m", "mcp_server_stats"] 
```

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------

```python
 1 | from setuptools import setup, find_packages
 2 | import os
 3 | from mcp_server_stats import __version__
 4 | 
 5 | # Read long description from README.md
 6 | with open("README.md", "r", encoding="utf-8") as fh:
 7 |     long_description = fh.read()
 8 | 
 9 | setup(
10 |     name="mcp-server-stats",
11 |     version=__version__,
12 |     author="Statsource Team",
13 |     author_email="[email protected]",
14 |     description="A Model Context Protocol server for statistical analysis",
15 |     long_description=long_description,
16 |     long_description_content_type="text/markdown",
17 |     url="https://github.com/jamie7893/statsource-mcp",
18 |     packages=find_packages(),
19 |     include_package_data=True,
20 |     classifiers=[
21 |         "Programming Language :: Python :: 3",
22 |         "License :: OSI Approved :: MIT License",
23 |         "Operating System :: OS Independent",
24 |     ],
25 |     python_requires=">=3.8",
26 |     install_requires=[
27 |         "mcp>=1.4.1,<2.0.0",
28 |         "requests>=2.31.0,<3.0.0",
29 |         "pydantic>=2.4.2,<3.0.0",
30 |         "python-dotenv>=1.0.0,<2.0.0",
31 |         "python-json-logger>=2.0.7,<3.0.0"
32 |     ],
33 |     entry_points={
34 |         "console_scripts": [
35 |             "mcp-server-stats=mcp_server_stats.server:run_server",
36 |         ],
37 |     },
38 | ) 
```
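
As a quick sanity check after installation, the version exposed by the package should match what setup.py imports from `mcp_server_stats/__init__.py`, and the `mcp-server-stats` console script resolves to the same `run_server` callable. A minimal sketch, assuming the package has already been installed (e.g. with `pip install .` or `pip install -e .`):

```python
# Assumes `pip install .` (or `pip install -e .`) has been run in the project root.
import mcp_server_stats
from mcp_server_stats.server import run_server

print(mcp_server_stats.__version__)  # "0.2.2", the single version source read by setup.py
# The `mcp-server-stats` console script declared in entry_points simply calls run_server().
```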

--------------------------------------------------------------------------------
/rebuild-docker.ps1:
--------------------------------------------------------------------------------

```
  1 | #!/usr/bin/env pwsh
  2 | #
  3 | # Rebuild and run StatSource MCP Docker container
  4 | # Usage: ./rebuild-docker.ps1 [options]
  5 | #
  6 | # Options:
  7 | #   -ImageName <name>     Docker image name (default: jamie78933/statsource-mcp)
  8 | #   -Push                 Push to Docker Hub after building
  9 | #   -LogLevel <level>     Set logging level (DEBUG, INFO, WARNING, ERROR)
 10 | #   -ApiKey <key>         Set API key for authentication
 11 | #   -DbConnection <conn>  Set database connection string
 12 | #   -RunContainer         Run the container after building (default: false)
 13 | #   -ServerVersion <ver>  Set server version tag (default: development)
 14 | 
 15 | param (
 16 |     [string]$ImageName = "jamie78933/statsource-mcp",
 17 |     [switch]$Push = $false,
 18 |     [string]$LogLevel = "",
 19 |     [string]$ApiKey = "",
 20 |     [string]$DbConnection = "",
 21 |     [switch]$RunContainer = $false,
 22 |     [string]$ServerVersion = "development"
 23 | )
 24 | 
 25 | # Make sure we're in the right directory
 26 | $scriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
 27 | Set-Location $scriptDir
 28 | 
 29 | # Set up error handling
 30 | $ErrorActionPreference = "Stop"
 31 | 
 32 | # Display startup banner
 33 | Write-Host "=======================================" -ForegroundColor Blue
 34 | Write-Host "  StatSource MCP Docker Build Script   " -ForegroundColor Blue
 35 | Write-Host "=======================================" -ForegroundColor Blue
 36 | Write-Host "Image name:    $ImageName"
 37 | Write-Host "Log level:     $(if ($LogLevel) {$LogLevel} else {"[DISABLED]"})"
 38 | Write-Host "Server version: $ServerVersion"
 39 | Write-Host "API key:       $(if ($ApiKey) {"[CONFIGURED]"} else {"[NOT CONFIGURED]"})"
 40 | Write-Host "DB connection: $(if ($DbConnection) {"[CONFIGURED]"} else {"[NOT CONFIGURED]"})"
 41 | Write-Host "Run container: $RunContainer"
 42 | Write-Host "Push to hub:   $Push"
 43 | Write-Host "======================================="
 44 | Write-Host ""
 45 | 
 46 | try {
 47 |     # Check if Docker is running
 48 |     $dockerStatus = docker info 2>&1
 49 |     if ($LASTEXITCODE -ne 0) {
 50 |         throw "Docker is not running. Please start Docker and try again."
 51 |     }
 52 | 
 53 |     # Build the Docker image
 54 |     Write-Host "Building Docker image '$ImageName'..." -ForegroundColor Yellow
 55 |     docker build -t $ImageName --build-arg SERVER_VERSION=$ServerVersion .
 56 | 
 57 |     if ($LASTEXITCODE -ne 0) {
 58 |         throw "Docker build failed with exit code $LASTEXITCODE"
 59 |     }
 60 |     
 61 |     Write-Host "Docker image built successfully!" -ForegroundColor Green
 62 | 
 63 |     # Push to Docker Hub if requested
 64 |     if ($Push) {
 65 |         Write-Host "Pushing '$ImageName' to Docker Hub..." -ForegroundColor Yellow
 66 |         docker push $ImageName
 67 |         
 68 |         if ($LASTEXITCODE -ne 0) {
 69 |             throw "Docker push failed with exit code $LASTEXITCODE. Make sure you're logged in with 'docker login'"
 70 |         }
 71 |         
 72 |         Write-Host "Docker image pushed successfully!" -ForegroundColor Green
 73 |     }
 74 | 
 75 |     # Run the container if requested
 76 |     if ($RunContainer) {
 77 |         Write-Host "Running container from image '$ImageName'..." -ForegroundColor Yellow
 78 |         
 79 |         $env_vars = @()
 80 |         if ($LogLevel) {
 81 |             $env_vars += "-e", "LOG_LEVEL=$LogLevel"
 82 |         }
 83 |         if ($ServerVersion) {
 84 |             $env_vars += "-e", "SERVER_VERSION=$ServerVersion"
 85 |         }
 86 |         if ($ApiKey) {
 87 |             $env_vars += "-e", "API_KEY=$ApiKey"
 88 |         }
 89 |         if ($DbConnection) {
 90 |             $env_vars += "-e", "DB_CONNECTION_STRING=$DbConnection"
 91 |             $env_vars += "-e", "DB_SOURCE_TYPE=database"  # note: this script defines no -DbSourceType parameter
 92 |         }
 93 |         
 94 |         # Create Docker run command
 95 |         $docker_run_cmd = @("docker", "run", "-i", "--rm") + $env_vars + @($ImageName)
 96 |         
 97 |         # Display the command (with masked sensitive data)
 98 |         $displayCmd = $docker_run_cmd -replace "API_KEY=.*?(?=\s|$)", "API_KEY=***" -replace "DB_CONNECTION_STRING=.*?(?=\s|$)", "DB_CONNECTION_STRING=***"
 99 |         Write-Host "Executing: $($displayCmd -join ' ')" -ForegroundColor Gray
100 |         
101 |         # Run the docker command
102 |         & $docker_run_cmd[0] $docker_run_cmd[1..($docker_run_cmd.Length-1)]
103 |         
104 |         if ($LASTEXITCODE -ne 0) {
105 |             throw "Docker run failed with exit code $LASTEXITCODE"
106 |         }
107 |     }
108 |     
109 |     Write-Host "Script completed successfully!" -ForegroundColor Green
110 | 
111 | } catch {
112 |     Write-Host "ERROR: $_" -ForegroundColor Red
113 |     exit 1
114 | } 
```

--------------------------------------------------------------------------------
/mcp_server_stats/server.py:
--------------------------------------------------------------------------------

```python
  1 | """
  2 | MCP (Model Context Protocol) Server for Statistical Analysis
  3 | 
  4 | This module implements an MCP server that acts as a middleware between
  5 | clients (like Claude Desktop app) and our existing API. It runs independently
  6 | and forwards requests to the API whose URL is configurable via environment variables.
  7 | """
  8 | 
  9 | import os
 10 | import json
 11 | import requests
 12 | import sys
 13 | import time
 14 | from typing import Dict, List, Any, Optional, Union
 15 | from pydantic import BaseModel, Field, field_validator
 16 | from mcp.server.fastmcp import FastMCP
 17 | from datetime import datetime
 18 | 
 19 | # Read API location from environment variable with a default fallback
 20 | API_URL = os.getenv("API_URL", "https://api.statsource.me")
 21 | API_KEY = os.getenv("API_KEY", None)  # Optional API key for authentication
 22 | 
 23 | # Database connection string from environment variable
 24 | DB_CONNECTION_STRING = os.getenv("DB_CONNECTION_STRING", None)
 25 | DB_SOURCE_TYPE = os.getenv("DB_SOURCE_TYPE", "database")  # Default to database if not specified
 26 | 
 27 | # Initialize MCP server with specific protocol version
 28 | mcp = FastMCP("ai_mcp_server", protocol_version="2024-11-05")
 29 | 
 30 | # Define input models for data validation
 31 | class StatisticsRequest(BaseModel):
 32 |     """Request model for statistical operations."""
 33 |     operation: str = Field(..., description="Statistical operation to perform (mean, median, sum, etc.)")
 34 |     data: List[float] = Field(..., description="List of numeric data points")
 35 |     
 36 |     @field_validator('operation')
 37 |     @classmethod
 38 |     def validate_operation(cls, v):
 39 |         valid_operations = ['mean', 'median', 'sum', 'min', 'max', 'std', 'var', 'count']
 40 |         if v.lower() not in valid_operations:
 41 |             raise ValueError(f"Operation must be one of {valid_operations}")
 42 |         return v.lower()
 43 |     
 44 |     @field_validator('data')
 45 |     @classmethod
 46 |     def validate_data(cls, v):
 47 |         if not v:
 48 |             raise ValueError("Data list cannot be empty")
 49 |         return v
 50 | 
 51 | # Helper function to check if API is available
 52 | def is_api_available() -> bool:
 53 |     """
 54 |     Check if the API is available.
 55 |     
 56 |     Returns:
 57 |         bool: True if API is available, False otherwise
 58 |     """
 59 |     try:
 60 |         # Try to connect to the base URL
 61 |         response = requests.get(API_URL, timeout=5)
 62 |         return response.status_code < 500  # Consider 2xx, 3xx, 4xx as "available"
 63 |     except requests.RequestException:
 64 |         return False
 65 | 
 66 | # Helper function to make API calls
 67 | def call_api(endpoint: str, data: Dict[str, Any], params: Dict[str, Any] = None) -> Dict[str, Any]:
 68 |     """
 69 |     Make a request to the API.
 70 |     
 71 |     Args:
 72 |         endpoint: API endpoint path (without base URL)
 73 |         data: Request payload
 74 |         params: URL query parameters
 75 |         
 76 |     Returns:
 77 |         API response as dictionary
 78 |     
 79 |     Raises:
 80 |         Exception: If the API request fails
 81 |     """
 82 |     # Check if API is available first
 83 |     if not is_api_available():
 84 |         raise Exception(f"API at {API_URL} is not available")
 85 |     
 86 |     headers = {"Content-Type": "application/json"}
 87 |     
 88 |     # Add authentication if API key is provided
 89 |     if API_KEY:
 90 |         headers["API-Key"] = API_KEY
 91 |     
 92 |     full_url = f"{API_URL}{endpoint}"
 93 |     try:
 94 |         response = requests.post(full_url, json=data, headers=headers, params=params)
 95 |         response.raise_for_status()  # Raise exception for 4XX/5XX responses
 96 |         
 97 |         response_data = response.json()
 98 |         return response_data
 99 |     except requests.RequestException as e:
100 |         if hasattr(e, 'response') and e.response is not None:
101 |             error_text = e.response.text
102 |             status_code = e.response.status_code
103 |             return {"error": f"API request failed with status {status_code}: {error_text}"}
104 |         else:
105 |             error_text = str(e)
106 |             return {"error": f"API request failed: {error_text}"}
107 | 
108 | # Define MCP tools
109 | @mcp.tool()
110 | def suggest_feature(description: str, use_case: str, priority: str = "medium") -> str:
111 |     """
112 |     Suggest a new feature or improvement for the StatSource analytics platform.
113 |     
114 |     ### What this tool does:
115 |     This tool allows you to submit feature suggestions or enhancement requests for 
116 |     the StatSource platform. Suggestions are logged and reviewed by the development team.
117 |     
118 |     ### When to use this tool:
119 |     - When a user asks for functionality that doesn't currently exist
120 |     - When you identify gaps or limitations in the current analytics capabilities
121 |     - When a user expresses frustration about missing capabilities
122 |     - When you think of enhancements that would improve the user experience
123 |     
124 |     ### Required inputs:
125 |     - description: A clear, detailed description of the suggested feature
126 |     - use_case: Explanation of how and why users would use this feature
127 |     
128 |     ### Optional inputs:
129 |     - priority: Suggested priority level ("low", "medium", "high")
130 |     
131 |     ### Returns:
132 |     A confirmation message and reference ID for the feature suggestion.
133 |     """
134 |     try:
135 |         # Format the request
136 |         suggestion_data = {
137 |             "description": description,
138 |             "use_case": use_case,
139 |             "priority": priority,
140 |             "source": "ai_agent"
141 |         }
142 |         
143 |         # Call the feature suggestion endpoint
144 |         endpoint = "/api/v1/feature_suggestions"
145 |         response = call_api(endpoint, suggestion_data)
146 |         
147 |         if "error" in response:
148 |             return f"Error: {response['error']}"
149 |         
150 |         # Format the response
151 |         suggestion_id = response.get("id", "unknown")
152 |         return json.dumps({
153 |             "status": "received",
154 |             "message": "Thank you for your feature suggestion. Our team will review it.",
155 |             "suggestion_id": f"FEAT-{suggestion_id}"
156 |         }, indent=2)
157 |     except Exception as e:
158 |         return f"Error submitting feature suggestion: {str(e)}"
159 | 
160 | # Internal helper function to handle common logic for statistics/prediction API calls
161 | def _invoke_statistics_api(
162 |     query_type: str,
163 |     columns: List[str],
164 |     statistics: Optional[List[str]] = None,
165 |     periods: Optional[int] = None,
166 |     anomaly_options: Optional[Dict[str, Any]] = None,
167 |     data_source: Optional[str] = None,
168 |     source_type: Optional[str] = None,
169 |     table_name: Optional[str] = None,
170 |     filters: Optional[Dict[str, Any]] = None,
171 |     groupby: Optional[List[str]] = None,
172 |     options: Optional[Dict[str, Any]] = None,
173 |     date_column: Optional[str] = None,
174 |     start_date: Optional[Union[str, datetime]] = None,
175 |     end_date: Optional[Union[str, datetime]] = None,
176 |     aggregation: Optional[str] = None
177 | ) -> str:
178 |     """Internal helper to call the statistics API endpoint."""
179 |     # Determine the final data source and type, considering environment variables/config
180 |     final_data_source = data_source
181 |     final_source_type = source_type
182 | 
183 |     # Use default DB connection if no source is specified and DB_CONNECTION_STRING is set
184 |     if not final_data_source and DB_CONNECTION_STRING:
185 |         final_data_source = DB_CONNECTION_STRING
186 |         final_source_type = DB_SOURCE_TYPE  # Use configured DB type
187 |     elif not final_source_type and final_data_source:
188 |         # Infer source type if not explicitly provided (basic inference)
189 |         if "://" in final_data_source or final_data_source.lower().startswith("http"):
190 |             if any(db_protocol in final_data_source.lower() for db_protocol in ["postgresql://", "mysql://", "sqlite://"]):
191 |                 final_source_type = "database"
192 |             else:
193 |                  final_source_type = "api" # Assume API if it looks like a URL but not a DB string
194 |         else:
195 |             final_source_type = "csv" # Assume CSV otherwise (filename)
196 | 
197 |     # Basic validation based on context
198 |     if not columns:
199 |         return json.dumps({"error": "The 'columns' parameter is required and cannot be empty."})
200 |         
201 |     if final_source_type == "database" and not table_name:
202 |         return json.dumps({"error": "The 'table_name' parameter is required when source_type is 'database'."})
203 |         
204 |     if not final_data_source and not DB_CONNECTION_STRING:
205 |          return json.dumps({"error": "No data_source provided and no default database connection configured. Please provide a data_source (filename, DB connection string, or API URL)."})
206 | 
207 |     # Prepare request payload and parameters for the API call
208 |     api_request_data = {
209 |         "data_source": final_data_source,
210 |         "source_type": final_source_type,
211 |         "columns": columns,
212 |         "table_name": table_name,
213 |         "filters": filters,
214 |         "groupby": groupby,
215 |         "options": options,
216 |         "date_column": date_column,
217 |         # Convert datetime objects to ISO strings for JSON serialization if necessary
218 |         "start_date": start_date.isoformat() if isinstance(start_date, datetime) else start_date,
219 |         "end_date": end_date.isoformat() if isinstance(end_date, datetime) else end_date,
220 |     }
221 |     
222 |     api_params = {"query_type": query_type}
223 | 
224 |     if query_type == "statistics":
225 |         if not statistics:
226 |              return json.dumps({"error": "The 'statistics' parameter is required for calculate_statistics."})
227 |         api_request_data["statistics"] = statistics
228 |         # Groupby only makes sense for statistics
229 |         if groupby:
230 |              api_request_data["groupby"] = groupby
231 |         else:
232 |              if "groupby" in api_request_data: del api_request_data["groupby"] # Ensure it's not sent if None
233 |              
234 |     elif query_type == "ml_prediction":
235 |         if periods is None:
236 |             return json.dumps({"error": "The 'periods' parameter is required for predict_trends."})
237 |         api_params["periods"] = periods
238 |         # Handle aggregation parameter if provided
239 |         if aggregation:
240 |             api_request_data["aggregation"] = aggregation
241 |         # Remove stats/grouping/anomaly params if present
242 |         if "statistics" in api_request_data: del api_request_data["statistics"]
243 |         if "groupby" in api_request_data: del api_request_data["groupby"]
244 |         if "anomaly_options" in api_request_data: del api_request_data["anomaly_options"]
245 | 
246 |     elif query_type == "anomaly_detection":
247 |         if not date_column:
248 |              return json.dumps({"error": "The 'date_column' parameter is required for anomaly_detection."})
249 |         if not columns:
250 |             return json.dumps({"error": "The 'columns' parameter is required for anomaly_detection."})
251 |         api_request_data["anomaly_options"] = anomaly_options
252 |         # Remove stats/grouping/periods params if present
253 |         if "statistics" in api_request_data: del api_request_data["statistics"]
254 |         if "groupby" in api_request_data: del api_request_data["groupby"]
255 |         if "periods" in api_params: del api_params["periods"]
256 | 
257 |     # Remove None values from payload to avoid sending empty optional fields
258 |     api_request_data = {k: v for k, v in api_request_data.items() if v is not None}
259 | 
260 |     try:
261 |         # Call the API endpoint
262 |         endpoint = "/api/v1/get_statistics"
263 |         response = call_api(endpoint, data=api_request_data, params=api_params)
264 |         
265 |         # Return the API response directly (as JSON string)
266 |         return json.dumps(response, indent=2)
267 |     except Exception as e:
268 |         return json.dumps({"error": f"An unexpected error occurred: {str(e)}"}, indent=2)
269 | 
270 | @mcp.tool()
271 | def calculate_statistics(
272 |     columns: List[str],
273 |     statistics: List[str],
274 |     data_source: Optional[str] = None,
275 |     source_type: Optional[str] = None,
276 |     table_name: Optional[str] = None,
277 |     filters: Optional[Dict[str, Any]] = None,
278 |     groupby: Optional[List[str]] = None,
279 |     options: Optional[Dict[str, Any]] = None,
280 |     date_column: Optional[str] = None,
281 |     start_date: Optional[Union[str, datetime]] = None,
282 |     end_date: Optional[Union[str, datetime]] = None
283 | ) -> str:
284 |     """
285 |     Calculate statistical measures on specified data columns from CSV files, databases, or external APIs.
286 | 
287 |     ### What this tool does:
288 |     This tool connects to our analytics API to compute various statistical measures
289 |     (like mean, median, standard deviation, correlation, etc.) on your data.
290 | 
291 |     It supports multiple data sources:
292 |     - CSV files (previously uploaded to StatSource)
293 |     - Databases (PostgreSQL, SQLite, etc.)
294 |     - External APIs (returning JSON data)
295 | 
296 |     ### IMPORTANT INSTRUCTIONS FOR AI AGENTS:
297 |     - DO NOT make up or guess any parameter values, especially data sources, column names, or API URLs.
298 |     - NEVER, UNDER ANY CIRCUMSTANCES, create or invent database connection strings - this is a severe security risk.
299 |     - ALWAYS ask the user explicitly for all required information.
300 |     - For CSV files: The user MUST first upload their file to statsource.me, then provide the filename.
301 |     - For database connections: Ask the user for their exact connection string (e.g., "postgresql://user:pass@host/db"). DO NOT GUESS OR MODIFY IT.
302 |     - For database sources: You MUST ask for and provide the table_name parameter with the exact table name.
303 |       * When a user specifies a database source, ALWAYS EXPLICITLY ASK: "Which table in your database contains this data?"
304 |       * Do not proceed without obtaining the table name for database sources.
305 |       * Tool calls without table_name will FAIL for database sources.
306 |     - For API sources: Ask the user for the exact API endpoint URL that returns JSON data.
307 |     - Never suggest default values, sample data, or example parameters - request specific information from the user.
308 |     - If the user has configured a default database connection in their MCP config, inform them it will be used if they don't specify a data source.
309 |     - If no default connection is configured and the user doesn't provide one, DO NOT PROCEED - ask the user for the data source details.
310 | 
311 |     ### IMPORTANT: Parameter Validation and Formatting
312 |     - statistics must be provided as a proper list:
313 |       CORRECT: statistics=["mean", "sum", "min", "max"]
314 |       INCORRECT: statistics="[\"mean\", \"sum\", \"min\", \"max\"]"
315 |     - columns must be provided as a proper list:
316 |       CORRECT: columns=["revenue", "quantity"]
317 |       INCORRECT: columns="[\"revenue\", \"quantity\"]"
318 | 
319 |     ### CRITICAL: Column Name Formatting & Case-Insensitivity
320 |     - **Column Matching:** The API matches column names case-insensitively. You can specify "revenue" even if the data has "Revenue". Ask the user for the intended column names.
321 |     - **Filter Value Matching:** String filter values are matched case-insensitively (e.g., filter `{"status": "completed"}` will match "Completed" in the data).
322 |     - **Table Name Matching (Databases):** The API attempts case-insensitive matching for database table names.
323 | 
324 |     ### Error Response Handling
325 |     - If you receive an "Invalid request" or similar error, check:
326 |       1. Column name spelling and existence in the data source.
327 |       2. Parameter format (proper lists vs string-encoded lists).
328 |       3. Correct data_source provided (filename, connection string, or API URL).
329 |       4. table_name provided if source_type is "database".
330 |       5. API URL is correct and returns valid JSON if source_type is "api".
331 | 
332 |     ### When to use this tool:
333 |     - When a user needs statistical analysis of their data (means, medians, correlations, distributions, etc.).
334 |     - When analyzing patterns or summarizing datasets from files, databases, or APIs.
335 | 
336 |     ### Required inputs:
337 |     - columns: List of column names to analyze (ask user for exact column names in their data).
338 |     - statistics: List of statistics to calculate.
339 | 
340 |     ### Optional inputs:
341 |     - data_source: Identifier for the data source.
342 |       * For CSV: Filename of a previously uploaded file on statsource.me (ask user to upload first).
343 |       * For Database: Full connection string (ask user for exact string).
344 |       * For API: The exact URL of the API endpoint returning JSON data (ask user for the URL).
345 |       * If not provided, will use the connection string from MCP config if available (defaults to database type).
346 |     - source_type: Type of data source ('csv', 'database', or 'api').
347 |       * Determines how `data_source` is interpreted.
348 |       * If not provided, will use the source type from MCP config if available (defaults to 'database'). Ensure this matches the provided `data_source`.
349 |     - table_name: Name of the database table to use (REQUIRED for database sources).
350 |       * Must be provided when source_type is 'database'.
351 |       * Ask user for the exact table name in their database.
352 |       * Always explicitly ask for table name when data source is a database.
353 |     - filters: Dictionary of column-value pairs to filter data *before* analysis.
354 |       * Format: {"column_name": "value"} or {"column_name": ["val1", "val2"]}
355 |       * **API Source Behavior:** For 'api' sources, data is fetched *first*, then filters are applied to the resulting data.
356 |     - groupby: List of column names to group data by before calculating statistics.
357 |     - options: Dictionary of additional options for specific operations (currently less used).
358 |     - date_column: Column name containing date/timestamp information for filtering. Matched case-insensitively.
359 |     - start_date: Inclusive start date for filtering (ISO 8601 format string like "YYYY-MM-DD" or datetime).
360 |     - end_date: Inclusive end date for filtering (ISO 8601 format string like "YYYY-MM-DD" or datetime).
361 |       * **API Source Behavior:** For 'api' sources, date filtering happens *after* data is fetched.
362 | 
363 |     ### Valid statistics options:
364 |     - 'mean', 'median', 'std', 'sum', 'count', 'min', 'max', 'describe', 'correlation', 'missing', 'unique', 'boxplot'
365 | 
366 |     ### Returns:
367 |     A JSON string containing the results and metadata.
368 |     - `result`: Dictionary with statistical measures for each requested column and statistic. Structure varies by statistic (e.g., `describe`, `correlation`).
369 |     - `metadata`: Includes `execution_time`, `query_type` ('statistics'), `source_type`.
370 |     """
371 |     result = _invoke_statistics_api(
372 |         query_type="statistics",
373 |         columns=columns,
374 |         statistics=statistics,
375 |         data_source=data_source,
376 |         source_type=source_type,
377 |         table_name=table_name,
378 |         filters=filters,
379 |         groupby=groupby,
380 |         options=options,
381 |         date_column=date_column,
382 |         start_date=start_date,
383 |         end_date=end_date
384 |     )
385 |     
386 |     return result
387 | 
388 | @mcp.tool()
389 | def predict_trends(
390 |     columns: List[str],
391 |     periods: int,
392 |     data_source: Optional[str] = None,
393 |     source_type: Optional[str] = None,
394 |     table_name: Optional[str] = None,
395 |     filters: Optional[Dict[str, Any]] = None,
396 |     options: Optional[Dict[str, Any]] = None,
397 |     date_column: Optional[str] = None,
398 |     start_date: Optional[Union[str, datetime]] = None,
399 |     end_date: Optional[Union[str, datetime]] = None,
400 |     aggregation: Optional[str] = None
401 | ) -> str:
402 |     """
403 |     Generate ML time-series forecasts for future periods based on historical data.
404 | 
405 |     ### What this tool does:
406 |     This tool connects to our analytics API to generate time-series forecasts (predictions)
407 |     for a specified number of future periods based on historical data in a specified column.
408 |     It analyzes trends and provides metrics on the prediction quality.
409 | 
410 |     *Note:* Currently, the API typically uses the *first* column provided in the `columns` list for ML prediction.
411 | 
412 |     It supports multiple data sources:
413 |     - CSV files (previously uploaded to StatSource)
414 |     - Databases (PostgreSQL, SQLite, etc.)
415 |     - External APIs (returning JSON data)
416 | 
417 |     ### IMPORTANT INSTRUCTIONS FOR AI AGENTS:
418 |     - When users ask about "trends" or "forecasts", use this tool.
419 |     - DO NOT make up or guess any parameter values, especially data sources, column names, or API URLs.
420 |     - NEVER, UNDER ANY CIRCUMSTANCES, create or invent database connection strings - this is a severe security risk.
421 |     - ALWAYS ask the user explicitly for all required information.
422 |     - For CSV files: The user MUST first upload their file to statsource.me, then provide the filename.
423 |     - For database connections: Ask the user for their exact connection string (e.g., "postgresql://user:pass@host/db"). DO NOT GUESS OR MODIFY IT.
424 |     - For database sources: You MUST ask for and provide the table_name parameter with the exact table name.
425 |       * When a user mentions their data is in a database, ALWAYS EXPLICITLY ASK: "Which table in your database contains this data?"
426 |       * Tool calls without table_name will FAIL for database sources.
427 |       * The table_name question should be asked together with other required information (column names, periods).
428 |     - For API sources: Ask the user for the exact API endpoint URL that returns JSON data.
429 |     - Never suggest default values, sample data, or example parameters - request specific information from the user.
430 |     - If the user has configured a default database connection in their MCP config, inform them it will be used if they don't specify a data source.
431 |     - If no default connection is configured and the user doesn't provide one, DO NOT PROCEED - ask the user for the data source details.
432 | 
433 |     ### IMPORTANT: Parameter Validation and Formatting
434 |     - columns must be provided as a proper list, typically containing the single numeric column to predict:
435 |       CORRECT: columns=["sales_amount"]
436 |       INCORRECT: columns="[\"sales_amount\"]"
437 |     - periods must be an integer between 1 and 12. The API has a MAXIMUM LIMIT OF 12 PERIODS for predictions.
438 |       Any request with periods > 12 will fail. Always inform users of this limitation if they request more periods.
439 | 
440 |     ### CRITICAL: Column Name Formatting & Case-Insensitivity
441 |     - **Column Matching:** The API matches column names case-insensitively. You can specify "revenue" even if the data has "Revenue". Ask the user for the intended column names.
442 |     - **Filter Value Matching:** String filter values are matched case-insensitively (e.g., filter `{"status": "completed"}` will match "Completed" in the data).
443 |     - **Table Name Matching (Databases):** The API attempts case-insensitive matching for database table names.
444 |     - **Date Column:** If using time-based prediction, ensure `date_column` correctly identifies the date/timestamp column. Matched case-insensitively.
445 | 
446 |     ### Error Response Handling
447 |     - If you receive an "Invalid request" or similar error, check:
448 |       1. Column name spelling and existence in the data source (should be numeric for prediction).
449 |       2. Parameter format (proper lists vs string-encoded lists).
450 |       3. Correct data_source provided (filename, connection string, or API URL).
451 |       4. table_name provided if source_type is "database".
452 |       5. API URL is correct and returns valid JSON if source_type is "api".
453 |       6. `periods` parameter is provided and is a positive integer not exceeding 12.
454 |       7. `date_column` is specified if required for the underlying model.
455 | 
456 |     ### When to use this tool:
457 |     - When a user wants to predict future values based on historical trends (forecasting).
458 |     - When generating forecasts for business planning or decision-making.
459 |     - When analyzing the likely future direction of a time-series metric.
460 | 
461 |     ### Required inputs:
462 |     - columns: List containing the name of the (usually single) numeric column to predict trends for.
463 |     - periods: Number of future periods to predict (maximum: 12).
464 | 
465 |     ### Optional inputs:
466 |     - data_source: Identifier for the data source.
467 |       * For CSV: Filename of a previously uploaded file on statsource.me (ask user to upload first).
468 |       * For Database: Full connection string (ask user for exact string).
469 |       * For API: The exact URL of the API endpoint returning JSON data (ask user for the URL).
470 |       * If not provided, will use the connection string from MCP config if available (defaults to database type).
471 |     - source_type: Type of data source ('csv', 'database', or 'api').
472 |       * Determines how `data_source` is interpreted.
473 |       * If not provided, will use the source type from MCP config if available (defaults to 'database'). Ensure this matches the provided `data_source`.
474 |     - table_name: Name of the database table to use (REQUIRED for database sources).
475 |       * Must be provided when source_type is 'database'.
476 |       * Ask user for the exact table name in their database.
477 |       * ALWAYS ask for table name when using database sources.
478 |     - filters: Dictionary of column-value pairs to filter data *before* analysis.
479 |       * Format: {"column_name": "value"} or {"column_name": ["val1", "val2"]}
480 |       * **API Source Behavior:** For 'api' sources, data is fetched *first*, then filters are applied to the resulting data.
481 |     - options: Dictionary of additional options for specific operations (currently less used, might include model tuning params in future).
482 |     - date_column: Column name containing date/timestamp information.
483 |       * Used for date filtering and essential for time-based trend analysis/predictions. Matched case-insensitively.
484 |     - start_date: Inclusive start date for filtering historical data (ISO 8601 format string like "YYYY-MM-DD" or datetime).
485 |     - end_date: Inclusive end date for filtering historical data (ISO 8601 format string like "YYYY-MM-DD" or datetime).
486 |       * **API Source Behavior:** For 'api' sources, date filtering happens *after* data is fetched.
487 |     - aggregation (str, Optional, default: "auto"): Specifies how time-series data should be aggregated before forecasting. Ask the user for their preference if unsure, or default to 'auto'/'monthly'.
488 |       * 'auto': Automatically selects 'weekly' or 'monthly' based on data density and timeframe. Defaults to 'monthly' if unsure. A safe default choice.
489 |       * 'weekly': Aggregates data by week. Use for forecasting short-term trends (e.g., predicting next few weeks/months) or when weekly patterns are important.
490 |       * 'monthly': Aggregates data by month. Recommended for most business forecasting (e.g., predicting quarterly or annual trends) as it smooths out daily/weekly noise.
491 |       * 'daily': Uses daily data. Choose only if the user needs very granular forecasts and understands the potential for noise. Requires sufficient daily data points.
492 | 
493 |     ### ML Prediction features returned:
494 |     - Time series forecasting with customizable prediction periods (up to 12 periods maximum).
495 |     - Trend direction analysis ("increasing", "decreasing", "stable").
496 |     - Model quality metrics (r-squared, slope).
497 |     - Works with numeric data columns from any supported data source.
498 |     - Can use a specific `date_column` for time-based regression.
499 | 
500 |     ### Returns:
501 |     A JSON string containing the prediction results and metadata.
502 |     - `result`: Dictionary containing prediction details per analyzed column (typically the first one specified): `{"r_squared": ..., "slope": ..., "trend_direction": ..., "forecast_values": [...], ...}`.
503 |     - `metadata`: Includes `execution_time`, `query_type` ('ml_prediction'), `source_type`, `periods`.
504 |     """
505 |     result = _invoke_statistics_api(
506 |         query_type="ml_prediction",
507 |         columns=columns,
508 |         periods=periods,
509 |         data_source=data_source,
510 |         source_type=source_type,
511 |         table_name=table_name,
512 |         filters=filters,
513 |         options=options,
514 |         date_column=date_column,
515 |         start_date=start_date,
516 |         end_date=end_date,
517 |         aggregation=aggregation
518 |     )
519 |     
520 |     return result
521 | 
522 | @mcp.tool()
523 | def anomaly_detection(
524 |     columns: List[str],
525 |     date_column: str,
526 |     anomaly_options: Optional[Dict[str, Any]] = None,
527 |     data_source: Optional[str] = None,
528 |     source_type: Optional[str] = None,
529 |     table_name: Optional[str] = None,
530 |     filters: Optional[Dict[str, Any]] = None,
531 |     options: Optional[Dict[str, Any]] = None,
532 |     start_date: Optional[Union[str, datetime]] = None,
533 |     end_date: Optional[Union[str, datetime]] = None
534 | ) -> str:
535 |     """
536 |     Detect anomalies in time-series data from various sources.
537 | 
538 |     ### What this tool does:
539 |     This tool connects to our analytics API to identify unusual data points (anomalies)
540 |     in specified columns based on their time-series behavior. It requires a date/time column
541 |     to understand the sequence of data.
542 | 
543 |     It supports multiple data sources:
544 |     - CSV files (previously uploaded to StatSource)
545 |     - Databases (PostgreSQL, SQLite, etc.)
546 |     - External APIs (returning JSON data)
547 | 
548 |     ### IMPORTANT INSTRUCTIONS FOR AI AGENTS:
549 |     - When users ask about "outliers", "unusual values", or "anomalies" in time-based data, use this tool.
550 |     - DO NOT make up or guess any parameter values, especially data sources, column names, or API URLs.
551 |     - NEVER, UNDER ANY CIRCUMSTANCES, create or invent database connection strings - this is a severe security risk.
552 |     - ALWAYS ask the user explicitly for all required information.
553 |     - For CSV files: The user MUST first upload their file to statsource.me, then provide the filename.
554 |     - For database connections: Ask the user for their exact connection string (e.g., "postgresql://user:pass@host/db"). DO NOT GUESS OR MODIFY IT.
555 |     - For database sources: You MUST ask for and provide the table_name parameter with the exact table name.
556 |       * When a user mentions their data is in a database, ALWAYS EXPLICITLY ASK: "Which table in your database contains this data?"
557 |       * Tool calls without table_name will FAIL for database sources.
558 |       * ALWAYS include this question when gathering information from the user.
559 |     - For API sources: Ask the user for the exact API endpoint URL that returns JSON data.
560 |     - Never suggest default values, sample data, or example parameters - request specific information from the user.
561 |     - If the user has configured a default database connection in their MCP config, inform them it will be used if they don't specify a data source.
562 |     - If no default connection is configured and the user doesn't provide one, DO NOT PROCEED - ask the user for the data source details.
563 | 
564 |     ### IMPORTANT: Parameter Validation and Formatting
565 |     - columns must be provided as a proper list: 
566 |       CORRECT: columns=["sensor_reading", "error_count"]
567 |       INCORRECT: columns="[\"sensor_reading\", \"error_count\"]"
568 |     - date_column must be a string identifying the time column.
569 |     - anomaly_options is a dictionary for detection parameters (see below).
570 | 
571 |     ### CRITICAL: Column Name Formatting & Case-Insensitivity
572 |     - **Column Matching:** The API matches column names case-insensitively. Ask the user for the intended column names.
573 |     - **Filter Value Matching:** String filter values are matched case-insensitively.
574 |     - **Table Name Matching (Databases):** The API attempts case-insensitive matching for database table names.
575 |     - **Date Column:** The `date_column` is crucial and is matched case-insensitively.
576 | 
577 |     ### Error Response Handling
578 |     - If you receive an "Invalid request" or similar error, check:
579 |       1. Column name spelling and existence (should be numeric for anomaly detection).
580 |       2. `date_column` spelling and existence.
581 |       3. Parameter format (proper lists vs string-encoded lists).
582 |       4. Correct data_source provided (filename, connection string, or API URL).
583 |       5. `table_name` provided if source_type is "database".
584 |       6. API URL is correct and returns valid JSON if source_type is "api".
585 |       7. `date_column` parameter is provided.
586 | 
587 |     ### When to use this tool:
588 |     - When a user wants to identify outliers or unusual patterns in time-series data.
589 |     - When monitoring metrics for unexpected spikes or drops.
590 |     - When cleaning data by identifying potentially erroneous readings.
591 | 
592 |     ### Required inputs:
593 |     - columns: List of numeric column names to check for anomalies.
594 |     - date_column: Name of the column containing date/timestamp information.
595 | 
596 |     ### Optional inputs:
597 |     - data_source: Identifier for the data source.
598 |       * For CSV: Filename of a previously uploaded file on statsource.me.
599 |       * For Database: Full connection string.
600 |       * For API: The exact URL of the API endpoint returning JSON data.
601 |       * If not provided, uses the default connection from MCP config if available.
602 |     - source_type: Type of data source ('csv', 'database', or 'api').
603 |       * Determines how `data_source` is interpreted.
604 |       * Defaults based on MCP config if available.
605 |     - table_name: Name of the database table (REQUIRED for database sources).
606 |       * Must be provided when source_type is 'database'.
607 |       * Always ask for table name when using database sources.
608 |     - filters: Dictionary of column-value pairs to filter data *before* analysis.
609 |     - options: Dictionary of additional options (less common for anomaly detection currently).
610 |     - start_date: Inclusive start date for filtering historical data (ISO 8601 string or datetime).
611 |     - end_date: Inclusive end date for filtering historical data (ISO 8601 string or datetime).
612 |     - anomaly_options: Dictionary to configure the detection method and parameters.
613 |       * `method` (str, Optional, default: "iqr"): The anomaly detection method to use. Must be one of:
614 |         - 'iqr': Interquartile Range - Identifies outliers based on distribution quartiles
615 |         - 'zscore': Z-score - Identifies outliers based on standard deviations from the mean
616 |         - 'isolation_forest': Machine learning approach that isolates anomalies using random forest
617 |       * `sensitivity` (float, Optional, default: 1.5): For 'iqr' method, the multiplier for the IQR to define outlier bounds.
618 |         - Higher values are less sensitive (1.5 is standard, 3.0 would detect only extreme outliers)
619 |       * `threshold` (float, Optional, default: 3.0): For 'zscore' method, the threshold for Z-scores to define outliers.
620 |         - Higher values are less sensitive (3.0 is standard, 2.0 would detect more outliers)
621 |       * `window_size` (int, Optional, default: 20): Size of rolling window for detection methods.
622 |         - If not provided, uses global statistics
623 |         - Smaller windows (e.g., 7-14) detect local anomalies, larger windows detect global anomalies
624 |       * `contamination` (float, Optional, default: 0.05): For 'isolation_forest' method, the expected proportion of anomalies.
625 |         - Values typically range from 0.01 (1%) to 0.1 (10%)
626 | 
627 |     ### Returns:
628 |     A JSON string containing the anomaly detection results and metadata.
629 |     - `result`: Dictionary with structure for each analyzed column:
630 |       ```
631 |       {
632 |         column_name: {
633 |           "timestamps": [...],  # List of datetime values
634 |           "values": [...],      # List of numeric values
635 |           "is_anomaly": [...],  # Boolean flags indicating anomalies
636 |           "anomaly_score": [...], # Scores indicating degree of deviation
637 |           "summary": {
638 |             "total_points": int,
639 |             "anomaly_count": int,
640 |             "percentage": float,
641 |             "method": str      # Method used for detection
642 |           }
643 |         }
644 |       }
645 |       ```
646 |     - `metadata`: Includes `execution_time`, `query_type` ('anomaly_detection'), `source_type`, `anomaly_method`.
647 |     """
648 |     result = _invoke_statistics_api(
649 |         query_type="anomaly_detection",
650 |         columns=columns,
651 |         anomaly_options=anomaly_options,
652 |         data_source=data_source,
653 |         source_type=source_type,
654 |         table_name=table_name,
655 |         filters=filters,
656 |         options=options,
657 |         date_column=date_column,
658 |         start_date=start_date,
659 |         end_date=end_date
660 |     )
661 |     
662 |     return result
663 | 
664 | def run_server():
665 |     """Run the MCP server."""
666 |     try:
667 |         # Run the server
668 |         mcp.run()
669 |     except Exception as e:
670 |         sys.exit(f"MCP server error: {e}")  # report the error on stderr and exit with a non-zero status
671 | 
672 | if __name__ == "__main__":
673 |     run_server() 
```
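
For reference, `_invoke_statistics_api` infers an unspecified `source_type` from the shape of `data_source`. The standalone sketch below mirrors that decision logic for illustration only; the function name `infer_source_type` does not exist in the module.

```python
def infer_source_type(data_source: str) -> str:
    """Illustrative mirror of the inference logic in _invoke_statistics_api."""
    lowered = data_source.lower()
    if "://" in data_source or lowered.startswith("http"):
        # URL-like: known database schemes win, anything else is treated as an API endpoint.
        if any(proto in lowered for proto in ("postgresql://", "mysql://", "sqlite://")):
            return "database"
        return "api"
    # Anything else is assumed to be the filename of a previously uploaded CSV.
    return "csv"


assert infer_source_type("postgresql://user:pass@host:5432/db") == "database"
assert infer_source_type("https://example.com/data.json") == "api"
assert infer_source_type("sales_2023.csv") == "csv"
```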