# Directory Structure ``` ├── .github │ └── workflows │ ├── publish.yaml │ └── python-ci.yaml ├── .gitignore ├── .pre-commit-config.yaml ├── .python-version ├── atla_mcp_server │ ├── __init__.py │ ├── __main__.py │ ├── debug.py │ └── server.py ├── CONTRIBUTING.md ├── LICENSE ├── pyproject.toml └── README.md ``` # Files -------------------------------------------------------------------------------- /.python-version: -------------------------------------------------------------------------------- ``` 1 | 3.11 2 | ``` -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- ``` 1 | # Python-generated files 2 | __pycache__/ 3 | *.py[oc] 4 | build/ 5 | dist/ 6 | wheels/ 7 | *.egg-info 8 | 9 | # Virtual environments 10 | .venv 11 | 12 | # Lock files 13 | uv.lock 14 | ``` -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- ```yaml 1 | repos: 2 | - repo: https://github.com/pre-commit/pre-commit-hooks 3 | rev: v4.5.0 4 | hooks: 5 | - id: check-yaml 6 | - id: check-json 7 | - id: check-toml 8 | - id: check-merge-conflict 9 | - id: end-of-file-fixer 10 | - id: trailing-whitespace 11 | - id: mixed-line-ending 12 | - id: check-case-conflict 13 | - id: detect-private-key 14 | 15 | - repo: https://github.com/astral-sh/ruff-pre-commit 16 | rev: v0.9.7 17 | hooks: 18 | # Run the linter 19 | - id: ruff 20 | args: [--fix] 21 | # Run the formatter 22 | - id: ruff-format 23 | ``` -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- ```markdown 1 | # Atla MCP Server 2 | 3 | > [!CAUTION] 4 | > This repository was archived on July 21, 2025. The Atla API is no longer active. 5 | 6 | An MCP server implementation providing a standardized interface for LLMs to interact with the Atla API for state-of-the-art LLMJ evaluation. 7 | 8 | > Learn more about Atla [here](https://docs.atla-ai.com). Learn more about the Model Context Protocol [here](https://modelcontextprotocol.io). 9 | 10 | <a href="https://glama.ai/mcp/servers/@atla-ai/atla-mcp-server"> 11 | <img width="380" height="200" src="https://glama.ai/mcp/servers/@atla-ai/atla-mcp-server/badge" alt="Atla MCP server" /> 12 | </a> 13 | 14 | ## Available Tools 15 | 16 | - `evaluate_llm_response`: Evaluate an LLM's response to a prompt using a given evaluation criteria. This function uses an Atla evaluation model under the hood to return a dictionary containing a score for the model's response and a textual critique containing feedback on the model's response. 17 | - `evaluate_llm_response_on_multiple_criteria`: Evaluate an LLM's response to a prompt across _multiple_ evaluation criteria. This function uses an Atla evaluation model under the hood to return a list of dictionaries, each containing an evaluation score and critique for a given criteria. 18 | 19 | ## Usage 20 | 21 | > To use the MCP server, you will need an Atla API key. You can find your existing API key [here](https://www.atla-ai.com/sign-in) or create a new one [here](https://www.atla-ai.com/sign-up). 22 | 23 | ### Installation 24 | 25 | > We recommend using `uv` to manage the Python environment. See [here](https://docs.astral.sh/uv/getting-started/installation/) for installation instructions. 
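For example, on macOS or Linux, `uv` can be installed with its standalone installer (the same command this repository's CI workflow uses); see the `uv` documentation linked above for other platforms and installation methods:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```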
26 | 27 | ### Manually running the server 28 | 29 | Once you have `uv` installed and have your Atla API key, you can manually run the MCP server using `uvx` (which is provided by `uv`): 30 | 31 | ```bash 32 | ATLA_API_KEY=<your-api-key> uvx atla-mcp-server 33 | ``` 34 | 35 | ### Connecting to the server 36 | 37 | > Having issues or need help connecting to another client? Feel free to open an issue or [contact us](mailto:[email protected])! 38 | 39 | #### OpenAI Agents SDK 40 | 41 | > For more details on using the OpenAI Agents SDK with MCP servers, refer to the [official documentation](https://openai.github.io/openai-agents-python/). 42 | 43 | 1. Install the OpenAI Agents SDK: 44 | 45 | ```shell 46 | pip install openai-agents 47 | ``` 48 | 49 | 2. Use the OpenAI Agents SDK to connect to the server: 50 | 51 | ```python 52 | import os 53 | 54 | from agents import Agent 55 | from agents.mcp import MCPServerStdio 56 | 57 | async with MCPServerStdio( 58 | params={ 59 | "command": "uvx", 60 | "args": ["atla-mcp-server"], 61 | "env": {"ATLA_API_KEY": os.environ.get("ATLA_API_KEY")} 62 | } 63 | ) as atla_mcp_server: 64 | ... 65 | ``` 66 | 67 | #### Claude Desktop 68 | 69 | > For more details on configuring MCP servers in Claude Desktop, refer to the [official MCP quickstart guide](https://modelcontextprotocol.io/quickstart/user). 70 | 71 | 1. Add the following to your `claude_desktop_config.json` file: 72 | 73 | ```json 74 | { 75 | "mcpServers": { 76 | "atla-mcp-server": { 77 | "command": "uvx", 78 | "args": ["atla-mcp-server"], 79 | "env": { 80 | "ATLA_API_KEY": "<your-atla-api-key>" 81 | } 82 | } 83 | } 84 | } 85 | ``` 86 | 87 | 2. **Restart Claude Desktop** to apply the changes. 88 | 89 | You should now see options from `atla-mcp-server` in the list of available MCP tools. 90 | 91 | #### Cursor 92 | 93 | > For more details on configuring MCP servers in Cursor, refer to the [official documentation](https://docs.cursor.com/context/model-context-protocol). 94 | 95 | 1. Add the following to your `.cursor/mcp.json` file: 96 | 97 | ```json 98 | { 99 | "mcpServers": { 100 | "atla-mcp-server": { 101 | "command": "uvx", 102 | "args": ["atla-mcp-server"], 103 | "env": { 104 | "ATLA_API_KEY": "<your-atla-api-key>" 105 | } 106 | } 107 | } 108 | } 109 | ``` 110 | 111 | You should now see `atla-mcp-server` in the list of available MCP servers. 112 | 113 | ## Contributing 114 | 115 | Contributions are welcome! Please see the [CONTRIBUTING.md](CONTRIBUTING.md) file for details. 116 | 117 | ## License 118 | 119 | This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details. 120 | ``` -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- ```markdown 1 | # Contributing to Atla MCP Server 2 | 3 | We welcome contributions to the Atla MCP Server! This document provides guidelines and steps for contributing. 4 | 5 | ## Development Setup 6 | 7 | Follow the installation steps in the [README.md](README.md#installation), making sure to install the development dependencies: 8 | 9 | ```shell 10 | uv pip install -e ".[dev]" 11 | pre-commit install # Set up git hooks 12 | ``` 13 | 14 | ## Making Changes 15 | 16 | 1. Fork the repository on GitHub 17 | 2. Clone your fork locally 18 | 3. Create a new branch for your changes 19 | 4. Make your changes 20 | 5. Commit your changes (pre-commit hooks will run automatically) 21 | 6. Push to your fork 22 | 7. 
Submit a pull request from your fork to our main repository 23 | 24 | ## Questions? 25 | 26 | Feel free to open an issue if you have questions or run into problems. 27 | ``` -------------------------------------------------------------------------------- /atla_mcp_server/__init__.py: -------------------------------------------------------------------------------- ```python 1 | """An MCP server implementation providing a standardized interface for LLMs to interact with the Atla API.""" # noqa: E501 2 | ``` -------------------------------------------------------------------------------- /atla_mcp_server/debug.py: -------------------------------------------------------------------------------- ```python 1 | """File for debugging the Atla MCP Server via the MCP Inspector.""" 2 | 3 | import os 4 | 5 | from atla_mcp_server.server import app_factory 6 | 7 | app = app_factory(atla_api_key=os.getenv("ATLA_API_KEY", "")) 8 | ``` -------------------------------------------------------------------------------- /.github/workflows/python-ci.yaml: -------------------------------------------------------------------------------- ```yaml 1 | name: Python CI 2 | 3 | on: 4 | push: 5 | branches: [ main ] 6 | pull_request: 7 | branches: [ main ] 8 | 9 | jobs: 10 | python-ci: 11 | runs-on: ubuntu-latest 12 | steps: 13 | - uses: actions/checkout@v4 14 | - uses: actions/setup-python@v5 15 | with: 16 | python-version: "3.11" 17 | 18 | - name: Install uv 19 | run: curl -LsSf https://astral.sh/uv/install.sh | sh 20 | 21 | - name: Setup Python environment 22 | run: | 23 | uv venv 24 | . .venv/bin/activate 25 | uv pip install -e ".[dev]" 26 | 27 | - name: Run ruff checks 28 | run: | 29 | . .venv/bin/activate 30 | ruff check . 31 | ruff format --check . 32 | 33 | - name: Run mypy checks 34 | run: | 35 | . .venv/bin/activate 36 | dmypy run -- . 37 | ``` -------------------------------------------------------------------------------- /.github/workflows/publish.yaml: -------------------------------------------------------------------------------- ```yaml 1 | name: Publishing 2 | 3 | on: 4 | release: 5 | types: [published] 6 | 7 | jobs: 8 | build: 9 | runs-on: ubuntu-latest 10 | name: Build distribution 11 | steps: 12 | - uses: actions/checkout@v4 13 | 14 | - name: Install uv 15 | uses: astral-sh/setup-uv@v3 16 | 17 | - name: Build 18 | run: uv build 19 | 20 | - name: Upload artifacts 21 | uses: actions/upload-artifact@v4 22 | with: 23 | name: release-dists 24 | path: dist/ 25 | 26 | pypi-publish: 27 | name: Upload release to PyPI 28 | runs-on: ubuntu-latest 29 | environment: release 30 | needs: [build] 31 | permissions: 32 | id-token: write 33 | 34 | steps: 35 | - name: Retrieve release distribution 36 | uses: actions/download-artifact@v4 37 | with: 38 | name: release-dists 39 | path: dist/ 40 | 41 | - name: Publish package distribution to PyPI 42 | uses: pypa/gh-action-pypi-publish@release/v1 43 | ``` -------------------------------------------------------------------------------- /atla_mcp_server/__main__.py: -------------------------------------------------------------------------------- ```python 1 | """Entrypoint for the Atla MCP Server.""" 2 | 3 | import argparse 4 | import os 5 | 6 | from atla_mcp_server.server import app_factory 7 | 8 | 9 | def main(): 10 | """Entrypoint for the Atla MCP Server.""" 11 | print("Starting Atla MCP Server with stdio transport...") 12 | 13 | parser = argparse.ArgumentParser() 14 | parser.add_argument( 15 | "--atla-api-key", 16 | type=str, 17 | required=False, 18 | help="Atla API key. 
Can also be set via ATLA_API_KEY environment variable.", 19 | ) 20 | args = parser.parse_args() 21 | 22 | if args.atla_api_key: 23 | print("Using Atla API key from --atla-api-key CLI argument...") 24 | atla_api_key = args.atla_api_key 25 | elif os.getenv("ATLA_API_KEY"): 26 | atla_api_key = os.getenv("ATLA_API_KEY") 27 | print("Using Atla API key from ATLA_API_KEY environment variable...") 28 | else: 29 | parser.error( 30 | "Atla API key must be provided either via --atla-api-key argument " 31 | "or ATLA_API_KEY environment variable" 32 | ) 33 | 34 | print("Creating server...") 35 | app = app_factory(atla_api_key) 36 | 37 | print("Running server...") 38 | app.run(transport="stdio") 39 | 40 | 41 | if __name__ == "__main__": 42 | main() 43 | ``` -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- ```toml 1 | [build-system] 2 | requires = ["hatchling", "uv-dynamic-versioning"] 3 | build-backend = "hatchling.build" 4 | 5 | [tool.hatch.version] 6 | source = "uv-dynamic-versioning" 7 | 8 | [tool.uv-dynamic-versioning] 9 | vcs = "git" 10 | style = "pep440" 11 | bump = true 12 | 13 | [tool.hatch.build.targets.wheel] 14 | packages = ["atla_mcp_server"] 15 | 16 | [tool.hatch.build.targets.sdist] 17 | packages = ["atla_mcp_server"] 18 | 19 | [project] 20 | name = "atla-mcp-server" 21 | dynamic = ["version"] 22 | description = "An MCP server implementation providing a standardized interface for LLMs to interact with the Atla API." 23 | readme = "README.md" 24 | requires-python = ">=3.11" 25 | authors = [ 26 | { name="Atla", email="[email protected]" } 27 | ] 28 | license = { text = "MIT" } 29 | classifiers = [ 30 | "Development Status :: 4 - Beta", 31 | "Intended Audience :: Developers", 32 | "License :: OSI Approved :: MIT License", 33 | "Programming Language :: Python :: 3", 34 | "Programming Language :: Python :: 3.11", 35 | ] 36 | dependencies = [ 37 | "atla>=0.6.0", 38 | "mcp[cli]>=1.6.0", 39 | ] 40 | 41 | [project.optional-dependencies] 42 | dev = [ 43 | "mypy>=1.15.0", 44 | "pre-commit>=3.7.1", 45 | "ruff>=0.9.7", 46 | ] 47 | 48 | [project.scripts] 49 | atla-mcp-server = "atla_mcp_server.__main__:main" 50 | 51 | [project.urls] 52 | Homepage = "https://atla-ai.com" 53 | Repository = "https://github.com/atla-ai/atla-mcp-server" 54 | Issues = "https://github.com/atla-ai/atla-mcp-server/issues" 55 | 56 | [tool.mypy] 57 | exclude = ['.venv'] 58 | explicit_package_bases = false 59 | follow_untyped_imports = true 60 | implicit_optional = false 61 | mypy_path = ["atla_mcp_server"] 62 | plugins = ['pydantic.mypy'] 63 | python_version = "3.11" 64 | 65 | [tool.ruff] 66 | line-length = 90 67 | indent-width = 4 68 | 69 | [tool.ruff.lint] 70 | exclude = [".venv"] 71 | # See: https://docs.astral.sh/ruff/rules/ 72 | select = [ 73 | "B", # Bugbear 74 | "C", # Complexity 75 | "E", # Pycodestyle 76 | "F", # Pyflakes 77 | "I", # Isort 78 | "RUF", # Ruff 79 | "W", # Pycodestyle 80 | "D", # Docstrings 81 | ] 82 | ignore = [] 83 | fixable = ["ALL"] 84 | unfixable = [] 85 | 86 | [tool.ruff.lint.isort] 87 | known-first-party = ["atla_mcp_server"] 88 | 89 | [tool.ruff.lint.pydocstyle] 90 | convention = "google" 91 | 92 | [tool.ruff.format] 93 | quote-style = "double" 94 | ``` -------------------------------------------------------------------------------- /atla_mcp_server/server.py: -------------------------------------------------------------------------------- ```python 1 | """MCP server 
implementation.""" 2 | 3 | import asyncio 4 | from contextlib import asynccontextmanager 5 | from dataclasses import dataclass 6 | from textwrap import dedent 7 | from typing import Annotated, AsyncIterator, Literal, Optional, cast 8 | 9 | from atla import AsyncAtla 10 | from mcp.server.fastmcp import Context, FastMCP 11 | from pydantic import WithJsonSchema 12 | 13 | # config 14 | 15 | 16 | @dataclass 17 | class MCPState: 18 | """State of the MCP server.""" 19 | 20 | atla_client: AsyncAtla 21 | 22 | 23 | # types 24 | 25 | AnnotatedLlmPrompt = Annotated[ 26 | str, 27 | WithJsonSchema( 28 | { 29 | "description": dedent( 30 | """The prompt given to an LLM to generate the `llm_response` to be \ 31 | evaluated.""" 32 | ), 33 | "examples": [ 34 | "What is the capital of the moon?", 35 | "Explain the difference between supervised and unsupervised learning.", 36 | "Can you summarize the main idea behind transformers in NLP?", 37 | ], 38 | } 39 | ), 40 | ] 41 | 42 | AnnotatedLlmResponse = Annotated[ 43 | str, 44 | WithJsonSchema( 45 | { 46 | "description": dedent( 47 | """The output generated by the model in response to the `llm_prompt`, \ 48 | which needs to be evaluated.""" 49 | ), 50 | "examples": [ 51 | dedent( 52 | """The Moon doesn't have a capital — it has no countries, \ 53 | governments, or permanent residents""" 54 | ), 55 | dedent( 56 | """Supervised learning uses labeled data to train models to make \ 57 | predictions or classifications. Unsupervised learning, on the other \ 58 | hand, works with unlabeled data to uncover hidden patterns or \ 59 | groupings, such as through clustering or dimensionality reduction.""" 60 | ), 61 | dedent( 62 | """Transformers are neural network architectures designed for \ 63 | sequence modeling tasks like NLP. They rely on self-attention \ 64 | mechanisms to weigh the importance of different input tokens, \ 65 | enabling parallel processing of input data. Unlike RNNs, they don't \ 66 | process sequentially, which allows for faster training and better \ 67 | handling of long-range dependencies.""" 68 | ), 69 | ], 70 | } 71 | ), 72 | ] 73 | 74 | AnnotatedEvaluationCriteria = Annotated[ 75 | str, 76 | WithJsonSchema( 77 | { 78 | "description": dedent( 79 | """The specific criteria or instructions on which to evaluate the \ 80 | model output. A good evaluation criteria should provide the model \ 81 | with: (1) a description of the evaluation task, (2) a rubric of \ 82 | possible scores and their corresponding criteria, and (3) a \ 83 | final sentence clarifying expected score format. A good evaluation \ 84 | criteria should also be specific and focus on a single aspect of \ 85 | the model output. To evaluate a model's response on multiple \ 86 | criteria, use the `evaluate_llm_response_on_multiple_criteria` \ 87 | function and create individual criteria for each relevant evaluation \ 88 | task. Typical rubrics score responses either on a Likert scale from \ 89 | 1 to 5 or binary scale with scores of 'Yes' or 'No', depending on \ 90 | the specific evaluation task.""" 91 | ), 92 | "examples": [ 93 | dedent( 94 | """Evaluate how well the response fulfills the requirements of the instruction by providing relevant information. This includes responding in accordance with the explicit and implicit purpose of given instruction. 95 | 96 | Score 1: The response is completely unrelated to the instruction, or the model entirely misunderstands the instruction. 
97 | Score 2: Most of the key points in the response are irrelevant to the instruction, and the response misses major requirements of the instruction. 98 | Score 3: Some major points in the response contain irrelevant information or miss some requirements of the instruction. 99 | Score 4: The response is relevant to the instruction but misses minor requirements of the instruction. 100 | Score 5: The response is perfectly relevant to the instruction, and the model fulfills all of the requirements of the instruction. 101 | 102 | Your score should be an integer between 1 and 5.""" # noqa: E501 103 | ), 104 | dedent( 105 | """Evaluate whether the information provided in the response is correct given the reference response. 106 | Ignore differences in punctuation and phrasing between the response and reference response. 107 | It is okay if the response contains more information than the reference response, as long as it does not contain any conflicting statements. 108 | 109 | Binary scoring 110 | "No": The response is not factually accurate when compared against the reference response or includes conflicting statements. 111 | "Yes": The response is supported by the reference response and does not contain conflicting statements. 112 | 113 | Your score should be either "No" or "Yes". 114 | """ # noqa: E501 115 | ), 116 | ], 117 | } 118 | ), 119 | ] 120 | 121 | 122 | AnnotatedExpectedLlmOutput = Annotated[ 123 | Optional[str], 124 | WithJsonSchema( 125 | { 126 | "description": dedent( 127 | """A reference or ideal answer to compare against the `llm_response`. \ 128 | This is useful in cases where a specific output is expected from \ 129 | the model. Defaults to None.""" 130 | ) 131 | } 132 | ), 133 | ] 134 | 135 | AnnotatedLlmContext = Annotated[ 136 | Optional[str], 137 | WithJsonSchema( 138 | { 139 | "description": dedent( 140 | """Additional context or information provided to the model during \ 141 | generation. This is useful in cases where the model was provided \ 142 | with additional information that is not part of the `llm_prompt` \ 143 | or `expected_llm_output` (e.g., a RAG retrieval context). \ 144 | Defaults to None.""" 145 | ) 146 | } 147 | ), 148 | ] 149 | 150 | AnnotatedModelId = Annotated[ 151 | Literal["atla-selene", "atla-selene-mini"], 152 | WithJsonSchema( 153 | { 154 | "description": dedent( 155 | """The Atla model ID to use for evaluation. `atla-selene` is the \ 156 | flagship Atla model, optimized for the highest all-round performance. \ 157 | `atla-selene-mini` is a compact model that is generally faster and \ 158 | cheaper to run. Defaults to `atla-selene`.""" 159 | ) 160 | } 161 | ), 162 | ] 163 | 164 | # tools 165 | 166 | 167 | async def evaluate_llm_response( 168 | ctx: Context, 169 | evaluation_criteria: AnnotatedEvaluationCriteria, 170 | llm_prompt: AnnotatedLlmPrompt, 171 | llm_response: AnnotatedLlmResponse, 172 | expected_llm_output: AnnotatedExpectedLlmOutput = None, 173 | llm_context: AnnotatedLlmContext = None, 174 | model_id: AnnotatedModelId = "atla-selene", 175 | ) -> dict[str, str]: 176 | """Evaluate an LLM's response to a prompt using a given evaluation criteria. 177 | 178 | This function uses an Atla evaluation model under the hood to return a dictionary 179 | containing a score for the model's response and a textual critique containing 180 | feedback on the model's response. 181 | 182 | Returns: 183 | dict[str, str]: A dictionary containing the evaluation score and critique, in 184 | the format `{"score": <score>, "critique": <critique>}`. 
185 | """ 186 | state = cast(MCPState, ctx.request_context.lifespan_context) 187 | result = await state.atla_client.evaluation.create( 188 | model_id=model_id, 189 | model_input=llm_prompt, 190 | model_output=llm_response, 191 | evaluation_criteria=evaluation_criteria, 192 | expected_model_output=expected_llm_output, 193 | model_context=llm_context, 194 | ) 195 | 196 | return { 197 | "score": result.result.evaluation.score, 198 | "critique": result.result.evaluation.critique, 199 | } 200 | 201 | 202 | async def evaluate_llm_response_on_multiple_criteria( 203 | ctx: Context, 204 | evaluation_criteria_list: list[AnnotatedEvaluationCriteria], 205 | llm_prompt: AnnotatedLlmPrompt, 206 | llm_response: AnnotatedLlmResponse, 207 | expected_llm_output: AnnotatedExpectedLlmOutput = None, 208 | llm_context: AnnotatedLlmContext = None, 209 | model_id: AnnotatedModelId = "atla-selene", 210 | ) -> list[dict[str, str]]: 211 | """Evaluate an LLM's response to a prompt across *multiple* evaluation criteria. 212 | 213 | This function uses an Atla evaluation model under the hood to return a list of 214 | dictionaries, each containing an evaluation score and critique for a given 215 | criteria. 216 | 217 | Returns: 218 | list[dict[str, str]]: A list of dictionaries containing the evaluation score 219 | and critique, in the format `{"score": <score>, "critique": <critique>}`. 220 | The order of the dictionaries in the list will match the order of the 221 | criteria in the `evaluation_criteria_list` argument. 222 | """ 223 | tasks = [ 224 | evaluate_llm_response( 225 | ctx=ctx, 226 | evaluation_criteria=criterion, 227 | llm_prompt=llm_prompt, 228 | llm_response=llm_response, 229 | expected_llm_output=expected_llm_output, 230 | llm_context=llm_context, 231 | model_id=model_id, 232 | ) 233 | for criterion in evaluation_criteria_list 234 | ] 235 | results = await asyncio.gather(*tasks) 236 | return results 237 | 238 | 239 | # app factory 240 | 241 | 242 | def app_factory(atla_api_key: str) -> FastMCP: 243 | """Factory function to create an Atla MCP server with the given API key.""" 244 | 245 | @asynccontextmanager 246 | async def lifespan(_: FastMCP) -> AsyncIterator[MCPState]: 247 | async with AsyncAtla( 248 | api_key=atla_api_key, 249 | default_headers={ 250 | "X-Atla-Source": "mcp-server", 251 | }, 252 | ) as client: 253 | yield MCPState(atla_client=client) 254 | 255 | mcp = FastMCP("Atla", lifespan=lifespan) 256 | mcp.tool()(evaluate_llm_response) 257 | mcp.tool()(evaluate_llm_response_on_multiple_criteria) 258 | 259 | return mcp 260 | ```