This is page 2 of 25. Use http://codebase.md/beehiveinnovations/gemini-mcp-server?lines=true&page={x} to view the full context.

# Directory Structure

```
├── .claude
│   ├── commands
│   │   └── fix-github-issue.md
│   └── settings.json
├── .coveragerc
├── .dockerignore
├── .env.example
├── .gitattributes
├── .github
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE
│   │   ├── bug_report.yml
│   │   ├── config.yml
│   │   ├── documentation.yml
│   │   ├── feature_request.yml
│   │   └── tool_addition.yml
│   ├── pull_request_template.md
│   └── workflows
│       ├── docker-pr.yml
│       ├── docker-release.yml
│       ├── semantic-pr.yml
│       ├── semantic-release.yml
│       └── test.yml
├── .gitignore
├── .pre-commit-config.yaml
├── AGENTS.md
├── CHANGELOG.md
├── claude_config_example.json
├── CLAUDE.md
├── clink
│   ├── __init__.py
│   ├── agents
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── claude.py
│   │   ├── codex.py
│   │   └── gemini.py
│   ├── constants.py
│   ├── models.py
│   ├── parsers
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── claude.py
│   │   ├── codex.py
│   │   └── gemini.py
│   └── registry.py
├── code_quality_checks.ps1
├── code_quality_checks.sh
├── communication_simulator_test.py
├── conf
│   ├── __init__.py
│   ├── azure_models.json
│   ├── cli_clients
│   │   ├── claude.json
│   │   ├── codex.json
│   │   └── gemini.json
│   ├── custom_models.json
│   ├── dial_models.json
│   ├── gemini_models.json
│   ├── openai_models.json
│   ├── openrouter_models.json
│   └── xai_models.json
├── config.py
├── docker
│   ├── README.md
│   └── scripts
│       ├── build.ps1
│       ├── build.sh
│       ├── deploy.ps1
│       ├── deploy.sh
│       └── healthcheck.py
├── docker-compose.yml
├── Dockerfile
├── docs
│   ├── adding_providers.md
│   ├── adding_tools.md
│   ├── advanced-usage.md
│   ├── ai_banter.md
│   ├── ai-collaboration.md
│   ├── azure_openai.md
│   ├── configuration.md
│   ├── context-revival.md
│   ├── contributions.md
│   ├── custom_models.md
│   ├── docker-deployment.md
│   ├── gemini-setup.md
│   ├── getting-started.md
│   ├── index.md
│   ├── locale-configuration.md
│   ├── logging.md
│   ├── model_ranking.md
│   ├── testing.md
│   ├── tools
│   │   ├── analyze.md
│   │   ├── apilookup.md
│   │   ├── challenge.md
│   │   ├── chat.md
│   │   ├── clink.md
│   │   ├── codereview.md
│   │   ├── consensus.md
│   │   ├── debug.md
│   │   ├── docgen.md
│   │   ├── listmodels.md
│   │   ├── planner.md
│   │   ├── precommit.md
│   │   ├── refactor.md
│   │   ├── secaudit.md
│   │   ├── testgen.md
│   │   ├── thinkdeep.md
│   │   ├── tracer.md
│   │   └── version.md
│   ├── troubleshooting.md
│   ├── vcr-testing.md
│   └── wsl-setup.md
├── examples
│   ├── claude_config_macos.json
│   └── claude_config_wsl.json
├── LICENSE
├── providers
│   ├── __init__.py
│   ├── azure_openai.py
│   ├── base.py
│   ├── custom.py
│   ├── dial.py
│   ├── gemini.py
│   ├── openai_compatible.py
│   ├── openai.py
│   ├── openrouter.py
│   ├── registries
│   │   ├── __init__.py
│   │   ├── azure.py
│   │   ├── base.py
│   │   ├── custom.py
│   │   ├── dial.py
│   │   ├── gemini.py
│   │   ├── openai.py
│   │   ├── openrouter.py
│   │   └── xai.py
│   ├── registry_provider_mixin.py
│   ├── registry.py
│   ├── shared
│   │   ├── __init__.py
│   │   ├── model_capabilities.py
│   │   ├── model_response.py
│   │   ├── provider_type.py
│   │   └── temperature.py
│   └── xai.py
├── pyproject.toml
├── pytest.ini
├── README.md
├── requirements-dev.txt
├── requirements.txt
├── run_integration_tests.ps1
├── run_integration_tests.sh
├── run-server.ps1
├── run-server.sh
├── scripts
│   └── sync_version.py
├── server.py
├── simulator_tests
│   ├── __init__.py
│   ├── base_test.py
│   ├── conversation_base_test.py
│   ├── log_utils.py
│   ├── test_analyze_validation.py
│   ├── test_basic_conversation.py
│   ├── test_chat_simple_validation.py
│   ├── test_codereview_validation.py
│   ├── test_consensus_conversation.py
│   ├── test_consensus_three_models.py
│   ├── test_consensus_workflow_accurate.py
│   ├── test_content_validation.py
│   ├── test_conversation_chain_validation.py
│   ├── test_cross_tool_comprehensive.py
│   ├── test_cross_tool_continuation.py
│   ├── test_debug_certain_confidence.py
│   ├── test_debug_validation.py
│   ├── test_line_number_validation.py
│   ├── test_logs_validation.py
│   ├── test_model_thinking_config.py
│   ├── test_o3_model_selection.py
│   ├── test_o3_pro_expensive.py
│   ├── test_ollama_custom_url.py
│   ├── test_openrouter_fallback.py
│   ├── test_openrouter_models.py
│   ├── test_per_tool_deduplication.py
│   ├── test_planner_continuation_history.py
│   ├── test_planner_validation_old.py
│   ├── test_planner_validation.py
│   ├── test_precommitworkflow_validation.py
│   ├── test_prompt_size_limit_bug.py
│   ├── test_refactor_validation.py
│   ├── test_secaudit_validation.py
│   ├── test_testgen_validation.py
│   ├── test_thinkdeep_validation.py
│   ├── test_token_allocation_validation.py
│   ├── test_vision_capability.py
│   └── test_xai_models.py
├── systemprompts
│   ├── __init__.py
│   ├── analyze_prompt.py
│   ├── chat_prompt.py
│   ├── clink
│   │   ├── codex_codereviewer.txt
│   │   ├── default_codereviewer.txt
│   │   ├── default_planner.txt
│   │   └── default.txt
│   ├── codereview_prompt.py
│   ├── consensus_prompt.py
│   ├── debug_prompt.py
│   ├── docgen_prompt.py
│   ├── generate_code_prompt.py
│   ├── planner_prompt.py
│   ├── precommit_prompt.py
│   ├── refactor_prompt.py
│   ├── secaudit_prompt.py
│   ├── testgen_prompt.py
│   ├── thinkdeep_prompt.py
│   └── tracer_prompt.py
├── tests
│   ├── __init__.py
│   ├── CASSETTE_MAINTENANCE.md
│   ├── conftest.py
│   ├── gemini_cassettes
│   │   ├── chat_codegen
│   │   │   └── gemini25_pro_calculator
│   │   │       └── mldev.json
│   │   ├── chat_cross
│   │   │   └── step1_gemini25_flash_number
│   │   │       └── mldev.json
│   │   └── consensus
│   │       └── step2_gemini25_flash_against
│   │           └── mldev.json
│   ├── http_transport_recorder.py
│   ├── mock_helpers.py
│   ├── openai_cassettes
│   │   ├── chat_cross_step2_gpt5_reminder.json
│   │   ├── chat_gpt5_continuation.json
│   │   ├── chat_gpt5_moon_distance.json
│   │   ├── consensus_step1_gpt5_for.json
│   │   └── o3_pro_basic_math.json
│   ├── pii_sanitizer.py
│   ├── sanitize_cassettes.py
│   ├── test_alias_target_restrictions.py
│   ├── test_auto_mode_comprehensive.py
│   ├── test_auto_mode_custom_provider_only.py
│   ├── test_auto_mode_model_listing.py
│   ├── test_auto_mode_provider_selection.py
│   ├── test_auto_mode.py
│   ├── test_auto_model_planner_fix.py
│   ├── test_azure_openai_provider.py
│   ├── test_buggy_behavior_prevention.py
│   ├── test_cassette_semantic_matching.py
│   ├── test_challenge.py
│   ├── test_chat_codegen_integration.py
│   ├── test_chat_cross_model_continuation.py
│   ├── test_chat_openai_integration.py
│   ├── test_chat_simple.py
│   ├── test_clink_claude_agent.py
│   ├── test_clink_claude_parser.py
│   ├── test_clink_codex_agent.py
│   ├── test_clink_gemini_agent.py
│   ├── test_clink_gemini_parser.py
│   ├── test_clink_integration.py
│   ├── test_clink_parsers.py
│   ├── test_clink_tool.py
│   ├── test_collaboration.py
│   ├── test_config.py
│   ├── test_consensus_integration.py
│   ├── test_consensus_schema.py
│   ├── test_consensus.py
│   ├── test_conversation_continuation_integration.py
│   ├── test_conversation_field_mapping.py
│   ├── test_conversation_file_features.py
│   ├── test_conversation_memory.py
│   ├── test_conversation_missing_files.py
│   ├── test_custom_openai_temperature_fix.py
│   ├── test_custom_provider.py
│   ├── test_debug.py
│   ├── test_deploy_scripts.py
│   ├── test_dial_provider.py
│   ├── test_directory_expansion_tracking.py
│   ├── test_disabled_tools.py
│   ├── test_docker_claude_desktop_integration.py
│   ├── test_docker_config_complete.py
│   ├── test_docker_healthcheck.py
│   ├── test_docker_implementation.py
│   ├── test_docker_mcp_validation.py
│   ├── test_docker_security.py
│   ├── test_docker_volume_persistence.py
│   ├── test_file_protection.py
│   ├── test_gemini_token_usage.py
│   ├── test_image_support_integration.py
│   ├── test_image_validation.py
│   ├── test_integration_utf8.py
│   ├── test_intelligent_fallback.py
│   ├── test_issue_245_simple.py
│   ├── test_large_prompt_handling.py
│   ├── test_line_numbers_integration.py
│   ├── test_listmodels_restrictions.py
│   ├── test_listmodels.py
│   ├── test_mcp_error_handling.py
│   ├── test_model_enumeration.py
│   ├── test_model_metadata_continuation.py
│   ├── test_model_resolution_bug.py
│   ├── test_model_restrictions.py
│   ├── test_o3_pro_output_text_fix.py
│   ├── test_o3_temperature_fix_simple.py
│   ├── test_openai_compatible_token_usage.py
│   ├── test_openai_provider.py
│   ├── test_openrouter_provider.py
│   ├── test_openrouter_registry.py
│   ├── test_parse_model_option.py
│   ├── test_per_tool_model_defaults.py
│   ├── test_pii_sanitizer.py
│   ├── test_pip_detection_fix.py
│   ├── test_planner.py
│   ├── test_precommit_workflow.py
│   ├── test_prompt_regression.py
│   ├── test_prompt_size_limit_bug_fix.py
│   ├── test_provider_retry_logic.py
│   ├── test_provider_routing_bugs.py
│   ├── test_provider_utf8.py
│   ├── test_providers.py
│   ├── test_rate_limit_patterns.py
│   ├── test_refactor.py
│   ├── test_secaudit.py
│   ├── test_server.py
│   ├── test_supported_models_aliases.py
│   ├── test_thinking_modes.py
│   ├── test_tools.py
│   ├── test_tracer.py
│   ├── test_utf8_localization.py
│   ├── test_utils.py
│   ├── test_uvx_resource_packaging.py
│   ├── test_uvx_support.py
│   ├── test_workflow_file_embedding.py
│   ├── test_workflow_metadata.py
│   ├── test_workflow_prompt_size_validation_simple.py
│   ├── test_workflow_utf8.py
│   ├── test_xai_provider.py
│   ├── transport_helpers.py
│   └── triangle.png
├── tools
│   ├── __init__.py
│   ├── analyze.py
│   ├── apilookup.py
│   ├── challenge.py
│   ├── chat.py
│   ├── clink.py
│   ├── codereview.py
│   ├── consensus.py
│   ├── debug.py
│   ├── docgen.py
│   ├── listmodels.py
│   ├── models.py
│   ├── planner.py
│   ├── precommit.py
│   ├── refactor.py
│   ├── secaudit.py
│   ├── shared
│   │   ├── __init__.py
│   │   ├── base_models.py
│   │   ├── base_tool.py
│   │   ├── exceptions.py
│   │   └── schema_builders.py
│   ├── simple
│   │   ├── __init__.py
│   │   └── base.py
│   ├── testgen.py
│   ├── thinkdeep.py
│   ├── tracer.py
│   ├── version.py
│   └── workflow
│       ├── __init__.py
│       ├── base.py
│       ├── schema_builders.py
│       └── workflow_mixin.py
├── utils
│   ├── __init__.py
│   ├── client_info.py
│   ├── conversation_memory.py
│   ├── env.py
│   ├── file_types.py
│   ├── file_utils.py
│   ├── image_utils.py
│   ├── model_context.py
│   ├── model_restrictions.py
│   ├── security_config.py
│   ├── storage_backend.py
│   └── token_utils.py
└── zen-mcp-server
```

# Files

--------------------------------------------------------------------------------
/tests/test_clink_integration.py:
--------------------------------------------------------------------------------

```python
 1 | import json
 2 | import os
 3 | import shutil
 4 | 
 5 | import pytest
 6 | 
 7 | from tools.clink import CLinkTool
 8 | 
 9 | 
10 | @pytest.mark.integration
11 | @pytest.mark.asyncio
12 | async def test_clink_gemini_single_digit_sum():
13 |     if shutil.which("gemini") is None:
14 |         pytest.skip("gemini CLI is not installed or on PATH")
15 | 
16 |     if not (os.getenv("GEMINI_API_KEY") or os.getenv("GOOGLE_API_KEY")):
17 |         pytest.skip("Gemini API key is not configured")
18 | 
19 |     tool = CLinkTool()
20 |     prompt = "Respond with a single digit equal to the sum of 2 + 2. Output only that digit."
21 | 
22 |     results = await tool.execute(
23 |         {
24 |             "prompt": prompt,
25 |             "cli_name": "gemini",
26 |             "role": "default",
27 |             "absolute_file_paths": [],
28 |             "images": [],
29 |         }
30 |     )
31 | 
32 |     assert results, "clink tool returned no outputs"
33 |     payload = json.loads(results[0].text)
34 |     status = payload["status"]
35 |     assert status in {"success", "continuation_available"}
36 | 
37 |     content = payload.get("content", "").strip()
38 |     assert content == "4"
39 | 
40 |     if status == "continuation_available":
41 |         offer = payload.get("continuation_offer") or {}
42 |         assert offer.get("continuation_id"), "Expected continuation metadata when status indicates availability"
43 | 
44 | 
45 | @pytest.mark.integration
46 | @pytest.mark.asyncio
47 | async def test_clink_claude_single_digit_sum():
48 |     if shutil.which("claude") is None:
49 |         pytest.skip("claude CLI is not installed or on PATH")
50 | 
51 |     tool = CLinkTool()
52 |     prompt = "Respond with a single digit equal to the sum of 2 + 2. Output only that digit."
53 | 
54 |     results = await tool.execute(
55 |         {
56 |             "prompt": prompt,
57 |             "cli_name": "claude",
58 |             "role": "default",
59 |             "absolute_file_paths": [],
60 |             "images": [],
61 |         }
62 |     )
63 | 
64 |     assert results, "clink tool returned no outputs"
65 |     payload = json.loads(results[0].text)
66 |     status = payload["status"]
67 | 
68 |     if status == "error":
69 |         metadata = payload.get("metadata") or {}
70 |         reason = payload.get("content") or metadata.get("message") or "Claude CLI reported an error"
71 |         pytest.skip(f"Skipping Claude integration test: {reason}")
72 | 
73 |     assert status in {"success", "continuation_available"}
74 | 
75 |     content = payload.get("content", "").strip()
76 |     assert content == "4"
77 | 
78 |     if status == "continuation_available":
79 |         offer = payload.get("continuation_offer") or {}
80 |         assert offer.get("continuation_id"), "Expected continuation metadata when status indicates availability"
81 | 
```
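
For orientation, the JSON payload these assertions unpack has roughly the shape sketched below. Only the fields the tests touch (`status`, `content`, `continuation_offer.continuation_id`, and `metadata`) come from the assertions above; the concrete values and any additional keys are hypothetical.

```python
# Hypothetical payload shape matching the assertions in the tests above.
# Real responses come from tools.clink.CLinkTool.execute and may carry more fields.
example_payload = {
    "status": "continuation_available",  # or "success"; "error" triggers the skip path
    "content": "4",                      # the CLI's answer text
    "continuation_offer": {
        "continuation_id": "abc123",     # asserted present when status is continuation_available
    },
    "metadata": {},                      # consulted for a "message" on the error path
}
```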

--------------------------------------------------------------------------------
/tests/test_clink_codex_agent.py:
--------------------------------------------------------------------------------

```python
 1 | import asyncio
 2 | import shutil
 3 | from pathlib import Path
 4 | 
 5 | import pytest
 6 | 
 7 | from clink.agents.base import CLIAgentError
 8 | from clink.agents.codex import CodexAgent
 9 | from clink.models import ResolvedCLIClient, ResolvedCLIRole
10 | 
11 | 
12 | class DummyProcess:
13 |     def __init__(self, *, stdout: bytes = b"", stderr: bytes = b"", returncode: int = 0):
14 |         self._stdout = stdout
15 |         self._stderr = stderr
16 |         self.returncode = returncode
17 | 
18 |     async def communicate(self, _input):
19 |         return self._stdout, self._stderr
20 | 
21 | 
22 | @pytest.fixture()
23 | def codex_agent():
24 |     prompt_path = Path("systemprompts/clink/codex_default.txt").resolve()
25 |     role = ResolvedCLIRole(name="default", prompt_path=prompt_path, role_args=[])
26 |     client = ResolvedCLIClient(
27 |         name="codex",
28 |         executable=["codex"],
29 |         internal_args=["exec"],
30 |         config_args=["--json", "--dangerously-bypass-approvals-and-sandbox"],
31 |         env={},
32 |         timeout_seconds=30,
33 |         parser="codex_jsonl",
34 |         roles={"default": role},
35 |         output_to_file=None,
36 |         working_dir=None,
37 |     )
38 |     return CodexAgent(client), role
39 | 
40 | 
41 | async def _run_agent_with_process(monkeypatch, agent, role, process):
42 |     async def fake_create_subprocess_exec(*_args, **_kwargs):
43 |         return process
44 | 
45 |     def fake_which(executable_name):
46 |         return f"/usr/bin/{executable_name}"
47 | 
48 |     monkeypatch.setattr(asyncio, "create_subprocess_exec", fake_create_subprocess_exec)
49 |     monkeypatch.setattr(shutil, "which", fake_which)
50 |     return await agent.run(role=role, prompt="do something", files=[], images=[])
51 | 
52 | 
53 | @pytest.mark.asyncio
54 | async def test_codex_agent_recovers_jsonl(monkeypatch, codex_agent):
55 |     agent, role = codex_agent
56 |     stdout = b"""
57 | {"type":"item.completed","item":{"id":"item_0","type":"agent_message","text":"Hello from Codex"}}
58 | {"type":"turn.completed","usage":{"input_tokens":10,"output_tokens":5}}
59 | """
60 |     process = DummyProcess(stdout=stdout, returncode=124)
61 |     result = await _run_agent_with_process(monkeypatch, agent, role, process)
62 | 
63 |     assert result.returncode == 124
64 |     assert "Hello from Codex" in result.parsed.content
65 |     assert result.parsed.metadata["usage"]["output_tokens"] == 5
66 | 
67 | 
68 | @pytest.mark.asyncio
69 | async def test_codex_agent_propagates_invalid_json(monkeypatch, codex_agent):
70 |     agent, role = codex_agent
71 |     stdout = b"not json"
72 |     process = DummyProcess(stdout=stdout, returncode=1)
73 | 
74 |     with pytest.raises(CLIAgentError):
75 |         await _run_agent_with_process(monkeypatch, agent, role, process)
76 | 
```

--------------------------------------------------------------------------------
/clink/agents/gemini.py:
--------------------------------------------------------------------------------

```python
 1 | """Gemini-specific CLI agent hooks."""
 2 | 
 3 | from __future__ import annotations
 4 | 
 5 | import json
 6 | from typing import Any
 7 | 
 8 | from clink.models import ResolvedCLIClient
 9 | from clink.parsers.base import ParsedCLIResponse
10 | 
11 | from .base import AgentOutput, BaseCLIAgent
12 | 
13 | 
14 | class GeminiAgent(BaseCLIAgent):
15 |     """Gemini-specific behaviour."""
16 | 
17 |     def __init__(self, client: ResolvedCLIClient):
18 |         super().__init__(client)
19 | 
20 |     def _recover_from_error(
21 |         self,
22 |         *,
23 |         returncode: int,
24 |         stdout: str,
25 |         stderr: str,
26 |         sanitized_command: list[str],
27 |         duration_seconds: float,
28 |         output_file_content: str | None,
29 |     ) -> AgentOutput | None:
30 |         combined = "\n".join(part for part in (stderr, stdout) if part)
31 |         if not combined:
32 |             return None
33 | 
34 |         brace_index = combined.find("{")
35 |         if brace_index == -1:
36 |             return None
37 | 
38 |         json_candidate = combined[brace_index:]
39 |         try:
40 |             payload: dict[str, Any] = json.loads(json_candidate)
41 |         except json.JSONDecodeError:
42 |             return None
43 | 
44 |         error_block = payload.get("error")
45 |         if not isinstance(error_block, dict):
46 |             return None
47 | 
48 |         code = error_block.get("code")
49 |         err_type = error_block.get("type")
50 |         detail_message = error_block.get("message")
51 | 
52 |         prologue = combined[:brace_index].strip()
53 |         lines: list[str] = []
54 |         if prologue and (not detail_message or prologue not in detail_message):
55 |             lines.append(prologue)
56 |         if detail_message:
57 |             lines.append(detail_message)
58 | 
59 |         header = "Gemini CLI reported a tool failure"
60 |         if code:
61 |             header = f"{header} ({code})"
62 |         elif err_type:
63 |             header = f"{header} ({err_type})"
64 | 
65 |         content_lines = [header.rstrip(".") + "."]
66 |         content_lines.extend(lines)
67 |         message = "\n".join(content_lines).strip()
68 | 
69 |         metadata = {
70 |             "cli_error_recovered": True,
71 |             "cli_error_code": code,
72 |             "cli_error_type": err_type,
73 |             "cli_error_payload": payload,
74 |         }
75 | 
76 |         parsed = ParsedCLIResponse(content=message or header, metadata=metadata)
77 |         return AgentOutput(
78 |             parsed=parsed,
79 |             sanitized_command=sanitized_command,
80 |             returncode=returncode,
81 |             stdout=stdout,
82 |             stderr=stderr,
83 |             duration_seconds=duration_seconds,
84 |             parser_name=self._parser.name,
85 |             output_file_content=output_file_content,
86 |         )
87 | 
```
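
As a standalone illustration of the recovery hook above, the sketch below repeats the brace-scan-and-parse step on a sample payload. The stderr text is invented for the example (modeled on the fixture in `tests/test_clink_gemini_agent.py`), not captured Gemini CLI output.

```python
import json

# Invented stderr combining a human-readable prologue with a JSON error block,
# which is the shape _recover_from_error looks for.
stderr = (
    "Error: Failed to edit, expected 1 occurrence but found 2.\n"
    '{"error": {"type": "FatalToolExecutionError", '
    '"message": "Error executing tool replace: Failed to edit", '
    '"code": "edit_expected_occurrence_mismatch"}}'
)

# Same idea as the hook: find the first brace, parse the remainder, require an "error" dict.
brace_index = stderr.find("{")
payload = json.loads(stderr[brace_index:]) if brace_index != -1 else None
error_block = payload.get("error") if isinstance(payload, dict) else None

if isinstance(error_block, dict):
    print(error_block["code"])  # edit_expected_occurrence_mismatch
```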

--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------

```yaml
  1 | services:
  2 |   zen-mcp:
  3 |     build:
  4 |       context: .
  5 |       dockerfile: Dockerfile
  6 |       target: runtime
  7 |     image: zen-mcp-server:latest
  8 |     container_name: zen-mcp-server
  9 |     
 10 |     # Container labels for traceability
 11 |     labels:
 12 |       - "com.zen-mcp.service=zen-mcp-server"
 13 |       - "com.zen-mcp.version=1.0.0"
 14 |       - "com.zen-mcp.environment=production"
 15 |       - "com.zen-mcp.description=AI-powered Model Context Protocol server"
 16 |     
 17 |     # Environment variables
 18 |     environment:
 19 |       # Default model configuration
 20 |       - DEFAULT_MODEL=${DEFAULT_MODEL:-auto}
 21 |       
 22 |       # API Keys (use Docker secrets in production)
 23 |       - GEMINI_API_KEY=${GEMINI_API_KEY}
 24 |       - GOOGLE_API_KEY=${GOOGLE_API_KEY}
 25 |       - OPENAI_API_KEY=${OPENAI_API_KEY}
 26 |       - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
 27 |       - XAI_API_KEY=${XAI_API_KEY}
 28 |       - DIAL_API_KEY=${DIAL_API_KEY}
 29 |       - DIAL_API_HOST=${DIAL_API_HOST}
 30 |       - DIAL_API_VERSION=${DIAL_API_VERSION}
 31 |       - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
 32 |       - CUSTOM_API_URL=${CUSTOM_API_URL}
 33 |       - CUSTOM_API_KEY=${CUSTOM_API_KEY}
 34 |       - CUSTOM_MODEL_NAME=${CUSTOM_MODEL_NAME}
 35 |       
 36 |       # Logging configuration
 37 |       - LOG_LEVEL=${LOG_LEVEL:-INFO}
 38 |       - LOG_MAX_SIZE=${LOG_MAX_SIZE:-10MB}
 39 |       - LOG_BACKUP_COUNT=${LOG_BACKUP_COUNT:-5}
 40 |       
 41 |       # Advanced configuration
 42 |       - DEFAULT_THINKING_MODE_THINKDEEP=${DEFAULT_THINKING_MODE_THINKDEEP:-high}
 43 |       - DISABLED_TOOLS=${DISABLED_TOOLS}
 44 |       - MAX_MCP_OUTPUT_TOKENS=${MAX_MCP_OUTPUT_TOKENS}
 45 |       
 46 |       # Server configuration
 47 |       - PYTHONUNBUFFERED=1
 48 |       - PYTHONPATH=/app
 49 |       - TZ=${TZ:-UTC}
 50 |     
 51 |     # Volumes for persistent data
 52 |     volumes:
 53 |       - ./logs:/app/logs
 54 |       - zen-mcp-config:/app/conf
 55 |       - /etc/localtime:/etc/localtime:ro
 56 |     
 57 |     # Network configuration
 58 |     networks:
 59 |       - zen-network
 60 |     
 61 |     # Resource limits
 62 |     deploy:
 63 |       resources:
 64 |         limits:
 65 |           memory: 512M
 66 |           cpus: '0.5'
 67 |         reservations:
 68 |           memory: 256M
 69 |           cpus: '0.25'
 70 |     
 71 |     # Health check
 72 |     healthcheck:
 73 |       test: ["CMD", "python", "/usr/local/bin/healthcheck.py"]
 74 |       interval: 30s
 75 |       timeout: 10s
 76 |       retries: 3
 77 |       start_period: 40s
 78 |     
 79 |     # Restart policy
 80 |     restart: unless-stopped
 81 |     
 82 |     # Security
 83 |     security_opt:
 84 |       - no-new-privileges:true
 85 |     read_only: true
 86 |     tmpfs:
 87 |       - /tmp:noexec,nosuid,size=100m
 88 |       - /app/tmp:noexec,nosuid,size=50m
 89 | 
 90 | # Named volumes
 91 | volumes:
 92 |   zen-mcp-config:
 93 |     driver: local
 94 | 
 95 | # Networks
 96 | networks:
 97 |   zen-network:
 98 |     driver: bridge
 99 |     ipam:
100 |       config:
101 |         - subnet: 172.20.0.0/16
102 | 
```

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------

```dockerfile
 1 | # ===========================================
 2 | # STAGE 1: Build dependencies
 3 | # ===========================================
 4 | FROM python:3.11-slim AS builder
 5 | 
 6 | # Install system dependencies for building
 7 | RUN apt-get update && apt-get install -y \
 8 |     build-essential \
 9 |     curl \
10 |     && rm -rf /var/lib/apt/lists/*
11 | 
12 | # Set working directory
13 | WORKDIR /app
14 | 
15 | # Copy requirements files
16 | COPY requirements.txt ./
17 | 
18 | # Create virtual environment and install dependencies
19 | RUN python -m venv /opt/venv
20 | ENV PATH="/opt/venv/bin:$PATH"
21 | 
22 | # Install Python dependencies
23 | RUN pip install --no-cache-dir --upgrade pip setuptools wheel && \
24 |     pip install --no-cache-dir -r requirements.txt
25 | 
26 | # ===========================================
27 | # STAGE 2: Runtime image
28 | # ===========================================
29 | FROM python:3.11-slim AS runtime
30 | 
31 | # Add metadata labels for traceability
32 | LABEL maintainer="Zen MCP Server Team"
33 | LABEL version="1.0.0"
34 | LABEL description="Zen MCP Server - AI-powered Model Context Protocol server"
35 | LABEL org.opencontainers.image.title="zen-mcp-server"
36 | LABEL org.opencontainers.image.description="AI-powered Model Context Protocol server with multi-provider support"
37 | LABEL org.opencontainers.image.version="1.0.0"
38 | LABEL org.opencontainers.image.source="https://github.com/BeehiveInnovations/zen-mcp-server"
39 | LABEL org.opencontainers.image.documentation="https://github.com/BeehiveInnovations/zen-mcp-server/blob/main/README.md"
40 | LABEL org.opencontainers.image.licenses="Apache 2.0 License"
41 | 
42 | # Create non-root user for security
43 | RUN groupadd -r zenuser && useradd -r -g zenuser zenuser
44 | 
45 | # Install minimal runtime dependencies
46 | RUN apt-get update && apt-get install -y \
47 |     ca-certificates \
48 |     procps \
49 |     && rm -rf /var/lib/apt/lists/* \
50 |     && apt-get clean
51 | 
52 | # Copy virtual environment from builder
53 | COPY --from=builder /opt/venv /opt/venv
54 | ENV PATH="/opt/venv/bin:$PATH"
55 | 
56 | # Set working directory
57 | WORKDIR /app
58 | 
59 | # Copy application code
60 | COPY --chown=zenuser:zenuser . .
61 | 
62 | # Create logs directory with proper permissions
63 | RUN mkdir -p logs && chown -R zenuser:zenuser logs
64 | 
65 | # Create tmp directory for container operations
66 | RUN mkdir -p tmp && chown -R zenuser:zenuser tmp
67 | 
68 | # Copy health check script
69 | COPY --chown=zenuser:zenuser docker/scripts/healthcheck.py /usr/local/bin/healthcheck.py
70 | RUN chmod +x /usr/local/bin/healthcheck.py
71 | 
72 | # Switch to non-root user
73 | USER zenuser
74 | 
75 | # Health check configuration
76 | HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
77 |     CMD python /usr/local/bin/healthcheck.py
78 | 
79 | # Set environment variables
80 | ENV PYTHONUNBUFFERED=1
81 | ENV PYTHONPATH=/app
82 | 
83 | # Default command
84 | CMD ["python", "server.py"]
85 | 
```

--------------------------------------------------------------------------------
/run_integration_tests.sh:
--------------------------------------------------------------------------------

```bash
 1 | #!/bin/bash
 2 | 
 3 | # Zen MCP Server - Run Integration Tests
 4 | # This script runs integration tests that require API keys
 5 | # Run this locally on your Mac to ensure everything works end-to-end
 6 | 
 7 | set -e  # Exit on any error
 8 | 
 9 | echo "🧪 Running Integration Tests for Zen MCP Server"
10 | echo "=============================================="
11 | echo "These tests use real API calls with your configured keys"
12 | echo ""
13 | 
14 | # Activate virtual environment
15 | if [[ -f ".zen_venv/bin/activate" ]]; then
16 |     source .zen_venv/bin/activate
17 |     echo "✅ Using virtual environment"
18 | else
19 |     echo "❌ No virtual environment found!"
20 |     echo "Please run: ./run-server.sh first"
21 |     exit 1
22 | fi
23 | 
24 | # Check for .env file
25 | if [[ ! -f ".env" ]]; then
26 |     echo "⚠️  Warning: No .env file found. Integration tests may fail without API keys."
27 |     echo ""
28 | fi
29 | 
30 | echo "🔑 Checking API key availability:"
31 | echo "---------------------------------"
32 | 
33 | # Check which API keys are available
34 | if [[ -n "$GEMINI_API_KEY" ]] || grep -q "GEMINI_API_KEY=" .env 2>/dev/null; then
35 |     echo "✅ GEMINI_API_KEY configured"
36 | else
37 |     echo "❌ GEMINI_API_KEY not found"
38 | fi
39 | 
40 | if [[ -n "$OPENAI_API_KEY" ]] || grep -q "OPENAI_API_KEY=" .env 2>/dev/null; then
41 |     echo "✅ OPENAI_API_KEY configured"
42 | else
43 |     echo "❌ OPENAI_API_KEY not found"
44 | fi
45 | 
46 | if [[ -n "$XAI_API_KEY" ]] || grep -q "XAI_API_KEY=" .env 2>/dev/null; then
47 |     echo "✅ XAI_API_KEY configured"
48 | else
49 |     echo "❌ XAI_API_KEY not found"
50 | fi
51 | 
52 | if [[ -n "$OPENROUTER_API_KEY" ]] || grep -q "OPENROUTER_API_KEY=" .env 2>/dev/null; then
53 |     echo "✅ OPENROUTER_API_KEY configured"
54 | else
55 |     echo "❌ OPENROUTER_API_KEY not found"
56 | fi
57 | 
58 | if [[ -n "$CUSTOM_API_URL" ]] || grep -q "CUSTOM_API_URL=" .env 2>/dev/null; then
59 |     echo "✅ CUSTOM_API_URL configured (local models)"
60 | else
61 |     echo "❌ CUSTOM_API_URL not found"
62 | fi
63 | 
64 | echo ""
65 | 
66 | # Run integration tests
67 | echo "🏃 Running integration tests..."
68 | echo "------------------------------"
69 | 
70 | # Run only integration tests (marked with @pytest.mark.integration)
71 | python -m pytest tests/ -v -m "integration" --tb=short
72 | 
73 | echo ""
74 | echo "✅ Integration tests completed!"
75 | echo ""
76 | 
77 | # Also run simulator tests if requested
78 | if [[ "$1" == "--with-simulator" ]]; then
79 |     echo "🤖 Running simulator tests..."
80 |     echo "----------------------------"
81 |     python communication_simulator_test.py --verbose
82 |     echo ""
83 |     echo "✅ Simulator tests completed!"
84 | fi
85 | 
86 | echo "💡 Tips:"
87 | echo "- Run './run_integration_tests.sh' for integration tests only"
88 | echo "- Run './run_integration_tests.sh --with-simulator' to also run simulator tests"
89 | echo "- Run './code_quality_checks.sh' for unit tests and linting"
90 | echo "- Check logs in logs/mcp_server.log if tests fail"
```

--------------------------------------------------------------------------------
/tests/test_clink_gemini_agent.py:
--------------------------------------------------------------------------------

```python
 1 | import asyncio
 2 | import shutil
 3 | from pathlib import Path
 4 | 
 5 | import pytest
 6 | 
 7 | from clink.agents.base import CLIAgentError
 8 | from clink.agents.gemini import GeminiAgent
 9 | from clink.models import ResolvedCLIClient, ResolvedCLIRole
10 | 
11 | 
12 | class DummyProcess:
13 |     def __init__(self, *, stdout: bytes = b"", stderr: bytes = b"", returncode: int = 0):
14 |         self._stdout = stdout
15 |         self._stderr = stderr
16 |         self.returncode = returncode
17 | 
18 |     async def communicate(self, _input):
19 |         return self._stdout, self._stderr
20 | 
21 | 
22 | @pytest.fixture()
23 | def gemini_agent():
24 |     prompt_path = Path("systemprompts/clink/gemini_default.txt").resolve()
25 |     role = ResolvedCLIRole(name="default", prompt_path=prompt_path, role_args=[])
26 |     client = ResolvedCLIClient(
27 |         name="gemini",
28 |         executable=["gemini"],
29 |         internal_args=[],
30 |         config_args=[],
31 |         env={},
32 |         timeout_seconds=30,
33 |         parser="gemini_json",
34 |         roles={"default": role},
35 |         output_to_file=None,
36 |         working_dir=None,
37 |     )
38 |     return GeminiAgent(client), role
39 | 
40 | 
41 | async def _run_agent_with_process(monkeypatch, agent, role, process):
42 |     async def fake_create_subprocess_exec(*_args, **_kwargs):
43 |         return process
44 | 
45 |     def fake_which(executable_name):
46 |         return f"/usr/bin/{executable_name}"
47 | 
48 |     monkeypatch.setattr(asyncio, "create_subprocess_exec", fake_create_subprocess_exec)
49 |     monkeypatch.setattr(shutil, "which", fake_which)
50 |     return await agent.run(role=role, prompt="do something", files=[], images=[])
51 | 
52 | 
53 | @pytest.mark.asyncio
54 | async def test_gemini_agent_recovers_tool_error(monkeypatch, gemini_agent):
55 |     agent, role = gemini_agent
56 |     error_json = """{
57 |   "error": {
58 |     "type": "FatalToolExecutionError",
59 |     "message": "Error executing tool replace: Failed to edit",
60 |     "code": "edit_expected_occurrence_mismatch"
61 |   }
62 | }"""
63 |     stderr = ("Error: Failed to edit, expected 1 occurrence but found 2.\n" + error_json).encode()
64 |     process = DummyProcess(stderr=stderr, returncode=54)
65 | 
66 |     result = await _run_agent_with_process(monkeypatch, agent, role, process)
67 | 
68 |     assert result.returncode == 54
69 |     assert result.parsed.metadata["cli_error_recovered"] is True
70 |     assert result.parsed.metadata["cli_error_code"] == "edit_expected_occurrence_mismatch"
71 |     assert "Gemini CLI reported a tool failure" in result.parsed.content
72 | 
73 | 
74 | @pytest.mark.asyncio
75 | async def test_gemini_agent_propagates_unrecoverable_error(monkeypatch, gemini_agent):
76 |     agent, role = gemini_agent
77 |     stderr = b"Plain failure without structured payload"
78 |     process = DummyProcess(stderr=stderr, returncode=54)
79 | 
80 |     with pytest.raises(CLIAgentError):
81 |         await _run_agent_with_process(monkeypatch, agent, role, process)
82 | 
```

--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/tool_addition.yml:
--------------------------------------------------------------------------------

```yaml
 1 | name: 🛠️ New Gemini Tool Proposal
 2 | description: Propose a new Zen MCP tool (e.g., `summarize`, `fixer`, `refactor`)
 3 | labels: ["enhancement", "new-tool"]
 4 | body:
 5 |   - type: input
 6 |     id: tool-name
 7 |     attributes:
 8 |       label: Proposed Tool Name
 9 |       description: "What would the tool be called? (e.g., `summarize`, `docgen`, `refactor`)"
10 |       placeholder: "e.g., `docgen`"
11 |     validations:
12 |       required: true
13 | 
14 |   - type: textarea
15 |     id: purpose
16 |     attributes:
17 |       label: What is the primary purpose of this tool?
18 |       description: "Explain the tool's core function and the value it provides to developers using Claude + Zen."
19 |       placeholder: "This tool will automatically generate comprehensive documentation from code, extracting class and function signatures, docstrings, and creating usage examples."
20 |     validations:
21 |       required: true
22 | 
23 |   - type: textarea
24 |     id: example-usage
25 |     attributes:
26 |       label: Example Usage in Claude Desktop
27 |       description: "Show how a user would invoke this tool through Claude and what the expected output would look like."
28 |       placeholder: |
29 |         **User prompt to Claude:**
30 |         "Use zen to generate documentation for my entire src/ directory"
31 | 
32 |         **Expected behavior:**
33 |         - Analyze all Python files in src/
34 |         - Extract classes, functions, and their docstrings
35 |         - Generate structured markdown documentation
36 |         - Include usage examples where possible
37 |         - Return organized documentation with table of contents
38 |       render: markdown
39 |     validations:
40 |       required: true
41 | 
42 |   - type: dropdown
43 |     id: tool-category
44 |     attributes:
45 |       label: Tool Category
46 |       description: What category does this tool fit into?
47 |       options:
48 |         - Code Analysis (like analyze)
49 |         - Code Quality (like codereview)
50 |         - Code Generation/Refactoring
51 |         - Documentation Generation
52 |         - Testing Support
53 |         - Debugging Support (like debug)
54 |         - Workflow Automation
55 |         - Architecture Planning (like thinkdeep)
56 |         - Other
57 |     validations:
58 |       required: true
59 | 
60 |   - type: textarea
61 |     id: system-prompt
62 |     attributes:
63 |       label: Proposed System Prompt (Optional)
64 |       description: "If you have ideas for how zen should be prompted for this tool, share them here."
65 |       placeholder: |
66 |         You are an expert technical documentation generator. Your task is to create comprehensive, user-friendly documentation from source code...
67 | 
68 |   - type: checkboxes
69 |     id: contribution
70 |     attributes:
71 |       label: Contribution
72 |       options:
73 |         - label: I am willing to submit a Pull Request to implement this new tool.
74 |         - label: I have checked that this tool doesn't overlap significantly with existing tools (analyze, codereview, debug, thinkdeep, chat).
75 | 
76 | 
```

--------------------------------------------------------------------------------
/conf/azure_models.json:
--------------------------------------------------------------------------------

```json
 1 | {
 2 |   "_README": {
 3 |     "description": "Model metadata for Azure OpenAI / Azure AI Foundry-backed provider. The `models` definition can be copied from openrouter_models.json / custom_models.json",
 4 |     "documentation": "https://github.com/BeehiveInnovations/zen-mcp-server/blob/main/docs/azure_models.md",
 5 |     "usage": "Models listed here are exposed through Azure AI Foundry. Aliases are case-insensitive.",
 6 |     "field_notes": "Matches providers/shared/model_capabilities.py.",
 7 |     "field_descriptions": {
 8 |       "model_name": "The model identifier e.g., 'gpt-4'",
 9 |       "deployment": "Azure model deployment name",
10 |       "aliases": "Array of short names users can type instead of the full model name",
11 |       "context_window": "Total number of tokens the model can process (input + output combined)",
12 |       "max_output_tokens": "Maximum number of tokens the model can generate in a single response",
13 |       "supports_extended_thinking": "Whether the model supports extended reasoning tokens (currently none do via OpenRouter or custom APIs)",
14 |       "supports_json_mode": "Whether the model can guarantee valid JSON output",
15 |       "supports_function_calling": "Whether the model supports function/tool calling",
16 |       "supports_images": "Whether the model can process images/visual input",
17 |       "max_image_size_mb": "Maximum total size in MB for all images combined (capped at 40MB max for custom models)",
18 |       "supports_temperature": "Whether the model accepts temperature parameter in API calls (set to false for O3/O4 reasoning models)",
19 |       "temperature_constraint": "Type of temperature constraint: 'fixed' (fixed value), 'range' (continuous range), 'discrete' (specific values), or omit for default range",
20 |       "use_openai_response_api": "Set to true when the deployment must call Azure's /responses endpoint (O-series reasoning models). Leave false/omit for standard chat completions.",
21 |       "default_reasoning_effort": "Default reasoning effort level for models that support it (e.g., 'low', 'medium', 'high'). Omit if not applicable.",
22 |       "description": "Human-readable description of the model",
23 |       "intelligence_score": "1-20 human rating used as the primary signal for auto-mode model ordering"
24 |     }
25 |   },
26 |   "_example_models": [
27 |     {
28 |       "model_name": "gpt-4",
29 |       "deployment": "gpt-4",
30 |       "aliases": [
31 |         "gpt4"
32 |       ],
33 |       "context_window": 128000,
34 |       "max_output_tokens": 16384,
35 |       "supports_extended_thinking": false,
36 |       "supports_json_mode": true,
37 |       "supports_function_calling": false,
38 |       "supports_images": false,
39 |       "max_image_size_mb": 0.0,
40 |       "supports_temperature": false,
41 |       "temperature_constraint": "fixed",
42 |       "use_openai_response_api": false,
43 |       "description": "GPT-4 (128K context, 16K output)",
44 |       "intelligence_score": 10
45 |     }
46 |   ],
47 |   "models": []
48 | }
49 | 
```
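
A quick way to sanity-check entries added under `models` is to load the file and index it by alias. The snippet below is a minimal sketch of that idea and is not how the server consumes the file (the real Azure registry lives in `providers/registries/azure.py`); the lookup built here is purely illustrative.

```python
import json
from pathlib import Path

# Illustrative only: build a case-insensitive alias -> deployment lookup from
# conf/azure_models.json, pairing each entry's "aliases" with its Azure "deployment".
config = json.loads(Path("conf/azure_models.json").read_text(encoding="utf-8"))

alias_to_deployment = {}
for entry in config.get("models", []):
    for name in (entry["model_name"], *entry.get("aliases", [])):
        alias_to_deployment[name.lower()] = entry["deployment"]

print(alias_to_deployment)  # empty until entries are added under "models"
```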

--------------------------------------------------------------------------------
/docs/troubleshooting.md:
--------------------------------------------------------------------------------

```markdown
  1 | # Troubleshooting Guide
  2 | 
  3 | ## Quick Debugging Steps
  4 | 
  5 | If you're experiencing issues with the Zen MCP Server, follow these steps:
  6 | 
  7 | ### 1. Check MCP Connection
  8 | 
  9 | Open Claude Desktop and type `/mcp` to see if zen is connected:
 10 | - ✅ If zen appears in the list, the connection is working
 11 | - ❌ If not listed or shows an error, continue to step 2
 12 | 
 13 | ### 2. Launch Claude with Debug Mode
 14 | 
 15 | Close Claude Desktop and restart with debug logging:
 16 | 
 17 | ```bash
 18 | # macOS/Linux
 19 | claude --debug
 20 | 
 21 | # Windows (in WSL2)
 22 | claude.exe --debug
 23 | ```
 24 | 
 25 | Look for error messages in the console output, especially:
 26 | - API key errors
 27 | - Python/environment issues
 28 | - File permission errors
 29 | 
 30 | ### 3. Verify API Keys
 31 | 
 32 | Check that your API keys are properly set:
 33 | 
 34 | ```bash
 35 | # Check your .env file
 36 | cat .env
 37 | 
 38 | # Ensure at least one key is set:
 39 | # GEMINI_API_KEY=your-key-here
 40 | # OPENAI_API_KEY=your-key-here
 41 | ```
 42 | 
 43 | If you need to update your API keys, edit the `.env` file and then restart Claude for changes to take effect.
 44 | 
 45 | ### 4. Check Server Logs
 46 | 
 47 | View the server logs for detailed error information:
 48 | 
 49 | ```bash
 50 | # View recent logs
 51 | tail -n 100 logs/mcp_server.log
 52 | 
 53 | # Follow logs in real-time
 54 | tail -f logs/mcp_server.log
 55 | 
 56 | # Or use the -f flag when starting to automatically follow logs
 57 | ./run-server.sh -f
 58 | 
 59 | # Search for errors
 60 | grep "ERROR" logs/mcp_server.log
 61 | ```
 62 | 
 63 | See [Logging Documentation](logging.md) for more details on accessing logs.
 64 | 
 65 | ### 5. Common Issues
 66 | 
 67 | **"Connection failed" in Claude Desktop**
 68 | - Ensure the server path is correct in your Claude config
 69 | - Run `./run-server.sh` to verify setup and see configuration
 70 | - Check that Python is installed: `python3 --version`
 71 | 
 72 | **"API key environment variable is required"**
 73 | - Add your API key to the `.env` file
 74 | - Restart Claude Desktop after updating `.env`
 75 | 
 76 | **File path errors**
 77 | - Always use absolute paths: `/Users/you/project/file.py`
 78 | - Never use relative paths: `./file.py`
 79 | 
 80 | **Python module not found**
 81 | - Run `./run-server.sh` to reinstall dependencies
 82 | - Check virtual environment is activated: should see `.zen_venv` in the Python path
 83 | 
 84 | ### 6. Environment Issues
 85 | 
 86 | **Virtual Environment Problems**
 87 | ```bash
 88 | # Reset environment completely
 89 | rm -rf .zen_venv
 90 | ./run-server.sh
 91 | ```
 92 | 
 93 | **Permission Issues**
 94 | ```bash
 95 | # Ensure script is executable
 96 | chmod +x run-server.sh
 97 | ```
 98 | 
 99 | ### 7. Still Having Issues?
100 | 
101 | If the problem persists after trying these steps:
102 | 
103 | 1. **Reproduce the issue** - Note the exact steps that cause the problem
104 | 2. **Collect logs** - Save relevant error messages from Claude debug mode and server logs
105 | 3. **Open a GitHub issue** with:
106 |    - Your operating system
107 |    - Python version: `python3 --version`
108 |    - Error messages from logs
109 |    - Steps to reproduce
110 |    - What you've already tried
111 | 
112 | ## Windows Users
113 | 
114 | **Important**: Windows users must use WSL2. Install it with:
115 | 
116 | ```powershell
117 | wsl --install -d Ubuntu
118 | ```
119 | 
120 | Then follow the standard setup inside WSL2.
```

--------------------------------------------------------------------------------
/docs/model_ranking.md:
--------------------------------------------------------------------------------

```markdown
 1 | # Model Capability Ranking
 2 | 
 3 | Auto mode needs a short, trustworthy list of models to suggest. The server
 4 | computes a capability rank for every model at runtime using a simple recipe:
 5 | 
 6 | 1. Start with the human-supplied `intelligence_score` (1–20). This is the
 7 |    anchor—multiply it by five to map onto the 0–100 scale the server uses.
 8 | 2. Add a few light bonuses for hard capabilities:
 9 |    - **Context window:** up to +5 (log-scale bonus when the model exceeds ~1K tokens).
10 |    - **Output budget:** +2 for ≥65K tokens, +1 for ≥32K.
11 |    - **Extended thinking:** +3 when the provider supports it.
12 |    - **Function calling / JSON / images:** +1 each when available.
13 |    - **Custom endpoints:** −1 to nudge cloud-hosted defaults ahead unless tuned.
14 | 3. Clamp the final score to 0–100 so downstream callers can rely on the range.
15 | 
16 | In code this looks like:
17 | 
18 | ```python
19 | base = clamp(intelligence_score, 1, 20) * 5
20 | ctx_bonus = min(5, max(0, log10(context_window) - 3))
 21 | output_bonus = 2 if max_output_tokens >= 65_000 else 1 if max_output_tokens >= 32_000 else 0
22 | feature_bonus = (
23 |     (3 if supports_extended_thinking else 0)
24 |     + (1 if supports_function_calling else 0)
25 |     + (1 if supports_json_mode else 0)
26 |     + (1 if supports_images else 0)
27 | )
28 | penalty = 1 if provider == CUSTOM else 0
29 | 
30 | effective_rank = clamp(base + ctx_bonus + output_bonus + feature_bonus - penalty, 0, 100)
31 | ```
32 | 
33 | The bonuses are intentionally small—the human intelligence score does most
34 | of the work so you can enforce organisational preferences easily.
35 | 
36 | ## Picking an intelligence score
37 | 
38 | A straightforward rubric that mirrors typical provider tiers:
39 | 
40 | | Intelligence | Guidance |
41 | |--------------|----------|
42 | | 18–19 | Frontier reasoning models (Gemini 2.5 Pro, GPT‑5) |
43 | | 15–17 | Strong general models with large context (O3 Pro, DeepSeek R1) |
44 | | 12–14 | Balanced assistants (Claude Opus/Sonnet, Mistral Large) |
45 | | 9–11  | Fast distillations (Gemini Flash, GPT-5 Mini, Mistral medium) |
46 | | 6–8   | Local or efficiency-focused models (Llama 3 70B, Claude Haiku) |
47 | | ≤5    | Experimental/lightweight models |
48 | 
49 | Record the reasoning for your scores so future updates stay consistent.
50 | 
51 | ## How the rank is used
52 | 
53 | The ranked list is cached per provider and consumed by:
54 | - Tool schemas (`model` parameter descriptions) when auto mode is active.
55 | - The `listmodels` tool’s “top models” sections.
56 | - Fallback messaging when a requested model is unavailable.
57 | 
58 | Because the rank is computed after restriction filters, only allowed models
59 | appear in these summaries.
60 | 
61 | ## Customising further
62 | 
63 | If you need a different weighting you can:
64 | - Override `intelligence_score` in your provider or custom model config.
65 | - Subclass the provider and override `get_effective_capability_rank()`.
66 | - Post-process the rank via `get_capabilities_by_rank()` before surfacing it.
67 | 
68 | Most teams find that adjusting `intelligence_score` alone is enough to keep
69 | auto mode honest without revisiting code.
70 | 
```
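
As a worked example, here is the same recipe as runnable Python with a hypothetical mid-tier model plugged in. The helper below mirrors the pseudocode above; the input numbers are illustrative, and the server's actual implementation may differ in detail.

```python
from math import log10


def clamp(value, low, high):
    return max(low, min(high, value))


def effective_rank(intelligence_score, context_window, max_output_tokens,
                   extended_thinking, function_calling, json_mode, images,
                   is_custom_endpoint):
    # Human score anchors the rank; capabilities add small bonuses, custom endpoints a small penalty.
    base = clamp(intelligence_score, 1, 20) * 5
    ctx_bonus = min(5, max(0, log10(context_window) - 3))
    output_bonus = 2 if max_output_tokens >= 65_000 else 1 if max_output_tokens >= 32_000 else 0
    feature_bonus = (3 if extended_thinking else 0) + function_calling + json_mode + images
    penalty = 1 if is_custom_endpoint else 0
    return clamp(base + ctx_bonus + output_bonus + feature_bonus - penalty, 0, 100)


# Hypothetical "balanced assistant": score 14, 200K context, 64K output budget,
# no extended thinking, supports function calling / JSON / images, cloud-hosted.
print(round(effective_rank(14, 200_000, 64_000, False, True, True, True, False), 1))  # ~76.3
```

So a score of 14 lands the model at roughly 76 out of 100, and nudging `intelligence_score` up or down moves the rank in steps of five, which is why the human rating dominates the ordering.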

--------------------------------------------------------------------------------
/.github/pull_request_template.md:
--------------------------------------------------------------------------------

```markdown
 1 | ## PR Title Format
 2 | 
 3 | **Please ensure your PR title follows [Conventional Commits](https://www.conventionalcommits.org/) format:**
 4 | 
 5 | ### Version Bumping Types (trigger semantic release):
 6 | - `feat: <description>` - New features → **MINOR** version bump (1.1.0 → 1.2.0)
 7 | - `fix: <description>` - Bug fixes → **PATCH** version bump (1.1.0 → 1.1.1) 
 8 | - `perf: <description>` - Performance improvements → **PATCH** version bump (1.1.0 → 1.1.1)
 9 | 
10 | ### Breaking Changes (trigger MAJOR version bump):
11 | For breaking changes, use any commit type above with `BREAKING CHANGE:` in the commit body or `!` after the type:
12 | - `feat!: <description>` → **MAJOR** version bump (1.1.0 → 2.0.0)
13 | - `fix!: <description>` → **MAJOR** version bump (1.1.0 → 2.0.0)
14 | 
15 | ### Non-Versioning Types (no release):
16 | - `build: <description>` - Build system changes
17 | - `chore: <description>` - Maintenance tasks
18 | - `ci: <description>` - CI/CD changes
19 | - `docs: <description>` - Documentation only
20 | - `refactor: <description>` - Code refactoring (no functional changes)
21 | - `style: <description>` - Code style/formatting changes
22 | - `test: <description>` - Test additions/changes
23 | 
24 | ### Docker Build Triggering:
25 | 
26 | Docker builds are **independent** of versioning and trigger based on:
27 | 
28 | **Automatic**: When PRs modify relevant files:
29 | - Python files (`*.py`), `requirements*.txt`, `pyproject.toml`
30 | - Docker files (`Dockerfile`, `docker-compose.yml`, `.dockerignore`)
31 | 
32 | **Manual**: Add the `docker-build` label to force builds for any PR.
33 | 
34 | ## Description
35 | 
36 | Please provide a clear and concise description of what this PR does.
37 | 
38 | ## Changes Made
39 | 
40 | - [ ] List the specific changes made
41 | - [ ] Include any breaking changes
42 | - [ ] Note any dependencies added/removed
43 | 
44 | ## Testing
45 | 
46 | **Please review our [Testing Guide](../docs/testing.md) before submitting.**
47 | 
48 | ### Run all linting and tests (required):
49 | ```bash
50 | # Activate virtual environment first
51 | source venv/bin/activate
52 | 
53 | # Run comprehensive code quality checks (recommended)
54 | ./code_quality_checks.sh
55 | 
56 | # If you made tool changes, also run simulator tests
57 | python communication_simulator_test.py
58 | ```
59 | 
60 | - [ ] All linting passes (ruff, black, isort)
61 | - [ ] All unit tests pass
62 | - [ ] **For new features**: Unit tests added in `tests/`
63 | - [ ] **For tool changes**: Simulator tests added in `simulator_tests/`
64 | - [ ] **For bug fixes**: Tests added to prevent regression
65 | - [ ] Simulator tests pass (if applicable)
66 | - [ ] Manual testing completed with realistic scenarios
67 | 
68 | ## Related Issues
69 | 
70 | Fixes #(issue number)
71 | 
72 | ## Checklist
73 | 
74 | - [ ] PR title follows the format guidelines above
75 | - [ ] **Activated venv and ran code quality checks: `source venv/bin/activate && ./code_quality_checks.sh`**
76 | - [ ] Self-review completed
77 | - [ ] **Tests added for ALL changes** (see Testing section above)
78 | - [ ] Documentation updated as needed
79 | - [ ] All unit tests passing
80 | - [ ] Relevant simulator tests passing (if tool changes)
81 | - [ ] Ready for review
82 | 
83 | ## Additional Notes
84 | 
85 | Any additional information that reviewers should know.
```

--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------

```toml
  1 | [project]
  2 | name = "zen-mcp-server"
  3 | version = "9.1.3"
  4 | description = "AI-powered MCP server with multiple model providers"
  5 | requires-python = ">=3.9"
  6 | dependencies = [
  7 |     "mcp>=1.0.0",
  8 |     "google-genai>=1.19.0",
  9 |     "openai>=1.55.2",
 10 |     "pydantic>=2.0.0",
 11 |     "python-dotenv>=1.0.0",
 12 | ]
 13 | 
 14 | [tool.setuptools.packages.find]
 15 | include = ["tools*", "providers*", "systemprompts*", "utils*", "conf*", "clink*"]
 16 | 
 17 | [tool.setuptools]
 18 | py-modules = ["server", "config"]
 19 | 
 20 | [tool.setuptools.package-data]
 21 | "*" = ["conf/*.json"]
 22 | 
 23 | [tool.setuptools.data-files]
 24 | "conf" = [
 25 |     "conf/custom_models.json",
 26 |     "conf/openrouter_models.json",
 27 |     "conf/azure_models.json",
 28 |     "conf/openai_models.json",
 29 |     "conf/gemini_models.json",
 30 |     "conf/xai_models.json",
 31 |     "conf/dial_models.json",
 32 | ]
 33 | 
 34 | [project.scripts]
 35 | zen-mcp-server = "server:run"
 36 | 
 37 | [tool.black]
 38 | line-length = 120
 39 | target-version = ['py39', 'py310', 'py311', 'py312', 'py313']
 40 | include = '\.pyi?$'
 41 | extend-exclude = '''
 42 | /(
 43 |   # directories
 44 |   \.eggs
 45 |   | \.git
 46 |   | \.hg
 47 |   | \.mypy_cache
 48 |   | \.tox
 49 |   | \.venv
 50 |   | \.zen_venv
 51 |   | venv
 52 |   | _build
 53 |   | buck-out
 54 |   | build
 55 |   | dist
 56 | )/
 57 | '''
 58 | 
 59 | [tool.isort]
 60 | profile = "black"
 61 | multi_line_output = 3
 62 | include_trailing_comma = true
 63 | force_grid_wrap = 0
 64 | use_parentheses = true
 65 | ensure_newline_before_comments = true
 66 | line_length = 120
 67 | skip_glob = ["venv/*", ".venv/*", ".zen_venv/*"]
 68 | 
 69 | [tool.ruff]
 70 | target-version = "py39"
 71 | line-length = 120
 72 | 
 73 | [tool.ruff.lint]
 74 | select = [
 75 |     "E",  # pycodestyle errors
 76 |     "W",  # pycodestyle warnings
 77 |     "F",  # pyflakes
 78 |     "I",  # isort
 79 |     "B",  # flake8-bugbear
 80 |     "C4", # flake8-comprehensions
 81 |     "UP", # pyupgrade
 82 | ]
 83 | ignore = [
 84 |     "E501",  # line too long, handled by black
 85 |     "B008",  # do not perform function calls in argument defaults
 86 |     "C901",  # too complex
 87 |     "B904",  # exception handling with raise from
 88 | ]
 89 | 
 90 | [tool.ruff.lint.per-file-ignores]
 91 | "__init__.py" = ["F401"]
 92 | "tests/*" = ["B011"]
 93 | "tests/conftest.py" = ["E402"]  # Module level imports not at top of file - needed for test setup
 94 | 
 95 | [tool.semantic_release]
 96 | version_toml = ["pyproject.toml:project.version"]
 97 | branch = "main"
 98 | version_source = "tag"
 99 | version_pattern = "v(?P<major>\\d+)\\.(?P<minor>\\d+)\\.(?P<patch>\\d+)"
100 | major_on_zero = false
101 | build_command = "python -m pip install --upgrade build && python -m build"
102 | dist_path = "dist/"
103 | upload_to_vcs_release = true
104 | upload_to_repository = false
105 | remove_dist = false
106 | commit_version_number = true
107 | commit_message = "chore(release): {version}\n\nAutomatically generated by python-semantic-release"
108 | tag_format = "v{version}"
109 | 
110 | [tool.semantic_release.branches.main]
111 | match = "main"
112 | prerelease = false
113 | 
114 | [tool.semantic_release.changelog]
115 | exclude_commit_patterns = []
116 | 
117 | [tool.semantic_release.commit_parser_options]
118 | allowed_tags = ["build", "chore", "ci", "docs", "feat", "fix", "perf", "style", "refactor", "test"]
119 | minor_tags = ["feat"]
120 | patch_tags = ["fix", "perf"]
121 | 
122 | [tool.semantic_release.remote.token]
123 | env = "GH_TOKEN"
124 | 
125 | [build-system]
126 | requires = ["setuptools>=45", "wheel", "setuptools_scm[toml]>=6.2"]
127 | build-backend = "setuptools.build_meta"
128 | 
```

--------------------------------------------------------------------------------
/docker/scripts/deploy.sh:
--------------------------------------------------------------------------------

```bash
  1 | #!/bin/bash
  2 | set -euo pipefail
  3 | 
  4 | # Colors for output
  5 | GREEN='\033[0;32m'
  6 | YELLOW='\033[1;33m'
  7 | RED='\033[0;31m'
  8 | NC='\033[0m'
  9 | 
 10 | echo -e "${GREEN}=== Deploying Zen MCP Server ===${NC}"
 11 | 
 12 | # Function to check if required environment variables are set
 13 | check_env_vars() {
 14 |     # At least one of these API keys must be set
 15 |     local required_vars=("GEMINI_API_KEY" "GOOGLE_API_KEY" "OPENAI_API_KEY" "XAI_API_KEY" "DIAL_API_KEY" "OPENROUTER_API_KEY")
 16 |     
 17 |     local has_api_key=false
 18 |     for var in "${required_vars[@]}"; do
 19 |         if [[ -n "${!var:-}" ]]; then
 20 |             has_api_key=true
 21 |             break
 22 |         fi
 23 |     done
 24 | 
 25 |     if [[ "$has_api_key" == false ]]; then
 26 |         echo -e "${RED}Error: At least one API key must be set in your .env file${NC}"
 27 |         printf '  %s\n' "${required_vars[@]}"
 28 |         exit 1
 29 |     fi
 30 | }
 31 | 
 32 | # Load environment variables
 33 | if [[ -f .env ]]; then
 34 |     set -a
 35 |     source .env
 36 |     set +a
 37 |     echo -e "${GREEN}✓ Environment variables loaded from .env${NC}"
 38 | else
 39 |     echo -e "${RED}Error: .env file not found${NC}"
 40 |     echo -e "${YELLOW}Please copy .env.example to .env and configure your API keys${NC}"
 41 |     exit 1
 42 | fi
 43 | 
 44 | # Check required environment variables
 45 | check_env_vars
 46 | 
 47 | # Exponential backoff health check function
 48 | wait_for_health() {
 49 |     local max_attempts=6
 50 |     local attempt=1
 51 |     local delay=2
 52 | 
 53 |     while (( attempt <= max_attempts )); do
 54 |         status=$(docker-compose ps -q zen-mcp | xargs docker inspect -f "{{.State.Health.Status}}" 2>/dev/null || echo "unavailable")
 55 |         if [[ "$status" == "healthy" ]]; then
 56 |             return 0
 57 |         fi
 58 |         echo -e "${YELLOW}Waiting for service to be healthy... (attempt $attempt/${max_attempts}, retrying in ${delay}s)${NC}"
 59 |         sleep $delay
 60 |         delay=$(( delay * 2 ))
 61 |         attempt=$(( attempt + 1 ))
 62 |     done
 63 | 
 64 |     echo -e "${RED}Service failed to become healthy after $max_attempts attempts${NC}"
 65 |     echo -e "${YELLOW}Checking logs:${NC}"
 66 |     docker-compose logs zen-mcp
 67 |     exit 1
 68 | }
 69 | 
 70 | # Create logs directory if it doesn't exist
 71 | mkdir -p logs
 72 | 
 73 | # Stop existing containers
 74 | echo -e "${GREEN}Stopping existing containers...${NC}"
 75 | docker-compose down
 76 | 
 77 | # Start the services
 78 | echo -e "${GREEN}Starting Zen MCP Server...${NC}"
 79 | docker-compose up -d
 80 | 
 81 | # Wait for health check
 82 | echo -e "${GREEN}Waiting for service to be healthy...${NC}"
 83 | # wait_for_health retries with exponential backoff and already prints the container
 84 | # logs and exits non-zero if the service never reports healthy, so no additional
 85 | # failure handling is required here.
 86 | if ! wait_for_health; then
 87 |     echo -e "${RED}Service failed to become healthy${NC}"
 88 |     exit 1
 89 | fi
 90 | 
 91 | echo -e "${GREEN}✓ Zen MCP Server deployed successfully${NC}"
 92 | echo -e "${GREEN}Service Status:${NC}"
 93 | docker-compose ps
 94 | 
 95 | echo -e "${GREEN}=== Deployment Complete ===${NC}"
 96 | echo -e "${YELLOW}Useful commands:${NC}"
 97 | echo -e "  View logs: ${GREEN}docker-compose logs -f zen-mcp${NC}"
 98 | echo -e "  Stop service: ${GREEN}docker-compose down${NC}"
 99 | echo -e "  Restart service: ${GREEN}docker-compose restart zen-mcp${NC}"
100 | 
```
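
When debugging a deployment by hand, the same health probe the script relies on can be run directly; a small sketch reusing the `docker inspect` template from `wait_for_health`:

```bash
# Print the container's current health status ("starting", "healthy", or "unhealthy")
docker-compose ps -q zen-mcp | xargs docker inspect -f '{{.State.Health.Status}}'

# Follow the logs if the status never reaches "healthy"
docker-compose logs -f zen-mcp
```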

--------------------------------------------------------------------------------
/code_quality_checks.sh:
--------------------------------------------------------------------------------

```bash
  1 | #!/bin/bash
  2 | 
  3 | # Zen MCP Server - Code Quality Checks
  4 | # This script runs all required linting and testing checks before committing changes.
  5 | # ALL checks must pass 100% for CI/CD to succeed.
  6 | 
  7 | set -e  # Exit on any error
  8 | 
  9 | echo "🔍 Running Code Quality Checks for Zen MCP Server"
 10 | echo "================================================="
 11 | 
 12 | # Determine Python command
 13 | if [[ -f ".zen_venv/bin/python" ]]; then
 14 |     PYTHON_CMD=".zen_venv/bin/python"
 15 |     PIP_CMD=".zen_venv/bin/pip"
 16 |     echo "✅ Using venv"
 17 | elif [[ -n "$VIRTUAL_ENV" ]]; then
 18 |     PYTHON_CMD="python"
 19 |     PIP_CMD="pip"
 20 |     echo "✅ Using activated virtual environment: $VIRTUAL_ENV"
 21 | else
 22 |     echo "❌ No virtual environment found!"
 23 |     echo "Please run: ./run-server.sh first to set up the environment"
 24 |     exit 1
 25 | fi
 26 | echo ""
 27 | 
 28 | # Check and install dev dependencies if needed
 29 | echo "🔍 Checking development dependencies..."
 30 | DEV_DEPS_NEEDED=false
 31 | 
 32 | # Check each dev dependency
 33 | for tool in ruff black isort pytest; do
 34 |     # Check if tool exists in venv or in PATH
 35 |     if [[ -f ".zen_venv/bin/$tool" ]] || command -v $tool &> /dev/null; then
 36 |         continue
 37 |     else
 38 |         DEV_DEPS_NEEDED=true
 39 |         break
 40 |     fi
 41 | done
 42 | 
 43 | if [ "$DEV_DEPS_NEEDED" = true ]; then
 44 |     echo "📦 Installing development dependencies..."
 45 |     $PIP_CMD install -q -r requirements-dev.txt
 46 |     echo "✅ Development dependencies installed"
 47 | else
 48 |     echo "✅ Development dependencies already installed"
 49 | fi
 50 | 
 51 | # Set tool paths
 52 | if [[ -f ".zen_venv/bin/ruff" ]]; then
 53 |     RUFF=".zen_venv/bin/ruff"
 54 |     BLACK=".zen_venv/bin/black"
 55 |     ISORT=".zen_venv/bin/isort"
 56 |     PYTEST=".zen_venv/bin/pytest"
 57 | else
 58 |     RUFF="ruff"
 59 |     BLACK="black"
 60 |     ISORT="isort"
 61 |     PYTEST="pytest"
 62 | fi
 63 | echo ""
 64 | 
 65 | # Step 1: Linting and Formatting
 66 | echo "📋 Step 1: Running Linting and Formatting Checks"
 67 | echo "--------------------------------------------------"
 68 | 
 69 | echo "🔧 Running ruff linting with auto-fix..."
 70 | $RUFF check --fix --exclude test_simulation_files --exclude .zen_venv
 71 | 
 72 | echo "🎨 Running black code formatting..."
 73 | $BLACK . --exclude="test_simulation_files/" --exclude=".zen_venv/"
 74 | 
 75 | echo "📦 Running import sorting with isort..."
 76 | $ISORT . --skip-glob=".zen_venv/*" --skip-glob="test_simulation_files/*"
 77 | 
 78 | echo "✅ Verifying all linting passes..."
 79 | $RUFF check --exclude test_simulation_files --exclude .zen_venv
 80 | 
 81 | echo "✅ Step 1 Complete: All linting and formatting checks passed!"
 82 | echo ""
 83 | 
 84 | # Step 2: Unit Tests
 85 | echo "🧪 Step 2: Running Complete Unit Test Suite"
 86 | echo "---------------------------------------------"
 87 | 
 88 | echo "🏃 Running unit tests (excluding integration tests)..."
 89 | $PYTHON_CMD -m pytest tests/ -v -x -m "not integration"
 90 | 
 91 | echo "✅ Step 2 Complete: All unit tests passed!"
 92 | echo ""
 93 | 
 94 | # Step 3: Final Summary
 95 | echo "🎉 All Code Quality Checks Passed!"
 96 | echo "=================================="
 97 | echo "✅ Linting (ruff): PASSED"
 98 | echo "✅ Formatting (black): PASSED" 
 99 | echo "✅ Import sorting (isort): PASSED"
100 | echo "✅ Unit tests: PASSED"
101 | echo ""
102 | echo "🚀 Your code is ready for commit and GitHub Actions!"
103 | echo "💡 Remember to add simulator tests if you modified tools"
```

--------------------------------------------------------------------------------
/providers/xai.py:
--------------------------------------------------------------------------------

```python
 1 | """X.AI (GROK) model provider implementation."""
 2 | 
 3 | import logging
 4 | from typing import TYPE_CHECKING, ClassVar, Optional
 5 | 
 6 | if TYPE_CHECKING:
 7 |     from tools.models import ToolModelCategory
 8 | 
 9 | from .openai_compatible import OpenAICompatibleProvider
10 | from .registries.xai import XAIModelRegistry
11 | from .registry_provider_mixin import RegistryBackedProviderMixin
12 | from .shared import ModelCapabilities, ProviderType
13 | 
14 | logger = logging.getLogger(__name__)
15 | 
16 | 
17 | class XAIModelProvider(RegistryBackedProviderMixin, OpenAICompatibleProvider):
18 |     """Integration for X.AI's GROK models exposed over an OpenAI-style API.
19 | 
20 |     Publishes capability metadata for the officially supported deployments and
21 |     maps tool-category preferences to the appropriate GROK model.
22 |     """
23 | 
24 |     FRIENDLY_NAME = "X.AI"
25 | 
26 |     REGISTRY_CLASS = XAIModelRegistry
27 |     MODEL_CAPABILITIES: ClassVar[dict[str, ModelCapabilities]] = {}
28 | 
29 |     def __init__(self, api_key: str, **kwargs):
30 |         """Initialize X.AI provider with API key."""
31 |         # Set X.AI base URL
32 |         kwargs.setdefault("base_url", "https://api.x.ai/v1")
33 |         self._ensure_registry()
34 |         super().__init__(api_key, **kwargs)
35 |         self._invalidate_capability_cache()
36 | 
37 |     def get_provider_type(self) -> ProviderType:
38 |         """Get the provider type."""
39 |         return ProviderType.XAI
40 | 
41 |     def get_preferred_model(self, category: "ToolModelCategory", allowed_models: list[str]) -> Optional[str]:
42 |         """Get XAI's preferred model for a given category from allowed models.
43 | 
44 |         Args:
45 |             category: The tool category requiring a model
46 |             allowed_models: Pre-filtered list of models allowed by restrictions
47 | 
48 |         Returns:
49 |             Preferred model name or None
50 |         """
51 |         from tools.models import ToolModelCategory
52 | 
53 |         if not allowed_models:
54 |             return None
55 | 
56 |         if category == ToolModelCategory.EXTENDED_REASONING:
57 |             # Prefer GROK-4 for advanced reasoning with thinking mode
58 |             if "grok-4" in allowed_models:
59 |                 return "grok-4"
60 |             elif "grok-3" in allowed_models:
61 |                 return "grok-3"
62 |             # Fall back to any available model
63 |             return allowed_models[0]
64 | 
65 |         elif category == ToolModelCategory.FAST_RESPONSE:
66 |             # Prefer GROK-3-Fast for speed, then GROK-4
67 |             if "grok-3-fast" in allowed_models:
68 |                 return "grok-3-fast"
69 |             elif "grok-4" in allowed_models:
70 |                 return "grok-4"
71 |             # Fall back to any available model
72 |             return allowed_models[0]
73 | 
74 |         else:  # BALANCED or default
75 |             # Prefer GROK-4 for balanced use (best overall capabilities)
76 |             if "grok-4" in allowed_models:
77 |                 return "grok-4"
78 |             elif "grok-3" in allowed_models:
79 |                 return "grok-3"
80 |             elif "grok-3-fast" in allowed_models:
81 |                 return "grok-3-fast"
82 |             # Fall back to any available model
83 |             return allowed_models[0]
84 | 
85 | 
86 | # Load registry data at import time
87 | XAIModelProvider._ensure_registry()
88 | 
```
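
The category routing above can be exercised directly once a provider instance exists; a minimal sketch (the API key is a placeholder, and constructing the provider is assumed to work without contacting the API, which depends on `OpenAICompatibleProvider`):

```python
from providers.xai import XAIModelProvider
from tools.models import ToolModelCategory

provider = XAIModelProvider(api_key="placeholder-key")  # hypothetical key, illustration only
allowed = ["grok-3", "grok-3-fast", "grok-4"]

# EXTENDED_REASONING and BALANCED prefer grok-4; FAST_RESPONSE prefers grok-3-fast
print(provider.get_preferred_model(ToolModelCategory.EXTENDED_REASONING, allowed))  # grok-4
print(provider.get_preferred_model(ToolModelCategory.FAST_RESPONSE, allowed))       # grok-3-fast
```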

--------------------------------------------------------------------------------
/utils/image_utils.py:
--------------------------------------------------------------------------------

```python
 1 | """Utility helpers for validating image inputs."""
 2 | 
 3 | import base64
 4 | import binascii
 5 | import os
 6 | from collections.abc import Iterable
 7 | 
 8 | from utils.file_types import IMAGES, get_image_mime_type
 9 | 
10 | DEFAULT_MAX_IMAGE_SIZE_MB = 20.0
11 | 
12 | __all__ = ["DEFAULT_MAX_IMAGE_SIZE_MB", "validate_image"]
13 | 
14 | 
15 | def _valid_mime_types() -> Iterable[str]:
16 |     """Return the MIME types permitted by the IMAGES whitelist."""
17 |     return (get_image_mime_type(ext) for ext in IMAGES)
18 | 
19 | 
20 | def validate_image(image_path: str, max_size_mb: float = None) -> tuple[bytes, str]:
21 |     """Validate a user-supplied image path or data URL.
22 | 
23 |     Args:
24 |         image_path: Either a filesystem path or a data URL.
25 |         max_size_mb: Optional size limit (defaults to ``DEFAULT_MAX_IMAGE_SIZE_MB``).
26 | 
27 |     Returns:
28 |         A tuple ``(image_bytes, mime_type)`` ready for upstream providers.
29 | 
30 |     Raises:
31 |         ValueError: When the image is missing, malformed, or exceeds limits.
32 |     """
33 |     if max_size_mb is None:
34 |         max_size_mb = DEFAULT_MAX_IMAGE_SIZE_MB
35 | 
36 |     if image_path.startswith("data:"):
37 |         return _validate_data_url(image_path, max_size_mb)
38 | 
39 |     return _validate_file_path(image_path, max_size_mb)
40 | 
41 | 
42 | def _validate_data_url(image_data_url: str, max_size_mb: float) -> tuple[bytes, str]:
43 |     """Validate a data URL and return image bytes plus MIME type."""
44 |     try:
45 |         header, data = image_data_url.split(",", 1)
46 |         mime_type = header.split(";")[0].split(":")[1]
47 |     except (ValueError, IndexError) as exc:
48 |         raise ValueError(f"Invalid data URL format: {exc}")
49 | 
50 |     valid_mime_types = list(_valid_mime_types())
51 |     if mime_type not in valid_mime_types:
52 |         raise ValueError(
53 |             "Unsupported image type: {mime}. Supported types: {supported}".format(
54 |                 mime=mime_type, supported=", ".join(valid_mime_types)
55 |             )
56 |         )
57 | 
58 |     try:
59 |         image_bytes = base64.b64decode(data)
60 |     except binascii.Error as exc:
61 |         raise ValueError(f"Invalid base64 data: {exc}")
62 | 
63 |     _validate_size(image_bytes, max_size_mb)
64 |     return image_bytes, mime_type
65 | 
66 | 
67 | def _validate_file_path(file_path: str, max_size_mb: float) -> tuple[bytes, str]:
68 |     """Validate an image loaded from the filesystem."""
69 |     try:
70 |         with open(file_path, "rb") as handle:
71 |             image_bytes = handle.read()
72 |     except FileNotFoundError:
73 |         raise ValueError(f"Image file not found: {file_path}")
74 |     except OSError as exc:
75 |         raise ValueError(f"Failed to read image file: {exc}")
76 | 
77 |     ext = os.path.splitext(file_path)[1].lower()
78 |     if ext not in IMAGES:
79 |         raise ValueError(
80 |             "Unsupported image format: {ext}. Supported formats: {supported}".format(
81 |                 ext=ext, supported=", ".join(sorted(IMAGES))
82 |             )
83 |         )
84 | 
85 |     mime_type = get_image_mime_type(ext)
86 |     _validate_size(image_bytes, max_size_mb)
87 |     return image_bytes, mime_type
88 | 
89 | 
90 | def _validate_size(image_bytes: bytes, max_size_mb: float) -> None:
91 |     """Ensure the image does not exceed the configured size limit."""
92 |     size_mb = len(image_bytes) / (1024 * 1024)
93 |     if size_mb > max_size_mb:
94 |         raise ValueError(f"Image too large: {size_mb:.1f}MB (max: {max_size_mb}MB)")
95 | 
```
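
A short usage sketch of `validate_image` covering both input forms it accepts (the payload bytes and file path below are made up for illustration):

```python
import base64

from utils.image_utils import validate_image

# Data URL: only the MIME type, base64 validity, and size are checked, so
# placeholder bytes are sufficient for illustration.
payload = base64.b64encode(b"placeholder image bytes").decode()
image_bytes, mime_type = validate_image(f"data:image/png;base64,{payload}")
assert mime_type == "image/png"

# Filesystem path: the extension must appear in the IMAGES whitelist and the
# file must exist and stay under the size limit, otherwise ValueError is raised.
try:
    validate_image("/tmp/screenshot.png")  # hypothetical path
except ValueError as exc:
    print(f"Rejected: {exc}")
```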

--------------------------------------------------------------------------------
/tests/sanitize_cassettes.py:
--------------------------------------------------------------------------------

```python
  1 | #!/usr/bin/env python3
  2 | """
  3 | Script to sanitize existing cassettes by applying PII sanitization.
  4 | 
  5 | This script will:
  6 | 1. Load existing cassettes
  7 | 2. Apply PII sanitization to all interactions
  8 | 3. Create backups of originals
  9 | 4. Save sanitized versions
 10 | """
 11 | 
 12 | import json
 13 | import shutil
 14 | import sys
 15 | from datetime import datetime
 16 | from pathlib import Path
 17 | 
 18 | # Add tests directory to path to import our modules
 19 | sys.path.insert(0, str(Path(__file__).parent))
 20 | 
 21 | from pii_sanitizer import PIISanitizer
 22 | 
 23 | 
 24 | def sanitize_cassette(cassette_path: Path, backup: bool = True) -> bool:
 25 |     """Sanitize a single cassette file."""
 26 |     print(f"\n🔍 Processing: {cassette_path}")
 27 | 
 28 |     if not cassette_path.exists():
 29 |         print(f"❌ File not found: {cassette_path}")
 30 |         return False
 31 | 
 32 |     try:
 33 |         # Load cassette
 34 |         with open(cassette_path) as f:
 35 |             cassette_data = json.load(f)
 36 | 
 37 |         # Create backup if requested
 38 |         if backup:
 39 |             backup_path = cassette_path.with_suffix(f'.backup-{datetime.now().strftime("%Y%m%d-%H%M%S")}.json')
 40 |             shutil.copy2(cassette_path, backup_path)
 41 |             print(f"📦 Backup created: {backup_path}")
 42 | 
 43 |         # Initialize sanitizer
 44 |         sanitizer = PIISanitizer()
 45 | 
 46 |         # Sanitize interactions
 47 |         if "interactions" in cassette_data:
 48 |             sanitized_interactions = []
 49 | 
 50 |             for interaction in cassette_data["interactions"]:
 51 |                 sanitized_interaction = {}
 52 | 
 53 |                 # Sanitize request
 54 |                 if "request" in interaction:
 55 |                     sanitized_interaction["request"] = sanitizer.sanitize_request(interaction["request"])
 56 | 
 57 |                 # Sanitize response
 58 |                 if "response" in interaction:
 59 |                     sanitized_interaction["response"] = sanitizer.sanitize_response(interaction["response"])
 60 | 
 61 |                 sanitized_interactions.append(sanitized_interaction)
 62 | 
 63 |             cassette_data["interactions"] = sanitized_interactions
 64 | 
 65 |         # Save sanitized cassette
 66 |         with open(cassette_path, "w") as f:
 67 |             json.dump(cassette_data, f, indent=2, sort_keys=True)
 68 | 
 69 |         print(f"✅ Sanitized: {cassette_path}")
 70 |         return True
 71 | 
 72 |     except Exception as e:
 73 |         print(f"❌ Error processing {cassette_path}: {e}")
 74 |         import traceback
 75 | 
 76 |         traceback.print_exc()
 77 |         return False
 78 | 
 79 | 
 80 | def main():
 81 |     """Sanitize all cassettes in the openai_cassettes directory."""
 82 |     cassettes_dir = Path(__file__).parent / "openai_cassettes"
 83 | 
 84 |     if not cassettes_dir.exists():
 85 |         print(f"❌ Directory not found: {cassettes_dir}")
 86 |         sys.exit(1)
 87 | 
 88 |     # Find all JSON cassettes
 89 |     cassette_files = list(cassettes_dir.glob("*.json"))
 90 | 
 91 |     if not cassette_files:
 92 |         print(f"❌ No cassette files found in {cassettes_dir}")
 93 |         sys.exit(1)
 94 | 
 95 |     print(f"🎬 Found {len(cassette_files)} cassette(s) to sanitize")
 96 | 
 97 |     # Process each cassette
 98 |     success_count = 0
 99 |     for cassette_path in cassette_files:
100 |         if sanitize_cassette(cassette_path):
101 |             success_count += 1
102 | 
103 |     print(f"\n✨ Sanitization complete: {success_count}/{len(cassette_files)} cassettes processed successfully")
104 | 
105 |     if success_count < len(cassette_files):
106 |         sys.exit(1)
107 | 
108 | 
109 | if __name__ == "__main__":
110 |     main()
111 | 
```
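
The script takes no arguments; running it from the repository root processes every `*.json` cassette under `tests/openai_cassettes/`, writing a timestamped backup next to each original before overwriting it in place:

```bash
python tests/sanitize_cassettes.py
```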

--------------------------------------------------------------------------------
/providers/registry_provider_mixin.py:
--------------------------------------------------------------------------------

```python
 1 | """Mixin for providers backed by capability registries.
 2 | 
 3 | This mixin centralises the boilerplate for providers that expose their model
 4 | capabilities via JSON configuration files. Subclasses only need to set
 5 | ``REGISTRY_CLASS`` to an appropriate :class:`CapabilityModelRegistry` and the
 6 | mix-in will take care of:
 7 | 
 8 | * Populating ``MODEL_CAPABILITIES`` exactly once per process (with optional
 9 |   reload support for tests).
10 | * Lazily exposing the registry contents through the standard provider hooks
11 |   (:meth:`get_all_model_capabilities` and :meth:`get_model_registry`).
12 | * Providing defensive logging when a registry cannot be constructed so the
13 |   provider can degrade gracefully instead of raising during import.
14 | 
15 | Using this helper keeps individual provider implementations focused on their
16 | SDK-specific behaviour while ensuring capability loading is consistent across
17 | OpenAI, Gemini, X.AI, and other native backends.
18 | """
19 | 
20 | from __future__ import annotations
21 | 
22 | import logging
23 | from typing import ClassVar
24 | 
25 | from .registries.base import CapabilityModelRegistry
26 | from .shared import ModelCapabilities
27 | 
28 | 
29 | class RegistryBackedProviderMixin:
30 |     """Shared helper for providers that load capabilities from JSON registries."""
31 | 
32 |     REGISTRY_CLASS: ClassVar[type[CapabilityModelRegistry] | None] = None
33 |     _registry: ClassVar[CapabilityModelRegistry | None] = None
34 |     MODEL_CAPABILITIES: ClassVar[dict[str, ModelCapabilities]] = {}
35 | 
36 |     @classmethod
37 |     def _registry_logger(cls) -> logging.Logger:
38 |         """Return the logger used for registry lifecycle messages."""
39 |         return logging.getLogger(cls.__module__)
40 | 
41 |     @classmethod
42 |     def _ensure_registry(cls, *, force_reload: bool = False) -> None:
43 |         """Populate ``MODEL_CAPABILITIES`` from the configured registry.
44 | 
45 |         Args:
46 |             force_reload: When ``True`` the registry is re-created even if it
47 |                 was previously loaded. This is primarily used by tests.
48 |         """
49 | 
50 |         if cls.REGISTRY_CLASS is None:  # pragma: no cover - defensive programming
51 |             raise RuntimeError(f"{cls.__name__} must define REGISTRY_CLASS.")
52 | 
53 |         if cls._registry is not None and not force_reload:
54 |             return
55 | 
56 |         try:
57 |             registry = cls.REGISTRY_CLASS()
58 |         except Exception as exc:  # pragma: no cover - registry failures shouldn't break the provider
59 |             cls._registry_logger().warning("Unable to load %s registry: %s", cls.__name__, exc)
60 |             cls._registry = None
61 |             cls.MODEL_CAPABILITIES = {}
62 |             return
63 | 
64 |         cls._registry = registry
65 |         cls.MODEL_CAPABILITIES = dict(registry.model_map)
66 | 
67 |     @classmethod
68 |     def reload_registry(cls) -> None:
69 |         """Force a registry reload (used in tests)."""
70 | 
71 |         cls._ensure_registry(force_reload=True)
72 | 
73 |     def get_all_model_capabilities(self) -> dict[str, ModelCapabilities]:
74 |         """Return the registry-backed ``MODEL_CAPABILITIES`` map."""
75 | 
76 |         self._ensure_registry()
77 |         return super().get_all_model_capabilities()
78 | 
79 |     def get_model_registry(self) -> dict[str, ModelCapabilities] | None:
80 |         """Return a copy of the underlying registry map when available."""
81 | 
82 |         if self._registry is None:
83 |             return None
84 |         return dict(self._registry.model_map)
85 | 
```
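
Providers such as `providers/xai.py` above follow this pattern; a minimal sketch of a registry-backed provider (the registry class, provider name, and base URL are hypothetical):

```python
from providers.openai_compatible import OpenAICompatibleProvider
from providers.registries.base import CapabilityModelRegistry
from providers.registry_provider_mixin import RegistryBackedProviderMixin


class ExampleModelRegistry(CapabilityModelRegistry):
    """Hypothetical registry; a real one would be wired to a JSON capabilities file."""


class ExampleProvider(RegistryBackedProviderMixin, OpenAICompatibleProvider):
    REGISTRY_CLASS = ExampleModelRegistry

    def __init__(self, api_key: str, **kwargs):
        kwargs.setdefault("base_url", "https://example.invalid/v1")  # hypothetical endpoint
        self._ensure_registry()  # populate MODEL_CAPABILITIES once per process
        super().__init__(api_key, **kwargs)


# Mirror the import-time load used by the real providers; a failed registry load
# degrades to an empty capability map with a warning rather than raising.
ExampleProvider._ensure_registry()
```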

--------------------------------------------------------------------------------
/docker/scripts/healthcheck.py:
--------------------------------------------------------------------------------

```python
  1 | #!/usr/bin/env python3
  2 | """
  3 | Health check script for Zen MCP Server Docker container
  4 | """
  5 | 
  6 | import os
  7 | import subprocess
  8 | import sys
  9 | from pathlib import Path
 10 | 
 11 | try:
 12 |     from utils.env import get_env
 13 | except ImportError:  # pragma: no cover - resolves module path inside container
 14 |     project_root = Path(__file__).resolve().parents[2]
 15 |     if str(project_root) not in sys.path:
 16 |         sys.path.insert(0, str(project_root))
 17 |     from utils.env import get_env  # type: ignore[import-error]
 18 | 
 19 | 
 20 | def check_process():
 21 |     """Check if the main server process is running"""
 22 |     result = subprocess.run(["pgrep", "-f", "server.py"], capture_output=True, text=True, timeout=10)
 23 |     if result.returncode == 0:
 24 |         return True
 25 |     print(f"Process check failed: {result.stderr}", file=sys.stderr)
 26 |     return False
 27 | 
 28 | 
 29 | def check_python_imports():
 30 |     """Check if critical Python modules can be imported"""
 31 |     critical_modules = ["mcp", "google.genai", "openai", "pydantic", "dotenv"]
 32 | 
 33 |     for module in critical_modules:
 34 |         try:
 35 |             __import__(module)
 36 |         except ImportError as e:
 37 |             print(f"Critical module {module} cannot be imported: {e}", file=sys.stderr)
 38 |             return False
 39 |         except Exception as e:
 40 |             print(f"Error importing {module}: {e}", file=sys.stderr)
 41 |             return False
 42 |     return True
 43 | 
 44 | 
 45 | def check_log_directory():
 46 |     """Check if logs directory is writable"""
 47 |     log_dir = "/app/logs"
 48 |     try:
 49 |         if not os.path.exists(log_dir):
 50 |             print(f"Log directory {log_dir} does not exist", file=sys.stderr)
 51 |             return False
 52 | 
 53 |         test_file = os.path.join(log_dir, ".health_check")
 54 |         with open(test_file, "w") as f:
 55 |             f.write("health_check")
 56 |         os.remove(test_file)
 57 |         return True
 58 |     except Exception as e:
 59 |         print(f"Log directory check failed: {e}", file=sys.stderr)
 60 |         return False
 61 | 
 62 | 
 63 | def check_environment():
 64 |     """Check if essential environment variables are present"""
 65 |     # At least one API key should be present
 66 |     api_keys = [
 67 |         "GEMINI_API_KEY",
 68 |         "GOOGLE_API_KEY",
 69 |         "OPENAI_API_KEY",
 70 |         "XAI_API_KEY",
 71 |         "DIAL_API_KEY",
 72 |         "OPENROUTER_API_KEY",
 73 |     ]
 74 | 
 75 |     has_api_key = any(get_env(key) for key in api_keys)
 76 |     if not has_api_key:
 77 |         print("No API keys found in environment", file=sys.stderr)
 78 |         return False
 79 | 
 80 |     # Validate API key formats (basic checks)
 81 |     for key in api_keys:
 82 |         value = get_env(key)
 83 |         if value:
 84 |             if len(value.strip()) < 10:
 85 |                 print(f"API key {key} appears too short or invalid", file=sys.stderr)
 86 |                 return False
 87 | 
 88 |     return True
 89 | 
 90 | 
 91 | def main():
 92 |     """Main health check function"""
 93 |     checks = [
 94 |         ("Process", check_process),
 95 |         ("Python imports", check_python_imports),
 96 |         ("Log directory", check_log_directory),
 97 |         ("Environment", check_environment),
 98 |     ]
 99 | 
100 |     failed_checks = []
101 | 
102 |     for check_name, check_func in checks:
103 |         if not check_func():
104 |             failed_checks.append(check_name)
105 | 
106 |     if failed_checks:
107 |         print(f"Health check failed: {', '.join(failed_checks)}", file=sys.stderr)
108 |         sys.exit(1)
109 | 
110 |     print("Health check passed")
111 |     sys.exit(0)
112 | 
113 | 
114 | if __name__ == "__main__":
115 |     main()
116 | 
```

--------------------------------------------------------------------------------
/utils/env.py:
--------------------------------------------------------------------------------

```python
  1 | """Centralized environment variable access for Zen MCP Server."""
  2 | 
  3 | from __future__ import annotations
  4 | 
  5 | import os
  6 | from collections.abc import Mapping
  7 | from contextlib import contextmanager
  8 | from pathlib import Path
  9 | 
 10 | try:
 11 |     from dotenv import dotenv_values, load_dotenv
 12 | except ImportError:  # pragma: no cover - optional dependency
 13 |     dotenv_values = None  # type: ignore[assignment]
 14 |     load_dotenv = None  # type: ignore[assignment]
 15 | 
 16 | _PROJECT_ROOT = Path(__file__).resolve().parent.parent
 17 | _ENV_PATH = _PROJECT_ROOT / ".env"
 18 | 
 19 | _DOTENV_VALUES: dict[str, str | None] = {}
 20 | _FORCE_ENV_OVERRIDE = False
 21 | 
 22 | 
 23 | def _read_dotenv_values() -> dict[str, str | None]:
 24 |     if dotenv_values is not None and _ENV_PATH.exists():
 25 |         loaded = dotenv_values(_ENV_PATH)
 26 |         return dict(loaded)
 27 |     return {}
 28 | 
 29 | 
 30 | def _compute_force_override(values: Mapping[str, str | None]) -> bool:
 31 |     raw = (values.get("ZEN_MCP_FORCE_ENV_OVERRIDE") or "false").strip().lower()
 32 |     return raw == "true"
 33 | 
 34 | 
 35 | def reload_env(dotenv_mapping: Mapping[str, str | None] | None = None) -> None:
 36 |     """Reload .env values and recompute override semantics.
 37 | 
 38 |     Args:
 39 |         dotenv_mapping: Optional mapping used instead of reading the .env file.
 40 |             Intended for tests; when provided, load_dotenv is not invoked.
 41 |     """
 42 | 
 43 |     global _DOTENV_VALUES, _FORCE_ENV_OVERRIDE
 44 | 
 45 |     if dotenv_mapping is not None:
 46 |         _DOTENV_VALUES = dict(dotenv_mapping)
 47 |         _FORCE_ENV_OVERRIDE = _compute_force_override(_DOTENV_VALUES)
 48 |         return
 49 | 
 50 |     _DOTENV_VALUES = _read_dotenv_values()
 51 |     _FORCE_ENV_OVERRIDE = _compute_force_override(_DOTENV_VALUES)
 52 | 
 53 |     if load_dotenv is not None and _ENV_PATH.exists():
 54 |         load_dotenv(dotenv_path=_ENV_PATH, override=_FORCE_ENV_OVERRIDE)
 55 | 
 56 | 
 57 | reload_env()
 58 | 
 59 | 
 60 | def env_override_enabled() -> bool:
 61 |     """Return True when ZEN_MCP_FORCE_ENV_OVERRIDE is enabled via the .env file."""
 62 | 
 63 |     return _FORCE_ENV_OVERRIDE
 64 | 
 65 | 
 66 | def get_env(key: str, default: str | None = None) -> str | None:
 67 |     """Retrieve environment variables respecting ZEN_MCP_FORCE_ENV_OVERRIDE."""
 68 | 
 69 |     if env_override_enabled():
 70 |         if key in _DOTENV_VALUES:
 71 |             value = _DOTENV_VALUES[key]
 72 |             return value if value is not None else default
 73 |         return default
 74 | 
 75 |     return os.getenv(key, default)
 76 | 
 77 | 
 78 | def get_env_bool(key: str, default: bool = False) -> bool:
 79 |     """Boolean helper that respects override semantics."""
 80 | 
 81 |     raw_default = "true" if default else "false"
 82 |     raw_value = get_env(key, raw_default)
 83 |     return (raw_value or raw_default).strip().lower() == "true"
 84 | 
 85 | 
 86 | def get_all_env() -> dict[str, str | None]:
 87 |     """Expose the loaded .env mapping for diagnostics/logging."""
 88 | 
 89 |     return dict(_DOTENV_VALUES)
 90 | 
 91 | 
 92 | @contextmanager
 93 | def suppress_env_vars(*names: str):
 94 |     """Temporarily remove environment variables during the context.
 95 | 
 96 |     Args:
 97 |         names: Environment variable names to remove. Empty or falsy names are ignored.
 98 |     """
 99 | 
100 |     removed: dict[str, str] = {}
101 |     try:
102 |         for name in names:
103 |             if not name:
104 |                 continue
105 |             if name in os.environ:
106 |                 removed[name] = os.environ[name]
107 |                 del os.environ[name]
108 |         yield
109 |     finally:
110 |         for name, value in removed.items():
111 |             os.environ[name] = value
112 | 
```
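
A quick usage sketch of the helpers above (the boolean flag name is illustrative):

```python
from utils.env import get_env, get_env_bool, suppress_env_vars

# Reads os.environ unless ZEN_MCP_FORCE_ENV_OVERRIDE=true in .env, in which case
# only the values loaded from .env are consulted.
api_key = get_env("GEMINI_API_KEY")
verbose = get_env_bool("EXAMPLE_VERBOSE_FLAG", default=False)  # hypothetical flag name

# Temporarily hide keys from os.environ, e.g. to exercise "no provider configured" paths
with suppress_env_vars("GEMINI_API_KEY", "OPENAI_API_KEY"):
    pass  # inside the block, os.environ no longer contains these keys
```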

--------------------------------------------------------------------------------
/tests/test_parse_model_option.py:
--------------------------------------------------------------------------------

```python
 1 | """Tests for parse_model_option function."""
 2 | 
 3 | from server import parse_model_option
 4 | 
 5 | 
 6 | class TestParseModelOption:
 7 |     """Test cases for model option parsing."""
 8 | 
 9 |     def test_openrouter_free_suffix_preserved(self):
10 |         """Test that OpenRouter :free suffix is preserved as part of model name."""
11 |         model, option = parse_model_option("openai/gpt-3.5-turbo:free")
12 |         assert model == "openai/gpt-3.5-turbo:free"
13 |         assert option is None
14 | 
15 |     def test_openrouter_beta_suffix_preserved(self):
16 |         """Test that OpenRouter :beta suffix is preserved as part of model name."""
17 |         model, option = parse_model_option("anthropic/claude-opus-4.1:beta")
18 |         assert model == "anthropic/claude-opus-4.1:beta"
19 |         assert option is None
20 | 
21 |     def test_openrouter_preview_suffix_preserved(self):
22 |         """Test that OpenRouter :preview suffix is preserved as part of model name."""
23 |         model, option = parse_model_option("google/gemini-pro:preview")
24 |         assert model == "google/gemini-pro:preview"
25 |         assert option is None
26 | 
27 |     def test_ollama_tag_parsed_as_option(self):
28 |         """Test that Ollama tags are parsed as options."""
29 |         model, option = parse_model_option("llama3.2:latest")
30 |         assert model == "llama3.2"
31 |         assert option == "latest"
32 | 
33 |     def test_consensus_stance_parsed_as_option(self):
34 |         """Test that consensus stances are parsed as options."""
35 |         model, option = parse_model_option("o3:for")
36 |         assert model == "o3"
37 |         assert option == "for"
38 | 
39 |         model, option = parse_model_option("gemini-2.5-pro:against")
40 |         assert model == "gemini-2.5-pro"
41 |         assert option == "against"
42 | 
43 |     def test_openrouter_unknown_suffix_parsed_as_option(self):
44 |         """Test that unknown suffixes on OpenRouter models are parsed as options."""
45 |         model, option = parse_model_option("openai/gpt-4:custom-tag")
46 |         assert model == "openai/gpt-4"
47 |         assert option == "custom-tag"
48 | 
49 |     def test_plain_model_name(self):
50 |         """Test plain model names without colons."""
51 |         model, option = parse_model_option("gpt-4")
52 |         assert model == "gpt-4"
53 |         assert option is None
54 | 
55 |     def test_url_not_parsed(self):
56 |         """Test that URLs are not parsed for options."""
57 |         model, option = parse_model_option("http://localhost:8080")
58 |         assert model == "http://localhost:8080"
59 |         assert option is None
60 | 
61 |     def test_whitespace_handling(self):
62 |         """Test that whitespace is properly stripped."""
63 |         model, option = parse_model_option("  openai/gpt-3.5-turbo:free  ")
64 |         assert model == "openai/gpt-3.5-turbo:free"
65 |         assert option is None
66 | 
67 |         model, option = parse_model_option("  llama3.2 : latest  ")
68 |         assert model == "llama3.2"
69 |         assert option == "latest"
70 | 
71 |     def test_case_insensitive_suffix_matching(self):
72 |         """Test that OpenRouter suffix matching is case-insensitive."""
73 |         model, option = parse_model_option("openai/gpt-3.5-turbo:FREE")
74 |         assert model == "openai/gpt-3.5-turbo:FREE"  # Original case preserved
75 |         assert option is None
76 | 
77 |         model, option = parse_model_option("openai/gpt-3.5-turbo:Free")
78 |         assert model == "openai/gpt-3.5-turbo:Free"  # Original case preserved
79 |         assert option is None
80 | 
```

--------------------------------------------------------------------------------
/docs/azure_openai.md:
--------------------------------------------------------------------------------

```markdown
 1 | # Azure OpenAI Configuration
 2 | 
 3 | Azure OpenAI support lets Zen MCP talk to GPT-4o, GPT-4.1, GPT-5, and o-series deployments that you expose through your Azure resource. This guide describes the configuration expected by the server: a couple of required environment variables plus a JSON manifest that lists every deployment you want to expose.
 4 | 
 5 | ## 1. Required Environment Variables
 6 | 
 7 | Set these entries in your `.env` (or MCP `env` block).
 8 | 
 9 | ```bash
10 | AZURE_OPENAI_API_KEY=your_azure_openai_key_here
11 | AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
12 | # AZURE_OPENAI_API_VERSION=2024-02-15-preview
13 | ```
14 | 
15 | Without the key and endpoint the provider is skipped entirely. Leave the key blank only if the endpoint truly allows anonymous access (rare for Azure).
16 | 
17 | ## 2. Define Deployments in `conf/azure_models.json`
18 | 
19 | Azure models live in `conf/azure_models.json` (or the file pointed to by `AZURE_MODELS_CONFIG_PATH`). Each entry follows the same schema as [`ModelCapabilities`](../providers/shared/model_capabilities.py) with one additional required key: `deployment`. This field must exactly match the deployment name shown in the Azure Portal (for example `prod-gpt4o`). The provider routes requests by that value, so omitting it or using the wrong name will cause the server to skip the model. You can also opt into extra behaviour per model—for example set `use_openai_response_api` to `true` when an Azure deployment requires the `/responses` endpoint (O-series reasoning models), or leave it unset for standard chat completions.
20 | 
21 | ```json
22 | {
23 |   "models": [
24 |     {
25 |       "model_name": "gpt-4o",
26 |       "deployment": "prod-gpt4o",
27 |       "friendly_name": "Azure GPT-4o EU",
28 |       "intelligence_score": 18,
29 |       "context_window": 600000,
30 |       "max_output_tokens": 128000,
31 |       "supports_temperature": false,
32 |       "temperature_constraint": "fixed",
33 |       "aliases": ["gpt4o-eu"],
34 |       "use_openai_response_api": false
35 |     }
36 |   ]
37 | }
38 | ```
39 | 
40 | Tips:
41 | 
42 | - Copy `conf/azure_models.json` into your repo and commit it, or point `AZURE_MODELS_CONFIG_PATH` at a custom path.
43 | - Add one object per deployment. Aliases are optional but help when you want short names like `gpt4o-eu`.
44 | - All capability fields are optional except `model_name`, `deployment`, and `friendly_name`. Anything you omit falls back to conservative defaults.
45 | - Set `use_openai_response_api` to `true` for models that must call Azure's `/responses` endpoint (for example O3 deployments). Leave it unset for standard chat completions.
46 | 
47 | ## 3. Optional Restrictions
48 | 
49 | Use `AZURE_OPENAI_ALLOWED_MODELS` to limit which Azure models Claude can access:
50 | 
51 | ```bash
52 | AZURE_OPENAI_ALLOWED_MODELS=gpt-4o,gpt-4o-mini
53 | ```
54 | 
55 | Aliases are matched case-insensitively.
56 | 
57 | ## 4. Quick Checklist
58 | 
59 | - [ ] `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_ENDPOINT` are set
60 | - [ ] `conf/azure_models.json` (or the file referenced by `AZURE_MODELS_CONFIG_PATH`) lists every deployment with the desired metadata
61 | - [ ] Optional: `AZURE_OPENAI_ALLOWED_MODELS` to restrict usage
62 | - [ ] Restart `./run-server.sh` and run `listmodels` to confirm the Azure entries appear with the expected metadata
63 | 
64 | See also: [`docs/adding_providers.md`](adding_providers.md) for the full provider architecture and [README (Provider Configuration)](../README.md#provider-configuration) for quick-start environment snippets.
65 | 
```

--------------------------------------------------------------------------------
/simulator_tests/test_basic_conversation.py:
--------------------------------------------------------------------------------

```python
 1 | #!/usr/bin/env python3
 2 | """
 3 | Basic Conversation Flow Test
 4 | 
 5 | Tests basic conversation continuity with the chat tool, including:
 6 | - Initial chat with file analysis
 7 | - Continuing conversation with same file (deduplication)
 8 | - Adding additional files to ongoing conversation
 9 | """
10 | 
11 | from .base_test import BaseSimulatorTest
12 | 
13 | 
14 | class BasicConversationTest(BaseSimulatorTest):
15 |     """Test basic conversation flow with chat tool"""
16 | 
17 |     @property
18 |     def test_name(self) -> str:
19 |         return "basic_conversation"
20 | 
21 |     @property
22 |     def test_description(self) -> str:
23 |         return "Basic conversation flow with chat tool"
24 | 
25 |     def run_test(self) -> bool:
26 |         """Test basic conversation flow with chat tool"""
27 |         try:
28 |             self.logger.info("Test: Basic conversation flow")
29 | 
30 |             # Setup test files
31 |             self.setup_test_files()
32 | 
33 |             # Initial chat tool call with file
34 |             self.logger.info("  1.1: Initial chat with file analysis")
35 |             response1, continuation_id = self.call_mcp_tool(
36 |                 "chat",
37 |                 {
38 |                     "prompt": "Please use low thinking mode. Analyze this Python code and explain what it does",
39 |                     "absolute_file_paths": [self.test_files["python"]],
40 |                     "model": "flash",
41 |                 },
42 |             )
43 | 
44 |             if not response1 or not continuation_id:
45 |                 self.logger.error("Failed to get initial response with continuation_id")
46 |                 return False
47 | 
48 |             self.logger.info(f"  ✅ Got continuation_id: {continuation_id}")
49 | 
50 |             # Continue conversation with same file (should be deduplicated)
51 |             self.logger.info("  1.2: Continue conversation with same file")
52 |             response2, _ = self.call_mcp_tool(
53 |                 "chat",
54 |                 {
55 |                     "prompt": "Please use low thinking mode. Now focus on the Calculator class specifically. Are there any improvements you'd suggest?",
56 |                     "absolute_file_paths": [self.test_files["python"]],  # Same file - should be deduplicated
57 |                     "continuation_id": continuation_id,
58 |                     "model": "flash",
59 |                 },
60 |             )
61 | 
62 |             if not response2:
63 |                 self.logger.error("Failed to continue conversation")
64 |                 return False
65 | 
66 |             # Continue with additional file
67 |             self.logger.info("  1.3: Continue conversation with additional file")
68 |             response3, _ = self.call_mcp_tool(
69 |                 "chat",
70 |                 {
71 |                     "prompt": "Please use low thinking mode. Now also analyze this configuration file and see how it might relate to the Python code",
72 |                     "absolute_file_paths": [self.test_files["python"], self.test_files["config"]],
73 |                     "continuation_id": continuation_id,
74 |                     "model": "flash",
75 |                 },
76 |             )
77 | 
78 |             if not response3:
79 |                 self.logger.error("Failed to continue with additional file")
80 |                 return False
81 | 
82 |             self.logger.info("  ✅ Basic conversation flow working")
83 |             return True
84 | 
85 |         except Exception as e:
86 |             self.logger.error(f"Basic conversation flow test failed: {e}")
87 |             return False
88 |         finally:
89 |             self.cleanup_test_files()
90 | 
```

--------------------------------------------------------------------------------
/clink/models.py:
--------------------------------------------------------------------------------

```python
  1 | """Pydantic models for clink configuration and runtime structures."""
  2 | 
  3 | from __future__ import annotations
  4 | 
  5 | from pathlib import Path
  6 | from typing import Any
  7 | 
  8 | from pydantic import BaseModel, Field, PositiveInt, field_validator
  9 | 
 10 | 
 11 | class OutputCaptureConfig(BaseModel):
 12 |     """Optional configuration for CLIs that write output to disk."""
 13 | 
 14 |     flag_template: str = Field(..., description="Template used to inject the output path, e.g. '--output {path}'.")
 15 |     cleanup: bool = Field(
 16 |         default=True,
 17 |         description="Whether the temporary file should be removed after reading.",
 18 |     )
 19 | 
 20 | 
 21 | class CLIRoleConfig(BaseModel):
 22 |     """Role-specific configuration loaded from JSON manifests."""
 23 | 
 24 |     prompt_path: str | None = Field(
 25 |         default=None,
 26 |         description="Path to the prompt file that seeds this role.",
 27 |     )
 28 |     role_args: list[str] = Field(default_factory=list)
 29 |     description: str | None = Field(default=None)
 30 | 
 31 |     @field_validator("role_args", mode="before")
 32 |     @classmethod
 33 |     def _ensure_list(cls, value: Any) -> list[str]:
 34 |         if value is None:
 35 |             return []
 36 |         if isinstance(value, list):
 37 |             return [str(item) for item in value]
 38 |         if isinstance(value, str):
 39 |             return [value]
 40 |         raise TypeError("role_args must be a list of strings or a single string")
 41 | 
 42 | 
 43 | class CLIClientConfig(BaseModel):
 44 |     """Raw CLI client configuration before internal defaults are applied."""
 45 | 
 46 |     name: str
 47 |     command: str | None = None
 48 |     working_dir: str | None = None
 49 |     additional_args: list[str] = Field(default_factory=list)
 50 |     env: dict[str, str] = Field(default_factory=dict)
 51 |     timeout_seconds: PositiveInt | None = Field(default=None)
 52 |     roles: dict[str, CLIRoleConfig] = Field(default_factory=dict)
 53 |     output_to_file: OutputCaptureConfig | None = None
 54 | 
 55 |     @field_validator("additional_args", mode="before")
 56 |     @classmethod
 57 |     def _ensure_args_list(cls, value: Any) -> list[str]:
 58 |         if value is None:
 59 |             return []
 60 |         if isinstance(value, list):
 61 |             return [str(item) for item in value]
 62 |         if isinstance(value, str):
 63 |             return [value]
 64 |         raise TypeError("additional_args must be a list of strings or a single string")
 65 | 
 66 | 
 67 | class ResolvedCLIRole(BaseModel):
 68 |     """Runtime representation of a CLI role with resolved prompt path."""
 69 | 
 70 |     name: str
 71 |     prompt_path: Path
 72 |     role_args: list[str] = Field(default_factory=list)
 73 |     description: str | None = None
 74 | 
 75 | 
 76 | class ResolvedCLIClient(BaseModel):
 77 |     """Runtime configuration after merging defaults and validating paths."""
 78 | 
 79 |     name: str
 80 |     executable: list[str]
 81 |     working_dir: Path | None
 82 |     internal_args: list[str] = Field(default_factory=list)
 83 |     config_args: list[str] = Field(default_factory=list)
 84 |     env: dict[str, str] = Field(default_factory=dict)
 85 |     timeout_seconds: int
 86 |     parser: str
 87 |     runner: str | None = None
 88 |     roles: dict[str, ResolvedCLIRole]
 89 |     output_to_file: OutputCaptureConfig | None = None
 90 | 
 91 |     def list_roles(self) -> list[str]:
 92 |         return list(self.roles.keys())
 93 | 
 94 |     def get_role(self, role_name: str | None) -> ResolvedCLIRole:
 95 |         key = role_name or "default"
 96 |         if key not in self.roles:
 97 |             available = ", ".join(sorted(self.roles.keys()))
 98 |             raise KeyError(f"Role '{role_name}' not configured for CLI '{self.name}'. Available roles: {available}")
 99 |         return self.roles[key]
100 | 
```
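
A minimal sketch of how the raw configuration model validates a manifest entry (the values are hypothetical rather than copied from `conf/cli_clients/`):

```python
from clink.models import CLIClientConfig

raw = {
    "name": "example-cli",
    "command": "example",
    "additional_args": "--json",  # a bare string is coerced to ["--json"]
    "timeout_seconds": 60,
    "roles": {"default": {"prompt_path": "systemprompts/clink/default.txt"}},
}

config = CLIClientConfig.model_validate(raw)
assert config.additional_args == ["--json"]
assert config.roles["default"].prompt_path == "systemprompts/clink/default.txt"
```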

--------------------------------------------------------------------------------
/tests/test_debug.py:
--------------------------------------------------------------------------------

```python
 1 | """
 2 | Tests for the debug tool using new WorkflowTool architecture.
 3 | """
 4 | 
 5 | from tools.debug import DebugInvestigationRequest, DebugIssueTool
 6 | from tools.models import ToolModelCategory
 7 | 
 8 | 
 9 | class TestDebugTool:
10 |     """Test suite for DebugIssueTool using new WorkflowTool architecture."""
11 | 
12 |     def test_tool_metadata(self):
13 |         """Test basic tool metadata and configuration."""
14 |         tool = DebugIssueTool()
15 | 
16 |         assert tool.get_name() == "debug"
17 |         assert "debugging and root cause analysis" in tool.get_description()
18 |         assert tool.get_default_temperature() == 0.2  # TEMPERATURE_ANALYTICAL
19 |         assert tool.get_model_category() == ToolModelCategory.EXTENDED_REASONING
20 |         assert tool.requires_model() is True
21 | 
22 |     def test_request_validation(self):
23 |         """Test Pydantic request model validation."""
24 |         # Valid investigation step request
25 |         step_request = DebugInvestigationRequest(
26 |             step="Investigating null pointer exception in UserService",
27 |             step_number=1,
28 |             total_steps=3,
29 |             next_step_required=True,
30 |             findings="Found potential null reference in user authentication flow",
31 |             files_checked=["/src/UserService.java"],
32 |             relevant_files=["/src/UserService.java"],
33 |             relevant_context=["authenticate", "validateUser"],
34 |             confidence="medium",
35 |             hypothesis="Null pointer occurs when user object is not properly validated",
36 |         )
37 | 
38 |         assert step_request.step_number == 1
39 |         assert step_request.confidence == "medium"
40 |         assert len(step_request.relevant_context) == 2
41 | 
42 |     def test_input_schema_generation(self):
43 |         """Test that input schema is generated correctly."""
44 |         tool = DebugIssueTool()
45 |         schema = tool.get_input_schema()
46 | 
47 |         # Verify required investigation fields are present
48 |         assert "step" in schema["properties"]
49 |         assert "step_number" in schema["properties"]
50 |         assert "total_steps" in schema["properties"]
51 |         assert "next_step_required" in schema["properties"]
52 |         assert "findings" in schema["properties"]
53 |         assert "relevant_context" in schema["properties"]
54 | 
55 |         # Verify field types
56 |         assert schema["properties"]["step"]["type"] == "string"
57 |         assert schema["properties"]["step_number"]["type"] == "integer"
58 |         assert schema["properties"]["next_step_required"]["type"] == "boolean"
59 |         assert schema["properties"]["relevant_context"]["type"] == "array"
60 | 
61 |     def test_model_category_for_debugging(self):
62 |         """Test that debug tool correctly identifies as extended reasoning category."""
63 |         tool = DebugIssueTool()
64 |         assert tool.get_model_category() == ToolModelCategory.EXTENDED_REASONING
65 | 
66 |     def test_relevant_context_handling(self):
67 |         """Test that relevant_context is handled correctly."""
68 |         request = DebugInvestigationRequest(
69 |             step="Test investigation",
70 |             step_number=1,
71 |             total_steps=2,
72 |             next_step_required=True,
73 |             findings="Test findings",
74 |             relevant_context=["method1", "method2"],
75 |         )
76 | 
77 |         # Should have relevant_context directly
78 |         assert request.relevant_context == ["method1", "method2"]
79 | 
80 |         # Test step data preparation
81 |         tool = DebugIssueTool()
82 |         step_data = tool.prepare_step_data(request)
83 |         assert step_data["relevant_context"] == ["method1", "method2"]
84 | 
```

--------------------------------------------------------------------------------
/tests/test_clink_claude_agent.py:
--------------------------------------------------------------------------------

```python
  1 | import asyncio
  2 | import json
  3 | import shutil
  4 | from pathlib import Path
  5 | 
  6 | import pytest
  7 | 
  8 | from clink.agents.base import CLIAgentError
  9 | from clink.agents.claude import ClaudeAgent
 10 | from clink.models import ResolvedCLIClient, ResolvedCLIRole
 11 | 
 12 | 
 13 | class DummyProcess:
 14 |     def __init__(self, *, stdout: bytes = b"", stderr: bytes = b"", returncode: int = 0):
 15 |         self._stdout = stdout
 16 |         self._stderr = stderr
 17 |         self.returncode = returncode
 18 |         self.stdin_data: bytes | None = None
 19 | 
 20 |     async def communicate(self, input_data):
 21 |         self.stdin_data = input_data
 22 |         return self._stdout, self._stderr
 23 | 
 24 | 
 25 | @pytest.fixture()
 26 | def claude_agent():
 27 |     prompt_path = Path("systemprompts/clink/default.txt").resolve()
 28 |     role = ResolvedCLIRole(name="default", prompt_path=prompt_path, role_args=[])
 29 |     client = ResolvedCLIClient(
 30 |         name="claude",
 31 |         executable=["claude"],
 32 |         internal_args=["--print", "--output-format", "json"],
 33 |         config_args=["--permission-mode", "acceptEdits"],
 34 |         env={},
 35 |         timeout_seconds=30,
 36 |         parser="claude_json",
 37 |         runner="claude",
 38 |         roles={"default": role},
 39 |         output_to_file=None,
 40 |         working_dir=None,
 41 |     )
 42 |     return ClaudeAgent(client), role
 43 | 
 44 | 
 45 | async def _run_agent_with_process(monkeypatch, agent, role, process, *, system_prompt="System prompt"):
 46 |     async def fake_create_subprocess_exec(*_args, **_kwargs):
 47 |         return process
 48 | 
 49 |     def fake_which(executable_name):
 50 |         return f"/usr/bin/{executable_name}"
 51 | 
 52 |     monkeypatch.setattr(asyncio, "create_subprocess_exec", fake_create_subprocess_exec)
 53 |     monkeypatch.setattr(shutil, "which", fake_which)
 54 | 
 55 |     return await agent.run(
 56 |         role=role,
 57 |         prompt="Respond with 42",
 58 |         system_prompt=system_prompt,
 59 |         files=[],
 60 |         images=[],
 61 |     )
 62 | 
 63 | 
 64 | @pytest.mark.asyncio
 65 | async def test_claude_agent_injects_system_prompt(monkeypatch, claude_agent):
 66 |     agent, role = claude_agent
 67 |     stdout_payload = json.dumps(
 68 |         {
 69 |             "type": "result",
 70 |             "subtype": "success",
 71 |             "is_error": False,
 72 |             "result": "42",
 73 |         }
 74 |     ).encode()
 75 |     process = DummyProcess(stdout=stdout_payload)
 76 | 
 77 |     result = await _run_agent_with_process(monkeypatch, agent, role, process)
 78 | 
 79 |     assert "--append-system-prompt" in result.sanitized_command
 80 |     idx = result.sanitized_command.index("--append-system-prompt")
 81 |     assert result.sanitized_command[idx + 1] == "System prompt"
 82 |     assert process.stdin_data.decode().startswith("Respond with 42")
 83 | 
 84 | 
 85 | @pytest.mark.asyncio
 86 | async def test_claude_agent_recovers_error_payload(monkeypatch, claude_agent):
 87 |     agent, role = claude_agent
 88 |     stdout_payload = json.dumps(
 89 |         {
 90 |             "type": "result",
 91 |             "subtype": "success",
 92 |             "is_error": True,
 93 |             "result": "API Error",
 94 |         }
 95 |     ).encode()
 96 |     process = DummyProcess(stdout=stdout_payload, returncode=2)
 97 | 
 98 |     result = await _run_agent_with_process(monkeypatch, agent, role, process)
 99 | 
100 |     assert result.returncode == 2
101 |     assert result.parsed.content == "API Error"
102 |     assert result.parsed.metadata["is_error"] is True
103 | 
104 | 
105 | @pytest.mark.asyncio
106 | async def test_claude_agent_propagates_unparseable_output(monkeypatch, claude_agent):
107 |     agent, role = claude_agent
108 |     process = DummyProcess(stdout=b"", returncode=1)
109 | 
110 |     with pytest.raises(CLIAgentError):
111 |         await _run_agent_with_process(monkeypatch, agent, role, process)
112 | 
```

--------------------------------------------------------------------------------
/systemprompts/thinkdeep_prompt.py:
--------------------------------------------------------------------------------

```python
 1 | """
 2 | ThinkDeep tool system prompt
 3 | """
 4 | 
 5 | THINKDEEP_PROMPT = """
 6 | ROLE
 7 | You are a senior engineering collaborator working alongside the agent on complex software problems. The agent will send you
 8 | content—analysis, prompts, questions, ideas, or theories—to deepen, validate, or extend with rigor and clarity.
 9 | 
10 | CRITICAL LINE NUMBER INSTRUCTIONS
11 | Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be
12 | included in any code you generate. Always reference specific line numbers in your replies so that exact
13 | positions can be located when needed. Include a very short code excerpt alongside for clarity.
14 | Include context_start_text and context_end_text as backup references. Never include "LINE│" markers in generated code
15 | snippets.
16 | 
17 | IF MORE INFORMATION IS NEEDED
18 | If you need additional context (e.g., related files, system architecture, requirements, code snippets) to provide
19 | thorough analysis, you MUST ONLY respond with this exact JSON (and nothing else). Do NOT ask for the same file you've
20 | been provided unless for some reason its content is missing or incomplete:
21 | {
22 |   "status": "files_required_to_continue",
23 |   "mandatory_instructions": "<your critical instructions for the agent>",
24 |   "files_needed": ["[file name here]", "[or some folder/]"]
25 | }
26 | 
27 | GUIDELINES
28 | 1. Begin with context analysis: identify tech stack, languages, frameworks, and project constraints.
29 | 2. Stay on scope: avoid speculative, over-engineered, or oversized ideas; keep suggestions practical and grounded.
30 | 3. Challenge and enrich: find gaps, question assumptions, and surface hidden complexities or risks.
31 | 4. Provide actionable next steps: offer specific advice, trade-offs, and implementation strategies.
32 | 5. Offer multiple viable strategies ONLY WHEN clearly beneficial within the current environment.
33 | 6. Suggest creative solutions that operate within real-world constraints, and avoid proposing major shifts unless truly warranted.
34 | 7. Use concise, technical language; assume an experienced engineering audience.
35 | 8. Remember: Overengineering is an anti-pattern — avoid suggesting solutions that introduce unnecessary abstraction,
36 |    indirection, or configuration in anticipation of complexity that does not yet exist, is not clearly justified by the
37 |    current scope, and may not arise in the foreseeable future.
38 | 
39 | KEY FOCUS AREAS (apply when relevant)
40 | - Architecture & Design: modularity, boundaries, abstraction layers, dependencies
41 | - Performance & Scalability: algorithmic efficiency, concurrency, caching, bottlenecks
42 | - Security & Safety: validation, authentication/authorization, error handling, vulnerabilities
43 | - Quality & Maintainability: readability, testing, monitoring, refactoring
44 | - Integration & Deployment: ONLY IF APPLICABLE TO THE QUESTION - external systems, compatibility, configuration, operational concerns
45 | 
46 | EVALUATION
47 | Your response will be reviewed by the agent before any decision is made. Your goal is to practically extend the agent's thinking,
48 | surface blind spots, and refine options—not to deliver final answers in isolation.
49 | 
50 | REMINDERS
51 | - Ground all insights in the current project's architecture, limitations, and goals.
52 | - If further context is needed, request it via the clarification JSON—nothing else.
53 | - Prioritize depth over breadth; propose alternatives ONLY if they clearly add value and improve the current approach.
54 | - Be the ideal development partner—rigorous, focused, and fluent in real-world software trade-offs.
55 | """
56 | 
```
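
The clarification contract above is machine-readable, so a caller can branch on it. Below is a minimal sketch, assuming the model follows the prompt and returns only the JSON object when it needs more files; the helper name `parse_clarification_request` is illustrative and not part of the server:

```python
import json
from typing import Optional


def parse_clarification_request(reply_text: str) -> Optional[dict]:
    """Return the clarification payload if the reply is the JSON request above, else None."""
    try:
        payload = json.loads(reply_text.strip())
    except ValueError:
        return None  # ordinary free-form analysis, not a clarification request
    if isinstance(payload, dict) and payload.get("status") == "files_required_to_continue":
        return payload  # carries "mandatory_instructions" and "files_needed"
    return None
```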

--------------------------------------------------------------------------------
/tests/test_server.py:
--------------------------------------------------------------------------------

```python
  1 | """
  2 | Tests for the main server functionality
  3 | """
  4 | 
  5 | import pytest
  6 | 
  7 | from server import handle_call_tool
  8 | 
  9 | 
 10 | class TestServerTools:
 11 |     """Test server tool handling"""
 12 | 
 13 |     @pytest.mark.asyncio
 14 |     async def test_handle_call_tool_unknown(self):
 15 |         """Test calling an unknown tool"""
 16 |         result = await handle_call_tool("unknown_tool", {})
 17 |         assert len(result) == 1
 18 |         assert "Unknown tool: unknown_tool" in result[0].text
 19 | 
 20 |     @pytest.mark.asyncio
 21 |     async def test_handle_chat(self):
 22 |         """Test chat functionality using real integration testing"""
 23 |         import importlib
 24 |         import os
 25 | 
 26 |         # Set test environment
 27 |         os.environ["PYTEST_CURRENT_TEST"] = "test"
 28 | 
 29 |         # Save original environment
 30 |         original_env = {
 31 |             "OPENAI_API_KEY": os.environ.get("OPENAI_API_KEY"),
 32 |             "DEFAULT_MODEL": os.environ.get("DEFAULT_MODEL"),
 33 |         }
 34 | 
 35 |         try:
 36 |             # Set up environment for real provider resolution
 37 |             os.environ["OPENAI_API_KEY"] = "sk-test-key-server-chat-test-not-real"
 38 |             os.environ["DEFAULT_MODEL"] = "o3-mini"
 39 | 
 40 |             # Clear other provider keys to isolate to OpenAI
 41 |             for key in ["GEMINI_API_KEY", "XAI_API_KEY", "OPENROUTER_API_KEY"]:
 42 |                 os.environ.pop(key, None)
 43 | 
 44 |             # Reload config and clear registry
 45 |             import config
 46 | 
 47 |             importlib.reload(config)
 48 |             from providers.registry import ModelProviderRegistry
 49 | 
 50 |             ModelProviderRegistry._instance = None
 51 | 
 52 |             # Test with real provider resolution
 53 |             try:
 54 |                 result = await handle_call_tool("chat", {"prompt": "Hello Gemini", "model": "o3-mini"})
 55 | 
 56 |                 # If we get here, check the response format
 57 |                 assert len(result) == 1
 58 |                 # Parse JSON response
 59 |                 import json
 60 | 
 61 |                 response_data = json.loads(result[0].text)
 62 |                 assert "status" in response_data
 63 | 
 64 |             except Exception as e:
 65 |                 # Expected: API call will fail with fake key
 66 |                 error_msg = str(e)
 67 |                 # Should NOT be a mock-related error
 68 |                 assert "MagicMock" not in error_msg
 69 |                 assert "'<' not supported between instances" not in error_msg
 70 | 
 71 |                 # Should be a real provider error
 72 |                 assert any(
 73 |                     phrase in error_msg
 74 |                     for phrase in ["API", "key", "authentication", "provider", "network", "connection"]
 75 |                 )
 76 | 
 77 |         finally:
 78 |             # Restore environment
 79 |             for key, value in original_env.items():
 80 |                 if value is not None:
 81 |                     os.environ[key] = value
 82 |                 else:
 83 |                     os.environ.pop(key, None)
 84 | 
 85 |             # Reload config and clear registry
 86 |             importlib.reload(config)
 87 |             ModelProviderRegistry._instance = None
 88 | 
 89 |     @pytest.mark.asyncio
 90 |     async def test_handle_version(self):
 91 |         """Test getting version info"""
 92 |         result = await handle_call_tool("version", {})
 93 |         assert len(result) == 1
 94 | 
 95 |         response = result[0].text
 96 |         # Parse the JSON response
 97 |         import json
 98 | 
 99 |         data = json.loads(response)
100 |         assert data["status"] == "success"
101 |         content = data["content"]
102 | 
103 |         # Check for expected content in the markdown output
104 |         assert "# Zen MCP Server Version" in content
105 |         assert "## Server Information" in content
106 |         assert "## Configuration" in content
107 |         assert "Current Version" in content
108 | 
```

--------------------------------------------------------------------------------
/docs/tools/planner.md:
--------------------------------------------------------------------------------

```markdown
 1 | # Planner Tool - Interactive Step-by-Step Planning
 2 | 
 3 | **Break down complex projects into manageable, structured plans through step-by-step thinking**
 4 | 
 5 | The `planner` tool helps you break down complex ideas, problems, or projects into multiple manageable steps. Perfect for system design, migration strategies, 
 6 | architectural planning, and feature development with branching and revision capabilities.
 7 | 
 8 | ## How It Works
 9 | 
10 | The planner tool enables step-by-step thinking with incremental plan building:
11 | 
12 | 1. **Start with step 1**: Describe the task or problem to plan
13 | 2. **Continue building**: Add subsequent steps, building the plan piece by piece  
14 | 3. **Revise when needed**: Update earlier decisions as new insights emerge
15 | 4. **Branch alternatives**: Explore different approaches when multiple options exist
16 | 5. **Continue across sessions**: Resume planning later with full context
17 | 
18 | ## Example Prompts
19 | 
20 | #### Pro Tip
21 | Claude supports `sub-tasks` where it will spawn and run separate background tasks. You can ask Claude to 
22 | run Zen's planner with two separate ideas. Then, when both are done, use Zen's `consensus` tool to pass both 
23 | plans and get an expert perspective from two powerful AI models on which one to work on first. It's like performing 
24 | **A/B** testing in one go, without the wait!
25 | 
26 | ```
27 | Create two separate sub-tasks: in one, using planner tool show me how to add natural language support 
28 | to my cooking app. In the other sub-task, use planner to plan how to add support for voice notes to my cooking app. 
29 | Once done, start a consensus by sharing both plans to o3 and flash to give me the final verdict. Which one do 
30 | I implement first?
31 | ```
32 | 
33 | ```
34 | Use zen's planner and show me how to add real-time notifications to our mobile app
35 | ```
36 | 
37 | ```
38 | Using the planner tool, show me how to add CoreData sync to my app, include any sub-steps
39 | ```
40 | 
41 | ## Key Features
42 | 
43 | - **Step-by-step breakdown**: Build plans incrementally with full context awareness
44 | - **Branching support**: Explore alternative approaches when needed  
45 | - **Revision capabilities**: Update earlier decisions as new insights emerge
46 | - **Multi-session continuation**: Resume planning across multiple sessions with context
47 | - **Dynamic adjustment**: Modify step count and approach as planning progresses
48 | - **Visual presentation**: ASCII charts, diagrams, and structured formatting
49 | - **Professional output**: Clean, structured plans without emojis or time estimates
50 | 
51 | ## More Examples
52 | 
53 | ```
54 | Using planner, plan the architecture for a new real-time chat system with 100k concurrent users
55 | ```
56 | 
57 | ```
58 | Create a plan using zen for migrating our React app from JavaScript to TypeScript
59 | ```
60 | 
61 | ```
62 | Develop a plan using zen for implementing CI/CD pipelines across our development teams
63 | ```
64 | 
65 | ## Best Practices
66 | 
67 | - **Start broad, then narrow**: Begin with high-level strategy, then add implementation details
68 | - **Include constraints**: Consider technical, organizational, and resource limitations
69 | - **Plan for validation**: Include testing and verification steps
70 | - **Think about dependencies**: Identify what needs to happen before each step
71 | - **Consider alternatives**: Note when multiple approaches are viable
72 | - **Enable continuation**: Use continuation_id for multi-session planning
73 | 
74 | ## Continue With a New Plan
75 | 
76 | As with all other tools in Zen, you can `continue` with a new plan using the output from a previous plan by simply saying:
77 | 
78 | ```
79 | Continue with zen's consensus tool and find out what o3:for and flash:against think of the plan 
80 | ```
81 | 
82 | You can mix and match, taking one output and feeding it into another, and continue from where you left off using a different 
83 | tool / model combination.
```
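
For programmatic use, the planner can be driven through the same `handle_call_tool` entry point exercised in `tests/test_server.py`. Below is a minimal sketch of a first planning step; the tool name `planner` and `continuation_id` come from the doc above, while the step-related field names are assumptions for illustration only:

```python
import asyncio

from server import handle_call_tool


async def start_plan() -> str:
    result = await handle_call_tool(
        "planner",
        {
            # Field names below are illustrative assumptions, not a documented schema.
            "step": "Migrate our React app from JavaScript to TypeScript",
            "step_number": 1,
            "total_steps": 5,            # can be revised as planning progresses
            "next_step_required": True,
        },
    )
    # Later steps (or a follow-up consensus call) would reuse the continuation_id
    # returned in the response to resume with full context.
    return result[0].text


print(asyncio.run(start_plan()))
```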

--------------------------------------------------------------------------------
/tests/test_issue_245_simple.py:
--------------------------------------------------------------------------------

```python
 1 | """
 2 | Simple test to verify GitHub issue #245 is fixed.
 3 | 
 4 | Issue: Custom OpenAI models (gpt-5, o3) use temperature despite the config having supports_temperature: false
 5 | """
 6 | 
 7 | from unittest.mock import Mock, patch
 8 | 
 9 | from providers.openai import OpenAIModelProvider
10 | 
11 | 
12 | def test_issue_245_custom_openai_temperature_ignored():
13 |     """Test that reproduces and validates the fix for issue #245."""
14 | 
15 |     with patch("utils.model_restrictions.get_restriction_service") as mock_restriction:
16 |         with patch("providers.openai_compatible.OpenAI") as mock_openai:
17 |             with patch("providers.registries.openrouter.OpenRouterModelRegistry") as mock_registry_class:
18 | 
19 |                 # Mock restriction service
20 |                 mock_service = Mock()
21 |                 mock_service.is_allowed.return_value = True
22 |                 mock_restriction.return_value = mock_service
23 | 
24 |                 # Mock OpenAI client
25 |                 mock_client = Mock()
26 |                 mock_openai.return_value = mock_client
27 |                 mock_response = Mock()
28 |                 mock_response.choices = [Mock()]
29 |                 mock_response.choices[0].message.content = "Test response"
30 |                 mock_response.choices[0].finish_reason = "stop"
31 |                 mock_response.model = "gpt-5-2025-08-07"
32 |                 mock_response.id = "test"
33 |                 mock_response.created = 123
34 |                 mock_response.usage = Mock()
35 |                 mock_response.usage.prompt_tokens = 10
36 |                 mock_response.usage.completion_tokens = 5
37 |                 mock_response.usage.total_tokens = 15
38 |                 mock_client.chat.completions.create.return_value = mock_response
39 | 
40 |                 # Mock registry with user's custom config (the issue scenario)
41 |                 mock_registry = Mock()
42 |                 mock_registry_class.return_value = mock_registry
43 | 
44 |                 from providers.shared import ModelCapabilities, ProviderType, TemperatureConstraint
45 | 
46 |                 # This is what the user configured in their custom_models.json
47 |                 custom_config = ModelCapabilities(
48 |                     provider=ProviderType.OPENAI,
49 |                     model_name="gpt-5-2025-08-07",
50 |                     friendly_name="Custom GPT-5",
51 |                     context_window=400000,
52 |                     max_output_tokens=128000,
53 |                     supports_extended_thinking=True,
54 |                     supports_json_mode=True,
55 |                     supports_system_prompts=True,
56 |                     supports_streaming=True,
57 |                     supports_function_calling=True,
58 |                     supports_temperature=False,  # User set this to false!
59 |                     temperature_constraint=TemperatureConstraint.create("fixed"),
60 |                     supports_images=True,
61 |                     max_image_size_mb=20.0,
62 |                     description="Custom OpenAI GPT-5",
63 |                 )
64 |                 mock_registry.get_model_config.return_value = custom_config
65 | 
66 |                 # Create provider and test
67 |                 provider = OpenAIModelProvider(api_key="test-key")
68 |                 provider.validate_model_name = lambda name: True
69 | 
70 |                 # This is what was causing the 400 error before the fix
71 |                 provider.generate_content(
72 |                     prompt="Test", model_name="gpt-5-2025-08-07", temperature=0.2  # This should be ignored!
73 |                 )
74 | 
75 |                 # Verify the fix: NO temperature should be sent to the API
76 |                 call_kwargs = mock_client.chat.completions.create.call_args[1]
77 |                 assert "temperature" not in call_kwargs, "Fix failed: temperature still being sent!"
78 | 
```

--------------------------------------------------------------------------------
/.github/workflows/docker-release.yml:
--------------------------------------------------------------------------------

```yaml
  1 | name: Docker Release Build
  2 | 
  3 | on:
  4 |   release:
  5 |     types: [published]
  6 |   workflow_dispatch:
  7 |     inputs:
  8 |       tag:
  9 |         description: 'Tag to build (leave empty for latest release)'
 10 |         required: false
 11 |         type: string
 12 | 
 13 | permissions:
 14 |   contents: read
 15 |   packages: write
 16 | 
 17 | jobs:
 18 |   docker:
 19 |     name: Build and Push Docker Image
 20 |     runs-on: ubuntu-latest
 21 |     
 22 |     steps:
 23 |       - name: Checkout
 24 |         uses: actions/checkout@v4
 25 |         with:
 26 |           # If triggered by workflow_dispatch with a tag, checkout that tag
 27 |           ref: ${{ inputs.tag || github.event.release.tag_name }}
 28 | 
 29 |       - name: Set up Docker Buildx
 30 |         uses: docker/setup-buildx-action@v3
 31 | 
 32 |       - name: Login to GitHub Container Registry
 33 |         uses: docker/login-action@v3
 34 |         with:
 35 |           registry: ghcr.io
 36 |           username: ${{ github.actor }}
 37 |           password: ${{ secrets.GITHUB_TOKEN }}
 38 | 
 39 |       - name: Extract metadata
 40 |         id: meta
 41 |         uses: docker/metadata-action@v5
 42 |         with:
 43 |           images: ghcr.io/${{ github.repository }}
 44 |           tags: |
 45 |             # Tag with the release version
 46 |             type=semver,pattern={{version}},value=${{ inputs.tag || github.event.release.tag_name }}
 47 |             type=semver,pattern={{major}}.{{minor}},value=${{ inputs.tag || github.event.release.tag_name }}
 48 |             type=semver,pattern={{major}},value=${{ inputs.tag || github.event.release.tag_name }}
 49 |             # Also tag as latest for the most recent release
 50 |             type=raw,value=latest,enable={{is_default_branch}}
 51 | 
 52 |       - name: Build and push Docker image
 53 |         uses: docker/build-push-action@v5
 54 |         with:
 55 |           context: .
 56 |           platforms: linux/amd64,linux/arm64
 57 |           push: true
 58 |           tags: ${{ steps.meta.outputs.tags }}
 59 |           labels: ${{ steps.meta.outputs.labels }}
 60 |           cache-from: type=gha
 61 |           cache-to: type=gha,mode=max
 62 | 
 63 |       - name: Update release with Docker info
 64 |         if: github.event_name == 'release'
 65 |         run: |
 66 |           RELEASE_TAG="${{ github.event.release.tag_name }}"
 67 |           DOCKER_TAGS=$(echo "${{ steps.meta.outputs.tags }}" | tr '\n' ' ')
 68 |           
 69 |           # Add Docker information to the release
 70 |           gh release edit "$RELEASE_TAG" --notes-file - << EOF
 71 |           ${{ github.event.release.body }}
 72 |           
 73 |           ---
 74 |           
 75 |           ## 🐳 Docker Images
 76 |           
 77 |           This release is available as Docker images:
 78 |           
 79 |           $(echo "$DOCKER_TAGS" | sed 's/ghcr.io/- `ghcr.io/g' | sed 's/ /`\n/g')
 80 |           
 81 |           **Quick start with Docker:**
 82 |           \`\`\`bash
 83 |           docker pull ghcr.io/${{ github.repository }}:$RELEASE_TAG
 84 |           \`\`\`
 85 |           
 86 |           **Claude Desktop configuration:**
 87 |           \`\`\`json
 88 |           {
 89 |             "mcpServers": {
 90 |               "zen-mcp-server": {
 91 |                 "command": "docker",
 92 |                 "args": [
 93 |                   "run", "--rm", "-i",
 94 |                   "-e", "GEMINI_API_KEY",
 95 |                   "ghcr.io/${{ github.repository }}:$RELEASE_TAG"
 96 |                 ],
 97 |                 "env": {
 98 |                   "GEMINI_API_KEY": "your-api-key-here"
 99 |                 }
100 |               }
101 |             }
102 |           }
103 |           \`\`\`
104 |           EOF
105 |         env:
106 |           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
107 | 
108 |       - name: Create deployment summary
109 |         run: |
110 |           echo "## 🐳 Docker Release Build Complete" >> $GITHUB_STEP_SUMMARY
111 |           echo "" >> $GITHUB_STEP_SUMMARY
112 |           echo "**Release**: ${{ inputs.tag || github.event.release.tag_name }}" >> $GITHUB_STEP_SUMMARY
113 |           echo "**Images built:**" >> $GITHUB_STEP_SUMMARY
114 |           echo "\`\`\`" >> $GITHUB_STEP_SUMMARY
115 |           echo "${{ steps.meta.outputs.tags }}" >> $GITHUB_STEP_SUMMARY
116 |           echo "\`\`\`" >> $GITHUB_STEP_SUMMARY
```

--------------------------------------------------------------------------------
/docs/tools/version.md:
--------------------------------------------------------------------------------

```markdown
  1 | # Version Tool - Server Information
  2 | 
  3 | **Get server version, configuration details, and list of available tools**
  4 | 
  5 | The `version` tool provides information about the Zen MCP Server version, configuration details, and system capabilities. This is useful for debugging, understanding server capabilities, and verifying your installation.
  6 | 
  7 | ## Usage
  8 | 
  9 | ```
 10 | "Get zen to show its version"
 11 | ```
 12 | 
 13 | ## Key Features
 14 | 
 15 | - **Server version information**: Current version and build details
 16 | - **Configuration overview**: Active settings and capabilities
 17 | - **Tool inventory**: Complete list of available tools and their status
 18 | - **System health**: Basic server status and connectivity verification
 19 | - **Debug information**: Helpful details for troubleshooting
 20 | 
 21 | ## Output Information
 22 | 
 23 | The tool provides:
 24 | 
 25 | **Version Details:**
 26 | - Server version number
 27 | - Build timestamp and commit information
 28 | - MCP protocol version compatibility
 29 | - Python runtime version
 30 | 
 31 | **Configuration Summary:**
 32 | - Active providers and their status
 33 | - Default model configuration
 34 | - Feature flags and settings
 35 | - Environment configuration overview
 36 | 
 37 | **Tool Availability:**
 38 | - Complete list of available tools
 39 | - Tool version information
 40 | - Capability status for each tool
 41 | 
 42 | **System Information:**
 43 | - Server uptime and status
 44 | - Memory and resource usage (if available)
 45 | - Conversation memory status
 46 | - Server process information
 47 | 
 48 | ## Example Output
 49 | 
 50 | ```
 51 | 🔧 Zen MCP Server Information
 52 | 
 53 | 📋 Version: 2.15.0
 54 | 🏗️ Build: 2024-01-15T10:30:00Z (commit: abc123f)
 55 | 🔌 MCP Protocol: 1.0.0
 56 | 🐍 Python Runtime: 3.11.7
 57 | 
 58 | ⚙️ Configuration:
 59 | • Default Model: auto
 60 | • Providers: Google ✅, OpenAI ✅, Custom ✅
 61 | • Conversation Memory: Active ✅
 62 | • Web Search: Enabled
 63 | 
 64 | 🛠️ Available Tools (12):
 65 | • chat - General development chat & collaborative thinking
 66 | • thinkdeep - Extended reasoning partner  
 67 | • consensus - Multi-model perspective gathering
 68 | • codereview - Professional code review
 69 | • precommit - Pre-commit validation
 70 | • debug - Expert debugging assistant
 71 | • analyze - Smart file analysis
 72 | • refactor - Intelligent code refactoring
 73 | • tracer - Static code analysis prompt generator
 74 | • testgen - Comprehensive test generation
 75 | • listmodels - List available models
 76 | • version - Server information
 77 | 
 78 | 🔍 System Status:
 79 | • Server Uptime: 2h 35m
 80 | • Memory Storage: Active
 81 | • Server Process: Running
 82 | ```
 83 | 
 84 | ## When to Use Version Tool
 85 | 
 86 | - **Troubleshooting**: When experiencing issues with the server or tools
 87 | - **Configuration verification**: To confirm your setup is correct
 88 | - **Support requests**: To provide system information when asking for help
 89 | - **Update checking**: To verify you're running the latest version
 90 | - **Capability discovery**: To understand what features are available
 91 | 
 92 | ## Debug Information
 93 | 
 94 | The version tool can help diagnose common issues:
 95 | 
 96 | **Connection Problems:**
 97 | - Verify server is running and responsive
 98 | - Check MCP protocol compatibility
 99 | - Confirm tool availability
100 | 
101 | **Configuration Issues:**
102 | - Validate provider setup
103 | - Check API key configuration status
104 | - Verify feature enablement
105 | 
106 | **Performance Troubleshooting:**
107 | - Server uptime and stability
108 | - Resource usage patterns
109 | - Memory storage health
110 | 
111 | ## Tool Parameters
112 | 
113 | This tool requires no parameters - it provides comprehensive server information automatically.
114 | 
115 | ## Best Practices
116 | 
117 | - **Include in bug reports**: Always include version output when reporting issues
118 | - **Check after updates**: Verify version information after server updates
119 | - **Monitor system health**: Use periodically to check server status
120 | - **Validate configuration**: Confirm settings match your expectations
121 | 
122 | ## When to Use Version vs Other Tools
123 | 
124 | - **Use `version`** for: Server diagnostics, configuration verification, troubleshooting
125 | - **Use `listmodels`** for: Model availability and capability information
126 | - **Use other tools** for: Actual development and analysis tasks
127 | - **Use with support**: Essential information for getting help with issues
```
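
The same information can be fetched programmatically through `handle_call_tool`, as the server tests in this repo do. A minimal sketch based on `tests/test_server.py`: the response is a JSON envelope whose `content` field holds the markdown shown above:

```python
import asyncio
import json

from server import handle_call_tool


async def show_version() -> None:
    result = await handle_call_tool("version", {})  # the tool takes no parameters
    data = json.loads(result[0].text)               # envelope: {"status": ..., "content": ...}
    if data["status"] == "success":
        print(data["content"])                      # markdown, starts with "# Zen MCP Server Version"


asyncio.run(show_version())
```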

--------------------------------------------------------------------------------
/docs/tools/listmodels.md:
--------------------------------------------------------------------------------

```markdown
  1 | # ListModels Tool - List Available Models
  2 | 
  3 | **Display all available AI models organized by provider**
  4 | 
  5 | The `listmodels` tool shows which providers are configured, available models, their aliases, context windows, and capabilities. This is useful for understanding what models can be used and their characteristics.
  6 | 
  7 | ## Usage
  8 | 
  9 | ```
 10 | "Use zen to list available models"
 11 | ```
 12 | 
 13 | ## Key Features
 14 | 
 15 | - **Provider organization**: Shows all configured providers and their status
 16 | - **Model capabilities**: Context windows, thinking mode support, and special features
 17 | - **Alias mapping**: Shows shorthand names and their full model mappings
 18 | - **Configuration status**: Indicates which providers are available based on API keys
 19 | - **Context window information**: Helps you choose models based on your content size needs
 20 | - **Capability overview**: Understanding which models support extended thinking, vision, etc.
 21 | 
 22 | ## Output Information
 23 | 
 24 | The tool displays:
 25 | 
 26 | **Provider Status:**
 27 | - Which providers are configured and available
 28 | - API key status (without revealing the actual keys)
 29 | - Provider priority order
 30 | 
 31 | **Model Details:**
 32 | - Full model names and their aliases
 33 | - Context window sizes (tokens)
 34 | - Special capabilities (thinking modes, vision support, etc.)
 35 | - Provider-specific features
 36 | 
 37 | **Capability Summary:**
 38 | - Which models support extended thinking
 39 | - Vision-capable models for image analysis
 40 | - Models with largest context windows
 41 | - Fastest models for quick tasks
 42 | 
 43 | ## Example Output
 44 | 
 45 | ```
 46 | 📋 Available Models by Provider
 47 | 
 48 | 🔹 Google (Gemini) - ✅ Configured
 49 |   • pro (gemini-2.5-pro) - 1M context, thinking modes
 50 |   • flash (gemini-2.0-flash-experimental) - 1M context, ultra-fast
 51 | 
 52 | 🔹 OpenAI - ✅ Configured  
 53 |   • o3 (o3) - 200K context, strong reasoning
 54 |   • o3-mini (o3-mini) - 200K context, balanced
 55 |   • o4-mini (o4-mini) - 200K context, latest reasoning
 56 | 
 57 | 🔹 Custom/Local - ✅ Configured
 58 |   • local-llama (llama3.2) - 128K context, local inference
 59 |   • Available at: http://localhost:11434/v1
 60 | 
 61 | 🔹 OpenRouter - ❌ Not configured
 62 |   Set OPENROUTER_API_KEY to enable access to Claude, GPT-4, and more models
 63 | ```
 64 | 
 65 | ## When to Use ListModels
 66 | 
 67 | - **Model selection**: When you're unsure which models are available
 68 | - **Capability checking**: To verify what features each model supports
 69 | - **Configuration validation**: To confirm your API keys are working
 70 | - **Context planning**: To choose models based on content size requirements
 71 | - **Performance optimization**: To select the right model for speed vs quality trade-offs
 72 | 
 73 | ## Configuration Dependencies
 74 | 
 75 | The available models depend on your configuration:
 76 | 
 77 | **API Keys Required:**
 78 | - `GEMINI_API_KEY` - Enables Gemini Pro and Flash models
 79 | - `OPENAI_API_KEY` - Enables OpenAI O3, O4-mini, and GPT models
 80 | - `OPENROUTER_API_KEY` - Enables access to multiple providers through OpenRouter
 81 | - `CUSTOM_API_URL` - Enables local/custom models (Ollama, vLLM, etc.)
 82 | 
 83 | **Model Restrictions:**
 84 | If you've set model usage restrictions via environment variables, the tool will show:
 85 | - Which models are allowed vs restricted
 86 | - Active restriction policies
 87 | - How to modify restrictions
 88 | 
 89 | ## Tool Parameters
 90 | 
 91 | This tool requires no parameters - it simply queries the server configuration and displays all available information.
 92 | 
 93 | ## Best Practices
 94 | 
 95 | - **Check before planning**: Use this tool to understand your options before starting complex tasks
 96 | - **Verify configuration**: Confirm your API keys are working as expected
 97 | - **Choose appropriate models**: Match model capabilities to your specific needs
 98 | - **Understand limits**: Be aware of context windows when working with large files
 99 | 
100 | ## When to Use ListModels vs Other Tools
101 | 
102 | - **Use `listmodels`** for: Understanding available options and model capabilities
103 | - **Use `chat`** for: General discussions about which model to use for specific tasks
104 | - **Use `version`** for: Server configuration and version information
105 | - **Use other tools** for: Actual analysis, debugging, or development work
```
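
As with the version tool, `listmodels` can be invoked programmatically with no arguments. A minimal sketch; it assumes the reply uses the same `{"status", "content"}` JSON envelope as the version tool, which is not confirmed by the doc above:

```python
import asyncio
import json

from server import handle_call_tool


async def show_models() -> None:
    result = await handle_call_tool("listmodels", {})  # no parameters required
    # Assumption: same {"status", "content"} envelope as the version tool.
    data = json.loads(result[0].text)
    print(data.get("content", result[0].text))


asyncio.run(show_models())
```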

--------------------------------------------------------------------------------
/tests/test_gemini_token_usage.py:
--------------------------------------------------------------------------------

```python
  1 | """Tests for Gemini provider token usage extraction."""
  2 | 
  3 | import unittest
  4 | from unittest.mock import Mock
  5 | 
  6 | from providers.gemini import GeminiModelProvider
  7 | 
  8 | 
  9 | class TestGeminiTokenUsage(unittest.TestCase):
 10 |     """Test Gemini provider token usage handling."""
 11 | 
 12 |     def setUp(self):
 13 |         """Set up test fixtures."""
 14 |         self.provider = GeminiModelProvider("test-key")
 15 | 
 16 |     def test_extract_usage_with_valid_tokens(self):
 17 |         """Test token extraction with valid token counts."""
 18 |         response = Mock()
 19 |         response.usage_metadata = Mock()
 20 |         response.usage_metadata.prompt_token_count = 100
 21 |         response.usage_metadata.candidates_token_count = 50
 22 | 
 23 |         usage = self.provider._extract_usage(response)
 24 | 
 25 |         self.assertEqual(usage["input_tokens"], 100)
 26 |         self.assertEqual(usage["output_tokens"], 50)
 27 |         self.assertEqual(usage["total_tokens"], 150)
 28 | 
 29 |     def test_extract_usage_with_none_input_tokens(self):
 30 |         """Test token extraction when input_tokens is None (regression test for bug)."""
 31 |         response = Mock()
 32 |         response.usage_metadata = Mock()
 33 |         response.usage_metadata.prompt_token_count = None  # This was causing crashes
 34 |         response.usage_metadata.candidates_token_count = 50
 35 | 
 36 |         usage = self.provider._extract_usage(response)
 37 | 
 38 |         # Should not include input_tokens when None
 39 |         self.assertNotIn("input_tokens", usage)
 40 |         self.assertEqual(usage["output_tokens"], 50)
 41 |         # Should not calculate total_tokens when input is None
 42 |         self.assertNotIn("total_tokens", usage)
 43 | 
 44 |     def test_extract_usage_with_none_output_tokens(self):
 45 |         """Test token extraction when output_tokens is None (regression test for bug)."""
 46 |         response = Mock()
 47 |         response.usage_metadata = Mock()
 48 |         response.usage_metadata.prompt_token_count = 100
 49 |         response.usage_metadata.candidates_token_count = None  # This was causing crashes
 50 | 
 51 |         usage = self.provider._extract_usage(response)
 52 | 
 53 |         self.assertEqual(usage["input_tokens"], 100)
 54 |         # Should not include output_tokens when None
 55 |         self.assertNotIn("output_tokens", usage)
 56 |         # Should not calculate total_tokens when output is None
 57 |         self.assertNotIn("total_tokens", usage)
 58 | 
 59 |     def test_extract_usage_with_both_none_tokens(self):
 60 |         """Test token extraction when both token counts are None."""
 61 |         response = Mock()
 62 |         response.usage_metadata = Mock()
 63 |         response.usage_metadata.prompt_token_count = None
 64 |         response.usage_metadata.candidates_token_count = None
 65 | 
 66 |         usage = self.provider._extract_usage(response)
 67 | 
 68 |         # Should return empty dict when all tokens are None
 69 |         self.assertEqual(usage, {})
 70 | 
 71 |     def test_extract_usage_without_usage_metadata(self):
 72 |         """Test token extraction when response has no usage_metadata."""
 73 |         response = Mock(spec=[])
 74 | 
 75 |         usage = self.provider._extract_usage(response)
 76 | 
 77 |         # Should return empty dict
 78 |         self.assertEqual(usage, {})
 79 | 
 80 |     def test_extract_usage_with_zero_tokens(self):
 81 |         """Test token extraction with zero token counts."""
 82 |         response = Mock()
 83 |         response.usage_metadata = Mock()
 84 |         response.usage_metadata.prompt_token_count = 0
 85 |         response.usage_metadata.candidates_token_count = 0
 86 | 
 87 |         usage = self.provider._extract_usage(response)
 88 | 
 89 |         self.assertEqual(usage["input_tokens"], 0)
 90 |         self.assertEqual(usage["output_tokens"], 0)
 91 |         self.assertEqual(usage["total_tokens"], 0)
 92 | 
 93 |     def test_extract_usage_missing_attributes(self):
 94 |         """Test token extraction when metadata lacks token count attributes."""
 95 |         response = Mock()
 96 |         response.usage_metadata = Mock(spec=[])
 97 | 
 98 |         usage = self.provider._extract_usage(response)
 99 | 
100 |         # Should return empty dict when attributes are missing
101 |         self.assertEqual(usage, {})
102 | 
103 | 
104 | if __name__ == "__main__":
105 |     unittest.main()
106 | 
```

--------------------------------------------------------------------------------
/systemprompts/chat_prompt.py:
--------------------------------------------------------------------------------

```python
 1 | """
 2 | Chat tool system prompt
 3 | """
 4 | 
 5 | CHAT_PROMPT = """
 6 | You are a senior engineering thought-partner collaborating with another AI agent. Your mission is to brainstorm, validate ideas,
 7 | and offer well-reasoned second opinions on technical decisions when they are justified and practical.
 8 | 
 9 | CRITICAL LINE NUMBER INSTRUCTIONS
10 | Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be
11 | included in any code you generate. Always reference specific line numbers in your replies when needed to point to
12 | exact locations, and include a very short code excerpt alongside for clarity.
13 | Include context_start_text and context_end_text as backup references. Never include "LINE│" markers in generated code
14 | snippets.
15 | 
16 | IF MORE INFORMATION IS NEEDED
17 | If the agent is discussing specific code, functions, or project components that were not given as part of the context,
18 | and you need additional context (e.g., related files, configuration, dependencies, test files) to provide meaningful
19 | collaboration, you MUST respond ONLY with this JSON format (and nothing else). Do NOT ask for the same file you've been
20 | provided unless for some reason its content is missing or incomplete:
21 | {
22 |   "status": "files_required_to_continue",
23 |   "mandatory_instructions": "<your critical instructions for the agent>",
24 |   "files_needed": ["[file name here]", "[or some folder/]"]
25 | }
26 | 
27 | SCOPE & FOCUS
28 | • Ground every suggestion in the project's current tech stack, languages, frameworks, and constraints.
29 | • Recommend new technologies or patterns ONLY when they provide clearly superior outcomes with minimal added complexity.
30 | • Avoid speculative, over-engineered, or unnecessarily abstract designs that exceed current project goals or needs.
31 | • Keep proposals practical and directly actionable within the existing architecture.
32 | • Overengineering is an anti-pattern — avoid solutions that introduce unnecessary abstraction, indirection, or
33 |   configuration in anticipation of complexity that does not yet exist, is not clearly justified by the current scope,
34 |   and may not arise in the foreseeable future.
35 | 
36 | COLLABORATION APPROACH
37 | 1. Treat the collaborating agent as an equally senior peer. Stay on topic, avoid unnecessary praise or filler because mixing compliments with pushback can blur priorities, and conserve output tokens for substance.
38 | 2. Engage deeply with the agent's input – extend, refine, and explore alternatives ONLY WHEN they are well-justified and materially beneficial.
39 | 3. Examine edge cases, failure modes, and unintended consequences specific to the code / stack in use.
40 | 4. Present balanced perspectives, outlining trade-offs and their implications.
41 | 5. Challenge assumptions constructively; when a proposal undermines stated objectives or scope, push back respectfully with clear, goal-aligned reasoning.
42 | 6. Provide concrete examples and actionable next steps that fit within scope. Prioritize direct, achievable outcomes.
43 | 7. Ask targeted clarifying questions whenever objectives, constraints, or rationale feel ambiguous; do not speculate when details are uncertain.
44 | 
45 | BRAINSTORMING GUIDELINES
46 | • Offer multiple viable strategies ONLY WHEN clearly beneficial within the current environment.
47 | • Suggest creative solutions that operate within real-world constraints, and avoid proposing major shifts unless truly warranted.
48 | • Surface pitfalls early, particularly those tied to the chosen frameworks, languages, design direction or choice.
49 | • Evaluate scalability, maintainability, and operational realities inside the existing architecture and current
50 | framework.
51 | • Reference industry best practices relevant to the technologies in use.
52 | • Communicate concisely and technically, assuming an experienced engineering audience.
53 | 
54 | REMEMBER
55 | Act as a peer, not a lecturer. Avoid overcomplicating. Aim for depth over breadth, stay within project boundaries, and help the team
56 | reach sound, actionable decisions.
57 | """
58 | 
```

--------------------------------------------------------------------------------
/simulator_tests/test_logs_validation.py:
--------------------------------------------------------------------------------

```python
  1 | #!/usr/bin/env python3
  2 | """
  3 | Server Logs Validation Test
  4 | 
  5 | Validates server logs to confirm file deduplication behavior and
  6 | conversation threading is working properly.
  7 | """
  8 | 
  9 | from .base_test import BaseSimulatorTest
 10 | 
 11 | 
 12 | class LogsValidationTest(BaseSimulatorTest):
 13 |     """Validate server logs to confirm file deduplication behavior"""
 14 | 
 15 |     @property
 16 |     def test_name(self) -> str:
 17 |         return "logs_validation"
 18 | 
 19 |     @property
 20 |     def test_description(self) -> str:
 21 |         return "Server logs validation"
 22 | 
 23 |     def run_test(self) -> bool:
 24 |         """Validate server logs to confirm file deduplication behavior"""
 25 |         try:
 26 |             self.logger.info("📋 Test: Validating server logs for file deduplication...")
 27 | 
 28 |             # Get server logs from log files
 29 |             import os
 30 | 
 31 |             logs = ""
 32 |             log_files = ["logs/mcp_server.log", "logs/mcp_activity.log"]
 33 | 
 34 |             for log_file in log_files:
 35 |                 if os.path.exists(log_file):
 36 |                     try:
 37 |                         with open(log_file) as f:
 38 |                             file_content = f.read()
 39 |                             logs += f"\n=== {log_file} ===\n{file_content}\n"
 40 |                             self.logger.debug(f"Read {len(file_content)} characters from {log_file}")
 41 |                     except Exception as e:
 42 |                         self.logger.warning(f"Could not read {log_file}: {e}")
 43 |                 else:
 44 |                     self.logger.warning(f"Log file not found: {log_file}")
 45 | 
 46 |             if not logs.strip():
 47 |                 self.logger.warning("No log content found - server may not have processed any requests yet")
 48 |                 return False
 49 | 
 50 |             # Look for conversation threading patterns that indicate the system is working
 51 |             conversation_patterns = [
 52 |                 "CONVERSATION_RESUME",
 53 |                 "CONVERSATION_CONTEXT",
 54 |                 "previous turns loaded",
 55 |                 "tool embedding",
 56 |                 "files included",
 57 |                 "files truncated",
 58 |                 "already in conversation history",
 59 |             ]
 60 | 
 61 |             conversation_lines = []
 62 |             for line in logs.split("\n"):
 63 |                 for pattern in conversation_patterns:
 64 |                     if pattern.lower() in line.lower():
 65 |                         conversation_lines.append(line.strip())
 66 |                         break
 67 | 
 68 |             # Look for evidence of conversation threading and file handling
 69 |             conversation_threading_found = False
 70 |             multi_turn_conversations = False
 71 | 
 72 |             for line in conversation_lines:
 73 |                 lower_line = line.lower()
 74 |                 if "conversation_resume" in lower_line:
 75 |                     conversation_threading_found = True
 76 |                     self.logger.debug(f"📄 Conversation threading: {line}")
 77 |                 elif "previous turns loaded" in lower_line:
 78 |                     multi_turn_conversations = True
 79 |                     self.logger.debug(f"📄 Multi-turn conversation: {line}")
 80 |                 elif "already in conversation" in lower_line:
 81 |                     self.logger.info(f"✅ Found explicit deduplication: {line}")
 82 |                     return True
 83 | 
 84 |             # Conversation threading with multiple turns is evidence of file deduplication working
 85 |             if conversation_threading_found and multi_turn_conversations:
 86 |                 self.logger.info("✅ Conversation threading with multi-turn context working")
 87 |                 self.logger.info(
 88 |                     "✅ File deduplication working implicitly (files embedded once in conversation history)"
 89 |                 )
 90 |                 return True
 91 |             elif conversation_threading_found:
 92 |                 self.logger.info("✅ Conversation threading detected")
 93 |                 return True
 94 |             else:
 95 |                 self.logger.warning("⚠️  No clear evidence of conversation threading in logs")
 96 |                 self.logger.debug(f"Found {len(conversation_lines)} conversation-related log lines")
 97 |                 return False
 98 | 
 99 |         except Exception as e:
100 |             self.logger.error(f"Log validation failed: {e}")
101 |             return False
102 | 
```

--------------------------------------------------------------------------------
/conf/xai_models.json:
--------------------------------------------------------------------------------

```json
 1 | {
 2 |   "_README": {
 3 |     "description": "Model metadata for X.AI (GROK) API access.",
 4 |     "documentation": "https://github.com/BeehiveInnovations/zen-mcp-server/blob/main/docs/custom_models.md",
 5 |     "usage": "Models listed here are exposed directly through the X.AI provider. Aliases are case-insensitive.",
 6 |     "field_notes": "Matches providers/shared/model_capabilities.py.",
 7 |     "field_descriptions": {
 8 |       "model_name": "The model identifier (e.g., 'grok-4', 'grok-3-fast')",
 9 |       "aliases": "Array of short names users can type instead of the full model name",
10 |       "context_window": "Total number of tokens the model can process (input + output combined)",
11 |       "max_output_tokens": "Maximum number of tokens the model can generate in a single response",
12 |       "max_thinking_tokens": "Maximum reasoning/thinking tokens the model will allocate when extended thinking is requested",
13 |       "supports_extended_thinking": "Whether the model supports extended reasoning tokens (currently none do via OpenRouter or custom APIs)",
14 |       "supports_json_mode": "Whether the model can guarantee valid JSON output",
15 |       "supports_function_calling": "Whether the model supports function/tool calling",
16 |       "supports_images": "Whether the model can process images/visual input",
17 |       "max_image_size_mb": "Maximum total size in MB for all images combined (capped at 40MB max for custom models)",
18 |       "supports_temperature": "Whether the model accepts temperature parameter in API calls (set to false for O3/O4 reasoning models)",
19 |       "temperature_constraint": "Type of temperature constraint: 'fixed' (fixed value), 'range' (continuous range), 'discrete' (specific values), or omit for default range",
20 |       "use_openai_response_api": "Set to true when the model must use the /responses endpoint (reasoning models like GPT-5 Pro). Leave false/omit for standard chat completions.",
21 |       "default_reasoning_effort": "Default reasoning effort level for models that support it (e.g., 'low', 'medium', 'high'). Omit if not applicable.",
22 |       "description": "Human-readable description of the model",
23 |       "intelligence_score": "1-20 human rating used as the primary signal for auto-mode model ordering"
24 |     }
25 |   },
26 |   "models": [
27 |     {
28 |       "model_name": "grok-4",
29 |       "friendly_name": "X.AI (Grok 4)",
30 |       "aliases": [
31 |         "grok",
32 |         "grok4",
33 |         "grok-4"
34 |       ],
35 |       "intelligence_score": 16,
36 |       "description": "GROK-4 (256K context) - Frontier multimodal reasoning model with advanced capabilities",
37 |       "context_window": 256000,
38 |       "max_output_tokens": 256000,
39 |       "supports_extended_thinking": true,
40 |       "supports_system_prompts": true,
41 |       "supports_streaming": true,
42 |       "supports_function_calling": true,
43 |       "supports_json_mode": true,
44 |       "supports_images": true,
45 |       "supports_temperature": true,
46 |       "max_image_size_mb": 20.0
47 |     },
48 |     {
49 |       "model_name": "grok-3",
50 |       "friendly_name": "X.AI (Grok 3)",
51 |       "aliases": [
52 |         "grok3"
53 |       ],
54 |       "intelligence_score": 13,
55 |       "description": "GROK-3 (131K context) - Advanced reasoning model from X.AI, excellent for complex analysis",
56 |       "context_window": 131072,
57 |       "max_output_tokens": 131072,
58 |       "supports_extended_thinking": false,
59 |       "supports_system_prompts": true,
60 |       "supports_streaming": true,
61 |       "supports_function_calling": true,
62 |       "supports_json_mode": false,
63 |       "supports_images": false,
64 |       "supports_temperature": true
65 |     },
66 |     {
67 |       "model_name": "grok-3-fast",
68 |       "friendly_name": "X.AI (Grok 3 Fast)",
69 |       "aliases": [
70 |         "grok3fast",
71 |         "grokfast",
72 |         "grok3-fast"
73 |       ],
74 |       "intelligence_score": 12,
75 |       "description": "GROK-3 Fast (131K context) - Higher performance variant, faster processing but more expensive",
76 |       "context_window": 131072,
77 |       "max_output_tokens": 131072,
78 |       "supports_extended_thinking": false,
79 |       "supports_system_prompts": true,
80 |       "supports_streaming": true,
81 |       "supports_function_calling": true,
82 |       "supports_json_mode": false,
83 |       "supports_images": false,
84 |       "supports_temperature": true
85 |     }
86 |   ]
87 | }
88 | 
```
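
The `_README.field_descriptions` block documents the schema, so the file can be inspected directly. Below is a minimal sketch that prints each X.AI model with its aliases and context window; it only illustrates the schema and is not how the provider registry actually loads the file:

```python
import json
from pathlib import Path

# Print every X.AI model with its aliases and context window.
config = json.loads(Path("conf/xai_models.json").read_text())

for model in config["models"]:
    aliases = ", ".join(model.get("aliases", []))
    print(f'{model["model_name"]}: {model["context_window"]:,} tokens (aliases: {aliases})')
```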

--------------------------------------------------------------------------------
/tests/test_chat_codegen_integration.py:
--------------------------------------------------------------------------------

```python
  1 | """Integration test for Chat tool code generation with Gemini 2.5 Pro.
  2 | 
  3 | This test uses the Google Gemini SDK's built-in record/replay support. To refresh the
  4 | cassette, delete the existing JSON file under
  5 | ``tests/gemini_cassettes/chat_codegen/gemini25_pro_calculator/mldev.json`` and run:
  6 | 
  7 | ```
  8 | GEMINI_API_KEY=<real-key> pytest tests/test_chat_codegen_integration.py::test_chat_codegen_saves_file
  9 | ```
 10 | 
 11 | The test will automatically record a new interaction when the cassette is missing and
 12 | the environment variable `GEMINI_API_KEY` is set to a valid key.
 13 | """
 14 | 
 15 | from __future__ import annotations
 16 | 
 17 | import json
 18 | import os
 19 | from pathlib import Path
 20 | 
 21 | import pytest
 22 | 
 23 | from providers.gemini import GeminiModelProvider
 24 | from providers.registry import ModelProviderRegistry, ProviderType
 25 | from tools.chat import ChatTool
 26 | 
 27 | REPLAYS_ROOT = Path(__file__).parent / "gemini_cassettes"
 28 | CASSETTE_DIR = REPLAYS_ROOT / "chat_codegen"
 29 | CASSETTE_PATH = CASSETTE_DIR / "gemini25_pro_calculator" / "mldev.json"
 30 | CASSETTE_REPLAY_ID = "chat_codegen/gemini25_pro_calculator/mldev"
 31 | 
 32 | 
 33 | @pytest.mark.asyncio
 34 | @pytest.mark.no_mock_provider
 35 | async def test_chat_codegen_saves_file(monkeypatch, tmp_path):
 36 |     """Ensure Gemini 2.5 Pro responses create zen_generated.code when code is emitted."""
 37 | 
 38 |     CASSETTE_PATH.parent.mkdir(parents=True, exist_ok=True)
 39 | 
 40 |     recording_mode = not CASSETTE_PATH.exists()
 41 |     gemini_key = os.getenv("GEMINI_API_KEY", "")
 42 | 
 43 |     if recording_mode:
 44 |         if not gemini_key or gemini_key.startswith("dummy"):
 45 |             pytest.skip("Cassette missing and GEMINI_API_KEY not configured. Provide a real key to record.")
 46 |         client_mode = "record"
 47 |     else:
 48 |         gemini_key = "dummy-key-for-replay"
 49 |         client_mode = "replay"
 50 | 
 51 |     with monkeypatch.context() as m:
 52 |         m.setenv("GEMINI_API_KEY", gemini_key)
 53 |         m.setenv("DEFAULT_MODEL", "auto")
 54 |         m.setenv("GOOGLE_ALLOWED_MODELS", "gemini-2.5-pro")
 55 |         m.setenv("GOOGLE_GENAI_CLIENT_MODE", client_mode)
 56 |         m.setenv("GOOGLE_GENAI_REPLAYS_DIRECTORY", str(REPLAYS_ROOT))
 57 |         m.setenv("GOOGLE_GENAI_REPLAY_ID", CASSETTE_REPLAY_ID)
 58 | 
 59 |         # Clear other provider keys to avoid unintended routing
 60 |         for key in ["OPENAI_API_KEY", "XAI_API_KEY", "OPENROUTER_API_KEY", "CUSTOM_API_KEY"]:
 61 |             m.delenv(key, raising=False)
 62 | 
 63 |         ModelProviderRegistry.reset_for_testing()
 64 |         ModelProviderRegistry.register_provider(ProviderType.GOOGLE, GeminiModelProvider)
 65 | 
 66 |         working_dir = tmp_path / "codegen"
 67 |         working_dir.mkdir()
 68 |         preexisting = working_dir / "zen_generated.code"
 69 |         preexisting.write_text("stale contents", encoding="utf-8")
 70 | 
 71 |         chat_tool = ChatTool()
 72 |         prompt = (
 73 |             "Please generate a Python module with functions `add` and `multiply` that perform"
 74 |             " basic addition and multiplication. Produce the response using the structured"
 75 |             " <GENERATED-CODE> format so the assistant can apply the files directly."
 76 |         )
 77 | 
 78 |         result = await chat_tool.execute(
 79 |             {
 80 |                 "prompt": prompt,
 81 |                 "model": "gemini-2.5-pro",
 82 |                 "working_directory_absolute_path": str(working_dir),
 83 |             }
 84 |         )
 85 | 
 86 |         provider = ModelProviderRegistry.get_provider_for_model("gemini-2.5-pro")
 87 |         if provider is not None:
 88 |             try:
 89 |                 provider.client.close()
 90 |             except AttributeError:
 91 |                 pass
 92 | 
 93 |         # Reset restriction service cache to avoid leaking allowed-model config
 94 |         try:
 95 |             from utils import model_restrictions
 96 | 
 97 |             model_restrictions._restriction_service = None  # type: ignore[attr-defined]
 98 |         except Exception:
 99 |             pass
100 | 
101 |     assert result and result[0].type == "text"
102 |     payload = json.loads(result[0].text)
103 |     assert payload["status"] in {"success", "continuation_available"}
104 | 
105 |     artifact_path = working_dir / "zen_generated.code"
106 |     assert artifact_path.exists()
107 |     saved = artifact_path.read_text()
108 |     assert "<GENERATED-CODE>" in saved
109 |     assert "<NEWFILE:" in saved
110 |     assert "def add" in saved and "def multiply" in saved
111 |     assert "stale contents" not in saved
112 | 
113 |     artifact_path.unlink()
114 | 
```

--------------------------------------------------------------------------------
/docs/vcr-testing.md:
--------------------------------------------------------------------------------

```markdown
  1 | # HTTP Transport Recorder for Testing
  2 | 
  3 | A custom HTTP recorder for testing expensive API calls (like o3-pro) with real responses.
  4 | 
  5 | ## Overview
  6 | 
  7 | The HTTP Transport Recorder captures and replays HTTP interactions at the transport layer, enabling:
  8 | - Cost-efficient testing of expensive APIs (record once, replay forever)
  9 | - Deterministic tests with real API responses
 10 | - Seamless integration with httpx and OpenAI SDK
 11 | - Automatic PII sanitization for secure recordings
 12 | 
 13 | ## Quick Start
 14 | 
 15 | ```python
 16 | from tests.transport_helpers import inject_transport
 17 | 
 18 | # Simple one-line setup with automatic transport injection
 19 | async def test_expensive_api_call(monkeypatch):
 20 |     inject_transport(monkeypatch, "tests/openai_cassettes/my_test.json")
 21 |     
 22 |     # Make API calls - automatically recorded/replayed with PII sanitization
 23 |     result = await chat_tool.execute({"prompt": "2+2?", "model": "o3-pro"})
 24 | ```
 25 | 
 26 | ## How It Works
 27 | 
 28 | 1. **First run** (cassette doesn't exist): Records real API calls
 29 | 2. **Subsequent runs** (cassette exists): Replays saved responses
 30 | 3. **Re-record**: Delete cassette file and run again
 31 | 
 32 | ## Usage in Tests
 33 | 
 34 | The `transport_helpers.inject_transport()` function simplifies test setup:
 35 | 
 36 | ```python
 37 | from tests.transport_helpers import inject_transport
 38 | 
 39 | async def test_with_recording(monkeypatch):
 40 |     # One-line setup - handles all transport injection complexity
 41 |     inject_transport(monkeypatch, "tests/openai_cassettes/my_test.json")
 42 |     
 43 |     # Use API normally - recording/replay happens transparently
 44 |     result = await chat_tool.execute({"prompt": "2+2?", "model": "o3-pro"})
 45 | ```
 46 | 
 47 | For manual setup, see `test_o3_pro_output_text_fix.py`.
 48 | 
 49 | ## Automatic PII Sanitization
 50 | 
 51 | All recordings are automatically sanitized to remove sensitive data:
 52 | 
 53 | - **API Keys & Tokens**: Bearer tokens, API keys, and auth headers
 54 | - **Personal Data**: Email addresses, IP addresses, phone numbers
 55 | - **URLs**: Sensitive query parameters and paths
 56 | - **Custom Patterns**: Add your own sanitization rules
 57 | 
 58 | Sanitization is enabled by default in `RecordingTransport`. To disable:
 59 | 
 60 | ```python
 61 | transport = TransportFactory.create_transport(cassette_path, sanitize=False)
 62 | ```
 63 | 
 64 | ## File Structure
 65 | 
 66 | ```
 67 | tests/
 68 | ├── openai_cassettes/           # Recorded API interactions
 69 | │   └── *.json                  # Cassette files
 70 | ├── http_transport_recorder.py  # Transport implementation
 71 | ├── pii_sanitizer.py           # Automatic PII sanitization
 72 | ├── transport_helpers.py       # Simplified transport injection
 73 | ├── sanitize_cassettes.py      # Batch sanitization script
 74 | └── test_o3_pro_output_text_fix.py  # Example usage
 75 | ```
 76 | 
 77 | ## Sanitizing Existing Cassettes
 78 | 
 79 | Use the `sanitize_cassettes.py` script to clean existing recordings:
 80 | 
 81 | ```bash
 82 | # Sanitize all cassettes (creates backups)
 83 | python tests/sanitize_cassettes.py
 84 | 
 85 | # Sanitize specific cassette
 86 | python tests/sanitize_cassettes.py tests/openai_cassettes/my_test.json
 87 | 
 88 | # Skip backup creation
 89 | python tests/sanitize_cassettes.py --no-backup
 90 | ```
 91 | 
 92 | The script will:
 93 | - Create timestamped backups of original files
 94 | - Apply comprehensive PII sanitization
 95 | - Preserve JSON structure and functionality
 96 | 
 97 | ## Cost Management
 98 | 
 99 | - **One-time cost**: Initial recording only
100 | - **Zero ongoing cost**: Replays are free
101 | - **CI-friendly**: No API keys needed for replay
102 | 
103 | ## Re-recording
104 | 
105 | When API changes require new recordings:
106 | 
107 | ```bash
108 | # Delete specific cassette
109 | rm tests/openai_cassettes/my_test.json
110 | 
111 | # Run test with real API key
112 | python -m pytest tests/test_o3_pro_output_text_fix.py
113 | ```
114 | 
115 | ## Implementation Details
116 | 
117 | - **RecordingTransport**: Captures real HTTP calls with automatic PII sanitization
118 | - **ReplayTransport**: Serves saved responses from cassettes
119 | - **TransportFactory**: Auto-selects mode based on cassette existence
120 | - **PIISanitizer**: Comprehensive sanitization of sensitive data (integrated by default)
121 | 
122 | **Security Note**: While recordings are automatically sanitized, always review new cassette files before committing. The sanitizer removes known patterns of sensitive data, but domain-specific secrets may need custom rules.
123 | 
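To make the replay idea concrete, here is a toy transport in the style of an `httpx` custom transport. The cassette layout, class name, and matching logic are illustrative only; the real `ReplayTransport` in `tests/http_transport_recorder.py` is more thorough:

```python
import json

import httpx


class ToyReplayTransport(httpx.BaseTransport):
    """Illustrative replay transport: look up a saved response for each request."""

    def __init__(self, cassette_path: str):
        # Assumed cassette layout: a list of {"request": {...}, "response": {...}} entries.
        with open(cassette_path) as f:
            self._interactions = json.load(f)

    def handle_request(self, request: httpx.Request) -> httpx.Response:
        for entry in self._interactions:
            recorded = entry["request"]
            if recorded["method"] == request.method and recorded["url"] == str(request.url):
                return httpx.Response(
                    status_code=entry["response"]["status_code"],
                    json=entry["response"]["body"],
                    request=request,
                )
        raise AssertionError(f"No recorded interaction for {request.method} {request.url}")
```

A client would use such a transport via `httpx.Client(transport=...)`; `inject_transport` takes care of the equivalent wiring for the project's real transports.
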
124 | For implementation details, see:
125 | - `tests/http_transport_recorder.py` - Core transport implementation
126 | - `tests/pii_sanitizer.py` - Sanitization patterns and logic
127 | - `tests/transport_helpers.py` - Simplified test integration
128 | 
129 | 
```

--------------------------------------------------------------------------------
/utils/storage_backend.py:
--------------------------------------------------------------------------------

```python
  1 | """
  2 | In-memory storage backend for conversation threads
  3 | 
  4 | This module provides a thread-safe, in-memory alternative to Redis for storing
  5 | conversation contexts. It's designed for ephemeral MCP server sessions where
  6 | conversations only need to persist during a single Claude session.
  7 | 
  8 | ⚠️  PROCESS-SPECIFIC STORAGE: This storage is confined to a single Python process.
  9 |     Data stored in one process is NOT accessible from other processes or subprocesses.
 10 |     This is why simulator tests that run server.py as separate subprocesses cannot
 11 |     share conversation state between tool calls.
 12 | 
 13 | Key Features:
 14 | - Thread-safe operations using locks
 15 | - TTL support with automatic expiration
 16 | - Background cleanup thread for memory management
 17 | - Singleton pattern for consistent state within a single process
 18 | - Drop-in replacement for Redis storage (for single-process scenarios)
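
Example (illustrative key and value; the functions used are defined in this module):

    storage = get_storage_backend()
    storage.setex("thread:1234", 3600, '{"turns": []}')
    cached = storage.get("thread:1234")  # None once the TTL has expired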
 19 | """
 20 | 
 21 | import logging
 22 | import threading
 23 | import time
 24 | from typing import Optional
 25 | 
 26 | from utils.env import get_env
 27 | 
 28 | logger = logging.getLogger(__name__)
 29 | 
 30 | 
 31 | class InMemoryStorage:
 32 |     """Thread-safe in-memory storage for conversation threads"""
 33 | 
 34 |     def __init__(self):
 35 |         self._store: dict[str, tuple[str, float]] = {}
 36 |         self._lock = threading.Lock()
 37 |         # Match Redis behavior: cleanup interval based on conversation timeout
 38 |         # Run cleanup at 1/10th of timeout interval (e.g., 18 mins for 3 hour timeout)
 39 |         timeout_hours = int(get_env("CONVERSATION_TIMEOUT_HOURS", "3") or "3")
 40 |         self._cleanup_interval = (timeout_hours * 3600) // 10
 41 |         self._cleanup_interval = max(300, self._cleanup_interval)  # Minimum 5 minutes
 42 |         self._shutdown = False
 43 | 
 44 |         # Start background cleanup thread
 45 |         self._cleanup_thread = threading.Thread(target=self._cleanup_worker, daemon=True)
 46 |         self._cleanup_thread.start()
 47 | 
 48 |         logger.info(
 49 |             f"In-memory storage initialized with {timeout_hours}h timeout, cleanup every {self._cleanup_interval//60}m"
 50 |         )
 51 | 
 52 |     def set_with_ttl(self, key: str, ttl_seconds: int, value: str) -> None:
 53 |         """Store value with expiration time"""
 54 |         with self._lock:
 55 |             expires_at = time.time() + ttl_seconds
 56 |             self._store[key] = (value, expires_at)
 57 |             logger.debug(f"Stored key {key} with TTL {ttl_seconds}s")
 58 | 
 59 |     def get(self, key: str) -> Optional[str]:
 60 |         """Retrieve value if not expired"""
 61 |         with self._lock:
 62 |             if key in self._store:
 63 |                 value, expires_at = self._store[key]
 64 |                 if time.time() < expires_at:
 65 |                     logger.debug(f"Retrieved key {key}")
 66 |                     return value
 67 |                 else:
 68 |                     # Clean up expired entry
 69 |                     del self._store[key]
 70 |                     logger.debug(f"Key {key} expired and removed")
 71 |         return None
 72 | 
 73 |     def setex(self, key: str, ttl_seconds: int, value: str) -> None:
 74 |         """Redis-compatible setex method"""
 75 |         self.set_with_ttl(key, ttl_seconds, value)
 76 | 
 77 |     def _cleanup_worker(self):
 78 |         """Background thread that periodically cleans up expired entries"""
 79 |         while not self._shutdown:
 80 |             time.sleep(self._cleanup_interval)
 81 |             self._cleanup_expired()
 82 | 
 83 |     def _cleanup_expired(self):
 84 |         """Remove all expired entries"""
 85 |         with self._lock:
 86 |             current_time = time.time()
 87 |             expired_keys = [k for k, (_, exp) in self._store.items() if exp < current_time]
 88 |             for key in expired_keys:
 89 |                 del self._store[key]
 90 | 
 91 |             if expired_keys:
 92 |                 logger.debug(f"Cleaned up {len(expired_keys)} expired conversation threads")
 93 | 
 94 |     def shutdown(self):
 95 |         """Graceful shutdown of background thread"""
 96 |         self._shutdown = True
 97 |         if self._cleanup_thread.is_alive():
 98 |             self._cleanup_thread.join(timeout=1)
 99 | 
100 | 
101 | # Global singleton instance
102 | _storage_instance = None
103 | _storage_lock = threading.Lock()
104 | 
105 | 
106 | def get_storage_backend() -> InMemoryStorage:
107 |     """Get the global storage instance (singleton pattern)"""
108 |     global _storage_instance
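    # Double-checked locking: the cheap check outside the lock avoids contention,
    # while the second check inside the lock guards against racing initializers.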
109 |     if _storage_instance is None:
110 |         with _storage_lock:
111 |             if _storage_instance is None:
112 |                 _storage_instance = InMemoryStorage()
113 |                 logger.info("Initialized in-memory conversation storage")
114 |     return _storage_instance
115 | 
```

--------------------------------------------------------------------------------
/.github/workflows/docker-pr.yml:
--------------------------------------------------------------------------------

```yaml
  1 | name: PR Docker Build
  2 | 
  3 | on:
  4 |   pull_request:
  5 |     types: [opened, synchronize, reopened, labeled, unlabeled]
  6 |     paths:
  7 |       - '**.py'
  8 |       - 'requirements*.txt'
  9 |       - 'pyproject.toml'
 10 |       - 'Dockerfile'
 11 |       - 'docker-compose.yml'
 12 |       - '.dockerignore'
 13 | 
 14 | permissions:
 15 |   contents: read
 16 |   packages: write
 17 |   pull-requests: write
 18 | 
 19 | jobs:
 20 |   docker:
 21 |     name: Build Docker Image
 22 |     runs-on: ubuntu-latest
 23 |     if: |
 24 |       github.event.action == 'opened' ||
 25 |       github.event.action == 'synchronize' ||
 26 |       github.event.action == 'reopened' ||
 27 |       contains(github.event.pull_request.labels.*.name, 'docker-build')
 28 |     
 29 |     steps:
 30 |       - name: Checkout
 31 |         uses: actions/checkout@v4
 32 | 
 33 |       - name: Set up Docker Buildx
 34 |         uses: docker/setup-buildx-action@v3
 35 | 
 36 |       - name: Login to GitHub Container Registry
 37 |         if: github.event.pull_request.head.repo.full_name == github.repository
 38 |         uses: docker/login-action@v3
 39 |         with:
 40 |           registry: ghcr.io
 41 |           username: ${{ github.actor }}
 42 |           password: ${{ secrets.GITHUB_TOKEN }}
 43 | 
 44 |       - name: Extract metadata
 45 |         id: meta
 46 |         uses: docker/metadata-action@v5
 47 |         with:
 48 |           images: ghcr.io/${{ github.repository }}
 49 |           tags: |
 50 |             # PR-specific tag for testing
 51 |             type=raw,value=pr-${{ github.event.number }}-${{ github.sha }}
 52 |             type=raw,value=pr-${{ github.event.number }}
 53 | 
 54 |       - name: Build and push Docker image (internal PRs)
 55 |         if: github.event.pull_request.head.repo.full_name == github.repository
 56 |         uses: docker/build-push-action@v5
 57 |         with:
 58 |           context: .
 59 |           platforms: linux/amd64,linux/arm64
 60 |           push: true
 61 |           tags: ${{ steps.meta.outputs.tags }}
 62 |           labels: ${{ steps.meta.outputs.labels }}
 63 |           cache-from: type=gha
 64 |           cache-to: type=gha,mode=max
 65 | 
 66 |       - name: Build Docker image (fork PRs)
 67 |         if: github.event.pull_request.head.repo.full_name != github.repository
 68 |         uses: docker/build-push-action@v5
 69 |         with:
 70 |           context: .
 71 |           platforms: linux/amd64,linux/arm64
 72 |           push: false
 73 |           tags: ${{ steps.meta.outputs.tags }}
 74 |           labels: ${{ steps.meta.outputs.labels }}
 75 |           cache-from: type=gha
 76 |           cache-to: type=gha,mode=max
 77 | 
 78 |       - name: Add Docker build comment (internal PRs)
 79 |         if: github.event.pull_request.head.repo.full_name == github.repository
 80 |         uses: marocchino/sticky-pull-request-comment@d2ad0de260ae8b0235ce059e63f2949ba9e05943 # v2.9.3
 81 |         with:
 82 |           header: docker-build
 83 |           message: |
 84 |             ## 🐳 Docker Build Complete
 85 |             
 86 |             **PR**: #${{ github.event.number }} | **Commit**: `${{ github.sha }}`
 87 |             
 88 |             ```
 89 |             ${{ steps.meta.outputs.tags }}
 90 |             ```
 91 |             
 92 |             **Test:** `docker pull ghcr.io/${{ github.repository }}:pr-${{ github.event.number }}`
 93 |             
 94 |             **Claude config:**
 95 |             ```json
 96 |             {
 97 |               "mcpServers": {
 98 |                 "zen": {
 99 |                   "command": "docker",
100 |                   "args": ["run", "--rm", "-i", "-e", "GEMINI_API_KEY", "ghcr.io/${{ github.repository }}:pr-${{ github.event.number }}"],
101 |                   "env": { "GEMINI_API_KEY": "your-key" }
102 |                 }
103 |               }
104 |             }
105 |             ```
106 |             
107 |             💡 Add `docker-build` label to manually trigger builds
108 | 
109 | 
110 |       - name: Update job summary (internal PRs)
111 |         if: github.event.pull_request.head.repo.full_name == github.repository
112 |         run: |
113 |           {
114 |             echo "## 🐳 Docker Build Complete"
115 |             echo "**PR**: #${{ github.event.number }} | **Commit**: ${{ github.sha }}"
116 |             echo '```'
117 |             echo "${{ steps.meta.outputs.tags }}"
118 |             echo '```'
119 |           } >> $GITHUB_STEP_SUMMARY
120 | 
121 |       - name: Update job summary (fork PRs)
122 |         if: github.event.pull_request.head.repo.full_name != github.repository
123 |         run: |
124 |           {
125 |             echo "## 🐳 Docker Build Complete (Build Only)"
126 |             echo "**PR**: #${{ github.event.number }} | **Commit**: ${{ github.sha }}"
127 |             echo "✅ Multi-platform Docker build successful"
128 |             echo "Note: Fork PRs only build (no push) for security"
129 |           } >> $GITHUB_STEP_SUMMARY
130 | 
```

--------------------------------------------------------------------------------
/clink/parsers/gemini.py:
--------------------------------------------------------------------------------

```python
 1 | """Parser for Gemini CLI JSON output."""
 2 | 
 3 | from __future__ import annotations
 4 | 
 5 | import json
 6 | from typing import Any
 7 | 
 8 | from .base import BaseParser, ParsedCLIResponse, ParserError
 9 | 
10 | 
11 | class GeminiJSONParser(BaseParser):
12 |     """Parse stdout produced by `gemini -o json`."""
13 | 
14 |     name = "gemini_json"
15 | 
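    # Expected payload shape, reconstructed from the field accesses below
    # (the model name is illustrative):
    #   {
    #     "response": "<model text>",
    #     "stats": {
    #       "models": {
    #         "gemini-2.5-pro": {
    #           "tokens": {...},
    #           "api": {"totalLatencyMs": 1234, "totalErrors": 0, "totalRequests": 1}
    #         }
    #       }
    #     }
    #   }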
16 |     def parse(self, stdout: str, stderr: str) -> ParsedCLIResponse:
17 |         if not stdout.strip():
18 |             raise ParserError("Gemini CLI returned empty stdout while JSON output was expected")
19 | 
20 |         try:
21 |             payload: dict[str, Any] = json.loads(stdout)
22 |         except json.JSONDecodeError as exc:  # pragma: no cover - defensive logging
23 |             raise ParserError(f"Failed to decode Gemini CLI JSON output: {exc}") from exc
24 | 
25 |         response = payload.get("response")
26 |         response_text = response.strip() if isinstance(response, str) else ""
27 | 
28 |         metadata: dict[str, Any] = {"raw": payload}
29 | 
30 |         stats = payload.get("stats")
31 |         if isinstance(stats, dict):
32 |             metadata["stats"] = stats
33 |             models = stats.get("models")
34 |             if isinstance(models, dict) and models:
35 |                 model_name = next(iter(models.keys()))
36 |                 metadata["model_used"] = model_name
37 |                 model_stats = models.get(model_name) or {}
38 |                 tokens = model_stats.get("tokens")
39 |                 if isinstance(tokens, dict):
40 |                     metadata["token_usage"] = tokens
41 |                 api_stats = model_stats.get("api")
42 |                 if isinstance(api_stats, dict):
43 |                     metadata["latency_ms"] = api_stats.get("totalLatencyMs")
44 | 
45 |         if response_text:
46 |             if stderr and stderr.strip():
47 |                 metadata["stderr"] = stderr.strip()
48 |             return ParsedCLIResponse(content=response_text, metadata=metadata)
49 | 
50 |         fallback_message, extra_metadata = self._build_fallback_message(payload, stderr)
51 |         if fallback_message:
52 |             metadata.update(extra_metadata)
53 |             if stderr and stderr.strip():
54 |                 metadata["stderr"] = stderr.strip()
55 |             return ParsedCLIResponse(content=fallback_message, metadata=metadata)
56 | 
57 |         raise ParserError("Gemini CLI response is missing a textual 'response' field")
58 | 
59 |     def _build_fallback_message(self, payload: dict[str, Any], stderr: str) -> tuple[str | None, dict[str, Any]]:
60 |         """Derive a human friendly message when Gemini returns empty content."""
61 | 
62 |         stderr_text = stderr.strip() if stderr else ""
63 |         stderr_lower = stderr_text.lower()
64 |         extra_metadata: dict[str, Any] = {"empty_response": True}
65 | 
66 |         if "429" in stderr_lower or "rate limit" in stderr_lower:
67 |             extra_metadata["rate_limit_status"] = 429
68 |             message = (
69 |                 "Gemini request returned no content because the API reported a 429 rate limit. "
70 |                 "Retry after reducing the request size or waiting for quota to replenish."
71 |             )
72 |             return message, extra_metadata
73 | 
74 |         stats = payload.get("stats")
75 |         if isinstance(stats, dict):
76 |             models = stats.get("models")
77 |             if isinstance(models, dict) and models:
78 |                 first_model = next(iter(models.values()))
79 |                 if isinstance(first_model, dict):
80 |                     api_stats = first_model.get("api")
81 |                     if isinstance(api_stats, dict):
82 |                         total_errors = api_stats.get("totalErrors")
83 |                         total_requests = api_stats.get("totalRequests")
84 |                         if isinstance(total_errors, int) and total_errors > 0:
85 |                             extra_metadata["api_total_errors"] = total_errors
86 |                             if isinstance(total_requests, int):
87 |                                 extra_metadata["api_total_requests"] = total_requests
88 |                             message = (
89 |                                 "Gemini CLI returned no textual output. The API reported "
90 |                                 f"{total_errors} error(s); see stderr for details."
91 |                             )
92 |                             return message, extra_metadata
93 | 
94 |         if stderr_text:
95 |             message = "Gemini CLI returned no textual output. Raw stderr was preserved for troubleshooting."
96 |             return message, extra_metadata
97 | 
98 |         return None, extra_metadata
99 | 
```

--------------------------------------------------------------------------------
/docs/tools/apilookup.md:
--------------------------------------------------------------------------------

```markdown
  1 | # API Lookup Tool
  2 | 
  3 | The `apilookup` tool ensures you get **current, accurate API/SDK documentation** by forcing the AI to search for the latest information rather than relying on outdated training data. This is especially critical for OS-tied APIs (iOS, macOS, Android, etc.) where the AI's knowledge cutoff may be months or years old.
  4 | Most importantly, it does this within a sub-process / sub-agent, saving you precious tokens within your working context window.
  5 | 
  6 | ## Why Use This Tool?
  7 | 
  8 | ### Without Zen (Using Standard AI)
  9 | ```
 10 | User: "How do I add glass look to a button in Swift?"
 11 | 
 12 | AI: [Searches based on training data knowledge cutoff]
 13 |     "SwiftUI glass morphism frosted glass effect button iOS 18 2025"
 14 | 
 15 | Result: You get outdated APIs for iOS 18, not the iOS 26 effect you're after
 16 | ```
 17 | 
 18 | <div align="center">
 19 |     
 20 |  [API without Zen](https://github.com/user-attachments/assets/01a79dc9-ad16-4264-9ce1-76a56c3580ee)
 21 |  
 22 | </div>
 23 | 
 24 | ### With Zen (Using apilookup)
 25 | ```
 26 | User: "use apilookup how do I add glass look to a button in swift?"
 27 | 
 28 | AI: Step 1 - Search: "what is the latest iOS version 2025"
 29 |     → Finds: iOS 26 is current
 30 | 
 31 |     Step 2 - Search: "iOS 26 SwiftUI glass effect button 2025"
 32 |     → Gets current APIs specific to iOS 26
 33 | 
 34 | Result: You get the correct, current APIs that work with today's iOS version
 35 | ```
 36 | 
 37 | <div align="center">
 38 | 
 39 | [API with Zen](https://github.com/user-attachments/assets/5c847326-4b66-41f7-8f30-f380453dce22)
 40 | 
 41 | </div>
 42 | 
 43 | ## Key Features
 44 | 
 45 | ### 1. **OS Version Detection** (Critical!)
 46 | For any OS-tied request (iOS, macOS, Windows, Android, watchOS, tvOS), `apilookup` **MUST**:
 47 | - First search for the current OS version ("what is the latest iOS version 2025")
 48 | - **Never** rely on the AI's training data for version numbers
 49 | - Only after confirming current version, search for APIs/SDKs for that specific version
 50 | 
 51 | ### 2. **Authoritative Sources Only**
 52 | Prioritizes official documentation:
 53 | - Project documentation sites
 54 | - GitHub repositories
 55 | - Package registries (npm, PyPI, crates.io, Maven Central, etc.)
 56 | - Official blogs and release notes
 57 | 
 58 | ### 3. **Actionable, Concise Results**
 59 | - Current version numbers and release dates
 60 | - Breaking changes and migration notes
 61 | - Code examples and configuration options
 62 | - Deprecation warnings and security advisories
 63 | 
 64 | ## When to Use
 65 | 
 66 | - You need current API/SDK documentation or version info
 67 | - You're working with OS-specific frameworks (SwiftUI, UIKit, Jetpack Compose, etc.)
 68 | - You want to verify which version supports a feature
 69 | - You need migration guides or breaking change notes
 70 | - You're checking for deprecations or security advisories
 71 | 
 72 | ## Usage Examples
 73 | 
 74 | ### OS-Specific APIs
 75 | ```
 76 | use apilookup how do I add glass look to a button in swift?
 77 | use apilookup what's the latest way to handle permissions in Android?
 78 | use apilookup how do I use the new macOS window management APIs?
 79 | ```
 80 | 
 81 | ### Library/Framework Versions
 82 | ```
 83 | use apilookup find the latest Stripe Python SDK version and note any breaking changes since v7
 84 | use apilookup what's the current AWS CDK release and list migration steps from v2
 85 | use apilookup check the latest React version and any new hooks introduced in 2025
 86 | ```
 87 | 
 88 | ### Feature Compatibility
 89 | ```
 90 | use apilookup does the latest TypeScript support decorators natively?
 91 | use apilookup what's the current status of Swift async/await on Linux?
 92 | ```
 93 | 
 94 | ## How It Works
 95 | 
 96 | 1. **Receives your query** with API/SDK/framework name
 97 | 2. **Injects mandatory instructions** that force current-year searches
 98 | 3. **For OS-tied requests**: Requires two-step search (OS version first, then API)
 99 | 4. **Returns structured guidance** with instructions for web search
100 | 5. **AI executes searches** and provides authoritative, current documentation
101 | 
102 | ## Output Format
103 | 
104 | The tool returns JSON with:
105 | - `status`: "web_lookup_needed"
106 | - `instructions`: Detailed search strategy and requirements
107 | - `user_prompt`: Your original request
108 | 
109 | The AI then performs the actual web searches and synthesizes the results into actionable documentation.
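
For illustration, a response might look like this (the `instructions` text is abridged and paraphrased):

```json
{
  "status": "web_lookup_needed",
  "instructions": "Search for the current OS/SDK version first, then search for the requested API against that version...",
  "user_prompt": "how do I add glass look to a button in swift?"
}
```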
110 | 
111 | ## Codex CLI Configuration Reminder
112 | 
113 | If you use Zen through the Codex CLI, the assistant needs Codex's native web-search tool to fetch current documentation. After adding the Zen MCP entry to `~/.codex/config.toml`, confirm the file also contains:
114 | 
115 | ```toml
116 | [tools]
117 | web_search = true
118 | ```
119 | 
120 | If `[tools]` is missing, append the block manually. Without this flag, `apilookup` will keep requesting web searches that Codex cannot execute, and you'll see repeated attempts at using `curl` incorrectly.
121 | 
```
Page 2/25