This is page 7 of 45. Use http://codebase.md/dicklesworthstone/llm_gateway_mcp_server?lines=true&page={x} to view the full context.
# Directory Structure
```
├── .cursorignore
├── .env.example
├── .envrc
├── .gitignore
├── additional_features.md
├── check_api_keys.py
├── completion_support.py
├── comprehensive_test.py
├── docker-compose.yml
├── Dockerfile
├── empirically_measured_model_speeds.json
├── error_handling.py
├── example_structured_tool.py
├── examples
│ ├── __init__.py
│ ├── advanced_agent_flows_using_unified_memory_system_demo.py
│ ├── advanced_extraction_demo.py
│ ├── advanced_unified_memory_system_demo.py
│ ├── advanced_vector_search_demo.py
│ ├── analytics_reporting_demo.py
│ ├── audio_transcription_demo.py
│ ├── basic_completion_demo.py
│ ├── cache_demo.py
│ ├── claude_integration_demo.py
│ ├── compare_synthesize_demo.py
│ ├── cost_optimization.py
│ ├── data
│ │ ├── sample_event.txt
│ │ ├── Steve_Jobs_Introducing_The_iPhone_compressed.md
│ │ └── Steve_Jobs_Introducing_The_iPhone_compressed.mp3
│ ├── docstring_refiner_demo.py
│ ├── document_conversion_and_processing_demo.py
│ ├── entity_relation_graph_demo.py
│ ├── filesystem_operations_demo.py
│ ├── grok_integration_demo.py
│ ├── local_text_tools_demo.py
│ ├── marqo_fused_search_demo.py
│ ├── measure_model_speeds.py
│ ├── meta_api_demo.py
│ ├── multi_provider_demo.py
│ ├── ollama_integration_demo.py
│ ├── prompt_templates_demo.py
│ ├── python_sandbox_demo.py
│ ├── rag_example.py
│ ├── research_workflow_demo.py
│ ├── sample
│ │ ├── article.txt
│ │ ├── backprop_paper.pdf
│ │ ├── buffett.pdf
│ │ ├── contract_link.txt
│ │ ├── legal_contract.txt
│ │ ├── medical_case.txt
│ │ ├── northwind.db
│ │ ├── research_paper.txt
│ │ ├── sample_data.json
│ │ └── text_classification_samples
│ │ ├── email_classification.txt
│ │ ├── news_samples.txt
│ │ ├── product_reviews.txt
│ │ └── support_tickets.txt
│ ├── sample_docs
│ │ └── downloaded
│ │ └── attention_is_all_you_need.pdf
│ ├── sentiment_analysis_demo.py
│ ├── simple_completion_demo.py
│ ├── single_shot_synthesis_demo.py
│ ├── smart_browser_demo.py
│ ├── sql_database_demo.py
│ ├── sse_client_demo.py
│ ├── test_code_extraction.py
│ ├── test_content_detection.py
│ ├── test_ollama.py
│ ├── text_classification_demo.py
│ ├── text_redline_demo.py
│ ├── tool_composition_examples.py
│ ├── tournament_code_demo.py
│ ├── tournament_text_demo.py
│ ├── unified_memory_system_demo.py
│ ├── vector_search_demo.py
│ ├── web_automation_instruction_packs.py
│ └── workflow_delegation_demo.py
├── LICENSE
├── list_models.py
├── marqo_index_config.json.example
├── mcp_protocol_schema_2025-03-25_version.json
├── mcp_python_lib_docs.md
├── mcp_tool_context_estimator.py
├── model_preferences.py
├── pyproject.toml
├── quick_test.py
├── README.md
├── resource_annotations.py
├── run_all_demo_scripts_and_check_for_errors.py
├── storage
│ └── smart_browser_internal
│ ├── locator_cache.db
│ ├── readability.js
│ └── storage_state.enc
├── test_client.py
├── test_connection.py
├── TEST_README.md
├── test_sse_client.py
├── test_stdio_client.py
├── tests
│ ├── __init__.py
│ ├── conftest.py
│ ├── integration
│ │ ├── __init__.py
│ │ └── test_server.py
│ ├── manual
│ │ ├── test_extraction_advanced.py
│ │ └── test_extraction.py
│ └── unit
│ ├── __init__.py
│ ├── test_cache.py
│ ├── test_providers.py
│ └── test_tools.py
├── TODO.md
├── tool_annotations.py
├── tools_list.json
├── ultimate_mcp_banner.webp
├── ultimate_mcp_logo.webp
├── ultimate_mcp_server
│ ├── __init__.py
│ ├── __main__.py
│ ├── cli
│ │ ├── __init__.py
│ │ ├── __main__.py
│ │ ├── commands.py
│ │ ├── helpers.py
│ │ └── typer_cli.py
│ ├── clients
│ │ ├── __init__.py
│ │ ├── completion_client.py
│ │ └── rag_client.py
│ ├── config
│ │ └── examples
│ │ └── filesystem_config.yaml
│ ├── config.py
│ ├── constants.py
│ ├── core
│ │ ├── __init__.py
│ │ ├── evaluation
│ │ │ ├── base.py
│ │ │ └── evaluators.py
│ │ ├── providers
│ │ │ ├── __init__.py
│ │ │ ├── anthropic.py
│ │ │ ├── base.py
│ │ │ ├── deepseek.py
│ │ │ ├── gemini.py
│ │ │ ├── grok.py
│ │ │ ├── ollama.py
│ │ │ ├── openai.py
│ │ │ └── openrouter.py
│ │ ├── server.py
│ │ ├── state_store.py
│ │ ├── tournaments
│ │ │ ├── manager.py
│ │ │ ├── tasks.py
│ │ │ └── utils.py
│ │ └── ums_api
│ │ ├── __init__.py
│ │ ├── ums_database.py
│ │ ├── ums_endpoints.py
│ │ ├── ums_models.py
│ │ └── ums_services.py
│ ├── exceptions.py
│ ├── graceful_shutdown.py
│ ├── services
│ │ ├── __init__.py
│ │ ├── analytics
│ │ │ ├── __init__.py
│ │ │ ├── metrics.py
│ │ │ └── reporting.py
│ │ ├── cache
│ │ │ ├── __init__.py
│ │ │ ├── cache_service.py
│ │ │ ├── persistence.py
│ │ │ ├── strategies.py
│ │ │ └── utils.py
│ │ ├── cache.py
│ │ ├── document.py
│ │ ├── knowledge_base
│ │ │ ├── __init__.py
│ │ │ ├── feedback.py
│ │ │ ├── manager.py
│ │ │ ├── rag_engine.py
│ │ │ ├── retriever.py
│ │ │ └── utils.py
│ │ ├── prompts
│ │ │ ├── __init__.py
│ │ │ ├── repository.py
│ │ │ └── templates.py
│ │ ├── prompts.py
│ │ └── vector
│ │ ├── __init__.py
│ │ ├── embeddings.py
│ │ └── vector_service.py
│ ├── tool_token_counter.py
│ ├── tools
│ │ ├── __init__.py
│ │ ├── audio_transcription.py
│ │ ├── base.py
│ │ ├── completion.py
│ │ ├── docstring_refiner.py
│ │ ├── document_conversion_and_processing.py
│ │ ├── enhanced-ums-lookbook.html
│ │ ├── entity_relation_graph.py
│ │ ├── excel_spreadsheet_automation.py
│ │ ├── extraction.py
│ │ ├── filesystem.py
│ │ ├── html_to_markdown.py
│ │ ├── local_text_tools.py
│ │ ├── marqo_fused_search.py
│ │ ├── meta_api_tool.py
│ │ ├── ocr_tools.py
│ │ ├── optimization.py
│ │ ├── provider.py
│ │ ├── pyodide_boot_template.html
│ │ ├── python_sandbox.py
│ │ ├── rag.py
│ │ ├── redline-compiled.css
│ │ ├── sentiment_analysis.py
│ │ ├── single_shot_synthesis.py
│ │ ├── smart_browser.py
│ │ ├── sql_databases.py
│ │ ├── text_classification.py
│ │ ├── text_redline_tools.py
│ │ ├── tournament.py
│ │ ├── ums_explorer.html
│ │ └── unified_memory_system.py
│ ├── utils
│ │ ├── __init__.py
│ │ ├── async_utils.py
│ │ ├── display.py
│ │ ├── logging
│ │ │ ├── __init__.py
│ │ │ ├── console.py
│ │ │ ├── emojis.py
│ │ │ ├── formatter.py
│ │ │ ├── logger.py
│ │ │ ├── panels.py
│ │ │ ├── progress.py
│ │ │ └── themes.py
│ │ ├── parse_yaml.py
│ │ ├── parsing.py
│ │ ├── security.py
│ │ └── text.py
│ └── working_memory_api.py
├── unified_memory_system_technical_analysis.md
└── uv.lock
```
# Files
--------------------------------------------------------------------------------
/examples/compare_synthesize_demo.py:
--------------------------------------------------------------------------------
```python
1 | #!/usr/bin/env python
2 | """Enhanced demo of the Advanced Response Comparator & Synthesizer Tool."""
3 | import asyncio
4 | import json
5 | import sys
6 | from collections import namedtuple # Import namedtuple
7 | from pathlib import Path
8 |
9 | # Add project root to path for imports when running as script
10 | sys.path.insert(0, str(Path(__file__).parent.parent))
11 |
12 | from rich import box
13 | from rich.markup import escape
14 | from rich.panel import Panel
15 | from rich.rule import Rule
16 | from rich.syntax import Syntax
17 | from rich.table import Table
18 |
19 | from ultimate_mcp_server.constants import Provider
20 | from ultimate_mcp_server.core.server import Gateway # Use Gateway to get MCP
21 |
22 | # from ultimate_mcp_server.tools.meta import compare_and_synthesize # Add correct import
23 | from ultimate_mcp_server.utils import get_logger
24 | from ultimate_mcp_server.utils.display import CostTracker # Import CostTracker
25 | from ultimate_mcp_server.utils.logging.console import console
26 |
27 | # Initialize logger
28 | logger = get_logger("example.compare_synthesize_v2")
29 |
30 | # Create a simple structure for cost tracking from dict (tokens might be missing)
31 | TrackableResult = namedtuple("TrackableResult", ["cost", "input_tokens", "output_tokens", "provider", "model", "processing_time"])
32 |
33 | # Global MCP instance (will be populated from Gateway)
34 | mcp = None
35 |
36 | async def setup_gateway_and_tools():
37 | """Set up the gateway and register tools."""
38 | global mcp
39 | logger.info("Initializing Gateway and MetaTools for enhanced demo...", emoji_key="start")
40 | gateway = Gateway("compare-synthesize-demo-v2", register_tools=False)
41 |
42 | # Initialize providers (needed for the tool to function)
43 | try:
44 | await gateway._initialize_providers()
45 | except Exception as e:
46 | logger.critical(f"Failed to initialize providers: {e}. Check API keys.", emoji_key="critical", exc_info=True)
47 | sys.exit(1) # Exit if providers can't be initialized
48 |
49 | # REMOVE MetaTools instance
50 | # meta_tools = MetaTools(gateway) # Pass the gateway instance # noqa: F841
51 | mcp = gateway.mcp # Store the MCP server instance
52 |
53 | # Manually register the required tool
54 | # mcp.tool()(compare_and_synthesize)
55 | # logger.info("Manually registered compare_and_synthesize tool.")
56 |
57 | # Verify tool registration
58 | tool_list = await mcp.list_tools()
59 | tool_names = [t.name for t in tool_list] # Access name attribute directly
60 | # Use console.print for tool list
61 | console.print(f"Registered tools: [cyan]{escape(str(tool_names))}[/cyan]")
62 | if "compare_and_synthesize" in tool_names:
63 | logger.success("compare_and_synthesize tool registered successfully.", emoji_key="success")
64 | else:
65 | logger.error("compare_and_synthesize tool FAILED to register.", emoji_key="error")
66 | sys.exit(1) # Exit if the required tool isn't available
67 |
68 | logger.success("Setup complete.", emoji_key="success")
69 |
70 | # Refactored print_result function using Rich
71 | def print_result(title: str, result: dict):
72 | """Helper function to print results clearly using Rich components."""
73 | console.print(Rule(f"[bold blue]{escape(title)}[/bold blue]"))
74 |
75 | # Handle potential list result format (from older tool versions?)
76 | if isinstance(result, list) and len(result) > 0:
77 | if hasattr(result[0], 'text'):
78 | try:
79 | result = json.loads(result[0].text)
80 | except Exception:
81 | result = {"error": "Failed to parse result from list format"}
82 | else:
83 | result = result[0] # Assume first item is the dict
84 | elif not isinstance(result, dict):
85 | result = {"error": f"Unexpected result format: {type(result)}"}
86 |
87 | if result.get("error"):
88 | error_content = f"[red]Error:[/red] {escape(result['error'])}"
89 | if "partial_results" in result and result["partial_results"]:
90 | try:
91 | partial_json = json.dumps(result["partial_results"], indent=2)
92 | error_content += "\n\n[yellow]Partial Results:[/yellow]"
93 | error_panel_content = Syntax(partial_json, "json", theme="default", line_numbers=False, word_wrap=True)
94 | except Exception as json_err:
95 | error_panel_content = f"[red]Could not display partial results: {escape(str(json_err))}[/red]"
96 | else:
97 | error_panel_content = error_content
98 |
99 | console.print(Panel(
100 | error_panel_content,
101 | title="[bold red]Tool Error[/bold red]",
102 | border_style="red",
103 | expand=False
104 | ))
105 |
106 | else:
107 | # Display synthesis/analysis sections
108 | if "synthesis" in result:
109 | synthesis_data = result["synthesis"]
110 | if isinstance(synthesis_data, dict):
111 |
112 | if "best_response_text" in synthesis_data:
113 | console.print(Panel(
114 | escape(synthesis_data["best_response_text"].strip()),
115 | title="[bold green]Best Response Text[/bold green]",
116 | border_style="green",
117 | expand=False
118 | ))
119 |
120 | if "synthesized_response" in synthesis_data:
121 | console.print(Panel(
122 | escape(synthesis_data["synthesized_response"].strip()),
123 | title="[bold magenta]Synthesized Response[/bold magenta]",
124 | border_style="magenta",
125 | expand=False
126 | ))
127 |
128 | if synthesis_data.get("best_response", {}).get("reasoning"):
129 | console.print(Panel(
130 | escape(synthesis_data["best_response"]["reasoning"].strip()),
131 | title="[bold cyan]Best Response Reasoning[/bold cyan]",
132 | border_style="dim cyan",
133 | expand=False
134 | ))
135 |
136 | if synthesis_data.get("synthesis_strategy"):
137 | console.print(Panel(
138 | escape(synthesis_data["synthesis_strategy"].strip()),
139 | title="[bold yellow]Synthesis Strategy Explanation[/bold yellow]",
140 | border_style="dim yellow",
141 | expand=False
142 | ))
143 |
144 | if "ranking" in synthesis_data:
145 | try:
146 | ranking_json = json.dumps(synthesis_data["ranking"], indent=2)
147 | console.print(Panel(
148 | Syntax(ranking_json, "json", theme="default", line_numbers=False, word_wrap=True),
149 | title="[bold]Ranking[/bold]",
150 | border_style="dim blue",
151 | expand=False
152 | ))
153 | except Exception as json_err:
154 | console.print(f"[red]Could not display ranking: {escape(str(json_err))}[/red]")
155 |
156 | if "comparative_analysis" in synthesis_data:
157 | try:
158 | analysis_json = json.dumps(synthesis_data["comparative_analysis"], indent=2)
159 | console.print(Panel(
160 | Syntax(analysis_json, "json", theme="default", line_numbers=False, word_wrap=True),
161 | title="[bold]Comparative Analysis[/bold]",
162 | border_style="dim blue",
163 | expand=False
164 | ))
165 | except Exception as json_err:
166 | console.print(f"[red]Could not display comparative analysis: {escape(str(json_err))}[/red]")
167 |
168 | else: # Handle case where synthesis data isn't a dict (e.g., raw text error)
169 | console.print(Panel(
170 | f"[yellow]Synthesis Output (raw/unexpected format):[/yellow]\n{escape(str(synthesis_data))}",
171 | title="[bold yellow]Synthesis Data[/bold yellow]",
172 | border_style="yellow",
173 | expand=False
174 | ))
175 |
176 | # Display Stats Table
177 | stats_table = Table(title="[bold]Execution Stats[/bold]", box=box.ROUNDED, show_header=False, expand=False)
178 | stats_table.add_column("Metric", style="cyan", no_wrap=True)
179 | stats_table.add_column("Value", style="white")
180 | stats_table.add_row("Eval/Synth Model", f"{escape(result.get('synthesis_provider','N/A'))}/{escape(result.get('synthesis_model','N/A'))}")
181 | stats_table.add_row("Total Cost", f"${result.get('cost', {}).get('total_cost', 0.0):.6f}")
182 | stats_table.add_row("Processing Time", f"{result.get('processing_time', 0.0):.2f}s")
183 | console.print(stats_table)
184 |
185 | console.print() # Add spacing after each result block
186 |
187 |
188 | async def run_comparison_demo(tracker: CostTracker):
189 | """Demonstrate different modes of compare_and_synthesize."""
190 | if not mcp:
191 | logger.error("MCP server not initialized. Run setup first.", emoji_key="error")
192 | return
193 |
194 | prompt = "Explain the main benefits of using asynchronous programming in Python for a moderately technical audience. Provide 2-3 key advantages."
195 |
196 | # --- Configuration for initial responses ---
197 | console.print(Rule("[bold green]Configurations[/bold green]"))
198 | console.print(f"[cyan]Prompt:[/cyan] {escape(prompt)}")
199 | initial_configs = [
200 | {"provider": Provider.OPENAI.value, "model": "gpt-4.1-mini", "parameters": {"temperature": 0.6, "max_tokens": 150}},
201 | {"provider": Provider.ANTHROPIC.value, "model": "claude-3-5-haiku-20241022", "parameters": {"temperature": 0.5, "max_tokens": 150}},
202 | {"provider": Provider.GEMINI.value, "model": "gemini-2.0-flash-lite", "parameters": {"temperature": 0.7, "max_tokens": 150}},
203 | {"provider": Provider.DEEPSEEK.value, "model": "deepseek-chat", "parameters": {"temperature": 0.6, "max_tokens": 150}},
204 | ]
205 | console.print(f"[cyan]Initial Models:[/cyan] {[f'{c['provider']}:{c['model']}' for c in initial_configs]}")
206 |
207 | # --- Evaluation Criteria ---
208 | criteria = [
209 | "Clarity: Is the explanation clear and easy to understand for the target audience?",
210 | "Accuracy: Are the stated benefits of async programming technically correct?",
211 | "Relevance: Does the response directly address the prompt and focus on key advantages?",
212 | "Conciseness: Is the explanation brief and to the point?",
213 | "Completeness: Does it mention 2-3 distinct and significant benefits?",
214 | ]
215 | console.print("[cyan]Evaluation Criteria:[/cyan]")
216 | for i, criterion in enumerate(criteria):
217 | console.print(f" {i+1}. {escape(criterion)}")
218 |
219 | # --- Criteria Weights (Optional) ---
220 | criteria_weights = {
221 | "Clarity: Is the explanation clear and easy to understand for the target audience?": 0.3,
222 | "Accuracy: Are the stated benefits of async programming technically correct?": 0.3,
223 | "Relevance: Does the response directly address the prompt and focus on key advantages?": 0.15,
224 | "Conciseness: Is the explanation brief and to the point?": 0.1,
225 | "Completeness: Does it mention 2-3 distinct and significant benefits?": 0.15,
226 | }
227 | console.print("[cyan]Criteria Weights (Optional):[/cyan]")
228 | # Create a small table for weights
229 | weights_table = Table(box=box.MINIMAL, show_header=False)
230 | weights_table.add_column("Criterion Snippet", style="dim")
231 | weights_table.add_column("Weight", style="green")
232 | for crit, weight in criteria_weights.items():
233 | weights_table.add_row(escape(crit.split(':')[0]), f"{weight:.2f}")
234 | console.print(weights_table)
235 |
236 | # --- Synthesis/Evaluation Model ---
237 | synthesis_model_config = {"provider": Provider.OPENAI.value, "model": "gpt-4.1"}
238 | console.print(f"[cyan]Synthesis/Evaluation Model:[/cyan] {escape(synthesis_model_config['provider'])}:{escape(synthesis_model_config['model'])}")
239 | console.print() # Spacing before demos start
240 |
241 | common_args = {
242 | "prompt": prompt,
243 | "configs": initial_configs,
244 | "criteria": criteria,
245 | "criteria_weights": criteria_weights,
246 | }
247 |
248 | # --- Demo 1: Select Best Response ---
249 | logger.info("Running format 'best'...", emoji_key="processing")
250 | try:
251 | result = await mcp.call_tool("compare_and_synthesize", {
252 | **common_args,
253 | "response_format": "best",
254 | "include_reasoning": True, # Show why it was selected
255 | "synthesis_model": synthesis_model_config # Explicitly specify model to avoid OpenRouter
256 | })
257 | print_result("Response Format: 'best' (with reasoning)", result)
258 | # Track cost
259 | if isinstance(result, dict) and "cost" in result and "synthesis_provider" in result and "synthesis_model" in result:
260 | try:
261 | trackable = TrackableResult(
262 | cost=result.get("cost", {}).get("total_cost", 0.0),
263 | input_tokens=0, # Tokens not typically aggregated in this tool's output
264 | output_tokens=0,
265 | provider=result.get("synthesis_provider", "unknown"),
266 | model=result.get("synthesis_model", "compare_synthesize"),
267 | processing_time=result.get("processing_time", 0.0)
268 | )
269 | tracker.add_call(trackable)
270 | except Exception as track_err:
271 | logger.warning(f"Could not track cost for 'best' format: {track_err}", exc_info=False)
272 | except Exception as e:
273 | logger.error(f"Error during 'best' format demo: {e}", emoji_key="error", exc_info=True)
274 |
275 | # --- Demo 2: Synthesize Responses (Comprehensive Strategy) ---
276 | logger.info("Running format 'synthesis' (comprehensive)...", emoji_key="processing")
277 | try:
278 | result = await mcp.call_tool("compare_and_synthesize", {
279 | **common_args,
280 | "response_format": "synthesis",
281 | "synthesis_strategy": "comprehensive",
282 | "synthesis_model": synthesis_model_config, # Specify model for consistency
283 | "include_reasoning": True,
284 | })
285 | print_result("Response Format: 'synthesis' (Comprehensive Strategy)", result)
286 | # Track cost
287 | if isinstance(result, dict) and "cost" in result and "synthesis_provider" in result and "synthesis_model" in result:
288 | try:
289 | trackable = TrackableResult(
290 | cost=result.get("cost", {}).get("total_cost", 0.0),
291 | input_tokens=0, # Tokens not typically aggregated
292 | output_tokens=0,
293 | provider=result.get("synthesis_provider", "unknown"),
294 | model=result.get("synthesis_model", "compare_synthesize"),
295 | processing_time=result.get("processing_time", 0.0)
296 | )
297 | tracker.add_call(trackable)
298 | except Exception as track_err:
299 | logger.warning(f"Could not track cost for 'synthesis comprehensive': {track_err}", exc_info=False)
300 | except Exception as e:
301 | logger.error(f"Error during 'synthesis comprehensive' demo: {e}", emoji_key="error", exc_info=True)
302 |
303 | # --- Demo 3: Synthesize Responses (Conservative Strategy, No Reasoning) ---
304 | logger.info("Running format 'synthesis' (conservative, no reasoning)...", emoji_key="processing")
305 | try:
306 | result = await mcp.call_tool("compare_and_synthesize", {
307 | **common_args,
308 | "response_format": "synthesis",
309 | "synthesis_strategy": "conservative",
310 | "synthesis_model": synthesis_model_config, # Explicitly specify
311 | "include_reasoning": False, # Hide the synthesis strategy explanation
312 | })
313 | print_result("Response Format: 'synthesis' (Conservative, No Reasoning)", result)
314 | # Track cost
315 | if isinstance(result, dict) and "cost" in result and "synthesis_provider" in result and "synthesis_model" in result:
316 | try:
317 | trackable = TrackableResult(
318 | cost=result.get("cost", {}).get("total_cost", 0.0),
319 | input_tokens=0, # Tokens not typically aggregated
320 | output_tokens=0,
321 | provider=result.get("synthesis_provider", "unknown"),
322 | model=result.get("synthesis_model", "compare_synthesize"),
323 | processing_time=result.get("processing_time", 0.0)
324 | )
325 | tracker.add_call(trackable)
326 | except Exception as track_err:
327 | logger.warning(f"Could not track cost for 'synthesis conservative': {track_err}", exc_info=False)
328 | except Exception as e:
329 | logger.error(f"Error during 'synthesis conservative' demo: {e}", emoji_key="error", exc_info=True)
330 |
331 | # --- Demo 4: Rank Responses ---
332 | logger.info("Running format 'ranked'...", emoji_key="processing")
333 | try:
334 | result = await mcp.call_tool("compare_and_synthesize", {
335 | **common_args,
336 | "response_format": "ranked",
337 | "include_reasoning": True, # Show reasoning for ranks
338 | "synthesis_model": synthesis_model_config, # Explicitly specify
339 | })
340 | print_result("Response Format: 'ranked' (with reasoning)", result)
341 | # Track cost
342 | if isinstance(result, dict) and "cost" in result and "synthesis_provider" in result and "synthesis_model" in result:
343 | try:
344 | trackable = TrackableResult(
345 | cost=result.get("cost", {}).get("total_cost", 0.0),
346 | input_tokens=0, # Tokens not typically aggregated
347 | output_tokens=0,
348 | provider=result.get("synthesis_provider", "unknown"),
349 | model=result.get("synthesis_model", "compare_synthesize"),
350 | processing_time=result.get("processing_time", 0.0)
351 | )
352 | tracker.add_call(trackable)
353 | except Exception as track_err:
354 | logger.warning(f"Could not track cost for 'ranked' format: {track_err}", exc_info=False)
355 | except Exception as e:
356 | logger.error(f"Error during 'ranked' format demo: {e}", emoji_key="error", exc_info=True)
357 |
358 | # --- Demo 5: Analyze Responses ---
359 | logger.info("Running format 'analysis'...", emoji_key="processing")
360 | try:
361 | result = await mcp.call_tool("compare_and_synthesize", {
362 | **common_args,
363 | "response_format": "analysis",
364 | # No reasoning needed for analysis format, it's inherent
365 | "synthesis_model": synthesis_model_config, # Explicitly specify
366 | })
367 | print_result("Response Format: 'analysis'", result)
368 | # Track cost
369 | if isinstance(result, dict) and "cost" in result and "synthesis_provider" in result and "synthesis_model" in result:
370 | try:
371 | trackable = TrackableResult(
372 | cost=result.get("cost", {}).get("total_cost", 0.0),
373 | input_tokens=0, # Tokens not typically aggregated
374 | output_tokens=0,
375 | provider=result.get("synthesis_provider", "unknown"),
376 | model=result.get("synthesis_model", "compare_synthesize"),
377 | processing_time=result.get("processing_time", 0.0)
378 | )
379 | tracker.add_call(trackable)
380 | except Exception as track_err:
381 | logger.warning(f"Could not track cost for 'analysis' format: {track_err}", exc_info=False)
382 | except Exception as e:
383 | logger.error(f"Error during 'analysis' format demo: {e}", emoji_key="error", exc_info=True)
384 |
385 | # Display cost summary at the end
386 | tracker.display_summary(console)
387 |
388 |
389 | async def main():
390 | """Run the enhanced compare_and_synthesize demo."""
391 | tracker = CostTracker() # Instantiate tracker
392 | await setup_gateway_and_tools()
393 | await run_comparison_demo(tracker) # Pass tracker
394 | # logger.info("Skipping run_comparison_demo() as the 'compare_and_synthesize' tool function is missing.") # Remove skip message
395 |
396 | if __name__ == "__main__":
397 | try:
398 | asyncio.run(main())
399 | except KeyboardInterrupt:
400 | logger.info("Demo stopped by user.")
401 | except Exception as main_err:
402 | logger.critical(f"Demo failed with unexpected error: {main_err}", emoji_key="critical", exc_info=True)
```
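For ad-hoc experimentation outside the demo harness, the same tool can be invoked directly against the Gateway's MCP instance. The sketch below is distilled from the demo above (the `Gateway` constructor, `_initialize_providers`, and the `compare_and_synthesize` arguments all appear there); it assumes provider API keys are configured and that the private initializer remains callable, so treat it as a sketch rather than a supported entry point.

```python
# Minimal sketch, reusing the setup pattern from compare_synthesize_demo.py.
import asyncio

from ultimate_mcp_server.constants import Provider
from ultimate_mcp_server.core.server import Gateway


async def pick_best_answer(prompt: str):
    gateway = Gateway("compare-synthesize-sketch", register_tools=False)
    await gateway._initialize_providers()  # same private helper the demo relies on
    # Ask the tool to pick the single best response across two providers.
    return await gateway.mcp.call_tool("compare_and_synthesize", {
        "prompt": prompt,
        "configs": [
            {"provider": Provider.OPENAI.value, "model": "gpt-4.1-mini"},
            {"provider": Provider.ANTHROPIC.value, "model": "claude-3-5-haiku-20241022"},
        ],
        "criteria": ["Clarity", "Accuracy"],
        "response_format": "best",
        "synthesis_model": {"provider": Provider.OPENAI.value, "model": "gpt-4.1"},
    })


if __name__ == "__main__":
    print(asyncio.run(pick_best_answer("Explain asyncio in two sentences.")))
```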
--------------------------------------------------------------------------------
/ultimate_mcp_server/tools/tournament.py:
--------------------------------------------------------------------------------
```python
1 | """Tournament tools for Ultimate MCP Server."""
2 | from typing import Any, Dict, List, Optional
3 |
4 | from ultimate_mcp_server.exceptions import ToolError
5 |
6 | from ultimate_mcp_server.core.models.tournament import (
7 | CancelTournamentInput,
8 | CancelTournamentOutput,
9 | CreateTournamentInput,
10 | CreateTournamentOutput,
11 | GetTournamentResultsInput,
12 | GetTournamentStatusInput,
13 | GetTournamentStatusOutput,
14 | TournamentBasicInfo,
15 | TournamentData,
16 | TournamentStatus,
17 | )
18 | from ultimate_mcp_server.core.models.tournament import (
19 | EvaluatorConfig as InputEvaluatorConfig,
20 | )
21 | from ultimate_mcp_server.core.models.tournament import (
22 | ModelConfig as InputModelConfig,
23 | )
24 | from ultimate_mcp_server.core.tournaments.manager import tournament_manager
25 | from ultimate_mcp_server.tools.base import with_error_handling, with_tool_metrics
26 | from ultimate_mcp_server.utils import get_logger
27 |
28 | logger = get_logger("ultimate_mcp_server.tools.tournament")
29 |
30 | # --- Standalone Tool Functions ---
31 |
32 | @with_tool_metrics
33 | @with_error_handling
34 | async def create_tournament(
35 | name: str,
36 | prompt: str,
37 | models: List[Dict[str, Any]],
38 | rounds: int = 3,
39 | tournament_type: str = "code",
40 | extraction_model_id: Optional[str] = "anthropic/claude-3-5-haiku-20241022",
41 | evaluators: Optional[List[Dict[str, Any]]] = None,
42 | max_retries_per_model_call: int = 3,
43 | retry_backoff_base_seconds: float = 1.0,
44 | max_concurrent_model_calls: int = 5
45 | ) -> Dict[str, Any]:
46 | """
47 | Creates and starts a new LLM competition (tournament) based on a prompt and model configurations.
48 |
49 | Args:
50 | name: Human-readable name for the tournament (e.g., "Essay Refinement Contest", "Python Sorting Challenge").
51 | prompt: The task prompt provided to all participating LLM models.
52 | models: List of model configurations (external key is "models"). Each config is a dictionary specifying:
53 | - model_id (str, required): e.g., 'openai/gpt-4o'.
54 | - diversity_count (int, optional, default 1): Number of variants per model.
55 | # ... (rest of ModelConfig fields) ...
56 | rounds: Number of tournament rounds. Each round allows models to refine their previous output (if applicable to the tournament type). Default is 3.
57 | tournament_type: The type of tournament defining the task and evaluation method. Supported types include:
58 | - "code": For evaluating code generation based on correctness and potentially style/efficiency.
59 | - "text": For general text generation, improvement, or refinement tasks.
60 | Default is "code".
61 | extraction_model_id: (Optional, primarily for 'code' type) Specific LLM model to use for extracting and evaluating results like code blocks. If None, a default is used.
62 | evaluators: (Optional) List of evaluator configurations as dicts.
63 | max_retries_per_model_call: Maximum retries per model call.
64 | retry_backoff_base_seconds: Base seconds for retry backoff.
65 | max_concurrent_model_calls: Maximum concurrent model calls.
66 |
67 | Returns:
68 | Dictionary with tournament creation status containing:
69 | - tournament_id: Unique identifier for the created tournament.
70 | - status: Initial tournament status (usually 'PENDING' or 'RUNNING').
71 | - storage_path: Filesystem path where tournament data will be stored.
72 |
73 | Example:
74 | {
75 | "tournament_id": "tour_abc123xyz789",
76 | "status": "PENDING",
77 | "storage_path": "/path/to/storage/tour_abc123xyz789"
78 | }
79 |
80 | Raises:
81 | ToolError: If input is invalid, tournament creation fails, or scheduling fails.
82 | """
83 | logger.info(f"Tool 'create_tournament' invoked for: {name}")
84 | try:
85 | parsed_model_configs = [InputModelConfig(**mc) for mc in models]
86 | parsed_evaluators = [InputEvaluatorConfig(**ev) for ev in (evaluators or [])]
87 | input_data = CreateTournamentInput(
88 | name=name,
89 | prompt=prompt,
90 | models=parsed_model_configs,
91 | rounds=rounds,
92 | tournament_type=tournament_type,
93 | extraction_model_id=extraction_model_id,
94 | evaluators=parsed_evaluators,
95 | max_retries_per_model_call=max_retries_per_model_call,
96 | retry_backoff_base_seconds=retry_backoff_base_seconds,
97 | max_concurrent_model_calls=max_concurrent_model_calls
98 | )
99 |
100 | tournament = tournament_manager.create_tournament(input_data)
101 | if not tournament:
102 | raise ToolError("Failed to create tournament entry.")
103 |
104 | logger.info("Calling start_tournament_execution (using asyncio)")
105 | success = tournament_manager.start_tournament_execution(
106 | tournament_id=tournament.tournament_id
107 | )
108 |
109 | if not success:
110 | logger.error(f"Failed to schedule background execution for tournament {tournament.tournament_id}")
111 | updated_tournament = tournament_manager.get_tournament(tournament.tournament_id)
112 | error_msg = updated_tournament.error_message if updated_tournament else "Failed to schedule execution."
113 | raise ToolError(f"Failed to start tournament execution: {error_msg}")
114 |
115 | logger.info(f"Tournament {tournament.tournament_id} ({tournament.name}) created and background execution started.")
116 | # Include storage_path in the return value
117 | output = CreateTournamentOutput(
118 | tournament_id=tournament.tournament_id,
119 | status=tournament.status,
120 | storage_path=tournament.storage_path,
121 | message=f"Tournament '{tournament.name}' created successfully and execution started."
122 | )
123 | return output.dict()
124 |
125 | except ValueError as ve:
126 | logger.warning(f"Validation error creating tournament: {ve}")
127 | raise ToolError(f"Invalid input: {ve}") from ve
128 | except Exception as e:
129 | logger.error(f"Error creating tournament: {e}", exc_info=True)
130 | raise ToolError(f"An unexpected error occurred: {e}") from e
131 |
132 | @with_tool_metrics
133 | @with_error_handling
134 | async def get_tournament_status(
135 | tournament_id: str
136 | ) -> Dict[str, Any]:
137 | """Retrieves the current status and progress of a specific tournament.
138 |
139 | Use this tool to monitor an ongoing tournament (PENDING, RUNNING) or check the final
140 | state (COMPLETED, FAILED, CANCELLED) of a past tournament.
141 |
142 | Args:
143 | tournament_id: Unique identifier of the tournament to check.
144 |
145 | Returns:
146 | Dictionary containing tournament status information:
147 | - tournament_id: Unique identifier for the tournament.
148 | - name: Human-readable name of the tournament.
149 | - tournament_type: Type of tournament (e.g., "code", "text").
150 | - status: Current status (e.g., "PENDING", "RUNNING", "COMPLETED", "FAILED", "CANCELLED").
151 | - current_round: Current round number (1-based) if RUNNING, else the last active round.
152 | - total_rounds: Total number of rounds configured for this tournament.
153 | - created_at: ISO timestamp when the tournament was created.
154 | - updated_at: ISO timestamp when the tournament status was last updated.
155 | - error_message: Error message if the tournament FAILED (null otherwise).
156 |
157 | Error Handling:
158 | - Raises ToolError (400) if tournament_id format is invalid.
159 | - Raises ToolError (404) if the tournament ID is not found.
160 | - Raises ToolError (500) for internal server errors.
161 |
162 | Example:
163 | {
164 | "tournament_id": "tour_abc123xyz789",
165 | "name": "Essay Refinement Contest",
166 | "tournament_type": "text",
167 | "status": "RUNNING",
168 | "current_round": 2,
169 | "total_rounds": 3,
170 | "created_at": "2023-04-15T14:32:17.123456",
171 | "updated_at": "2023-04-15T14:45:22.123456",
172 | "error_message": null
173 | }
174 | """
175 | logger.debug(f"Getting status for tournament: {tournament_id}")
176 | try:
177 | if not tournament_id or not isinstance(tournament_id, str):
178 | raise ToolError(
179 | status_code=400,
180 | detail="Invalid tournament ID format. Tournament ID must be a non-empty string."
181 | )
182 |
183 | try:
184 | input_data = GetTournamentStatusInput(tournament_id=tournament_id)
185 | except ValueError as ve:
186 | raise ToolError(
187 | status_code=400,
188 | detail=f"Invalid tournament ID: {str(ve)}"
189 | ) from ve
190 |
191 | tournament = tournament_manager.get_tournament(input_data.tournament_id, force_reload=True)
192 | if not tournament:
193 | raise ToolError(
194 | status_code=404,
195 | detail=f"Tournament not found: {tournament_id}. Check if the tournament ID is correct or use list_tournaments to see all available tournaments."
196 | )
197 |
198 | try:
199 | output = GetTournamentStatusOutput(
200 | tournament_id=tournament.tournament_id,
201 | name=tournament.name,
202 | tournament_type=tournament.config.tournament_type,
203 | status=tournament.status,
204 | current_round=tournament.current_round,
205 | total_rounds=tournament.config.rounds,
206 | created_at=tournament.created_at,
207 | updated_at=tournament.updated_at,
208 | error_message=tournament.error_message
209 | )
210 | return output.dict()
211 | except Exception as e:
212 | logger.error(f"Error converting tournament data to output format: {e}", exc_info=True)
213 | raise ToolError(
214 | status_code=500,
215 | detail=f"Error processing tournament data: {str(e)}. The tournament data may be corrupted."
216 | ) from e
217 | except ToolError:
218 | raise
219 | except Exception as e:
220 | logger.error(f"Error getting tournament status for {tournament_id}: {e}", exc_info=True)
221 | raise ToolError(
222 | status_code=500,
223 | detail=f"Internal server error retrieving tournament status: {str(e)}. Please try again or check the server logs."
224 | ) from e
225 |
226 | @with_tool_metrics
227 | @with_error_handling
228 | async def list_tournaments(
229 | ) -> List[Dict[str, Any]]:
230 | """Lists all created tournaments with basic identifying information and status.
231 |
232 | Useful for discovering existing tournaments and their current states without fetching full results.
233 |
234 | Returns:
235 | List of dictionaries, each containing basic tournament info:
236 | - tournament_id: Unique identifier for the tournament.
237 | - name: Human-readable name of the tournament.
238 | - tournament_type: Type of tournament (e.g., "code", "text").
239 | - status: Current status (e.g., "PENDING", "RUNNING", "COMPLETED", "FAILED", "CANCELLED").
240 | - created_at: ISO timestamp when the tournament was created.
241 | - updated_at: ISO timestamp when the tournament was last updated.
242 |
243 | Example:
244 | [
245 | {
246 | "tournament_id": "tour_abc123",
247 | "name": "Tournament A",
248 | "tournament_type": "code",
249 | "status": "COMPLETED",
250 | "created_at": "2023-04-10T10:00:00",
251 | "updated_at": "2023-04-10T12:30:00"
252 | },
253 | ...
254 | ]
255 | """
256 | logger.debug("Listing all tournaments")
257 | try:
258 | tournaments = tournament_manager.list_tournaments()
259 | output_list = []
260 | for tournament in tournaments:
261 | try:
262 | # Ensure tournament object has necessary attributes before accessing
263 | if not hasattr(tournament, 'tournament_id') or \
264 | not hasattr(tournament, 'name') or \
265 | not hasattr(tournament, 'config') or \
266 | not hasattr(tournament.config, 'tournament_type') or \
267 | not hasattr(tournament, 'status') or \
268 | not hasattr(tournament, 'created_at') or \
269 | not hasattr(tournament, 'updated_at'):
270 | logger.warning(f"Skipping tournament due to missing attributes: {getattr(tournament, 'tournament_id', 'UNKNOWN ID')}")
271 | continue
272 |
273 | basic_info = TournamentBasicInfo(
274 | tournament_id=tournament.tournament_id,
275 | name=tournament.name,
276 | tournament_type=tournament.config.tournament_type,
277 | status=tournament.status,
278 | created_at=tournament.created_at,
279 | updated_at=tournament.updated_at,
280 | )
281 | output_list.append(basic_info.dict())
282 | except Exception as e:
283 | logger.warning(f"Skipping tournament {getattr(tournament, 'tournament_id', 'UNKNOWN')} due to data error during processing: {e}")
284 | return output_list
285 | except Exception as e:
286 | logger.error(f"Error listing tournaments: {e}", exc_info=True)
287 | raise ToolError(
288 | status_code=500,
289 | detail=f"Internal server error listing tournaments: {str(e)}"
290 | ) from e
291 |
292 | @with_tool_metrics
293 | @with_error_handling
294 | async def get_tournament_results(
295 | tournament_id: str
296 | ) -> List[Dict[str, str]]:
297 | """Retrieves the complete results and configuration for a specific tournament.
298 |
299 | Provides comprehensive details including configuration, final scores (if applicable),
300 | detailed round-by-round results, model outputs, and any errors encountered.
301 | Use this *after* a tournament has finished (COMPLETED or FAILED) for full analysis.
302 |
303 | Args:
304 | tournament_id: Unique identifier for the tournament.
305 |
306 | Returns:
307 | List containing a single pre-formatted text content dict; its "text" field holds the full tournament data serialized as JSON (config, status, per-round results, timestamps, etc., as structured by the tournament manager).
308 |
309 | Example (conceptual structure of the JSON payload - actual fields may vary):
310 | {
311 | "tournament_id": "tour_abc123",
312 | "name": "Sorting Algo Test",
313 | "status": "COMPLETED",
314 | "config": { ... },
315 | "results": { "scores": { ... }, "round_results": [ { ... }, ... ] },
316 | "created_at": "...",
317 | "updated_at": "...",
318 | "error_message": null
319 | }
320 |
321 | Raises:
322 | ToolError: If the tournament ID is invalid, not found, results are not ready (still PENDING/RUNNING), or an internal error occurs.
323 | """
324 | logger.debug(f"Getting results for tournament: {tournament_id}")
325 | try:
326 | if not tournament_id or not isinstance(tournament_id, str):
327 | raise ToolError(
328 | status_code=400,
329 | detail="Invalid tournament ID format. Tournament ID must be a non-empty string."
330 | )
331 |
332 | try:
333 | input_data = GetTournamentResultsInput(tournament_id=tournament_id)
334 | except ValueError as ve:
335 | raise ToolError(
336 | status_code=400,
337 | detail=f"Invalid tournament ID: {str(ve)}"
338 | ) from ve
339 |
340 | # Make sure to request TournamentData which should contain results
341 | tournament_data: Optional[TournamentData] = tournament_manager.get_tournament(input_data.tournament_id, force_reload=True)
342 |
343 | if not tournament_data:
344 | # Check if the tournament exists but just has no results yet (e.g., PENDING)
345 | tournament_status_info = tournament_manager.get_tournament(tournament_id) # Gets basic info
346 | if tournament_status_info:
347 | current_status = tournament_status_info.status
348 | if current_status in [TournamentStatus.PENDING, TournamentStatus.RUNNING]:
349 | raise ToolError(
350 | status_code=404, # Use 404 to indicate results not ready
351 | detail=f"Tournament '{tournament_id}' is currently {current_status}. Results are not yet available."
352 | )
353 | else: # Should have results if COMPLETED or ERROR, maybe data issue?
354 | logger.error(f"Tournament {tournament_id} status is {current_status} but get_tournament_results returned None.")
355 | raise ToolError(
356 | status_code=500,
357 | detail=f"Could not retrieve results for tournament '{tournament_id}' despite status being {current_status}. There might be an internal data issue."
358 | )
359 | else:
360 | raise ToolError(
361 | status_code=404,
362 | detail=f"Tournament not found: {tournament_id}. Cannot retrieve results."
363 | )
364 |
365 | # NEW: Return a structure that FastMCP might recognize as a pre-formatted content list
366 | json_string = tournament_data.json()
367 | logger.info(f"[DEBUG_GET_RESULTS] Returning pre-formatted TextContent list. JSON Snippet: {json_string[:150]}")
368 | return [{ "type": "text", "text": json_string }]
369 |
370 | except ToolError:
371 | raise
372 | except Exception as e:
373 | logger.error(f"Error getting tournament results for {tournament_id}: {e}", exc_info=True)
374 | raise ToolError(
375 | f"Internal server error retrieving tournament results: {str(e)}",
376 | 500 # status_code
377 | ) from e
378 |
379 | @with_tool_metrics
380 | @with_error_handling
381 | async def cancel_tournament(
382 | tournament_id: str
383 | ) -> Dict[str, Any]:
384 | """Attempts to cancel a running (RUNNING) or pending (PENDING) tournament.
385 |
386 | Signals the tournament manager to stop processing. Cancellation is not guaranteed
387 | to be immediate. Check status afterwards using `get_tournament_status`.
388 | Cannot cancel tournaments that are already COMPLETED, FAILED, or CANCELLED.
389 |
390 | Args:
391 | tournament_id: Unique identifier for the tournament to cancel.
392 |
393 | Returns:
394 | Dictionary confirming the cancellation attempt:
395 | - tournament_id: The ID of the tournament targeted for cancellation.
396 | - status: The status *after* the cancellation attempt (e.g., "CANCELLED", or the previous state like "COMPLETED" if cancellation was not possible).
397 | - message: A message indicating the outcome (e.g., "Tournament cancellation requested successfully.", "Cancellation failed: Tournament is already COMPLETED.").
398 |
399 | Raises:
400 | ToolError: If the tournament ID is invalid, not found, or an internal error occurs.
401 | """
402 | logger.info(f"Received request to cancel tournament: {tournament_id}")
403 | try:
404 | if not tournament_id or not isinstance(tournament_id, str):
405 | raise ToolError(status_code=400, detail="Invalid tournament ID format.")
406 |
407 | try:
408 | input_data = CancelTournamentInput(tournament_id=tournament_id)
409 | except ValueError as ve:
410 | raise ToolError(status_code=400, detail=f"Invalid tournament ID: {str(ve)}") from ve
411 |
412 | # Call the manager's cancel function
413 | success, message, final_status = await tournament_manager.cancel_tournament(input_data.tournament_id)
414 |
415 | # Prepare output using the Pydantic model
416 | output = CancelTournamentOutput(
417 | tournament_id=tournament_id,
418 | status=final_status, # Return the actual status after attempt
419 | message=message
420 | )
421 |
422 | if not success:
423 | # Log the failure but return the status/message from the manager
424 | logger.warning(f"Cancellation attempt for tournament {tournament_id} reported failure: {message}")
425 | # Raise ToolError if the status implies a client error (e.g., not found)
426 | if "not found" in message.lower():
427 | raise ToolError(status_code=404, detail=message)
428 | elif final_status in [TournamentStatus.COMPLETED, TournamentStatus.FAILED, TournamentStatus.CANCELLED] and "already" in message.lower():
429 | raise ToolError(status_code=409, detail=message)
430 | # Optionally handle other errors as 500
431 | # else:
432 | # raise ToolError(status_code=500, detail=f"Cancellation failed: {message}")
433 | else:
434 | logger.info(f"Cancellation attempt for tournament {tournament_id} successful. Final status: {final_status}")
435 |
436 | return output.dict()
437 |
438 | except ToolError:
439 | raise
440 | except Exception as e:
441 | logger.error(f"Error cancelling tournament {tournament_id}: {e}", exc_info=True)
442 | raise ToolError(status_code=500, detail=f"Internal server error during cancellation: {str(e)}") from e
```
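Taken together, these tools form a simple lifecycle: create a tournament, poll its status, then fetch results once it reaches a terminal state. The following is a hedged sketch of that flow, calling the tool coroutines directly rather than over MCP; it assumes the tournament manager and providers are already initialized by the server, and that status values serialize to the plain strings shown in the docstrings. Model IDs, round count, and polling interval are arbitrary examples.

```python
# Illustrative lifecycle sketch (assumptions noted above).
import asyncio
import json

from ultimate_mcp_server.tools.tournament import (
    create_tournament,
    get_tournament_results,
    get_tournament_status,
)


async def run_code_tournament() -> None:
    created = await create_tournament(
        name="Python Sorting Challenge",
        prompt="Write an efficient, well-tested merge sort in Python.",
        models=[
            {"model_id": "openai/gpt-4o"},
            {"model_id": "anthropic/claude-3-5-haiku-20241022"},
        ],
        rounds=2,
        tournament_type="code",
    )
    tournament_id = created["tournament_id"]

    # Poll until the tournament reaches a terminal state; the docstrings
    # describe status as strings like "RUNNING" or "COMPLETED".
    while True:
        status = await get_tournament_status(tournament_id)
        if status["status"] in ("COMPLETED", "FAILED", "CANCELLED"):
            break
        await asyncio.sleep(10)

    # get_tournament_results returns a pre-formatted text content list; the
    # full tournament JSON sits in the first item's "text" field.
    results = await get_tournament_results(tournament_id)
    data = json.loads(results[0]["text"])
    print(data.get("status"))


if __name__ == "__main__":
    asyncio.run(run_code_tournament())
```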
--------------------------------------------------------------------------------
/ultimate_mcp_server/core/providers/openai.py:
--------------------------------------------------------------------------------
```python
1 | """OpenAI provider implementation."""
2 | import time
3 | from typing import Any, AsyncGenerator, Dict, List, Optional, Tuple
4 |
5 | from openai import AsyncOpenAI
6 |
7 | from ultimate_mcp_server.constants import Provider
8 | from ultimate_mcp_server.core.providers.base import BaseProvider, ModelResponse
9 | from ultimate_mcp_server.utils import get_logger
10 |
11 | # Use the same naming scheme everywhere: logger at module level
12 | logger = get_logger("ultimate_mcp_server.providers.openai")
13 |
14 |
15 | class OpenAIProvider(BaseProvider):
16 | """Provider implementation for OpenAI API."""
17 |
18 | provider_name = Provider.OPENAI.value
19 |
20 | def __init__(self, api_key: Optional[str] = None, **kwargs):
21 | """Initialize the OpenAI provider.
22 |
23 | Args:
24 | api_key: OpenAI API key
25 | **kwargs: Additional options
26 | """
27 | super().__init__(api_key=api_key, **kwargs)
28 | self.base_url = kwargs.get("base_url")
29 | self.organization = kwargs.get("organization")
30 | self.models_cache = None
31 |
32 | async def initialize(self) -> bool:
33 | """Initialize the OpenAI client.
34 |
35 | Returns:
36 | bool: True if initialization was successful
37 | """
38 | try:
39 | self.client = AsyncOpenAI(
40 | api_key=self.api_key,
41 | base_url=self.base_url,
42 | organization=self.organization,
43 | )
44 |
45 | # Skip API call if using a mock key (for tests)
46 | if self.api_key and "mock-" in self.api_key:
47 | self.logger.info(
48 | "Using mock OpenAI key - skipping API validation",
49 | emoji_key="mock"
50 | )
51 | return True
52 |
53 | # Test connection by listing models
54 | await self.list_models()
55 |
56 | self.logger.success(
57 | "OpenAI provider initialized successfully",
58 | emoji_key="provider"
59 | )
60 | return True
61 |
62 | except Exception as e:
63 | self.logger.error(
64 | f"Failed to initialize OpenAI provider: {str(e)}",
65 | emoji_key="error"
66 | )
67 | return False
68 |
69 | async def generate_completion(
70 | self,
71 | prompt: Optional[str] = None,
72 | model: Optional[str] = None,
73 | max_tokens: Optional[int] = None,
74 | temperature: float = 0.7,
75 | **kwargs
76 | ) -> ModelResponse:
77 | """Generate a completion using OpenAI.
78 |
79 | Args:
80 | prompt: Text prompt to send to the model
81 | model: Model name to use (e.g., "gpt-4o")
82 | max_tokens: Maximum tokens to generate
83 | temperature: Temperature parameter (0.0-1.0)
84 | **kwargs: Additional model-specific parameters
85 |
86 | Returns:
87 | ModelResponse with completion result
88 |
89 | Raises:
90 | Exception: If API call fails
91 | """
92 | if not self.client:
93 | await self.initialize()
94 |
95 | # Use default model if not specified
96 | model = model or self.get_default_model()
97 |
98 | # Strip provider prefix if present (e.g., "openai/gpt-4o" -> "gpt-4o")
99 | if model.startswith(f"{self.provider_name}/"):
100 | original_model = model
101 | model = model.split("/", 1)[1]
102 | self.logger.debug(f"Stripped provider prefix from model name: {original_model} -> {model}")
103 |
104 | # Handle case when messages are provided instead of prompt (for chat_completion)
105 | messages = kwargs.pop("messages", None)
106 |
107 | # If neither prompt nor messages are provided, raise an error
108 | if prompt is None and not messages:
109 | raise ValueError("Either 'prompt' or 'messages' must be provided")
110 |
111 | # Create messages if not already provided
112 | if not messages:
113 | messages = [{"role": "user", "content": prompt}]
114 |
115 | # Prepare API call parameters
116 | params = {
117 | "model": model,
118 | "messages": messages,
119 | "temperature": temperature,
120 | }
121 |
122 | # Add max_tokens if specified
123 | if max_tokens is not None:
124 | params["max_tokens"] = max_tokens
125 |
126 | # Check for json_mode flag and remove it from kwargs
127 | json_mode = kwargs.pop("json_mode", False)
128 | if json_mode:
129 | # Use the correct response_format for JSON mode
130 | params["response_format"] = {"type": "json_object"}
131 | self.logger.debug("Setting response_format to JSON mode for OpenAI")
132 |
133 | # Handle any legacy response_format passed directly, but prefer json_mode
134 | if "response_format" in kwargs and not json_mode:
135 | # Support both direct format object and type-only specification
136 | response_format = kwargs.pop("response_format")
137 | if isinstance(response_format, dict):
138 | params["response_format"] = response_format
139 | elif isinstance(response_format, str) and response_format in ["json_object", "text"]:
140 | params["response_format"] = {"type": response_format}
141 | self.logger.debug(f"Setting response_format from direct param: {params.get('response_format')}")
142 |
143 | # Add any remaining additional parameters
144 | params.update(kwargs)
145 |
146 | # --- Special handling for specific model parameter constraints ---
147 | if model == 'o3-mini':
148 | if 'temperature' in params:
149 | self.logger.debug(f"Removing unsupported 'temperature' parameter for model {model}")
150 | del params['temperature']
151 | elif model == 'o1-preview':
152 | current_temp = params.get('temperature')
153 | # Only allow temperature if it's explicitly set to 1.0, otherwise remove it to use API default.
154 | if current_temp is not None and current_temp != 1.0:
155 | self.logger.debug(f"Removing non-default 'temperature' ({current_temp}) for model {model}")
156 | del params['temperature']
157 | # --- End special handling ---
158 |
159 | # Log request
160 | prompt_length = len(prompt) if prompt else sum(len(m.get("content", "")) for m in messages)
161 | self.logger.info(
162 | f"Generating completion with OpenAI model {model}",
163 | emoji_key=self.provider_name,
164 | prompt_length=prompt_length,
165 | json_mode=json_mode # Log if json_mode was requested
166 | )
167 |
168 | try:
169 | # API call with timing
170 | response, processing_time = await self.process_with_timer(
171 | self.client.chat.completions.create, **params
172 | )
173 |
174 | # Extract response text
175 | completion_text = response.choices[0].message.content
176 |
177 | # Create message object for chat_completion
178 | message = {
179 | "role": "assistant",
180 | "content": completion_text
181 | }
182 |
183 | # Create standardized response
184 | result = ModelResponse(
185 | text=completion_text,
186 | model=model,
187 | provider=self.provider_name,
188 | input_tokens=response.usage.prompt_tokens,
189 | output_tokens=response.usage.completion_tokens,
190 | total_tokens=response.usage.total_tokens,
191 | processing_time=processing_time,
192 | raw_response=response,
193 | )
194 |
195 | # Add message to result for chat_completion
196 | result.message = message
197 |
198 | # Log success
199 | self.logger.success(
200 | "OpenAI completion successful",
201 | emoji_key="success",
202 | model=model,
203 | tokens={
204 | "input": result.input_tokens,
205 | "output": result.output_tokens
206 | },
207 | cost=result.cost,
208 | time=result.processing_time
209 | )
210 |
211 | return result
212 |
213 | except Exception as e:
214 | self.logger.error(
215 | f"OpenAI completion failed: {str(e)}",
216 | emoji_key="error",
217 | model=model
218 | )
219 | raise
220 |
221 | async def generate_completion_stream(
222 | self,
223 | prompt: Optional[str] = None,
224 | model: Optional[str] = None,
225 | max_tokens: Optional[int] = None,
226 | temperature: float = 0.7,
227 | **kwargs
228 | ) -> AsyncGenerator[Tuple[str, Dict[str, Any]], None]:
229 | """Generate a streaming completion using OpenAI.
230 |
231 | Args:
232 | prompt: Text prompt to send to the model
233 | model: Model name to use (e.g., "gpt-4o")
234 | max_tokens: Maximum tokens to generate
235 | temperature: Temperature parameter (0.0-1.0)
236 | **kwargs: Additional model-specific parameters
237 |
238 | Yields:
239 | Tuple of (text_chunk, metadata)
240 |
241 | Raises:
242 | Exception: If API call fails
243 | """
244 | if not self.client:
245 | await self.initialize()
246 |
247 | # Use default model if not specified
248 | model = model or self.get_default_model()
249 |
250 | # Strip provider prefix if present (e.g., "openai/gpt-4o" -> "gpt-4o")
251 | if model.startswith(f"{self.provider_name}/"):
252 | original_model = model
253 | model = model.split("/", 1)[1]
254 | self.logger.debug(f"Stripped provider prefix from model name (stream): {original_model} -> {model}")
255 |
256 | # Handle case when messages are provided instead of prompt (for chat_completion)
257 | messages = kwargs.pop("messages", None)
258 |
259 | # If neither prompt nor messages are provided, raise an error
260 | if prompt is None and not messages:
261 | raise ValueError("Either 'prompt' or 'messages' must be provided")
262 |
263 | # Create messages if not already provided
264 | if not messages:
265 | messages = [{"role": "user", "content": prompt}]
266 |
267 | # Prepare API call parameters
268 | params = {
269 | "model": model,
270 | "messages": messages,
271 | "temperature": temperature,
272 | "stream": True,
273 | }
274 |
275 | # Add max_tokens if specified
276 | if max_tokens is not None:
277 | params["max_tokens"] = max_tokens
278 |
279 | # Check for json_mode flag and remove it from kwargs
280 | json_mode = kwargs.pop("json_mode", False)
281 | if json_mode:
282 | # Use the correct response_format for JSON mode
283 | params["response_format"] = {"type": "json_object"}
284 | self.logger.debug("Setting response_format to JSON mode for OpenAI streaming")
285 |
286 | # Add any remaining additional parameters
287 | params.update(kwargs)
288 |
289 | # Log request
290 | prompt_length = len(prompt) if prompt else sum(len(m.get("content", "")) for m in messages)
291 | self.logger.info(
292 | f"Generating streaming completion with OpenAI model {model}",
293 | emoji_key=self.provider_name,
294 | prompt_length=prompt_length,
295 | json_mode=json_mode # Log if json_mode was requested
296 | )
297 |
298 | start_time = time.time()
299 | total_chunks = 0
300 |
301 | try:
302 | # Make streaming API call
303 | stream = await self.client.chat.completions.create(**params)
304 |
305 | # Process the stream
306 | async for chunk in stream:
307 | total_chunks += 1
308 |
309 | # Extract content from the chunk
310 | delta = chunk.choices[0].delta
311 | content = delta.content or ""
312 |
313 | # Metadata for this chunk
314 | metadata = {
315 | "model": model,
316 | "provider": self.provider_name,
317 | "chunk_index": total_chunks,
318 | "finish_reason": chunk.choices[0].finish_reason,
319 | }
320 |
321 | yield content, metadata
322 |
323 | # Log success
324 | processing_time = time.time() - start_time
325 | self.logger.success(
326 | "OpenAI streaming completion successful",
327 | emoji_key="success",
328 | model=model,
329 | chunks=total_chunks,
330 | time=processing_time
331 | )
332 |
333 | except Exception as e:
334 | self.logger.error(
335 | f"OpenAI streaming completion failed: {str(e)}",
336 | emoji_key="error",
337 | model=model
338 | )
339 | raise
340 |
341 | async def list_models(self) -> List[Dict[str, Any]]:
342 | """
343 | List available OpenAI models with their capabilities and metadata.
344 |
345 | This method queries the OpenAI API to retrieve a comprehensive list of available
346 | models accessible to the current API key. It filters the results to focus on
347 | GPT models that are relevant to text generation tasks, excluding embeddings,
348 | moderation, and other specialized models.
349 |
350 | For efficiency, the method uses a caching mechanism that stores the model list
351 | after the first successful API call. Subsequent calls return the cached results
352 | without making additional API requests. This reduces latency and API usage while
353 | ensuring the available models information is readily accessible.
354 |
355 | If the API call fails (due to network issues, invalid credentials, etc.), the
356 | method falls back to returning a hardcoded list of common OpenAI models to ensure
357 | the application can continue functioning with reasonable defaults.
358 |
359 | Returns:
360 | A list of dictionaries containing model information with these fields:
361 | - id: The model identifier used when making API calls (e.g., "gpt-4o")
362 | - provider: Always "openai" for this provider
363 | - created: Timestamp of when the model was created (if available from API)
364 | - owned_by: Organization that owns the model (e.g., "openai", "system")
365 |
366 | The fallback model list (used on API errors) includes basic information
367 | for gpt-4o, gpt-4.1-mini, and other commonly used models.
368 |
369 | Example response:
370 | ```python
371 | [
372 | {
373 | "id": "gpt-4o",
374 | "provider": "openai",
375 | "created": 1693399330,
376 | "owned_by": "openai"
377 | },
378 | {
379 | "id": "gpt-4.1-mini",
380 | "provider": "openai",
381 | "created": 1705006269,
382 | "owned_by": "openai"
383 | }
384 | ]
385 | ```
386 |
387 | Note:
388 | The specific models returned depend on the API key's permissions and
389 | the models currently offered by OpenAI. As new models are released
390 | or existing ones deprecated, the list will change accordingly.
391 | """
392 | if self.models_cache:
393 | return self.models_cache
394 |
395 | try:
396 | if not self.client:
397 | await self.initialize()
398 |
399 | # Fetch models from API
400 | response = await self.client.models.list()
401 |
402 | # Process response
403 | models = []
404 | for model in response.data:
405 | # Filter to relevant models (chat-capable GPT models)
406 | if model.id.startswith("gpt-"):
407 | models.append({
408 | "id": model.id,
409 | "provider": self.provider_name,
410 | "created": model.created,
411 | "owned_by": model.owned_by,
412 | })
413 |
414 | # Cache results
415 | self.models_cache = models
416 |
417 | return models
418 |
419 | except Exception as e:
420 | self.logger.error(
421 | f"Failed to list OpenAI models: {str(e)}",
422 | emoji_key="error"
423 | )
424 |
425 | # Return basic models on error
426 | return [
427 | {
428 | "id": "gpt-4o",
429 | "provider": self.provider_name,
430 | "description": "Most capable GPT-4 model",
431 | },
432 | {
433 | "id": "gpt-4.1-mini",
434 | "provider": self.provider_name,
435 | "description": "Smaller, efficient GPT-4 model",
436 | },
437 | {
438 | "id": "gpt-4.1-mini",
439 | "provider": self.provider_name,
440 | "description": "Fast and cost-effective GPT model",
441 | },
442 | ]
443 |
444 | def get_default_model(self) -> str:
445 | """
446 | Get the default OpenAI model identifier to use when none is specified.
447 |
448 | This method determines the appropriate default model for OpenAI completions
449 | through a prioritized selection process:
450 |
451 | 1. First, it attempts to load the default_model setting from the Ultimate MCP Server
452 | configuration system (from providers.openai.default_model in the config)
453 | 2. If that's not available or valid, it falls back to a hardcoded default model
454 | that represents a reasonable balance of capability, cost, and availability
455 |
456 | Using the configuration system allows for flexible deployment-specific defaults
457 | without code changes, while the hardcoded fallback ensures the system remains
458 | functional even with minimal configuration.
459 |
460 | Returns:
461 | String identifier of the default OpenAI model to use (e.g., "gpt-4.1-mini").
462 | This identifier can be directly used in API calls to the OpenAI API.
463 |
464 | Note:
465 | The current hardcoded default is "gpt-4.1-mini", chosen for its balance of
466 | capability and cost. This may change in future versions as new models are
467 | released or existing ones are deprecated.
468 | """
469 | from ultimate_mcp_server.config import get_config
470 |
471 | # Safely get from config if available
472 | try:
473 | config = get_config()
474 | provider_config = getattr(config, 'providers', {}).get(self.provider_name, None)
475 | if provider_config and provider_config.default_model:
476 | return provider_config.default_model
477 | except (AttributeError, TypeError):
478 | # Handle case when providers attribute doesn't exist or isn't a dict
479 | pass
480 |
481 | # Otherwise return hard-coded default
482 | return "gpt-4.1-mini"
483 |
484 | async def check_api_key(self) -> bool:
485 | """Check if the OpenAI API key is valid.
486 |
487 | This method performs a lightweight validation of the configured OpenAI API key
488 | by attempting to list available models. A successful API call confirms that:
489 |
490 | 1. The API key is properly formatted and not empty
491 | 2. The key has at least read permissions on the OpenAI API
492 | 3. The API endpoint is accessible and responding
493 | 4. The account associated with the key is active and not suspended
494 |
495 | This validation is useful when initializing the provider to ensure the API key
496 | works before attempting to make model completion requests that might fail later.
497 |
498 | Returns:
499 | bool: True if the API key is valid and the API is accessible, False otherwise.
500 | A False result may indicate an invalid key, network issues, or API service disruption.
501 |
502 | Notes:
503 | - This method simply calls list_models() which caches results for efficiency
504 | - No detailed error information is returned, only a boolean success indicator
505 | - The method silently catches all exceptions and returns False rather than raising
506 | - For debugging key issues, check server logs for the full exception details
507 | """
508 | try:
509 | # Just list models as a simple validation
510 | await self.list_models()
511 | return True
512 | except Exception:
513 | return False
```
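
A minimal usage sketch for the provider methods above. The import path and constructor call are assumptions (the class definition appears earlier in this file, outside this excerpt), and the streaming method name `generate_completion_stream` is assumed from the provider interface used elsewhere in the repository; the `check_api_key()`, `list_models()`, and `get_default_model()` calls mirror the methods documented here.

```python
# Sketch only: assumed import path, constructor, and streaming method name;
# the calls below follow the provider methods shown in the file above.
import asyncio

from ultimate_mcp_server.core.providers.openai import OpenAIProvider  # assumed location


async def main() -> None:
    provider = OpenAIProvider()  # assumed: API key picked up from config/env

    if not await provider.check_api_key():
        raise RuntimeError("OpenAI API key is missing or invalid")

    model = provider.get_default_model()  # "gpt-4.1-mini" unless overridden in config
    models = await provider.list_models()
    print(f"{len(models)} models available, using {model}")

    # The streaming method is an async generator yielding (content, metadata) tuples.
    async for content, metadata in provider.generate_completion_stream(
        prompt="Explain retrieval-augmented generation in two sentences.",
        model=model,
        temperature=0.7,
        max_tokens=200,
    ):
        print(content, end="", flush=True)
        if metadata.get("finish_reason"):
            break
    print()


if __name__ == "__main__":
    asyncio.run(main())
```

Each yielded metadata dictionary carries the model, provider, chunk index, and the finish reason of the final chunk, so callers can detect completion without waiting for the generator to close.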
--------------------------------------------------------------------------------
/ultimate_mcp_server/services/knowledge_base/rag_engine.py:
--------------------------------------------------------------------------------
```python
1 | """RAG engine for retrieval-augmented generation."""
2 | import time
3 | from typing import Any, Dict, List, Optional, Set
4 |
5 | from ultimate_mcp_server.core.models.requests import CompletionRequest
6 | from ultimate_mcp_server.services.cache import get_cache_service
7 | from ultimate_mcp_server.services.knowledge_base.feedback import get_rag_feedback_service
8 | from ultimate_mcp_server.services.knowledge_base.retriever import KnowledgeBaseRetriever
9 | from ultimate_mcp_server.services.knowledge_base.utils import (
10 | extract_keywords,
11 | generate_token_estimate,
12 | )
13 | from ultimate_mcp_server.services.prompts import get_prompt_service
14 | from ultimate_mcp_server.utils import get_logger
15 |
16 | logger = get_logger(__name__)
17 |
18 |
19 | # Default RAG prompt templates
20 | DEFAULT_RAG_TEMPLATES = {
21 | "rag_default": """Answer the question based only on the following context:
22 |
23 | {context}
24 |
25 | Question: {query}
26 |
27 | Answer:""",
28 |
29 | "rag_with_sources": """Answer the question based only on the following context:
30 |
31 | {context}
32 |
33 | Question: {query}
34 |
35 | Provide your answer along with the source document IDs in [brackets] for each piece of information:""",
36 |
37 | "rag_summarize": """Summarize the following context information:
38 |
39 | {context}
40 |
41 | Summary:""",
42 |
43 | "rag_analysis": """Analyze the following information and provide key insights:
44 |
45 | {context}
46 |
47 | Query: {query}
48 |
49 | Analysis:"""
50 | }
51 |
52 |
53 | class RAGEngine:
54 | """Engine for retrieval-augmented generation."""
55 |
56 | def __init__(
57 | self,
58 | retriever: KnowledgeBaseRetriever,
59 | provider_manager,
60 | optimization_service=None,
61 | analytics_service=None
62 | ):
63 | """Initialize the RAG engine.
64 |
65 | Args:
66 | retriever: Knowledge base retriever
67 | provider_manager: Provider manager for LLM access
68 | optimization_service: Optional optimization service for model selection
69 | analytics_service: Optional analytics service for tracking
70 | """
71 | self.retriever = retriever
72 | self.provider_manager = provider_manager
73 | self.optimization_service = optimization_service
74 | self.analytics_service = analytics_service
75 |
76 | # Initialize prompt service
77 | self.prompt_service = get_prompt_service()
78 |
79 | # Initialize feedback service
80 | self.feedback_service = get_rag_feedback_service()
81 |
82 | # Initialize cache service
83 | self.cache_service = get_cache_service()
84 |
85 | # Register RAG templates
86 | for template_name, template_text in DEFAULT_RAG_TEMPLATES.items():
87 | self.prompt_service.register_template(template_name, template_text)
88 |
89 | logger.info("RAG engine initialized", extra={"emoji_key": "success"})
90 |
91 | async def _select_optimal_model(self, task_info: Dict[str, Any]) -> Dict[str, Any]:
92 | """Select optimal model for a RAG task.
93 |
94 | Args:
95 | task_info: Task information
96 |
97 | Returns:
98 | Model selection
99 | """
100 | if self.optimization_service:
101 | try:
102 | return await self.optimization_service.get_optimal_model(task_info)
103 | except Exception as e:
104 | logger.error(
105 | f"Error selecting optimal model: {str(e)}",
106 | extra={"emoji_key": "error"}
107 | )
108 |
109 | # Fallback to default models for RAG
110 | return {
111 | "provider": "openai",
112 | "model": "gpt-4.1-mini"
113 | }
114 |
115 | async def _track_rag_metrics(
116 | self,
117 | knowledge_base: str,
118 | query: str,
119 | provider: str,
120 | model: str,
121 | metrics: Dict[str, Any]
122 | ) -> None:
123 | """Track RAG operation metrics.
124 |
125 | Args:
126 | knowledge_base: Knowledge base name
127 | query: Query text
128 | provider: Provider name
129 | model: Model name
130 | metrics: Operation metrics
131 | """
132 | if not self.analytics_service:
133 | return
134 |
135 | try:
136 | await self.analytics_service.track_operation(
137 | operation_type="rag",
138 | provider=provider,
139 | model=model,
140 | input_tokens=metrics.get("input_tokens", 0),
141 | output_tokens=metrics.get("output_tokens", 0),
142 | total_tokens=metrics.get("total_tokens", 0),
143 | cost=metrics.get("cost", 0.0),
144 | duration=metrics.get("total_time", 0.0),
145 | metadata={
146 | "knowledge_base": knowledge_base,
147 | "query": query,
148 | "retrieval_count": metrics.get("retrieval_count", 0),
149 | "retrieval_time": metrics.get("retrieval_time", 0.0),
150 | "generation_time": metrics.get("generation_time", 0.0)
151 | }
152 | )
153 | except Exception as e:
154 | logger.error(
155 | f"Error tracking RAG metrics: {str(e)}",
156 | extra={"emoji_key": "error"}
157 | )
158 |
159 | def _format_context(
160 | self,
161 | results: List[Dict[str, Any]],
162 | include_metadata: bool = True
163 | ) -> str:
164 | """Format retrieval results into context.
165 |
166 | Args:
167 | results: List of retrieval results
168 | include_metadata: Whether to include metadata
169 |
170 | Returns:
171 | Formatted context
172 | """
173 | context_parts = []
174 |
175 | for i, result in enumerate(results):
176 | # Format metadata if included
177 | metadata_str = ""
178 | if include_metadata and result.get("metadata"):
179 | # Extract relevant metadata fields
180 | metadata_fields = []
181 | for key in ["title", "source", "author", "date", "source_id", "potential_title"]:
182 | if key in result["metadata"]:
183 | metadata_fields.append(f"{key}: {result['metadata'][key]}")
184 |
185 | if metadata_fields:
186 | metadata_str = " | ".join(metadata_fields)
187 | metadata_str = f"[{metadata_str}]\n"
188 |
189 | # Add document with index
190 | context_parts.append(f"Document {i+1} [ID: {result['id']}]:\n{metadata_str}{result['document']}")
191 |
192 | return "\n\n".join(context_parts)
193 |
194 | async def _adjust_retrieval_params(self, query: str, knowledge_base_name: str) -> Dict[str, Any]:
195 | """Dynamically adjust retrieval parameters based on query complexity.
196 |
197 | Args:
198 | query: Query text
199 | knowledge_base_name: Knowledge base name
200 |
201 | Returns:
202 | Adjusted parameters
203 | """
204 | # Analyze query complexity
205 | query_length = len(query.split())
206 | query_keywords = extract_keywords(query)
207 |
208 | # Base parameters
209 | params = {
210 | "top_k": 5,
211 | "retrieval_method": "vector",
212 | "min_score": 0.6,
213 | "search_params": {"search_ef": 100}
214 | }
215 |
216 | # Adjust based on query length
217 | if query_length > 30: # Complex query
218 | params["top_k"] = 8
219 | params["search_params"]["search_ef"] = 200
220 | params["retrieval_method"] = "hybrid"
221 | elif query_length < 5: # Very short query
222 | params["top_k"] = 10 # Get more results for short queries
223 | params["min_score"] = 0.5 # Lower threshold
224 |
225 | # Check if similar queries exist
226 | similar_queries = await self.feedback_service.get_similar_queries(
227 | knowledge_base_name=knowledge_base_name,
228 | query=query,
229 | top_k=1,
230 | threshold=0.85
231 | )
232 |
233 | # If we have similar past queries, use their parameters
234 | if similar_queries:
235 | params["retrieval_method"] = "hybrid" # Hybrid works well for repeat queries
236 |
237 | # Add keywords
238 | params["additional_keywords"] = query_keywords
239 |
240 | return params
241 |
242 | async def _analyze_used_documents(
243 | self,
244 | answer: str,
245 | results: List[Dict[str, Any]]
246 | ) -> Set[str]:
247 | """Analyze which documents were used in the answer.
248 |
249 | Args:
250 | answer: Generated answer
251 | results: List of retrieval results
252 |
253 | Returns:
254 | Set of document IDs used in the answer
255 | """
256 | used_ids = set()
257 |
258 | # Check for explicit mentions of document IDs
259 | for result in results:
260 | doc_id = result["id"]
261 | if f"[ID: {doc_id}]" in answer or f"[{doc_id}]" in answer:
262 | used_ids.add(doc_id)
263 |
264 | # Check content overlap (crude approximation)
265 | for result in results:
266 | if result["id"] in used_ids:
267 | continue
268 |
269 | # Check for significant phrases from document in answer
270 | doc_keywords = extract_keywords(result["document"], max_keywords=5)
271 | matched_keywords = sum(1 for kw in doc_keywords if kw in answer.lower())
272 |
273 | # If multiple keywords match, consider document used
274 | if matched_keywords >= 2:
275 | used_ids.add(result["id"])
276 |
277 | return used_ids
278 |
279 | async def _check_cached_response(
280 | self,
281 | knowledge_base_name: str,
282 | query: str
283 | ) -> Optional[Dict[str, Any]]:
284 | """Check for cached RAG response.
285 |
286 | Args:
287 | knowledge_base_name: Knowledge base name
288 | query: Query text
289 |
290 | Returns:
291 | Cached response or None
292 | """
293 | if not self.cache_service:
294 | return None
295 |
296 | cache_key = f"rag_{knowledge_base_name}_{query}"
297 |
298 | try:
299 | cached = await self.cache_service.get(cache_key)
300 | if cached:
301 | logger.info(
302 | f"Using cached RAG response for query in '{knowledge_base_name}'",
303 | extra={"emoji_key": "cache"}
304 | )
305 | return cached
306 | except Exception as e:
307 | logger.error(
308 | f"Error checking cache: {str(e)}",
309 | extra={"emoji_key": "error"}
310 | )
311 |
312 | return None
313 |
314 | async def _cache_response(
315 | self,
316 | knowledge_base_name: str,
317 | query: str,
318 | response: Dict[str, Any]
319 | ) -> None:
320 | """Cache RAG response.
321 |
322 | Args:
323 | knowledge_base_name: Knowledge base name
324 | query: Query text
325 | response: Response to cache
326 | """
327 | if not self.cache_service:
328 | return
329 |
330 | cache_key = f"rag_{knowledge_base_name}_{query}"
331 |
332 | try:
333 | # Cache for 1 day
334 | await self.cache_service.set(cache_key, response, ttl=86400)
335 | except Exception as e:
336 | logger.error(
337 | f"Error caching response: {str(e)}",
338 | extra={"emoji_key": "error"}
339 | )
340 |
341 | async def generate_with_rag(
342 | self,
343 | knowledge_base_name: str,
344 | query: str,
345 | provider: Optional[str] = None,
346 | model: Optional[str] = None,
347 | template: str = "rag_default",
348 | max_tokens: int = 1000,
349 | temperature: float = 0.3,
350 | top_k: Optional[int] = None,
351 | retrieval_method: Optional[str] = None,
352 | min_score: Optional[float] = None,
353 | metadata_filter: Optional[Dict[str, Any]] = None,
354 | include_metadata: bool = True,
355 | include_sources: bool = True,
356 | use_cache: bool = True,
357 | apply_feedback: bool = True,
358 | search_params: Optional[Dict[str, Any]] = None
359 | ) -> Dict[str, Any]:
360 | """Generate a response using RAG.
361 |
362 | Args:
363 | knowledge_base_name: Knowledge base name
364 | query: Query text
365 | provider: Provider name (auto-selected if None)
366 | model: Model name (auto-selected if None)
367 | template: RAG prompt template name
368 | max_tokens: Maximum tokens for generation
369 | temperature: Temperature for generation
370 | top_k: Number of documents to retrieve (auto-adjusted if None)
371 | retrieval_method: Retrieval method (vector, hybrid)
372 | min_score: Minimum similarity score
373 | metadata_filter: Optional metadata filter
374 | include_metadata: Whether to include metadata in context
375 | include_sources: Whether to include sources in response
376 | use_cache: Whether to use cached responses
377 | apply_feedback: Whether to apply feedback adjustments
378 | search_params: Optional ChromaDB search parameters
379 |
380 | Returns:
381 | Generated response with sources and metrics
382 | """
383 | start_time = time.time()
384 | operation_metrics = {}
385 |
386 | # Check cache first if enabled
387 | if use_cache:
388 | cached_response = await self._check_cached_response(knowledge_base_name, query)
389 | if cached_response:
390 | return cached_response
391 |
392 | # Auto-select model if not specified
393 | if not provider or not model:
394 | # Determine task complexity based on query
395 | task_complexity = "medium"
396 | if len(query) > 100:
397 | task_complexity = "high"
398 | elif len(query) < 30:
399 | task_complexity = "low"
400 |
401 | # Get optimal model
402 | model_selection = await self._select_optimal_model({
403 | "task_type": "rag_completion",
404 | "complexity": task_complexity,
405 | "query_length": len(query)
406 | })
407 |
408 | provider = provider or model_selection["provider"]
409 | model = model or model_selection["model"]
410 |
411 | # Dynamically adjust retrieval parameters if not specified
412 | if top_k is None or retrieval_method is None or min_score is None:
413 | adjusted_params = await self._adjust_retrieval_params(query, knowledge_base_name)
414 |
415 | # Use specified parameters or adjusted ones
416 | top_k = top_k or adjusted_params["top_k"]
417 | retrieval_method = retrieval_method or adjusted_params["retrieval_method"]
418 | min_score = min_score or adjusted_params["min_score"]
419 | search_params = search_params or adjusted_params.get("search_params")
420 | additional_keywords = adjusted_params.get("additional_keywords")
421 | else:
422 | additional_keywords = None
423 |
424 | # Retrieve context
425 | retrieval_start = time.time()
426 |
427 | if retrieval_method == "hybrid":
428 | # Use hybrid search
429 | retrieval_result = await self.retriever.retrieve_hybrid(
430 | knowledge_base_name=knowledge_base_name,
431 | query=query,
432 | top_k=top_k,
433 | min_score=min_score,
434 | metadata_filter=metadata_filter,
435 | additional_keywords=additional_keywords,
436 | apply_feedback=apply_feedback,
437 | search_params=search_params
438 | )
439 | else:
440 | # Use standard vector search
441 | retrieval_result = await self.retriever.retrieve(
442 | knowledge_base_name=knowledge_base_name,
443 | query=query,
444 | top_k=top_k,
445 | min_score=min_score,
446 | metadata_filter=metadata_filter,
447 | content_filter=None, # No content filter for vector-only search
448 | apply_feedback=apply_feedback,
449 | search_params=search_params
450 | )
451 |
452 | retrieval_time = time.time() - retrieval_start
453 | operation_metrics["retrieval_time"] = retrieval_time
454 |
455 | # Check if retrieval was successful
456 | if retrieval_result.get("status") != "success" or not retrieval_result.get("results"):
457 | logger.warning(
458 | f"No relevant documents found for query in knowledge base '{knowledge_base_name}'",
459 | extra={"emoji_key": "warning"}
460 | )
461 |
462 | # Return error response
463 | error_response = {
464 | "status": "no_results",
465 | "message": "No relevant documents found for query",
466 | "query": query,
467 | "retrieval_time": retrieval_time,
468 | "total_time": time.time() - start_time
469 | }
470 |
471 | # Cache error response if enabled
472 | if use_cache:
473 | await self._cache_response(knowledge_base_name, query, error_response)
474 |
475 | return error_response
476 |
477 | # Format context from retrieval results
478 | context = self._format_context(
479 | retrieval_result["results"],
480 | include_metadata=include_metadata
481 | )
482 |
483 | # Get prompt template
484 | template_text = self.prompt_service.get_template(template)
485 | if not template_text:
486 | # Fallback to default template
487 | template_text = DEFAULT_RAG_TEMPLATES["rag_default"]
488 |
489 | # Format prompt with template
490 | rag_prompt = template_text.format(
491 | context=context,
492 | query=query
493 | )
494 |
495 | # Calculate token estimates
496 | input_tokens = generate_token_estimate(rag_prompt)
497 | operation_metrics["context_tokens"] = generate_token_estimate(context)
498 | operation_metrics["input_tokens"] = input_tokens
499 | operation_metrics["retrieval_count"] = len(retrieval_result["results"])
500 |
501 | # Generate completion
502 | generation_start = time.time()
503 |
504 | provider_service = self.provider_manager.get_provider(provider)
505 | completion_request = CompletionRequest(
506 | prompt=rag_prompt,
507 | model=model,
508 | max_tokens=max_tokens,
509 | temperature=temperature
510 | )
511 |
512 | completion_result = await provider_service.generate_completion(
513 | request=completion_request
514 | )
515 |
516 | generation_time = time.time() - generation_start
517 | operation_metrics["generation_time"] = generation_time
518 |
519 | # Extract completion and metrics
520 | completion = completion_result.get("completion", "")
521 | operation_metrics["output_tokens"] = completion_result.get("output_tokens", 0)
522 | operation_metrics["total_tokens"] = completion_result.get("total_tokens", 0)
523 | operation_metrics["cost"] = completion_result.get("cost", 0.0)
524 | operation_metrics["total_time"] = time.time() - start_time
525 |
526 | # Prepare sources if requested
527 | sources = []
528 | if include_sources:
529 | for result in retrieval_result["results"]:
530 | # Include limited context for each source
531 | doc_preview = result["document"]
532 | if len(doc_preview) > 100:
533 | doc_preview = doc_preview[:100] + "..."
534 |
535 | sources.append({
536 | "id": result["id"],
537 | "document": doc_preview,
538 | "score": result["score"],
539 | "metadata": result.get("metadata", {})
540 | })
541 |
542 | # Analyze which documents were used in the answer
543 | used_doc_ids = await self._analyze_used_documents(completion, retrieval_result["results"])
544 |
545 | # Record feedback
546 | if apply_feedback:
547 | await self.retriever.record_feedback(
548 | knowledge_base_name=knowledge_base_name,
549 | query=query,
550 | retrieved_documents=retrieval_result["results"],
551 | used_document_ids=list(used_doc_ids)
552 | )
553 |
554 | # Track metrics
555 | await self._track_rag_metrics(
556 | knowledge_base=knowledge_base_name,
557 | query=query,
558 | provider=provider,
559 | model=model,
560 | metrics=operation_metrics
561 | )
562 |
563 | logger.info(
564 | f"Generated RAG response using {provider}/{model} in {operation_metrics['total_time']:.2f}s",
565 | extra={"emoji_key": "success"}
566 | )
567 |
568 | # Create response
569 | response = {
570 | "status": "success",
571 | "query": query,
572 | "answer": completion,
573 | "sources": sources,
574 | "knowledge_base": knowledge_base_name,
575 | "provider": provider,
576 | "model": model,
577 | "used_document_ids": list(used_doc_ids),
578 | "metrics": operation_metrics
579 | }
580 |
581 | # Cache response if enabled
582 | if use_cache:
583 | await self._cache_response(knowledge_base_name, query, response)
584 |
585 | return response
```
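
The RAG flow above composes retrieval, prompt templating, completion, feedback, and caching into a single `generate_with_rag()` call. A rough usage sketch follows, assuming hypothetical factory helpers for the retriever and provider manager (the real wiring lives elsewhere in the package); `generate_with_rag()` and the shape of its return value match the method defined above.

```python
# Sketch only: the two factory helpers are assumptions; generate_with_rag()
# and the keys of its result dictionary follow the RAGEngine code above.
import asyncio

from ultimate_mcp_server.core import get_provider_manager  # assumed helper
from ultimate_mcp_server.services.knowledge_base import get_knowledge_base_retriever  # assumed helper
from ultimate_mcp_server.services.knowledge_base.rag_engine import RAGEngine


async def main() -> None:
    engine = RAGEngine(
        retriever=get_knowledge_base_retriever(),
        provider_manager=get_provider_manager(),
    )

    result = await engine.generate_with_rag(
        knowledge_base_name="product_docs",  # hypothetical knowledge base
        query="How do I rotate API keys?",
        template="rag_with_sources",
        retrieval_method="hybrid",
        top_k=5,
        include_sources=True,
    )

    if result["status"] == "success":
        print(result["answer"])
        for source in result["sources"]:
            print(f"- {source['id']} (score: {source['score']:.2f})")
    else:
        print(result.get("message", "No answer generated"))


if __name__ == "__main__":
    asyncio.run(main())
```

Because responses are cached under a key built from the knowledge base name and the query text (with a one-day TTL), repeating the same query returns the cached answer unless `use_cache=False` is passed.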
--------------------------------------------------------------------------------
/ultimate_mcp_server/tools/redline-compiled.css:
--------------------------------------------------------------------------------
```css
1 | *,:after,:before{--tw-border-spacing-x:0;--tw-border-spacing-y:0;--tw-translate-x:0;--tw-translate-y:0;--tw-rotate:0;--tw-skew-x:0;--tw-skew-y:0;--tw-scale-x:1;--tw-scale-y:1;--tw-pan-x: ;--tw-pan-y: ;--tw-pinch-zoom: ;--tw-scroll-snap-strictness:proximity;--tw-gradient-from-position: ;--tw-gradient-via-position: ;--tw-gradient-to-position: ;--tw-ordinal: ;--tw-slashed-zero: ;--tw-numeric-figure: ;--tw-numeric-spacing: ;--tw-numeric-fraction: ;--tw-ring-inset: ;--tw-ring-offset-width:0px;--tw-ring-offset-color:#fff;--tw-ring-color:rgba(59,130,246,.5);--tw-ring-offset-shadow:0 0 #0000;--tw-ring-shadow:0 0 #0000;--tw-shadow:0 0 #0000;--tw-shadow-colored:0 0 #0000;--tw-blur: ;--tw-brightness: ;--tw-contrast: ;--tw-grayscale: ;--tw-hue-rotate: ;--tw-invert: ;--tw-saturate: ;--tw-sepia: ;--tw-drop-shadow: ;--tw-backdrop-blur: ;--tw-backdrop-brightness: ;--tw-backdrop-contrast: ;--tw-backdrop-grayscale: ;--tw-backdrop-hue-rotate: ;--tw-backdrop-invert: ;--tw-backdrop-opacity: ;--tw-backdrop-saturate: ;--tw-backdrop-sepia: ;--tw-contain-size: ;--tw-contain-layout: ;--tw-contain-paint: ;--tw-contain-style: }::backdrop{--tw-border-spacing-x:0;--tw-border-spacing-y:0;--tw-translate-x:0;--tw-translate-y:0;--tw-rotate:0;--tw-skew-x:0;--tw-skew-y:0;--tw-scale-x:1;--tw-scale-y:1;--tw-pan-x: ;--tw-pan-y: ;--tw-pinch-zoom: ;--tw-scroll-snap-strictness:proximity;--tw-gradient-from-position: ;--tw-gradient-via-position: ;--tw-gradient-to-position: ;--tw-ordinal: ;--tw-slashed-zero: ;--tw-numeric-figure: ;--tw-numeric-spacing: ;--tw-numeric-fraction: ;--tw-ring-inset: ;--tw-ring-offset-width:0px;--tw-ring-offset-color:#fff;--tw-ring-color:rgba(59,130,246,.5);--tw-ring-offset-shadow:0 0 #0000;--tw-ring-shadow:0 0 #0000;--tw-shadow:0 0 #0000;--tw-shadow-colored:0 0 #0000;--tw-blur: ;--tw-brightness: ;--tw-contrast: ;--tw-grayscale: ;--tw-hue-rotate: ;--tw-invert: ;--tw-saturate: ;--tw-sepia: ;--tw-drop-shadow: ;--tw-backdrop-blur: ;--tw-backdrop-brightness: ;--tw-backdrop-contrast: ;--tw-backdrop-grayscale: ;--tw-backdrop-hue-rotate: ;--tw-backdrop-invert: ;--tw-backdrop-opacity: ;--tw-backdrop-saturate: ;--tw-backdrop-sepia: ;--tw-contain-size: ;--tw-contain-layout: ;--tw-contain-paint: ;--tw-contain-style: }
2 | /*! tailwindcss v3.4.17 | MIT License | https://tailwindcss.com*/*,:after,:before{box-sizing:border-box;border:0 solid #e5e7eb}:after,:before{--tw-content:""}:host,html{line-height:1.5;-webkit-text-size-adjust:100%;-moz-tab-size:4;-o-tab-size:4;tab-size:4;font-family:ui-sans-serif,system-ui,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol,Noto Color Emoji;font-feature-settings:normal;font-variation-settings:normal;-webkit-tap-highlight-color:transparent}body{margin:0;line-height:inherit}hr{height:0;color:inherit;border-top-width:1px}abbr:where([title]){-webkit-text-decoration:underline dotted;text-decoration:underline dotted}h1,h2,h3,h4,h5,h6{font-size:inherit;font-weight:inherit}a{color:inherit;text-decoration:inherit}b,strong{font-weight:bolder}code,kbd,pre,samp{font-family:ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,monospace;font-feature-settings:normal;font-variation-settings:normal;font-size:1em}small{font-size:80%}sub,sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline}sub{bottom:-.25em}sup{top:-.5em}table{text-indent:0;border-color:inherit;border-collapse:collapse}button,input,optgroup,select,textarea{font-family:inherit;font-feature-settings:inherit;font-variation-settings:inherit;font-size:100%;font-weight:inherit;line-height:inherit;letter-spacing:inherit;color:inherit;margin:0;padding:0}button,select{text-transform:none}button,input:where([type=button]),input:where([type=reset]),input:where([type=submit]){-webkit-appearance:button;background-color:transparent;background-image:none}:-moz-focusring{outline:auto}:-moz-ui-invalid{box-shadow:none}progress{vertical-align:baseline}::-webkit-inner-spin-button,::-webkit-outer-spin-button{height:auto}[type=search]{-webkit-appearance:textfield;outline-offset:-2px}::-webkit-search-decoration{-webkit-appearance:none}::-webkit-file-upload-button{-webkit-appearance:button;font:inherit}summary{display:list-item}blockquote,dd,dl,figure,h1,h2,h3,h4,h5,h6,hr,p,pre{margin:0}fieldset{margin:0}fieldset,legend{padding:0}menu,ol,ul{list-style:none;margin:0;padding:0}dialog{padding:0}textarea{resize:vertical}input::-moz-placeholder,textarea::-moz-placeholder{opacity:1;color:#9ca3af}input::placeholder,textarea::placeholder{opacity:1;color:#9ca3af}[role=button],button{cursor:pointer}:disabled{cursor:default}audio,canvas,embed,iframe,img,object,svg,video{display:block;vertical-align:middle}img,video{max-width:100%;height:auto}[hidden]:where(:not([hidden=until-found])){display:none}.\!container{width:100%!important}.container{width:100%}@media (min-width:640px){.\!container{max-width:640px!important}.container{max-width:640px}}@media (min-width:768px){.\!container{max-width:768px!important}.container{max-width:768px}}@media (min-width:1024px){.\!container{max-width:1024px!important}.container{max-width:1024px}}@media (min-width:1280px){.\!container{max-width:1280px!important}.container{max-width:1280px}}@media (min-width:1536px){.\!container{max-width:1536px!important}.container{max-width:1536px}}.diff-insert,ins.diff-insert,ins.diff-insert-text{--tw-bg-opacity:1;background-color:rgb(239 246 255/var(--tw-bg-opacity,1));--tw-text-opacity:1;color:rgb(30 64 175/var(--tw-text-opacity,1));text-decoration-line:none;--tw-ring-offset-shadow:var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);--tw-ring-shadow:var(--tw-ring-inset) 0 0 0 calc(1px + var(--tw-ring-offset-width)) 
var(--tw-ring-color);--tw-ring-inset:inset;--tw-ring-color:rgba(147,197,253,.6)}.diff-delete,.diff-insert,del.diff-delete,del.diff-delete-text,ins.diff-insert,ins.diff-insert-text{border-radius:.125rem;padding-left:.125rem;padding-right:.125rem;box-shadow:var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow,0 0 #0000)}.diff-delete,del.diff-delete,del.diff-delete-text{--tw-bg-opacity:1;background-color:rgb(255 241 242/var(--tw-bg-opacity,1));--tw-text-opacity:1;color:rgb(159 18 57/var(--tw-text-opacity,1));text-decoration-line:line-through;--tw-ring-offset-shadow:var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);--tw-ring-shadow:var(--tw-ring-inset) 0 0 0 calc(1px + var(--tw-ring-offset-width)) var(--tw-ring-color);--tw-ring-inset:inset;--tw-ring-color:rgba(253,164,175,.6)}.diff-move-target,ins.diff-move-target{border-radius:.125rem;border-width:1px;--tw-border-opacity:1;border-color:rgb(110 231 183/var(--tw-border-opacity,1));--tw-bg-opacity:1;background-color:rgb(236 253 245/var(--tw-bg-opacity,1));padding-left:.125rem;padding-right:.125rem;--tw-text-opacity:1;color:rgb(6 78 59/var(--tw-text-opacity,1));--tw-ring-offset-shadow:var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);--tw-ring-shadow:var(--tw-ring-inset) 0 0 0 calc(1px + var(--tw-ring-offset-width)) var(--tw-ring-color);box-shadow:var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow,0 0 #0000);--tw-ring-color:rgba(52,211,153,.6)}.diff-move-source,del.diff-move-source{border-radius:.125rem;border:1px dashed rgba(52,211,153,.4);background-color:rgba(236,253,245,.5);padding-left:.125rem;padding-right:.125rem;color:rgba(6,95,70,.6);text-decoration-line:line-through}span.diff-attrib-change{border-bottom-width:1px;border-style:dotted;border-color:rgb(251 146 60/var(--tw-border-opacity,1));background-color:rgba(255,247,237,.4)}span.diff-attrib-change,span.diff-rename-node{--tw-border-opacity:1;padding-left:.125rem;padding-right:.125rem}span.diff-rename-node{border-bottom-width:2px;border-style:dotted;border-color:rgb(192 132 252/var(--tw-border-opacity,1));background-color:rgba(245,243,255,.4)}.visible{visibility:visible}.collapse{visibility:collapse}.static{position:static}.fixed{position:fixed}.absolute{position:absolute}.relative{position:relative}.bottom-10{bottom:2.5rem}.bottom-2{bottom:.5rem}.left-2{left:.5rem}.right-1{right:.25rem}.right-2{right:.5rem}.top-10{top:2.5rem}.top-2{top:.5rem}.isolate{isolation:isolate}.z-40{z-index:40}.z-50{z-index:50}.mx-auto{margin-left:auto;margin-right:auto}.ml-1{margin-left:.25rem}.ml-2{margin-left:.5rem}.mr-1{margin-right:.25rem}.block{display:block}.inline-block{display:inline-block}.inline{display:inline}.flex{display:flex}.\!table{display:table!important}.table{display:table}.grid{display:grid}.contents{display:contents}.hidden{display:none}.h-0\.5{height:.125rem}.h-3{height:.75rem}.h-4{height:1rem}.w-1{width:.25rem}.w-3{width:.75rem}.w-4{width:1rem}.max-w-3xl{max-width:48rem}.shrink{flex-shrink:1}.grow{flex-grow:1}.border-collapse{border-collapse:collapse}.transform{transform:translate(var(--tw-translate-x),var(--tw-translate-y)) rotate(var(--tw-rotate)) skewX(var(--tw-skew-x)) skewY(var(--tw-skew-y)) scaleX(var(--tw-scale-x)) 
scaleY(var(--tw-scale-y))}.cursor-pointer{cursor:pointer}.resize{resize:both}.flex-col{flex-direction:column}.flex-wrap{flex-wrap:wrap}.items-center{align-items:center}.gap-2{gap:.5rem}.truncate{overflow:hidden;text-overflow:ellipsis;white-space:nowrap}.rounded{border-radius:.25rem}.rounded-lg{border-radius:.5rem}.rounded-sm{border-radius:.125rem}.border{border-width:1px}.border-b{border-bottom-width:1px}.border-b-2{border-bottom-width:2px}.border-dashed{border-style:dashed}.border-dotted{border-style:dotted}.border-emerald-300{--tw-border-opacity:1;border-color:rgb(110 231 183/var(--tw-border-opacity,1))}.border-emerald-400{--tw-border-opacity:1;border-color:rgb(52 211 153/var(--tw-border-opacity,1))}.border-emerald-400\/40{border-color:rgba(52,211,153,.4)}.border-gray-300{--tw-border-opacity:1;border-color:rgb(209 213 219/var(--tw-border-opacity,1))}.border-orange-400{--tw-border-opacity:1;border-color:rgb(251 146 60/var(--tw-border-opacity,1))}.border-orange-500{--tw-border-opacity:1;border-color:rgb(249 115 22/var(--tw-border-opacity,1))}.border-purple-400{--tw-border-opacity:1;border-color:rgb(192 132 252/var(--tw-border-opacity,1))}.border-purple-500{--tw-border-opacity:1;border-color:rgb(168 85 247/var(--tw-border-opacity,1))}.bg-blue-100{--tw-bg-opacity:1;background-color:rgb(219 234 254/var(--tw-bg-opacity,1))}.bg-blue-50{--tw-bg-opacity:1;background-color:rgb(239 246 255/var(--tw-bg-opacity,1))}.bg-blue-500{--tw-bg-opacity:1;background-color:rgb(59 130 246/var(--tw-bg-opacity,1))}.bg-blue-900\/40{background-color:rgba(30,58,138,.4)}.bg-emerald-100{--tw-bg-opacity:1;background-color:rgb(209 250 229/var(--tw-bg-opacity,1))}.bg-emerald-100\/70{background-color:rgba(209,250,229,.7)}.bg-emerald-50{--tw-bg-opacity:1;background-color:rgb(236 253 245/var(--tw-bg-opacity,1))}.bg-emerald-50\/50{background-color:rgba(236,253,245,.5)}.bg-emerald-500{--tw-bg-opacity:1;background-color:rgb(16 185 129/var(--tw-bg-opacity,1))}.bg-emerald-900\/30{background-color:rgba(6,78,59,.3)}.bg-emerald-900\/40{background-color:rgba(6,78,59,.4)}.bg-gray-100{--tw-bg-opacity:1;background-color:rgb(243 244 246/var(--tw-bg-opacity,1))}.bg-gray-200{--tw-bg-opacity:1;background-color:rgb(229 231 235/var(--tw-bg-opacity,1))}.bg-orange-50{--tw-bg-opacity:1;background-color:rgb(255 247 237/var(--tw-bg-opacity,1))}.bg-orange-50\/40{background-color:rgba(255,247,237,.4)}.bg-rose-100{--tw-bg-opacity:1;background-color:rgb(255 228 230/var(--tw-bg-opacity,1))}.bg-rose-50{--tw-bg-opacity:1;background-color:rgb(255 241 242/var(--tw-bg-opacity,1))}.bg-rose-500{--tw-bg-opacity:1;background-color:rgb(244 63 94/var(--tw-bg-opacity,1))}.bg-rose-900\/40{background-color:rgba(136,19,55,.4)}.bg-transparent{background-color:transparent}.bg-violet-50\/40{background-color:rgba(245,243,255,.4)}.bg-white{--tw-bg-opacity:1;background-color:rgb(255 255 
255/var(--tw-bg-opacity,1))}.bg-white\/90{background-color:hsla(0,0%,100%,.9)}.p-2{padding:.5rem}.px-0\.5{padding-left:.125rem;padding-right:.125rem}.px-1{padding-left:.25rem;padding-right:.25rem}.px-2{padding-left:.5rem;padding-right:.5rem}.px-4{padding-left:1rem;padding-right:1rem}.py-1{padding-top:.25rem;padding-bottom:.25rem}.py-8{padding-top:2rem;padding-bottom:2rem}.font-\[\'Newsreader\'\]{font-family:Newsreader}.text-\[0\.95em\]{font-size:.95em}.text-xs{font-size:.75rem;line-height:1rem}.uppercase{text-transform:uppercase}.lowercase{text-transform:lowercase}.capitalize{text-transform:capitalize}.italic{font-style:italic}.ordinal{--tw-ordinal:ordinal;font-variant-numeric:var(--tw-ordinal) var(--tw-slashed-zero) var(--tw-numeric-figure) var(--tw-numeric-spacing) var(--tw-numeric-fraction)}.text-black{--tw-text-opacity:1;color:rgb(0 0 0/var(--tw-text-opacity,1))}.text-blue-200{--tw-text-opacity:1;color:rgb(191 219 254/var(--tw-text-opacity,1))}.text-blue-800{--tw-text-opacity:1;color:rgb(30 64 175/var(--tw-text-opacity,1))}.text-emerald-200{--tw-text-opacity:1;color:rgb(167 243 208/var(--tw-text-opacity,1))}.text-emerald-300\/60{color:rgba(110,231,183,.6)}.text-emerald-800\/60{color:rgba(6,95,70,.6)}.text-emerald-900{--tw-text-opacity:1;color:rgb(6 78 59/var(--tw-text-opacity,1))}.text-gray-500{--tw-text-opacity:1;color:rgb(107 114 128/var(--tw-text-opacity,1))}.text-rose-200{--tw-text-opacity:1;color:rgb(254 205 211/var(--tw-text-opacity,1))}.text-rose-800{--tw-text-opacity:1;color:rgb(159 18 57/var(--tw-text-opacity,1))}.underline{text-decoration-line:underline}.overline{text-decoration-line:overline}.line-through{text-decoration-line:line-through}.no-underline{text-decoration-line:none}.decoration-2{text-decoration-thickness:2px}.underline-offset-4{text-underline-offset:4px}.shadow{--tw-shadow:0 1px 3px 0 rgba(0,0,0,.1),0 1px 2px -1px rgba(0,0,0,.1);--tw-shadow-colored:0 1px 3px 0 var(--tw-shadow-color),0 1px 2px -1px var(--tw-shadow-color)}.shadow,.shadow-lg{box-shadow:var(--tw-ring-offset-shadow,0 0 #0000),var(--tw-ring-shadow,0 0 #0000),var(--tw-shadow)}.shadow-lg{--tw-shadow:0 10px 15px -3px rgba(0,0,0,.1),0 4px 6px -4px rgba(0,0,0,.1);--tw-shadow-colored:0 10px 15px -3px var(--tw-shadow-color),0 4px 6px -4px var(--tw-shadow-color)}.outline{outline-style:solid}.ring{--tw-ring-offset-shadow:var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);--tw-ring-shadow:var(--tw-ring-inset) 0 0 0 calc(3px + var(--tw-ring-offset-width)) var(--tw-ring-color)}.ring,.ring-0{box-shadow:var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow,0 0 #0000)}.ring-0{--tw-ring-offset-shadow:var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);--tw-ring-shadow:var(--tw-ring-inset) 0 0 0 calc(var(--tw-ring-offset-width)) var(--tw-ring-color)}.ring-1{--tw-ring-offset-shadow:var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);--tw-ring-shadow:var(--tw-ring-inset) 0 0 0 calc(1px + var(--tw-ring-offset-width)) var(--tw-ring-color)}.ring-1,.ring-2{box-shadow:var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow,0 0 #0000)}.ring-2{--tw-ring-offset-shadow:var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);--tw-ring-shadow:var(--tw-ring-inset) 0 0 0 calc(2px + var(--tw-ring-offset-width)) var(--tw-ring-color)}.ring-inset{--tw-ring-inset:inset}.ring-black\/10{--tw-ring-color:rgba(0,0,0,.1)}.ring-blue-300{--tw-ring-opacity:1;--tw-ring-color:rgb(147 197 
253/var(--tw-ring-opacity,1))}.ring-blue-300\/60{--tw-ring-color:rgba(147,197,253,.6)}.ring-emerald-300{--tw-ring-opacity:1;--tw-ring-color:rgb(110 231 183/var(--tw-ring-opacity,1))}.ring-emerald-400\/60{--tw-ring-color:rgba(52,211,153,.6)}.ring-emerald-500\/30{--tw-ring-color:rgba(16,185,129,.3)}.ring-orange-300{--tw-ring-opacity:1;--tw-ring-color:rgb(253 186 116/var(--tw-ring-opacity,1))}.ring-rose-300{--tw-ring-opacity:1;--tw-ring-color:rgb(253 164 175/var(--tw-ring-opacity,1))}.ring-rose-300\/60{--tw-ring-color:rgba(253,164,175,.6)}.ring-offset-1{--tw-ring-offset-width:1px}.blur{--tw-blur:blur(8px)}.blur,.grayscale{filter:var(--tw-blur) var(--tw-brightness) var(--tw-contrast) var(--tw-grayscale) var(--tw-hue-rotate) var(--tw-invert) var(--tw-saturate) var(--tw-sepia) var(--tw-drop-shadow)}.grayscale{--tw-grayscale:grayscale(100%)}.invert{--tw-invert:invert(100%)}.filter,.invert{filter:var(--tw-blur) var(--tw-brightness) var(--tw-contrast) var(--tw-grayscale) var(--tw-hue-rotate) var(--tw-invert) var(--tw-saturate) var(--tw-sepia) var(--tw-drop-shadow)}.backdrop-blur-sm{--tw-backdrop-blur:blur(4px);-webkit-backdrop-filter:var(--tw-backdrop-blur) var(--tw-backdrop-brightness) var(--tw-backdrop-contrast) var(--tw-backdrop-grayscale) var(--tw-backdrop-hue-rotate) var(--tw-backdrop-invert) var(--tw-backdrop-opacity) var(--tw-backdrop-saturate) var(--tw-backdrop-sepia);backdrop-filter:var(--tw-backdrop-blur) var(--tw-backdrop-brightness) var(--tw-backdrop-contrast) var(--tw-backdrop-grayscale) var(--tw-backdrop-hue-rotate) var(--tw-backdrop-invert) var(--tw-backdrop-opacity) var(--tw-backdrop-saturate) var(--tw-backdrop-sepia)}.transition{transition-property:color,background-color,border-color,text-decoration-color,fill,stroke,opacity,box-shadow,transform,filter,-webkit-backdrop-filter;transition-property:color,background-color,border-color,text-decoration-color,fill,stroke,opacity,box-shadow,transform,filter,backdrop-filter;transition-property:color,background-color,border-color,text-decoration-color,fill,stroke,opacity,box-shadow,transform,filter,backdrop-filter,-webkit-backdrop-filter;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:.15s}.transition-all{transition-property:all;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:.15s}.transition-colors{transition-property:color,background-color,border-color,text-decoration-color,fill,stroke;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:.15s}.duration-200{transition-duration:.2s}.ease-out{transition-timing-function:cubic-bezier(0,0,.2,1)}.\[_hdr_start\:_hdr_end\]{_hdr_start:hdr end}.\[_nonce_start\:_nonce_end\]{_nonce_start:nonce end}.\[a\:a\]{a:a}.\[ctx_start\:ctx_end\]{ctx_start:ctx end}.\[current_pos\:best_split_pos\]{current_pos:best split pos}.\[end-1\:end\]{end-1:end}.\[hl_start\:hl_end\]{hl_start:hl end}.\[i1\:i2\]{i1:i2}.\[i\:i\+3\]{i:i+3}.\[i\:i\+chunk_size\]{i:i+chunk size}.\[i\:i\+len\(pattern\)\]{i:i+len(pattern)}.\[inherit_ndx\:inherit_ndx\+1\]{inherit_ndx:inherit ndx+1}.\[j1\:j2\]{j1:j2}.\[json_start\:json_end\]{json_start:json end}.\[last\:start\]{last:start}.\[last_end\:start\]{last_end:start}.\[last_section_end\:first_match_start\]{last_section_end:first match start}.\[left\:right\]{left:right}.\[line_offset\:end_line\]{line_offset:end line}.\[offset\:next_offset\]{offset:next offset}.\[offset\:offset\+limit\]{offset:offset+limit}.\[pos\:end_pos\]{pos:end pos}.\[pos\:new_pos\]{pos:new 
pos}.\[position\:offset\]{position:offset}.\[position\:start\]{position:start}.\[search_region_start\:end_index\]{search_region_start:end index}.\[section_content_start\:section_content_end\]{section_content_start:section content end}.\[section_start\:section_end\]{section_start:section end}.\[start\:end\]{start:end}.\[start\:start\+1\]{start:start+1}.\[start_index\:best_split_index\]{start_index:best split index}.\[start_index\:end_index\]{start_index:end index}.\[start_pos\:pos\]{start_pos:pos}.\[user\:passwd\@\]{user:passwd@}.hover\:bg-gray-200:hover{--tw-bg-opacity:1;background-color:rgb(229 231 235/var(--tw-bg-opacity,1))}.hover\:bg-gray-300:hover{--tw-bg-opacity:1;background-color:rgb(209 213 219/var(--tw-bg-opacity,1))}.hover\:text-gray-700:hover{--tw-text-opacity:1;color:rgb(55 65 81/var(--tw-text-opacity,1))}@media (min-width:640px){.sm\:inline{display:inline}.sm\:hidden{display:none}}@media (min-width:768px){.md\:flex{display:flex}}@media (prefers-color-scheme:dark){.dark\:inline{display:inline}.dark\:hidden{display:none}.dark\:bg-blue-900\/60{background-color:rgba(30,58,138,.6)}.dark\:bg-emerald-900\/60{background-color:rgba(6,78,59,.6)}.dark\:bg-gray-700{--tw-bg-opacity:1;background-color:rgb(55 65 81/var(--tw-bg-opacity,1))}.dark\:bg-gray-800{--tw-bg-opacity:1;background-color:rgb(31 41 55/var(--tw-bg-opacity,1))}.dark\:bg-gray-800\/90{background-color:rgba(31,41,55,.9)}.dark\:bg-gray-900{--tw-bg-opacity:1;background-color:rgb(17 24 39/var(--tw-bg-opacity,1))}.dark\:bg-orange-900\/60{background-color:rgba(124,45,18,.6)}.dark\:bg-rose-900\/60{background-color:rgba(136,19,55,.6)}.dark\:text-gray-200{--tw-text-opacity:1;color:rgb(229 231 235/var(--tw-text-opacity,1))}.dark\:ring-blue-700{--tw-ring-opacity:1;--tw-ring-color:rgb(29 78 216/var(--tw-ring-opacity,1))}.dark\:ring-emerald-700{--tw-ring-opacity:1;--tw-ring-color:rgb(4 120 87/var(--tw-ring-opacity,1))}.dark\:ring-orange-700{--tw-ring-opacity:1;--tw-ring-color:rgb(194 65 12/var(--tw-ring-opacity,1))}.dark\:ring-rose-700{--tw-ring-opacity:1;--tw-ring-color:rgb(190 18 60/var(--tw-ring-opacity,1))}.dark\:hover\:bg-gray-600:hover{--tw-bg-opacity:1;background-color:rgb(75 85 99/var(--tw-bg-opacity,1))}}
```
--------------------------------------------------------------------------------
/examples/ollama_integration_demo.py:
--------------------------------------------------------------------------------
```python
1 | #!/usr/bin/env python
2 | """Ollama integration demonstration using Ultimate MCP Server."""
3 | import asyncio
4 | import sys
5 | import time
6 | from pathlib import Path
7 |
8 | # Add project root to path for imports when running as script
9 | sys.path.insert(0, str(Path(__file__).parent.parent))
10 |
11 | # Import configuration system instead of using update_ollama_env
12 | from ultimate_mcp_server.config import get_config
13 |
14 | # Load the config to ensure environment variables from .env are read
15 | config = get_config()
16 |
17 | # Third-party imports
18 | # These imports need to be below sys.path modification, which is why they have noqa comments
19 | from rich import box # noqa: E402
20 | from rich.markup import escape # noqa: E402
21 | from rich.panel import Panel # noqa: E402
22 | from rich.rule import Rule # noqa: E402
23 | from rich.table import Table # noqa: E402
24 |
25 | # Project imports
26 | from ultimate_mcp_server.constants import Provider # noqa: E402
27 | from ultimate_mcp_server.core.server import Gateway # noqa: E402
28 | from ultimate_mcp_server.utils import get_logger # noqa: E402
29 | from ultimate_mcp_server.utils.display import CostTracker # Import CostTracker # noqa: E402
30 | from ultimate_mcp_server.utils.logging.console import console # noqa: E402
31 |
32 | # Initialize logger
33 | logger = get_logger("example.ollama_integration_demo")
34 |
35 |
36 | async def compare_ollama_models(tracker: CostTracker):
37 | """Compare different Ollama models."""
38 | console.print(Rule("[bold blue]🦙 Ollama Model Comparison[/bold blue]"))
39 | logger.info("Starting Ollama models comparison", emoji_key="start")
40 |
41 | # Create Gateway instance - this handles provider initialization
42 | gateway = Gateway("ollama-demo", register_tools=False)
43 |
44 | try:
45 | # Initialize providers
46 | logger.info("Initializing providers...", emoji_key="provider")
47 | await gateway._initialize_providers()
48 |
49 | provider_name = Provider.OLLAMA.value
50 | try:
51 | # Get the provider from the gateway
52 | provider = gateway.providers.get(provider_name)
53 | if not provider:
54 | logger.error(f"Provider {provider_name} not available or initialized", emoji_key="error")
55 | return
56 |
57 | logger.info(f"Using provider: {provider_name}", emoji_key="provider")
58 |
59 | models = await provider.list_models()
60 | model_names = [m["id"] for m in models] # Extract names from model dictionaries
61 | console.print(f"Found {len(model_names)} Ollama models: [cyan]{escape(str(model_names))}[/cyan]")
62 |
63 | # Select specific models to compare (adjust these based on what you have installed locally)
64 | ollama_models = [
65 | "mix_77/gemma3-qat-tools:27b",
66 | "JollyLlama/GLM-Z1-32B-0414-Q4_K_M:latest",
67 | "llama3.2-vision:latest"
68 | ]
69 | # Filter based on available models
70 | models_to_compare = [m for m in ollama_models if m in model_names]
71 | if not models_to_compare:
72 | logger.error("None of the selected models for comparison are available. Please use 'ollama pull MODEL' to download models first.", emoji_key="error")
73 | console.print("[red]Selected models not found. Use 'ollama pull mix_77/gemma3-qat-tools:27b' to download models first.[/red]")
74 | return
75 | console.print(f"Comparing models: [yellow]{escape(str(models_to_compare))}[/yellow]")
76 |
77 | prompt = """
78 | Explain the concept of quantum entanglement in a way that a high school student would understand.
79 | Keep your response brief and accessible.
80 | """
81 | console.print(f"[cyan]Using Prompt:[/cyan] {escape(prompt.strip())[:100]}...")
82 |
83 | results_data = []
84 |
85 | for model_name in models_to_compare:
86 | try:
87 | logger.info(f"Testing model: {model_name}", emoji_key="model")
88 | start_time = time.time()
89 | result = await provider.generate_completion(
90 | prompt=prompt,
91 | model=model_name,
92 | temperature=0.3,
93 | max_tokens=300
94 | )
95 | processing_time = time.time() - start_time
96 |
97 | # Track the cost
98 | tracker.add_call(result)
99 |
100 | results_data.append({
101 | "model": model_name,
102 | "text": result.text,
103 | "tokens": {
104 | "input": result.input_tokens,
105 | "output": result.output_tokens,
106 | "total": result.total_tokens
107 | },
108 | "cost": result.cost,
109 | "time": processing_time
110 | })
111 |
112 | logger.success(
113 | f"Completion for {model_name} successful",
114 | emoji_key="success",
115 | # Tokens/cost/time logged implicitly by storing in results_data
116 | )
117 |
118 | except Exception as e:
119 | logger.error(f"Error testing model {model_name}: {str(e)}", emoji_key="error", exc_info=True)
120 | # Optionally add an error entry to results_data if needed
121 |
122 | # Display comparison results using Rich
123 | if results_data:
124 | console.print(Rule("[bold green]Comparison Results[/bold green]"))
125 |
126 | for result_item in results_data:
127 | model = result_item["model"]
128 | time_s = result_item["time"]
129 | tokens = result_item.get("tokens", {}).get("total", 0)
130 |
131 | # If tokens is 0 but we have text, estimate based on text length
132 | text = result_item.get("text", "").strip()
133 | if tokens == 0 and text:
134 | # Rough estimate: ~1.3 tokens per word plus some for punctuation
135 | tokens = len(text.split()) + len(text) // 10
136 | # Update the result item for cost tracking
137 | result_item["tokens"]["output"] = tokens
138 | result_item["tokens"]["total"] = tokens + result_item["tokens"]["input"]
139 |
140 | tokens_per_second = tokens / time_s if time_s > 0 and tokens > 0 else 0
141 | cost = result_item.get("cost", 0.0)
142 |
143 | stats_line = (
144 | f"Time: [yellow]{time_s:.2f}s[/yellow] | "
145 | f"Tokens: [cyan]{tokens}[/cyan] | "
146 | f"Speed: [blue]{tokens_per_second:.1f} tok/s[/blue] | "
147 | f"Cost: [green]${cost:.6f}[/green]"
148 | )
149 |
150 | console.print(Panel(
151 | escape(text),
152 | title=f"[bold magenta]{escape(model)}[/bold magenta]",
153 | subtitle=stats_line,
154 | border_style="blue",
155 | expand=False
156 | ))
157 | console.print()
158 |
159 | except Exception as e:
160 | logger.error(f"Error in model comparison: {str(e)}", emoji_key="error", exc_info=True)
161 | finally:
162 | # Ensure resources are cleaned up
163 | if hasattr(gateway, 'shutdown'):
164 | await gateway.shutdown()
165 |
166 |
167 | async def demonstrate_system_prompt(tracker: CostTracker):
168 | """Demonstrate Ollama with system prompts."""
169 | console.print(Rule("[bold blue]🦙 Ollama System Prompt Demonstration[/bold blue]"))
170 | logger.info("Demonstrating Ollama with system prompts", emoji_key="start")
171 |
172 | # Create Gateway instance - this handles provider initialization
173 | gateway = Gateway("ollama-demo", register_tools=False)
174 |
175 | try:
176 | # Initialize providers
177 | logger.info("Initializing providers...", emoji_key="provider")
178 | await gateway._initialize_providers()
179 |
180 | provider_name = Provider.OLLAMA.value
181 | try:
182 | # Get the provider from the gateway
183 | provider = gateway.providers.get(provider_name)
184 | if not provider:
185 | logger.error(f"Provider {provider_name} not available or initialized", emoji_key="error")
186 | return
187 |
188 | # Use mix_77/gemma3-qat-tools:27b (ensure it's available)
189 | model = "mix_77/gemma3-qat-tools:27b"
190 | available_models = await provider.list_models()
191 | model_names = [m["id"] for m in available_models]
192 |
193 | if model not in model_names:
194 | logger.warning(f"Model {model} not available, please run 'ollama pull mix_77/gemma3-qat-tools:27b'", emoji_key="warning")
195 | console.print("[yellow]Model not found. Please run 'ollama pull mix_77/gemma3-qat-tools:27b' to download it.[/yellow]")
196 | return
197 |
198 | logger.info(f"Using model: {model}", emoji_key="model")
199 |
200 | system_prompt = """
201 | You are a helpful assistant with expertise in physics.
202 | Keep all explanations accurate but very concise.
203 | Always provide real-world examples to illustrate concepts.
204 | """
205 | user_prompt = "Explain the concept of gravity."
206 |
207 | logger.info("Generating completion with system prompt", emoji_key="processing")
208 |
209 | result = await provider.generate_completion(
210 | prompt=user_prompt,
211 | model=model,
212 | temperature=0.7,
213 | system=system_prompt,
214 | max_tokens=1000 # Increased max_tokens
215 | )
216 |
217 | # Track the cost
218 | tracker.add_call(result)
219 |
220 | logger.success("Completion with system prompt successful", emoji_key="success")
221 |
222 | # Display result using Rich Panels
223 | console.print(Panel(
224 | escape(system_prompt.strip()),
225 | title="[bold cyan]System Prompt[/bold cyan]",
226 | border_style="dim cyan",
227 | expand=False
228 | ))
229 | console.print(Panel(
230 | escape(user_prompt.strip()),
231 | title="[bold yellow]User Prompt[/bold yellow]",
232 | border_style="dim yellow",
233 | expand=False
234 | ))
235 | console.print(Panel(
236 | escape(result.text.strip()),
237 | title="[bold green]Ollama Response[/bold green]",
238 | border_style="green",
239 | expand=False
240 | ))
241 |
242 | # Display stats in a small table
243 | stats_table = Table(title="Execution Stats", show_header=False, box=box.MINIMAL, expand=False)
244 | stats_table.add_column("Metric", style="cyan")
245 | stats_table.add_column("Value", style="white")
246 | stats_table.add_row("Input Tokens", str(result.input_tokens))
247 | stats_table.add_row("Output Tokens", str(result.output_tokens))
248 | stats_table.add_row("Cost", f"${result.cost:.6f}")
249 | stats_table.add_row("Processing Time", f"{result.processing_time:.3f}s")
250 | console.print(stats_table)
251 | console.print()
252 |
253 | except Exception as e:
254 | logger.error(f"Error in system prompt demonstration: {str(e)}", emoji_key="error", exc_info=True)
255 | # Optionally re-raise or handle differently
256 | finally:
257 | # Ensure resources are cleaned up
258 | if hasattr(gateway, 'shutdown'):
259 | await gateway.shutdown()
260 |
261 |
262 | async def demonstrate_streaming(tracker: CostTracker):
263 | """Demonstrate Ollama streaming capabilities."""
264 | console.print(Rule("[bold blue]🦙 Ollama Streaming Demonstration[/bold blue]"))
265 | logger.info("Demonstrating Ollama streaming capabilities", emoji_key="start")
266 |
267 | # Create Gateway instance - this handles provider initialization
268 | gateway = Gateway("ollama-demo", register_tools=False)
269 |
270 | try:
271 | # Initialize providers
272 | logger.info("Initializing providers...", emoji_key="provider")
273 | await gateway._initialize_providers()
274 |
275 | provider_name = Provider.OLLAMA.value
276 | try:
277 | # Get the provider from the gateway
278 | provider = gateway.providers.get(provider_name)
279 | if not provider:
280 | logger.error(f"Provider {provider_name} not available or initialized", emoji_key="error")
281 | return
282 |
283 | # Use any available Ollama model
284 | model = provider.get_default_model()
285 | logger.info(f"Using model: {model}", emoji_key="model")
286 |
287 | prompt = "Write a short poem about programming with AI"
288 | console.print(Panel(
289 | escape(prompt.strip()),
290 | title="[bold yellow]Prompt[/bold yellow]",
291 | border_style="dim yellow",
292 | expand=False
293 | ))
294 |
295 | logger.info("Generating streaming completion", emoji_key="processing")
296 |
297 | console.print("[bold green]Streaming response:[/bold green]")
298 |
299 | # Stream completion and display tokens as they arrive
300 | output_text = ""
301 | token_count = 0
302 | start_time = time.time()
303 |
304 | stream = provider.generate_completion_stream(
305 | prompt=prompt,
306 | model=model,
307 | temperature=0.7,
308 | max_tokens=200
309 | )
310 |
311 | final_metadata = None
312 | async for chunk, metadata in stream:
313 | # Display the streaming chunk
314 | console.print(chunk, end="", highlight=False)
315 | output_text += chunk
316 | token_count += 1
317 | final_metadata = metadata
318 |
319 | # Newline after streaming is complete
320 | console.print()
321 | console.print()
322 |
323 | processing_time = time.time() - start_time
324 |
325 | if final_metadata:
326 | # Track the cost at the end
327 | if tracker:
328 | # Create a simple object with the necessary attributes for the tracker
329 | class StreamingCall:
330 | def __init__(self, metadata):
331 | self.model = metadata.get("model", "")
332 | self.provider = metadata.get("provider", "")
333 | self.input_tokens = metadata.get("input_tokens", 0)
334 | self.output_tokens = metadata.get("output_tokens", 0)
335 | self.total_tokens = metadata.get("total_tokens", 0)
336 | self.cost = metadata.get("cost", 0.0)
337 | self.processing_time = metadata.get("processing_time", 0.0)
338 |
339 | tracker.add_call(StreamingCall(final_metadata))
340 |
341 | # Display stats
342 | stats_table = Table(title="Streaming Stats", show_header=False, box=box.MINIMAL, expand=False)
343 | stats_table.add_column("Metric", style="cyan")
344 | stats_table.add_column("Value", style="white")
345 | stats_table.add_row("Input Tokens", str(final_metadata.get("input_tokens", 0)))
346 | stats_table.add_row("Output Tokens", str(final_metadata.get("output_tokens", 0)))
347 | stats_table.add_row("Cost", f"${final_metadata.get('cost', 0.0):.6f}")
348 | stats_table.add_row("Processing Time", f"{processing_time:.3f}s")
349 | stats_table.add_row("Tokens per Second", f"{final_metadata.get('output_tokens', 0) / processing_time if processing_time > 0 else 0:.1f}")
350 | console.print(stats_table)
351 |
352 | logger.success("Streaming demonstration completed", emoji_key="success")
353 |
354 | except Exception as e:
355 | logger.error(f"Error in streaming demonstration: {str(e)}", emoji_key="error", exc_info=True)
356 | finally:
357 | # Ensure resources are cleaned up
358 | if hasattr(gateway, 'shutdown'):
359 | await gateway.shutdown()
360 |
361 |
362 | async def explore_ollama_models():
363 | """Display available Ollama models."""
364 | console.print(Rule("[bold cyan]🦙 Available Ollama Models[/bold cyan]"))
365 |
366 | # Create Gateway instance - this handles provider initialization
367 | gateway = Gateway("ollama-demo", register_tools=False)
368 |
369 | try:
370 | # Initialize providers
371 | logger.info("Initializing providers...", emoji_key="provider")
372 | await gateway._initialize_providers()
373 |
374 | # Get provider from the gateway
375 | provider = gateway.providers.get(Provider.OLLAMA.value)
376 | if not provider:
377 | logger.error(f"Provider {Provider.OLLAMA.value} not available or initialized", emoji_key="error")
378 | console.print("[red]Ollama provider not available. Make sure Ollama is installed and running on your machine.[/red]")
379 | console.print("[yellow]Visit https://ollama.com/download for installation instructions.[/yellow]")
380 | return
381 |
382 | # Get list of available models
383 | try:
384 | models = await provider.list_models()
385 |
386 | if not models:
387 | console.print("[yellow]No Ollama models found. Use 'ollama pull MODEL' to download models.[/yellow]")
388 | console.print("Example: [green]ollama pull mix_77/gemma3-qat-tools:27b[/green]")
389 | return
390 |
391 | # Create a table to display model information
392 | table = Table(title="Local Ollama Models")
393 | table.add_column("Model ID", style="cyan")
394 | table.add_column("Description", style="green")
395 | table.add_column("Size", style="yellow")
396 |
397 |             import re  # hoisted out of the loop; used to parse size info from model descriptions
398 |             for model in models:
399 |                 # Extract size from description if available
400 |                 size_str = "Unknown"
401 |                 description = model.get("description", "")
402 | 
403 |                 # Check if size information is in the description (format: "... (X.XX GB)")
404 |                 size_match = re.search(r'\((\d+\.\d+) GB\)', description)
405 | if size_match:
406 | size_gb = float(size_match.group(1))
407 | size_str = f"{size_gb:.2f} GB"
408 |
409 | table.add_row(
410 | model["id"],
411 | description,
412 | size_str
413 | )
414 |
415 | console.print(table)
416 | console.print("\n[dim]Note: To add more models, use 'ollama pull MODEL_NAME'[/dim]")
417 | except Exception as e:
418 | logger.error(f"Error listing Ollama models: {str(e)}", emoji_key="error")
419 | console.print(f"[red]Failed to list Ollama models: {str(e)}[/red]")
420 | console.print("[yellow]Make sure Ollama is installed and running on your machine.[/yellow]")
421 | console.print("[yellow]Visit https://ollama.com/download for installation instructions.[/yellow]")
422 | finally:
423 | # Ensure resources are cleaned up
424 | if hasattr(gateway, 'shutdown'):
425 | await gateway.shutdown()
426 |
427 |
428 | async def main():
429 | """Run Ollama integration examples."""
430 | console.print(Panel(
431 | "[bold]This demonstration shows how to use Ollama with the Ultimate MCP Server.[/bold]\n"
432 | "Ollama allows you to run LLMs locally on your own machine without sending data to external services.\n\n"
433 | "[yellow]Make sure you have Ollama installed and running:[/yellow]\n"
434 | "- Download from [link]https://ollama.com/download[/link]\n"
435 | "- Pull models with [green]ollama pull mix_77/gemma3-qat-tools:27b[/green] or similar commands",
436 | title="[bold blue]🦙 Ollama Integration Demo[/bold blue]",
437 | border_style="blue",
438 | expand=False
439 | ))
440 |
441 | tracker = CostTracker() # Instantiate tracker here
442 | try:
443 | # First show available models
444 | await explore_ollama_models()
445 |
446 | console.print() # Add space between sections
447 |
448 | # Run model comparison
449 | await compare_ollama_models(tracker) # Pass tracker
450 |
451 | console.print() # Add space between sections
452 |
453 | # Run system prompt demonstration
454 | await demonstrate_system_prompt(tracker) # Pass tracker
455 |
456 | console.print() # Add space between sections
457 |
458 | # Run streaming demonstration
459 | await demonstrate_streaming(tracker) # Pass tracker
460 |
461 | # Display final summary
462 | tracker.display_summary(console) # Display summary at the end
463 |
464 | except Exception as e:
465 | logger.critical(f"Example failed: {str(e)}", emoji_key="critical", exc_info=True)
466 | return 1
467 | finally:
468 | # Clean up any remaining aiohttp resources by forcing garbage collection
469 | import gc
470 | gc.collect()
471 |
472 | logger.success("Ollama Integration Demo Finished Successfully!", emoji_key="complete")
473 | return 0
474 |
475 |
476 | if __name__ == "__main__":
477 | exit_code = asyncio.run(main())
478 | sys.exit(exit_code)
```
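
Stripped of the Rich console output and cost tracking, the streaming pattern used in the demo reduces to a short helper. A minimal sketch, assuming `provider` and `model` come from the same Gateway setup shown above (`gateway.providers.get(Provider.OLLAMA.value)`), and that `generate_completion_stream` yields `(chunk, metadata)` pairs as in the demo:

```python
# Illustrative only; `provider` is assumed to be an initialized Ollama provider
# obtained exactly as in the demo above.
async def stream_once(provider, model: str, prompt: str) -> str:
    text = ""
    stream = provider.generate_completion_stream(
        prompt=prompt,
        model=model,
        temperature=0.7,
        max_tokens=200,
    )
    async for chunk, _metadata in stream:  # each item is (text_chunk, metadata_dict)
        text += chunk
    return text
```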
--------------------------------------------------------------------------------
/ultimate_mcp_server/tools/pyodide_boot_template.html:
--------------------------------------------------------------------------------
```html
1 | <!DOCTYPE html>
2 | <html lang="en">
3 | <head>
4 | <meta charset="utf-8" />
5 | <meta name="viewport" content="width=device-width,initial-scale=1" />
6 | <meta http-equiv="Referrer-Policy" content="strict-origin-when-cross-origin" />
7 | <title>Pyodide Sandbox v__PYODIDE_VERSION__ (Full)</title>
8 | <script>
9 | /* SessionStorage stub */
10 | try { if(typeof window!=='undefined' && typeof window.sessionStorage!=='undefined'){void window.sessionStorage;}else{throw new Error();} }
11 | catch(_){ if(typeof window!=='undefined'){Object.defineProperty(window, "sessionStorage", {value:{getItem:()=>null,setItem:()=>{},removeItem:()=>{},clear:()=>{},key:()=>null,length:0},configurable:true,writable:false});console.warn("sessionStorage stubbed.");} }
12 | </script>
13 | <style>
14 | /* Basic styles */
15 | body { font-family: system-ui, sans-serif; margin: 15px; background-color: #f8f9fa; color: #212529; }
16 | .status { padding: 10px 15px; border: 1px solid #dee2e6; margin-bottom: 15px; border-radius: 0.25rem; font-size: 0.9em; }
17 | .status-loading { background-color: #e9ecef; border-color: #ced4da; color: #495057; }
18 | .status-ready { background-color: #d1e7dd; border-color: #a3cfbb; color: #0a3622;}
19 | .status-error { background-color: #f8d7da; border-color: #f5c2c7; color: #842029; font-weight: bold; }
20 | </style>
21 | </head>
22 | <body>
23 |
24 | <div id="status-indicator" class="status status-loading">Initializing Sandbox...</div>
25 |
26 | <script type="module">
27 | // --- Constants ---
28 | const CDN_BASE = "__CDN_BASE__";
29 | const PYODIDE_VERSION = "__PYODIDE_VERSION__";
30 | // *** Use the constant for packages loaded AT STARTUP ***
31 | const CORE_PACKAGES_JSON = '__CORE_PACKAGES_JSON__';
32 | const MEM_LIMIT_MB = parseInt("__MEM_LIMIT_MB__", 10);
33 | const logPrefix = "[PyodideFull]"; // Updated prefix
34 |
35 | // --- DOM Element ---
36 | const statusIndicator = document.getElementById('status-indicator');
37 |
38 | // --- Performance & Heap Helpers ---
39 | const perf = (typeof performance !== 'undefined') ? performance : { now: () => Date.now() };
40 | const now = () => perf.now();
41 | const heapMB = () => (performance?.memory?.usedJSHeapSize ?? 0) / 1048576;
42 |
43 | // --- Safe Proxy Destruction Helper ---
44 | function safeDestroy(proxy, name, handlerLogPrefix) {
45 | try {
46 | if (proxy && typeof proxy === 'object' && typeof proxy.destroy === 'function') {
47 | const proxyId = proxy.toString ? proxy.toString() : '(proxy)';
48 | console.log(`${handlerLogPrefix} Destroying ${name} proxy: ${proxyId}`);
49 | proxy.destroy();
50 | console.log(`${handlerLogPrefix} ${name} proxy destroyed.`);
51 | return true;
52 | }
53 | } catch (e) { console.warn(`${handlerLogPrefix} Error destroying ${name}:`, e?.message || e); }
54 | return false;
55 | }
56 |
57 | // --- Python Runner Code Template (Reads code from its own global scope) ---
58 | // This should be the same simple runner that worked before
59 | const pythonRunnerTemplate = `
60 | import sys, io, contextlib, traceback, time, base64, json
61 | print(f"[PyRunnerFull] Starting execution...")
62 | _stdout = io.StringIO(); _stderr = io.StringIO()
63 | result_value = None; error_info = None; execution_ok = False; elapsed_ms = 0.0; user_code = None
64 | try:
65 | # Get code from the global scope *this script is run in*
66 | print("[PyRunnerFull] Getting code from _USER_CODE_TO_EXEC...")
67 | user_code = globals().get("_USER_CODE_TO_EXEC") # Read code set by JS onto the *passed* globals proxy
68 | if user_code is None: raise ValueError("_USER_CODE_TO_EXEC global not found in execution scope.")
69 | print(f"[PyRunnerFull] Code retrieved ({len(user_code)} chars). Ready to execute.")
70 | start_time = time.time()
71 | print(f"[PyRunnerFull] Executing user code...")
72 | try:
73 | with contextlib.redirect_stdout(_stdout), contextlib.redirect_stderr(_stderr):
74 | compiled_code = compile(source=user_code, filename='<user_code>', mode='exec')
75 | exec(compiled_code, globals()) # Execute in the provided globals
76 | if 'result' in globals(): result_value = globals()['result']
77 | execution_ok = True
78 | print("[PyRunnerFull] User code execution finished successfully.")
79 | except Exception as e:
80 | exc_type=type(e).__name__; exc_msg=str(e); tb_str=traceback.format_exc()
81 | error_info = {'type':exc_type, 'message':exc_msg, 'traceback':tb_str}
82 | print(f"[PyRunnerFull] User code execution failed: {exc_type}: {exc_msg}\\n{tb_str}")
83 | finally: elapsed_ms = (time.time() - start_time) * 1000 if 'start_time' in locals() else 0
84 | print(f"[PyRunnerFull] Exec phase took: {elapsed_ms:.1f}ms")
85 | except Exception as outer_err:
86 | tb_str = traceback.format_exc(); error_info = {'type': type(outer_err).__name__, 'message': str(outer_err), 'traceback': tb_str}
87 | print(f"[PyRunnerFull] Setup/GetCode Error: {outer_err}\\n{tb_str}")
88 | execution_ok = False
89 | payload_dict = {'ok':execution_ok,'stdout':_stdout.getvalue(),'stderr':_stderr.getvalue(),'elapsed':elapsed_ms,'result':result_value,'error':error_info}
90 | print("[PyRunnerFull] Returning payload dictionary.")
91 | payload_dict # Return value
92 | `; // End of pythonRunnerTemplate
93 |
94 | // --- Main Async IIFE ---
95 | (async () => {
96 | let BOOT_MS = 0; let t0 = 0;
97 | let corePackagesToLoad = []; // Store parsed core packages list
98 | try {
99 | // === Step 1: Prepare for Load ===
100 | t0 = now(); console.log(`${logPrefix} Boot script starting at t0=${t0}`);
101 | statusIndicator.textContent = `Importing Pyodide v${PYODIDE_VERSION}...`;
102 |
103 | // Parse CORE packages list BEFORE calling loadPyodide
104 | try {
105 | corePackagesToLoad = JSON.parse(CORE_PACKAGES_JSON);
106 | if (!Array.isArray(corePackagesToLoad)) { corePackagesToLoad = []; }
107 | console.log(`${logPrefix} Core packages requested for init:`, corePackagesToLoad.length > 0 ? corePackagesToLoad : '(none)');
108 | } catch (parseErr) {
109 | console.error(`${logPrefix} Error parsing core packages JSON:`, parseErr);
110 | statusIndicator.textContent += ' (Error parsing core package list!)';
111 | corePackagesToLoad = [];
112 | }
113 |
114 | // === Step 2: Load Pyodide (with packages option) ===
115 | const { loadPyodide } = await import(`${CDN_BASE}/pyodide.mjs`);
116 | console.log(`${logPrefix} Calling loadPyodide with core packages...`);
117 | statusIndicator.textContent = `Loading Pyodide runtime & core packages...`;
118 |
119 | window.pyodide = await loadPyodide({
120 | indexURL: `${CDN_BASE}/`,
121 | packages: corePackagesToLoad // Load core packages during initialization
122 | });
123 | const pyodide = window.pyodide;
124 | console.log(`${logPrefix} Pyodide core and initial packages loaded. Version: ${pyodide.version}`);
125 | statusIndicator.textContent = 'Pyodide core & packages loaded.';
126 |
127 | // === Step 3: Verify Loaded Packages (Optional Debugging) ===
128 | if (corePackagesToLoad.length > 0) {
129 | const loaded = pyodide.loadedPackages ? Object.keys(pyodide.loadedPackages) : [];
130 | console.log(`${logPrefix} Currently loaded packages:`, loaded);
131 | corePackagesToLoad.forEach(pkg => {
132 | if (!loaded.includes(pkg)) {
133 | console.warn(`${logPrefix} Core package '${pkg}' requested but not loaded! Check CDN/package name.`);
134 | statusIndicator.textContent += ` (Warn: ${pkg} failed load)`;
135 | }
136 | });
137 | }
138 |
139 | BOOT_MS = now() - t0;
140 | console.log(`${logPrefix} Pyodide setup complete in ${BOOT_MS.toFixed(0)}ms. Heap: ${heapMB().toFixed(1)} MB`);
141 | statusIndicator.textContent = `Pyodide Ready (${BOOT_MS.toFixed(0)}ms). Awaiting commands...`;
142 | statusIndicator.className = 'status status-ready';
143 |
144 |
145 | Object.freeze(corePackagesToLoad);
146 |     Object.freeze(statusIndicator); // defensive only; the const binding already prevents re-assignment, and textContent updates still go through the inherited setter
147 |
148 | // ================== Main Message Handler ==================
149 | console.log(`${logPrefix} Setting up main message listener...`);
150 | window.addEventListener("message", async (ev) => {
151 | const msg = ev.data;
152 | if (typeof msg !== 'object' || msg === null || !msg.id) { return; }
153 | const handlerLogPrefix = `${logPrefix}[Handler id:${msg.id}]`;
154 | console.log(`${handlerLogPrefix} Received: type=${msg.type}`);
155 | const wall0 = now();
156 |
157 | const reply = { id: msg.id, ok: false, stdout:'', stderr:'', result:null, elapsed:0, wall_ms:0, error:null };
158 | let pyResultProxy = null;
159 | let namespaceProxy = null; // Holds the target execution scope
160 | let micropipProxy = null;
161 | let persistentReplProxyToDestroy = null;
162 |
163 | try { // Outer try for message handling
164 | if (!window.pyodide) { throw new Error('Pyodide instance lost!'); }
165 | const pyodide = window.pyodide;
166 |
167 | // === Handle Reset Message ===
168 | if (msg.type === "reset") {
169 | console.log(`${handlerLogPrefix} Reset request received.`);
170 | try {
171 | if (pyodide.globals.has("_MCP_REPL_NS")) {
172 | console.log(`${handlerLogPrefix} Found _MCP_REPL_NS, attempting deletion.`);
173 | persistentReplProxyToDestroy = pyodide.globals.get("_MCP_REPL_NS");
174 | pyodide.globals.delete("_MCP_REPL_NS");
175 | reply.cleared = true; console.log(`${handlerLogPrefix} _MCP_REPL_NS deleted.`);
176 | } else {
177 | reply.cleared = false; console.log(`${handlerLogPrefix} No _MCP_REPL_NS found.`);
178 | }
179 | reply.ok = true;
180 | } catch (err) {
181 | console.error(`${handlerLogPrefix} Error during reset operation:`, err);
182 | reply.ok = false; reply.error = { type: err.name || 'ResetError', message: `Reset failed: ${err.message || err}`, traceback: err.stack };
183 | } finally {
184 | safeDestroy(persistentReplProxyToDestroy, "Persistent REPL (on reset)", handlerLogPrefix);
185 | console.log(`${handlerLogPrefix} Delivering reset response via callback (ok=${reply.ok})`);
186 | if(typeof window._deliverReplyToHost === 'function') { window._deliverReplyToHost(reply); }
187 | else { console.error(`${handlerLogPrefix} Host callback _deliverReplyToHost not found!`); }
188 | }
189 | return; // Exit handler
190 | } // End Reset
191 |
192 | // === Ignore Non-Exec ===
193 | if (msg.type !== "exec") { console.log(`${handlerLogPrefix} Ignoring non-exec type: ${msg.type}`); return; }
194 |
195 | // ================== Handle Exec Message ==================
196 | console.log(`${handlerLogPrefix} Processing exec request (repl=${msg.repl_mode})`);
197 |
198 | /* === Step 1: Load *Additional* Packages/Wheels === */
199 | // Filter out packages already loaded during init
200 | const currentlyLoaded = pyodide.loadedPackages ? Object.keys(pyodide.loadedPackages) : [];
201 | const additionalPackagesToLoad = msg.packages?.filter(p => !currentlyLoaded.includes(p)) || [];
202 | if (additionalPackagesToLoad.length > 0) {
203 | const pkgs = additionalPackagesToLoad.join(", ");
204 | console.log(`${handlerLogPrefix} Loading additional packages: ${pkgs}`);
205 | await pyodide.loadPackage(additionalPackagesToLoad).catch(err => {
206 | throw new Error(`Additional package loading failed: ${pkgs} - ${err?.message || err}`);
207 | });
208 | console.log(`${handlerLogPrefix} Additional packages loaded: ${pkgs}`);
209 | }
210 |
211 | // Load wheels (ensure micropip is available)
212 | if (msg.wheels?.length) {
213 | const whls = msg.wheels.join(", ");
214 | console.log(`${handlerLogPrefix} Loading wheels: ${whls}`);
215 | // Check if micropip needs loading (it might be a core package now)
216 | if (!pyodide.loadedPackages || !pyodide.loadedPackages['micropip']) {
217 | console.log(`${handlerLogPrefix} Loading micropip for wheels...`);
218 | await pyodide.loadPackage("micropip").catch(err => {
219 | throw new Error(`Failed to load micropip: ${err?.message || err}`);
220 | });
221 | }
222 | micropipProxy = pyodide.pyimport("micropip");
223 | console.log(`${handlerLogPrefix} Installing wheels via micropip...`);
224 | for (const whl of msg.wheels) {
225 | console.log(`${handlerLogPrefix} Installing wheel: ${whl}`);
226 | await micropipProxy.install(whl).catch(err => {
227 | let pyError = ""; if (err instanceof pyodide.ffi.PythonError) pyError = `${err.type}: `;
228 | throw new Error(`Wheel install failed for ${whl}: ${pyError}${err?.message || err}`);
229 | });
230 | console.log(`${handlerLogPrefix} Wheel installed: ${whl}`);
231 | }
232 | // Micropip proxy destroyed in finally
233 | }
234 |
235 | /* === Step 2: Prepare Namespace Proxy (REPL aware) === */
236 | if (msg.repl_mode) {
237 | if (pyodide.globals.has("_MCP_REPL_NS")) {
238 | console.log(`${handlerLogPrefix} Reusing persistent REPL namespace.`);
239 | namespaceProxy = pyodide.globals.get("_MCP_REPL_NS");
240 | if (!namespaceProxy || typeof namespaceProxy.set !== 'function') {
241 | console.warn(`${handlerLogPrefix} REPL namespace invalid. Resetting.`);
242 | safeDestroy(namespaceProxy, "Invalid REPL", handlerLogPrefix);
243 | namespaceProxy = pyodide.toPy({'__name__': '__main__'});
244 | pyodide.globals.set("_MCP_REPL_NS", namespaceProxy);
245 | }
246 | } else {
247 | console.log(`${handlerLogPrefix} Initializing new persistent REPL namespace.`);
248 | namespaceProxy = pyodide.toPy({'__name__': '__main__'});
249 | pyodide.globals.set("_MCP_REPL_NS", namespaceProxy);
250 | }
251 | } else {
252 | console.log(`${handlerLogPrefix} Creating fresh temporary namespace.`);
253 | namespaceProxy = pyodide.toPy({'__name__': '__main__'});
254 | }
255 | if (!namespaceProxy || typeof namespaceProxy.set !== 'function') { // Final check
256 | throw new Error("Failed to obtain valid namespace proxy.");
257 | }
258 |
259 | /* === Step 3: Prepare and Set User Code INTO Namespace Proxy === */
260 | let userCode = '';
261 | try {
262 | if (typeof msg.code_b64 !== 'string' || msg.code_b64 === '') throw new Error("Missing/empty code_b64");
263 | userCode = atob(msg.code_b64); // JS base64 decode
264 | } catch (decodeErr) { throw new Error(`Base64 decode failed: ${decodeErr.message}`); }
265 | console.log(`${handlerLogPrefix} Setting _USER_CODE_TO_EXEC on target namespace proxy...`);
266 | namespaceProxy.set("_USER_CODE_TO_EXEC", userCode); // Set ON THE TARGET PROXY
267 |
268 | /* === Step 4: Execute Runner === */
269 | console.log(`${handlerLogPrefix} Executing Python runner...`);
270 | // Pass the namespaceProxy (which now contains the code) as globals
271 | pyResultProxy = await pyodide.runPythonAsync(pythonRunnerTemplate, { globals: namespaceProxy });
272 | // Cleanup the code variable from the namespaceProxy afterwards
273 | console.log(`${handlerLogPrefix} Deleting _USER_CODE_TO_EXEC from namespace proxy...`);
274 | if (namespaceProxy.has && namespaceProxy.has("_USER_CODE_TO_EXEC")) { // Check if method exists
275 | namespaceProxy.delete("_USER_CODE_TO_EXEC");
276 | } else { console.warn(`${handlerLogPrefix} Could not check/delete _USER_CODE_TO_EXEC from namespace.`); }
277 | reply.wall_ms = now() - wall0;
278 | console.log(`${handlerLogPrefix} Python runner finished. Wall: ${reply.wall_ms.toFixed(0)}ms`);
279 |
280 | /* === Step 5: Process Result Proxy === */
281 | if (!pyResultProxy || typeof pyResultProxy.toJs !== 'function') { throw new Error(`Runner returned invalid result.`); }
282 | console.log(`${handlerLogPrefix} Converting Python result payload...`);
283 | let jsResultPayload = pyResultProxy.toJs({ dict_converter: Object.fromEntries });
284 | Object.assign(reply, jsResultPayload); // Merge python results
285 | reply.ok = jsResultPayload.ok;
286 |
287 | } catch (err) { // Catch ANY error during the process
288 | reply.wall_ms = now() - wall0;
289 | console.error(`${handlerLogPrefix} *** ERROR DURING EXECUTION PROCESS ***:`, err);
290 | reply.ok = false;
291 | reply.error = { type: err.name || 'JavaScriptError', message: err.message || String(err), traceback: err.stack || null };
292 | if (err instanceof pyodide.ffi.PythonError) { reply.error.type = err.type || 'PythonError'; }
293 | reply.stdout = reply.stdout || ''; reply.stderr = reply.stderr || ''; reply.result = reply.result || null; reply.elapsed = reply.elapsed || 0;
294 | } finally {
295 | /* === Step 6: Cleanup Proxies === */
296 | console.log(`${handlerLogPrefix} Entering finally block for cleanup.`);
297 | safeDestroy(pyResultProxy, "Result", handlerLogPrefix);
298 | safeDestroy(micropipProxy, "Micropip", handlerLogPrefix);
299 | // Only destroy namespace if it was temporary (non-REPL)
300 | if (!msg?.repl_mode) { // Use optional chaining
301 | safeDestroy(namespaceProxy, "Temporary Namespace", handlerLogPrefix);
302 | } else {
303 | console.log(`${handlerLogPrefix} Skipping destruction of persistent REPL namespace proxy.`);
304 | }
305 | console.log(`${handlerLogPrefix} Cleanup finished.`);
306 |
307 | /* === Step 7: Send Reply via Exposed Function === */
308 | console.log(`${handlerLogPrefix} *** Delivering final response via exposed function *** (ok=${reply.ok})`);
309 | console.log(`${handlerLogPrefix} Reply payload:`, JSON.stringify(reply, null, 2));
310 | try {
311 | if (typeof window._deliverReplyToHost === 'function') { window._deliverReplyToHost(reply); console.log(`${handlerLogPrefix} Reply delivered.`); }
312 | else { console.error(`${handlerLogPrefix} !!! ERROR: Host function _deliverReplyToHost not found!`); }
313 | } catch (deliveryErr) { console.error(`${handlerLogPrefix} !!! FAILED TO DELIVER REPLY !!!`, deliveryErr); }
314 | } // End finally block
315 |
316 | /* Step 8: Heap Watchdog */
317 | const currentHeapMB = heapMB();
318 | if (Number.isFinite(currentHeapMB) && currentHeapMB > MEM_LIMIT_MB) {
319 | console.warn(`${handlerLogPrefix}[WATCHDOG] Heap ${currentHeapMB.toFixed(0)}MB > limit ${MEM_LIMIT_MB}MB. Closing!`);
320 | statusIndicator.textContent = `Heap limit exceeded (${currentHeapMB.toFixed(0)}MB). Closing...`;
321 | statusIndicator.className = 'status status-error';
322 | setTimeout(() => window.close(), 200);
323 | }
324 |
325 | }); // End message listener
326 |
327 | console.log(`${logPrefix} Main message listener active.`);
328 | window.postMessage({ ready: true, boot_ms: BOOT_MS, id: "pyodide_ready" });
329 | console.log(`${logPrefix} Full Sandbox Ready.`);
330 |
331 | } catch (err) { // Catch Initialization Errors
332 | const initErrorMsg = `FATAL: Pyodide init failed: ${err.message || err}`;
333 | console.error(`${logPrefix} ${initErrorMsg}`, err.stack || '(no stack)', err);
334 | statusIndicator.textContent = initErrorMsg; statusIndicator.className = 'status status-error';
335 | try { window.postMessage({ id: "pyodide_init_error", ok: false, error: { type: err.name || 'InitError', message: initErrorMsg, traceback: err.stack || null } });
336 | } catch (postErr) { console.error(`${logPrefix} Failed to post init error:`, postErr); }
337 | }
338 | })(); // End main IIFE
339 | </script>
340 |
341 | </body>
342 | </html>
```
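
For reference, the template's message listener handles `exec` and `reset` messages carrying the fields read above (`id`, `type`, `code_b64`, `packages`, `wheels`, `repl_mode`) and delivers its reply through `window._deliverReplyToHost`. Below is a hypothetical host-side sketch of building such an `exec` payload in Python; the actual transport between the Python host and this sandbox page is not part of the template and is assumed here:

```python
import base64
import json
import uuid

def build_exec_message(code: str, packages=None, wheels=None, repl_mode=False) -> dict:
    """Shape of the 'exec' message the sandbox listener expects (illustrative)."""
    return {
        "id": str(uuid.uuid4()),      # echoed back in the reply
        "type": "exec",               # a "reset" message instead clears the persistent _MCP_REPL_NS namespace
        "code_b64": base64.b64encode(code.encode("utf-8")).decode("ascii"),  # decoded with atob() in the page
        "packages": packages or [],   # loaded on top of the core packages baked in at boot
        "wheels": wheels or [],       # installed via micropip
        "repl_mode": repl_mode,       # True reuses/creates _MCP_REPL_NS instead of a fresh namespace
    }

print(json.dumps(build_exec_message("result = 21 * 2", packages=["numpy"]), indent=2))
```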
--------------------------------------------------------------------------------
/ultimate_mcp_server/clients/rag_client.py:
--------------------------------------------------------------------------------
```python
1 | """High-level client for RAG (Retrieval-Augmented Generation) operations."""
2 |
3 | from typing import Any, Dict, List, Optional
4 |
5 | from ultimate_mcp_server.services.knowledge_base import (
6 | get_knowledge_base_manager,
7 | get_knowledge_base_retriever,
8 | get_rag_service,
9 | )
10 | from ultimate_mcp_server.utils import get_logger
11 |
12 | logger = get_logger("ultimate_mcp_server.clients.rag")
13 |
14 | class RAGClient:
15 | """
16 | High-level client for Retrieval-Augmented Generation (RAG) operations.
17 |
18 | The RAGClient provides a simplified, unified interface for building and using
19 | RAG systems within the MCP ecosystem. It encapsulates all the key operations in
20 | the RAG workflow, from knowledge base creation and document ingestion to context
21 | retrieval and LLM-augmented generation.
22 |
23 | RAG is a technique that enhances LLM capabilities by retrieving relevant information
24 | from external knowledge bases before generation, allowing models to access information
25 | beyond their training data and produce more accurate, up-to-date responses.
26 |
27 | Key capabilities:
28 | - Knowledge base creation and management
29 | - Document ingestion with automatic chunking
30 | - Semantic retrieval of relevant context
31 | - LLM generation with retrieved context
32 | - Various retrieval methods (vector, hybrid, keyword)
33 |
34 | Architecture:
35 | The client follows a modular architecture with three main components:
36 | 1. Knowledge Base Manager: Handles creation, deletion, and document ingestion
37 | 2. Knowledge Base Retriever: Responsible for context retrieval using various methods
38 | 3. RAG Service: Combines retrieval with LLM generation for complete RAG workflow
39 |
40 | Performance Considerations:
41 | - Document chunking size affects both retrieval quality and storage requirements
42 | - Retrieval method selection impacts accuracy vs. speed tradeoffs:
43 | * Vector search: Fast with good semantic understanding but may miss keyword matches
44 | * Keyword search: Good for exact matches but misses semantic similarities
45 | * Hybrid search: Most comprehensive but computationally more expensive
46 |     - The top-k parameter trades off providing sufficient context against diluting relevance
47 | - Different LLM models may require different prompt templates for optimal performance
48 |
49 | This client abstracts away the complexity of the underlying vector stores,
50 | embeddings, and retrieval mechanisms, providing a simple API for RAG operations.
51 |
52 | Example usage:
53 | ```python
54 | # Create a RAG client
55 | client = RAGClient()
56 |
57 | # Create a knowledge base
58 | await client.create_knowledge_base(
59 | "company_docs",
60 | "Company documentation and policies"
61 | )
62 |
63 | # Add documents to the knowledge base with metadata
64 | await client.add_documents(
65 | "company_docs",
66 | documents=[
67 | "Our return policy allows returns within 30 days of purchase with receipt.",
68 | "Product warranties cover manufacturing defects for a period of one year."
69 | ],
70 | metadatas=[
71 | {"source": "policies/returns.pdf", "page": 1, "department": "customer_service"},
72 | {"source": "policies/warranty.pdf", "page": 3, "department": "legal"}
73 | ],
74 | chunk_size=500,
75 | chunk_method="semantic"
76 | )
77 |
78 | # Retrieve context without generation (for inspection or custom handling)
79 | context = await client.retrieve(
80 | "company_docs",
81 | query="What is our return policy?",
82 | top_k=3,
83 | retrieval_method="hybrid"
84 | )
85 |
86 | # Print retrieved context and sources
87 | for i, (doc, meta) in enumerate(zip(context["documents"], context["metadatas"])):
88 | print(f"Source: {meta.get('source', 'unknown')} | Score: {context['distances'][i]:.3f}")
89 | print(f"Content: {doc[:100]}...\n")
90 |
91 | # Generate a response using RAG with specific provider and template
92 | result = await client.generate_with_rag(
93 | "company_docs",
94 | query="Explain our warranty coverage for electronics",
95 | provider="openai",
96 | model="gpt-4",
97 | template="customer_service_response",
98 | temperature=0.3,
99 | retrieval_method="hybrid"
100 | )
101 |
102 | print(result["response"])
103 | print("\nSources:")
104 | for source in result.get("sources", []):
105 | print(f"- {source['metadata'].get('source')}")
106 | ```
107 | """
108 |
109 | def __init__(self):
110 | """Initialize the RAG client."""
111 | self.kb_manager = get_knowledge_base_manager()
112 | self.kb_retriever = get_knowledge_base_retriever()
113 | self.rag_service = get_rag_service()
114 |
115 | async def create_knowledge_base(
116 | self,
117 | name: str,
118 | description: Optional[str] = None,
119 | overwrite: bool = False
120 | ) -> Dict[str, Any]:
121 | """Create a knowledge base.
122 |
123 | Args:
124 | name: The name of the knowledge base
125 | description: Optional description
126 | overwrite: Whether to overwrite an existing KB with the same name
127 |
128 | Returns:
129 | Result of the operation
130 | """
131 | logger.info(f"Creating knowledge base: {name}", emoji_key="processing")
132 |
133 | try:
134 | result = await self.kb_manager.create_knowledge_base(
135 | name=name,
136 | description=description,
137 | overwrite=overwrite
138 | )
139 |
140 | logger.success(f"Knowledge base created: {name}", emoji_key="success")
141 | return result
142 | except Exception as e:
143 | logger.error(f"Failed to create knowledge base: {str(e)}", emoji_key="error")
144 | raise
145 |
146 | async def add_documents(
147 | self,
148 | knowledge_base_name: str,
149 | documents: List[str],
150 | metadatas: Optional[List[Dict[str, Any]]] = None,
151 | chunk_size: int = 1000,
152 | chunk_method: str = "semantic"
153 | ) -> Dict[str, Any]:
154 | """Add documents to the knowledge base.
155 |
156 | Args:
157 | knowledge_base_name: Name of the knowledge base to add to
158 | documents: List of document texts
159 | metadatas: Optional list of metadata dictionaries
160 | chunk_size: Size of chunks to split documents into
161 | chunk_method: Method to use for chunking ('simple', 'semantic', etc.)
162 |
163 | Returns:
164 | Result of the operation
165 | """
166 | logger.info(f"Adding documents to knowledge base: {knowledge_base_name}", emoji_key="processing")
167 |
168 | try:
169 | result = await self.kb_manager.add_documents(
170 | knowledge_base_name=knowledge_base_name,
171 | documents=documents,
172 | metadatas=metadatas,
173 | chunk_size=chunk_size,
174 | chunk_method=chunk_method
175 | )
176 |
177 | added_count = result.get("added_count", 0)
178 | logger.success(f"Added {added_count} documents to knowledge base", emoji_key="success")
179 | return result
180 | except Exception as e:
181 | logger.error(f"Failed to add documents: {str(e)}", emoji_key="error")
182 | raise
183 |
184 | async def list_knowledge_bases(self) -> List[Any]:
185 | """List all knowledge bases.
186 |
187 | Returns:
188 | List of knowledge base information
189 | """
190 | logger.info("Retrieving list of knowledge bases", emoji_key="processing")
191 |
192 | try:
193 | knowledge_bases = await self.kb_manager.list_knowledge_bases()
194 | return knowledge_bases
195 | except Exception as e:
196 | logger.error(f"Failed to list knowledge bases: {str(e)}", emoji_key="error")
197 | raise
198 |
199 | async def retrieve(
200 | self,
201 | knowledge_base_name: str,
202 | query: str,
203 | top_k: int = 3,
204 | retrieval_method: str = "vector"
205 | ) -> Dict[str, Any]:
206 | """
207 | Retrieve relevant documents from a knowledge base for a given query.
208 |
209 | This method performs the retrieval stage of RAG, finding the most relevant
210 | documents in the knowledge base based on the query. The method is useful
211 | for standalone retrieval operations or when you want to examine retrieved
212 | context before generating a response.
213 |
214 | The retrieval process works by:
215 | 1. Converting the query into a vector representation (embedding)
216 | 2. Finding documents with similar embeddings and/or matching keywords
217 | 3. Ranking and returning the most relevant documents
218 |
219 | Available retrieval methods:
220 | - "vector": Embedding-based similarity search using cosine distance
221 | Best for conceptual/semantic queries where exact wording may differ
222 | - "keyword": Traditional text search using BM25 or similar algorithms
223 | Best for queries with specific terms that must be matched
224 | - "hybrid": Combines vector and keyword approaches with a weighted blend
225 | Good general-purpose approach that balances semantic and keyword matching
226 | - "rerank": Two-stage retrieval that first gets candidates, then reranks them
227 | More computationally intensive but often more accurate
228 |
229 | Optimization strategies:
230 | - For factual queries with specific terminology, use "keyword" or "hybrid"
231 | - For conceptual or paraphrased queries, use "vector"
232 | - For highest accuracy at cost of performance, use "rerank"
233 | - Adjust top_k based on document length; shorter documents may need higher top_k
234 | - Pre-filter by metadata before retrieval when targeting specific sections
235 |
236 | Understanding retrieval metrics:
237 | - "distances" represent similarity scores where lower values indicate higher similarity
238 | for vector search, and higher values indicate better matches for keyword search
239 | - Scores are normalized differently between retrieval methods, so direct
240 | comparison between methods is not meaningful
241 | - Score thresholds for "good matches" vary based on embedding model and content domain
242 |
243 | Args:
244 | knowledge_base_name: Name of the knowledge base to search
245 | query: The search query (question or keywords)
246 | top_k: Maximum number of documents to retrieve (default: 3)
247 | retrieval_method: Method to use for retrieval ("vector", "keyword", "hybrid", "rerank")
248 |
249 | Returns:
250 | Dictionary containing:
251 | - "documents": List of retrieved documents (text chunks)
252 | - "metadatas": List of metadata for each document (source info, etc.)
253 | - "distances": List of similarity scores or relevance metrics
254 | (interpretation depends on retrieval_method)
255 | - "query": The original query
256 | - "retrieval_method": The method used for retrieval
257 | - "processing_time_ms": Time taken for retrieval in milliseconds
258 |
259 | Raises:
260 | ValueError: If knowledge_base_name doesn't exist or query is invalid
261 | Exception: If the retrieval process fails
262 |
263 | Example:
264 | ```python
265 | # Retrieve context about product returns
266 | results = await rag_client.retrieve(
267 | knowledge_base_name="support_docs",
268 | query="How do I return a product?",
269 | top_k=5,
270 | retrieval_method="hybrid"
271 | )
272 |
273 | # Check if we got high-quality matches
274 | if results["documents"] and min(results["distances"]) < 0.3: # Good match threshold
275 | print("Found relevant information!")
276 | else:
277 | print("No strong matches found, consider reformulating the query")
278 |
279 | # Display retrieved documents and their sources with scores
280 | for i, (doc, meta) in enumerate(zip(results["documents"], results["metadatas"])):
281 | score = results["distances"][i]
282 | score_indicator = "🟢" if score < 0.3 else "🟡" if score < 0.6 else "🔴"
283 | print(f"{score_indicator} Result {i+1} (score: {score:.3f}):")
284 | print(f"Source: {meta.get('source', 'unknown')}")
285 | print(doc[:100] + "...\n")
286 | ```
287 | """
288 | logger.info(f"Retrieving context for query: '{query}'", emoji_key="processing")
289 |
290 | try:
291 | results = await self.kb_retriever.retrieve(
292 | knowledge_base_name=knowledge_base_name,
293 | query=query,
294 | top_k=top_k,
295 | retrieval_method=retrieval_method
296 | )
297 |
298 | return results
299 | except Exception as e:
300 | logger.error(f"Failed to retrieve context: {str(e)}", emoji_key="error")
301 | raise
302 |
303 | async def generate_with_rag(
304 | self,
305 | knowledge_base_name: str,
306 | query: str,
307 | provider: str = "gemini",
308 | model: Optional[str] = None,
309 | template: str = "rag_default",
310 | temperature: float = 0.3,
311 | top_k: int = 3,
312 | retrieval_method: str = "hybrid",
313 | include_sources: bool = True
314 | ) -> Dict[str, Any]:
315 | """
316 | Generate a response using Retrieval-Augmented Generation (RAG).
317 |
318 | This method performs the complete RAG process:
319 | 1. Retrieves relevant documents from the specified knowledge base based on the query
320 | 2. Constructs a prompt that includes both the query and retrieved context
321 | 3. Sends the augmented prompt to the LLM to generate a response
322 | 4. Optionally includes source information for transparency and citation
323 |
324 | The retrieval process can use different methods:
325 | - "vector": Pure semantic/embedding-based similarity search (good for conceptual queries)
326 | - "keyword": Traditional keyword-based search (good for specific terms or phrases)
327 | - "hybrid": Combines vector and keyword approaches (good general-purpose approach)
328 | - "rerank": Uses a two-stage approach with retrieval and reranking
329 |
330 | Args:
331 | knowledge_base_name: Name of the knowledge base to query for relevant context
332 | query: The user's question or request
333 | provider: LLM provider to use for generation (e.g., "openai", "anthropic", "gemini")
334 | model: Specific model to use (if None, uses provider's default)
335 | template: Prompt template name that defines how to format the RAG prompt
336 | Different templates can be optimized for different use cases
337 | temperature: Sampling temperature for controlling randomness (0.0-1.0)
338 | Lower values recommended for factual RAG responses
339 | top_k: Number of relevant documents to retrieve and include in the context
340 | Higher values provide more context but may dilute relevance
341 | retrieval_method: The method to use for retrieving documents ("vector", "keyword", "hybrid")
342 | include_sources: Whether to include source information in the output for citations
343 |
344 | Returns:
345 | Dictionary containing:
346 | - "response": The generated text response from the LLM
347 | - "sources": List of source documents and their metadata (if include_sources=True)
348 | - "context": The retrieved context that was used for generation
349 | - "prompt": The full prompt that was sent to the LLM (useful for debugging)
350 | - "tokens": Token usage information (input, output, total)
351 | - "processing_time_ms": Total processing time in milliseconds
352 |
353 | Raises:
354 | ValueError: If knowledge_base_name or query is invalid
355 | Exception: If retrieval or generation fails
356 |
357 | Example:
358 | ```python
359 | # Generate a response using RAG with custom parameters
360 | result = await rag_client.generate_with_rag(
361 | knowledge_base_name="financial_docs",
362 | query="What were our Q3 earnings?",
363 | provider="openai",
364 | model="gpt-4",
365 | temperature=0.1,
366 | top_k=5,
367 | retrieval_method="hybrid"
368 | )
369 |
370 | # Access the response and sources
371 | print(result["response"])
372 | for src in result["sources"]:
373 | print(f"- {src['metadata']['source']} (relevance: {src['score']})")
374 | ```
375 | """
376 | logger.info(f"Generating RAG response for: '{query}'", emoji_key="processing")
377 |
378 | try:
379 | result = await self.rag_service.generate_with_rag(
380 | knowledge_base_name=knowledge_base_name,
381 | query=query,
382 | provider=provider,
383 | model=model,
384 | template=template,
385 | temperature=temperature,
386 | top_k=top_k,
387 | retrieval_method=retrieval_method,
388 | include_sources=include_sources
389 | )
390 |
391 | return result
392 | except Exception as e:
393 | logger.error(f"Failed to call RAG service: {str(e)}", emoji_key="error")
394 | raise
395 |
396 | async def delete_knowledge_base(self, name: str) -> Dict[str, Any]:
397 | """Delete a knowledge base.
398 |
399 | Args:
400 | name: Name of the knowledge base to delete
401 |
402 | Returns:
403 | Result of the operation
404 | """
405 | logger.info(f"Deleting knowledge base: {name}", emoji_key="processing")
406 |
407 | try:
408 | result = await self.kb_manager.delete_knowledge_base(name=name)
409 | logger.success(f"Knowledge base {name} deleted successfully", emoji_key="success")
410 | return result
411 | except Exception as e:
412 | logger.error(f"Failed to delete knowledge base: {str(e)}", emoji_key="error")
413 | raise
414 |
415 | async def reset_knowledge_base(self, knowledge_base_name: str) -> None:
416 | """
417 | Reset (delete and recreate) a knowledge base.
418 |
419 | This method completely removes an existing knowledge base and creates a new
420 | empty one with the same name. This is useful when you need to:
421 | - Remove all documents from a knowledge base efficiently
422 | - Fix a corrupted knowledge base
423 | - Change the underlying embedding model without renaming the knowledge base
424 | - Update document chunking strategy for an entire collection
425 | - Clear outdated information before a complete refresh
426 |
427 | Performance Considerations:
428 | - Resetting is significantly faster than removing documents individually
429 | - For large knowledge bases, resetting and bulk re-adding documents can be
430 | orders of magnitude more efficient than incremental updates
431 | - New documents added after reset will use any updated embedding models or
432 | chunking strategies configured in the system
433 |
434 | Data Integrity:
435 | - This operation preserves knowledge base configuration but removes all content
436 | - The knowledge base name and any associated permissions remain intact
437 | - Custom configuration settings on the knowledge base will be preserved
438 | if the knowledge base service supports configuration persistence
439 |
440 | WARNING: This operation is irreversible. All documents and their embeddings
441 | will be permanently deleted. Consider backing up important data before resetting.
442 |
443 | Disaster Recovery:
444 | Before resetting a production knowledge base, consider these strategies:
445 | 1. Create a backup by exporting documents and metadata if the feature is available
446 | 2. Maintain source documents in original form outside the knowledge base
447 | 3. Document the ingestion pipeline to reproduce the knowledge base if needed
448 | 4. Consider creating a temporary duplicate before resetting critical knowledge bases
449 |
450 | The reset process:
451 | 1. Deletes the entire knowledge base collection/index
452 | 2. Creates a new empty knowledge base with the same name
453 | 3. Re-initializes any associated metadata or settings
454 |
455 | Args:
456 | knowledge_base_name: Name of the knowledge base to reset
457 |
458 | Returns:
459 | None
460 |
461 | Raises:
462 | ValueError: If the knowledge base doesn't exist
463 | Exception: If deletion or recreation fails
464 |
465 | Example:
466 | ```python
467 | # Backup critical metadata before reset (if needed)
468 | kb_info = await rag_client.list_knowledge_bases()
469 | kb_config = next((kb for kb in kb_info if kb['name'] == 'product_documentation'), None)
470 |
471 | # Reset a knowledge base that may have outdated or corrupted data
472 | await rag_client.reset_knowledge_base("product_documentation")
473 |
474 | # After resetting, re-add documents with potentially improved chunking strategy
475 | await rag_client.add_documents(
476 | knowledge_base_name="product_documentation",
477 | documents=updated_docs,
478 | metadatas=updated_metadatas,
479 | chunk_size=800, # Updated chunk size better suited for the content
480 | chunk_method="semantic" # Using semantic chunking for better results
481 | )
482 | ```
483 | """
484 |         logger.info(f"Resetting knowledge base: {knowledge_base_name}", emoji_key="processing")
485 | 
486 |         try:
487 |             await self.kb_manager.delete_knowledge_base(name=knowledge_base_name)
488 |             await self.kb_manager.create_knowledge_base(name=knowledge_base_name)
489 |             logger.success(f"Knowledge base {knowledge_base_name} reset successfully", emoji_key="success")
490 |         except Exception as e:
491 |             logger.error(f"Failed to reset knowledge base: {str(e)}", emoji_key="error")
492 |             raise
```
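
The `retrieve` method is also useful on its own when you want to inspect context before committing to a full `generate_with_rag` call. A small sketch, assuming this module's `RAGClient`, an existing knowledge base name, and the result structure documented in the `retrieve` docstring (`documents`, `metadatas`, `distances`):

```python
async def preview_context(client: RAGClient, kb_name: str, question: str) -> str:
    """Return retrieved chunks as one annotated string for manual inspection."""
    results = await client.retrieve(
        knowledge_base_name=kb_name,
        query=question,
        top_k=3,
        retrieval_method="hybrid",
    )
    blocks = []
    for doc, meta, score in zip(results["documents"], results["metadatas"], results["distances"]):
        blocks.append(f"[{meta.get('source', 'unknown')} | score={score:.3f}]\n{doc}")
    return "\n\n".join(blocks)
```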