This is page 7 of 45. Use http://codebase.md/dicklesworthstone/llm_gateway_mcp_server?lines=true&page={x} to view the full context.
# Directory Structure
```
├── .cursorignore
├── .env.example
├── .envrc
├── .gitignore
├── additional_features.md
├── check_api_keys.py
├── completion_support.py
├── comprehensive_test.py
├── docker-compose.yml
├── Dockerfile
├── empirically_measured_model_speeds.json
├── error_handling.py
├── example_structured_tool.py
├── examples
│ ├── __init__.py
│ ├── advanced_agent_flows_using_unified_memory_system_demo.py
│ ├── advanced_extraction_demo.py
│ ├── advanced_unified_memory_system_demo.py
│ ├── advanced_vector_search_demo.py
│ ├── analytics_reporting_demo.py
│ ├── audio_transcription_demo.py
│ ├── basic_completion_demo.py
│ ├── cache_demo.py
│ ├── claude_integration_demo.py
│ ├── compare_synthesize_demo.py
│ ├── cost_optimization.py
│ ├── data
│ │ ├── sample_event.txt
│ │ ├── Steve_Jobs_Introducing_The_iPhone_compressed.md
│ │ └── Steve_Jobs_Introducing_The_iPhone_compressed.mp3
│ ├── docstring_refiner_demo.py
│ ├── document_conversion_and_processing_demo.py
│ ├── entity_relation_graph_demo.py
│ ├── filesystem_operations_demo.py
│ ├── grok_integration_demo.py
│ ├── local_text_tools_demo.py
│ ├── marqo_fused_search_demo.py
│ ├── measure_model_speeds.py
│ ├── meta_api_demo.py
│ ├── multi_provider_demo.py
│ ├── ollama_integration_demo.py
│ ├── prompt_templates_demo.py
│ ├── python_sandbox_demo.py
│ ├── rag_example.py
│ ├── research_workflow_demo.py
│ ├── sample
│ │ ├── article.txt
│ │ ├── backprop_paper.pdf
│ │ ├── buffett.pdf
│ │ ├── contract_link.txt
│ │ ├── legal_contract.txt
│ │ ├── medical_case.txt
│ │ ├── northwind.db
│ │ ├── research_paper.txt
│ │ ├── sample_data.json
│ │ └── text_classification_samples
│ │ ├── email_classification.txt
│ │ ├── news_samples.txt
│ │ ├── product_reviews.txt
│ │ └── support_tickets.txt
│ ├── sample_docs
│ │ └── downloaded
│ │ └── attention_is_all_you_need.pdf
│ ├── sentiment_analysis_demo.py
│ ├── simple_completion_demo.py
│ ├── single_shot_synthesis_demo.py
│ ├── smart_browser_demo.py
│ ├── sql_database_demo.py
│ ├── sse_client_demo.py
│ ├── test_code_extraction.py
│ ├── test_content_detection.py
│ ├── test_ollama.py
│ ├── text_classification_demo.py
│ ├── text_redline_demo.py
│ ├── tool_composition_examples.py
│ ├── tournament_code_demo.py
│ ├── tournament_text_demo.py
│ ├── unified_memory_system_demo.py
│ ├── vector_search_demo.py
│ ├── web_automation_instruction_packs.py
│ └── workflow_delegation_demo.py
├── LICENSE
├── list_models.py
├── marqo_index_config.json.example
├── mcp_protocol_schema_2025-03-25_version.json
├── mcp_python_lib_docs.md
├── mcp_tool_context_estimator.py
├── model_preferences.py
├── pyproject.toml
├── quick_test.py
├── README.md
├── resource_annotations.py
├── run_all_demo_scripts_and_check_for_errors.py
├── storage
│ └── smart_browser_internal
│ ├── locator_cache.db
│ ├── readability.js
│ └── storage_state.enc
├── test_client.py
├── test_connection.py
├── TEST_README.md
├── test_sse_client.py
├── test_stdio_client.py
├── tests
│ ├── __init__.py
│ ├── conftest.py
│ ├── integration
│ │ ├── __init__.py
│ │ └── test_server.py
│ ├── manual
│ │ ├── test_extraction_advanced.py
│ │ └── test_extraction.py
│ └── unit
│ ├── __init__.py
│ ├── test_cache.py
│ ├── test_providers.py
│ └── test_tools.py
├── TODO.md
├── tool_annotations.py
├── tools_list.json
├── ultimate_mcp_banner.webp
├── ultimate_mcp_logo.webp
├── ultimate_mcp_server
│ ├── __init__.py
│ ├── __main__.py
│ ├── cli
│ │ ├── __init__.py
│ │ ├── __main__.py
│ │ ├── commands.py
│ │ ├── helpers.py
│ │ └── typer_cli.py
│ ├── clients
│ │ ├── __init__.py
│ │ ├── completion_client.py
│ │ └── rag_client.py
│ ├── config
│ │ └── examples
│ │ └── filesystem_config.yaml
│ ├── config.py
│ ├── constants.py
│ ├── core
│ │ ├── __init__.py
│ │ ├── evaluation
│ │ │ ├── base.py
│ │ │ └── evaluators.py
│ │ ├── providers
│ │ │ ├── __init__.py
│ │ │ ├── anthropic.py
│ │ │ ├── base.py
│ │ │ ├── deepseek.py
│ │ │ ├── gemini.py
│ │ │ ├── grok.py
│ │ │ ├── ollama.py
│ │ │ ├── openai.py
│ │ │ └── openrouter.py
│ │ ├── server.py
│ │ ├── state_store.py
│ │ ├── tournaments
│ │ │ ├── manager.py
│ │ │ ├── tasks.py
│ │ │ └── utils.py
│ │ └── ums_api
│ │ ├── __init__.py
│ │ ├── ums_database.py
│ │ ├── ums_endpoints.py
│ │ ├── ums_models.py
│ │ └── ums_services.py
│ ├── exceptions.py
│ ├── graceful_shutdown.py
│ ├── services
│ │ ├── __init__.py
│ │ ├── analytics
│ │ │ ├── __init__.py
│ │ │ ├── metrics.py
│ │ │ └── reporting.py
│ │ ├── cache
│ │ │ ├── __init__.py
│ │ │ ├── cache_service.py
│ │ │ ├── persistence.py
│ │ │ ├── strategies.py
│ │ │ └── utils.py
│ │ ├── cache.py
│ │ ├── document.py
│ │ ├── knowledge_base
│ │ │ ├── __init__.py
│ │ │ ├── feedback.py
│ │ │ ├── manager.py
│ │ │ ├── rag_engine.py
│ │ │ ├── retriever.py
│ │ │ └── utils.py
│ │ ├── prompts
│ │ │ ├── __init__.py
│ │ │ ├── repository.py
│ │ │ └── templates.py
│ │ ├── prompts.py
│ │ └── vector
│ │ ├── __init__.py
│ │ ├── embeddings.py
│ │ └── vector_service.py
│ ├── tool_token_counter.py
│ ├── tools
│ │ ├── __init__.py
│ │ ├── audio_transcription.py
│ │ ├── base.py
│ │ ├── completion.py
│ │ ├── docstring_refiner.py
│ │ ├── document_conversion_and_processing.py
│ │ ├── enhanced-ums-lookbook.html
│ │ ├── entity_relation_graph.py
│ │ ├── excel_spreadsheet_automation.py
│ │ ├── extraction.py
│ │ ├── filesystem.py
│ │ ├── html_to_markdown.py
│ │ ├── local_text_tools.py
│ │ ├── marqo_fused_search.py
│ │ ├── meta_api_tool.py
│ │ ├── ocr_tools.py
│ │ ├── optimization.py
│ │ ├── provider.py
│ │ ├── pyodide_boot_template.html
│ │ ├── python_sandbox.py
│ │ ├── rag.py
│ │ ├── redline-compiled.css
│ │ ├── sentiment_analysis.py
│ │ ├── single_shot_synthesis.py
│ │ ├── smart_browser.py
│ │ ├── sql_databases.py
│ │ ├── text_classification.py
│ │ ├── text_redline_tools.py
│ │ ├── tournament.py
│ │ ├── ums_explorer.html
│ │ └── unified_memory_system.py
│ ├── utils
│ │ ├── __init__.py
│ │ ├── async_utils.py
│ │ ├── display.py
│ │ ├── logging
│ │ │ ├── __init__.py
│ │ │ ├── console.py
│ │ │ ├── emojis.py
│ │ │ ├── formatter.py
│ │ │ ├── logger.py
│ │ │ ├── panels.py
│ │ │ ├── progress.py
│ │ │ └── themes.py
│ │ ├── parse_yaml.py
│ │ ├── parsing.py
│ │ ├── security.py
│ │ └── text.py
│ └── working_memory_api.py
├── unified_memory_system_technical_analysis.md
└── uv.lock
```
# Files
--------------------------------------------------------------------------------
/examples/compare_synthesize_demo.py:
--------------------------------------------------------------------------------
```python
1 | #!/usr/bin/env python
2 | """Enhanced demo of the Advanced Response Comparator & Synthesizer Tool."""
3 | import asyncio
4 | import json
5 | import sys
6 | from collections import namedtuple # Import namedtuple
7 | from pathlib import Path
8 |
9 | # Add project root to path for imports when running as script
10 | sys.path.insert(0, str(Path(__file__).parent.parent))
11 |
12 | from rich import box
13 | from rich.markup import escape
14 | from rich.panel import Panel
15 | from rich.rule import Rule
16 | from rich.syntax import Syntax
17 | from rich.table import Table
18 |
19 | from ultimate_mcp_server.constants import Provider
20 | from ultimate_mcp_server.core.server import Gateway # Use Gateway to get MCP
21 |
22 | # from ultimate_mcp_server.tools.meta import compare_and_synthesize # Add correct import
23 | from ultimate_mcp_server.utils import get_logger
24 | from ultimate_mcp_server.utils.display import CostTracker # Import CostTracker
25 | from ultimate_mcp_server.utils.logging.console import console
26 |
27 | # Initialize logger
28 | logger = get_logger("example.compare_synthesize_v2")
29 |
30 | # Create a simple structure for cost tracking from dict (tokens might be missing)
31 | TrackableResult = namedtuple("TrackableResult", ["cost", "input_tokens", "output_tokens", "provider", "model", "processing_time"])
32 |
33 | # Global MCP instance (will be populated from Gateway)
34 | mcp = None
35 |
36 | async def setup_gateway_and_tools():
37 | """Set up the gateway and register tools."""
38 | global mcp
39 | logger.info("Initializing Gateway and MetaTools for enhanced demo...", emoji_key="start")
40 | gateway = Gateway("compare-synthesize-demo-v2", register_tools=False)
41 |
42 | # Initialize providers (needed for the tool to function)
43 | try:
44 | await gateway._initialize_providers()
45 | except Exception as e:
46 | logger.critical(f"Failed to initialize providers: {e}. Check API keys.", emoji_key="critical", exc_info=True)
47 | sys.exit(1) # Exit if providers can't be initialized
48 |
49 | # REMOVE MetaTools instance
50 | # meta_tools = MetaTools(gateway) # Pass the gateway instance # noqa: F841
51 | mcp = gateway.mcp # Store the MCP server instance
52 |
53 | # Manually register the required tool
54 | # mcp.tool()(compare_and_synthesize)
55 | # logger.info("Manually registered compare_and_synthesize tool.")
56 |
57 | # Verify tool registration
58 | tool_list = await mcp.list_tools()
59 | tool_names = [t.name for t in tool_list] # Access name attribute directly
60 | # Use console.print for tool list
61 | console.print(f"Registered tools: [cyan]{escape(str(tool_names))}[/cyan]")
62 | if "compare_and_synthesize" in tool_names:
63 | logger.success("compare_and_synthesize tool registered successfully.", emoji_key="success")
64 | else:
65 | logger.error("compare_and_synthesize tool FAILED to register.", emoji_key="error")
66 | sys.exit(1) # Exit if the required tool isn't available
67 |
68 | logger.success("Setup complete.", emoji_key="success")
69 |
70 | # Refactored print_result function using Rich
71 | def print_result(title: str, result: dict):
72 | """Helper function to print results clearly using Rich components."""
73 | console.print(Rule(f"[bold blue]{escape(title)}[/bold blue]"))
74 |
75 | # Handle potential list result format (from older tool versions?)
76 | if isinstance(result, list) and len(result) > 0:
77 | if hasattr(result[0], 'text'):
78 | try:
79 | result = json.loads(result[0].text)
80 | except Exception:
81 | result = {"error": "Failed to parse result from list format"}
82 | else:
83 | result = result[0] # Assume first item is the dict
84 | elif not isinstance(result, dict):
85 | result = {"error": f"Unexpected result format: {type(result)}"}
86 |
87 | if result.get("error"):
88 | error_content = f"[red]Error:[/red] {escape(result['error'])}"
89 | if "partial_results" in result and result["partial_results"]:
90 | try:
91 | partial_json = json.dumps(result["partial_results"], indent=2)
92 | error_content += "\n\n[yellow]Partial Results:[/yellow]"
93 | error_panel_content = Syntax(partial_json, "json", theme="default", line_numbers=False, word_wrap=True)
94 | except Exception as json_err:
95 | error_panel_content = f"[red]Could not display partial results: {escape(str(json_err))}[/red]"
96 | else:
97 | error_panel_content = error_content
98 |
99 | console.print(Panel(
100 | error_panel_content,
101 | title="[bold red]Tool Error[/bold red]",
102 | border_style="red",
103 | expand=False
104 | ))
105 |
106 | else:
107 | # Display synthesis/analysis sections
108 | if "synthesis" in result:
109 | synthesis_data = result["synthesis"]
110 | if isinstance(synthesis_data, dict):
111 |
112 | if "best_response_text" in synthesis_data:
113 | console.print(Panel(
114 | escape(synthesis_data["best_response_text"].strip()),
115 | title="[bold green]Best Response Text[/bold green]",
116 | border_style="green",
117 | expand=False
118 | ))
119 |
120 | if "synthesized_response" in synthesis_data:
121 | console.print(Panel(
122 | escape(synthesis_data["synthesized_response"].strip()),
123 | title="[bold magenta]Synthesized Response[/bold magenta]",
124 | border_style="magenta",
125 | expand=False
126 | ))
127 |
128 | if synthesis_data.get("best_response", {}).get("reasoning"):
129 | console.print(Panel(
130 | escape(synthesis_data["best_response"]["reasoning"].strip()),
131 | title="[bold cyan]Best Response Reasoning[/bold cyan]",
132 | border_style="dim cyan",
133 | expand=False
134 | ))
135 |
136 | if synthesis_data.get("synthesis_strategy"):
137 | console.print(Panel(
138 | escape(synthesis_data["synthesis_strategy"].strip()),
139 | title="[bold yellow]Synthesis Strategy Explanation[/bold yellow]",
140 | border_style="dim yellow",
141 | expand=False
142 | ))
143 |
144 | if "ranking" in synthesis_data:
145 | try:
146 | ranking_json = json.dumps(synthesis_data["ranking"], indent=2)
147 | console.print(Panel(
148 | Syntax(ranking_json, "json", theme="default", line_numbers=False, word_wrap=True),
149 | title="[bold]Ranking[/bold]",
150 | border_style="dim blue",
151 | expand=False
152 | ))
153 | except Exception as json_err:
154 | console.print(f"[red]Could not display ranking: {escape(str(json_err))}[/red]")
155 |
156 | if "comparative_analysis" in synthesis_data:
157 | try:
158 | analysis_json = json.dumps(synthesis_data["comparative_analysis"], indent=2)
159 | console.print(Panel(
160 | Syntax(analysis_json, "json", theme="default", line_numbers=False, word_wrap=True),
161 | title="[bold]Comparative Analysis[/bold]",
162 | border_style="dim blue",
163 | expand=False
164 | ))
165 | except Exception as json_err:
166 | console.print(f"[red]Could not display comparative analysis: {escape(str(json_err))}[/red]")
167 |
168 | else: # Handle case where synthesis data isn't a dict (e.g., raw text error)
169 | console.print(Panel(
170 | f"[yellow]Synthesis Output (raw/unexpected format):[/yellow]\n{escape(str(synthesis_data))}",
171 | title="[bold yellow]Synthesis Data[/bold yellow]",
172 | border_style="yellow",
173 | expand=False
174 | ))
175 |
176 | # Display Stats Table
177 | stats_table = Table(title="[bold]Execution Stats[/bold]", box=box.ROUNDED, show_header=False, expand=False)
178 | stats_table.add_column("Metric", style="cyan", no_wrap=True)
179 | stats_table.add_column("Value", style="white")
180 | stats_table.add_row("Eval/Synth Model", f"{escape(result.get('synthesis_provider','N/A'))}/{escape(result.get('synthesis_model','N/A'))}")
181 | stats_table.add_row("Total Cost", f"${result.get('cost', {}).get('total_cost', 0.0):.6f}")
182 | stats_table.add_row("Processing Time", f"{result.get('processing_time', 0.0):.2f}s")
183 | console.print(stats_table)
184 |
185 | console.print() # Add spacing after each result block
186 |
187 |
188 | async def run_comparison_demo(tracker: CostTracker):
189 | """Demonstrate different modes of compare_and_synthesize."""
190 | if not mcp:
191 | logger.error("MCP server not initialized. Run setup first.", emoji_key="error")
192 | return
193 |
194 | prompt = "Explain the main benefits of using asynchronous programming in Python for a moderately technical audience. Provide 2-3 key advantages."
195 |
196 | # --- Configuration for initial responses ---
197 | console.print(Rule("[bold green]Configurations[/bold green]"))
198 | console.print(f"[cyan]Prompt:[/cyan] {escape(prompt)}")
199 | initial_configs = [
200 | {"provider": Provider.OPENAI.value, "model": "gpt-4.1-mini", "parameters": {"temperature": 0.6, "max_tokens": 150}},
201 | {"provider": Provider.ANTHROPIC.value, "model": "claude-3-5-haiku-20241022", "parameters": {"temperature": 0.5, "max_tokens": 150}},
202 | {"provider": Provider.GEMINI.value, "model": "gemini-2.0-flash-lite", "parameters": {"temperature": 0.7, "max_tokens": 150}},
203 | {"provider": Provider.DEEPSEEK.value, "model": "deepseek-chat", "parameters": {"temperature": 0.6, "max_tokens": 150}},
204 | ]
205 | console.print(f"[cyan]Initial Models:[/cyan] {[f'{c['provider']}:{c['model']}' for c in initial_configs]}")
206 |
207 | # --- Evaluation Criteria ---
208 | criteria = [
209 | "Clarity: Is the explanation clear and easy to understand for the target audience?",
210 | "Accuracy: Are the stated benefits of async programming technically correct?",
211 | "Relevance: Does the response directly address the prompt and focus on key advantages?",
212 | "Conciseness: Is the explanation brief and to the point?",
213 | "Completeness: Does it mention 2-3 distinct and significant benefits?",
214 | ]
215 | console.print("[cyan]Evaluation Criteria:[/cyan]")
216 | for i, criterion in enumerate(criteria):
217 | console.print(f" {i+1}. {escape(criterion)}")
218 |
219 | # --- Criteria Weights (Optional) ---
220 | criteria_weights = {
221 | "Clarity: Is the explanation clear and easy to understand for the target audience?": 0.3,
222 | "Accuracy: Are the stated benefits of async programming technically correct?": 0.3,
223 | "Relevance: Does the response directly address the prompt and focus on key advantages?": 0.15,
224 | "Conciseness: Is the explanation brief and to the point?": 0.1,
225 | "Completeness: Does it mention 2-3 distinct and significant benefits?": 0.15,
226 | }
227 | console.print("[cyan]Criteria Weights (Optional):[/cyan]")
228 | # Create a small table for weights
229 | weights_table = Table(box=box.MINIMAL, show_header=False)
230 | weights_table.add_column("Criterion Snippet", style="dim")
231 | weights_table.add_column("Weight", style="green")
232 | for crit, weight in criteria_weights.items():
233 | weights_table.add_row(escape(crit.split(':')[0]), f"{weight:.2f}")
234 | console.print(weights_table)
235 |
236 | # --- Synthesis/Evaluation Model ---
237 | synthesis_model_config = {"provider": Provider.OPENAI.value, "model": "gpt-4.1"}
238 | console.print(f"[cyan]Synthesis/Evaluation Model:[/cyan] {escape(synthesis_model_config['provider'])}:{escape(synthesis_model_config['model'])}")
239 | console.print() # Spacing before demos start
240 |
241 | common_args = {
242 | "prompt": prompt,
243 | "configs": initial_configs,
244 | "criteria": criteria,
245 | "criteria_weights": criteria_weights,
246 | }
247 |
248 | # --- Demo 1: Select Best Response ---
249 | logger.info("Running format 'best'...", emoji_key="processing")
250 | try:
251 | result = await mcp.call_tool("compare_and_synthesize", {
252 | **common_args,
253 | "response_format": "best",
254 | "include_reasoning": True, # Show why it was selected
255 | "synthesis_model": synthesis_model_config # Explicitly specify model to avoid OpenRouter
256 | })
257 | print_result("Response Format: 'best' (with reasoning)", result)
258 | # Track cost
259 | if isinstance(result, dict) and "cost" in result and "synthesis_provider" in result and "synthesis_model" in result:
260 | try:
261 | trackable = TrackableResult(
262 | cost=result.get("cost", {}).get("total_cost", 0.0),
263 | input_tokens=0, # Tokens not typically aggregated in this tool's output
264 | output_tokens=0,
265 | provider=result.get("synthesis_provider", "unknown"),
266 | model=result.get("synthesis_model", "compare_synthesize"),
267 | processing_time=result.get("processing_time", 0.0)
268 | )
269 | tracker.add_call(trackable)
270 | except Exception as track_err:
271 | logger.warning(f"Could not track cost for 'best' format: {track_err}", exc_info=False)
272 | except Exception as e:
273 | logger.error(f"Error during 'best' format demo: {e}", emoji_key="error", exc_info=True)
274 |
275 | # --- Demo 2: Synthesize Responses (Comprehensive Strategy) ---
276 | logger.info("Running format 'synthesis' (comprehensive)...", emoji_key="processing")
277 | try:
278 | result = await mcp.call_tool("compare_and_synthesize", {
279 | **common_args,
280 | "response_format": "synthesis",
281 | "synthesis_strategy": "comprehensive",
282 | "synthesis_model": synthesis_model_config, # Specify model for consistency
283 | "include_reasoning": True,
284 | })
285 | print_result("Response Format: 'synthesis' (Comprehensive Strategy)", result)
286 | # Track cost
287 | if isinstance(result, dict) and "cost" in result and "synthesis_provider" in result and "synthesis_model" in result:
288 | try:
289 | trackable = TrackableResult(
290 | cost=result.get("cost", {}).get("total_cost", 0.0),
291 | input_tokens=0, # Tokens not typically aggregated
292 | output_tokens=0,
293 | provider=result.get("synthesis_provider", "unknown"),
294 | model=result.get("synthesis_model", "compare_synthesize"),
295 | processing_time=result.get("processing_time", 0.0)
296 | )
297 | tracker.add_call(trackable)
298 | except Exception as track_err:
299 | logger.warning(f"Could not track cost for 'synthesis comprehensive': {track_err}", exc_info=False)
300 | except Exception as e:
301 | logger.error(f"Error during 'synthesis comprehensive' demo: {e}", emoji_key="error", exc_info=True)
302 |
303 | # --- Demo 3: Synthesize Responses (Conservative Strategy, No Reasoning) ---
304 | logger.info("Running format 'synthesis' (conservative, no reasoning)...", emoji_key="processing")
305 | try:
306 | result = await mcp.call_tool("compare_and_synthesize", {
307 | **common_args,
308 | "response_format": "synthesis",
309 | "synthesis_strategy": "conservative",
310 | "synthesis_model": synthesis_model_config, # Explicitly specify
311 | "include_reasoning": False, # Hide the synthesis strategy explanation
312 | })
313 | print_result("Response Format: 'synthesis' (Conservative, No Reasoning)", result)
314 | # Track cost
315 | if isinstance(result, dict) and "cost" in result and "synthesis_provider" in result and "synthesis_model" in result:
316 | try:
317 | trackable = TrackableResult(
318 | cost=result.get("cost", {}).get("total_cost", 0.0),
319 | input_tokens=0, # Tokens not typically aggregated
320 | output_tokens=0,
321 | provider=result.get("synthesis_provider", "unknown"),
322 | model=result.get("synthesis_model", "compare_synthesize"),
323 | processing_time=result.get("processing_time", 0.0)
324 | )
325 | tracker.add_call(trackable)
326 | except Exception as track_err:
327 | logger.warning(f"Could not track cost for 'synthesis conservative': {track_err}", exc_info=False)
328 | except Exception as e:
329 | logger.error(f"Error during 'synthesis conservative' demo: {e}", emoji_key="error", exc_info=True)
330 |
331 | # --- Demo 4: Rank Responses ---
332 | logger.info("Running format 'ranked'...", emoji_key="processing")
333 | try:
334 | result = await mcp.call_tool("compare_and_synthesize", {
335 | **common_args,
336 | "response_format": "ranked",
337 | "include_reasoning": True, # Show reasoning for ranks
338 | "synthesis_model": synthesis_model_config, # Explicitly specify
339 | })
340 | print_result("Response Format: 'ranked' (with reasoning)", result)
341 | # Track cost
342 | if isinstance(result, dict) and "cost" in result and "synthesis_provider" in result and "synthesis_model" in result:
343 | try:
344 | trackable = TrackableResult(
345 | cost=result.get("cost", {}).get("total_cost", 0.0),
346 | input_tokens=0, # Tokens not typically aggregated
347 | output_tokens=0,
348 | provider=result.get("synthesis_provider", "unknown"),
349 | model=result.get("synthesis_model", "compare_synthesize"),
350 | processing_time=result.get("processing_time", 0.0)
351 | )
352 | tracker.add_call(trackable)
353 | except Exception as track_err:
354 | logger.warning(f"Could not track cost for 'ranked' format: {track_err}", exc_info=False)
355 | except Exception as e:
356 | logger.error(f"Error during 'ranked' format demo: {e}", emoji_key="error", exc_info=True)
357 |
358 | # --- Demo 5: Analyze Responses ---
359 | logger.info("Running format 'analysis'...", emoji_key="processing")
360 | try:
361 | result = await mcp.call_tool("compare_and_synthesize", {
362 | **common_args,
363 | "response_format": "analysis",
364 | # No reasoning needed for analysis format, it's inherent
365 | "synthesis_model": synthesis_model_config, # Explicitly specify
366 | })
367 | print_result("Response Format: 'analysis'", result)
368 | # Track cost
369 | if isinstance(result, dict) and "cost" in result and "synthesis_provider" in result and "synthesis_model" in result:
370 | try:
371 | trackable = TrackableResult(
372 | cost=result.get("cost", {}).get("total_cost", 0.0),
373 | input_tokens=0, # Tokens not typically aggregated
374 | output_tokens=0,
375 | provider=result.get("synthesis_provider", "unknown"),
376 | model=result.get("synthesis_model", "compare_synthesize"),
377 | processing_time=result.get("processing_time", 0.0)
378 | )
379 | tracker.add_call(trackable)
380 | except Exception as track_err:
381 | logger.warning(f"Could not track cost for 'analysis' format: {track_err}", exc_info=False)
382 | except Exception as e:
383 | logger.error(f"Error during 'analysis' format demo: {e}", emoji_key="error", exc_info=True)
384 |
385 | # Display cost summary at the end
386 | tracker.display_summary(console)
387 |
388 |
389 | async def main():
390 | """Run the enhanced compare_and_synthesize demo."""
391 | tracker = CostTracker() # Instantiate tracker
392 | await setup_gateway_and_tools()
393 | await run_comparison_demo(tracker) # Pass tracker
394 | # logger.info("Skipping run_comparison_demo() as the 'compare_and_synthesize' tool function is missing.") # Remove skip message
395 |
396 | if __name__ == "__main__":
397 | try:
398 | asyncio.run(main())
399 | except KeyboardInterrupt:
400 | logger.info("Demo stopped by user.")
401 | except Exception as main_err:
402 | logger.critical(f"Demo failed with unexpected error: {main_err}", emoji_key="critical", exc_info=True)
```
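For ad-hoc experimentation outside the demo harness, the same tool can be invoked directly against the Gateway's MCP instance. The sketch below is distilled from the demo above (the `Gateway` constructor, `_initialize_providers`, and the `compare_and_synthesize` arguments all appear there); it assumes provider API keys are configured and that the private initializer remains callable, so treat it as a sketch rather than a supported entry point.

```python
# Minimal sketch, reusing the setup pattern from compare_synthesize_demo.py.
import asyncio

from ultimate_mcp_server.constants import Provider
from ultimate_mcp_server.core.server import Gateway


async def pick_best_answer(prompt: str):
    gateway = Gateway("compare-synthesize-sketch", register_tools=False)
    await gateway._initialize_providers()  # same private helper the demo relies on
    # Ask the tool to pick the single best response across two providers.
    return await gateway.mcp.call_tool("compare_and_synthesize", {
        "prompt": prompt,
        "configs": [
            {"provider": Provider.OPENAI.value, "model": "gpt-4.1-mini"},
            {"provider": Provider.ANTHROPIC.value, "model": "claude-3-5-haiku-20241022"},
        ],
        "criteria": ["Clarity", "Accuracy"],
        "response_format": "best",
        "synthesis_model": {"provider": Provider.OPENAI.value, "model": "gpt-4.1"},
    })


if __name__ == "__main__":
    print(asyncio.run(pick_best_answer("Explain asyncio in two sentences.")))
```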
--------------------------------------------------------------------------------
/ultimate_mcp_server/tools/tournament.py:
--------------------------------------------------------------------------------
```python
1 | """Tournament tools for Ultimate MCP Server."""
2 | from typing import Any, Dict, List, Optional
3 |
4 | from ultimate_mcp_server.exceptions import ToolError
5 |
6 | from ultimate_mcp_server.core.models.tournament import (
7 | CancelTournamentInput,
8 | CancelTournamentOutput,
9 | CreateTournamentInput,
10 | CreateTournamentOutput,
11 | GetTournamentResultsInput,
12 | GetTournamentStatusInput,
13 | GetTournamentStatusOutput,
14 | TournamentBasicInfo,
15 | TournamentData,
16 | TournamentStatus,
17 | )
18 | from ultimate_mcp_server.core.models.tournament import (
19 | EvaluatorConfig as InputEvaluatorConfig,
20 | )
21 | from ultimate_mcp_server.core.models.tournament import (
22 | ModelConfig as InputModelConfig,
23 | )
24 | from ultimate_mcp_server.core.tournaments.manager import tournament_manager
25 | from ultimate_mcp_server.tools.base import with_error_handling, with_tool_metrics
26 | from ultimate_mcp_server.utils import get_logger
27 |
28 | logger = get_logger("ultimate_mcp_server.tools.tournament")
29 |
30 | # --- Standalone Tool Functions ---
31 |
32 | @with_tool_metrics
33 | @with_error_handling
34 | async def create_tournament(
35 | name: str,
36 | prompt: str,
37 | models: List[Dict[str, Any]],
38 | rounds: int = 3,
39 | tournament_type: str = "code",
40 | extraction_model_id: Optional[str] = "anthropic/claude-3-5-haiku-20241022",
41 | evaluators: Optional[List[Dict[str, Any]]] = None,
42 | max_retries_per_model_call: int = 3,
43 | retry_backoff_base_seconds: float = 1.0,
44 | max_concurrent_model_calls: int = 5
45 | ) -> Dict[str, Any]:
46 | """
47 | Creates and starts a new LLM competition (tournament) based on a prompt and model configurations.
48 |
49 | Args:
50 | name: Human-readable name for the tournament (e.g., "Essay Refinement Contest", "Python Sorting Challenge").
51 | prompt: The task prompt provided to all participating LLM models.
52 | models: List of model configurations (external key is "models"). Each config is a dictionary specifying:
53 | - model_id (str, required): e.g., 'openai/gpt-4o'.
54 | - diversity_count (int, optional, default 1): Number of variants per model.
55 | # ... (rest of ModelConfig fields) ...
56 | rounds: Number of tournament rounds. Each round allows models to refine their previous output (if applicable to the tournament type). Default is 3.
57 | tournament_type: The type of tournament defining the task and evaluation method. Supported types include:
58 | - "code": For evaluating code generation based on correctness and potentially style/efficiency.
59 | - "text": For general text generation, improvement, or refinement tasks.
60 | Default is "code".
61 | extraction_model_id: (Optional, primarily for 'code' type) Specific LLM model to use for extracting and evaluating results like code blocks. If None, a default is used.
62 | evaluators: (Optional) List of evaluator configurations as dicts.
63 | max_retries_per_model_call: Maximum retries per model call.
64 | retry_backoff_base_seconds: Base seconds for retry backoff.
65 | max_concurrent_model_calls: Maximum concurrent model calls.
66 |
67 | Returns:
68 | Dictionary with tournament creation status containing:
69 | - tournament_id: Unique identifier for the created tournament.
70 | - status: Initial tournament status (usually 'PENDING' or 'RUNNING').
71 | - storage_path: Filesystem path where tournament data will be stored.
72 |
73 | Example:
74 | {
75 | "tournament_id": "tour_abc123xyz789",
76 | "status": "PENDING",
77 | "storage_path": "/path/to/storage/tour_abc123xyz789"
78 | }
79 |
80 | Raises:
81 | ToolError: If input is invalid, tournament creation fails, or scheduling fails.
82 | """
83 | logger.info(f"Tool 'create_tournament' invoked for: {name}")
84 | try:
85 | parsed_model_configs = [InputModelConfig(**mc) for mc in models]
86 | parsed_evaluators = [InputEvaluatorConfig(**ev) for ev in (evaluators or [])]
87 | input_data = CreateTournamentInput(
88 | name=name,
89 | prompt=prompt,
90 | models=parsed_model_configs,
91 | rounds=rounds,
92 | tournament_type=tournament_type,
93 | extraction_model_id=extraction_model_id,
94 | evaluators=parsed_evaluators,
95 | max_retries_per_model_call=max_retries_per_model_call,
96 | retry_backoff_base_seconds=retry_backoff_base_seconds,
97 | max_concurrent_model_calls=max_concurrent_model_calls
98 | )
99 |
100 | tournament = tournament_manager.create_tournament(input_data)
101 | if not tournament:
102 | raise ToolError("Failed to create tournament entry.")
103 |
104 | logger.info("Calling start_tournament_execution (using asyncio)")
105 | success = tournament_manager.start_tournament_execution(
106 | tournament_id=tournament.tournament_id
107 | )
108 |
109 | if not success:
110 | logger.error(f"Failed to schedule background execution for tournament {tournament.tournament_id}")
111 | updated_tournament = tournament_manager.get_tournament(tournament.tournament_id)
112 | error_msg = updated_tournament.error_message if updated_tournament else "Failed to schedule execution."
113 | raise ToolError(f"Failed to start tournament execution: {error_msg}")
114 |
115 | logger.info(f"Tournament {tournament.tournament_id} ({tournament.name}) created and background execution started.")
116 | # Include storage_path in the return value
117 | output = CreateTournamentOutput(
118 | tournament_id=tournament.tournament_id,
119 | status=tournament.status,
120 | storage_path=tournament.storage_path,
121 | message=f"Tournament '{tournament.name}' created successfully and execution started."
122 | )
123 | return output.dict()
124 |
125 | except ValueError as ve:
126 | logger.warning(f"Validation error creating tournament: {ve}")
127 | raise ToolError(f"Invalid input: {ve}") from ve
128 | except Exception as e:
129 | logger.error(f"Error creating tournament: {e}", exc_info=True)
130 | raise ToolError(f"An unexpected error occurred: {e}") from e
131 |
132 | @with_tool_metrics
133 | @with_error_handling
134 | async def get_tournament_status(
135 | tournament_id: str
136 | ) -> Dict[str, Any]:
137 | """Retrieves the current status and progress of a specific tournament.
138 |
139 | Use this tool to monitor an ongoing tournament (PENDING, RUNNING) or check the final
140 | state (COMPLETED, FAILED, CANCELLED) of a past tournament.
141 |
142 | Args:
143 | tournament_id: Unique identifier of the tournament to check.
144 |
145 | Returns:
146 | Dictionary containing tournament status information:
147 | - tournament_id: Unique identifier for the tournament.
148 | - name: Human-readable name of the tournament.
149 | - tournament_type: Type of tournament (e.g., "code", "text").
150 | - status: Current status (e.g., "PENDING", "RUNNING", "COMPLETED", "FAILED", "CANCELLED").
151 | - current_round: Current round number (1-based) if RUNNING, else the last active round.
152 | - total_rounds: Total number of rounds configured for this tournament.
153 | - created_at: ISO timestamp when the tournament was created.
154 | - updated_at: ISO timestamp when the tournament status was last updated.
155 | - error_message: Error message if the tournament FAILED (null otherwise).
156 |
157 | Error Handling:
158 | - Raises ToolError (400) if tournament_id format is invalid.
159 | - Raises ToolError (404) if the tournament ID is not found.
160 | - Raises ToolError (500) for internal server errors.
161 |
162 | Example:
163 | {
164 | "tournament_id": "tour_abc123xyz789",
165 | "name": "Essay Refinement Contest",
166 | "tournament_type": "text",
167 | "status": "RUNNING",
168 | "current_round": 2,
169 | "total_rounds": 3,
170 | "created_at": "2023-04-15T14:32:17.123456",
171 | "updated_at": "2023-04-15T14:45:22.123456",
172 | "error_message": null
173 | }
174 | """
175 | logger.debug(f"Getting status for tournament: {tournament_id}")
176 | try:
177 | if not tournament_id or not isinstance(tournament_id, str):
178 | raise ToolError(
179 | status_code=400,
180 | detail="Invalid tournament ID format. Tournament ID must be a non-empty string."
181 | )
182 |
183 | try:
184 | input_data = GetTournamentStatusInput(tournament_id=tournament_id)
185 | except ValueError as ve:
186 | raise ToolError(
187 | status_code=400,
188 | detail=f"Invalid tournament ID: {str(ve)}"
189 | ) from ve
190 |
191 | tournament = tournament_manager.get_tournament(input_data.tournament_id, force_reload=True)
192 | if not tournament:
193 | raise ToolError(
194 | status_code=404,
195 | detail=f"Tournament not found: {tournament_id}. Check if the tournament ID is correct or use list_tournaments to see all available tournaments."
196 | )
197 |
198 | try:
199 | output = GetTournamentStatusOutput(
200 | tournament_id=tournament.tournament_id,
201 | name=tournament.name,
202 | tournament_type=tournament.config.tournament_type,
203 | status=tournament.status,
204 | current_round=tournament.current_round,
205 | total_rounds=tournament.config.rounds,
206 | created_at=tournament.created_at,
207 | updated_at=tournament.updated_at,
208 | error_message=tournament.error_message
209 | )
210 | return output.dict()
211 | except Exception as e:
212 | logger.error(f"Error converting tournament data to output format: {e}", exc_info=True)
213 | raise ToolError(
214 | status_code=500,
215 | detail=f"Error processing tournament data: {str(e)}. The tournament data may be corrupted."
216 | ) from e
217 | except ToolError:
218 | raise
219 | except Exception as e:
220 | logger.error(f"Error getting tournament status for {tournament_id}: {e}", exc_info=True)
221 | raise ToolError(
222 | status_code=500,
223 | detail=f"Internal server error retrieving tournament status: {str(e)}. Please try again or check the server logs."
224 | ) from e
225 |
226 | @with_tool_metrics
227 | @with_error_handling
228 | async def list_tournaments(
229 | ) -> List[Dict[str, Any]]:
230 | """Lists all created tournaments with basic identifying information and status.
231 |
232 | Useful for discovering existing tournaments and their current states without fetching full results.
233 |
234 | Returns:
235 | List of dictionaries, each containing basic tournament info:
236 | - tournament_id: Unique identifier for the tournament.
237 | - name: Human-readable name of the tournament.
238 | - tournament_type: Type of tournament (e.g., "code", "text").
239 | - status: Current status (e.g., "PENDING", "RUNNING", "COMPLETED", "FAILED", "CANCELLED").
240 | - created_at: ISO timestamp when the tournament was created.
241 | - updated_at: ISO timestamp when the tournament was last updated.
242 |
243 | Example:
244 | [
245 | {
246 | "tournament_id": "tour_abc123",
247 | "name": "Tournament A",
248 | "tournament_type": "code",
249 | "status": "COMPLETED",
250 | "created_at": "2023-04-10T10:00:00",
251 | "updated_at": "2023-04-10T12:30:00"
252 | },
253 | ...
254 | ]
255 | """
256 | logger.debug("Listing all tournaments")
257 | try:
258 | tournaments = tournament_manager.list_tournaments()
259 | output_list = []
260 | for tournament in tournaments:
261 | try:
262 | # Ensure tournament object has necessary attributes before accessing
263 | if not hasattr(tournament, 'tournament_id') or \
264 | not hasattr(tournament, 'name') or \
265 | not hasattr(tournament, 'config') or \
266 | not hasattr(tournament.config, 'tournament_type') or \
267 | not hasattr(tournament, 'status') or \
268 | not hasattr(tournament, 'created_at') or \
269 | not hasattr(tournament, 'updated_at'):
270 | logger.warning(f"Skipping tournament due to missing attributes: {getattr(tournament, 'tournament_id', 'UNKNOWN ID')}")
271 | continue
272 |
273 | basic_info = TournamentBasicInfo(
274 | tournament_id=tournament.tournament_id,
275 | name=tournament.name,
276 | tournament_type=tournament.config.tournament_type,
277 | status=tournament.status,
278 | created_at=tournament.created_at,
279 | updated_at=tournament.updated_at,
280 | )
281 | output_list.append(basic_info.dict())
282 | except Exception as e:
283 | logger.warning(f"Skipping tournament {getattr(tournament, 'tournament_id', 'UNKNOWN')} due to data error during processing: {e}")
284 | return output_list
285 | except Exception as e:
286 | logger.error(f"Error listing tournaments: {e}", exc_info=True)
287 | raise ToolError(
288 | status_code=500,
289 | detail=f"Internal server error listing tournaments: {str(e)}"
290 | ) from e
291 |
292 | @with_tool_metrics
293 | @with_error_handling
294 | async def get_tournament_results(
295 | tournament_id: str
296 | ) -> List[Dict[str, str]]:
297 | """Retrieves the complete results and configuration for a specific tournament.
298 |
299 | Provides comprehensive details including configuration, final scores (if applicable),
300 | detailed round-by-round results, model outputs, and any errors encountered.
301 | Use this *after* a tournament has finished (COMPLETED or FAILED) for full analysis.
302 |
303 | Args:
304 | tournament_id: Unique identifier for the tournament.
305 |
306 | Returns:
307 | List containing a single pre-formatted text content dict; its "text" field holds the full tournament data serialized as JSON (config, status, per-round results, timestamps, etc., as structured by the tournament manager).
308 |
309 | Example (conceptual structure of the JSON payload - actual fields may vary):
310 | {
311 | "tournament_id": "tour_abc123",
312 | "name": "Sorting Algo Test",
313 | "status": "COMPLETED",
314 | "config": { ... },
315 | "results": { "scores": { ... }, "round_results": [ { ... }, ... ] },
316 | "created_at": "...",
317 | "updated_at": "...",
318 | "error_message": null
319 | }
320 |
321 | Raises:
322 | ToolError: If the tournament ID is invalid, not found, results are not ready (still PENDING/RUNNING), or an internal error occurs.
323 | """
324 | logger.debug(f"Getting results for tournament: {tournament_id}")
325 | try:
326 | if not tournament_id or not isinstance(tournament_id, str):
327 | raise ToolError(
328 | status_code=400,
329 | detail="Invalid tournament ID format. Tournament ID must be a non-empty string."
330 | )
331 |
332 | try:
333 | input_data = GetTournamentResultsInput(tournament_id=tournament_id)
334 | except ValueError as ve:
335 | raise ToolError(
336 | status_code=400,
337 | detail=f"Invalid tournament ID: {str(ve)}"
338 | ) from ve
339 |
340 | # Make sure to request TournamentData which should contain results
341 | tournament_data: Optional[TournamentData] = tournament_manager.get_tournament(input_data.tournament_id, force_reload=True)
342 |
343 | if not tournament_data:
344 | # Check if the tournament exists but just has no results yet (e.g., PENDING)
345 | tournament_status_info = tournament_manager.get_tournament(tournament_id) # Gets basic info
346 | if tournament_status_info:
347 | current_status = tournament_status_info.status
348 | if current_status in [TournamentStatus.PENDING, TournamentStatus.RUNNING]:
349 | raise ToolError(
350 | status_code=404, # Use 404 to indicate results not ready
351 | detail=f"Tournament '{tournament_id}' is currently {current_status}. Results are not yet available."
352 | )
353 | else: # Should have results if COMPLETED or ERROR, maybe data issue?
354 | logger.error(f"Tournament {tournament_id} status is {current_status} but get_tournament_results returned None.")
355 | raise ToolError(
356 | status_code=500,
357 | detail=f"Could not retrieve results for tournament '{tournament_id}' despite status being {current_status}. There might be an internal data issue."
358 | )
359 | else:
360 | raise ToolError(
361 | status_code=404,
362 | detail=f"Tournament not found: {tournament_id}. Cannot retrieve results."
363 | )
364 |
365 | # NEW: Return a structure that FastMCP might recognize as a pre-formatted content list
366 | json_string = tournament_data.json()
367 | logger.info(f"[DEBUG_GET_RESULTS] Returning pre-formatted TextContent list. JSON Snippet: {json_string[:150]}")
368 | return [{ "type": "text", "text": json_string }]
369 |
370 | except ToolError:
371 | raise
372 | except Exception as e:
373 | logger.error(f"Error getting tournament results for {tournament_id}: {e}", exc_info=True)
374 | raise ToolError(
375 | f"Internal server error retrieving tournament results: {str(e)}",
376 | 500 # status_code
377 | ) from e
378 |
379 | @with_tool_metrics
380 | @with_error_handling
381 | async def cancel_tournament(
382 | tournament_id: str
383 | ) -> Dict[str, Any]:
384 | """Attempts to cancel a running (RUNNING) or pending (PENDING) tournament.
385 |
386 | Signals the tournament manager to stop processing. Cancellation is not guaranteed
387 | to be immediate. Check status afterwards using `get_tournament_status`.
388 | Cannot cancel tournaments that are already COMPLETED, FAILED, or CANCELLED.
389 |
390 | Args:
391 | tournament_id: Unique identifier for the tournament to cancel.
392 |
393 | Returns:
394 | Dictionary confirming the cancellation attempt:
395 | - tournament_id: The ID of the tournament targeted for cancellation.
396 | - status: The status *after* the cancellation attempt (e.g., "CANCELLED", or the previous state like "COMPLETED" if cancellation was not possible).
397 | - message: A message indicating the outcome (e.g., "Tournament cancellation requested successfully.", "Cancellation failed: Tournament is already COMPLETED.").
398 |
399 | Raises:
400 | ToolError: If the tournament ID is invalid, not found, or an internal error occurs.
401 | """
402 | logger.info(f"Received request to cancel tournament: {tournament_id}")
403 | try:
404 | if not tournament_id or not isinstance(tournament_id, str):
405 | raise ToolError(status_code=400, detail="Invalid tournament ID format.")
406 |
407 | try:
408 | input_data = CancelTournamentInput(tournament_id=tournament_id)
409 | except ValueError as ve:
410 | raise ToolError(status_code=400, detail=f"Invalid tournament ID: {str(ve)}") from ve
411 |
412 | # Call the manager's cancel function
413 | success, message, final_status = await tournament_manager.cancel_tournament(input_data.tournament_id)
414 |
415 | # Prepare output using the Pydantic model
416 | output = CancelTournamentOutput(
417 | tournament_id=tournament_id,
418 | status=final_status, # Return the actual status after attempt
419 | message=message
420 | )
421 |
422 | if not success:
423 | # Log the failure but return the status/message from the manager
424 | logger.warning(f"Cancellation attempt for tournament {tournament_id} reported failure: {message}")
425 | # Raise ToolError if the status implies a client error (e.g., not found)
426 | if "not found" in message.lower():
427 | raise ToolError(status_code=404, detail=message)
428 | elif final_status in [TournamentStatus.COMPLETED, TournamentStatus.FAILED, TournamentStatus.CANCELLED] and "already" in message.lower():
429 | raise ToolError(status_code=409, detail=message)
430 | # Optionally handle other errors as 500
431 | # else:
432 | # raise ToolError(status_code=500, detail=f"Cancellation failed: {message}")
433 | else:
434 | logger.info(f"Cancellation attempt for tournament {tournament_id} successful. Final status: {final_status}")
435 |
436 | return output.dict()
437 |
438 | except ToolError:
439 | raise
440 | except Exception as e:
441 | logger.error(f"Error cancelling tournament {tournament_id}: {e}", exc_info=True)
442 | raise ToolError(status_code=500, detail=f"Internal server error during cancellation: {str(e)}") from e
```
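Taken together, these tools form a simple lifecycle: create a tournament, poll its status, then fetch results once it reaches a terminal state. The following is a hedged sketch of that flow, calling the tool coroutines directly rather than over MCP; it assumes the tournament manager and providers are already initialized by the server, and that status values serialize to the plain strings shown in the docstrings. Model IDs, round count, and polling interval are arbitrary examples.

```python
# Illustrative lifecycle sketch (assumptions noted above).
import asyncio
import json

from ultimate_mcp_server.tools.tournament import (
    create_tournament,
    get_tournament_results,
    get_tournament_status,
)


async def run_code_tournament() -> None:
    created = await create_tournament(
        name="Python Sorting Challenge",
        prompt="Write an efficient, well-tested merge sort in Python.",
        models=[
            {"model_id": "openai/gpt-4o"},
            {"model_id": "anthropic/claude-3-5-haiku-20241022"},
        ],
        rounds=2,
        tournament_type="code",
    )
    tournament_id = created["tournament_id"]

    # Poll until the tournament reaches a terminal state; the docstrings
    # describe status as strings like "RUNNING" or "COMPLETED".
    while True:
        status = await get_tournament_status(tournament_id)
        if status["status"] in ("COMPLETED", "FAILED", "CANCELLED"):
            break
        await asyncio.sleep(10)

    # get_tournament_results returns a pre-formatted text content list; the
    # full tournament JSON sits in the first item's "text" field.
    results = await get_tournament_results(tournament_id)
    data = json.loads(results[0]["text"])
    print(data.get("status"))


if __name__ == "__main__":
    asyncio.run(run_code_tournament())
```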
--------------------------------------------------------------------------------
/ultimate_mcp_server/core/providers/openai.py:
--------------------------------------------------------------------------------
```python
1 | """OpenAI provider implementation."""
2 | import time
3 | from typing import Any, AsyncGenerator, Dict, List, Optional, Tuple
4 |
5 | from openai import AsyncOpenAI
6 |
7 | from ultimate_mcp_server.constants import Provider
8 | from ultimate_mcp_server.core.providers.base import BaseProvider, ModelResponse
9 | from ultimate_mcp_server.utils import get_logger
10 |
11 | # Use the same naming scheme everywhere: logger at module level
12 | logger = get_logger("ultimate_mcp_server.providers.openai")
13 |
14 |
15 | class OpenAIProvider(BaseProvider):
16 | """Provider implementation for OpenAI API."""
17 |
18 | provider_name = Provider.OPENAI.value
19 |
20 | def __init__(self, api_key: Optional[str] = None, **kwargs):
21 | """Initialize the OpenAI provider.
22 |
23 | Args:
24 | api_key: OpenAI API key
25 | **kwargs: Additional options
26 | """
27 | super().__init__(api_key=api_key, **kwargs)
28 | self.base_url = kwargs.get("base_url")
29 | self.organization = kwargs.get("organization")
30 | self.models_cache = None
31 |
32 | async def initialize(self) -> bool:
33 | """Initialize the OpenAI client.
34 |
35 | Returns:
36 | bool: True if initialization was successful
37 | """
38 | try:
39 | self.client = AsyncOpenAI(
40 | api_key=self.api_key,
41 | base_url=self.base_url,
42 | organization=self.organization,
43 | )
44 |
45 | # Skip API call if using a mock key (for tests)
46 | if self.api_key and "mock-" in self.api_key:
47 | self.logger.info(
48 | "Using mock OpenAI key - skipping API validation",
49 | emoji_key="mock"
50 | )
51 | return True
52 |
53 | # Test connection by listing models
54 | await self.list_models()
55 |
56 | self.logger.success(
57 | "OpenAI provider initialized successfully",
58 | emoji_key="provider"
59 | )
60 | return True
61 |
62 | except Exception as e:
63 | self.logger.error(
64 | f"Failed to initialize OpenAI provider: {str(e)}",
65 | emoji_key="error"
66 | )
67 | return False
68 |
69 | async def generate_completion(
70 | self,
71 | prompt: Optional[str] = None,
72 | model: Optional[str] = None,
73 | max_tokens: Optional[int] = None,
74 | temperature: float = 0.7,
75 | **kwargs
76 | ) -> ModelResponse:
77 | """Generate a completion using OpenAI.
78 |
79 | Args:
80 | prompt: Text prompt to send to the model
81 | model: Model name to use (e.g., "gpt-4o")
82 | max_tokens: Maximum tokens to generate
83 | temperature: Temperature parameter (0.0-1.0)
84 | **kwargs: Additional model-specific parameters
85 |
86 | Returns:
87 | ModelResponse with completion result
88 |
89 | Raises:
90 | Exception: If API call fails
91 | """
92 | if not self.client:
93 | await self.initialize()
94 |
95 | # Use default model if not specified
96 | model = model or self.get_default_model()
97 |
98 | # Strip provider prefix if present (e.g., "openai/gpt-4o" -> "gpt-4o")
99 | if model.startswith(f"{self.provider_name}/"):
100 | original_model = model
101 | model = model.split("/", 1)[1]
102 | self.logger.debug(f"Stripped provider prefix from model name: {original_model} -> {model}")
103 |
104 | # Handle case when messages are provided instead of prompt (for chat_completion)
105 | messages = kwargs.pop("messages", None)
106 |
107 | # If neither prompt nor messages are provided, raise an error
108 | if prompt is None and not messages:
109 | raise ValueError("Either 'prompt' or 'messages' must be provided")
110 |
111 | # Create messages if not already provided
112 | if not messages:
113 | messages = [{"role": "user", "content": prompt}]
114 |
115 | # Prepare API call parameters
116 | params = {
117 | "model": model,
118 | "messages": messages,
119 | "temperature": temperature,
120 | }
121 |
122 | # Add max_tokens if specified
123 | if max_tokens is not None:
124 | params["max_tokens"] = max_tokens
125 |
126 | # Check for json_mode flag and remove it from kwargs
127 | json_mode = kwargs.pop("json_mode", False)
128 | if json_mode:
129 | # Use the correct response_format for JSON mode
130 | params["response_format"] = {"type": "json_object"}
131 | self.logger.debug("Setting response_format to JSON mode for OpenAI")
132 |
133 | # Handle any legacy response_format passed directly, but prefer json_mode
134 | if "response_format" in kwargs and not json_mode:
135 | # Support both direct format object and type-only specification
136 | response_format = kwargs.pop("response_format")
137 | if isinstance(response_format, dict):
138 | params["response_format"] = response_format
139 | elif isinstance(response_format, str) and response_format in ["json_object", "text"]:
140 | params["response_format"] = {"type": response_format}
141 | self.logger.debug(f"Setting response_format from direct param: {params.get('response_format')}")
142 |
143 | # Add any remaining additional parameters
144 | params.update(kwargs)
145 |
146 | # --- Special handling for specific model parameter constraints ---
147 | if model == 'o3-mini':
148 | if 'temperature' in params:
149 | self.logger.debug(f"Removing unsupported 'temperature' parameter for model {model}")
150 | del params['temperature']
151 | elif model == 'o1-preview':
152 | current_temp = params.get('temperature')
153 | # Only allow temperature if it's explicitly set to 1.0, otherwise remove it to use API default.
154 | if current_temp is not None and current_temp != 1.0:
155 | self.logger.debug(f"Removing non-default 'temperature' ({current_temp}) for model {model}")
156 | del params['temperature']
157 | # --- End special handling ---
158 |
159 | # Log request
160 | prompt_length = len(prompt) if prompt else sum(len(m.get("content", "")) for m in messages)
161 | self.logger.info(
162 | f"Generating completion with OpenAI model {model}",
163 | emoji_key=self.provider_name,
164 | prompt_length=prompt_length,
165 | json_mode=json_mode # Log if json_mode was requested
166 | )
167 |
168 | try:
169 | # API call with timing
170 | response, processing_time = await self.process_with_timer(
171 | self.client.chat.completions.create, **params
172 | )
173 |
174 | # Extract response text
175 | completion_text = response.choices[0].message.content
176 |
177 | # Create message object for chat_completion
178 | message = {
179 | "role": "assistant",
180 | "content": completion_text
181 | }
182 |
183 | # Create standardized response
184 | result = ModelResponse(
185 | text=completion_text,
186 | model=model,
187 | provider=self.provider_name,
188 | input_tokens=response.usage.prompt_tokens,
189 | output_tokens=response.usage.completion_tokens,
190 | total_tokens=response.usage.total_tokens,
191 | processing_time=processing_time,
192 | raw_response=response,
193 | )
194 |
195 | # Add message to result for chat_completion
196 | result.message = message
197 |
198 | # Log success
199 | self.logger.success(
200 | "OpenAI completion successful",
201 | emoji_key="success",
202 | model=model,
203 | tokens={
204 | "input": result.input_tokens,
205 | "output": result.output_tokens
206 | },
207 | cost=result.cost,
208 | time=result.processing_time
209 | )
210 |
211 | return result
212 |
213 | except Exception as e:
214 | self.logger.error(
215 | f"OpenAI completion failed: {str(e)}",
216 | emoji_key="error",
217 | model=model
218 | )
219 | raise
220 |
221 | async def generate_completion_stream(
222 | self,
223 | prompt: Optional[str] = None,
224 | model: Optional[str] = None,
225 | max_tokens: Optional[int] = None,
226 | temperature: float = 0.7,
227 | **kwargs
228 | ) -> AsyncGenerator[Tuple[str, Dict[str, Any]], None]:
229 | """Generate a streaming completion using OpenAI.
230 |
231 | Args:
232 | prompt: Text prompt to send to the model
233 | model: Model name to use (e.g., "gpt-4o")
234 | max_tokens: Maximum tokens to generate
235 | temperature: Temperature parameter (0.0-1.0)
236 | **kwargs: Additional model-specific parameters
237 |
238 | Yields:
239 | Tuple of (text_chunk, metadata)
240 |
241 | Raises:
242 | Exception: If API call fails
243 | """
244 | if not self.client:
245 | await self.initialize()
246 |
247 | # Use default model if not specified
248 | model = model or self.get_default_model()
249 |
250 | # Strip provider prefix if present (e.g., "openai/gpt-4o" -> "gpt-4o")
251 | if model.startswith(f"{self.provider_name}/"):
252 | original_model = model
253 | model = model.split("/", 1)[1]
254 | self.logger.debug(f"Stripped provider prefix from model name (stream): {original_model} -> {model}")
255 |
256 | # Handle case when messages are provided instead of prompt (for chat_completion)
257 | messages = kwargs.pop("messages", None)
258 |
259 | # If neither prompt nor messages are provided, raise an error
260 | if prompt is None and not messages:
261 | raise ValueError("Either 'prompt' or 'messages' must be provided")
262 |
263 | # Create messages if not already provided
264 | if not messages:
265 | messages = [{"role": "user", "content": prompt}]
266 |
267 | # Prepare API call parameters
268 | params = {
269 | "model": model,
270 | "messages": messages,
271 | "temperature": temperature,
272 | "stream": True,
273 | }
274 |
275 | # Add max_tokens if specified
276 | if max_tokens is not None:
277 | params["max_tokens"] = max_tokens
278 |
279 | # Check for json_mode flag and remove it from kwargs
280 | json_mode = kwargs.pop("json_mode", False)
281 | if json_mode:
282 | # Use the correct response_format for JSON mode
283 | params["response_format"] = {"type": "json_object"}
284 | self.logger.debug("Setting response_format to JSON mode for OpenAI streaming")
285 |
286 | # Add any remaining additional parameters
287 | params.update(kwargs)
288 |
289 | # Log request
290 | prompt_length = len(prompt) if prompt else sum(len(m.get("content", "")) for m in messages)
291 | self.logger.info(
292 | f"Generating streaming completion with OpenAI model {model}",
293 | emoji_key=self.provider_name,
294 | prompt_length=prompt_length,
295 | json_mode=json_mode # Log if json_mode was requested
296 | )
297 |
298 | start_time = time.time()
299 | total_chunks = 0
300 |
301 | try:
302 | # Make streaming API call
303 | stream = await self.client.chat.completions.create(**params)
304 |
305 | # Process the stream
306 | async for chunk in stream:
307 | total_chunks += 1
308 |
309 | # Extract content from the chunk
310 | delta = chunk.choices[0].delta
311 | content = delta.content or ""
312 |
313 | # Metadata for this chunk
314 | metadata = {
315 | "model": model,
316 | "provider": self.provider_name,
317 | "chunk_index": total_chunks,
318 | "finish_reason": chunk.choices[0].finish_reason,
319 | }
320 |
321 | yield content, metadata
322 |
323 | # Log success
324 | processing_time = time.time() - start_time
325 | self.logger.success(
326 | "OpenAI streaming completion successful",
327 | emoji_key="success",
328 | model=model,
329 | chunks=total_chunks,
330 | time=processing_time
331 | )
332 |
333 | except Exception as e:
334 | self.logger.error(
335 | f"OpenAI streaming completion failed: {str(e)}",
336 | emoji_key="error",
337 | model=model
338 | )
339 | raise
340 |
341 | async def list_models(self) -> List[Dict[str, Any]]:
342 | """
343 | List available OpenAI models with their capabilities and metadata.
344 |
345 | This method queries the OpenAI API to retrieve a comprehensive list of available
346 | models accessible to the current API key. It filters the results to focus on
347 | GPT models that are relevant to text generation tasks, excluding embeddings,
348 | moderation, and other specialized models.
349 |
350 | For efficiency, the method uses a caching mechanism that stores the model list
351 | after the first successful API call. Subsequent calls return the cached results
352 | without making additional API requests. This reduces latency and API usage while
353 | ensuring the available models information is readily accessible.
354 |
355 | If the API call fails (due to network issues, invalid credentials, etc.), the
356 | method falls back to returning a hardcoded list of common OpenAI models to ensure
357 | the application can continue functioning with reasonable defaults.
358 |
359 | Returns:
360 | A list of dictionaries containing model information with these fields:
361 | - id: The model identifier used when making API calls (e.g., "gpt-4o")
362 | - provider: Always "openai" for this provider
363 | - created: Timestamp of when the model was created (if available from API)
364 | - owned_by: Organization that owns the model (e.g., "openai", "system")
365 |
366 | The fallback model list (used on API errors) includes basic information
367 | for gpt-4o, gpt-4.1-mini, and other commonly used models.
368 |
369 | Example response:
370 | ```python
371 | [
372 | {
373 | "id": "gpt-4o",
374 | "provider": "openai",
375 | "created": 1693399330,
376 | "owned_by": "openai"
377 | },
378 | {
379 | "id": "gpt-4.1-mini",
380 | "provider": "openai",
381 | "created": 1705006269,
382 | "owned_by": "openai"
383 | }
384 | ]
385 | ```
386 |
387 | Note:
388 | The specific models returned depend on the API key's permissions and
389 | the models currently offered by OpenAI. As new models are released
390 | or existing ones deprecated, the list will change accordingly.
391 | """
392 | if self.models_cache:
393 | return self.models_cache
394 |
395 | try:
396 | if not self.client:
397 | await self.initialize()
398 |
399 | # Fetch models from API
400 | response = await self.client.models.list()
401 |
402 | # Process response
403 | models = []
404 | for model in response.data:
405 | # Filter to relevant models (chat-capable GPT models)
406 | if model.id.startswith("gpt-"):
407 | models.append({
408 | "id": model.id,
409 | "provider": self.provider_name,
410 | "created": model.created,
411 | "owned_by": model.owned_by,
412 | })
413 |
414 | # Cache results
415 | self.models_cache = models
416 |
417 | return models
418 |
419 | except Exception as e:
420 | self.logger.error(
421 | f"Failed to list OpenAI models: {str(e)}",
422 | emoji_key="error"
423 | )
424 |
425 | # Return basic models on error
426 | return [
427 | {
428 | "id": "gpt-4o",
429 | "provider": self.provider_name,
430 | "description": "Most capable GPT-4 model",
431 | },
432 | {
433 | "id": "gpt-4.1-mini",
434 | "provider": self.provider_name,
435 | "description": "Smaller, efficient GPT-4 model",
436 | },
437 | {
438 | "id": "gpt-4.1-mini",
439 | "provider": self.provider_name,
440 | "description": "Fast and cost-effective GPT model",
441 | },
442 | ]
443 |
444 | def get_default_model(self) -> str:
445 | """
446 | Get the default OpenAI model identifier to use when none is specified.
447 |
448 | This method determines the appropriate default model for OpenAI completions
449 | through a prioritized selection process:
450 |
451 | 1. First, it attempts to load the default_model setting from the Ultimate MCP Server
452 | configuration system (from providers.openai.default_model in the config)
453 | 2. If that's not available or valid, it falls back to a hardcoded default model
454 | that represents a reasonable balance of capability, cost, and availability
455 |
456 | Using the configuration system allows for flexible deployment-specific defaults
457 | without code changes, while the hardcoded fallback ensures the system remains
458 | functional even with minimal configuration.
459 |
460 | Returns:
461 | String identifier of the default OpenAI model to use (e.g., "gpt-4.1-mini").
462 | This identifier can be directly used in API calls to the OpenAI API.
463 |
464 | Note:
465 | The current hardcoded default is "gpt-4.1-mini", chosen for its balance of
466 | capability and cost. This may change in future versions as new models are
467 | released or existing ones are deprecated.
468 | """
469 | from ultimate_mcp_server.config import get_config
470 |
471 | # Safely get from config if available
472 | try:
473 | config = get_config()
474 | provider_config = getattr(config, 'providers', {}).get(self.provider_name, None)
475 | if provider_config and provider_config.default_model:
476 | return provider_config.default_model
477 | except (AttributeError, TypeError):
478 | # Handle case when providers attribute doesn't exist or isn't a dict
479 | pass
480 |
481 | # Otherwise return hard-coded default
482 | return "gpt-4.1-mini"
483 |
484 | async def check_api_key(self) -> bool:
485 | """Check if the OpenAI API key is valid.
486 |
487 | This method performs a lightweight validation of the configured OpenAI API key
488 | by attempting to list available models. A successful API call confirms that:
489 |
490 | 1. The API key is properly formatted and not empty
491 | 2. The key has at least read permissions on the OpenAI API
492 | 3. The API endpoint is accessible and responding
493 | 4. The account associated with the key is active and not suspended
494 |
495 | This validation is useful when initializing the provider to ensure the API key
496 | works before attempting to make model completion requests that might fail later.
497 |
498 | Returns:
499 | bool: True if the API key is valid and the API is accessible, False otherwise.
500 | A False result may indicate an invalid key, network issues, or API service disruption.
501 |
502 | Notes:
503 | - This method simply calls list_models() which caches results for efficiency
504 | - No detailed error information is returned, only a boolean success indicator
505 | - The method silently catches all exceptions and returns False rather than raising
506 | - For debugging key issues, check server logs for the full exception details
507 | """
508 | try:
509 | # Just list models as a simple validation
510 | await self.list_models()
511 | return True
512 | except Exception:
513 | return False
```
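
A minimal usage sketch for the provider methods above. The import path and constructor call are assumptions (the class definition appears earlier in this file, outside this excerpt), and the streaming method name `generate_completion_stream` is assumed from the provider interface used elsewhere in the repository; the `check_api_key()`, `list_models()`, and `get_default_model()` calls mirror the methods documented here.

```python
# Sketch only: assumed import path, constructor, and streaming method name;
# the calls below follow the provider methods shown in the file above.
import asyncio

from ultimate_mcp_server.core.providers.openai import OpenAIProvider  # assumed location


async def main() -> None:
    provider = OpenAIProvider()  # assumed: API key picked up from config/env

    if not await provider.check_api_key():
        raise RuntimeError("OpenAI API key is missing or invalid")

    model = provider.get_default_model()  # "gpt-4.1-mini" unless overridden in config
    models = await provider.list_models()
    print(f"{len(models)} models available, using {model}")

    # The streaming method is an async generator yielding (content, metadata) tuples.
    async for content, metadata in provider.generate_completion_stream(
        prompt="Explain retrieval-augmented generation in two sentences.",
        model=model,
        temperature=0.7,
        max_tokens=200,
    ):
        print(content, end="", flush=True)
        if metadata.get("finish_reason"):
            break
    print()


if __name__ == "__main__":
    asyncio.run(main())
```

Each yielded metadata dictionary carries the model, provider, chunk index, and the finish reason of the final chunk, so callers can detect completion without waiting for the generator to close.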
--------------------------------------------------------------------------------
/ultimate_mcp_server/services/knowledge_base/rag_engine.py:
--------------------------------------------------------------------------------
```python
1 | """RAG engine for retrieval-augmented generation."""
2 | import time
3 | from typing import Any, Dict, List, Optional, Set
4 |
5 | from ultimate_mcp_server.core.models.requests import CompletionRequest
6 | from ultimate_mcp_server.services.cache import get_cache_service
7 | from ultimate_mcp_server.services.knowledge_base.feedback import get_rag_feedback_service
8 | from ultimate_mcp_server.services.knowledge_base.retriever import KnowledgeBaseRetriever
9 | from ultimate_mcp_server.services.knowledge_base.utils import (
10 | extract_keywords,
11 | generate_token_estimate,
12 | )
13 | from ultimate_mcp_server.services.prompts import get_prompt_service
14 | from ultimate_mcp_server.utils import get_logger
15 |
16 | logger = get_logger(__name__)
17 |
18 |
19 | # Default RAG prompt templates
20 | DEFAULT_RAG_TEMPLATES = {
21 | "rag_default": """Answer the question based only on the following context:
22 |
23 | {context}
24 |
25 | Question: {query}
26 |
27 | Answer:""",
28 |
29 | "rag_with_sources": """Answer the question based only on the following context:
30 |
31 | {context}
32 |
33 | Question: {query}
34 |
35 | Provide your answer along with the source document IDs in [brackets] for each piece of information:""",
36 |
37 | "rag_summarize": """Summarize the following context information:
38 |
39 | {context}
40 |
41 | Summary:""",
42 |
43 | "rag_analysis": """Analyze the following information and provide key insights:
44 |
45 | {context}
46 |
47 | Query: {query}
48 |
49 | Analysis:"""
50 | }
51 |
52 |
53 | class RAGEngine:
54 | """Engine for retrieval-augmented generation."""
55 |
56 | def __init__(
57 | self,
58 | retriever: KnowledgeBaseRetriever,
59 | provider_manager,
60 | optimization_service=None,
61 | analytics_service=None
62 | ):
63 | """Initialize the RAG engine.
64 |
65 | Args:
66 | retriever: Knowledge base retriever
67 | provider_manager: Provider manager for LLM access
68 | optimization_service: Optional optimization service for model selection
69 | analytics_service: Optional analytics service for tracking
70 | """
71 | self.retriever = retriever
72 | self.provider_manager = provider_manager
73 | self.optimization_service = optimization_service
74 | self.analytics_service = analytics_service
75 |
76 | # Initialize prompt service
77 | self.prompt_service = get_prompt_service()
78 |
79 | # Initialize feedback service
80 | self.feedback_service = get_rag_feedback_service()
81 |
82 | # Initialize cache service
83 | self.cache_service = get_cache_service()
84 |
85 | # Register RAG templates
86 | for template_name, template_text in DEFAULT_RAG_TEMPLATES.items():
87 | self.prompt_service.register_template(template_name, template_text)
88 |
89 | logger.info("RAG engine initialized", extra={"emoji_key": "success"})
90 |
91 | async def _select_optimal_model(self, task_info: Dict[str, Any]) -> Dict[str, Any]:
92 | """Select optimal model for a RAG task.
93 |
94 | Args:
95 | task_info: Task information
96 |
97 | Returns:
98 | Model selection
99 | """
100 | if self.optimization_service:
101 | try:
102 | return await self.optimization_service.get_optimal_model(task_info)
103 | except Exception as e:
104 | logger.error(
105 | f"Error selecting optimal model: {str(e)}",
106 | extra={"emoji_key": "error"}
107 | )
108 |
109 | # Fallback to default models for RAG
110 | return {
111 | "provider": "openai",
112 | "model": "gpt-4.1-mini"
113 | }
114 |
115 | async def _track_rag_metrics(
116 | self,
117 | knowledge_base: str,
118 | query: str,
119 | provider: str,
120 | model: str,
121 | metrics: Dict[str, Any]
122 | ) -> None:
123 | """Track RAG operation metrics.
124 |
125 | Args:
126 | knowledge_base: Knowledge base name
127 | query: Query text
128 | provider: Provider name
129 | model: Model name
130 | metrics: Operation metrics
131 | """
132 | if not self.analytics_service:
133 | return
134 |
135 | try:
136 | await self.analytics_service.track_operation(
137 | operation_type="rag",
138 | provider=provider,
139 | model=model,
140 | input_tokens=metrics.get("input_tokens", 0),
141 | output_tokens=metrics.get("output_tokens", 0),
142 | total_tokens=metrics.get("total_tokens", 0),
143 | cost=metrics.get("cost", 0.0),
144 | duration=metrics.get("total_time", 0.0),
145 | metadata={
146 | "knowledge_base": knowledge_base,
147 | "query": query,
148 | "retrieval_count": metrics.get("retrieval_count", 0),
149 | "retrieval_time": metrics.get("retrieval_time", 0.0),
150 | "generation_time": metrics.get("generation_time", 0.0)
151 | }
152 | )
153 | except Exception as e:
154 | logger.error(
155 | f"Error tracking RAG metrics: {str(e)}",
156 | extra={"emoji_key": "error"}
157 | )
158 |
159 | def _format_context(
160 | self,
161 | results: List[Dict[str, Any]],
162 | include_metadata: bool = True
163 | ) -> str:
164 | """Format retrieval results into context.
165 |
166 | Args:
167 | results: List of retrieval results
168 | include_metadata: Whether to include metadata
169 |
170 | Returns:
171 | Formatted context
172 | """
173 | context_parts = []
174 |
175 | for i, result in enumerate(results):
176 | # Format metadata if included
177 | metadata_str = ""
178 | if include_metadata and result.get("metadata"):
179 | # Extract relevant metadata fields
180 | metadata_fields = []
181 | for key in ["title", "source", "author", "date", "source_id", "potential_title"]:
182 | if key in result["metadata"]:
183 | metadata_fields.append(f"{key}: {result['metadata'][key]}")
184 |
185 | if metadata_fields:
186 | metadata_str = " | ".join(metadata_fields)
187 | metadata_str = f"[{metadata_str}]\n"
188 |
189 | # Add document with index
190 | context_parts.append(f"Document {i+1} [ID: {result['id']}]:\n{metadata_str}{result['document']}")
191 |
192 | return "\n\n".join(context_parts)
193 |
194 | async def _adjust_retrieval_params(self, query: str, knowledge_base_name: str) -> Dict[str, Any]:
195 | """Dynamically adjust retrieval parameters based on query complexity.
196 |
197 | Args:
198 | query: Query text
199 | knowledge_base_name: Knowledge base name
200 |
201 | Returns:
202 | Adjusted parameters
203 | """
204 | # Analyze query complexity
205 | query_length = len(query.split())
206 | query_keywords = extract_keywords(query)
207 |
208 | # Base parameters
209 | params = {
210 | "top_k": 5,
211 | "retrieval_method": "vector",
212 | "min_score": 0.6,
213 | "search_params": {"search_ef": 100}
214 | }
215 |
216 | # Adjust based on query length
217 | if query_length > 30: # Complex query
218 | params["top_k"] = 8
219 | params["search_params"]["search_ef"] = 200
220 | params["retrieval_method"] = "hybrid"
221 | elif query_length < 5: # Very short query
222 | params["top_k"] = 10 # Get more results for short queries
223 | params["min_score"] = 0.5 # Lower threshold
224 |
225 | # Check if similar queries exist
226 | similar_queries = await self.feedback_service.get_similar_queries(
227 | knowledge_base_name=knowledge_base_name,
228 | query=query,
229 | top_k=1,
230 | threshold=0.85
231 | )
232 |
233 | # If we have similar past queries, use their parameters
234 | if similar_queries:
235 | params["retrieval_method"] = "hybrid" # Hybrid works well for repeat queries
236 |
237 | # Add keywords
238 | params["additional_keywords"] = query_keywords
239 |
240 | return params
241 |
242 | async def _analyze_used_documents(
243 | self,
244 | answer: str,
245 | results: List[Dict[str, Any]]
246 | ) -> Set[str]:
247 | """Analyze which documents were used in the answer.
248 |
249 | Args:
250 | answer: Generated answer
251 | results: List of retrieval results
252 |
253 | Returns:
254 | Set of document IDs used in the answer
255 | """
256 | used_ids = set()
257 |
258 | # Check for explicit mentions of document IDs
259 | for result in results:
260 | doc_id = result["id"]
261 | if f"[ID: {doc_id}]" in answer or f"[{doc_id}]" in answer:
262 | used_ids.add(doc_id)
263 |
264 | # Check content overlap (crude approximation)
265 | for result in results:
266 | if result["id"] in used_ids:
267 | continue
268 |
269 | # Check for significant phrases from document in answer
270 | doc_keywords = extract_keywords(result["document"], max_keywords=5)
271 | matched_keywords = sum(1 for kw in doc_keywords if kw in answer.lower())
272 |
273 | # If multiple keywords match, consider document used
274 | if matched_keywords >= 2:
275 | used_ids.add(result["id"])
276 |
277 | return used_ids
278 |
279 | async def _check_cached_response(
280 | self,
281 | knowledge_base_name: str,
282 | query: str
283 | ) -> Optional[Dict[str, Any]]:
284 | """Check for cached RAG response.
285 |
286 | Args:
287 | knowledge_base_name: Knowledge base name
288 | query: Query text
289 |
290 | Returns:
291 | Cached response or None
292 | """
293 | if not self.cache_service:
294 | return None
295 |
296 | cache_key = f"rag_{knowledge_base_name}_{query}"
297 |
298 | try:
299 | cached = await self.cache_service.get(cache_key)
300 | if cached:
301 | logger.info(
302 | f"Using cached RAG response for query in '{knowledge_base_name}'",
303 | extra={"emoji_key": "cache"}
304 | )
305 | return cached
306 | except Exception as e:
307 | logger.error(
308 | f"Error checking cache: {str(e)}",
309 | extra={"emoji_key": "error"}
310 | )
311 |
312 | return None
313 |
314 | async def _cache_response(
315 | self,
316 | knowledge_base_name: str,
317 | query: str,
318 | response: Dict[str, Any]
319 | ) -> None:
320 | """Cache RAG response.
321 |
322 | Args:
323 | knowledge_base_name: Knowledge base name
324 | query: Query text
325 | response: Response to cache
326 | """
327 | if not self.cache_service:
328 | return
329 |
330 | cache_key = f"rag_{knowledge_base_name}_{query}"
331 |
332 | try:
333 | # Cache for 1 day
334 | await self.cache_service.set(cache_key, response, ttl=86400)
335 | except Exception as e:
336 | logger.error(
337 | f"Error caching response: {str(e)}",
338 | extra={"emoji_key": "error"}
339 | )
340 |
341 | async def generate_with_rag(
342 | self,
343 | knowledge_base_name: str,
344 | query: str,
345 | provider: Optional[str] = None,
346 | model: Optional[str] = None,
347 | template: str = "rag_default",
348 | max_tokens: int = 1000,
349 | temperature: float = 0.3,
350 | top_k: Optional[int] = None,
351 | retrieval_method: Optional[str] = None,
352 | min_score: Optional[float] = None,
353 | metadata_filter: Optional[Dict[str, Any]] = None,
354 | include_metadata: bool = True,
355 | include_sources: bool = True,
356 | use_cache: bool = True,
357 | apply_feedback: bool = True,
358 | search_params: Optional[Dict[str, Any]] = None
359 | ) -> Dict[str, Any]:
360 | """Generate a response using RAG.
361 |
362 | Args:
363 | knowledge_base_name: Knowledge base name
364 | query: Query text
365 | provider: Provider name (auto-selected if None)
366 | model: Model name (auto-selected if None)
367 | template: RAG prompt template name
368 | max_tokens: Maximum tokens for generation
369 | temperature: Temperature for generation
370 | top_k: Number of documents to retrieve (auto-adjusted if None)
371 | retrieval_method: Retrieval method (vector, hybrid)
372 | min_score: Minimum similarity score
373 | metadata_filter: Optional metadata filter
374 | include_metadata: Whether to include metadata in context
375 | include_sources: Whether to include sources in response
376 | use_cache: Whether to use cached responses
377 | apply_feedback: Whether to apply feedback adjustments
378 | search_params: Optional ChromaDB search parameters
379 |
380 | Returns:
381 | Generated response with sources and metrics
382 | """
383 | start_time = time.time()
384 | operation_metrics = {}
385 |
386 | # Check cache first if enabled
387 | if use_cache:
388 | cached_response = await self._check_cached_response(knowledge_base_name, query)
389 | if cached_response:
390 | return cached_response
391 |
392 | # Auto-select model if not specified
393 | if not provider or not model:
394 | # Determine task complexity based on query
395 | task_complexity = "medium"
396 | if len(query) > 100:
397 | task_complexity = "high"
398 | elif len(query) < 30:
399 | task_complexity = "low"
400 |
401 | # Get optimal model
402 | model_selection = await self._select_optimal_model({
403 | "task_type": "rag_completion",
404 | "complexity": task_complexity,
405 | "query_length": len(query)
406 | })
407 |
408 | provider = provider or model_selection["provider"]
409 | model = model or model_selection["model"]
410 |
411 | # Dynamically adjust retrieval parameters if not specified
412 | if top_k is None or retrieval_method is None or min_score is None:
413 | adjusted_params = await self._adjust_retrieval_params(query, knowledge_base_name)
414 |
415 | # Use specified parameters or adjusted ones
416 | top_k = top_k or adjusted_params["top_k"]
417 | retrieval_method = retrieval_method or adjusted_params["retrieval_method"]
418 | min_score = min_score or adjusted_params["min_score"]
419 | search_params = search_params or adjusted_params.get("search_params")
420 | additional_keywords = adjusted_params.get("additional_keywords")
421 | else:
422 | additional_keywords = None
423 |
424 | # Retrieve context
425 | retrieval_start = time.time()
426 |
427 | if retrieval_method == "hybrid":
428 | # Use hybrid search
429 | retrieval_result = await self.retriever.retrieve_hybrid(
430 | knowledge_base_name=knowledge_base_name,
431 | query=query,
432 | top_k=top_k,
433 | min_score=min_score,
434 | metadata_filter=metadata_filter,
435 | additional_keywords=additional_keywords,
436 | apply_feedback=apply_feedback,
437 | search_params=search_params
438 | )
439 | else:
440 | # Use standard vector search
441 | retrieval_result = await self.retriever.retrieve(
442 | knowledge_base_name=knowledge_base_name,
443 | query=query,
444 | top_k=top_k,
445 | min_score=min_score,
446 | metadata_filter=metadata_filter,
447 | content_filter=None, # No content filter for vector-only search
448 | apply_feedback=apply_feedback,
449 | search_params=search_params
450 | )
451 |
452 | retrieval_time = time.time() - retrieval_start
453 | operation_metrics["retrieval_time"] = retrieval_time
454 |
455 | # Check if retrieval was successful
456 | if retrieval_result.get("status") != "success" or not retrieval_result.get("results"):
457 | logger.warning(
458 | f"No relevant documents found for query in knowledge base '{knowledge_base_name}'",
459 | extra={"emoji_key": "warning"}
460 | )
461 |
462 | # Return error response
463 | error_response = {
464 | "status": "no_results",
465 | "message": "No relevant documents found for query",
466 | "query": query,
467 | "retrieval_time": retrieval_time,
468 | "total_time": time.time() - start_time
469 | }
470 |
471 | # Cache error response if enabled
472 | if use_cache:
473 | await self._cache_response(knowledge_base_name, query, error_response)
474 |
475 | return error_response
476 |
477 | # Format context from retrieval results
478 | context = self._format_context(
479 | retrieval_result["results"],
480 | include_metadata=include_metadata
481 | )
482 |
483 | # Get prompt template
484 | template_text = self.prompt_service.get_template(template)
485 | if not template_text:
486 | # Fallback to default template
487 | template_text = DEFAULT_RAG_TEMPLATES["rag_default"]
488 |
489 | # Format prompt with template
490 | rag_prompt = template_text.format(
491 | context=context,
492 | query=query
493 | )
494 |
495 | # Calculate token estimates
496 | input_tokens = generate_token_estimate(rag_prompt)
497 | operation_metrics["context_tokens"] = generate_token_estimate(context)
498 | operation_metrics["input_tokens"] = input_tokens
499 | operation_metrics["retrieval_count"] = len(retrieval_result["results"])
500 |
501 | # Generate completion
502 | generation_start = time.time()
503 |
504 | provider_service = self.provider_manager.get_provider(provider)
505 | completion_request = CompletionRequest(
506 | prompt=rag_prompt,
507 | model=model,
508 | max_tokens=max_tokens,
509 | temperature=temperature
510 | )
511 |
512 | completion_result = await provider_service.generate_completion(
513 | request=completion_request
514 | )
515 |
516 | generation_time = time.time() - generation_start
517 | operation_metrics["generation_time"] = generation_time
518 |
519 | # Extract completion and metrics
520 | completion = completion_result.get("completion", "")
521 | operation_metrics["output_tokens"] = completion_result.get("output_tokens", 0)
522 | operation_metrics["total_tokens"] = completion_result.get("total_tokens", 0)
523 | operation_metrics["cost"] = completion_result.get("cost", 0.0)
524 | operation_metrics["total_time"] = time.time() - start_time
525 |
526 | # Prepare sources if requested
527 | sources = []
528 | if include_sources:
529 | for result in retrieval_result["results"]:
530 | # Include limited context for each source
531 | doc_preview = result["document"]
532 | if len(doc_preview) > 100:
533 | doc_preview = doc_preview[:100] + "..."
534 |
535 | sources.append({
536 | "id": result["id"],
537 | "document": doc_preview,
538 | "score": result["score"],
539 | "metadata": result.get("metadata", {})
540 | })
541 |
542 | # Analyze which documents were used in the answer
543 | used_doc_ids = await self._analyze_used_documents(completion, retrieval_result["results"])
544 |
545 | # Record feedback
546 | if apply_feedback:
547 | await self.retriever.record_feedback(
548 | knowledge_base_name=knowledge_base_name,
549 | query=query,
550 | retrieved_documents=retrieval_result["results"],
551 | used_document_ids=list(used_doc_ids)
552 | )
553 |
554 | # Track metrics
555 | await self._track_rag_metrics(
556 | knowledge_base=knowledge_base_name,
557 | query=query,
558 | provider=provider,
559 | model=model,
560 | metrics=operation_metrics
561 | )
562 |
563 | logger.info(
564 | f"Generated RAG response using {provider}/{model} in {operation_metrics['total_time']:.2f}s",
565 | extra={"emoji_key": "success"}
566 | )
567 |
568 | # Create response
569 | response = {
570 | "status": "success",
571 | "query": query,
572 | "answer": completion,
573 | "sources": sources,
574 | "knowledge_base": knowledge_base_name,
575 | "provider": provider,
576 | "model": model,
577 | "used_document_ids": list(used_doc_ids),
578 | "metrics": operation_metrics
579 | }
580 |
581 | # Cache response if enabled
582 | if use_cache:
583 | await self._cache_response(knowledge_base_name, query, response)
584 |
585 | return response
```
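
The RAG flow above composes retrieval, prompt templating, completion, feedback, and caching into a single `generate_with_rag()` call. A rough usage sketch follows, assuming hypothetical factory helpers for the retriever and provider manager (the real wiring lives elsewhere in the package); `generate_with_rag()` and the shape of its return value match the method defined above.

```python
# Sketch only: the two factory helpers are assumptions; generate_with_rag()
# and the keys of its result dictionary follow the RAGEngine code above.
import asyncio

from ultimate_mcp_server.core import get_provider_manager  # assumed helper
from ultimate_mcp_server.services.knowledge_base import get_knowledge_base_retriever  # assumed helper
from ultimate_mcp_server.services.knowledge_base.rag_engine import RAGEngine


async def main() -> None:
    engine = RAGEngine(
        retriever=get_knowledge_base_retriever(),
        provider_manager=get_provider_manager(),
    )

    result = await engine.generate_with_rag(
        knowledge_base_name="product_docs",  # hypothetical knowledge base
        query="How do I rotate API keys?",
        template="rag_with_sources",
        retrieval_method="hybrid",
        top_k=5,
        include_sources=True,
    )

    if result["status"] == "success":
        print(result["answer"])
        for source in result["sources"]:
            print(f"- {source['id']} (score: {source['score']:.2f})")
    else:
        print(result.get("message", "No answer generated"))


if __name__ == "__main__":
    asyncio.run(main())
```

Because responses are cached under a key built from the knowledge base name and the query text (with a one-day TTL), repeating the same query returns the cached answer unless `use_cache=False` is passed.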
--------------------------------------------------------------------------------
/ultimate_mcp_server/tools/redline-compiled.css:
--------------------------------------------------------------------------------
```css
1 | *,:after,:before{--tw-border-spacing-x:0;--tw-border-spacing-y:0;--tw-translate-x:0;--tw-translate-y:0;--tw-rotate:0;--tw-skew-x:0;--tw-skew-y:0;--tw-scale-x:1;--tw-scale-y:1;--tw-pan-x: ;--tw-pan-y: ;--tw-pinch-zoom: ;--tw-scroll-snap-strictness:proximity;--tw-gradient-from-position: ;--tw-gradient-via-position: ;--tw-gradient-to-position: ;--tw-ordinal: ;--tw-slashed-zero: ;--tw-numeric-figure: ;--tw-numeric-spacing: ;--tw-numeric-fraction: ;--tw-ring-inset: ;--tw-ring-offset-width:0px;--tw-ring-offset-color:#fff;--tw-ring-color:rgba(59,130,246,.5);--tw-ring-offset-shadow:0 0 #0000;--tw-ring-shadow:0 0 #0000;--tw-shadow:0 0 #0000;--tw-shadow-colored:0 0 #0000;--tw-blur: ;--tw-brightness: ;--tw-contrast: ;--tw-grayscale: ;--tw-hue-rotate: ;--tw-invert: ;--tw-saturate: ;--tw-sepia: ;--tw-drop-shadow: ;--tw-backdrop-blur: ;--tw-backdrop-brightness: ;--tw-backdrop-contrast: ;--tw-backdrop-grayscale: ;--tw-backdrop-hue-rotate: ;--tw-backdrop-invert: ;--tw-backdrop-opacity: ;--tw-backdrop-saturate: ;--tw-backdrop-sepia: ;--tw-contain-size: ;--tw-contain-layout: ;--tw-contain-paint: ;--tw-contain-style: }::backdrop{--tw-border-spacing-x:0;--tw-border-spacing-y:0;--tw-translate-x:0;--tw-translate-y:0;--tw-rotate:0;--tw-skew-x:0;--tw-skew-y:0;--tw-scale-x:1;--tw-scale-y:1;--tw-pan-x: ;--tw-pan-y: ;--tw-pinch-zoom: ;--tw-scroll-snap-strictness:proximity;--tw-gradient-from-position: ;--tw-gradient-via-position: ;--tw-gradient-to-position: ;--tw-ordinal: ;--tw-slashed-zero: ;--tw-numeric-figure: ;--tw-numeric-spacing: ;--tw-numeric-fraction: ;--tw-ring-inset: ;--tw-ring-offset-width:0px;--tw-ring-offset-color:#fff;--tw-ring-color:rgba(59,130,246,.5);--tw-ring-offset-shadow:0 0 #0000;--tw-ring-shadow:0 0 #0000;--tw-shadow:0 0 #0000;--tw-shadow-colored:0 0 #0000;--tw-blur: ;--tw-brightness: ;--tw-contrast: ;--tw-grayscale: ;--tw-hue-rotate: ;--tw-invert: ;--tw-saturate: ;--tw-sepia: ;--tw-drop-shadow: ;--tw-backdrop-blur: ;--tw-backdrop-brightness: ;--tw-backdrop-contrast: ;--tw-backdrop-grayscale: ;--tw-backdrop-hue-rotate: ;--tw-backdrop-invert: ;--tw-backdrop-opacity: ;--tw-backdrop-saturate: ;--tw-backdrop-sepia: ;--tw-contain-size: ;--tw-contain-layout: ;--tw-contain-paint: ;--tw-contain-style: }
2 | /*! tailwindcss v3.4.17 | MIT License | https://tailwindcss.com*/*,:after,:before{box-sizing:border-box;border:0 solid #e5e7eb}:after,:before{--tw-content:""}:host,html{line-height:1.5;-webkit-text-size-adjust:100%;-moz-tab-size:4;-o-tab-size:4;tab-size:4;font-family:ui-sans-serif,system-ui,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol,Noto Color Emoji;font-feature-settings:normal;font-variation-settings:normal;-webkit-tap-highlight-color:transparent}body{margin:0;line-height:inherit}hr{height:0;color:inherit;border-top-width:1px}abbr:where([title]){-webkit-text-decoration:underline dotted;text-decoration:underline dotted}h1,h2,h3,h4,h5,h6{font-size:inherit;font-weight:inherit}a{color:inherit;text-decoration:inherit}b,strong{font-weight:bolder}code,kbd,pre,samp{font-family:ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,monospace;font-feature-settings:normal;font-variation-settings:normal;font-size:1em}small{font-size:80%}sub,sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline}sub{bottom:-.25em}sup{top:-.5em}table{text-indent:0;border-color:inherit;border-collapse:collapse}button,input,optgroup,select,textarea{font-family:inherit;font-feature-settings:inherit;font-variation-settings:inherit;font-size:100%;font-weight:inherit;line-height:inherit;letter-spacing:inherit;color:inherit;margin:0;padding:0}button,select{text-transform:none}button,input:where([type=button]),input:where([type=reset]),input:where([type=submit]){-webkit-appearance:button;background-color:transparent;background-image:none}:-moz-focusring{outline:auto}:-moz-ui-invalid{box-shadow:none}progress{vertical-align:baseline}::-webkit-inner-spin-button,::-webkit-outer-spin-button{height:auto}[type=search]{-webkit-appearance:textfield;outline-offset:-2px}::-webkit-search-decoration{-webkit-appearance:none}::-webkit-file-upload-button{-webkit-appearance:button;font:inherit}summary{display:list-item}blockquote,dd,dl,figure,h1,h2,h3,h4,h5,h6,hr,p,pre{margin:0}fieldset{margin:0}fieldset,legend{padding:0}menu,ol,ul{list-style:none;margin:0;padding:0}dialog{padding:0}textarea{resize:vertical}input::-moz-placeholder,textarea::-moz-placeholder{opacity:1;color:#9ca3af}input::placeholder,textarea::placeholder{opacity:1;color:#9ca3af}[role=button],button{cursor:pointer}:disabled{cursor:default}audio,canvas,embed,iframe,img,object,svg,video{display:block;vertical-align:middle}img,video{max-width:100%;height:auto}[hidden]:where(:not([hidden=until-found])){display:none}.\!container{width:100%!important}.container{width:100%}@media (min-width:640px){.\!container{max-width:640px!important}.container{max-width:640px}}@media (min-width:768px){.\!container{max-width:768px!important}.container{max-width:768px}}@media (min-width:1024px){.\!container{max-width:1024px!important}.container{max-width:1024px}}@media (min-width:1280px){.\!container{max-width:1280px!important}.container{max-width:1280px}}@media (min-width:1536px){.\!container{max-width:1536px!important}.container{max-width:1536px}}.diff-insert,ins.diff-insert,ins.diff-insert-text{--tw-bg-opacity:1;background-color:rgb(239 246 255/var(--tw-bg-opacity,1));--tw-text-opacity:1;color:rgb(30 64 175/var(--tw-text-opacity,1));text-decoration-line:none;--tw-ring-offset-shadow:var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);--tw-ring-shadow:var(--tw-ring-inset) 0 0 0 calc(1px + var(--tw-ring-offset-width)) 
var(--tw-ring-color);--tw-ring-inset:inset;--tw-ring-color:rgba(147,197,253,.6)}.diff-delete,.diff-insert,del.diff-delete,del.diff-delete-text,ins.diff-insert,ins.diff-insert-text{border-radius:.125rem;padding-left:.125rem;padding-right:.125rem;box-shadow:var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow,0 0 #0000)}.diff-delete,del.diff-delete,del.diff-delete-text{--tw-bg-opacity:1;background-color:rgb(255 241 242/var(--tw-bg-opacity,1));--tw-text-opacity:1;color:rgb(159 18 57/var(--tw-text-opacity,1));text-decoration-line:line-through;--tw-ring-offset-shadow:var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);--tw-ring-shadow:var(--tw-ring-inset) 0 0 0 calc(1px + var(--tw-ring-offset-width)) var(--tw-ring-color);--tw-ring-inset:inset;--tw-ring-color:rgba(253,164,175,.6)}.diff-move-target,ins.diff-move-target{border-radius:.125rem;border-width:1px;--tw-border-opacity:1;border-color:rgb(110 231 183/var(--tw-border-opacity,1));--tw-bg-opacity:1;background-color:rgb(236 253 245/var(--tw-bg-opacity,1));padding-left:.125rem;padding-right:.125rem;--tw-text-opacity:1;color:rgb(6 78 59/var(--tw-text-opacity,1));--tw-ring-offset-shadow:var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);--tw-ring-shadow:var(--tw-ring-inset) 0 0 0 calc(1px + var(--tw-ring-offset-width)) var(--tw-ring-color);box-shadow:var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow,0 0 #0000);--tw-ring-color:rgba(52,211,153,.6)}.diff-move-source,del.diff-move-source{border-radius:.125rem;border:1px dashed rgba(52,211,153,.4);background-color:rgba(236,253,245,.5);padding-left:.125rem;padding-right:.125rem;color:rgba(6,95,70,.6);text-decoration-line:line-through}span.diff-attrib-change{border-bottom-width:1px;border-style:dotted;border-color:rgb(251 146 60/var(--tw-border-opacity,1));background-color:rgba(255,247,237,.4)}span.diff-attrib-change,span.diff-rename-node{--tw-border-opacity:1;padding-left:.125rem;padding-right:.125rem}span.diff-rename-node{border-bottom-width:2px;border-style:dotted;border-color:rgb(192 132 252/var(--tw-border-opacity,1));background-color:rgba(245,243,255,.4)}.visible{visibility:visible}.collapse{visibility:collapse}.static{position:static}.fixed{position:fixed}.absolute{position:absolute}.relative{position:relative}.bottom-10{bottom:2.5rem}.bottom-2{bottom:.5rem}.left-2{left:.5rem}.right-1{right:.25rem}.right-2{right:.5rem}.top-10{top:2.5rem}.top-2{top:.5rem}.isolate{isolation:isolate}.z-40{z-index:40}.z-50{z-index:50}.mx-auto{margin-left:auto;margin-right:auto}.ml-1{margin-left:.25rem}.ml-2{margin-left:.5rem}.mr-1{margin-right:.25rem}.block{display:block}.inline-block{display:inline-block}.inline{display:inline}.flex{display:flex}.\!table{display:table!important}.table{display:table}.grid{display:grid}.contents{display:contents}.hidden{display:none}.h-0\.5{height:.125rem}.h-3{height:.75rem}.h-4{height:1rem}.w-1{width:.25rem}.w-3{width:.75rem}.w-4{width:1rem}.max-w-3xl{max-width:48rem}.shrink{flex-shrink:1}.grow{flex-grow:1}.border-collapse{border-collapse:collapse}.transform{transform:translate(var(--tw-translate-x),var(--tw-translate-y)) rotate(var(--tw-rotate)) skewX(var(--tw-skew-x)) skewY(var(--tw-skew-y)) scaleX(var(--tw-scale-x)) 
scaleY(var(--tw-scale-y))}.cursor-pointer{cursor:pointer}.resize{resize:both}.flex-col{flex-direction:column}.flex-wrap{flex-wrap:wrap}.items-center{align-items:center}.gap-2{gap:.5rem}.truncate{overflow:hidden;text-overflow:ellipsis;white-space:nowrap}.rounded{border-radius:.25rem}.rounded-lg{border-radius:.5rem}.rounded-sm{border-radius:.125rem}.border{border-width:1px}.border-b{border-bottom-width:1px}.border-b-2{border-bottom-width:2px}.border-dashed{border-style:dashed}.border-dotted{border-style:dotted}.border-emerald-300{--tw-border-opacity:1;border-color:rgb(110 231 183/var(--tw-border-opacity,1))}.border-emerald-400{--tw-border-opacity:1;border-color:rgb(52 211 153/var(--tw-border-opacity,1))}.border-emerald-400\/40{border-color:rgba(52,211,153,.4)}.border-gray-300{--tw-border-opacity:1;border-color:rgb(209 213 219/var(--tw-border-opacity,1))}.border-orange-400{--tw-border-opacity:1;border-color:rgb(251 146 60/var(--tw-border-opacity,1))}.border-orange-500{--tw-border-opacity:1;border-color:rgb(249 115 22/var(--tw-border-opacity,1))}.border-purple-400{--tw-border-opacity:1;border-color:rgb(192 132 252/var(--tw-border-opacity,1))}.border-purple-500{--tw-border-opacity:1;border-color:rgb(168 85 247/var(--tw-border-opacity,1))}.bg-blue-100{--tw-bg-opacity:1;background-color:rgb(219 234 254/var(--tw-bg-opacity,1))}.bg-blue-50{--tw-bg-opacity:1;background-color:rgb(239 246 255/var(--tw-bg-opacity,1))}.bg-blue-500{--tw-bg-opacity:1;background-color:rgb(59 130 246/var(--tw-bg-opacity,1))}.bg-blue-900\/40{background-color:rgba(30,58,138,.4)}.bg-emerald-100{--tw-bg-opacity:1;background-color:rgb(209 250 229/var(--tw-bg-opacity,1))}.bg-emerald-100\/70{background-color:rgba(209,250,229,.7)}.bg-emerald-50{--tw-bg-opacity:1;background-color:rgb(236 253 245/var(--tw-bg-opacity,1))}.bg-emerald-50\/50{background-color:rgba(236,253,245,.5)}.bg-emerald-500{--tw-bg-opacity:1;background-color:rgb(16 185 129/var(--tw-bg-opacity,1))}.bg-emerald-900\/30{background-color:rgba(6,78,59,.3)}.bg-emerald-900\/40{background-color:rgba(6,78,59,.4)}.bg-gray-100{--tw-bg-opacity:1;background-color:rgb(243 244 246/var(--tw-bg-opacity,1))}.bg-gray-200{--tw-bg-opacity:1;background-color:rgb(229 231 235/var(--tw-bg-opacity,1))}.bg-orange-50{--tw-bg-opacity:1;background-color:rgb(255 247 237/var(--tw-bg-opacity,1))}.bg-orange-50\/40{background-color:rgba(255,247,237,.4)}.bg-rose-100{--tw-bg-opacity:1;background-color:rgb(255 228 230/var(--tw-bg-opacity,1))}.bg-rose-50{--tw-bg-opacity:1;background-color:rgb(255 241 242/var(--tw-bg-opacity,1))}.bg-rose-500{--tw-bg-opacity:1;background-color:rgb(244 63 94/var(--tw-bg-opacity,1))}.bg-rose-900\/40{background-color:rgba(136,19,55,.4)}.bg-transparent{background-color:transparent}.bg-violet-50\/40{background-color:rgba(245,243,255,.4)}.bg-white{--tw-bg-opacity:1;background-color:rgb(255 255 
255/var(--tw-bg-opacity,1))}.bg-white\/90{background-color:hsla(0,0%,100%,.9)}.p-2{padding:.5rem}.px-0\.5{padding-left:.125rem;padding-right:.125rem}.px-1{padding-left:.25rem;padding-right:.25rem}.px-2{padding-left:.5rem;padding-right:.5rem}.px-4{padding-left:1rem;padding-right:1rem}.py-1{padding-top:.25rem;padding-bottom:.25rem}.py-8{padding-top:2rem;padding-bottom:2rem}.font-\[\'Newsreader\'\]{font-family:Newsreader}.text-\[0\.95em\]{font-size:.95em}.text-xs{font-size:.75rem;line-height:1rem}.uppercase{text-transform:uppercase}.lowercase{text-transform:lowercase}.capitalize{text-transform:capitalize}.italic{font-style:italic}.ordinal{--tw-ordinal:ordinal;font-variant-numeric:var(--tw-ordinal) var(--tw-slashed-zero) var(--tw-numeric-figure) var(--tw-numeric-spacing) var(--tw-numeric-fraction)}.text-black{--tw-text-opacity:1;color:rgb(0 0 0/var(--tw-text-opacity,1))}.text-blue-200{--tw-text-opacity:1;color:rgb(191 219 254/var(--tw-text-opacity,1))}.text-blue-800{--tw-text-opacity:1;color:rgb(30 64 175/var(--tw-text-opacity,1))}.text-emerald-200{--tw-text-opacity:1;color:rgb(167 243 208/var(--tw-text-opacity,1))}.text-emerald-300\/60{color:rgba(110,231,183,.6)}.text-emerald-800\/60{color:rgba(6,95,70,.6)}.text-emerald-900{--tw-text-opacity:1;color:rgb(6 78 59/var(--tw-text-opacity,1))}.text-gray-500{--tw-text-opacity:1;color:rgb(107 114 128/var(--tw-text-opacity,1))}.text-rose-200{--tw-text-opacity:1;color:rgb(254 205 211/var(--tw-text-opacity,1))}.text-rose-800{--tw-text-opacity:1;color:rgb(159 18 57/var(--tw-text-opacity,1))}.underline{text-decoration-line:underline}.overline{text-decoration-line:overline}.line-through{text-decoration-line:line-through}.no-underline{text-decoration-line:none}.decoration-2{text-decoration-thickness:2px}.underline-offset-4{text-underline-offset:4px}.shadow{--tw-shadow:0 1px 3px 0 rgba(0,0,0,.1),0 1px 2px -1px rgba(0,0,0,.1);--tw-shadow-colored:0 1px 3px 0 var(--tw-shadow-color),0 1px 2px -1px var(--tw-shadow-color)}.shadow,.shadow-lg{box-shadow:var(--tw-ring-offset-shadow,0 0 #0000),var(--tw-ring-shadow,0 0 #0000),var(--tw-shadow)}.shadow-lg{--tw-shadow:0 10px 15px -3px rgba(0,0,0,.1),0 4px 6px -4px rgba(0,0,0,.1);--tw-shadow-colored:0 10px 15px -3px var(--tw-shadow-color),0 4px 6px -4px var(--tw-shadow-color)}.outline{outline-style:solid}.ring{--tw-ring-offset-shadow:var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);--tw-ring-shadow:var(--tw-ring-inset) 0 0 0 calc(3px + var(--tw-ring-offset-width)) var(--tw-ring-color)}.ring,.ring-0{box-shadow:var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow,0 0 #0000)}.ring-0{--tw-ring-offset-shadow:var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);--tw-ring-shadow:var(--tw-ring-inset) 0 0 0 calc(var(--tw-ring-offset-width)) var(--tw-ring-color)}.ring-1{--tw-ring-offset-shadow:var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);--tw-ring-shadow:var(--tw-ring-inset) 0 0 0 calc(1px + var(--tw-ring-offset-width)) var(--tw-ring-color)}.ring-1,.ring-2{box-shadow:var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow,0 0 #0000)}.ring-2{--tw-ring-offset-shadow:var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);--tw-ring-shadow:var(--tw-ring-inset) 0 0 0 calc(2px + var(--tw-ring-offset-width)) var(--tw-ring-color)}.ring-inset{--tw-ring-inset:inset}.ring-black\/10{--tw-ring-color:rgba(0,0,0,.1)}.ring-blue-300{--tw-ring-opacity:1;--tw-ring-color:rgb(147 197 
253/var(--tw-ring-opacity,1))}.ring-blue-300\/60{--tw-ring-color:rgba(147,197,253,.6)}.ring-emerald-300{--tw-ring-opacity:1;--tw-ring-color:rgb(110 231 183/var(--tw-ring-opacity,1))}.ring-emerald-400\/60{--tw-ring-color:rgba(52,211,153,.6)}.ring-emerald-500\/30{--tw-ring-color:rgba(16,185,129,.3)}.ring-orange-300{--tw-ring-opacity:1;--tw-ring-color:rgb(253 186 116/var(--tw-ring-opacity,1))}.ring-rose-300{--tw-ring-opacity:1;--tw-ring-color:rgb(253 164 175/var(--tw-ring-opacity,1))}.ring-rose-300\/60{--tw-ring-color:rgba(253,164,175,.6)}.ring-offset-1{--tw-ring-offset-width:1px}.blur{--tw-blur:blur(8px)}.blur,.grayscale{filter:var(--tw-blur) var(--tw-brightness) var(--tw-contrast) var(--tw-grayscale) var(--tw-hue-rotate) var(--tw-invert) var(--tw-saturate) var(--tw-sepia) var(--tw-drop-shadow)}.grayscale{--tw-grayscale:grayscale(100%)}.invert{--tw-invert:invert(100%)}.filter,.invert{filter:var(--tw-blur) var(--tw-brightness) var(--tw-contrast) var(--tw-grayscale) var(--tw-hue-rotate) var(--tw-invert) var(--tw-saturate) var(--tw-sepia) var(--tw-drop-shadow)}.backdrop-blur-sm{--tw-backdrop-blur:blur(4px);-webkit-backdrop-filter:var(--tw-backdrop-blur) var(--tw-backdrop-brightness) var(--tw-backdrop-contrast) var(--tw-backdrop-grayscale) var(--tw-backdrop-hue-rotate) var(--tw-backdrop-invert) var(--tw-backdrop-opacity) var(--tw-backdrop-saturate) var(--tw-backdrop-sepia);backdrop-filter:var(--tw-backdrop-blur) var(--tw-backdrop-brightness) var(--tw-backdrop-contrast) var(--tw-backdrop-grayscale) var(--tw-backdrop-hue-rotate) var(--tw-backdrop-invert) var(--tw-backdrop-opacity) var(--tw-backdrop-saturate) var(--tw-backdrop-sepia)}.transition{transition-property:color,background-color,border-color,text-decoration-color,fill,stroke,opacity,box-shadow,transform,filter,-webkit-backdrop-filter;transition-property:color,background-color,border-color,text-decoration-color,fill,stroke,opacity,box-shadow,transform,filter,backdrop-filter;transition-property:color,background-color,border-color,text-decoration-color,fill,stroke,opacity,box-shadow,transform,filter,backdrop-filter,-webkit-backdrop-filter;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:.15s}.transition-all{transition-property:all;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:.15s}.transition-colors{transition-property:color,background-color,border-color,text-decoration-color,fill,stroke;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:.15s}.duration-200{transition-duration:.2s}.ease-out{transition-timing-function:cubic-bezier(0,0,.2,1)}.\[_hdr_start\:_hdr_end\]{_hdr_start:hdr end}.\[_nonce_start\:_nonce_end\]{_nonce_start:nonce end}.\[a\:a\]{a:a}.\[ctx_start\:ctx_end\]{ctx_start:ctx end}.\[current_pos\:best_split_pos\]{current_pos:best split pos}.\[end-1\:end\]{end-1:end}.\[hl_start\:hl_end\]{hl_start:hl end}.\[i1\:i2\]{i1:i2}.\[i\:i\+3\]{i:i+3}.\[i\:i\+chunk_size\]{i:i+chunk size}.\[i\:i\+len\(pattern\)\]{i:i+len(pattern)}.\[inherit_ndx\:inherit_ndx\+1\]{inherit_ndx:inherit ndx+1}.\[j1\:j2\]{j1:j2}.\[json_start\:json_end\]{json_start:json end}.\[last\:start\]{last:start}.\[last_end\:start\]{last_end:start}.\[last_section_end\:first_match_start\]{last_section_end:first match start}.\[left\:right\]{left:right}.\[line_offset\:end_line\]{line_offset:end line}.\[offset\:next_offset\]{offset:next offset}.\[offset\:offset\+limit\]{offset:offset+limit}.\[pos\:end_pos\]{pos:end pos}.\[pos\:new_pos\]{pos:new 
pos}.\[position\:offset\]{position:offset}.\[position\:start\]{position:start}.\[search_region_start\:end_index\]{search_region_start:end index}.\[section_content_start\:section_content_end\]{section_content_start:section content end}.\[section_start\:section_end\]{section_start:section end}.\[start\:end\]{start:end}.\[start\:start\+1\]{start:start+1}.\[start_index\:best_split_index\]{start_index:best split index}.\[start_index\:end_index\]{start_index:end index}.\[start_pos\:pos\]{start_pos:pos}.\[user\:passwd\@\]{user:passwd@}.hover\:bg-gray-200:hover{--tw-bg-opacity:1;background-color:rgb(229 231 235/var(--tw-bg-opacity,1))}.hover\:bg-gray-300:hover{--tw-bg-opacity:1;background-color:rgb(209 213 219/var(--tw-bg-opacity,1))}.hover\:text-gray-700:hover{--tw-text-opacity:1;color:rgb(55 65 81/var(--tw-text-opacity,1))}@media (min-width:640px){.sm\:inline{display:inline}.sm\:hidden{display:none}}@media (min-width:768px){.md\:flex{display:flex}}@media (prefers-color-scheme:dark){.dark\:inline{display:inline}.dark\:hidden{display:none}.dark\:bg-blue-900\/60{background-color:rgba(30,58,138,.6)}.dark\:bg-emerald-900\/60{background-color:rgba(6,78,59,.6)}.dark\:bg-gray-700{--tw-bg-opacity:1;background-color:rgb(55 65 81/var(--tw-bg-opacity,1))}.dark\:bg-gray-800{--tw-bg-opacity:1;background-color:rgb(31 41 55/var(--tw-bg-opacity,1))}.dark\:bg-gray-800\/90{background-color:rgba(31,41,55,.9)}.dark\:bg-gray-900{--tw-bg-opacity:1;background-color:rgb(17 24 39/var(--tw-bg-opacity,1))}.dark\:bg-orange-900\/60{background-color:rgba(124,45,18,.6)}.dark\:bg-rose-900\/60{background-color:rgba(136,19,55,.6)}.dark\:text-gray-200{--tw-text-opacity:1;color:rgb(229 231 235/var(--tw-text-opacity,1))}.dark\:ring-blue-700{--tw-ring-opacity:1;--tw-ring-color:rgb(29 78 216/var(--tw-ring-opacity,1))}.dark\:ring-emerald-700{--tw-ring-opacity:1;--tw-ring-color:rgb(4 120 87/var(--tw-ring-opacity,1))}.dark\:ring-orange-700{--tw-ring-opacity:1;--tw-ring-color:rgb(194 65 12/var(--tw-ring-opacity,1))}.dark\:ring-rose-700{--tw-ring-opacity:1;--tw-ring-color:rgb(190 18 60/var(--tw-ring-opacity,1))}.dark\:hover\:bg-gray-600:hover{--tw-bg-opacity:1;background-color:rgb(75 85 99/var(--tw-bg-opacity,1))}}
```
--------------------------------------------------------------------------------
/examples/ollama_integration_demo.py:
--------------------------------------------------------------------------------
```python
1 | #!/usr/bin/env python
2 | """Ollama integration demonstration using Ultimate MCP Server."""
3 | import asyncio
4 | import sys
5 | import time
6 | from pathlib import Path
7 |
8 | # Add project root to path for imports when running as script
9 | sys.path.insert(0, str(Path(__file__).parent.parent))
10 |
11 | # Import configuration system instead of using update_ollama_env
12 | from ultimate_mcp_server.config import get_config
13 |
14 | # Load the config to ensure environment variables from .env are read
15 | config = get_config()
16 |
17 | # Third-party imports
18 | # These imports need to be below sys.path modification, which is why they have noqa comments
19 | from rich import box # noqa: E402
20 | from rich.markup import escape # noqa: E402
21 | from rich.panel import Panel # noqa: E402
22 | from rich.rule import Rule # noqa: E402
23 | from rich.table import Table # noqa: E402
24 |
25 | # Project imports
26 | from ultimate_mcp_server.constants import Provider # noqa: E402
27 | from ultimate_mcp_server.core.server import Gateway # noqa: E402
28 | from ultimate_mcp_server.utils import get_logger # noqa: E402
29 | from ultimate_mcp_server.utils.display import CostTracker # Import CostTracker # noqa: E402
30 | from ultimate_mcp_server.utils.logging.console import console # noqa: E402
31 |
32 | # Initialize logger
33 | logger = get_logger("example.ollama_integration_demo")
34 |
35 |
36 | async def compare_ollama_models(tracker: CostTracker):
37 | """Compare different Ollama models."""
38 | console.print(Rule("[bold blue]🦙 Ollama Model Comparison[/bold blue]"))
39 | logger.info("Starting Ollama models comparison", emoji_key="start")
40 |
41 | # Create Gateway instance - this handles provider initialization
42 | gateway = Gateway("ollama-demo", register_tools=False)
43 |
44 | try:
45 | # Initialize providers
46 | logger.info("Initializing providers...", emoji_key="provider")
47 | await gateway._initialize_providers()
48 |
49 | provider_name = Provider.OLLAMA.value
50 | try:
51 | # Get the provider from the gateway
52 | provider = gateway.providers.get(provider_name)
53 | if not provider:
54 | logger.error(f"Provider {provider_name} not available or initialized", emoji_key="error")
55 | return
56 |
57 | logger.info(f"Using provider: {provider_name}", emoji_key="provider")
58 |
59 | models = await provider.list_models()
60 | model_names = [m["id"] for m in models] # Extract names from model dictionaries
61 | console.print(f"Found {len(model_names)} Ollama models: [cyan]{escape(str(model_names))}[/cyan]")
62 |
63 | # Select specific models to compare (adjust these based on what you have installed locally)
64 | ollama_models = [
65 | "mix_77/gemma3-qat-tools:27b",
66 | "JollyLlama/GLM-Z1-32B-0414-Q4_K_M:latest",
67 | "llama3.2-vision:latest"
68 | ]
69 | # Filter based on available models
70 | models_to_compare = [m for m in ollama_models if m in model_names]
71 | if not models_to_compare:
72 | logger.error("None of the selected models for comparison are available. Please use 'ollama pull MODEL' to download models first.", emoji_key="error")
73 | console.print("[red]Selected models not found. Use 'ollama pull mix_77/gemma3-qat-tools:27b' to download models first.[/red]")
74 | return
75 | console.print(f"Comparing models: [yellow]{escape(str(models_to_compare))}[/yellow]")
76 |
77 | prompt = """
78 | Explain the concept of quantum entanglement in a way that a high school student would understand.
79 | Keep your response brief and accessible.
80 | """
81 | console.print(f"[cyan]Using Prompt:[/cyan] {escape(prompt.strip())[:100]}...")
82 |
83 | results_data = []
84 |
85 | for model_name in models_to_compare:
86 | try:
87 | logger.info(f"Testing model: {model_name}", emoji_key="model")
88 | start_time = time.time()
89 | result = await provider.generate_completion(
90 | prompt=prompt,
91 | model=model_name,
92 | temperature=0.3,
93 | max_tokens=300
94 | )
95 | processing_time = time.time() - start_time
96 |
97 | # Track the cost
98 | tracker.add_call(result)
99 |
100 | results_data.append({
101 | "model": model_name,
102 | "text": result.text,
103 | "tokens": {
104 | "input": result.input_tokens,
105 | "output": result.output_tokens,
106 | "total": result.total_tokens
107 | },
108 | "cost": result.cost,
109 | "time": processing_time
110 | })
111 |
112 | logger.success(
113 | f"Completion for {model_name} successful",
114 | emoji_key="success",
115 | # Tokens/cost/time logged implicitly by storing in results_data
116 | )
117 |
118 | except Exception as e:
119 | logger.error(f"Error testing model {model_name}: {str(e)}", emoji_key="error", exc_info=True)
120 | # Optionally add an error entry to results_data if needed
121 |
122 | # Display comparison results using Rich
123 | if results_data:
124 | console.print(Rule("[bold green]Comparison Results[/bold green]"))
125 |
126 | for result_item in results_data:
127 | model = result_item["model"]
128 | time_s = result_item["time"]
129 | tokens = result_item.get("tokens", {}).get("total", 0)
130 |
131 | # If tokens is 0 but we have text, estimate based on text length
132 | text = result_item.get("text", "").strip()
133 | if tokens == 0 and text:
134 | # Rough estimate: ~1.3 tokens per word plus some for punctuation
135 | tokens = len(text.split()) + len(text) // 10
136 | # Update the result item for cost tracking
137 | result_item["tokens"]["output"] = tokens
138 | result_item["tokens"]["total"] = tokens + result_item["tokens"]["input"]
139 |
140 | tokens_per_second = tokens / time_s if time_s > 0 and tokens > 0 else 0
141 | cost = result_item.get("cost", 0.0)
142 |
143 | stats_line = (
144 | f"Time: [yellow]{time_s:.2f}s[/yellow] | "
145 | f"Tokens: [cyan]{tokens}[/cyan] | "
146 | f"Speed: [blue]{tokens_per_second:.1f} tok/s[/blue] | "
147 | f"Cost: [green]${cost:.6f}[/green]"
148 | )
149 |
150 | console.print(Panel(
151 | escape(text),
152 | title=f"[bold magenta]{escape(model)}[/bold magenta]",
153 | subtitle=stats_line,
154 | border_style="blue",
155 | expand=False
156 | ))
157 | console.print()
158 |
159 | except Exception as e:
160 | logger.error(f"Error in model comparison: {str(e)}", emoji_key="error", exc_info=True)
161 | finally:
162 | # Ensure resources are cleaned up
163 | if hasattr(gateway, 'shutdown'):
164 | await gateway.shutdown()
165 |
166 |
167 | async def demonstrate_system_prompt(tracker: CostTracker):
168 | """Demonstrate Ollama with system prompts."""
169 | console.print(Rule("[bold blue]🦙 Ollama System Prompt Demonstration[/bold blue]"))
170 | logger.info("Demonstrating Ollama with system prompts", emoji_key="start")
171 |
172 | # Create Gateway instance - this handles provider initialization
173 | gateway = Gateway("ollama-demo", register_tools=False)
174 |
175 | try:
176 | # Initialize providers
177 | logger.info("Initializing providers...", emoji_key="provider")
178 | await gateway._initialize_providers()
179 |
180 | provider_name = Provider.OLLAMA.value
181 | try:
182 | # Get the provider from the gateway
183 | provider = gateway.providers.get(provider_name)
184 | if not provider:
185 | logger.error(f"Provider {provider_name} not available or initialized", emoji_key="error")
186 | return
187 |
188 | # Use mix_77/gemma3-qat-tools:27b (ensure it's available)
189 | model = "mix_77/gemma3-qat-tools:27b"
190 | available_models = await provider.list_models()
191 | model_names = [m["id"] for m in available_models]
192 |
193 | if model not in model_names:
194 | logger.warning(f"Model {model} not available, please run 'ollama pull mix_77/gemma3-qat-tools:27b'", emoji_key="warning")
195 | console.print("[yellow]Model not found. Please run 'ollama pull mix_77/gemma3-qat-tools:27b' to download it.[/yellow]")
196 | return
197 |
198 | logger.info(f"Using model: {model}", emoji_key="model")
199 |
200 | system_prompt = """
201 | You are a helpful assistant with expertise in physics.
202 | Keep all explanations accurate but very concise.
203 | Always provide real-world examples to illustrate concepts.
204 | """
205 | user_prompt = "Explain the concept of gravity."
206 |
207 | logger.info("Generating completion with system prompt", emoji_key="processing")
208 |
209 | result = await provider.generate_completion(
210 | prompt=user_prompt,
211 | model=model,
212 | temperature=0.7,
213 | system=system_prompt,
214 | max_tokens=1000 # Increased max_tokens
215 | )
216 |
217 | # Track the cost
218 | tracker.add_call(result)
219 |
220 | logger.success("Completion with system prompt successful", emoji_key="success")
221 |
222 | # Display result using Rich Panels
223 | console.print(Panel(
224 | escape(system_prompt.strip()),
225 | title="[bold cyan]System Prompt[/bold cyan]",
226 | border_style="dim cyan",
227 | expand=False
228 | ))
229 | console.print(Panel(
230 | escape(user_prompt.strip()),
231 | title="[bold yellow]User Prompt[/bold yellow]",
232 | border_style="dim yellow",
233 | expand=False
234 | ))
235 | console.print(Panel(
236 | escape(result.text.strip()),
237 | title="[bold green]Ollama Response[/bold green]",
238 | border_style="green",
239 | expand=False
240 | ))
241 |
242 | # Display stats in a small table
243 | stats_table = Table(title="Execution Stats", show_header=False, box=box.MINIMAL, expand=False)
244 | stats_table.add_column("Metric", style="cyan")
245 | stats_table.add_column("Value", style="white")
246 | stats_table.add_row("Input Tokens", str(result.input_tokens))
247 | stats_table.add_row("Output Tokens", str(result.output_tokens))
248 | stats_table.add_row("Cost", f"${result.cost:.6f}")
249 | stats_table.add_row("Processing Time", f"{result.processing_time:.3f}s")
250 | console.print(stats_table)
251 | console.print()
252 |
253 | except Exception as e:
254 | logger.error(f"Error in system prompt demonstration: {str(e)}", emoji_key="error", exc_info=True)
255 | # Optionally re-raise or handle differently
256 | finally:
257 | # Ensure resources are cleaned up
258 | if hasattr(gateway, 'shutdown'):
259 | await gateway.shutdown()
260 |
261 |
262 | async def demonstrate_streaming(tracker: CostTracker):
263 | """Demonstrate Ollama streaming capabilities."""
264 | console.print(Rule("[bold blue]🦙 Ollama Streaming Demonstration[/bold blue]"))
265 | logger.info("Demonstrating Ollama streaming capabilities", emoji_key="start")
266 |
267 | # Create Gateway instance - this handles provider initialization
268 | gateway = Gateway("ollama-demo", register_tools=False)
269 |
270 | try:
271 | # Initialize providers
272 | logger.info("Initializing providers...", emoji_key="provider")
273 | await gateway._initialize_providers()
274 |
275 | provider_name = Provider.OLLAMA.value
276 | try:
277 | # Get the provider from the gateway
278 | provider = gateway.providers.get(provider_name)
279 | if not provider:
280 | logger.error(f"Provider {provider_name} not available or initialized", emoji_key="error")
281 | return
282 |
283 | # Use any available Ollama model
284 | model = provider.get_default_model()
285 | logger.info(f"Using model: {model}", emoji_key="model")
286 |
287 | prompt = "Write a short poem about programming with AI"
288 | console.print(Panel(
289 | escape(prompt.strip()),
290 | title="[bold yellow]Prompt[/bold yellow]",
291 | border_style="dim yellow",
292 | expand=False
293 | ))
294 |
295 | logger.info("Generating streaming completion", emoji_key="processing")
296 |
297 | console.print("[bold green]Streaming response:[/bold green]")
298 |
299 | # Stream completion and display tokens as they arrive
300 | output_text = ""
301 | token_count = 0
302 | start_time = time.time()
303 |
304 | stream = provider.generate_completion_stream(
305 | prompt=prompt,
306 | model=model,
307 | temperature=0.7,
308 | max_tokens=200
309 | )
310 |
311 | final_metadata = None
312 | async for chunk, metadata in stream:
313 | # Display the streaming chunk
314 | console.print(chunk, end="", highlight=False)
315 | output_text += chunk
316 | token_count += 1
317 | final_metadata = metadata
318 |
319 | # Newline after streaming is complete
320 | console.print()
321 | console.print()
322 |
323 | processing_time = time.time() - start_time
324 |
325 | if final_metadata:
326 | # Track the cost at the end
327 | if tracker:
328 | # Create a simple object with the necessary attributes for the tracker
329 | class StreamingCall:
330 | def __init__(self, metadata):
331 | self.model = metadata.get("model", "")
332 | self.provider = metadata.get("provider", "")
333 | self.input_tokens = metadata.get("input_tokens", 0)
334 | self.output_tokens = metadata.get("output_tokens", 0)
335 | self.total_tokens = metadata.get("total_tokens", 0)
336 | self.cost = metadata.get("cost", 0.0)
337 | self.processing_time = metadata.get("processing_time", 0.0)
338 |
339 | tracker.add_call(StreamingCall(final_metadata))
340 |
341 | # Display stats
342 | stats_table = Table(title="Streaming Stats", show_header=False, box=box.MINIMAL, expand=False)
343 | stats_table.add_column("Metric", style="cyan")
344 | stats_table.add_column("Value", style="white")
345 | stats_table.add_row("Input Tokens", str(final_metadata.get("input_tokens", 0)))
346 | stats_table.add_row("Output Tokens", str(final_metadata.get("output_tokens", 0)))
347 | stats_table.add_row("Cost", f"${final_metadata.get('cost', 0.0):.6f}")
348 | stats_table.add_row("Processing Time", f"{processing_time:.3f}s")
349 | stats_table.add_row("Tokens per Second", f"{final_metadata.get('output_tokens', 0) / processing_time if processing_time > 0 else 0:.1f}")
350 | console.print(stats_table)
351 |
352 | logger.success("Streaming demonstration completed", emoji_key="success")
353 |
354 | except Exception as e:
355 | logger.error(f"Error in streaming demonstration: {str(e)}", emoji_key="error", exc_info=True)
356 | finally:
357 | # Ensure resources are cleaned up
358 | if hasattr(gateway, 'shutdown'):
359 | await gateway.shutdown()
360 |
361 |
362 | async def explore_ollama_models():
363 | """Display available Ollama models."""
364 | console.print(Rule("[bold cyan]🦙 Available Ollama Models[/bold cyan]"))
365 |
366 | # Create Gateway instance - this handles provider initialization
367 | gateway = Gateway("ollama-demo", register_tools=False)
368 |
369 | try:
370 | # Initialize providers
371 | logger.info("Initializing providers...", emoji_key="provider")
372 | await gateway._initialize_providers()
373 |
374 | # Get provider from the gateway
375 | provider = gateway.providers.get(Provider.OLLAMA.value)
376 | if not provider:
377 | logger.error(f"Provider {Provider.OLLAMA.value} not available or initialized", emoji_key="error")
378 | console.print("[red]Ollama provider not available. Make sure Ollama is installed and running on your machine.[/red]")
379 | console.print("[yellow]Visit https://ollama.com/download for installation instructions.[/yellow]")
380 | return
381 |
382 | # Get list of available models
383 | try:
384 | models = await provider.list_models()
385 |
386 | if not models:
387 | console.print("[yellow]No Ollama models found. Use 'ollama pull MODEL' to download models.[/yellow]")
388 | console.print("Example: [green]ollama pull mix_77/gemma3-qat-tools:27b[/green]")
389 | return
390 |
391 | # Create a table to display model information
392 | table = Table(title="Local Ollama Models")
393 | table.add_column("Model ID", style="cyan")
394 | table.add_column("Description", style="green")
395 | table.add_column("Size", style="yellow")
396 |
397 |             import re  # hoisted out of the loop; used to parse size info from model descriptions
398 |             for model in models:
399 |                 # Extract size from description if available
400 |                 size_str = "Unknown"
401 |                 description = model.get("description", "")
402 | 
403 |                 # Check if size information is in the description (format: "... (X.XX GB)")
404 |                 size_match = re.search(r'\((\d+\.\d+) GB\)', description)
405 | if size_match:
406 | size_gb = float(size_match.group(1))
407 | size_str = f"{size_gb:.2f} GB"
408 |
409 | table.add_row(
410 | model["id"],
411 | description,
412 | size_str
413 | )
414 |
415 | console.print(table)
416 | console.print("\n[dim]Note: To add more models, use 'ollama pull MODEL_NAME'[/dim]")
417 | except Exception as e:
418 | logger.error(f"Error listing Ollama models: {str(e)}", emoji_key="error")
419 | console.print(f"[red]Failed to list Ollama models: {str(e)}[/red]")
420 | console.print("[yellow]Make sure Ollama is installed and running on your machine.[/yellow]")
421 | console.print("[yellow]Visit https://ollama.com/download for installation instructions.[/yellow]")
422 | finally:
423 | # Ensure resources are cleaned up
424 | if hasattr(gateway, 'shutdown'):
425 | await gateway.shutdown()
426 |
427 |
428 | async def main():
429 | """Run Ollama integration examples."""
430 | console.print(Panel(
431 | "[bold]This demonstration shows how to use Ollama with the Ultimate MCP Server.[/bold]\n"
432 | "Ollama allows you to run LLMs locally on your own machine without sending data to external services.\n\n"
433 | "[yellow]Make sure you have Ollama installed and running:[/yellow]\n"
434 | "- Download from [link]https://ollama.com/download[/link]\n"
435 | "- Pull models with [green]ollama pull mix_77/gemma3-qat-tools:27b[/green] or similar commands",
436 | title="[bold blue]🦙 Ollama Integration Demo[/bold blue]",
437 | border_style="blue",
438 | expand=False
439 | ))
440 |
441 | tracker = CostTracker() # Instantiate tracker here
442 | try:
443 | # First show available models
444 | await explore_ollama_models()
445 |
446 | console.print() # Add space between sections
447 |
448 | # Run model comparison
449 | await compare_ollama_models(tracker) # Pass tracker
450 |
451 | console.print() # Add space between sections
452 |
453 | # Run system prompt demonstration
454 | await demonstrate_system_prompt(tracker) # Pass tracker
455 |
456 | console.print() # Add space between sections
457 |
458 | # Run streaming demonstration
459 | await demonstrate_streaming(tracker) # Pass tracker
460 |
461 | # Display final summary
462 | tracker.display_summary(console) # Display summary at the end
463 |
464 | except Exception as e:
465 | logger.critical(f"Example failed: {str(e)}", emoji_key="critical", exc_info=True)
466 | return 1
467 | finally:
468 | # Clean up any remaining aiohttp resources by forcing garbage collection
469 | import gc
470 | gc.collect()
471 |
472 | logger.success("Ollama Integration Demo Finished Successfully!", emoji_key="complete")
473 | return 0
474 |
475 |
476 | if __name__ == "__main__":
477 | exit_code = asyncio.run(main())
478 | sys.exit(exit_code)
```
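
Stripped of the Rich console output and cost tracking, the streaming pattern used in the demo reduces to a short helper. A minimal sketch, assuming `provider` and `model` come from the same Gateway setup shown above (`gateway.providers.get(Provider.OLLAMA.value)`), and that `generate_completion_stream` yields `(chunk, metadata)` pairs as in the demo:

```python
# Illustrative only; `provider` is assumed to be an initialized Ollama provider
# obtained exactly as in the demo above.
async def stream_once(provider, model: str, prompt: str) -> str:
    text = ""
    stream = provider.generate_completion_stream(
        prompt=prompt,
        model=model,
        temperature=0.7,
        max_tokens=200,
    )
    async for chunk, _metadata in stream:  # each item is (text_chunk, metadata_dict)
        text += chunk
    return text
```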
--------------------------------------------------------------------------------
/ultimate_mcp_server/tools/pyodide_boot_template.html:
--------------------------------------------------------------------------------
```html
1 | <!DOCTYPE html>
2 | <html lang="en">
3 | <head>
4 | <meta charset="utf-8" />
5 | <meta name="viewport" content="width=device-width,initial-scale=1" />
6 | <meta http-equiv="Referrer-Policy" content="strict-origin-when-cross-origin" />
7 | <title>Pyodide Sandbox v__PYODIDE_VERSION__ (Full)</title>
8 | <script>
9 | /* SessionStorage stub */
10 | try { if(typeof window!=='undefined' && typeof window.sessionStorage!=='undefined'){void window.sessionStorage;}else{throw new Error();} }
11 | catch(_){ if(typeof window!=='undefined'){Object.defineProperty(window, "sessionStorage", {value:{getItem:()=>null,setItem:()=>{},removeItem:()=>{},clear:()=>{},key:()=>null,length:0},configurable:true,writable:false});console.warn("sessionStorage stubbed.");} }
12 | </script>
13 | <style>
14 | /* Basic styles */
15 | body { font-family: system-ui, sans-serif; margin: 15px; background-color: #f8f9fa; color: #212529; }
16 | .status { padding: 10px 15px; border: 1px solid #dee2e6; margin-bottom: 15px; border-radius: 0.25rem; font-size: 0.9em; }
17 | .status-loading { background-color: #e9ecef; border-color: #ced4da; color: #495057; }
18 | .status-ready { background-color: #d1e7dd; border-color: #a3cfbb; color: #0a3622;}
19 | .status-error { background-color: #f8d7da; border-color: #f5c2c7; color: #842029; font-weight: bold; }
20 | </style>
21 | </head>
22 | <body>
23 |
24 | <div id="status-indicator" class="status status-loading">Initializing Sandbox...</div>
25 |
26 | <script type="module">
27 | // --- Constants ---
28 | const CDN_BASE = "__CDN_BASE__";
29 | const PYODIDE_VERSION = "__PYODIDE_VERSION__";
30 | // *** Use the constant for packages loaded AT STARTUP ***
31 | const CORE_PACKAGES_JSON = '__CORE_PACKAGES_JSON__';
32 | const MEM_LIMIT_MB = parseInt("__MEM_LIMIT_MB__", 10);
33 | const logPrefix = "[PyodideFull]"; // Updated prefix
34 |
35 | // --- DOM Element ---
36 | const statusIndicator = document.getElementById('status-indicator');
37 |
38 | // --- Performance & Heap Helpers ---
39 | const perf = (typeof performance !== 'undefined') ? performance : { now: () => Date.now() };
40 | const now = () => perf.now();
41 | const heapMB = () => (performance?.memory?.usedJSHeapSize ?? 0) / 1048576;
42 |
43 | // --- Safe Proxy Destruction Helper ---
44 | function safeDestroy(proxy, name, handlerLogPrefix) {
45 | try {
46 | if (proxy && typeof proxy === 'object' && typeof proxy.destroy === 'function') {
47 | const proxyId = proxy.toString ? proxy.toString() : '(proxy)';
48 | console.log(`${handlerLogPrefix} Destroying ${name} proxy: ${proxyId}`);
49 | proxy.destroy();
50 | console.log(`${handlerLogPrefix} ${name} proxy destroyed.`);
51 | return true;
52 | }
53 | } catch (e) { console.warn(`${handlerLogPrefix} Error destroying ${name}:`, e?.message || e); }
54 | return false;
55 | }
56 |
57 | // --- Python Runner Code Template (Reads code from its own global scope) ---
58 | // This should be the same simple runner that worked before
59 | const pythonRunnerTemplate = `
60 | import sys, io, contextlib, traceback, time, base64, json
61 | print(f"[PyRunnerFull] Starting execution...")
62 | _stdout = io.StringIO(); _stderr = io.StringIO()
63 | result_value = None; error_info = None; execution_ok = False; elapsed_ms = 0.0; user_code = None
64 | try:
65 | # Get code from the global scope *this script is run in*
66 | print("[PyRunnerFull] Getting code from _USER_CODE_TO_EXEC...")
67 | user_code = globals().get("_USER_CODE_TO_EXEC") # Read code set by JS onto the *passed* globals proxy
68 | if user_code is None: raise ValueError("_USER_CODE_TO_EXEC global not found in execution scope.")
69 | print(f"[PyRunnerFull] Code retrieved ({len(user_code)} chars). Ready to execute.")
70 | start_time = time.time()
71 | print(f"[PyRunnerFull] Executing user code...")
72 | try:
73 | with contextlib.redirect_stdout(_stdout), contextlib.redirect_stderr(_stderr):
74 | compiled_code = compile(source=user_code, filename='<user_code>', mode='exec')
75 | exec(compiled_code, globals()) # Execute in the provided globals
76 | if 'result' in globals(): result_value = globals()['result']
77 | execution_ok = True
78 | print("[PyRunnerFull] User code execution finished successfully.")
79 | except Exception as e:
80 | exc_type=type(e).__name__; exc_msg=str(e); tb_str=traceback.format_exc()
81 | error_info = {'type':exc_type, 'message':exc_msg, 'traceback':tb_str}
82 | print(f"[PyRunnerFull] User code execution failed: {exc_type}: {exc_msg}\\n{tb_str}")
83 | finally: elapsed_ms = (time.time() - start_time) * 1000 if 'start_time' in locals() else 0
84 | print(f"[PyRunnerFull] Exec phase took: {elapsed_ms:.1f}ms")
85 | except Exception as outer_err:
86 | tb_str = traceback.format_exc(); error_info = {'type': type(outer_err).__name__, 'message': str(outer_err), 'traceback': tb_str}
87 | print(f"[PyRunnerFull] Setup/GetCode Error: {outer_err}\\n{tb_str}")
88 | execution_ok = False
89 | payload_dict = {'ok':execution_ok,'stdout':_stdout.getvalue(),'stderr':_stderr.getvalue(),'elapsed':elapsed_ms,'result':result_value,'error':error_info}
90 | print("[PyRunnerFull] Returning payload dictionary.")
91 | payload_dict # Return value
92 | `; // End of pythonRunnerTemplate
93 |
94 | // --- Main Async IIFE ---
95 | (async () => {
96 | let BOOT_MS = 0; let t0 = 0;
97 | let corePackagesToLoad = []; // Store parsed core packages list
98 | try {
99 | // === Step 1: Prepare for Load ===
100 | t0 = now(); console.log(`${logPrefix} Boot script starting at t0=${t0}`);
101 | statusIndicator.textContent = `Importing Pyodide v${PYODIDE_VERSION}...`;
102 |
103 | // Parse CORE packages list BEFORE calling loadPyodide
104 | try {
105 | corePackagesToLoad = JSON.parse(CORE_PACKAGES_JSON);
106 | if (!Array.isArray(corePackagesToLoad)) { corePackagesToLoad = []; }
107 | console.log(`${logPrefix} Core packages requested for init:`, corePackagesToLoad.length > 0 ? corePackagesToLoad : '(none)');
108 | } catch (parseErr) {
109 | console.error(`${logPrefix} Error parsing core packages JSON:`, parseErr);
110 | statusIndicator.textContent += ' (Error parsing core package list!)';
111 | corePackagesToLoad = [];
112 | }
113 |
114 | // === Step 2: Load Pyodide (with packages option) ===
115 | const { loadPyodide } = await import(`${CDN_BASE}/pyodide.mjs`);
116 | console.log(`${logPrefix} Calling loadPyodide with core packages...`);
117 | statusIndicator.textContent = `Loading Pyodide runtime & core packages...`;
118 |
119 | window.pyodide = await loadPyodide({
120 | indexURL: `${CDN_BASE}/`,
121 | packages: corePackagesToLoad // Load core packages during initialization
122 | });
123 | const pyodide = window.pyodide;
124 | console.log(`${logPrefix} Pyodide core and initial packages loaded. Version: ${pyodide.version}`);
125 | statusIndicator.textContent = 'Pyodide core & packages loaded.';
126 |
127 | // === Step 3: Verify Loaded Packages (Optional Debugging) ===
128 | if (corePackagesToLoad.length > 0) {
129 | const loaded = pyodide.loadedPackages ? Object.keys(pyodide.loadedPackages) : [];
130 | console.log(`${logPrefix} Currently loaded packages:`, loaded);
131 | corePackagesToLoad.forEach(pkg => {
132 | if (!loaded.includes(pkg)) {
133 | console.warn(`${logPrefix} Core package '${pkg}' requested but not loaded! Check CDN/package name.`);
134 | statusIndicator.textContent += ` (Warn: ${pkg} failed load)`;
135 | }
136 | });
137 | }
138 |
139 | BOOT_MS = now() - t0;
140 | console.log(`${logPrefix} Pyodide setup complete in ${BOOT_MS.toFixed(0)}ms. Heap: ${heapMB().toFixed(1)} MB`);
141 | statusIndicator.textContent = `Pyodide Ready (${BOOT_MS.toFixed(0)}ms). Awaiting commands...`;
142 | statusIndicator.className = 'status status-ready';
143 |
144 |
145 | Object.freeze(corePackagesToLoad);
146 |     Object.freeze(statusIndicator); // defensive only; the const binding already prevents re-assignment, and textContent updates still go through the inherited setter
147 |
148 | // ================== Main Message Handler ==================
149 | console.log(`${logPrefix} Setting up main message listener...`);
150 | window.addEventListener("message", async (ev) => {
151 | const msg = ev.data;
152 | if (typeof msg !== 'object' || msg === null || !msg.id) { return; }
153 | const handlerLogPrefix = `${logPrefix}[Handler id:${msg.id}]`;
154 | console.log(`${handlerLogPrefix} Received: type=${msg.type}`);
155 | const wall0 = now();
156 |
157 | const reply = { id: msg.id, ok: false, stdout:'', stderr:'', result:null, elapsed:0, wall_ms:0, error:null };
158 | let pyResultProxy = null;
159 | let namespaceProxy = null; // Holds the target execution scope
160 | let micropipProxy = null;
161 | let persistentReplProxyToDestroy = null;
162 |
163 | try { // Outer try for message handling
164 | if (!window.pyodide) { throw new Error('Pyodide instance lost!'); }
165 | const pyodide = window.pyodide;
166 |
167 | // === Handle Reset Message ===
168 | if (msg.type === "reset") {
169 | console.log(`${handlerLogPrefix} Reset request received.`);
170 | try {
171 | if (pyodide.globals.has("_MCP_REPL_NS")) {
172 | console.log(`${handlerLogPrefix} Found _MCP_REPL_NS, attempting deletion.`);
173 | persistentReplProxyToDestroy = pyodide.globals.get("_MCP_REPL_NS");
174 | pyodide.globals.delete("_MCP_REPL_NS");
175 | reply.cleared = true; console.log(`${handlerLogPrefix} _MCP_REPL_NS deleted.`);
176 | } else {
177 | reply.cleared = false; console.log(`${handlerLogPrefix} No _MCP_REPL_NS found.`);
178 | }
179 | reply.ok = true;
180 | } catch (err) {
181 | console.error(`${handlerLogPrefix} Error during reset operation:`, err);
182 | reply.ok = false; reply.error = { type: err.name || 'ResetError', message: `Reset failed: ${err.message || err}`, traceback: err.stack };
183 | } finally {
184 | safeDestroy(persistentReplProxyToDestroy, "Persistent REPL (on reset)", handlerLogPrefix);
185 | console.log(`${handlerLogPrefix} Delivering reset response via callback (ok=${reply.ok})`);
186 | if(typeof window._deliverReplyToHost === 'function') { window._deliverReplyToHost(reply); }
187 | else { console.error(`${handlerLogPrefix} Host callback _deliverReplyToHost not found!`); }
188 | }
189 | return; // Exit handler
190 | } // End Reset
191 |
192 | // === Ignore Non-Exec ===
193 | if (msg.type !== "exec") { console.log(`${handlerLogPrefix} Ignoring non-exec type: ${msg.type}`); return; }
194 |
195 | // ================== Handle Exec Message ==================
196 | console.log(`${handlerLogPrefix} Processing exec request (repl=${msg.repl_mode})`);
197 |
198 | /* === Step 1: Load *Additional* Packages/Wheels === */
199 | // Filter out packages already loaded during init
200 | const currentlyLoaded = pyodide.loadedPackages ? Object.keys(pyodide.loadedPackages) : [];
201 | const additionalPackagesToLoad = msg.packages?.filter(p => !currentlyLoaded.includes(p)) || [];
202 | if (additionalPackagesToLoad.length > 0) {
203 | const pkgs = additionalPackagesToLoad.join(", ");
204 | console.log(`${handlerLogPrefix} Loading additional packages: ${pkgs}`);
205 | await pyodide.loadPackage(additionalPackagesToLoad).catch(err => {
206 | throw new Error(`Additional package loading failed: ${pkgs} - ${err?.message || err}`);
207 | });
208 | console.log(`${handlerLogPrefix} Additional packages loaded: ${pkgs}`);
209 | }
210 |
211 | // Load wheels (ensure micropip is available)
212 | if (msg.wheels?.length) {
213 | const whls = msg.wheels.join(", ");
214 | console.log(`${handlerLogPrefix} Loading wheels: ${whls}`);
215 | // Check if micropip needs loading (it might be a core package now)
216 | if (!pyodide.loadedPackages || !pyodide.loadedPackages['micropip']) {
217 | console.log(`${handlerLogPrefix} Loading micropip for wheels...`);
218 | await pyodide.loadPackage("micropip").catch(err => {
219 | throw new Error(`Failed to load micropip: ${err?.message || err}`);
220 | });
221 | }
222 | micropipProxy = pyodide.pyimport("micropip");
223 | console.log(`${handlerLogPrefix} Installing wheels via micropip...`);
224 | for (const whl of msg.wheels) {
225 | console.log(`${handlerLogPrefix} Installing wheel: ${whl}`);
226 | await micropipProxy.install(whl).catch(err => {
227 | let pyError = ""; if (err instanceof pyodide.ffi.PythonError) pyError = `${err.type}: `;
228 | throw new Error(`Wheel install failed for ${whl}: ${pyError}${err?.message || err}`);
229 | });
230 | console.log(`${handlerLogPrefix} Wheel installed: ${whl}`);
231 | }
232 | // Micropip proxy destroyed in finally
233 | }
234 |
235 | /* === Step 2: Prepare Namespace Proxy (REPL aware) === */
236 | if (msg.repl_mode) {
237 | if (pyodide.globals.has("_MCP_REPL_NS")) {
238 | console.log(`${handlerLogPrefix} Reusing persistent REPL namespace.`);
239 | namespaceProxy = pyodide.globals.get("_MCP_REPL_NS");
240 | if (!namespaceProxy || typeof namespaceProxy.set !== 'function') {
241 | console.warn(`${handlerLogPrefix} REPL namespace invalid. Resetting.`);
242 | safeDestroy(namespaceProxy, "Invalid REPL", handlerLogPrefix);
243 | namespaceProxy = pyodide.toPy({'__name__': '__main__'});
244 | pyodide.globals.set("_MCP_REPL_NS", namespaceProxy);
245 | }
246 | } else {
247 | console.log(`${handlerLogPrefix} Initializing new persistent REPL namespace.`);
248 | namespaceProxy = pyodide.toPy({'__name__': '__main__'});
249 | pyodide.globals.set("_MCP_REPL_NS", namespaceProxy);
250 | }
251 | } else {
252 | console.log(`${handlerLogPrefix} Creating fresh temporary namespace.`);
253 | namespaceProxy = pyodide.toPy({'__name__': '__main__'});
254 | }
255 | if (!namespaceProxy || typeof namespaceProxy.set !== 'function') { // Final check
256 | throw new Error("Failed to obtain valid namespace proxy.");
257 | }
258 |
259 | /* === Step 3: Prepare and Set User Code INTO Namespace Proxy === */
260 | let userCode = '';
261 | try {
262 | if (typeof msg.code_b64 !== 'string' || msg.code_b64 === '') throw new Error("Missing/empty code_b64");
263 | userCode = atob(msg.code_b64); // JS base64 decode
264 | } catch (decodeErr) { throw new Error(`Base64 decode failed: ${decodeErr.message}`); }
265 | console.log(`${handlerLogPrefix} Setting _USER_CODE_TO_EXEC on target namespace proxy...`);
266 | namespaceProxy.set("_USER_CODE_TO_EXEC", userCode); // Set ON THE TARGET PROXY
267 |
268 | /* === Step 4: Execute Runner === */
269 | console.log(`${handlerLogPrefix} Executing Python runner...`);
270 | // Pass the namespaceProxy (which now contains the code) as globals
271 | pyResultProxy = await pyodide.runPythonAsync(pythonRunnerTemplate, { globals: namespaceProxy });
272 | // Cleanup the code variable from the namespaceProxy afterwards
273 | console.log(`${handlerLogPrefix} Deleting _USER_CODE_TO_EXEC from namespace proxy...`);
274 | if (namespaceProxy.has && namespaceProxy.has("_USER_CODE_TO_EXEC")) { // Check if method exists
275 | namespaceProxy.delete("_USER_CODE_TO_EXEC");
276 | } else { console.warn(`${handlerLogPrefix} Could not check/delete _USER_CODE_TO_EXEC from namespace.`); }
277 | reply.wall_ms = now() - wall0;
278 | console.log(`${handlerLogPrefix} Python runner finished. Wall: ${reply.wall_ms.toFixed(0)}ms`);
279 |
280 | /* === Step 5: Process Result Proxy === */
281 | if (!pyResultProxy || typeof pyResultProxy.toJs !== 'function') { throw new Error(`Runner returned invalid result.`); }
282 | console.log(`${handlerLogPrefix} Converting Python result payload...`);
283 | let jsResultPayload = pyResultProxy.toJs({ dict_converter: Object.fromEntries });
284 | Object.assign(reply, jsResultPayload); // Merge python results
285 | reply.ok = jsResultPayload.ok;
286 |
287 | } catch (err) { // Catch ANY error during the process
288 | reply.wall_ms = now() - wall0;
289 | console.error(`${handlerLogPrefix} *** ERROR DURING EXECUTION PROCESS ***:`, err);
290 | reply.ok = false;
291 | reply.error = { type: err.name || 'JavaScriptError', message: err.message || String(err), traceback: err.stack || null };
292 | if (err instanceof pyodide.ffi.PythonError) { reply.error.type = err.type || 'PythonError'; }
293 | reply.stdout = reply.stdout || ''; reply.stderr = reply.stderr || ''; reply.result = reply.result || null; reply.elapsed = reply.elapsed || 0;
294 | } finally {
295 | /* === Step 6: Cleanup Proxies === */
296 | console.log(`${handlerLogPrefix} Entering finally block for cleanup.`);
297 | safeDestroy(pyResultProxy, "Result", handlerLogPrefix);
298 | safeDestroy(micropipProxy, "Micropip", handlerLogPrefix);
299 | // Only destroy namespace if it was temporary (non-REPL)
300 | if (!msg?.repl_mode) { // Use optional chaining
301 | safeDestroy(namespaceProxy, "Temporary Namespace", handlerLogPrefix);
302 | } else {
303 | console.log(`${handlerLogPrefix} Skipping destruction of persistent REPL namespace proxy.`);
304 | }
305 | console.log(`${handlerLogPrefix} Cleanup finished.`);
306 |
307 | /* === Step 7: Send Reply via Exposed Function === */
308 | console.log(`${handlerLogPrefix} *** Delivering final response via exposed function *** (ok=${reply.ok})`);
309 | console.log(`${handlerLogPrefix} Reply payload:`, JSON.stringify(reply, null, 2));
310 | try {
311 | if (typeof window._deliverReplyToHost === 'function') { window._deliverReplyToHost(reply); console.log(`${handlerLogPrefix} Reply delivered.`); }
312 | else { console.error(`${handlerLogPrefix} !!! ERROR: Host function _deliverReplyToHost not found!`); }
313 | } catch (deliveryErr) { console.error(`${handlerLogPrefix} !!! FAILED TO DELIVER REPLY !!!`, deliveryErr); }
314 | } // End finally block
315 |
316 | /* Step 8: Heap Watchdog */
317 | const currentHeapMB = heapMB();
318 | if (Number.isFinite(currentHeapMB) && currentHeapMB > MEM_LIMIT_MB) {
319 | console.warn(`${handlerLogPrefix}[WATCHDOG] Heap ${currentHeapMB.toFixed(0)}MB > limit ${MEM_LIMIT_MB}MB. Closing!`);
320 | statusIndicator.textContent = `Heap limit exceeded (${currentHeapMB.toFixed(0)}MB). Closing...`;
321 | statusIndicator.className = 'status status-error';
322 | setTimeout(() => window.close(), 200);
323 | }
324 |
325 | }); // End message listener
326 |
327 | console.log(`${logPrefix} Main message listener active.`);
328 | window.postMessage({ ready: true, boot_ms: BOOT_MS, id: "pyodide_ready" });
329 | console.log(`${logPrefix} Full Sandbox Ready.`);
330 |
331 | } catch (err) { // Catch Initialization Errors
332 | const initErrorMsg = `FATAL: Pyodide init failed: ${err.message || err}`;
333 | console.error(`${logPrefix} ${initErrorMsg}`, err.stack || '(no stack)', err);
334 | statusIndicator.textContent = initErrorMsg; statusIndicator.className = 'status status-error';
335 | try { window.postMessage({ id: "pyodide_init_error", ok: false, error: { type: err.name || 'InitError', message: initErrorMsg, traceback: err.stack || null } });
336 | } catch (postErr) { console.error(`${logPrefix} Failed to post init error:`, postErr); }
337 | }
338 | })(); // End main IIFE
339 | </script>
340 |
341 | </body>
342 | </html>
```
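
For reference, the template's message listener handles `exec` and `reset` messages carrying the fields read above (`id`, `type`, `code_b64`, `packages`, `wheels`, `repl_mode`) and delivers its reply through `window._deliverReplyToHost`. Below is a hypothetical host-side sketch of building such an `exec` payload in Python; the actual transport between the Python host and this sandbox page is not part of the template and is assumed here:

```python
import base64
import json
import uuid

def build_exec_message(code: str, packages=None, wheels=None, repl_mode=False) -> dict:
    """Shape of the 'exec' message the sandbox listener expects (illustrative)."""
    return {
        "id": str(uuid.uuid4()),      # echoed back in the reply
        "type": "exec",               # a "reset" message instead clears the persistent _MCP_REPL_NS namespace
        "code_b64": base64.b64encode(code.encode("utf-8")).decode("ascii"),  # decoded with atob() in the page
        "packages": packages or [],   # loaded on top of the core packages baked in at boot
        "wheels": wheels or [],       # installed via micropip
        "repl_mode": repl_mode,       # True reuses/creates _MCP_REPL_NS instead of a fresh namespace
    }

print(json.dumps(build_exec_message("result = 21 * 2", packages=["numpy"]), indent=2))
```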
--------------------------------------------------------------------------------
/ultimate_mcp_server/clients/rag_client.py:
--------------------------------------------------------------------------------
```python
1 | """High-level client for RAG (Retrieval-Augmented Generation) operations."""
2 |
3 | from typing import Any, Dict, List, Optional
4 |
5 | from ultimate_mcp_server.services.knowledge_base import (
6 | get_knowledge_base_manager,
7 | get_knowledge_base_retriever,
8 | get_rag_service,
9 | )
10 | from ultimate_mcp_server.utils import get_logger
11 |
12 | logger = get_logger("ultimate_mcp_server.clients.rag")
13 |
14 | class RAGClient:
15 | """
16 | High-level client for Retrieval-Augmented Generation (RAG) operations.
17 |
18 | The RAGClient provides a simplified, unified interface for building and using
19 | RAG systems within the MCP ecosystem. It encapsulates all the key operations in
20 | the RAG workflow, from knowledge base creation and document ingestion to context
21 | retrieval and LLM-augmented generation.
22 |
23 | RAG is a technique that enhances LLM capabilities by retrieving relevant information
24 | from external knowledge bases before generation, allowing models to access information
25 | beyond their training data and produce more accurate, up-to-date responses.
26 |
27 | Key capabilities:
28 | - Knowledge base creation and management
29 | - Document ingestion with automatic chunking
30 | - Semantic retrieval of relevant context
31 | - LLM generation with retrieved context
32 | - Various retrieval methods (vector, hybrid, keyword)
33 |
34 | Architecture:
35 | The client follows a modular architecture with three main components:
36 | 1. Knowledge Base Manager: Handles creation, deletion, and document ingestion
37 | 2. Knowledge Base Retriever: Responsible for context retrieval using various methods
38 | 3. RAG Service: Combines retrieval with LLM generation for complete RAG workflow
39 |
40 | Performance Considerations:
41 | - Document chunking size affects both retrieval quality and storage requirements
42 | - Retrieval method selection impacts accuracy vs. speed tradeoffs:
43 | * Vector search: Fast with good semantic understanding but may miss keyword matches
44 | * Keyword search: Good for exact matches but misses semantic similarities
45 | * Hybrid search: Most comprehensive but computationally more expensive
46 |     - The top-k parameter trades off providing sufficient context against diluting relevance
47 | - Different LLM models may require different prompt templates for optimal performance
48 |
49 | This client abstracts away the complexity of the underlying vector stores,
50 | embeddings, and retrieval mechanisms, providing a simple API for RAG operations.
51 |
52 | Example usage:
53 | ```python
54 | # Create a RAG client
55 | client = RAGClient()
56 |
57 | # Create a knowledge base
58 | await client.create_knowledge_base(
59 | "company_docs",
60 | "Company documentation and policies"
61 | )
62 |
63 | # Add documents to the knowledge base with metadata
64 | await client.add_documents(
65 | "company_docs",
66 | documents=[
67 | "Our return policy allows returns within 30 days of purchase with receipt.",
68 | "Product warranties cover manufacturing defects for a period of one year."
69 | ],
70 | metadatas=[
71 | {"source": "policies/returns.pdf", "page": 1, "department": "customer_service"},
72 | {"source": "policies/warranty.pdf", "page": 3, "department": "legal"}
73 | ],
74 | chunk_size=500,
75 | chunk_method="semantic"
76 | )
77 |
78 | # Retrieve context without generation (for inspection or custom handling)
79 | context = await client.retrieve(
80 | "company_docs",
81 | query="What is our return policy?",
82 | top_k=3,
83 | retrieval_method="hybrid"
84 | )
85 |
86 | # Print retrieved context and sources
87 | for i, (doc, meta) in enumerate(zip(context["documents"], context["metadatas"])):
88 | print(f"Source: {meta.get('source', 'unknown')} | Score: {context['distances'][i]:.3f}")
89 | print(f"Content: {doc[:100]}...\n")
90 |
91 | # Generate a response using RAG with specific provider and template
92 | result = await client.generate_with_rag(
93 | "company_docs",
94 | query="Explain our warranty coverage for electronics",
95 | provider="openai",
96 | model="gpt-4",
97 | template="customer_service_response",
98 | temperature=0.3,
99 | retrieval_method="hybrid"
100 | )
101 |
102 | print(result["response"])
103 | print("\nSources:")
104 | for source in result.get("sources", []):
105 | print(f"- {source['metadata'].get('source')}")
106 | ```
107 | """
108 |
109 | def __init__(self):
110 | """Initialize the RAG client."""
111 | self.kb_manager = get_knowledge_base_manager()
112 | self.kb_retriever = get_knowledge_base_retriever()
113 | self.rag_service = get_rag_service()
114 |
115 | async def create_knowledge_base(
116 | self,
117 | name: str,
118 | description: Optional[str] = None,
119 | overwrite: bool = False
120 | ) -> Dict[str, Any]:
121 | """Create a knowledge base.
122 |
123 | Args:
124 | name: The name of the knowledge base
125 | description: Optional description
126 | overwrite: Whether to overwrite an existing KB with the same name
127 |
128 | Returns:
129 | Result of the operation
130 | """
131 | logger.info(f"Creating knowledge base: {name}", emoji_key="processing")
132 |
133 | try:
134 | result = await self.kb_manager.create_knowledge_base(
135 | name=name,
136 | description=description,
137 | overwrite=overwrite
138 | )
139 |
140 | logger.success(f"Knowledge base created: {name}", emoji_key="success")
141 | return result
142 | except Exception as e:
143 | logger.error(f"Failed to create knowledge base: {str(e)}", emoji_key="error")
144 | raise
145 |
146 | async def add_documents(
147 | self,
148 | knowledge_base_name: str,
149 | documents: List[str],
150 | metadatas: Optional[List[Dict[str, Any]]] = None,
151 | chunk_size: int = 1000,
152 | chunk_method: str = "semantic"
153 | ) -> Dict[str, Any]:
154 | """Add documents to the knowledge base.
155 |
156 | Args:
157 | knowledge_base_name: Name of the knowledge base to add to
158 | documents: List of document texts
159 | metadatas: Optional list of metadata dictionaries
160 | chunk_size: Size of chunks to split documents into
161 | chunk_method: Method to use for chunking ('simple', 'semantic', etc.)
162 |
163 | Returns:
164 | Result of the operation
165 | """
166 | logger.info(f"Adding documents to knowledge base: {knowledge_base_name}", emoji_key="processing")
167 |
168 | try:
169 | result = await self.kb_manager.add_documents(
170 | knowledge_base_name=knowledge_base_name,
171 | documents=documents,
172 | metadatas=metadatas,
173 | chunk_size=chunk_size,
174 | chunk_method=chunk_method
175 | )
176 |
177 | added_count = result.get("added_count", 0)
178 | logger.success(f"Added {added_count} documents to knowledge base", emoji_key="success")
179 | return result
180 | except Exception as e:
181 | logger.error(f"Failed to add documents: {str(e)}", emoji_key="error")
182 | raise
183 |
184 | async def list_knowledge_bases(self) -> List[Any]:
185 | """List all knowledge bases.
186 |
187 | Returns:
188 | List of knowledge base information
189 | """
190 | logger.info("Retrieving list of knowledge bases", emoji_key="processing")
191 |
192 | try:
193 | knowledge_bases = await self.kb_manager.list_knowledge_bases()
194 | return knowledge_bases
195 | except Exception as e:
196 | logger.error(f"Failed to list knowledge bases: {str(e)}", emoji_key="error")
197 | raise
198 |
199 | async def retrieve(
200 | self,
201 | knowledge_base_name: str,
202 | query: str,
203 | top_k: int = 3,
204 | retrieval_method: str = "vector"
205 | ) -> Dict[str, Any]:
206 | """
207 | Retrieve relevant documents from a knowledge base for a given query.
208 |
209 | This method performs the retrieval stage of RAG, finding the most relevant
210 | documents in the knowledge base based on the query. The method is useful
211 | for standalone retrieval operations or when you want to examine retrieved
212 | context before generating a response.
213 |
214 | The retrieval process works by:
215 | 1. Converting the query into a vector representation (embedding)
216 | 2. Finding documents with similar embeddings and/or matching keywords
217 | 3. Ranking and returning the most relevant documents
218 |
219 | Available retrieval methods:
220 | - "vector": Embedding-based similarity search using cosine distance
221 | Best for conceptual/semantic queries where exact wording may differ
222 | - "keyword": Traditional text search using BM25 or similar algorithms
223 | Best for queries with specific terms that must be matched
224 | - "hybrid": Combines vector and keyword approaches with a weighted blend
225 | Good general-purpose approach that balances semantic and keyword matching
226 | - "rerank": Two-stage retrieval that first gets candidates, then reranks them
227 | More computationally intensive but often more accurate
228 |
229 | Optimization strategies:
230 | - For factual queries with specific terminology, use "keyword" or "hybrid"
231 | - For conceptual or paraphrased queries, use "vector"
232 | - For highest accuracy at cost of performance, use "rerank"
233 | - Adjust top_k based on document length; shorter documents may need higher top_k
234 | - Pre-filter by metadata before retrieval when targeting specific sections
235 |
236 | Understanding retrieval metrics:
237 | - "distances" represent similarity scores where lower values indicate higher similarity
238 | for vector search, and higher values indicate better matches for keyword search
239 | - Scores are normalized differently between retrieval methods, so direct
240 | comparison between methods is not meaningful
241 | - Score thresholds for "good matches" vary based on embedding model and content domain
242 |
243 | Args:
244 | knowledge_base_name: Name of the knowledge base to search
245 | query: The search query (question or keywords)
246 | top_k: Maximum number of documents to retrieve (default: 3)
247 | retrieval_method: Method to use for retrieval ("vector", "keyword", "hybrid", "rerank")
248 |
249 | Returns:
250 | Dictionary containing:
251 | - "documents": List of retrieved documents (text chunks)
252 | - "metadatas": List of metadata for each document (source info, etc.)
253 | - "distances": List of similarity scores or relevance metrics
254 | (interpretation depends on retrieval_method)
255 | - "query": The original query
256 | - "retrieval_method": The method used for retrieval
257 | - "processing_time_ms": Time taken for retrieval in milliseconds
258 |
259 | Raises:
260 | ValueError: If knowledge_base_name doesn't exist or query is invalid
261 | Exception: If the retrieval process fails
262 |
263 | Example:
264 | ```python
265 | # Retrieve context about product returns
266 | results = await rag_client.retrieve(
267 | knowledge_base_name="support_docs",
268 | query="How do I return a product?",
269 | top_k=5,
270 | retrieval_method="hybrid"
271 | )
272 |
273 | # Check if we got high-quality matches
274 | if results["documents"] and min(results["distances"]) < 0.3: # Good match threshold
275 | print("Found relevant information!")
276 | else:
277 | print("No strong matches found, consider reformulating the query")
278 |
279 | # Display retrieved documents and their sources with scores
280 | for i, (doc, meta) in enumerate(zip(results["documents"], results["metadatas"])):
281 | score = results["distances"][i]
282 | score_indicator = "🟢" if score < 0.3 else "🟡" if score < 0.6 else "🔴"
283 | print(f"{score_indicator} Result {i+1} (score: {score:.3f}):")
284 | print(f"Source: {meta.get('source', 'unknown')}")
285 | print(doc[:100] + "...\n")
286 | ```
287 | """
288 | logger.info(f"Retrieving context for query: '{query}'", emoji_key="processing")
289 |
290 | try:
291 | results = await self.kb_retriever.retrieve(
292 | knowledge_base_name=knowledge_base_name,
293 | query=query,
294 | top_k=top_k,
295 | retrieval_method=retrieval_method
296 | )
297 |
298 | return results
299 | except Exception as e:
300 | logger.error(f"Failed to retrieve context: {str(e)}", emoji_key="error")
301 | raise
302 |
303 | async def generate_with_rag(
304 | self,
305 | knowledge_base_name: str,
306 | query: str,
307 | provider: str = "gemini",
308 | model: Optional[str] = None,
309 | template: str = "rag_default",
310 | temperature: float = 0.3,
311 | top_k: int = 3,
312 | retrieval_method: str = "hybrid",
313 | include_sources: bool = True
314 | ) -> Dict[str, Any]:
315 | """
316 | Generate a response using Retrieval-Augmented Generation (RAG).
317 |
318 | This method performs the complete RAG process:
319 | 1. Retrieves relevant documents from the specified knowledge base based on the query
320 | 2. Constructs a prompt that includes both the query and retrieved context
321 | 3. Sends the augmented prompt to the LLM to generate a response
322 | 4. Optionally includes source information for transparency and citation
323 |
324 | The retrieval process can use different methods:
325 | - "vector": Pure semantic/embedding-based similarity search (good for conceptual queries)
326 | - "keyword": Traditional keyword-based search (good for specific terms or phrases)
327 | - "hybrid": Combines vector and keyword approaches (good general-purpose approach)
328 | - "rerank": Uses a two-stage approach with retrieval and reranking
329 |
330 | Args:
331 | knowledge_base_name: Name of the knowledge base to query for relevant context
332 | query: The user's question or request
333 | provider: LLM provider to use for generation (e.g., "openai", "anthropic", "gemini")
334 | model: Specific model to use (if None, uses provider's default)
335 | template: Prompt template name that defines how to format the RAG prompt
336 | Different templates can be optimized for different use cases
337 | temperature: Sampling temperature for controlling randomness (0.0-1.0)
338 | Lower values recommended for factual RAG responses
339 | top_k: Number of relevant documents to retrieve and include in the context
340 | Higher values provide more context but may dilute relevance
341 | retrieval_method: The method to use for retrieving documents ("vector", "keyword", "hybrid")
342 | include_sources: Whether to include source information in the output for citations
343 |
344 | Returns:
345 | Dictionary containing:
346 | - "response": The generated text response from the LLM
347 | - "sources": List of source documents and their metadata (if include_sources=True)
348 | - "context": The retrieved context that was used for generation
349 | - "prompt": The full prompt that was sent to the LLM (useful for debugging)
350 | - "tokens": Token usage information (input, output, total)
351 | - "processing_time_ms": Total processing time in milliseconds
352 |
353 | Raises:
354 | ValueError: If knowledge_base_name or query is invalid
355 | Exception: If retrieval or generation fails
356 |
357 | Example:
358 | ```python
359 | # Generate a response using RAG with custom parameters
360 | result = await rag_client.generate_with_rag(
361 | knowledge_base_name="financial_docs",
362 | query="What were our Q3 earnings?",
363 | provider="openai",
364 | model="gpt-4",
365 | temperature=0.1,
366 | top_k=5,
367 | retrieval_method="hybrid"
368 | )
369 |
370 | # Access the response and sources
371 | print(result["response"])
372 | for src in result["sources"]:
373 | print(f"- {src['metadata']['source']} (relevance: {src['score']})")
374 | ```
375 | """
376 | logger.info(f"Generating RAG response for: '{query}'", emoji_key="processing")
377 |
378 | try:
379 | result = await self.rag_service.generate_with_rag(
380 | knowledge_base_name=knowledge_base_name,
381 | query=query,
382 | provider=provider,
383 | model=model,
384 | template=template,
385 | temperature=temperature,
386 | top_k=top_k,
387 | retrieval_method=retrieval_method,
388 | include_sources=include_sources
389 | )
390 |
391 | return result
392 | except Exception as e:
393 | logger.error(f"Failed to call RAG service: {str(e)}", emoji_key="error")
394 | raise
395 |
396 | async def delete_knowledge_base(self, name: str) -> Dict[str, Any]:
397 | """Delete a knowledge base.
398 |
399 | Args:
400 | name: Name of the knowledge base to delete
401 |
402 | Returns:
403 | Result of the operation
404 | """
405 | logger.info(f"Deleting knowledge base: {name}", emoji_key="processing")
406 |
407 | try:
408 | result = await self.kb_manager.delete_knowledge_base(name=name)
409 | logger.success(f"Knowledge base {name} deleted successfully", emoji_key="success")
410 | return result
411 | except Exception as e:
412 | logger.error(f"Failed to delete knowledge base: {str(e)}", emoji_key="error")
413 | raise
414 |
415 | async def reset_knowledge_base(self, knowledge_base_name: str) -> None:
416 | """
417 | Reset (delete and recreate) a knowledge base.
418 |
419 | This method completely removes an existing knowledge base and creates a new
420 | empty one with the same name. This is useful when you need to:
421 | - Remove all documents from a knowledge base efficiently
422 | - Fix a corrupted knowledge base
423 | - Change the underlying embedding model without renaming the knowledge base
424 | - Update document chunking strategy for an entire collection
425 | - Clear outdated information before a complete refresh
426 |
427 | Performance Considerations:
428 | - Resetting is significantly faster than removing documents individually
429 | - For large knowledge bases, resetting and bulk re-adding documents can be
430 | orders of magnitude more efficient than incremental updates
431 | - New documents added after reset will use any updated embedding models or
432 | chunking strategies configured in the system
433 |
434 | Data Integrity:
435 | - This operation preserves knowledge base configuration but removes all content
436 | - The knowledge base name and any associated permissions remain intact
437 | - Custom configuration settings on the knowledge base will be preserved
438 | if the knowledge base service supports configuration persistence
439 |
440 | WARNING: This operation is irreversible. All documents and their embeddings
441 | will be permanently deleted. Consider backing up important data before resetting.
442 |
443 | Disaster Recovery:
444 | Before resetting a production knowledge base, consider these strategies:
445 | 1. Create a backup by exporting documents and metadata if the feature is available
446 | 2. Maintain source documents in original form outside the knowledge base
447 | 3. Document the ingestion pipeline to reproduce the knowledge base if needed
448 | 4. Consider creating a temporary duplicate before resetting critical knowledge bases
449 |
450 | The reset process:
451 | 1. Deletes the entire knowledge base collection/index
452 | 2. Creates a new empty knowledge base with the same name
453 | 3. Re-initializes any associated metadata or settings
454 |
455 | Args:
456 | knowledge_base_name: Name of the knowledge base to reset
457 |
458 | Returns:
459 | None
460 |
461 | Raises:
462 | ValueError: If the knowledge base doesn't exist
463 | Exception: If deletion or recreation fails
464 |
465 | Example:
466 | ```python
467 | # Backup critical metadata before reset (if needed)
468 | kb_info = await rag_client.list_knowledge_bases()
469 | kb_config = next((kb for kb in kb_info if kb['name'] == 'product_documentation'), None)
470 |
471 | # Reset a knowledge base that may have outdated or corrupted data
472 | await rag_client.reset_knowledge_base("product_documentation")
473 |
474 | # After resetting, re-add documents with potentially improved chunking strategy
475 | await rag_client.add_documents(
476 | knowledge_base_name="product_documentation",
477 | documents=updated_docs,
478 | metadatas=updated_metadatas,
479 | chunk_size=800, # Updated chunk size better suited for the content
480 | chunk_method="semantic" # Using semantic chunking for better results
481 | )
482 | ```
483 | """
484 |         logger.info(f"Resetting knowledge base: {knowledge_base_name}", emoji_key="processing")
485 | 
486 |         try:
487 |             await self.kb_manager.delete_knowledge_base(name=knowledge_base_name)
488 |             await self.kb_manager.create_knowledge_base(name=knowledge_base_name)
489 |             logger.success(f"Knowledge base {knowledge_base_name} reset successfully", emoji_key="success")
490 |         except Exception as e:
491 |             logger.error(f"Failed to reset knowledge base: {str(e)}", emoji_key="error")
492 |             raise
```
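
The `retrieve` method is also useful on its own when you want to inspect context before committing to a full `generate_with_rag` call. A small sketch, assuming this module's `RAGClient`, an existing knowledge base name, and the result structure documented in the `retrieve` docstring (`documents`, `metadatas`, `distances`):

```python
async def preview_context(client: RAGClient, kb_name: str, question: str) -> str:
    """Return retrieved chunks as one annotated string for manual inspection."""
    results = await client.retrieve(
        knowledge_base_name=kb_name,
        query=question,
        top_k=3,
        retrieval_method="hybrid",
    )
    blocks = []
    for doc, meta, score in zip(results["documents"], results["metadatas"], results["distances"]):
        blocks.append(f"[{meta.get('source', 'unknown')} | score={score:.3f}]\n{doc}")
    return "\n\n".join(blocks)
```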