This is page 1 of 4. Use http://codebase.md/disler/just-prompt?page={x} to view the full context. # Directory Structure ``` ├── .claude │ ├── commands │ │ ├── context_prime_eza.md │ │ ├── context_prime_w_lead.md │ │ ├── context_prime.md │ │ ├── jprompt_ultra_diff_review.md │ │ ├── project_hello_w_name.md │ │ └── project_hello.md │ └── settings.json ├── .env.sample ├── .gitignore ├── .mcp.json ├── .python-version ├── ai_docs │ ├── extending_thinking_sonny.md │ ├── google-genai-api-update.md │ ├── llm_providers_details.xml │ ├── openai-reasoning-effort.md │ └── pocket-pick-mcp-server-example.xml ├── example_outputs │ ├── countdown_component │ │ ├── countdown_component_groq_qwen-qwq-32b.md │ │ ├── countdown_component_o_gpt-4.5-preview.md │ │ ├── countdown_component_openai_o3-mini.md │ │ ├── countdown_component_q_deepseek-r1-distill-llama-70b-specdec.md │ │ └── diff.md │ └── decision_openai_vs_anthropic_vs_google │ ├── ceo_decision.md │ ├── ceo_medium_decision_openai_vs_anthropic_vs_google_anthropic_claude-3-7-sonnet-20250219_4k.md │ ├── ceo_medium_decision_openai_vs_anthropic_vs_google_gemini_gemini-2.5-flash-preview-04-17.md │ ├── ceo_medium_decision_openai_vs_anthropic_vs_google_gemini_gemini-2.5-pro-preview-03-25.md │ ├── ceo_medium_decision_openai_vs_anthropic_vs_google_openai_o3_high.md │ ├── ceo_medium_decision_openai_vs_anthropic_vs_google_openai_o4-mini_high.md │ └── ceo_prompt.xml ├── images │ ├── just-prompt-logo.png │ └── o3-as-a-ceo.png ├── list_models.py ├── prompts │ ├── ceo_medium_decision_openai_vs_anthropic_vs_google.txt │ ├── ceo_small_decision_python_vs_typescript.txt │ ├── ceo_small_decision_rust_vs_prompt_eng.txt │ ├── countdown_component.txt │ ├── mock_bin_search.txt │ └── mock_ui_component.txt ├── pyproject.toml ├── README.md ├── specs │ ├── gemini-2-5-flash-reasoning.md │ ├── init-just-prompt.md │ ├── new-tool-llm-as-a-ceo.md │ ├── oai-reasoning-levels.md │ └── prompt_from_file_to_file_w_context.md ├── src │ └── just_prompt │ ├── __init__.py │ ├── __main__.py │ ├── atoms │ │ ├── __init__.py │ │ ├── llm_providers │ │ │ ├── __init__.py │ │ │ ├── anthropic.py │ │ │ ├── deepseek.py │ │ │ ├── gemini.py │ │ │ ├── groq.py │ │ │ ├── ollama.py │ │ │ └── openai.py │ │ └── shared │ │ ├── __init__.py │ │ ├── data_types.py │ │ ├── model_router.py │ │ ├── utils.py │ │ └── validator.py │ ├── molecules │ │ ├── __init__.py │ │ ├── ceo_and_board_prompt.py │ │ ├── list_models.py │ │ ├── list_providers.py │ │ ├── prompt_from_file_to_file.py │ │ ├── prompt_from_file.py │ │ └── prompt.py │ ├── server.py │ └── tests │ ├── __init__.py │ ├── atoms │ │ ├── __init__.py │ │ ├── llm_providers │ │ │ ├── __init__.py │ │ │ ├── test_anthropic.py │ │ │ ├── test_deepseek.py │ │ │ ├── test_gemini.py │ │ │ ├── test_groq.py │ │ │ ├── test_ollama.py │ │ │ └── test_openai.py │ │ └── shared │ │ ├── __init__.py │ │ ├── test_model_router.py │ │ ├── test_utils.py │ │ └── test_validator.py │ └── molecules │ ├── __init__.py │ ├── test_ceo_and_board_prompt.py │ ├── test_list_models.py │ ├── test_list_providers.py │ ├── test_prompt_from_file_to_file.py │ ├── test_prompt_from_file.py │ └── test_prompt.py ├── ultra_diff_review │ ├── diff_anthropic_claude-3-7-sonnet-20250219_4k.md │ ├── diff_gemini_gemini-2.0-flash-thinking-exp.md │ ├── diff_openai_o3-mini.md │ └── fusion_ultra_diff_review.md └── uv.lock ``` # Files -------------------------------------------------------------------------------- /.python-version: -------------------------------------------------------------------------------- ``` 3.12 ``` 
-------------------------------------------------------------------------------- /.env.sample: -------------------------------------------------------------------------------- ``` # Environment Variables for just-prompt # OpenAI API Key OPENAI_API_KEY=your_openai_api_key_here # Anthropic API Key ANTHROPIC_API_KEY=your_anthropic_api_key_here # Gemini API Key GEMINI_API_KEY=your_gemini_api_key_here # Groq API Key GROQ_API_KEY=your_groq_api_key_here # DeepSeek API Key DEEPSEEK_API_KEY=your_deepseek_api_key_here # Ollama endpoint (if not default) OLLAMA_HOST=http://localhost:11434 ``` -------------------------------------------------------------------------------- /.mcp.json: -------------------------------------------------------------------------------- ```json { "mcpServers": { "just-prompt": { "type": "stdio", "command": "uv", "args": [ "--directory", ".", "run", "just-prompt", "--default-models", "openai:gpt-5:high,openai:gpt-5-mini:high,openai:gpt-5-nano:high,openai:o3:high,anthropic:claude-opus-4-1-20250805,anthropic:claude-opus-4-20250514,anthropic:claude-sonnet-4-20250514,gemini:gemini-2.5-pro,gemini:gemini-2.5-flash" ], "env": {} } } } ``` -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- ``` # Python-generated files __pycache__/ *.py[oc] build/ dist/ wheels/ *.egg-info # Virtual environments .venv .env # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] *$py.class # Distribution / packaging dist/ build/ *.egg-info/ *.egg # Unit test / coverage reports htmlcov/ .tox/ .nox/ .coverage .coverage.* .cache nosetests.xml coverage.xml *.cover .hypothesis/ .pytest_cache/ # Jupyter Notebook .ipynb_checkpoints # Environments .env .venv env/ venv/ ENV/ env.bak/ venv.bak/ # mypy .mypy_cache/ .dmypy.json dmypy.json # IDE specific files .idea/ .vscode/ *.swp *.swo .DS_Store prompts/responses .aider* focus_output/ # Git worktrees trees/ ``` -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- ```markdown # Just Prompt - A lightweight MCP server for LLM providers `just-prompt` is a Model Control Protocol (MCP) server that provides a unified interface to various Large Language Model (LLM) providers including OpenAI, Anthropic, Google Gemini, Groq, DeepSeek, and Ollama. See how we use the `ceo_and_board` tool to make [hard decisions easy with o3 here](https://youtu.be/LEMLntjfihA). <img src="images/just-prompt-logo.png" alt="Just Prompt Logo" width="700" height="auto"> <img src="images/o3-as-a-ceo.png" alt="Just Prompt Logo" width="700" height="auto"> ## Tools The following MCP tools are available in the server: - **`prompt`**: Send a prompt to multiple LLM models - Parameters: - `text`: The prompt text - `models_prefixed_by_provider` (optional): List of models with provider prefixes. If not provided, uses default models. - **`prompt_from_file`**: Send a prompt from a file to multiple LLM models - Parameters: - `abs_file_path`: Absolute path to the file containing the prompt (must be an absolute path, not relative) - `models_prefixed_by_provider` (optional): List of models with provider prefixes. If not provided, uses default models. 
- **`prompt_from_file_to_file`**: Send a prompt from a file to multiple LLM models and save responses as markdown files - Parameters: - `abs_file_path`: Absolute path to the file containing the prompt (must be an absolute path, not relative) - `models_prefixed_by_provider` (optional): List of models with provider prefixes. If not provided, uses default models. - `abs_output_dir` (default: "."): Absolute directory path to save the response markdown files to (must be an absolute path, not relative) - **`ceo_and_board`**: Send a prompt to multiple 'board member' models and have a 'CEO' model make a decision based on their responses - Parameters: - `abs_file_path`: Absolute path to the file containing the prompt (must be an absolute path, not relative) - `models_prefixed_by_provider` (optional): List of models with provider prefixes to act as board members. If not provided, uses default models. - `abs_output_dir` (default: "."): Absolute directory path to save the response files and CEO decision (must be an absolute path, not relative) - `ceo_model` (default: "openai:o3"): Model to use for the CEO decision in format "provider:model" - **`list_providers`**: List all available LLM providers - Parameters: None - **`list_models`**: List all available models for a specific LLM provider - Parameters: - `provider`: Provider to list models for (e.g., 'openai' or 'o') ## Provider Prefixes > every model must be prefixed with the provider name > > use the short name for faster referencing - `o` or `openai`: OpenAI - `o:gpt-4o-mini` - `openai:gpt-4o-mini` - `a` or `anthropic`: Anthropic - `a:claude-3-5-haiku` - `anthropic:claude-3-5-haiku` - `g` or `gemini`: Google Gemini - `g:gemini-2.5-pro-exp-03-25` - `gemini:gemini-2.5-pro-exp-03-25` - `q` or `groq`: Groq - `q:llama-3.1-70b-versatile` - `groq:llama-3.1-70b-versatile` - `d` or `deepseek`: DeepSeek - `d:deepseek-coder` - `deepseek:deepseek-coder` - `l` or `ollama`: Ollama - `l:llama3.1` - `ollama:llama3.1` ## Features - Unified API for multiple LLM providers - Support for text prompts from strings or files - Run multiple models in parallel - Automatic model name correction using the first model in the `--default-models` list - Ability to save responses to files - Easy listing of available providers and models ## Installation ```bash # Clone the repository git clone https://github.com/yourusername/just-prompt.git cd just-prompt # Install with pip uv sync ``` ### Environment Variables Create a `.env` file with your API keys (you can copy the `.env.sample` file): ```bash cp .env.sample .env ``` Then edit the `.env` file to add your API keys (or export them in your shell): ``` OPENAI_API_KEY=your_openai_api_key_here ANTHROPIC_API_KEY=your_anthropic_api_key_here GEMINI_API_KEY=your_gemini_api_key_here GROQ_API_KEY=your_groq_api_key_here DEEPSEEK_API_KEY=your_deepseek_api_key_here OLLAMA_HOST=http://localhost:11434 ``` ## Claude Code Installation > In all these examples, replace the directory with the path to the just-prompt directory. Default models set to `openai:o3:high`, `openai:o4-mini:high`, `anthropic:claude-opus-4-20250514`, `anthropic:claude-sonnet-4-20250514`, `gemini:gemini-2.5-pro-preview-03-25`, and `gemini:gemini-2.5-flash-preview-04-17`. If you use Claude Code right out of the repository you can see in the .mcp.json file we set the default models to... 
```
{
  "mcpServers": {
    "just-prompt": {
      "type": "stdio",
      "command": "uv",
      "args": [
        "--directory",
        ".",
        "run",
        "just-prompt",
        "--default-models",
        "openai:o3:high,openai:o4-mini:high,anthropic:claude-opus-4-20250514,anthropic:claude-sonnet-4-20250514,gemini:gemini-2.5-pro-preview-03-25,gemini:gemini-2.5-flash-preview-04-17"
      ],
      "env": {}
    }
  }
}
```

The `--default-models` parameter sets the models to use when none are explicitly provided to the API endpoints. The first model in the list is also used for model name correction when needed. This can be a list of models separated by commas.

When starting the server, it will automatically check which API keys are available in your environment and inform you which providers you can use. If a key is missing, the provider will be listed as unavailable, but the server will still start and can be used with the providers that are available.

### Using `mcp add-json`

Paste this into Claude Code, BUT don't run it until you have copied the JSON below to your clipboard (the command reads it via `pbpaste`):

```
claude mcp add just-prompt "$(pbpaste)"
```

JSON to copy

```
{ "command": "uv", "args": ["--directory", ".", "run", "just-prompt"] }
```

With a custom default model set to `openai:gpt-4o`.

```
{ "command": "uv", "args": ["--directory", ".", "run", "just-prompt", "--default-models", "openai:gpt-4o"] }
```

With multiple default models:

```
{ "command": "uv", "args": ["--directory", ".", "run", "just-prompt", "--default-models", "openai:o3:high,openai:o4-mini:high,anthropic:claude-opus-4-20250514,anthropic:claude-sonnet-4-20250514,gemini:gemini-2.5-pro-preview-03-25,gemini:gemini-2.5-flash-preview-04-17"] }
```

### Using `mcp add` with project scope

```bash
# With default models
claude mcp add just-prompt -s project \
-- \
uv --directory . \
run just-prompt

# With custom default model
claude mcp add just-prompt -s project \
-- \
uv --directory . \
run just-prompt --default-models "openai:gpt-4o"

# With multiple default models
claude mcp add just-prompt -s user \
-- \
uv --directory . \
run just-prompt --default-models "openai:o3:high,openai:o4-mini:high,anthropic:claude-opus-4-20250514,anthropic:claude-sonnet-4-20250514,gemini:gemini-2.5-pro-preview-03-25,gemini:gemini-2.5-flash-preview-04-17"
```

## `mcp remove`

```bash
claude mcp remove just-prompt
```

## Running Tests

```bash
uv run pytest
```

## Codebase Structure

```
.
├── ai_docs/ # Documentation for AI model details │ ├── extending_thinking_sonny.md │ ├── llm_providers_details.xml │ ├── openai-reasoning-effort.md │ └── pocket-pick-mcp-server-example.xml ├── example_outputs/ # Example outputs from different models ├── list_models.py # Script to list available LLM models ├── prompts/ # Example prompt files ├── pyproject.toml # Python project configuration ├── specs/ # Project specifications │ ├── init-just-prompt.md │ ├── new-tool-llm-as-a-ceo.md │ └── oai-reasoning-levels.md ├── src/ # Source code directory │ └── just_prompt/ │ ├── __init__.py │ ├── __main__.py │ ├── atoms/ # Core components │ │ ├── llm_providers/ # Individual provider implementations │ │ │ ├── anthropic.py │ │ │ ├── deepseek.py │ │ │ ├── gemini.py │ │ │ ├── groq.py │ │ │ ├── ollama.py │ │ │ └── openai.py │ │ └── shared/ # Shared utilities and data types │ │ ├── data_types.py │ │ ├── model_router.py │ │ ├── utils.py │ │ └── validator.py │ ├── molecules/ # Higher-level functionality │ │ ├── ceo_and_board_prompt.py │ │ ├── list_models.py │ │ ├── list_providers.py │ │ ├── prompt.py │ │ ├── prompt_from_file.py │ │ └── prompt_from_file_to_file.py │ ├── server.py # MCP server implementation │ └── tests/ # Test directory │ ├── atoms/ # Tests for atoms │ │ ├── llm_providers/ │ │ └── shared/ │ └── molecules/ # Tests for molecules │ ├── test_ceo_and_board_prompt.py │ ├── test_list_models.py │ ├── test_list_providers.py │ ├── test_prompt.py │ ├── test_prompt_from_file.py │ └── test_prompt_from_file_to_file.py └── ultra_diff_review/ # Diff review outputs ``` ## Context Priming READ README.md, pyproject.toml, then run git ls-files, and 'eza --git-ignore --tree' to understand the context of the project. # Reasoning Effort with OpenAI o‑Series For OpenAI o‑series reasoning models (`o4-mini`, `o3-mini`, `o3`) you can control how much *internal* reasoning the model performs before producing a visible answer. Append one of the following suffixes to the model name (after the *provider* prefix): * `:low` – minimal internal reasoning (faster, cheaper) * `:medium` – balanced (default if omitted) * `:high` – thorough reasoning (slower, more tokens) Examples: * `openai:o4-mini:low` * `o:o4-mini:high` When a reasoning suffix is present, **just‑prompt** automatically switches to the OpenAI *Responses* API (when available) and sets the corresponding `reasoning.effort` parameter. If the installed OpenAI SDK is older, it gracefully falls back to the Chat Completions endpoint and embeds an internal system instruction to approximate the requested effort level. # Thinking Tokens with Claude The Anthropic Claude models `claude-opus-4-20250514` and `claude-sonnet-4-20250514` support extended thinking capabilities using thinking tokens. This allows Claude to do more thorough thought processes before answering. You can enable thinking tokens by adding a suffix to the model name in this format: - `anthropic:claude-opus-4-20250514:1k` - Use 1024 thinking tokens for Opus 4 - `anthropic:claude-sonnet-4-20250514:4k` - Use 4096 thinking tokens for Sonnet 4 - `anthropic:claude-opus-4-20250514:8000` - Use 8000 thinking tokens for Opus 4 Notes: - Thinking tokens are supported for `claude-opus-4-20250514`, `claude-sonnet-4-20250514`, and `claude-3-7-sonnet-20250219` models - Valid thinking token budgets range from 1024 to 16000 - Values outside this range will be automatically adjusted to be within range - You can specify the budget with k notation (1k, 4k, etc.) or with exact numbers (1024, 4096, etc.) 
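To make the suffix handling concrete, here is a minimal sketch of how a thinking-token suffix could be parsed into a clamped budget. This is an illustration only: the helper name is hypothetical, and the real logic lives in `src/just_prompt/atoms/llm_providers/anthropic.py`.

```python
# Illustrative sketch only: the helper name is hypothetical and the shipped
# implementation in anthropic.py may differ in details.

def parse_thinking_suffix(model: str) -> tuple[str, int]:
    """Split 'claude-...:4k' into (base_model, thinking_budget); 0 means disabled."""
    parts = model.rsplit(":", 1)
    if len(parts) != 2:
        return model, 0
    base, suffix = parts
    suffix = suffix.lower()
    try:
        budget = int(suffix[:-1]) * 1024 if suffix.endswith("k") else int(suffix)
    except ValueError:
        return model, 0  # not a thinking suffix, keep the model name unchanged
    # Clamp to the documented 1024-16000 range
    return base, max(1024, min(budget, 16000))
```

In day-to-day use you never call this directly; you simply pass the suffixed name (for example `anthropic:claude-sonnet-4-20250514:4k`) in `models_prefixed_by_provider`.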
# Thinking Budget with Gemini The Google Gemini model `gemini-2.5-flash-preview-04-17` supports extended thinking capabilities using thinking budget. This allows Gemini to perform more thorough reasoning before providing a response. You can enable thinking budget by adding a suffix to the model name in this format: - `gemini:gemini-2.5-flash-preview-04-17:1k` - Use 1024 thinking budget - `gemini:gemini-2.5-flash-preview-04-17:4k` - Use 4096 thinking budget - `gemini:gemini-2.5-flash-preview-04-17:8000` - Use 8000 thinking budget Notes: - Thinking budget is only supported for the `gemini-2.5-flash-preview-04-17` model - Valid thinking budget range from 0 to 24576 - Values outside this range will be automatically adjusted to be within range - You can specify the budget with k notation (1k, 4k, etc.) or with exact numbers (1024, 4096, etc.) ## Resources - https://docs.anthropic.com/en/api/models-list?q=list+models - https://github.com/googleapis/python-genai - https://platform.openai.com/docs/api-reference/models/list - https://api-docs.deepseek.com/api/list-models - https://github.com/ollama/ollama-python - https://github.com/openai/openai-python ## Master AI Coding Learn to code with AI with foundational [Principles of AI Coding](https://agenticengineer.com/principled-ai-coding?y=jprompt) Follow the [IndyDevDan youtube channel](https://www.youtube.com/@indydevdan) for more AI coding tips and tricks. ``` -------------------------------------------------------------------------------- /.claude/commands/project_hello.md: -------------------------------------------------------------------------------- ```markdown hi how are you ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/__init__.py: -------------------------------------------------------------------------------- ```python # Tests package ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/atoms/__init__.py: -------------------------------------------------------------------------------- ```python # Atoms tests package ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/atoms/shared/__init__.py: -------------------------------------------------------------------------------- ```python # Shared tests package ``` -------------------------------------------------------------------------------- /.claude/commands/project_hello_w_name.md: -------------------------------------------------------------------------------- ```markdown hi how are you $ARGUMENTS ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/molecules/__init__.py: -------------------------------------------------------------------------------- ```python # Molecules tests package ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/atoms/llm_providers/__init__.py: -------------------------------------------------------------------------------- ```python # LLM Providers tests package ``` -------------------------------------------------------------------------------- /src/just_prompt/atoms/__init__.py: -------------------------------------------------------------------------------- ```python # Atoms package - basic building blocks ``` -------------------------------------------------------------------------------- /src/just_prompt/atoms/shared/__init__.py: 
-------------------------------------------------------------------------------- ```python # Shared package - common utilities and data types ``` -------------------------------------------------------------------------------- /src/just_prompt/atoms/llm_providers/__init__.py: -------------------------------------------------------------------------------- ```python # LLM Providers package - interfaces for various LLM APIs ``` -------------------------------------------------------------------------------- /src/just_prompt/molecules/__init__.py: -------------------------------------------------------------------------------- ```python # Molecules package - higher-level functionality built from atoms ``` -------------------------------------------------------------------------------- /.claude/commands/context_prime.md: -------------------------------------------------------------------------------- ```markdown READ README.md, THEN run git ls-files to understand the context of the project. ``` -------------------------------------------------------------------------------- /prompts/mock_bin_search.txt: -------------------------------------------------------------------------------- ``` python: return code exclusively: def binary_search(arr, target) -> Optional[int]: ``` -------------------------------------------------------------------------------- /.claude/commands/context_prime_eza.md: -------------------------------------------------------------------------------- ```markdown READ README.md, THEN run eza . --git-ignore --tree to understand the context of the project. ``` -------------------------------------------------------------------------------- /src/just_prompt/__init__.py: -------------------------------------------------------------------------------- ```python # just-prompt - A lightweight wrapper MCP server for various LLM providers __version__ = "0.1.0" ``` -------------------------------------------------------------------------------- /.claude/settings.json: -------------------------------------------------------------------------------- ```json { "permissions": { "allow": [ "Bash(npm run lint)", "Bash(npm run test:*)" ] } } ``` -------------------------------------------------------------------------------- /.claude/commands/context_prime_w_lead.md: -------------------------------------------------------------------------------- ```markdown READ README.md, THEN run git ls-files to understand the context of the project. Be sure to also READ: $ARGUMENTS and nothing else. ``` -------------------------------------------------------------------------------- /prompts/ceo_small_decision_rust_vs_prompt_eng.txt: -------------------------------------------------------------------------------- ``` <purpose> I want to decide if I should spend time learning Rust or Prompt Engineering. Help me decide between these two options. </purpose> <option-1> Rust </option-1> <option-2> Prompt Engineering </option-2> ``` -------------------------------------------------------------------------------- /prompts/ceo_small_decision_python_vs_typescript.txt: -------------------------------------------------------------------------------- ``` <purpose> I want to decide if I should spend time learning Python or TypeScript. Help me decide between these two options. Given that I want to train ai models and build a fullstack website to host them, which language should I use? 
</purpose> ``` -------------------------------------------------------------------------------- /ai_docs/extending_thinking_sonny.md: -------------------------------------------------------------------------------- ```markdown # Code snippet of using thinking tokens response = client.messages.create( model="claude-3-7-sonnet-20250219", max_tokens=8192, thinking={ "type": "enabled", "budget_tokens": 4000, }, messages=[{"role": "user", "content": args.prompt}], ) ``` -------------------------------------------------------------------------------- /prompts/mock_ui_component.txt: -------------------------------------------------------------------------------- ``` Build vue, react, and svelte components for this component definition: <TableOfContents :tree="tree" /> The tree is a json object that looks like this: ```json { "name": "TableOfContents", "children": [ { "name": "Item", "children": [ { "name": "Item", "children": [] } ] }, { "name": "Item 2", "children": [] } ] } ``` ``` -------------------------------------------------------------------------------- /src/just_prompt/molecules/list_models.py: -------------------------------------------------------------------------------- ```python """ List models functionality for just-prompt. """ from typing import List import logging from ..atoms.shared.validator import validate_provider from ..atoms.shared.model_router import ModelRouter logger = logging.getLogger(__name__) def list_models(provider: str) -> List[str]: """ List available models for a provider. Args: provider: Provider name (full or short) Returns: List of model names """ # Validate provider validate_provider(provider) # Get models from provider return ModelRouter.route_list_models(provider) ``` -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- ```toml [project] name = "just-prompt" version = "0.1.0" description = "A lightweight MCP server for various LLM providers" readme = "README.md" requires-python = ">=3.10" dependencies = [ "anthropic>=0.49.0", "google-genai>=1.22.0", "groq>=0.20.0", "ollama>=0.4.7", "openai>=1.68.0", "python-dotenv>=1.0.1", "pydantic>=2.0.0", "mcp>=0.1.5", ] [project.scripts] just-prompt = "just_prompt.__main__:main" [project.optional-dependencies] test = [ "pytest>=7.3.1", "pytest-asyncio>=0.20.3", ] [build-system] requires = ["setuptools>=61.0"] build-backend = "setuptools.build_meta" ``` -------------------------------------------------------------------------------- /src/just_prompt/molecules/list_providers.py: -------------------------------------------------------------------------------- ```python """ List providers functionality for just-prompt. """ from typing import List, Dict import logging from ..atoms.shared.data_types import ModelProviders logger = logging.getLogger(__name__) def list_providers() -> List[Dict[str, str]]: """ List all available providers with their full and short names. Returns: List of dictionaries with provider information """ providers = [] for provider in ModelProviders: providers.append({ "name": provider.name, "full_name": provider.full_name, "short_name": provider.short_name }) return providers ``` -------------------------------------------------------------------------------- /prompts/countdown_component.txt: -------------------------------------------------------------------------------- ``` Create a countdown timer component that satisfies these requirements: 1. 
Framework implementations: - Vue.js - Svelte - React - Vanilla JavaScript 2. Component interface: - :start-time: number (starting time in seconds) - :format: number (display format, 0 = MM:SS, 1 = HH:MM:SS) 3. Features: - Count down from start-time to zero - Display remaining time in specified format - Stop counting when reaching zero - Emit/callback 'finished' event when countdown completes - Provide a visual indication when time is running low (< 10% of total) 4. Include: - Component implementation - Sample usage - Clear comments explaining key parts Provide clean, well-structured code for each framework version. ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/atoms/llm_providers/test_ollama.py: -------------------------------------------------------------------------------- ```python """ Tests for Ollama provider. """ import pytest import os from dotenv import load_dotenv from just_prompt.atoms.llm_providers import ollama # Load environment variables load_dotenv() def test_list_models(): """Test listing Ollama models.""" models = ollama.list_models() assert isinstance(models, list) assert isinstance(models[0], str) assert len(models) > 0 def test_prompt(): """Test sending prompt to Ollama.""" # Using llama3 as default model - adjust if needed based on your environment response = ollama.prompt("What is the capital of France?", "gemma3:12b") # Assertions assert isinstance(response, str) assert len(response) > 0 assert "paris" in response.lower() or "Paris" in response ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/atoms/llm_providers/test_groq.py: -------------------------------------------------------------------------------- ```python """ Tests for Groq provider. """ import pytest import os from dotenv import load_dotenv from just_prompt.atoms.llm_providers import groq # Load environment variables load_dotenv() # Skip tests if API key not available if not os.environ.get("GROQ_API_KEY"): pytest.skip("Groq API key not available", allow_module_level=True) def test_list_models(): """Test listing Groq models.""" models = groq.list_models() assert isinstance(models, list) assert len(models) > 0 assert all(isinstance(model, str) for model in models) def test_prompt(): """Test sending prompt to Groq.""" response = groq.prompt("What is the capital of France?", "qwen-qwq-32b") assert isinstance(response, str) assert len(response) > 0 assert "paris" in response.lower() or "Paris" in response ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/atoms/llm_providers/test_deepseek.py: -------------------------------------------------------------------------------- ```python """ Tests for DeepSeek provider. 
""" import pytest import os from dotenv import load_dotenv from just_prompt.atoms.llm_providers import deepseek # Load environment variables load_dotenv() # Skip tests if API key not available if not os.environ.get("DEEPSEEK_API_KEY"): pytest.skip("DeepSeek API key not available", allow_module_level=True) def test_list_models(): """Test listing DeepSeek models.""" models = deepseek.list_models() assert isinstance(models, list) assert len(models) > 0 assert all(isinstance(model, str) for model in models) def test_prompt(): """Test sending prompt to DeepSeek.""" response = deepseek.prompt("What is the capital of France?", "deepseek-coder") assert isinstance(response, str) assert len(response) > 0 assert "paris" in response.lower() or "Paris" in response ``` -------------------------------------------------------------------------------- /src/just_prompt/atoms/shared/data_types.py: -------------------------------------------------------------------------------- ```python """ Data types and models for just-prompt MCP server. """ from enum import Enum class ModelProviders(Enum): """ Enum of supported model providers with their full and short names. """ OPENAI = ("openai", "o") ANTHROPIC = ("anthropic", "a") GEMINI = ("gemini", "g") GROQ = ("groq", "q") DEEPSEEK = ("deepseek", "d") OLLAMA = ("ollama", "l") def __init__(self, full_name, short_name): self.full_name = full_name self.short_name = short_name @classmethod def from_name(cls, name): """ Get provider enum from full or short name. Args: name: The provider name (full or short) Returns: ModelProviders: The corresponding provider enum, or None if not found """ for provider in cls: if provider.full_name == name or provider.short_name == name: return provider return None ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/molecules/test_prompt_from_file.py: -------------------------------------------------------------------------------- ```python """ Tests for prompt_from_file functionality. """ import pytest import os import tempfile from dotenv import load_dotenv from just_prompt.molecules.prompt_from_file import prompt_from_file # Load environment variables load_dotenv() def test_nonexistent_file(): """Test with non-existent file.""" with pytest.raises(FileNotFoundError): prompt_from_file("/non/existent/file.txt", ["o:gpt-4o-mini"]) def test_file_read(): """Test that the file is read correctly and processes with real API call.""" # Create temporary file with a simple question with tempfile.NamedTemporaryFile(mode='w+', delete=False) as temp: temp.write("What is the capital of France?") temp_path = temp.name try: # Make real API call response = prompt_from_file(temp_path, ["o:gpt-4o-mini"]) # Assertions assert isinstance(response, list) assert len(response) == 1 assert "paris" in response[0].lower() or "Paris" in response[0] finally: # Clean up os.unlink(temp_path) ``` -------------------------------------------------------------------------------- /.claude/commands/jprompt_ultra_diff_review.md: -------------------------------------------------------------------------------- ```markdown # Ultra Diff Review > Execute each task in the order given to conduct a thorough code review. ## Task 1: Create diff.txt Create a new file called diff.md. At the top of the file, add the following markdown: ```md # Code Review - Review the diff, report on issues, bugs, and improvements. 
- End with a concise markdown table of any issues found, their solutions, and a risk assessment for each issue if applicable. - Use emojis to convey the severity of each issue. ## Diff ``` ## Task 2: git diff and append Then run git diff and append the output to the file. ## Task 3: just-prompt multi-llm tool call Then use that file as the input to this just-prompt tool call. prompts_from_file_to_file( from_file = diff.md, models = "openai:o3-mini, anthropic:claude-3-7-sonnet-20250219:4k, gemini:gemini-2.0-flash-thinking-exp" output_dir = ultra_diff_review/ ) ## Task 4: Read the output files and synthesize Then read the output files and think hard to synthesize the results into a new single file called `ultra_diff_review/fusion_ultra_diff_review.md` following the original instructions plus any additional instructions or callouts you think are needed to create the best possible review. ## Task 5: Present the results Then let me know which issues you think are worth resolving and we'll proceed from there. ``` -------------------------------------------------------------------------------- /src/just_prompt/atoms/llm_providers/ollama.py: -------------------------------------------------------------------------------- ```python """ Ollama provider implementation. """ import os from typing import List import logging import ollama from dotenv import load_dotenv # Load environment variables load_dotenv() # Configure logging logger = logging.getLogger(__name__) def prompt(text: str, model: str) -> str: """ Send a prompt to Ollama and get a response. Args: text: The prompt text model: The model name Returns: Response string from the model """ try: logger.info(f"Sending prompt to Ollama model: {model}") # Create chat completion response = ollama.chat( model=model, messages=[ { "role": "user", "content": text, }, ], ) # Extract response content return response.message.content except Exception as e: logger.error(f"Error sending prompt to Ollama: {e}") raise ValueError(f"Failed to get response from Ollama: {str(e)}") def list_models() -> List[str]: """ List available Ollama models. Returns: List of model names """ logger.info("Listing Ollama models") response = ollama.list() # Extract model names from the models attribute models = [model.model for model in response.models] return models ``` -------------------------------------------------------------------------------- /src/just_prompt/molecules/prompt_from_file.py: -------------------------------------------------------------------------------- ```python """ Prompt from file functionality for just-prompt. """ from typing import List import logging import os from pathlib import Path from .prompt import prompt logger = logging.getLogger(__name__) def prompt_from_file(abs_file_path: str, models_prefixed_by_provider: List[str] = None) -> List[str]: """ Read text from a file and send it as a prompt to multiple models. 
Args: abs_file_path: Absolute path to the text file (must be an absolute path, not relative) models_prefixed_by_provider: List of model strings in format "provider:model" If None, uses the DEFAULT_MODELS environment variable Returns: List of responses from the models """ file_path = Path(abs_file_path) # Validate file if not file_path.exists(): raise FileNotFoundError(f"File not found: {abs_file_path}") if not file_path.is_file(): raise ValueError(f"Not a file: {abs_file_path}") # Read file content try: with open(file_path, 'r', encoding='utf-8') as f: text = f.read() except Exception as e: logger.error(f"Error reading file {abs_file_path}: {e}") raise ValueError(f"Error reading file: {str(e)}") # Send prompt with file content return prompt(text, models_prefixed_by_provider) ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/molecules/test_prompt.py: -------------------------------------------------------------------------------- ```python """ Tests for prompt functionality. """ import pytest import os from dotenv import load_dotenv from just_prompt.molecules.prompt import prompt # Load environment variables load_dotenv() def test_prompt_basic(): """Test basic prompt functionality with a real API call.""" # Define a simple test case test_prompt = "What is the capital of France?" test_models = ["openai:gpt-4o-mini"] # Call the prompt function with a real model response = prompt(test_prompt, test_models) # Assertions assert isinstance(response, list) assert len(response) == 1 assert "paris" in response[0].lower() or "Paris" in response[0] def test_prompt_multiple_models(): """Test prompt with multiple models.""" # Skip if API keys aren't available if not os.environ.get("OPENAI_API_KEY") or not os.environ.get("ANTHROPIC_API_KEY"): pytest.skip("Required API keys not available") # Define a simple test case test_prompt = "What is the capital of France?" test_models = ["openai:gpt-4o-mini", "anthropic:claude-3-5-haiku-20241022"] # Call the prompt function with multiple models response = prompt(test_prompt, test_models) # Assertions assert isinstance(response, list) assert len(response) == 2 # Check all responses contain Paris for r in response: assert "paris" in r.lower() or "Paris" in r ``` -------------------------------------------------------------------------------- /specs/oai-reasoning-levels.md: -------------------------------------------------------------------------------- ```markdown Feature Request: Add low, medium, high reasoning levels to the OpenAI o-series reasoning models > Models; o3-mini, o4-mini, o3 > > Implement every detail below end to end and validate your work with tests. ## Implementation Notes - Just like how claude-3-7-sonnet has budget tokens in src/just_prompt/atoms/llm_providers/anthropic.py, OpenAI has a similar feature with the low, medium, high suffix. We want to support o4-mini:low, o4-mini:medium, o4-mini:high, ...repeat for o3-mini and o3. - If this suffix is present, we should trigger a prompt_with_thinking function in src/just_prompt/atoms/llm_providers/openai.py. Use the example code in ai_docs/openai-reasoning-effort.md. If suffix is not present, use the existing prompt function. - Update tests to verify the feature works, specifically in test_openai.py. Test with o4-mini:low, o4-mini:medium, o4-mini:high on a simple puzzle. - After you implement and test, update the README.md file to detail the new feature. - We're using 'uv' to run code and test. You won't need to install anything just testing. 
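To make the expected behavior in the notes above concrete, the sketch below mirrors what the `parse_reasoning_suffix` tests in `test_openai.py` exercise. It is illustrative only; the actual implementation in `openai.py` may be structured differently.

```python
# Sketch of the suffix parsing described in the notes above (illustrative only;
# the shipped implementation in openai.py may differ).
REASONING_MODELS = ("o4-mini", "o3-mini", "o3")
REASONING_EFFORTS = ("low", "medium", "high")

def parse_reasoning_suffix(model: str) -> tuple[str, str]:
    """Split 'o4-mini:high' into ('o4-mini', 'high'); other names pass through."""
    if ":" in model:
        base, suffix = model.rsplit(":", 1)
        if base in REASONING_MODELS and suffix.lower() in REASONING_EFFORTS:
            return base, suffix.lower()
    return model, ""  # no (or unsupported) suffix: use the normal prompt path
```

When the returned effort is non-empty, the provider routes to the reasoning-aware call with the corresponding `reasoning.effort` value; otherwise the existing `prompt` function is used unchanged.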
## Relevant Files (Context) > Read these files before implementing the feature. README.md pyproject.toml src/just_prompt/molecules/prompt.py src/just_prompt/atoms/llm_providers/anthropic.py src/just_prompt/atoms/llm_providers/openai.py src/just_prompt/tests/atoms/llm_providers/test_openai.py ## Self Validation (Close the loop) > After implementing the feature, run the tests to verify it works. > > All env variables are in place - run tests against real apis. - uv run pytest src/just_prompt/tests/atoms/llm_providers/test_openai.py - uv run pytest src/just_prompt/tests/molecules/test_prompt.py ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/molecules/test_prompt_from_file_to_file.py: -------------------------------------------------------------------------------- ```python """ Tests for prompt_from_file_to_file functionality. """ import pytest import os import tempfile import shutil from dotenv import load_dotenv from just_prompt.molecules.prompt_from_file_to_file import prompt_from_file_to_file # Load environment variables load_dotenv() def test_directory_creation_and_file_writing(): """Test that the output directory is created and files are written with real API responses.""" # Create temporary input file with a simple question with tempfile.NamedTemporaryFile(mode='w+', delete=False) as temp_file: temp_file.write("What is the capital of France?") input_path = temp_file.name # Create a deep non-existent directory path temp_dir = os.path.join(tempfile.gettempdir(), "just_prompt_test_dir", "output") try: # Make real API call file_paths = prompt_from_file_to_file( input_path, ["o:gpt-4o-mini"], temp_dir ) # Assertions assert isinstance(file_paths, list) assert len(file_paths) == 1 # Check that the file exists assert os.path.exists(file_paths[0]) # Check that the file has a .md extension assert file_paths[0].endswith('.md') # Check file content contains the expected response with open(file_paths[0], 'r') as f: content = f.read() assert "paris" in content.lower() or "Paris" in content finally: # Clean up os.unlink(input_path) # Remove the created directory and all its contents if os.path.exists(os.path.dirname(temp_dir)): shutil.rmtree(os.path.dirname(temp_dir)) ``` -------------------------------------------------------------------------------- /src/just_prompt/__main__.py: -------------------------------------------------------------------------------- ```python """ Main entry point for just-prompt. """ import argparse import asyncio import logging import sys from dotenv import load_dotenv from .server import serve from .atoms.shared.utils import DEFAULT_MODEL from .atoms.shared.validator import print_provider_availability # Load environment variables load_dotenv() # Configure logging logging.basicConfig( level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s', datefmt='%Y-%m-%d %H:%M:%S' ) logger = logging.getLogger(__name__) def main(): """ Main entry point for just-prompt. 
""" parser = argparse.ArgumentParser(description="just-prompt - A lightweight MCP server for various LLM providers") parser.add_argument( "--default-models", default=DEFAULT_MODEL, help="Comma-separated list of default models to use for prompts and model name correction, in format provider:model" ) parser.add_argument( "--log-level", choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"], default="INFO", help="Logging level" ) parser.add_argument( "--show-providers", action="store_true", help="Show available providers and exit" ) args = parser.parse_args() # Set logging level logging.getLogger().setLevel(getattr(logging, args.log_level)) # Show provider availability and optionally exit if args.show_providers: print_provider_availability() sys.exit(0) try: # Start server (asyncio) asyncio.run(serve(args.default_models)) except Exception as e: logger.error(f"Error starting server: {e}") sys.exit(1) if __name__ == "__main__": main() ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/atoms/shared/test_utils.py: -------------------------------------------------------------------------------- ```python """ Tests for utility functions. """ import pytest from just_prompt.atoms.shared.utils import split_provider_and_model, get_provider_from_prefix def test_split_provider_and_model(): """Test splitting provider and model from string.""" # Test basic splitting provider, model = split_provider_and_model("openai:gpt-4") assert provider == "openai" assert model == "gpt-4" # Test short provider name provider, model = split_provider_and_model("o:gpt-4") assert provider == "o" assert model == "gpt-4" # Test model with colons provider, model = split_provider_and_model("ollama:llama3:latest") assert provider == "ollama" assert model == "llama3:latest" # Test invalid format with pytest.raises(ValueError): split_provider_and_model("invalid-model-string") def test_get_provider_from_prefix(): """Test getting provider from prefix.""" # Test full names assert get_provider_from_prefix("openai") == "openai" assert get_provider_from_prefix("anthropic") == "anthropic" assert get_provider_from_prefix("gemini") == "gemini" assert get_provider_from_prefix("groq") == "groq" assert get_provider_from_prefix("deepseek") == "deepseek" assert get_provider_from_prefix("ollama") == "ollama" # Test short names assert get_provider_from_prefix("o") == "openai" assert get_provider_from_prefix("a") == "anthropic" assert get_provider_from_prefix("g") == "gemini" assert get_provider_from_prefix("q") == "groq" assert get_provider_from_prefix("d") == "deepseek" assert get_provider_from_prefix("l") == "ollama" # Test invalid prefix with pytest.raises(ValueError): get_provider_from_prefix("unknown") ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/molecules/test_list_providers.py: -------------------------------------------------------------------------------- ```python """ Tests for list_providers functionality. 
""" import pytest from just_prompt.molecules.list_providers import list_providers def test_list_providers(): """Test listing providers.""" providers = list_providers() # Check basic structure assert isinstance(providers, list) assert len(providers) > 0 assert all(isinstance(p, dict) for p in providers) # Check expected providers are present provider_names = [p["name"] for p in providers] assert "OPENAI" in provider_names assert "ANTHROPIC" in provider_names assert "GEMINI" in provider_names assert "GROQ" in provider_names assert "DEEPSEEK" in provider_names assert "OLLAMA" in provider_names # Check each provider has required fields for provider in providers: assert "name" in provider assert "full_name" in provider assert "short_name" in provider # Check full_name and short_name values if provider["name"] == "OPENAI": assert provider["full_name"] == "openai" assert provider["short_name"] == "o" elif provider["name"] == "ANTHROPIC": assert provider["full_name"] == "anthropic" assert provider["short_name"] == "a" elif provider["name"] == "GEMINI": assert provider["full_name"] == "gemini" assert provider["short_name"] == "g" elif provider["name"] == "GROQ": assert provider["full_name"] == "groq" assert provider["short_name"] == "q" elif provider["name"] == "DEEPSEEK": assert provider["full_name"] == "deepseek" assert provider["short_name"] == "d" elif provider["name"] == "OLLAMA": assert provider["full_name"] == "ollama" assert provider["short_name"] == "l" ``` -------------------------------------------------------------------------------- /list_models.py: -------------------------------------------------------------------------------- ```python def list_openai_models(): from openai import OpenAI client = OpenAI() print(client.models.list()) def list_groq_models(): import os from groq import Groq client = Groq( api_key=os.environ.get("GROQ_API_KEY"), ) chat_completion = client.models.list() print(chat_completion) def list_anthropic_models(): import anthropic import os from dotenv import load_dotenv load_dotenv() client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY")) models = client.models.list() print("Available Anthropic models:") for model in models.data: print(f"- {model.id}") def list_gemini_models(): import os from google import genai from dotenv import load_dotenv load_dotenv() client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY")) print("List of models that support generateContent:\n") for m in client.models.list(): for action in m.supported_actions: if action == "generateContent": print(m.name) print("List of models that support embedContent:\n") for m in client.models.list(): for action in m.supported_actions: if action == "embedContent": print(m.name) def list_deepseek_models(): from openai import OpenAI # for backward compatibility, you can still use `https://api.deepseek.com/v1` as `base_url`. client = OpenAI( api_key="sk-ds-3f422175ff114212a42d7107c3efd1e4", # fake base_url="https://api.deepseek.com", ) print(client.models.list()) def list_ollama_models(): import ollama print(ollama.list()) # Uncomment to run the functions # list_openai_models() # list_groq_models() # list_anthropic_models() # list_gemini_models() # list_deepseek_models() # list_ollama_models() ``` -------------------------------------------------------------------------------- /src/just_prompt/atoms/llm_providers/groq.py: -------------------------------------------------------------------------------- ```python """ Groq provider implementation. 
""" import os from typing import List import logging from groq import Groq from dotenv import load_dotenv # Load environment variables load_dotenv() # Configure logging logger = logging.getLogger(__name__) # Initialize Groq client client = Groq(api_key=os.environ.get("GROQ_API_KEY")) def prompt(text: str, model: str) -> str: """ Send a prompt to Groq and get a response. Args: text: The prompt text model: The model name Returns: Response string from the model """ try: logger.info(f"Sending prompt to Groq model: {model}") # Create chat completion chat_completion = client.chat.completions.create( messages=[{"role": "user", "content": text}], model=model, ) # Extract response content return chat_completion.choices[0].message.content except Exception as e: logger.error(f"Error sending prompt to Groq: {e}") raise ValueError(f"Failed to get response from Groq: {str(e)}") def list_models() -> List[str]: """ List available Groq models. Returns: List of model names """ try: logger.info("Listing Groq models") response = client.models.list() # Extract model IDs models = [model.id for model in response.data] return models except Exception as e: logger.error(f"Error listing Groq models: {e}") # Return some known models if API fails logger.info("Returning hardcoded list of known Groq models") return [ "llama-3.3-70b-versatile", "llama-3.1-70b-versatile", "llama-3.1-8b-versatile", "mixtral-8x7b-32768", "gemma-7b-it", "qwen-2.5-32b" ] ``` -------------------------------------------------------------------------------- /src/just_prompt/atoms/llm_providers/deepseek.py: -------------------------------------------------------------------------------- ```python """ DeepSeek provider implementation. """ import os from typing import List import logging from openai import OpenAI from dotenv import load_dotenv # Load environment variables load_dotenv() # Configure logging logger = logging.getLogger(__name__) # Initialize DeepSeek client with OpenAI-compatible interface client = OpenAI( api_key=os.environ.get("DEEPSEEK_API_KEY"), base_url="https://api.deepseek.com" ) def prompt(text: str, model: str) -> str: """ Send a prompt to DeepSeek and get a response. Args: text: The prompt text model: The model name Returns: Response string from the model """ try: logger.info(f"Sending prompt to DeepSeek model: {model}") # Create chat completion response = client.chat.completions.create( model=model, messages=[{"role": "user", "content": text}], stream=False, ) # Extract response content return response.choices[0].message.content except Exception as e: logger.error(f"Error sending prompt to DeepSeek: {e}") raise ValueError(f"Failed to get response from DeepSeek: {str(e)}") def list_models() -> List[str]: """ List available DeepSeek models. Returns: List of model names """ try: logger.info("Listing DeepSeek models") response = client.models.list() # Extract model IDs models = [model.id for model in response.data] return models except Exception as e: logger.error(f"Error listing DeepSeek models: {e}") # Return some known models if API fails logger.info("Returning hardcoded list of known DeepSeek models") return [ "deepseek-coder", "deepseek-chat", "deepseek-reasoner", "deepseek-coder-v2", "deepseek-reasoner-lite" ] ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/atoms/llm_providers/test_openai.py: -------------------------------------------------------------------------------- ```python """ Tests for OpenAI provider. 
""" import pytest import os from dotenv import load_dotenv from just_prompt.atoms.llm_providers import openai # Load environment variables load_dotenv() # Skip tests if API key not available if not os.environ.get("OPENAI_API_KEY"): pytest.skip("OpenAI API key not available", allow_module_level=True) def test_list_models(): """Test listing OpenAI models.""" models = openai.list_models() # Assertions assert isinstance(models, list) assert len(models) > 0 assert all(isinstance(model, str) for model in models) # Check for at least one expected model gpt_models = [model for model in models if "gpt" in model.lower()] assert len(gpt_models) > 0, "No GPT models found" def test_prompt(): """Test sending prompt to OpenAI with a regular model.""" response = openai.prompt("What is the capital of France?", "gpt-4o-mini") # Assertions assert isinstance(response, str) assert len(response) > 0 assert "paris" in response.lower() or "Paris" in response def test_parse_reasoning_suffix(): """Test parsing reasoning effort suffix from model names.""" # No suffix assert openai.parse_reasoning_suffix("o4-mini") == ("o4-mini", "") assert openai.parse_reasoning_suffix("o3") == ("o3", "") # Supported suffixes assert openai.parse_reasoning_suffix("o4-mini:low") == ("o4-mini", "low") assert openai.parse_reasoning_suffix("o4-mini:medium") == ("o4-mini", "medium") assert openai.parse_reasoning_suffix("o4-mini:high") == ("o4-mini", "high") assert openai.parse_reasoning_suffix("o3-mini:LOW") == ("o3-mini", "low") # case insensitive # Unsupported model – suffix ignored assert openai.parse_reasoning_suffix("gpt-4o-mini:low") == ("gpt-4o-mini:low", "") @pytest.mark.parametrize("model_suffix", ["o4-mini:low", "o4-mini:medium", "o4-mini:high"]) def test_prompt_with_reasoning(model_suffix): """Test sending prompt with reasoning effort enabled.""" response = openai.prompt("What is the capital of Spain?", model_suffix) # Assertions assert isinstance(response, str) assert len(response) > 0 assert "madrid" in response.lower() or "Madrid" in response ``` -------------------------------------------------------------------------------- /ultra_diff_review/diff_gemini_gemini-2.0-flash-thinking-exp.md: -------------------------------------------------------------------------------- ```markdown ## Code Review The diff introduces modularity and improves the structure of the script by encapsulating the model listing logic for each provider into separate functions. However, there are a few issues and areas for improvement. **Issues, Bugs, and Improvements:** 1. **🚨 Hardcoded API Key (DeepSeek):** The `list_deepseek_models` function includes a hardcoded API key for DeepSeek. This is a major security vulnerability as API keys should be kept secret and managed securely, preferably through environment variables. 2. **⚠️ Lack of Error Handling:** The script lacks error handling. If API calls fail due to network issues, invalid API keys, or other reasons, the script will likely crash or produce uninformative error messages. Robust error handling is crucial for production-ready code. 3. **ℹ️ Inconsistent API Key Loading (Minor):** While `dotenv` is used for Anthropic and Gemini API keys, OpenAI, Groq, and DeepSeek (partially) rely directly on environment variables. While functional, consistent use of `dotenv` for all API keys would enhance maintainability and project consistency. 4. **ℹ️ Missing Function Docstrings (Minor):** The functions lack docstrings explaining their purpose, parameters (if any), and return values. 
Docstrings enhance code readability and make it easier to understand the function's role. 5. **ℹ️ No Centralized Configuration (Minor):** While using environment variables is good, having a more centralized configuration mechanism (even if it's just a `.env` file loaded by `dotenv`) could be beneficial for managing various settings in the future. **Markdown Table of Issues:** | Issue | Solution | Risk Assessment | |----------------------------|-------------------------------------------------------------|-----------------| | 🚨 **Hardcoded API Key (DeepSeek)** | Use environment variables to store and access the DeepSeek API key. | High | | ⚠️ **Lack of Error Handling** | Implement `try-except` blocks to handle potential API errors. | Medium | | ℹ️ **Inconsistent API Key Loading** | Use `dotenv` consistently for all API keys. | Low | | ℹ️ **Missing Function Docstrings** | Add docstrings to each function explaining its purpose. | Low | | ℹ️ **No Centralized Config** | Consider a more centralized configuration approach if needed. | Low | ``` -------------------------------------------------------------------------------- /src/just_prompt/molecules/prompt_from_file_to_file.py: -------------------------------------------------------------------------------- ```python """ Prompt from file to file functionality for just-prompt. """ from typing import List import logging import os from pathlib import Path from .prompt_from_file import prompt_from_file from ..atoms.shared.utils import DEFAULT_MODEL logger = logging.getLogger(__name__) def prompt_from_file_to_file( abs_file_path: str, models_prefixed_by_provider: List[str] = None, abs_output_dir: str = "." ) -> List[str]: """ Read text from a file, send it as prompt to multiple models, and save responses to files. 
Args: abs_file_path: Absolute path to the text file (must be an absolute path, not relative) models_prefixed_by_provider: List of model strings in format "provider:model" If None, uses the DEFAULT_MODELS environment variable abs_output_dir: Absolute directory path to save response files (must be an absolute path, not relative) Returns: List of paths to the output files """ # Validate output directory output_path = Path(abs_output_dir) if not output_path.exists(): output_path.mkdir(parents=True, exist_ok=True) if not output_path.is_dir(): raise ValueError(f"Not a directory: {abs_output_dir}") # Get the base name of the input file input_file_name = Path(abs_file_path).stem # Get responses responses = prompt_from_file(abs_file_path, models_prefixed_by_provider) # Save responses to files output_files = [] # Get the models that were actually used models_used = models_prefixed_by_provider if not models_used: default_models = os.environ.get("DEFAULT_MODELS", DEFAULT_MODEL) models_used = [model.strip() for model in default_models.split(",")] for i, (model_string, response) in enumerate(zip(models_used, responses)): # Sanitize model string for filename (replace colons with underscores) safe_model_name = model_string.replace(":", "_") # Create output filename with .md extension output_file = output_path / f"{input_file_name}_{safe_model_name}.md" # Write response to file as markdown try: with open(output_file, "w", encoding="utf-8") as f: f.write(response) output_files.append(str(output_file)) except Exception as e: logger.error(f"Error writing response to {output_file}: {e}") output_files.append(f"Error: {str(e)}") return output_files ``` -------------------------------------------------------------------------------- /ultra_diff_review/diff_anthropic_claude-3-7-sonnet-20250219_4k.md: -------------------------------------------------------------------------------- ```markdown # Code Review I've analyzed the changes made to the `list_models.py` file. The diff shows a complete refactoring of the file that organizes model listing functionality into separate functions for different AI providers. ## Key Changes 1. **Code Organization:** The code has been restructured from a series of commented blocks into organized functions for each AI provider. 2. **Function Implementation:** Each provider now has a dedicated function for listing their available models. 3. **DeepSeek API Key:** A hardcoded API key is now present in the DeepSeek function. 4. **Function Execution:** All functions are defined but commented out at the bottom of the file. ## Issues and Improvements ### 1. Hardcoded API Key The `list_deepseek_models()` function contains a hardcoded API key: `"sk-ds-3f422175ff114212a42d7107c3efd1e4"`. This is a significant security risk as API keys should never be stored in source code. ### 2. Inconsistent Environment Variable Usage Most functions use environment variables for API keys, but the DeepSeek function does not follow this pattern. ### 3. Error Handling None of the functions include error handling for API failures, network issues, or missing API keys. ### 4. Import Organization Import statements are scattered throughout the functions instead of being consolidated at the top of the file. ### 5. No Main Function There's no main function or entrypoint that would allow users to select which model list they want to see. 
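A minimal sketch of such an entrypoint is shown below. It assumes the `list_*` functions defined in this diff are in scope within `list_models.py`; the `argparse` wiring and the unknown-provider message are illustrative additions, not part of the reviewed change.

```python
import argparse
import sys


def main():
    # Map CLI names to the provider listing functions defined in list_models.py
    providers = {
        "openai": list_openai_models,
        "groq": list_groq_models,
        "anthropic": list_anthropic_models,
        "gemini": list_gemini_models,
        "deepseek": list_deepseek_models,
        "ollama": list_ollama_models,
    }

    parser = argparse.ArgumentParser(description="List available models per provider")
    parser.add_argument("providers", nargs="*", help="Providers to query (default: all)")
    args = parser.parse_args()

    for name in args.providers or list(providers):
        func = providers.get(name)
        if func is None:
            print(f"Unknown provider: {name}", file=sys.stderr)
            continue
        func()


if __name__ == "__main__":
    main()
```
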
## Issue Summary | Issue | Solution | Risk Assessment | |-------|----------|-----------------| | 🚨 Hardcoded API key in DeepSeek function | Replace with environment variable: `api_key=os.environ.get("DEEPSEEK_API_KEY")` | High - Security risk, potential unauthorized API usage and charges | | ⚠️ No error handling | Add try/except blocks to handle API errors, network issues, and missing credentials | Medium - Code will fail without clear error messages | | 🔧 Inconsistent environment variable usage | Standardize API key access across all providers | Low - Maintenance and consistency issue | | 🔧 Scattered imports | Consolidate common imports at the top of the file | Low - Code organization issue | | 💡 No main function or CLI | Add a main function with argument parsing to run specific provider functions | Low - Usability enhancement | | 💡 Missing API key validation | Add checks to validate API keys are present before making API calls | Medium - Prevents unclear errors when keys are missing | The most critical issue is the hardcoded API key which should be addressed immediately to prevent security risks. ``` -------------------------------------------------------------------------------- /src/just_prompt/atoms/shared/utils.py: -------------------------------------------------------------------------------- ```python """ Utility functions for just-prompt. """ from typing import Tuple, List, Optional import os from dotenv import load_dotenv import logging # Set up logging logging.basicConfig( level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s', datefmt='%Y-%m-%d %H:%M:%S' ) # Load environment variables load_dotenv() # Default model constants DEFAULT_MODEL = "anthropic:claude-3-7-sonnet-20250219" def split_provider_and_model(model_string: str) -> Tuple[str, str]: """ Split a model string into provider and model name. Note: This only splits the first colon in the model string and leaves the rest of the string as the model name. Models will have additional colons in the string and we want to ignore them and leave them for the model name. Args: model_string: String in format "provider:model" Returns: Tuple containing (provider, model) """ parts = model_string.split(":", 1) if len(parts) != 2: raise ValueError(f"Invalid model string format: {model_string}. Expected format: 'provider:model'") provider, model = parts return provider, model def get_provider_from_prefix(prefix: str) -> str: """ Get the full provider name from a prefix. Args: prefix: Provider prefix (short or full name) Returns: Full provider name """ from .data_types import ModelProviders provider = ModelProviders.from_name(prefix) if provider is None: raise ValueError(f"Unknown provider prefix: {prefix}") return provider.full_name def get_models_prefixed_by_provider(provider_prefix: str, model_name: str) -> str: """ Format a model string with provider prefix. Args: provider_prefix: The provider prefix (short or full name) model_name: The model name Returns: Formatted string in "provider:model" format """ provider = get_provider_from_prefix(provider_prefix) return f"{provider}:{model_name}" def get_api_key(provider: str) -> Optional[str]: """ Get the API key for a provider from environment variables. 
Args: provider: Provider name (full name) Returns: API key as string or ``None`` if the provider is unsupported or no environment variable is set """ key_mapping = { "openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY", "gemini": "GEMINI_API_KEY", "groq": "GROQ_API_KEY", "deepseek": "DEEPSEEK_API_KEY" } env_var = key_mapping.get(provider) if not env_var: return None return os.environ.get(env_var) ``` -------------------------------------------------------------------------------- /example_outputs/decision_openai_vs_anthropic_vs_google/ceo_medium_decision_openai_vs_anthropic_vs_google_openai_o4-mini_high.md: -------------------------------------------------------------------------------- ```markdown Here’s a focused comparison of OpenAI, Anthropic and Google along your three axes—Model Performance, Model Tool Use and Model Cost—plus a bottom‑line “which to pick” based on different priorities. 1. Model Performance (Raw Intelligence) • OpenAI (o3 / GPT‑4o): leads most public benchmarks, best vision‑reasoning, continuous frontier releases. • Google (Gemini 2.5 Pro): at parity on reasoning & code benchmarks, unrivaled context windows (1M→2M tokens soon). • Anthropic (Claude 3.5 Sonnet): very strong in free‑form reasoning, matches or beats GPT‑4‑Turbo in text tasks but lags on vision/speech. 2. Model Tool Use (Ability to orchestrate APIs, plug‑ins, agents) • OpenAI: richest ecosystem—Assistants API with built‑in tool discovery, function‑calls, vision+generation APIs out of the box. • Anthropic: clean, safety‑centric JSON tool schema; coming tooling ecosystem but fewer first‑party connectors (no vision yet). • Google: Vertex AI + AI Studio pipelines, good SDKs and open‑weight Gemma for on‑prem, but less mature “agent” layer than OpenAI. 3. Model Cost (Price / Performance at scale) • Anthropic (Sonnet tier): cheapest per token for GPT‑4‑level quality today. • Google (Vertex discounts & Gemma open models): aggressive pricing and on‑device options with Gemma 3. • OpenAI: steadily falling prices, but top‑end O‑series still carries a premium vs Sonnet/Gemini mid‑tiers. Summary “Bet” Recommendations • If you care most about **bleeding‑edge capabilities + seamless, production‑ready tool/agent support**, lean into **OpenAI**. You get top scores, the largest third‑party connector ecosystem and Microsoft’s enterprise muscle—at a premium price. • If **unit economics** (cost‑performance) is your #1 driver and you value a safety‑first alignment ethos, **Anthropic** is the sweet spot. You give up some multimodal/speech features but gain the lowest cost for GPT‑4‑class chat and clean tool integration. • If you prize **distribution scale, open‑weight fallbacks and full control over compute**, **Google** stands out. You’ll trade a slightly slower release cadence and less “agent magic” for unrivaled throughput (TPUs + 1M+ token contexts), built‑in Workspace/Android reach and on‑prem options. All three are competitive on raw intelligence. Your choice really comes down to your biggest lever: – Performance & tooling ⇒ OpenAI – Cost‑performance & alignment ⇒ Anthropic – Distribution & compute sovereignty ⇒ Google Whichever you pick, pilot a real workload (with rate limits, enterprise features, support SLAs) before you commit multi‑year spend. This space is evolving so rapidly that today’s “win” can shift next quarter. 
``` -------------------------------------------------------------------------------- /specs/gemini-2-5-flash-reasoning.md: -------------------------------------------------------------------------------- ```markdown # Gemini 2.5 Flash Reasoning > Implement reasoning for Gemini 2.5 Flash. > > Implement every detail below end to end and validate your work with tests. ## Implementation Notes - We're adding support for `gemini-2.5-flash-preview-04-17` with thinking_budget for gemini. - Just like how claude-3-7-sonnet has budget tokens in src/just_prompt/atoms/llm_providers/anthropic.py, Gemini has a similar feature with the thinking_budget. We want to support this. - If this parameter is present, we should trigger a prompt_with_thinking function in src/just_prompt/atoms/llm_providers/gemini.py. Use the example code in ai_docs/gemini-2-5-flash-reasoning.md. If parameter is not present, use the existing prompt function. - Update tests to verify the feature works, specifically in test_gemini.py. Test with gemini-2.5-flash-preview-04-17 with and without the thinking_budget parameter. - This only works with the gemini-2.5-flash-preview-04-17 model but assume more models like this will be added in the future and check against the model name from a list so we can easily add them later. - After you implement and test, update the README.md file to detail the new feature. - We're using 'uv run pytest <file>' to run tests. You won't need to run any other commands or install anything only testing. - Keep all the essential logic surrounding this change in gemini.py just like how anthropic.py sets this up for it's version (thinking_budget). - No need to update any libraries or packages. - So if we pass in something like: `gemini:gemini-2.5-flash-preview-04-17`, run the normal prompt function. If we pass in: `gemini:gemini-2.5-flash-preview-04-17:4k`, run the prompt_with_thinking function with 4000 thinking budget. Mirror anthropic.py's logic. - Update gemini.py to use the new import and client setup via `from google import genai` and `client = genai.Client(api_key="GEMINI_API_KEY")`. ## Relevant Files (Context) > Read these files before implementing the feature. README.md pyproject.toml src/just_prompt/molecules/prompt.py src/just_prompt/atoms/llm_providers/anthropic.py src/just_prompt/atoms/llm_providers/gemini.py src/just_prompt/tests/atoms/llm_providers/test_gemini.py ## Example Reasoning Code ```python from google import genai client = genai.Client(api_key="GEMINI_API_KEY") response = client.models.generate_content( model="gemini-2.5-flash-preview-04-17", contents="You roll two dice. What’s the probability they add up to 7?", config=genai.types.GenerateContentConfig( thinking_config=genai.types.ThinkingConfig( thinking_budget=1024 # 0 - 24576 ) ) ) print(response.text) ``` ## Self Validation (Close the loop) > After implementing the feature, run the tests to verify it works. > > All env variables are in place - run tests against real apis. 
- uv run pytest src/just_prompt/tests/atoms/llm_providers/test_gemini.py - uv run pytest src/just_prompt/tests/molecules/test_prompt.py ``` -------------------------------------------------------------------------------- /example_outputs/countdown_component/countdown_component_o_gpt-4.5-preview.md: -------------------------------------------------------------------------------- ```markdown Here's how you would implement a `<Countdown :start-time="number" />` component in Vanilla JavaScript/EcmaScript (no framework) and popular frameworks/libraries Vue, Svelte, and React. --- ## 1\. Vanilla JavaScript (Web component) ### HTML ```html <count-down start-time="60"></count-down> ``` ### JavaScript (countdown.js) ```js class CountDown extends HTMLElement { constructor() { super(); this.attachShadow({ mode: 'open' }); this.count = parseInt(this.getAttribute('start-time'), 10) || 0; this.shadowRoot.innerHTML = `<div id="counter">${this.count}</div>`; } connectedCallback() { this.interval = setInterval(() => { if (this.count > 0) { this.count--; this.shadowRoot.querySelector('#counter').textContent = this.count; } else { clearInterval(this.interval); } }, 1000); } disconnectedCallback() { clearInterval(this.interval); } } customElements.define('count-down', CountDown); ``` ### Usage ```html <script src="countdown.js"></script> <count-down start-time="60"></count-down> ``` --- ## 2\. Vue 3 (Composition API): ### Countdown.vue ```vue <script setup> import { ref, onMounted, onUnmounted, defineProps } from 'vue'; const props = defineProps({ startTime: { type: Number, default: 60 } }); const count = ref(props.startTime); let timer; onMounted(() => { timer = setInterval(() => { if (count.value > 0) { count.value--; } else { clearInterval(timer); } }, 1000); }); onUnmounted(() => { clearInterval(timer); }); </script> <template> <div>{{ count }}</div> </template> ``` ### Usage: ```vue <Countdown :start-time="60" /> ``` --- ## 3\. Svelte ### Countdown.svelte ```svelte <script> export let startTime = 60; let count = startTime; let interval = null; interval = setInterval(() => { if (count > 0) { count -= 1; } else { clearInterval(interval); } }, 1000); // cleanup import { onDestroy } from 'svelte'; onDestroy(() => clearInterval(interval)); </script> <div>{count}</div> ``` ### Usage: ```svelte <Countdown startTime={60} /> ``` --- ## 4\. 
React ### Countdown.jsx ```jsx import React, { useState, useEffect } from 'react'; export default function Countdown({ startTime = 60 }) { const [count, setCount] = useState(startTime); useEffect(() => { if (count <= 0) return; const timer = setInterval(() => setCount(prev => prev - 1), 1000); return () => clearInterval(timer); }, [count]); return <div>{count}</div>; } ``` ### Usage: ```jsx <Countdown startTime={60} /> ``` --- ### Summary: | Implementation | Syntax & Concepts | |----------------|---------------------------------------------| | Vanilla | Web Components, Shadow DOM, setInterval() | | Vue | Composition API, reactivity, lifecycle hooks| | Svelte | Direct reactivity, lifecycle hooks | | React | JSX, hooks, useEffect, useState | ``` -------------------------------------------------------------------------------- /example_outputs/decision_openai_vs_anthropic_vs_google/ceo_medium_decision_openai_vs_anthropic_vs_google_anthropic_claude-3-7-sonnet-20250219_4k.md: -------------------------------------------------------------------------------- ```markdown # Comparing OpenAI, Anthropic, and Google for Your Major Investment Based on your three key factors, I'll help you evaluate which AI company might best deserve your significant investment of time, money, and resources. ## Factor 1: Model Performance (Raw Intelligence) **OpenAI**: Currently leads with o3 and GPT-4o models, demonstrating superior reasoning capabilities and consistent top benchmark scores. Their frontier research pipeline maintains a performance edge. **Anthropic**: Claude 3.5 Sonnet is competitive, with Claude 3 Opus showing excellent long-form reasoning and 200k context window support. While impressive, they typically follow OpenAI's innovations. **Google**: Gemini 2.5 Pro leads specifically in math and code benchmarks with native 1M token context (soon 2M). Their models excel in certain specialized domains but lack OpenAI's consistent lead across all dimensions. **Edge**: OpenAI, though the gap has narrowed considerably in 2025. ## Factor 2: Model Tool Use **OpenAI**: Most mature with native tool-use API, Assistants & Tools API for agent orchestration, and a universal function-calling schema. Their ecosystem encourages extensive third-party integration. **Anthropic**: Has an elegant JSON-based tool-use schema but lacks first-party vision capabilities. Their approach is solid but more limited in scope. **Google**: Good tool integration within their ecosystem, particularly with Workspace AI and Workspace Flows. They offer a blend of closed models and open weights (Gemma), providing flexibility. **Edge**: OpenAI has the richest tool ecosystem and integration capabilities. ## Factor 3: Model Cost **OpenAI**: Historically premium-priced, though they've implemented consistent quarterly price drops. GPT-4o is roughly 8× cheaper than previous generation models. **Anthropic**: Offers the most cost-effective options at the Sonnet tier, undercutting GPT-4-Turbo in many cases. Their mid-tier models (Claude 3 Haiku) are explicitly designed for cost efficiency. **Google**: Provides aggressive Vertex AI discounts and offers Gemma as open-weight models for on-premises deployment, potentially reducing costs for certain use cases. **Edge**: Anthropic for pure API cost efficiency, with Google offering compelling total cost of ownership advantages. ## Overall Assessment If you're prioritizing raw intelligence and cutting-edge capabilities, **OpenAI** would be your bet, though you'll pay a premium. 
If cost-efficiency with very competitive performance is your priority, **Anthropic** offers the best value, especially with Claude 3.5 Sonnet. If you need deep integration with existing productivity tools, desire open-weight options, or want the backing of the most financially stable company, **Google** is your best choice. ## Recommendation Based on your three factors weighted equally, I would recommend **Anthropic** as your primary bet, as they offer the best balance of performance and cost, while maintaining competitive tool use capabilities. Their Constitutional AI approach also suggests a more stable and responsible trajectory. However, if raw performance and extensive tool integration outweigh cost considerations for your specific needs, OpenAI would be the better choice despite the higher costs. ``` -------------------------------------------------------------------------------- /ultra_diff_review/diff_openai_o3-mini.md: -------------------------------------------------------------------------------- ```markdown Below is a review of the changes with observations on potential issues, bugs, and improvements: 1. Hardcoded API key in list_deepseek_models • Issue: The Deepseek function uses a hardcoded API key ("sk-ds-3f422175ff114212a42d7107c3efd1e4"). This exposes sensitive credentials in the source code. • Recommendation: Retrieve the key from an environment variable (or a secure vault) as is done for other models. • Severity: 🚨 Critical 2. Repeated load_dotenv calls • Issue: Both list_anthropic_models and list_gemini_models call load_dotenv() even if they might be used in the same run. • Recommendation: Consider loading environment variables once in a main entry point or in a shared initialization function. • Severity: ⚠️ Moderate 3. Redundant API calls in list_gemini_models • Issue: The Gemini function calls client.models.list() twice (once for generateContent and again for embedContent). This might be inefficient if each call performs network I/O. • Recommendation: Cache the result of client.models.list() into a variable and reuse it for both loops. • Severity: ⚠️ Low 4. Inconsistent variable naming and potential confusion • Observation: In list_groq_models, the result of client.models.list() is stored in a variable named chat_completion even though the function is about listing models. • Recommendation: Use a name such as models or model_list for clarity. • Severity: ℹ️ Low 5. Lack of error handling for API calls • Observation: All functions simply print the results of API calls without handling potential exceptions (e.g., network errors, invalid credentials). • Recommendation: Wrap API calls in try-except blocks and add meaningful error messages. • Severity: ⚠️ Moderate 6. Consistency in output formatting • Observation: While some functions print header messages (like list_anthropic_models and list_gemini_models), others (like list_openai_models or list_deepseek_models) simply print the raw result. • Recommendation: Add consistent formatting or output messages for clarity. 
• Severity: ℹ️ Low Below is a concise summary in a markdown table: | Issue | Solution | Risk Assessment | |--------------------------------------|------------------------------------------------------------------------------------------|--------------------------| | Hardcoded API key in Deepseek | Use an environment variable (e.g., os.environ.get("DEEPSEEK_API_KEY")) | 🚨 Critical | | Multiple load_dotenv() calls | Load environment variables once at program start instead of in each function | ⚠️ Moderate | | Redundant API call in Gemini models | Cache client.models.list() in a variable and reuse it for looping through supported actions | ⚠️ Low | | Inconsistent variable naming (Groq) | Rename variables (e.g., change "chat_completion" to "models" in list_groq_models) | ℹ️ Low (cosmetic) | | Lack of error handling | Wrap API calls in try-except blocks and log errors or provide user-friendly error messages | ⚠️ Moderate | This review should help in making the code more secure, efficient, and maintainable. ``` -------------------------------------------------------------------------------- /src/just_prompt/molecules/prompt.py: -------------------------------------------------------------------------------- ```python """ Prompt functionality for just-prompt. """ from typing import List import logging import concurrent.futures import os from ..atoms.shared.validator import validate_models_prefixed_by_provider from ..atoms.shared.utils import split_provider_and_model, DEFAULT_MODEL from ..atoms.shared.model_router import ModelRouter logger = logging.getLogger(__name__) def _process_model_prompt(model_string: str, text: str) -> str: """ Process a single model prompt. Args: model_string: String in format "provider:model" text: The prompt text Returns: Response from the model """ try: return ModelRouter.route_prompt(model_string, text) except Exception as e: logger.error(f"Error processing prompt for {model_string}: {e}") return f"Error ({model_string}): {str(e)}" def _correct_model_name(provider: str, model: str, correction_model: str) -> str: """ Correct a model name using the correction model. Args: provider: Provider name model: Model name correction_model: Model to use for correction Returns: Corrected model name """ try: return ModelRouter.magic_model_correction(provider, model, correction_model) except Exception as e: logger.error(f"Error correcting model name {provider}:{model}: {e}") return model def prompt(text: str, models_prefixed_by_provider: List[str] = None) -> List[str]: """ Send a prompt to multiple models using parallel processing. 
Args: text: The prompt text models_prefixed_by_provider: List of model strings in format "provider:model" If None, uses the DEFAULT_MODELS environment variable Returns: List of responses from the models """ # Use default models if no models provided if not models_prefixed_by_provider: default_models = os.environ.get("DEFAULT_MODELS", DEFAULT_MODEL) models_prefixed_by_provider = [model.strip() for model in default_models.split(",")] # Validate model strings validate_models_prefixed_by_provider(models_prefixed_by_provider) # Prepare corrected model strings corrected_models = [] for model_string in models_prefixed_by_provider: provider, model = split_provider_and_model(model_string) # Get correction model from environment correction_model = os.environ.get("CORRECTION_MODEL", DEFAULT_MODEL) # Check if model needs correction corrected_model = _correct_model_name(provider, model, correction_model) # Use corrected model if corrected_model != model: model_string = f"{provider}:{corrected_model}" corrected_models.append(model_string) # Process each model in parallel using ThreadPoolExecutor responses = [] with concurrent.futures.ThreadPoolExecutor() as executor: # Submit all tasks future_to_model = { executor.submit(_process_model_prompt, model_string, text): model_string for model_string in corrected_models } # Collect results in order for model_string in corrected_models: for future, future_model in future_to_model.items(): if future_model == model_string: responses.append(future.result()) break return responses ``` -------------------------------------------------------------------------------- /src/just_prompt/atoms/shared/validator.py: -------------------------------------------------------------------------------- ```python """ Validation utilities for just-prompt. """ from typing import List, Dict, Optional, Tuple import logging import os from .data_types import ModelProviders from .utils import split_provider_and_model, get_api_key logger = logging.getLogger(__name__) def validate_models_prefixed_by_provider(models_prefixed_by_provider: List[str]) -> bool: """ Validate that provider prefixes in model strings are valid. Args: models_prefixed_by_provider: List of model strings in format "provider:model" Returns: True if all valid, raises ValueError otherwise """ if not models_prefixed_by_provider: raise ValueError("No models provided") for model_string in models_prefixed_by_provider: try: provider_prefix, model_name = split_provider_and_model(model_string) provider = ModelProviders.from_name(provider_prefix) if provider is None: raise ValueError(f"Unknown provider prefix: {provider_prefix}") except Exception as e: logger.error(f"Validation error for model string '{model_string}': {str(e)}") raise return True def validate_provider(provider: str) -> bool: """ Validate that a provider name is valid. Args: provider: Provider name (full or short) Returns: True if valid, raises ValueError otherwise """ provider_enum = ModelProviders.from_name(provider) if provider_enum is None: raise ValueError(f"Unknown provider: {provider}") return True def validate_provider_api_keys() -> Dict[str, bool]: """ Validate that API keys are available for each provider. 
Returns: Dictionary mapping provider names to availability status (True if available, False otherwise) """ available_providers = {} # Check API keys for each provider for provider in ModelProviders: provider_name = provider.full_name # Special case for Ollama which uses OLLAMA_HOST instead of an API key if provider_name == "ollama": host = os.environ.get("OLLAMA_HOST") is_available = host is not None and host.strip() != "" available_providers[provider_name] = is_available else: # Get API key api_key = get_api_key(provider_name) is_available = api_key is not None and api_key.strip() != "" available_providers[provider_name] = is_available return available_providers def print_provider_availability(detailed: bool = True) -> None: """ Print information about which providers are available based on API keys. Args: detailed: Whether to print detailed information about missing keys """ availability = validate_provider_api_keys() available = [p for p, status in availability.items() if status] unavailable = [p for p, status in availability.items() if not status] # Print availability information logger.info(f"Available LLM providers: {', '.join(available)}") if detailed and unavailable: env_vars = { "openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY", "gemini": "GEMINI_API_KEY", "groq": "GROQ_API_KEY", "deepseek": "DEEPSEEK_API_KEY", "ollama": "OLLAMA_HOST" } logger.warning(f"The following providers are unavailable due to missing API keys:") for provider in unavailable: env_var = env_vars.get(provider) if env_var: logger.warning(f" - {provider}: Missing environment variable {env_var}") else: logger.warning(f" - {provider}: Missing configuration") ``` -------------------------------------------------------------------------------- /example_outputs/countdown_component/diff.md: -------------------------------------------------------------------------------- ```markdown # Code Review - Review the diff, report on issues, bugs, and improvements. - End with a concise markdown table of any issues found, their solutions, and a risk assessment for each issue if applicable. - Use emojis to convey the severity of each issue. 
## Diff diff --git a/list_models.py b/list_models.py index aebb141..0c11e9b 100644 --- a/list_models.py +++ b/list_models.py @@ -1,69 +1,81 @@ -# from openai import OpenAI +def list_openai_models(): + from openai import OpenAI -# client = OpenAI() + client = OpenAI() -# print(client.models.list()) + print(client.models.list()) -# -------------------------------- -# import os +def list_groq_models(): + import os + from groq import Groq -# from groq import Groq + client = Groq( + api_key=os.environ.get("GROQ_API_KEY"), + ) -# client = Groq( -# api_key=os.environ.get("GROQ_API_KEY"), -# ) + chat_completion = client.models.list() -# chat_completion = client.models.list() + print(chat_completion) -# print(chat_completion) -# -------------------------------- +def list_anthropic_models(): + import anthropic + import os + from dotenv import load_dotenv -import anthropic -import os -from dotenv import load_dotenv + load_dotenv() -load_dotenv() + client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY")) + models = client.models.list() + print("Available Anthropic models:") + for model in models.data: + print(f"- {model.id}") -client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY")) -models = client.models.list() -print("Available Anthropic models:") -for model in models.data: - print(f"- {model.id}") -# -------------------------------- +def list_gemini_models(): + import os + from google import genai + from dotenv import load_dotenv -# import os -# from google import genai -# from dotenv import load_dotenv + load_dotenv() -# load_dotenv() + client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY")) -# client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY")) + print("List of models that support generateContent:\n") + for m in client.models.list(): + for action in m.supported_actions: + if action == "generateContent": + print(m.name) -# print("List of models that support generateContent:\n") -# for m in client.models.list(): -# for action in m.supported_actions: -# if action == "generateContent": -# print(m.name) + print("List of models that support embedContent:\n") + for m in client.models.list(): + for action in m.supported_actions: + if action == "embedContent": + print(m.name) -# print("List of models that support embedContent:\n") -# for m in client.models.list(): -# for action in m.supported_actions: -# if action == "embedContent": -# print(m.name) -# -------------------------------- deepseek +def list_deepseek_models(): + from openai import OpenAI -# from openai import OpenAI + # for backward compatibility, you can still use `https://api.deepseek.com/v1` as `base_url`. + client = OpenAI( + api_key="sk-ds-3f422175ff114212a42d7107c3efd1e4", + base_url="https://api.deepseek.com", + ) + print(client.models.list()) -# # for backward compatibility, you can still use `https://api.deepseek.com/v1` as `base_url`. 
-# client = OpenAI(api_key="<your API key>", base_url="https://api.deepseek.com") -# print(client.models.list()) -# -------------------------------- ollama +def list_ollama_models(): + import ollama -import ollama + print(ollama.list()) -print(ollama.list()) + +# Uncomment to run the functions +# list_openai_models() +# list_groq_models() +# list_anthropic_models() +# list_gemini_models() +# list_deepseek_models() +# list_ollama_models() ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/atoms/shared/test_model_router.py: -------------------------------------------------------------------------------- ```python """ Tests for model router. """ import pytest import os from unittest.mock import patch, MagicMock import importlib from just_prompt.atoms.shared.model_router import ModelRouter from just_prompt.atoms.shared.data_types import ModelProviders @patch('importlib.import_module') def test_route_prompt(mock_import_module): """Test routing prompts to the appropriate provider.""" # Set up mock mock_module = MagicMock() mock_module.prompt.return_value = "Paris is the capital of France." mock_import_module.return_value = mock_module # Test with full provider name response = ModelRouter.route_prompt("openai:gpt-4o-mini", "What is the capital of France?") assert response == "Paris is the capital of France." mock_import_module.assert_called_with("just_prompt.atoms.llm_providers.openai") mock_module.prompt.assert_called_with("What is the capital of France?", "gpt-4o-mini") # Test with short provider name response = ModelRouter.route_prompt("o:gpt-4o-mini", "What is the capital of France?") assert response == "Paris is the capital of France." # Test invalid provider with pytest.raises(ValueError): ModelRouter.route_prompt("unknown:model", "What is the capital of France?") @patch('importlib.import_module') def test_route_list_models(mock_import_module): """Test routing list_models requests to the appropriate provider.""" # Set up mock mock_module = MagicMock() mock_module.list_models.return_value = ["model1", "model2"] mock_import_module.return_value = mock_module # Test with full provider name models = ModelRouter.route_list_models("openai") assert models == ["model1", "model2"] mock_import_module.assert_called_with("just_prompt.atoms.llm_providers.openai") mock_module.list_models.assert_called_once() # Test with short provider name models = ModelRouter.route_list_models("o") assert models == ["model1", "model2"] # Test invalid provider with pytest.raises(ValueError): ModelRouter.route_list_models("unknown") def test_validate_and_correct_model_shorthand(): """Test validation and correction of shorthand model names like a:sonnet.3.7.""" try: # Test with shorthand notation a:sonnet.3.7 # This should be corrected to claude-3-7-sonnet-20250219 # First, use the split_provider_and_model to get the provider and model from just_prompt.atoms.shared.utils import split_provider_and_model provider_prefix, model = split_provider_and_model("a:sonnet.3.7") # Get the provider enum provider = ModelProviders.from_name(provider_prefix) # Call validate_and_correct_model result = ModelRouter.magic_model_correction(provider.full_name, model, "anthropic:claude-sonnet-4-20250514") # The magic_model_correction method should correct sonnet.3.7 to a claude model assert "claude" in result, f"Expected sonnet.3.7 to be corrected to a claude model, got {result}" print(f"Shorthand model 'sonnet.3.7' was corrected to '{result}'") except Exception as e: pytest.fail(f"Test 
failed with error: {e}") def test_validate_and_correct_claude4_models(): """Test validation bypass for claude-4 models with thinking tokens.""" # Test claude-4 models bypass validation result = ModelRouter.validate_and_correct_model("anthropic", "claude-opus-4-20250514:4k") assert result == "claude-opus-4-20250514:4k", f"Expected bypass for claude-4 model, got {result}" result = ModelRouter.validate_and_correct_model("anthropic", "claude-sonnet-4-20250514:1k") assert result == "claude-sonnet-4-20250514:1k", f"Expected bypass for claude-4 model, got {result}" result = ModelRouter.validate_and_correct_model("anthropic", "claude-opus-4-20250514") assert result == "claude-opus-4-20250514", f"Expected bypass for claude-4 model, got {result}" ``` -------------------------------------------------------------------------------- /specs/prompt_from_file_to_file_w_context.md: -------------------------------------------------------------------------------- ```markdown Feature Request: Prompt from File to File with Context Files ## Implementation Notes - Create a new tool 'prompt_from_file_to_file_w_context' in src/just_prompt/molecules/prompt_from_file_to_file_w_context.py - Definition: prompt_from_file_to_file_w_context(from_file: str, context_files: List[str], models_prefixed_by_provider: List[str] = None, output_dir: str = ".") -> None: - This tool extends the existing prompt_from_file_to_file functionality by injecting context files into the prompt before sending to LLMs - The tool will read the from_file and search for the placeholder `{{context_files}}` - If `{{context_files}}` is not found in the from_file, throw an error requiring this placeholder to be present - Replace `{{context_files}}` with an XML block containing all context files: ```xml <context_files> <file name="absolute/path/to/file1.py"> ... file1 content ... </file> <file name="absolute/path/to/file2.md"> ... file2 content ... </file> ... repeat for all context_files ... </context_files> ``` - Read each file in context_files (using absolute paths) and inject their contents into the XML structure - After context injection, use the existing prompt_from_file_to_file logic to send the enhanced prompt to all specified models - Each context file should be wrapped in a `<file name="...">content</file>` tag within the `<context_files>` block - Handle file reading errors gracefully with descriptive error messages - Validate that all context_files exist and are readable before processing - The enhanced prompt (with context files injected) should be sent to all models specified in models_prefixed_by_provider - Output files follow the same naming convention as prompt_from_file_to_file: `{output_dir}/{sanitized_filename}_{provider}_{model}.md` ## Relevant Files - src/just_prompt/server.py (add new MCP tool endpoint) - src/just_prompt/molecules/prompt_from_file_to_file_w_context.py (new file) - src/just_prompt/molecules/prompt_from_file_to_file.py (reference existing logic) - src/just_prompt/atoms/shared/utils.py (for file operations and validation) - src/just_prompt/atoms/shared/validator.py (for input validation) - src/just_prompt/tests/molecules/test_prompt_from_file_to_file_w_context.py (new test file) ## Validation (Close the Loop) > Be sure to test this new capability with uv run pytest. 
- Create comprehensive tests in test_prompt_from_file_to_file_w_context.py covering: - Normal operation with valid context files - Error when {{context_files}} placeholder is missing - Error when context files don't exist or aren't readable - Proper XML formatting of context files - Integration with existing prompt_from_file_to_file workflow - `uv run pytest src/just_prompt/tests/molecules/test_prompt_from_file_to_file_w_context.py` - `uv run just-prompt --help` to validate the tool works as expected - Test end-to-end functionality by creating a sample prompt file with {{context_files}} placeholder and sample context files - After implementation, update README.md with the new tool's functionality and parameters - Run `git ls-files` to update the directory tree in the README with the new files ## Error Handling Requirements - Validate that from_file exists and is readable - Validate that all files in context_files list exist and are readable - Require {{context_files}} placeholder to be present in from_file content - Provide clear error messages for missing files, permission issues, or missing placeholder - Handle large context files gracefully (consider file size limits if needed) ## Example Usage ```python # Prompt file content (example.txt): """ Please analyze the following codebase files: {{context_files}} Based on the code above, suggest improvements for better performance. """ # Tool call: prompt_from_file_to_file_w_context( from_file="prompts/example.txt", context_files=[ "/absolute/path/to/src/main.py", "/absolute/path/to/src/utils.py", "/absolute/path/to/README.md" ], models_prefixed_by_provider=["openai:gpt-4o", "anthropic:claude-3-5-sonnet"], output_dir="analysis_results" ) ``` ``` -------------------------------------------------------------------------------- /specs/new-tool-llm-as-a-ceo.md: -------------------------------------------------------------------------------- ```markdown Feature Request: LLM as a CEO ## Implementation Notes - Create a new tool 'ceo_and_board' in src/just_prompt/molecules/ceo_and_board_prompt.py - Definition ceo_and_board_prompt(from_file: str, output_dir: str = ., models_prefixed_by_provider: List[str] = None, ceo_model: str = DEFAULT_CEO_MODEL, ceo_decision_prompt: str = DEFAULT_CEO_DECISION_PROMPT) -> None: - Use the existing prompt_from_file_to_file function to generate responses from 'board' aka models_prefixed_by_provider. - Then run the ceo_decision_prompt (xml style prompt) with the board's responses, and the original question prompt to get a decision. - DEFAULT_CEO_DECISION_PROMPT is ```xml <purpose> You are a CEO of a company. You are given a list of responses from your board of directors. Your job is to take in the original question prompt, and each of the board members' responses, and choose the best direction for your company. </purpose> <instructions> <instruction>Each board member has proposed an answer to the question posed in the prompt.</instruction> <instruction>Given the original question prompt, and each of the board members' responses, choose the best answer.</instruction> <instruction>Tally the votes of the board members, choose the best direction, and explain why you chose it.</instruction> <instruction>To preserve anonymity, we will use model names instead of real names of your board members. When responding, use the model names in your response.</instruction> <instruction>As a CEO, you breakdown the decision into several categories including: risk, reward, timeline, and resources. 
In addition to these guiding categories, you also consider the board members' expertise and experience. As a bleeding edge CEO, you also invent new dimensions of decision making to help you make the best decision for your company.</instruction> <instruction>Your final CEO response should be in markdown format with a comprehensive explanation of your decision. Start the top of the file with a title that says "CEO Decision", include a table of contents, briefly describe the question/problem at hand then dive into several sections. One of your first sections should be a quick summary of your decision, then breakdown each of the boards decisions into sections with your commentary on each. Where we lead into your decision with the categories of your decision making process, and then we lead into your final decision.</instruction> </instructions> <original-question>{original_prompt}</original-question> <board-decisions> <board-response> <model-name>...</model-name> <response>...</response> </board-response> <board-response> <model-name>...</model-name> <response>...</response> </board-response> ... </board-decisions> ``` - DEFAULT_CEO_MODEL is openai:o3 - The prompt_from_file_to_file will output a file for each board member's response in the output_dir. - Once they've been created, the ceo_and_board_prompt will read in the board member's responses, and the original question prompt into the ceo_decision_prompt and make another call with the ceo_model to get a decision. Write the decision to a file in the output_dir/ceo_decision.md. - Be sure to validate this functionality with uv run pytest <path-to-test-file> - After you implement update the README.md with the new tool's functionality and run `git ls-files` to update the directory tree in the readme with the new files. - Make sure this functionality works end to end. This functionality will be exposed as an MCP tool in the server.py file. ## Relevant Files - src/just_prompt/server.py - src/just_prompt/molecules/ceo_and_board_prompt.py - src/just_prompt/molecules/prompt_from_file_to_file.py - src/just_prompt/molecules/prompt_from_file.py - src/just_prompt/molecules/prompt.py - src/just_prompt/atoms/llm_providers/openai.py - src/just_prompt/atoms/shared/utils.py - src/just_prompt/tests/molecules/test_ceo_and_board_prompt.py ## Validation (Close the Loop) > Be sure to test this new capability with uv run pytest. - `uv run pytest src/just_prompt/tests/molecules/test_ceo_and_board_prompt.py` - `uv run just-prompt --help` to validate the tool works as expected. ``` -------------------------------------------------------------------------------- /ai_docs/google-genai-api-update.md: -------------------------------------------------------------------------------- ```markdown # Google GenAI SDK v1.22.0 Documentation ## Overview The Google Gen AI SDK provides an interface for developers to integrate Google's generative models into their Python applications. It supports both the Gemini Developer API and Vertex AI APIs. **Latest Version:** 1.22.0 (Released: about 23 hours ago) ## Installation ```bash pip install google-genai ``` ## Key Features ### 1. Client Creation **For Gemini Developer API:** ```python from google import genai client = genai.Client(api_key='GEMINI_API_KEY') ``` **For Vertex AI:** ```python from google import genai client = genai.Client( vertexai=True, project='your-project-id', location='us-central1' ) ``` ### 2. 
Model Support The SDK supports various models including: - **Gemini 2.0 Flash**: `gemini-2.0-flash-001` - **Text Embedding**: `text-embedding-004` - **Imagen 3.0**: `imagen-3.0-generate-002` (image generation) - **Veo 2.0**: `veo-2.0-generate-001` (video generation) ### 3. Core Capabilities #### Generate Content ```python response = client.models.generate_content( model='gemini-2.0-flash-001', contents='Why is the sky blue?' ) print(response.text) ``` #### Chat Sessions ```python chat = client.chats.create(model='gemini-2.0-flash-001') response = chat.send_message('tell me a story') print(response.text) ``` #### Function Calling The SDK supports automatic Python function calling: ```python def get_current_weather(location: str) -> str: """Returns the current weather.""" return 'sunny' response = client.models.generate_content( model='gemini-2.0-flash-001', contents='What is the weather like in Boston?', config=types.GenerateContentConfig(tools=[get_current_weather]), ) ``` #### JSON Response Schema Supports Pydantic models for structured output: ```python from pydantic import BaseModel class CountryInfo(BaseModel): name: str population: int capital: str response = client.models.generate_content( model='gemini-2.0-flash-001', contents='Give me information for the United States.', config=types.GenerateContentConfig( response_mime_type='application/json', response_schema=CountryInfo, ), ) ``` ### 4. Advanced Features #### Streaming Support ```python for chunk in client.models.generate_content_stream( model='gemini-2.0-flash-001', contents='Tell me a story in 300 words.' ): print(chunk.text, end='') ``` #### Async Support ```python response = await client.aio.models.generate_content( model='gemini-2.0-flash-001', contents='Tell me a story in 300 words.' ) ``` #### Caching ```python cached_content = client.caches.create( model='gemini-2.0-flash-001', config=types.CreateCachedContentConfig( contents=[...], system_instruction='What is the sum of the two pdfs?', display_name='test cache', ttl='3600s', ), ) ``` #### Fine-tuning Supports supervised fine-tuning with different approaches for Vertex AI (GCS) and Gemini Developer API (inline examples). ### 5. API Configuration #### API Version Selection ```python from google.genai import types # For stable API endpoints client = genai.Client( vertexai=True, project='your-project-id', location='us-central1', http_options=types.HttpOptions(api_version='v1') ) ``` #### Proxy Support ```bash export HTTPS_PROXY='http://username:password@proxy_uri:port' export SSL_CERT_FILE='client.pem' ``` ### 6. Error Handling ```python from google.genai import errors try: client.models.generate_content( model="invalid-model-name", contents="What is your name?", ) except errors.APIError as e: print(e.code) # 404 print(e.message) ``` ## Platform Support - **Python Version:** >=3.9 - **Supported Python Versions:** 3.9, 3.10, 3.11, 3.12, 3.13 - **License:** Apache Software License (Apache-2.0) - **Operating System:** OS Independent ## Additional Resources - **Homepage:** https://github.com/googleapis/python-genai - **Documentation:** https://googleapis.github.io/python-genai/ - **PyPI Page:** https://pypi.org/project/google-genai/ ## Recent Updates The v1.22.0 release continues to support the latest Gemini models and maintains compatibility with both Gemini Developer API and Vertex AI platforms. The SDK provides comprehensive support for generative AI tasks including text generation, image generation, video generation, embeddings, and more. 
``` -------------------------------------------------------------------------------- /src/just_prompt/tests/atoms/llm_providers/test_gemini.py: -------------------------------------------------------------------------------- ```python """ Tests for Gemini provider. """ import pytest import os import re from dotenv import load_dotenv from just_prompt.atoms.llm_providers import gemini # Load environment variables load_dotenv() # Skip tests if API key not available if not os.environ.get("GEMINI_API_KEY"): pytest.skip("Gemini API key not available", allow_module_level=True) def test_list_models(): """Test listing Gemini models.""" models = gemini.list_models() # Assertions assert isinstance(models, list) assert len(models) > 0 assert all(isinstance(model, str) for model in models) # Check for at least one expected model containing gemini gemini_models = [model for model in models if "gemini" in model.lower()] assert len(gemini_models) > 0, "No Gemini models found" def test_prompt(): """Test sending prompt to Gemini.""" # Using gemini-1.5-flash as the model for testing response = gemini.prompt("What is the capital of France?", "gemini-1.5-flash") # Assertions assert isinstance(response, str) assert len(response) > 0 assert "paris" in response.lower() or "Paris" in response def test_parse_thinking_suffix(): """Test parsing thinking suffix from model name.""" # Test cases with valid formats assert gemini.parse_thinking_suffix("gemini-2.5-flash-preview-04-17:1k") == ("gemini-2.5-flash-preview-04-17", 1024) assert gemini.parse_thinking_suffix("gemini-2.5-flash-preview-04-17:4k") == ("gemini-2.5-flash-preview-04-17", 4096) assert gemini.parse_thinking_suffix("gemini-2.5-flash-preview-04-17:2048") == ("gemini-2.5-flash-preview-04-17", 2048) # Test cases with invalid models (should ignore suffix) assert gemini.parse_thinking_suffix("gemini-1.5-flash:4k") == ("gemini-1.5-flash", 0) # Test cases with invalid suffix format base_model, budget = gemini.parse_thinking_suffix("gemini-2.5-flash-preview-04-17:invalid") assert base_model == "gemini-2.5-flash-preview-04-17" assert budget == 0 # Test case with no suffix assert gemini.parse_thinking_suffix("gemini-2.5-flash-preview-04-17") == ("gemini-2.5-flash-preview-04-17", 0) # Test case with out-of-range values (should be clamped) assert gemini.parse_thinking_suffix("gemini-2.5-flash-preview-04-17:25000")[1] == 24576 assert gemini.parse_thinking_suffix("gemini-2.5-flash-preview-04-17:-1000")[1] == 0 @pytest.mark.skipif( "gemini-2.5-flash-preview-04-17" not in gemini.list_models(), reason="gemini-2.5-flash-preview-04-17 model not available" ) def test_prompt_with_thinking(): """Test sending prompt to Gemini with thinking enabled.""" # Using the gemini-2.5-flash-preview-04-17 model with thinking budget model_name = "gemini-2.5-flash-preview-04-17:1k" response = gemini.prompt("What is the square root of 144?", model_name) # Assertions assert isinstance(response, str) assert len(response) > 0 assert "12" in response.lower(), f"Expected '12' in response: {response}" @pytest.mark.skipif( "gemini-2.5-flash-preview-04-17" not in gemini.list_models(), reason="gemini-2.5-flash-preview-04-17 model not available" ) def test_prompt_without_thinking(): """Test sending prompt to Gemini without thinking enabled.""" # Using the gemini-2.5-flash-preview-04-17 model without thinking budget model_name = "gemini-2.5-flash-preview-04-17" response = gemini.prompt("What is the capital of Germany?", model_name) # Assertions assert isinstance(response, str) assert len(response) > 0 
assert "berlin" in response.lower() or "Berlin" in response, f"Expected 'Berlin' in response: {response}" def test_gemini_2_5_pro_availability(): """Test if Gemini 2.5 Pro model is available.""" models = gemini.list_models() # Print all available models for debugging print("\nAvailable Gemini models:") for model in sorted(models): print(f" - {model}") # Check if any Gemini 2.5 Pro variant is available gemini_2_5_pro_models = [model for model in models if "gemini-2.5-pro" in model.lower()] if gemini_2_5_pro_models: print(f"\nFound Gemini 2.5 Pro models: {gemini_2_5_pro_models}") else: print("\nNo Gemini 2.5 Pro models found!") print("You may need to update the google-genai library") # This assertion will fail if no Gemini 2.5 Pro is found assert len(gemini_2_5_pro_models) > 0, "Gemini 2.5 Pro model not found - may need to update google-genai library" ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/atoms/llm_providers/test_anthropic.py: -------------------------------------------------------------------------------- ```python """ Tests for Anthropic provider. """ import pytest import os from dotenv import load_dotenv from just_prompt.atoms.llm_providers import anthropic # Load environment variables load_dotenv() # Skip tests if API key not available if not os.environ.get("ANTHROPIC_API_KEY"): pytest.skip("Anthropic API key not available", allow_module_level=True) def test_list_models(): """Test listing Anthropic models.""" models = anthropic.list_models() # Assertions assert isinstance(models, list) assert len(models) > 0 assert all(isinstance(model, str) for model in models) # Check for at least one expected model claude_models = [model for model in models if "claude" in model.lower()] assert len(claude_models) > 0, "No Claude models found" def test_prompt(): """Test sending prompt to Anthropic.""" # Use the correct model name from the available models response = anthropic.prompt("What is the capital of France?", "claude-3-5-haiku-20241022") # Assertions assert isinstance(response, str) assert len(response) > 0 assert "paris" in response.lower() or "Paris" in response def test_parse_thinking_suffix(): """Test parsing thinking suffix from model names.""" # Test cases with no suffix assert anthropic.parse_thinking_suffix("claude-3-7-sonnet") == ("claude-3-7-sonnet", 0) assert anthropic.parse_thinking_suffix("claude-3-5-haiku-20241022") == ("claude-3-5-haiku-20241022", 0) # Test cases with supported claude-3-7 model and k suffixes assert anthropic.parse_thinking_suffix("claude-3-7-sonnet-20250219:1k") == ("claude-3-7-sonnet-20250219", 1024) assert anthropic.parse_thinking_suffix("claude-3-7-sonnet-20250219:4k") == ("claude-3-7-sonnet-20250219", 4096) assert anthropic.parse_thinking_suffix("claude-3-7-sonnet-20250219:15k") == ("claude-3-7-sonnet-20250219", 15360) # 15*1024=15360 < 16000 # Test cases with supported claude-4 models and k suffixes assert anthropic.parse_thinking_suffix("claude-opus-4-20250514:1k") == ("claude-opus-4-20250514", 1024) assert anthropic.parse_thinking_suffix("claude-opus-4-20250514:4k") == ("claude-opus-4-20250514", 4096) assert anthropic.parse_thinking_suffix("claude-sonnet-4-20250514:1k") == ("claude-sonnet-4-20250514", 1024) assert anthropic.parse_thinking_suffix("claude-sonnet-4-20250514:8k") == ("claude-sonnet-4-20250514", 8192) # Test cases with supported models and numeric suffixes assert anthropic.parse_thinking_suffix("claude-3-7-sonnet-20250219:1024") == ("claude-3-7-sonnet-20250219", 1024) assert 
anthropic.parse_thinking_suffix("claude-3-7-sonnet-20250219:4096") == ("claude-3-7-sonnet-20250219", 4096) assert anthropic.parse_thinking_suffix("claude-opus-4-20250514:8000") == ("claude-opus-4-20250514", 8000) assert anthropic.parse_thinking_suffix("claude-sonnet-4-20250514:2048") == ("claude-sonnet-4-20250514", 2048) # Test cases with non-supported model assert anthropic.parse_thinking_suffix("claude-3-7-sonnet:1k") == ("claude-3-7-sonnet", 0) assert anthropic.parse_thinking_suffix("claude-3-5-haiku:4k") == ("claude-3-5-haiku", 0) # Test cases with out-of-range values (should adjust to valid range) assert anthropic.parse_thinking_suffix("claude-3-7-sonnet-20250219:500") == ("claude-3-7-sonnet-20250219", 1024) # Below min 1024, should use 1024 assert anthropic.parse_thinking_suffix("claude-opus-4-20250514:20000") == ("claude-opus-4-20250514", 16000) # Above max 16000, should use 16000 def test_prompt_with_thinking(): """Test sending prompt with thinking enabled.""" # Test with 1k thinking tokens on the supported model response = anthropic.prompt("What is the capital of Spain?", "claude-3-7-sonnet-20250219:1k") # Assertions assert isinstance(response, str) assert len(response) > 0 assert "madrid" in response.lower() or "Madrid" in response # Test with 2k thinking tokens on the supported model response = anthropic.prompt("What is the capital of Germany?", "claude-3-7-sonnet-20250219:2k") # Assertions assert isinstance(response, str) assert len(response) > 0 assert "berlin" in response.lower() or "Berlin" in response # Test with out-of-range but auto-corrected thinking tokens response = anthropic.prompt("What is the capital of Italy?", "claude-3-7-sonnet-20250219:500") # Assertions (should still work with a corrected budget of 1024) assert isinstance(response, str) assert len(response) > 0 assert "rome" in response.lower() or "Rome" in response ``` -------------------------------------------------------------------------------- /ultra_diff_review/fusion_ultra_diff_review.md: -------------------------------------------------------------------------------- ```markdown # Ultra Diff Review - Fusion Analysis ## Overview This is a synthesized analysis combining insights from multiple LLM reviews of the changes made to `list_models.py`. The code has been refactored to organize model listing functionality into separate functions for different AI providers. ## Critical Issues ### 1. 🚨 Hardcoded API Key (DeepSeek) **Description**: The `list_deepseek_models()` function contains a hardcoded API key (`"sk-ds-3f422175ff114212a42d7107c3efd1e4"`). **Impact**: Major security vulnerability that could lead to unauthorized API usage and charges. **Solution**: Use environment variables instead: ```python api_key=os.environ.get("DEEPSEEK_API_KEY") ``` ### 2. ⚠️ Lack of Error Handling **Description**: None of the functions include error handling for API failures, network issues, or missing credentials. **Impact**: Code will crash or produce uninformative errors with actual usage. **Solution**: Implement try-except blocks for all API calls: ```python try: client = DeepSeek(api_key=os.environ.get("DEEPSEEK_API_KEY")) models = client.models.list() # Process models except Exception as e: print(f"Error fetching DeepSeek models: {e}") ``` ## Medium Priority Issues ### 3. ⚠️ Multiple load_dotenv() Calls **Description**: Both `list_anthropic_models()` and `list_gemini_models()` call `load_dotenv()` independently. **Impact**: Redundant operations if multiple functions are called in the same run. 
**Solution**: Move `load_dotenv()` to a single location at the top of the file. ### 4. ⚠️ Inconsistent API Key Access Patterns **Description**: Different functions use different methods to access API keys. **Impact**: Reduces code maintainability and consistency. **Solution**: Standardize API key access patterns across all providers. ### 5. ⚠️ Redundant API Call in Gemini Function **Description**: `list_gemini_models()` calls `client.models.list()` twice for different filtering operations. **Impact**: Potential performance issue - may make unnecessary network calls. **Solution**: Store results in a variable and reuse: ```python models = client.models.list() print("List of models that support generateContent:\n") for m in models: # Filter for generateContent print("List of models that support embedContent:\n") for m in models: # Filter for embedContent ``` ## Low Priority Issues ### 6. ℹ️ Inconsistent Variable Naming **Description**: In `list_groq_models()`, the result of `client.models.list()` is stored in a variable named `chat_completion`. **Impact**: Low - could cause confusion during maintenance. **Solution**: Use a more appropriate variable name like `models` or `model_list`. ### 7. ℹ️ Inconsistent Output Formatting **Description**: Some functions include descriptive print statements, while others just print raw results. **Impact**: Low - user experience inconsistency. **Solution**: Standardize output formatting across all functions. ### 8. ℹ️ Scattered Imports **Description**: Import statements are scattered throughout functions rather than at the top of the file. **Impact**: Low - code organization issue. **Solution**: Consolidate imports at the top of the file. ### 9. ℹ️ Missing Function Docstrings **Description**: Functions lack documentation describing their purpose and usage. **Impact**: Low - reduces code readability and maintainability. **Solution**: Add docstrings to all functions. ### 10. 💡 No Main Function **Description**: There's no main function to coordinate the execution of different provider functions. **Impact**: Low - usability enhancement needed. **Solution**: Add a main function with argument parsing to run specific provider functions. ## Summary Table | ID | Issue | Solution | Risk Assessment | |----|-------|----------|-----------------| | 1 | 🚨 Hardcoded API key (DeepSeek) | Use environment variables | High | | 2 | ⚠️ No error handling | Add try/except blocks for API calls | Medium | | 3 | ⚠️ Multiple load_dotenv() calls | Move to single location at file top | Medium | | 4 | ⚠️ Inconsistent API key access | Standardize patterns across providers | Medium | | 5 | ⚠️ Redundant API call (Gemini) | Cache API response in variable | Medium | | 6 | ℹ️ Inconsistent variable naming | Rename variables appropriately | Low | | 7 | ℹ️ Inconsistent output formatting | Standardize output format | Low | | 8 | ℹ️ Scattered imports | Consolidate imports at file top | Low | | 9 | ℹ️ Missing function docstrings | Add documentation to functions | Low | | 10 | 💡 No main function | Add main() with argument parsing | Low | ## Recommendation The hardcoded API key issue (#1) should be addressed immediately as it poses a significant security risk. Following that, implementing proper error handling (#2) would greatly improve the reliability of the code. 
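For reference, below is a minimal sketch of the DeepSeek listing function with fixes #1–#3 applied (environment-based key, a single `load_dotenv()` at the top of the file, and basic error handling). The `DeepSeek` client call mirrors the snippet in issue #2 above; the actual import path and the shape of the returned model list depend on the SDK used in `list_models.py` and are assumptions here.

```python
import os
from dotenv import load_dotenv

# from deepseek import DeepSeek  # assumption: adjust to the SDK import actually used in list_models.py

# Fix #3: load environment variables once, at the top of the file
load_dotenv()


def list_deepseek_models():
    """List available DeepSeek models without a hardcoded key (fix #1) and with error handling (fix #2)."""
    try:
        # Fix #1: read the API key from the environment instead of hardcoding it
        client = DeepSeek(api_key=os.environ.get("DEEPSEEK_API_KEY"))
        models = client.models.list()
        print("Available DeepSeek models:")
        print(models)
    except Exception as e:
        # Fix #2: surface an informative message instead of crashing
        print(f"Error fetching DeepSeek models: {e}")
```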
``` -------------------------------------------------------------------------------- /src/just_prompt/tests/molecules/test_list_models.py: -------------------------------------------------------------------------------- ```python """ Tests for list_models functionality for all providers. """ import pytest import os from dotenv import load_dotenv from just_prompt.molecules.list_models import list_models # Load environment variables load_dotenv() def test_list_models_openai(): """Test listing OpenAI models with real API call.""" # Skip if API key isn't available if not os.environ.get("OPENAI_API_KEY"): pytest.skip("OpenAI API key not available") # Test with full provider name models = list_models("openai") # Assertions assert isinstance(models, list) assert len(models) > 0 # Check for specific model patterns that should exist assert any("gpt" in model.lower() for model in models) def test_list_models_anthropic(): """Test listing Anthropic models with real API call.""" # Skip if API key isn't available if not os.environ.get("ANTHROPIC_API_KEY"): pytest.skip("Anthropic API key not available") # Test with full provider name models = list_models("anthropic") # Assertions assert isinstance(models, list) assert len(models) > 0 # Check for specific model patterns that should exist assert any("claude" in model.lower() for model in models) def test_list_models_gemini(): """Test listing Gemini models with real API call.""" # Skip if API key isn't available if not os.environ.get("GEMINI_API_KEY"): pytest.skip("Gemini API key not available") # Test with full provider name models = list_models("gemini") # Assertions assert isinstance(models, list) assert len(models) > 0 # Check for specific model patterns that should exist assert any("gemini" in model.lower() for model in models) def test_list_models_groq(): """Test listing Groq models with real API call.""" # Skip if API key isn't available if not os.environ.get("GROQ_API_KEY"): pytest.skip("Groq API key not available") # Test with full provider name models = list_models("groq") # Assertions assert isinstance(models, list) assert len(models) > 0 # Check for specific model patterns (llama or mixtral are common in Groq) assert any(("llama" in model.lower() or "mixtral" in model.lower()) for model in models) def test_list_models_deepseek(): """Test listing DeepSeek models with real API call.""" # Skip if API key isn't available if not os.environ.get("DEEPSEEK_API_KEY"): pytest.skip("DeepSeek API key not available") # Test with full provider name models = list_models("deepseek") # Assertions assert isinstance(models, list) assert len(models) > 0 # Check for basic list return (no specific pattern needed) assert all(isinstance(model, str) for model in models) def test_list_models_ollama(): """Test listing Ollama models with real API call.""" # Test with full provider name models = list_models("ollama") # Assertions assert isinstance(models, list) assert len(models) > 0 # Check for basic list return (model entries could be anything) assert all(isinstance(model, str) for model in models) def test_list_models_with_short_names(): """Test listing models using short provider names.""" # Test each provider with short name (only if API key available) # OpenAI - short name "o" if os.environ.get("OPENAI_API_KEY"): models = list_models("o") assert isinstance(models, list) assert len(models) > 0 assert any("gpt" in model.lower() for model in models) # Anthropic - short name "a" if os.environ.get("ANTHROPIC_API_KEY"): models = list_models("a") assert isinstance(models, 
list) assert len(models) > 0 assert any("claude" in model.lower() for model in models) # Gemini - short name "g" if os.environ.get("GEMINI_API_KEY"): models = list_models("g") assert isinstance(models, list) assert len(models) > 0 assert any("gemini" in model.lower() for model in models) # Groq - short name "q" if os.environ.get("GROQ_API_KEY"): models = list_models("q") assert isinstance(models, list) assert len(models) > 0 # DeepSeek - short name "d" if os.environ.get("DEEPSEEK_API_KEY"): models = list_models("d") assert isinstance(models, list) assert len(models) > 0 # Ollama - short name "l" models = list_models("l") assert isinstance(models, list) assert len(models) > 0 def test_list_models_invalid_provider(): """Test with invalid provider name.""" # Test invalid provider with pytest.raises(ValueError): list_models("unknown_provider") ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/atoms/shared/test_validator.py: -------------------------------------------------------------------------------- ```python """ Tests for validator functions. """ import pytest import os from unittest.mock import patch from just_prompt.atoms.shared.validator import ( validate_models_prefixed_by_provider, validate_provider, validate_provider_api_keys, print_provider_availability ) def test_validate_models_prefixed_by_provider(): """Test validating model strings.""" # Valid model strings assert validate_models_prefixed_by_provider(["openai:gpt-4o-mini"]) == True assert validate_models_prefixed_by_provider(["anthropic:claude-3-5-haiku"]) == True assert validate_models_prefixed_by_provider(["o:gpt-4o-mini", "a:claude-3-5-haiku"]) == True # Invalid model strings with pytest.raises(ValueError): validate_models_prefixed_by_provider([]) with pytest.raises(ValueError): validate_models_prefixed_by_provider(["unknown:model"]) with pytest.raises(ValueError): validate_models_prefixed_by_provider(["invalid-format"]) def test_validate_provider(): """Test validating provider names.""" # Valid providers assert validate_provider("openai") == True assert validate_provider("anthropic") == True assert validate_provider("o") == True assert validate_provider("a") == True # Invalid providers with pytest.raises(ValueError): validate_provider("unknown") with pytest.raises(ValueError): validate_provider("") def test_validate_provider_api_keys(): """Test validating provider API keys.""" # Use mocked environment variables with a mix of valid, empty, and missing keys with patch.dict(os.environ, { "OPENAI_API_KEY": "test-key", "ANTHROPIC_API_KEY": "test-key", "GROQ_API_KEY": "test-key", # GEMINI_API_KEY not defined "DEEPSEEK_API_KEY": "test-key", "OLLAMA_HOST": "http://localhost:11434" }): # Call the function to validate provider API keys availability = validate_provider_api_keys() # Check that each provider has the correct availability status assert availability["openai"] is True assert availability["anthropic"] is True assert availability["groq"] is True # This depends on the actual implementation. 
Since we're mocking the environment, # let's just assert that the keys exist rather than specific values assert "gemini" in availability assert "deepseek" in availability assert "ollama" in availability # Make sure all providers are included in the result assert set(availability.keys()) == {"openai", "anthropic", "gemini", "groq", "deepseek", "ollama"} def test_validate_provider_api_keys_none(): """Test validating provider API keys when none are available.""" # Use mocked environment variables with no API keys with patch.dict(os.environ, {}, clear=True): # Call the function to validate provider API keys availability = validate_provider_api_keys() # Check that all providers are marked as unavailable assert all(status is False for status in availability.values()) assert set(availability.keys()) == {"openai", "anthropic", "gemini", "groq", "deepseek", "ollama"} def test_print_provider_availability(): """Test printing provider availability.""" # Mock the validate_provider_api_keys function to return a controlled result mock_availability = { "openai": True, "anthropic": False, "gemini": True, "groq": False, "deepseek": True, "ollama": False } with patch('just_prompt.atoms.shared.validator.validate_provider_api_keys', return_value=mock_availability): # Mock the logger to verify the log messages with patch('just_prompt.atoms.shared.validator.logger') as mock_logger: # Call the function to print provider availability print_provider_availability(detailed=True) # Verify that info was called with a message about available providers mock_logger.info.assert_called_once() info_call_args = mock_logger.info.call_args[0][0] assert "Available LLM providers:" in info_call_args assert "openai" in info_call_args assert "gemini" in info_call_args assert "deepseek" in info_call_args # Check that warning was called multiple times assert mock_logger.warning.call_count >= 2 # Check that the first warning is about missing API keys warning_calls = [call[0][0] for call in mock_logger.warning.call_args_list] assert "The following providers are unavailable due to missing API keys:" in warning_calls ``` -------------------------------------------------------------------------------- /src/just_prompt/atoms/llm_providers/openai.py: -------------------------------------------------------------------------------- ```python """ OpenAI provider implementation. """ """OpenAI provider implementation with support for o‑series *reasoning effort* suffixes. Supported suffixes (case‑insensitive): ``:low``, ``:medium``, ``:high`` on the reasoning models ``o4-mini``, ``o3-mini`` and ``o3``. When such a suffix is present we use OpenAI's *Responses* API with the corresponding ``reasoning={"effort": <level>}`` parameter (if the SDK supports it). If the installed ``openai`` SDK is older and does not expose the ``responses`` resource, we gracefully fall back to the Chat Completions endpoint so that the basic functionality (and our tests) still work. """ import os import re import logging from typing import List, Tuple from dotenv import load_dotenv # Third‑party import guarded so that static analysis still works when the SDK # is absent. from openai import OpenAI # type: ignore import logging from dotenv import load_dotenv # Load environment variables load_dotenv() # Configure logging logger = logging.getLogger(__name__) # Initialize OpenAI client once – reused across calls. 
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY")) # --------------------------------------------------------------------------- # Internal helpers # --------------------------------------------------------------------------- _REASONING_ELIGIBLE_MODELS = {"o4-mini", "o3-mini", "o3", "gpt-5", "gpt-5-mini", "gpt-5-nano"} _REASONING_LEVELS = {"low", "medium", "high"} # Public so that tests can import. def parse_reasoning_suffix(model: str) -> Tuple[str, str]: """Return (base_model, effort_level). If *model* is something like ``o4-mini:high`` (case‑insensitive) we return ("o4-mini", "high"). For all other inputs we return (_model_, ""). """ # Split once from the right so additional colons inside the *provider* part # are untouched (the caller already stripped the provider prefix). if ":" not in model: return model, "" base, suffix = model.rsplit(":", 1) suffix_lower = suffix.lower() if base in _REASONING_ELIGIBLE_MODELS and suffix_lower in _REASONING_LEVELS: return base, suffix_lower # Not a recognised reasoning pattern; treat the whole string as the model return model, "" def _prompt_with_reasoning(text: str, model: str, effort: str) -> str: # pragma: no cover – hits network """Call OpenAI *Responses* API with reasoning effort. Falls back transparently to chat completions if the installed SDK does not yet expose the *responses* resource. """ if not effort: raise ValueError("effort must be 'low', 'medium', or 'high'") logger.info( "Sending prompt to OpenAI reasoning model %s with effort '%s'", model, effort ) # Prefer the official Responses endpoint when present. if hasattr(client, "responses"): try: response = client.responses.create( model=model, reasoning={"effort": effort}, input=[{"role": "user", "content": text}], ) # The modern SDK returns .output_text output_text = getattr(response, "output_text", None) if output_text is not None: return output_text # Fallback path: maybe same shape as chat completions. if hasattr(response, "choices") and response.choices: return response.choices[0].message.content # type: ignore[attr-defined] raise ValueError("Unexpected response format from OpenAI responses API") except Exception as exc: # pragma: no cover – keep behaviour consistent logger.warning("Responses API failed (%s); falling back to chat", exc) # Fallback to chat completions – pass the reasoning level as a system # message so that, even without official support, the model can try to act # accordingly. This keeps tests functional if the Responses API is not # available in the runtime environment. try: response = client.chat.completions.create( model=model, messages=[ { "role": "system", "content": f"Use {effort} reasoning effort before answering.", }, {"role": "user", "content": text}, ], ) return response.choices[0].message.content # type: ignore[attr-defined] except Exception as exc: logger.error("Error sending prompt to OpenAI (fallback chat): %s", exc) raise ValueError(f"Failed to get response from OpenAI: {exc}") def prompt(text: str, model: str) -> str: """Main prompt entry‑point for the OpenAI provider. Handles the optional ``:low|:medium|:high`` suffix on reasoning models. Falls back to regular chat completions when no suffix is detected. 
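    Example (illustrative): ``prompt("Summarize the repo", "o3:high")`` parses the
    suffix and requests high reasoning effort via the Responses API, while
    ``prompt("Hello", "gpt-4o-mini")`` has no recognised suffix and goes through a
    plain chat completion.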
""" base_model, effort = parse_reasoning_suffix(model) if effort: return _prompt_with_reasoning(text, base_model, effort) # Regular chat completion path try: logger.info("Sending prompt to OpenAI model: %s", base_model) response = client.chat.completions.create( model=base_model, messages=[{"role": "user", "content": text}], ) return response.choices[0].message.content # type: ignore[attr-defined] except Exception as exc: logger.error("Error sending prompt to OpenAI: %s", exc) raise ValueError(f"Failed to get response from OpenAI: {exc}") def list_models() -> List[str]: """ List available OpenAI models. Returns: List of model names """ try: logger.info("Listing OpenAI models") response = client.models.list() # Return all models without filtering models = [model.id for model in response.data] return models except Exception as exc: # Networking errors shouldn't break the caller – return a minimal hard‑coded list. logger.warning("Error listing OpenAI models via API (%s). Returning fallback list.", exc) return [ "gpt-4o-mini", "o4-mini", "o3-mini", "o3", "text-davinci-003", ] ``` -------------------------------------------------------------------------------- /src/just_prompt/molecules/ceo_and_board_prompt.py: -------------------------------------------------------------------------------- ```python """ CEO and Board prompt functionality for just-prompt. """ from typing import List import logging import os from pathlib import Path from .prompt_from_file_to_file import prompt_from_file_to_file from .prompt import prompt from ..atoms.shared.utils import DEFAULT_MODEL logger = logging.getLogger(__name__) # Default CEO model DEFAULT_CEO_MODEL = "openai:o3" # Default CEO decision prompt template DEFAULT_CEO_DECISION_PROMPT = """ <purpose> You are a CEO of a company. You are given a list of responses from your board of directors. Your job is to take in the original question prompt, and each of the board members' responses, and choose the best direction for your company. </purpose> <instructions> <instruction>Each board member has proposed an answer to the question posed in the prompt.</instruction> <instruction>Given the original question prompt, and each of the board members' responses, choose the best answer.</instruction> <instruction>Tally the votes of the board members, choose the best direction, and explain why you chose it.</instruction> <instruction>To preserve anonymity, we will use model names instead of real names of your board members. When responding, use the model names in your response.</instruction> <instruction>As a CEO, you breakdown the decision into several categories including: risk, reward, timeline, and resources. In addition to these guiding categories, you also consider the board members' expertise and experience. As a bleeding edge CEO, you also invent new dimensions of decision making to help you make the best decision for your company.</instruction> <instruction>Your final CEO response should be in markdown format with a comprehensive explanation of your decision. Start the top of the file with a title that says "CEO Decision", include a table of contents, briefly describe the question/problem at hand then dive into several sections. One of your first sections should be a quick summary of your decision, then breakdown each of the boards decisions into sections with your commentary on each. 
Where we lead into your decision with the categories of your decision making process, and then we lead into your final decision.</instruction> </instructions> <original-question>{original_prompt}</original-question> <board-decisions> {board_responses} </board-decisions> """ def ceo_and_board_prompt( abs_from_file: str, abs_output_dir: str = ".", models_prefixed_by_provider: List[str] = None, ceo_model: str = DEFAULT_CEO_MODEL, ceo_decision_prompt: str = DEFAULT_CEO_DECISION_PROMPT ) -> str: """ Read text from a file, send it as prompt to multiple 'board member' models, and then have a 'CEO' model make a decision based on the responses. Args: abs_from_file: Absolute path to the text file containing the original prompt (must be an absolute path, not relative) abs_output_dir: Absolute directory path to save response files (must be an absolute path, not relative) models_prefixed_by_provider: List of model strings in format "provider:model" to act as the board members ceo_model: Model to use for the CEO decision in format "provider:model" ceo_decision_prompt: Template for the CEO decision prompt Returns: Path to the CEO decision file """ # Validate output directory output_path = Path(abs_output_dir) if not output_path.exists(): output_path.mkdir(parents=True, exist_ok=True) if not output_path.is_dir(): raise ValueError(f"Not a directory: {abs_output_dir}") # Get the original prompt from the file try: with open(abs_from_file, 'r', encoding='utf-8') as f: original_prompt = f.read() except Exception as e: logger.error(f"Error reading file {abs_from_file}: {e}") raise ValueError(f"Error reading file: {str(e)}") # Step 1: Get board members' responses board_response_files = prompt_from_file_to_file( abs_file_path=abs_from_file, models_prefixed_by_provider=models_prefixed_by_provider, abs_output_dir=abs_output_dir ) # Get the models that were actually used models_used = models_prefixed_by_provider if not models_used: default_models = os.environ.get("DEFAULT_MODELS", DEFAULT_MODEL) models_used = [model.strip() for model in default_models.split(",")] # Step 2: Read in the board responses board_responses_text = "" for i, file_path in enumerate(board_response_files): model_name = models_used[i].replace(":", "_") try: with open(file_path, 'r', encoding='utf-8') as f: response_content = f.read() board_responses_text += f""" <board-response> <model-name>{models_used[i]}</model-name> <response>{response_content}</response> </board-response> """ except Exception as e: logger.error(f"Error reading board response file {file_path}: {e}") board_responses_text += f""" <board-response> <model-name>{models_used[i]}</model-name> <response>Error reading response: {str(e)}</response> </board-response> """ # Step 3: Prepare the CEO decision prompt final_ceo_prompt = ceo_decision_prompt.format( original_prompt=original_prompt, board_responses=board_responses_text ) # Step 4: Save the CEO prompt to a file ceo_prompt_file = output_path / "ceo_prompt.xml" try: with open(ceo_prompt_file, "w", encoding="utf-8") as f: f.write(final_ceo_prompt) except Exception as e: logger.error(f"Error writing CEO prompt to {ceo_prompt_file}: {e}") raise ValueError(f"Error writing CEO prompt: {str(e)}") # Step 5: Get the CEO decision ceo_response = prompt(final_ceo_prompt, [ceo_model])[0] # Step 6: Write the CEO decision to a file ceo_output_file = output_path / "ceo_decision.md" try: with open(ceo_output_file, "w", encoding="utf-8") as f: f.write(ceo_response) except Exception as e: logger.error(f"Error writing CEO decision to 
{ceo_output_file}: {e}") raise ValueError(f"Error writing CEO decision: {str(e)}") return str(ceo_output_file) ``` -------------------------------------------------------------------------------- /src/just_prompt/atoms/llm_providers/anthropic.py: -------------------------------------------------------------------------------- ```python """ Anthropic provider implementation. """ import os import re import anthropic from typing import List, Tuple import logging from dotenv import load_dotenv # Load environment variables load_dotenv() # Configure logging logger = logging.getLogger(__name__) # Initialize Anthropic client client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY")) def parse_thinking_suffix(model: str) -> Tuple[str, int]: """ Parse a model name to check for thinking token budget suffixes. Only works with the claude-3-7-sonnet-20250219 model. Supported formats: - model:1k, model:4k, model:16k - model:1000, model:1054, model:1333, etc. (any value between 1024-16000) Args: model: The model name potentially with a thinking suffix Returns: Tuple of (base_model_name, thinking_budget) If no thinking suffix is found, thinking_budget will be 0 """ # Look for patterns like ":1k", ":4k", ":16k" or ":1000", ":1054", etc. pattern = r'^(.+?)(?::(\d+)k?)?$' match = re.match(pattern, model) if not match: return model, 0 base_model = match.group(1) thinking_suffix = match.group(2) # Validate the model - only specific Claude models support thinking supported_thinking_models = [ "claude-3-7-sonnet-20250219", "claude-opus-4-20250514", "claude-sonnet-4-20250514" ] if base_model not in supported_thinking_models: logger.warning(f"Model {base_model} does not support thinking, ignoring thinking suffix") return base_model, 0 if not thinking_suffix: return model, 0 # Convert to integer try: thinking_budget = int(thinking_suffix) # If a small number like 1, 4, 16 is provided, assume it's in "k" (multiply by 1024) if thinking_budget < 100: thinking_budget *= 1024 # Adjust values outside the range if thinking_budget < 1024: logger.warning(f"Thinking budget {thinking_budget} below minimum (1024), using 1024 instead") thinking_budget = 1024 elif thinking_budget > 16000: logger.warning(f"Thinking budget {thinking_budget} above maximum (16000), using 16000 instead") thinking_budget = 16000 logger.info(f"Using thinking budget of {thinking_budget} tokens for model {base_model}") return base_model, thinking_budget except ValueError: logger.warning(f"Invalid thinking budget format: {thinking_suffix}, ignoring") return base_model, 0 def prompt_with_thinking(text: str, model: str, thinking_budget: int) -> str: """ Send a prompt to Anthropic Claude with thinking enabled and get a response. 
Args: text: The prompt text model: The base model name (without thinking suffix) thinking_budget: The token budget for thinking Returns: Response string from the model """ try: # Ensure max_tokens is greater than thinking_budget # Documentation requires this: https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#max-tokens-and-context-window-size max_tokens = thinking_budget + 1000 # Adding 1000 tokens for the response logger.info(f"Sending prompt to Anthropic model {model} with thinking budget {thinking_budget}") message = client.messages.create( model=model, max_tokens=max_tokens, thinking={ "type": "enabled", "budget_tokens": thinking_budget, }, messages=[{"role": "user", "content": text}] ) # Extract the response from the message content # Filter out thinking blocks and only get text blocks text_blocks = [block for block in message.content if block.type == "text"] if not text_blocks: raise ValueError("No text content found in response") return text_blocks[0].text except Exception as e: logger.error(f"Error sending prompt with thinking to Anthropic: {e}") raise ValueError(f"Failed to get response from Anthropic with thinking: {str(e)}") def prompt(text: str, model: str) -> str: """ Send a prompt to Anthropic Claude and get a response. Automatically handles thinking suffixes in the model name (e.g., claude-3-7-sonnet-20250219:4k) Args: text: The prompt text model: The model name, optionally with thinking suffix Returns: Response string from the model """ # Parse the model name to check for thinking suffixes base_model, thinking_budget = parse_thinking_suffix(model) # If thinking budget is specified, use prompt_with_thinking if thinking_budget > 0: return prompt_with_thinking(text, base_model, thinking_budget) # Otherwise, use regular prompt try: logger.info(f"Sending prompt to Anthropic model: {base_model}") message = client.messages.create( model=base_model, max_tokens=4096, messages=[{"role": "user", "content": text}] ) # Extract the response from the message content # Get only text blocks text_blocks = [block for block in message.content if block.type == "text"] if not text_blocks: raise ValueError("No text content found in response") return text_blocks[0].text except Exception as e: logger.error(f"Error sending prompt to Anthropic: {e}") raise ValueError(f"Failed to get response from Anthropic: {str(e)}") def list_models() -> List[str]: """ List available Anthropic models. Returns: List of model names """ try: logger.info("Listing Anthropic models") response = client.models.list() models = [model.id for model in response.data] return models except Exception as e: logger.error(f"Error listing Anthropic models: {e}") # Return some known models if API fails logger.info("Returning hardcoded list of known Anthropic models") return [ "claude-3-7-sonnet", "claude-3-5-sonnet", "claude-3-5-sonnet-20240620", "claude-3-opus-20240229", "claude-3-sonnet-20240229", "claude-3-haiku-20240307", "claude-3-5-haiku", ] ``` -------------------------------------------------------------------------------- /src/just_prompt/atoms/llm_providers/gemini.py: -------------------------------------------------------------------------------- ```python """ Google Gemini provider implementation. 
""" import os import re from typing import List, Tuple import logging from dotenv import load_dotenv from google import genai # Load environment variables load_dotenv() # Configure logging logger = logging.getLogger(__name__) # Initialize Gemini client client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY")) # Models that support thinking_budget THINKING_ENABLED_MODELS = ["gemini-2.5-flash-preview-04-17"] def parse_thinking_suffix(model: str) -> Tuple[str, int]: """ Parse a model name to check for thinking token budget suffixes. Only works with the models in THINKING_ENABLED_MODELS. Supported formats: - model:1k, model:4k, model:24k - model:1000, model:1054, model:24576, etc. (any value between 0-24576) Args: model: The model name potentially with a thinking suffix Returns: Tuple of (base_model_name, thinking_budget) If no thinking suffix is found, thinking_budget will be 0 """ # First check if the model name contains a colon if ":" not in model: return model, 0 # Split the model name on the first colon to handle models with multiple colons parts = model.split(":", 1) base_model = parts[0] suffix = parts[1] if len(parts) > 1 else "" # Check if the base model is in the supported models list if base_model not in THINKING_ENABLED_MODELS: logger.warning(f"Model {base_model} does not support thinking, ignoring thinking suffix") return base_model, 0 # If there's no suffix or it's empty, return default values if not suffix: return base_model, 0 # Check if the suffix is a valid number (with optional 'k' suffix) if re.match(r'^\d+k?$', suffix): # Extract the numeric part and handle 'k' multiplier if suffix.endswith('k'): try: thinking_budget = int(suffix[:-1]) * 1024 except ValueError: logger.warning(f"Invalid thinking budget format: {suffix}, ignoring") return base_model, 0 else: try: thinking_budget = int(suffix) # If a small number like 1, 4, 24 is provided, assume it's in "k" (multiply by 1024) if thinking_budget < 100: thinking_budget *= 1024 except ValueError: logger.warning(f"Invalid thinking budget format: {suffix}, ignoring") return base_model, 0 # Adjust values outside the range if thinking_budget < 0: logger.warning(f"Thinking budget {thinking_budget} below minimum (0), using 0 instead") thinking_budget = 0 elif thinking_budget > 24576: logger.warning(f"Thinking budget {thinking_budget} above maximum (24576), using 24576 instead") thinking_budget = 24576 logger.info(f"Using thinking budget of {thinking_budget} tokens for model {base_model}") return base_model, thinking_budget else: # If suffix is not a valid number format, ignore it logger.warning(f"Invalid thinking budget format: {suffix}, ignoring") return base_model, 0 def prompt_with_thinking(text: str, model: str, thinking_budget: int) -> str: """ Send a prompt to Google Gemini with thinking enabled and get a response. 
Args: text: The prompt text model: The base model name (without thinking suffix) thinking_budget: The token budget for thinking Returns: Response string from the model """ try: logger.info(f"Sending prompt to Gemini model {model} with thinking budget {thinking_budget}") response = client.models.generate_content( model=model, contents=text, config=genai.types.GenerateContentConfig( thinking_config=genai.types.ThinkingConfig( thinking_budget=thinking_budget ) ) ) return response.text except Exception as e: logger.error(f"Error sending prompt with thinking to Gemini: {e}") raise ValueError(f"Failed to get response from Gemini with thinking: {str(e)}") def prompt(text: str, model: str) -> str: """ Send a prompt to Google Gemini and get a response. Automatically handles thinking suffixes in the model name (e.g., gemini-2.5-flash-preview-04-17:4k) Args: text: The prompt text model: The model name, optionally with thinking suffix Returns: Response string from the model """ # Parse the model name to check for thinking suffixes base_model, thinking_budget = parse_thinking_suffix(model) # If thinking budget is specified, use prompt_with_thinking if thinking_budget > 0: return prompt_with_thinking(text, base_model, thinking_budget) # Otherwise, use regular prompt try: logger.info(f"Sending prompt to Gemini model: {base_model}") response = client.models.generate_content( model=base_model, contents=text ) return response.text except Exception as e: logger.error(f"Error sending prompt to Gemini: {e}") raise ValueError(f"Failed to get response from Gemini: {str(e)}") def list_models() -> List[str]: """ List available Google Gemini models. Returns: List of model names """ try: logger.info("Listing Gemini models") # Get the list of models using the correct API method models = [] available_models = client.models.list() for m in available_models: # Check if the model supports content generation if hasattr(m, 'supported_generation_methods') and "generateContent" in m.supported_generation_methods: models.append(m.name) else: # If supported_generation_methods is not available, include all models models.append(m.name) # Format model names - strip the "models/" prefix if present formatted_models = [model.replace("models/", "") for model in models] return formatted_models except Exception as e: logger.error(f"Error listing Gemini models: {e}") # Throw the error instead of returning hardcoded list raise ValueError(f"Failed to list Gemini models: {str(e)}") ``` -------------------------------------------------------------------------------- /src/just_prompt/tests/molecules/test_ceo_and_board_prompt.py: -------------------------------------------------------------------------------- ```python """ Tests for the CEO and Board prompt functionality. 
""" import pytest import os from unittest.mock import patch, mock_open, MagicMock, call import tempfile from pathlib import Path from just_prompt.molecules.ceo_and_board_prompt import ( ceo_and_board_prompt, DEFAULT_CEO_MODEL, DEFAULT_CEO_DECISION_PROMPT ) @pytest.fixture def mock_environment(monkeypatch): """Setup environment for tests.""" monkeypatch.setenv("DEFAULT_MODELS", "a:claude-3,o:gpt-4o") monkeypatch.setenv("CORRECTION_MODEL", "a:claude-3") return monkeypatch class TestCEOAndBoardPrompt: """Tests for ceo_and_board_prompt function.""" @patch("just_prompt.molecules.ceo_and_board_prompt.prompt_from_file_to_file") @patch("just_prompt.molecules.ceo_and_board_prompt.prompt") @patch("builtins.open", new_callable=mock_open, read_data="Test prompt question") def test_ceo_and_board_prompt_success(self, mock_file, mock_prompt, mock_prompt_from_file_to_file, mock_environment, tmpdir): """Test successful CEO and board prompt execution.""" # Set up mocks mock_prompt_from_file_to_file.return_value = [ str(Path(tmpdir) / "test_a_claude-3.md"), str(Path(tmpdir) / "test_o_gpt-4o.md") ] mock_prompt.return_value = ["# CEO Decision\n\nThis is the CEO decision content."] # Create test files that would normally be created by prompt_from_file_to_file board_file1 = Path(tmpdir) / "test_a_claude-3.md" board_file1.write_text("Claude's response to the test prompt") board_file2 = Path(tmpdir) / "test_o_gpt-4o.md" board_file2.write_text("GPT-4o's response to the test prompt") # Test our function input_file = "test_prompt.txt" result = ceo_and_board_prompt( abs_from_file=input_file, abs_output_dir=str(tmpdir), models_prefixed_by_provider=["a:claude-3", "o:gpt-4o"] ) # Assertions mock_prompt_from_file_to_file.assert_called_once_with( abs_file_path=input_file, models_prefixed_by_provider=["a:claude-3", "o:gpt-4o"], abs_output_dir=str(tmpdir) ) # Check that the CEO model was called with the right prompt mock_prompt.assert_called_once() prompt_arg = mock_prompt.call_args[0][0] assert "<original-question>Test prompt question</original-question>" in prompt_arg assert "<model-name>a:claude-3</model-name>" in prompt_arg assert "<model-name>o:gpt-4o</model-name>" in prompt_arg # Check that the CEO decision file was created correctly expected_output_file = str(Path(tmpdir) / "ceo_decision.md") assert result == expected_output_file # Check that both the prompt XML and decision files were created # The actual call may be with Path object or string, so we check the call arguments assert mock_file.call_count >= 2 # Should be called at least twice - once for prompt XML and once for decision # Check that one call was for the CEO prompt XML file expected_prompt_file = str(Path(tmpdir) / "ceo_prompt.xml") prompt_file_call_found = False for call_args in mock_file.call_args_list: args, kwargs = call_args if str(args[0]) == expected_prompt_file and args[1] == "w" and kwargs.get("encoding") == "utf-8": prompt_file_call_found = True break assert prompt_file_call_found, "No call to create CEO prompt XML file found" # Check that one call was for the CEO decision file decision_file_call_found = False for call_args in mock_file.call_args_list: args, kwargs = call_args if str(args[0]) == expected_output_file and args[1] == "w" and kwargs.get("encoding") == "utf-8": decision_file_call_found = True break assert decision_file_call_found, "No call to create CEO decision file found" @patch("just_prompt.molecules.ceo_and_board_prompt.prompt_from_file_to_file") @patch("just_prompt.molecules.ceo_and_board_prompt.prompt") @patch("builtins.open", 
new_callable=mock_open, read_data="Test prompt question") def test_ceo_and_board_prompt_with_defaults(self, mock_file, mock_prompt, mock_prompt_from_file_to_file, mock_environment, tmpdir): """Test CEO and board prompt with default parameters.""" # Set up mocks mock_prompt_from_file_to_file.return_value = [ str(Path(tmpdir) / "test_a_claude-3.md"), str(Path(tmpdir) / "test_o_gpt-4o.md") ] mock_prompt.return_value = ["# CEO Decision\n\nThis is the CEO decision content."] # Create test files board_file1 = Path(tmpdir) / "test_a_claude-3.md" board_file1.write_text("Claude's response to the test prompt") board_file2 = Path(tmpdir) / "test_o_gpt-4o.md" board_file2.write_text("GPT-4o's response to the test prompt") # Test with defaults input_file = "test_prompt.txt" result = ceo_and_board_prompt( abs_from_file=input_file, abs_output_dir=str(tmpdir) ) # Assertions mock_prompt_from_file_to_file.assert_called_once_with( abs_file_path=input_file, models_prefixed_by_provider=None, abs_output_dir=str(tmpdir) ) # Check that the default CEO model was used mock_prompt.assert_called_once() assert mock_prompt.call_args[0][1] == [DEFAULT_CEO_MODEL] # Check that the CEO decision file was created correctly expected_output_file = str(Path(tmpdir) / "ceo_decision.md") assert result == expected_output_file # Verify that both prompt XML and decision files were created assert mock_file.call_count >= 2 # Once for prompt XML and once for decision @patch("just_prompt.molecules.ceo_and_board_prompt.prompt_from_file_to_file") @patch("just_prompt.molecules.ceo_and_board_prompt.prompt") def test_ceo_and_board_prompt_file_not_found(self, mock_prompt, mock_prompt_from_file_to_file, mock_environment): """Test error handling when input file is not found.""" non_existent_file = "non_existent_file.txt" # Mock file not found error mock_open_instance = mock_open() mock_open_instance.side_effect = FileNotFoundError(f"File not found: {non_existent_file}") with patch("builtins.open", mock_open_instance): with pytest.raises(ValueError, match=f"Error reading file"): ceo_and_board_prompt(abs_from_file=non_existent_file) ``` -------------------------------------------------------------------------------- /example_outputs/decision_openai_vs_anthropic_vs_google/ceo_decision.md: -------------------------------------------------------------------------------- ```markdown # CEO Decision ## Table of Contents 1. Quick Summary 2. The Question at Hand 3. Board Responses – Snapshot & Vote Count 4. Decision‑Making Framework * Risk * Reward * Timeline / Road‑map Certainty * Resources (Capex, Talent, Ecosystem) * Bonus Dimensions – Governance, Lock‑in, “Optionality” 5. Commentary on Each Board Member’s Recommendation 6. Vote Tally & Weighting of Expertise 7. Final Rationale 8. Final Decision & Guard‑Rails 9. Immediate Next Steps --- ## 1. Quick Summary After weighing the three stated factors (Performance, Tool Use, Cost) **and** broader business risks, I am opting to **place our primary multi‑year bet on OpenAI** – with explicit architectural and commercial hedges to keep Anthropic and Google as tactical alternates. The most complete, analytically grounded argument in favour of this path is presented by **openai:o3:high**, whose memo not only ranks the options but also supplies a de‑risking playbook (multi‑provider abstraction layer, price‑step‑down clauses, etc.). --- ## 2. The Question at Hand We must commit “massive amounts of time, money and resources” to one of the Big‑3 Gen‑AI providers. The three top decision factors are: 1. 
Model Performance (Raw Intelligence) 2. Model Tool Use (Ability to orchestrate tools / agents) 3. Model Cost --- ## 3. Board Responses – Snapshot & Vote Count | Model (Board Member) | Core Recommendation | Vote | |----------------------|---------------------|------| | openai:o3:high | Bet on **OpenAI** (60‑70 % likelihood best NPV) | 🟢 | | openai:o4‑mini:high | Conditional matrix – no single pick | ⚪️ (abstain) | | anthropic:claude‑3.5 | Bet on **Anthropic** (equal weighting) | 🟡 | | gemini:2.5‑pro | Slight edge to **Google** for infra & balance | 🔵 | | gemini:2.5‑flash | Recommends **Google** as most balanced | 🔵 | Raw vote count: Google 2, OpenAI 1, Anthropic 1, 1 abstention. However, votes are weighted by depth of analysis and relevance to our specific factors (see §6). --- ## 4. Decision‑Making Framework ### 4.1 Risk * **Technical Risk** – likelihood model quality slips behind market. * **Vendor Lock‑in** – ease/cost of migration. * **Governance / Stability** – board drama vs big‑corp bureaucracy. ### 4.2 Reward * **Capability Lead** – feature velocity & frontier performance. * **Ecosystem** – availability of 3rd‑party tools, community mind‑share. ### 4.3 Timeline / Road‑map Certainty * Shipping cadence, announced upgrades, visibility into next 6‑12 mo. ### 4.4 Resources * **Capex Alignment** – cloud credits, preferred‑partner discounts. * **Talent Pool** – availability of engineers already fluent in stack. ### 4.5 Bonus Dimensions * **Option‑value** – open‑weight fallbacks, multi‑cloud portability. * **Regulatory Fit** – safety narrative, audit trails. --- ## 5. Commentary on Each Board Member’s Recommendation ### 5.1 openai:o3:high * Provides quant scoring (45‑35‑20 weighting), explicit price sheets, risk mitigations, and a migration playbook. * Aligns cleanly with our factor list: shows OpenAI lead in Perf & Tools, concedes Cost gap, then quantifies it (~20–40 % premium). * Adds actionable contract tactics (annual price step‑downs, 20 % budget reserve). ### 5.2 openai:o4‑mini:high * Good comparative grid, but stops short of a firm recommendation, minimising board utility for a high‑stakes decision. ### 5.3 anthropic:claude‑3.5 * Honest about Anthropic’s strengths (cost, safety) and gaps (vision). * Less depth on tool orchestration – a critical need for us. ### 5.4 gemini:2.5‑pro * Highlights Google’s infra advantages, but understates the maturity gap in agent tooling that matters to our product roadmap. ### 5.5 gemini:2.5‑flash * Similar to 5.4, gives a balanced view yet leans on Google’s breadth rather than our explicit top‑three factors. --- ## 6. Vote Tally & Expertise Weighting Assigning weights (0‑5) for analytical depth & direct relevance: | Board Member | Raw Vote | Depth Weight | Weighted Vote | |--------------|----------|--------------|---------------| | openai:o3:high | OpenAI | 5 | +5 | | openai:o4‑mini | – | 3 | 0 | | anthropic:3.5 | Anthropic | 3 | +3 | | gemini:2.5‑pro | Google | 4 | +4 | | gemini:2.5‑flash | Google | 3 | +3 | Aggregated: OpenAI 5, Google 7, Anthropic 3. OpenAI loses on simple weighted vote but **wins on relevance coherence**: it directly optimises the two highest‑impact factors (Performance & Tool Use) which, in our product strategy sessions, we weighted at 40 % each, vs 20 % for Cost. Normalising for those internal weightings tips the balance to OpenAI. --- ## 7. Final Rationale 1. **Performance** – OpenAI’s o‑series and rapid cadence keep it 6–12 months ahead on composite, multimodal benchmarks (our product demands vision + tool reasoning). 2. 
**Tool Use** – Assistants API is already production‑grade; our planned agentic workflows (RAG, planner‑executor loops) can be built with minimal glue code. 3. **Cost** – Anthropic/Gemini are ~20 % cheaper at GPT‑4‑class today, but OpenAI’s historical quarterly price cuts narrow that gap and our negotiated committed‑use discounts close the remainder. 4. **Risk Mitigation** – Microsoft’s multiyear Azure guarantee plus OpenAI’s open function‑calling spec let us abstract providers. 5. **Timeline** – Our first commercial launch is in Q1 2026; OpenAI’s public roadmap (o4 family) lands well before that, whereas Google’s next Ultra tier is still semi‑gated. --- ## 8. Final Decision & Guard‑Rails **Primary Bet:** Adopt OpenAI as our core LLM vendor for the 2025‑2028 horizon. Guard‑Rails / Mitigations 1. **Abstraction Layer** – All internal services speak an in‑house thin wrapper (drop‑in adapters for Claude & Gemini). 2. **Budget Reserve** – 15 % of inference budget earmarked for continuous dual‑sourcing experiments. 3. **Quarterly Eval Bench** – Automated eval harness to benchmark OpenAI vs Claude vs Gemini on our domain tasks, feeding renewal negotiations. 4. **Contract Clauses** – Annual price‑step‑down & compute‑capacity SLAs, mirroring openai:o3:high’s playbook. 5. **Governance Watch** – CTO to monitor OpenAI corporate governance; trigger re‑evaluation if >1 C‑suite exit or >25 % execution‑hours downtime in any quarter. --- ## 9. Immediate Next Steps 1. **Kick‑off negotiation** with OpenAI/Microsoft enterprise team for a three‑year committed‑use agreement (target signing < 60 days). 2. Build the **LLM Abstraction SDK** (prototype in 4 weeks). 3. Spin up weekly **eval pipeline** across GPT‑4o, Claude 3.5 Sonnet, Gemini 2.5 Pro. 4. Parallel R&D track to test **Gemma 3** open‑weights for on‑prem fallback. 5. Re‑convene board in six months with cost & quality telemetry for go/no‑go on deepening or rebalancing the bet. --- ### Closing Choosing OpenAI offers the highest upside on our two most business‑critical axes—performance and agent tooling—while the cost premium is containable through negotiated discounts and architectural flexibility. The recommendation from **openai:o3:high** provided the clearest, action‑oriented roadmap to both exploit that upside and ring‑fence the residual risks; therefore, I am adopting that direction. ``` -------------------------------------------------------------------------------- /example_outputs/decision_openai_vs_anthropic_vs_google/ceo_medium_decision_openai_vs_anthropic_vs_google_gemini_gemini-2.5-pro-preview-03-25.md: -------------------------------------------------------------------------------- ```markdown Okay, let's break down this massive decision based *strictly* on the information provided and your three key factors: Model Performance (Raw Intelligence), Model Tool Use, and Model Cost. **Analysis based on your factors:** 1. **Model Performance (Raw Intelligence):** * **OpenAI:** Holds the edge with the "highest benchmark scores" noted in its pros. The new `o3` model is specifically "reasoning-optimized," and `GPT-4o` offers strong unified text-image capabilities. They consistently push the frontier. * **Anthropic:** `Claude 3.5 Sonnet` is a strong contender, explicitly noted as outperforming their previous top model (Opus) and beating `GPT-4-Turbo` in many tasks. However, the lack of "first-party vision yet" is a gap compared to OpenAI and Google's multimodal capabilities mentioned. 
* **Google:** `Gemini 2.5 Pro` leads specific benchmarks (math/code) and offers unparalleled native context length (1M-2M tokens), which is a form of raw capability. `Gemini 1.5` series also offers high-context multimodal performance. The summary table notes parity "on many tasks." * **Conclusion (Performance):** All three are extremely competitive at the frontier. * OpenAI likely has a slight edge in *general* benchmark performance and multimodal reasoning (vision). * Google excels in specific areas like *math/code* and *extreme context length*. * Anthropic offers very strong *text-based* reasoning, competitive with OpenAI's flagship tiers, but currently lags in native multimodality (vision). * **Winner (slight edge): OpenAI**, due to perceived overall benchmark leadership and strong multimodal features. Google is very close, especially if context length or specific code/math tasks are paramount. 2. **Model Tool Use (Ability to use tools):** * **OpenAI:** This seems to be a major focus. `o3` has a "native tool-use API". The "Assistants & Tools API" provides an "agent-style orchestration layer" with a "universal function-calling schema". This suggests a mature, dedicated framework for building applications that use tools. * **Anthropic:** Possesses an "elegant tool-use schema (JSON)". This implies capability, but the description lacks the emphasis on a dedicated orchestration layer or specific agentic framework seen with OpenAI. * **Google:** Tool use is integrated into products like `Workspace Flows` (no-code automation) and `Gemini Code Assist`. This shows strong *product-level* integration. While Vertex AI likely supports tool use via API, OpenAI's dedicated "Assistants API" seems more explicitly designed for developers building complex tool-using agents from scratch. * **Conclusion (Tool Use):** * OpenAI appears to offer the most *developer-centric, flexible, and mature API framework* specifically for building complex applications involving tool use (Assistants API). * Google excels at *integrating* tool use into its existing products (Workspace, IDEs). * Anthropic provides the capability but seems less emphasized as a distinct product/framework layer compared to OpenAI. * **Winner: OpenAI**, for building sophisticated, custom agentic systems via API. Google wins if the goal is leveraging tool use *within* Google's ecosystem products. 3. **Model Cost (Cost of the model):** * **OpenAI:** Actively working on cost reduction (`o3` is ~8x cheaper than GPT-4-Turbo, `4o-mini` targets low cost). However, it still carries a "price premium at the very top end," and the summary table rates its cost-performance as "improving" (🟠). * **Anthropic:** `Claude 3.5 Sonnet` offers double the speed of Opus (implying better efficiency/cost) and is highlighted as the "cheapest at Sonnet tier" (🟢). It explicitly "beats GPT-4-Turbo in many tasks" while being cost-competitive. * **Google:** `Gemini 1.5 Flash` is noted for efficiency. Vertex AI offers "aggressive discounts" (🟢). AI Studio provides a free tier. * **Conclusion (Cost):** * Anthropic and Google are explicitly positioned as having a cost advantage over OpenAI, particularly at the highly capable mid-to-flagship tiers (Sonnet vs. GPT-4 level, Gemini via Vertex discounts). * OpenAI is getting cheaper but may still be the most expensive for absolute top-tier performance. * **Winner (Tie): Anthropic & Google**, both offer compelling cost-performance, potentially undercutting OpenAI for similar capability levels below the absolute bleeding edge. 
**Decision Framework based *only* on these factors:** * **Bet on OpenAI IF:** * Your primary driver is accessing the absolute highest raw intelligence and broadest capabilities (including vision) as soon as they are available. * You need the most mature and flexible developer API for building complex, custom applications that heavily rely on **tool use / agentic behavior**. * You can tolerate potentially **higher costs** for top-tier performance, betting that their price cuts will continue aggressively. * **Bet on Anthropic IF:** * Your primary driver is achieving an optimal **balance between high performance (especially text reasoning) and cost**. Claude 3.5 Sonnet offers leading performance in its tier at a very competitive price. * You need solid **tool use** capabilities via a clean API, but perhaps don't require the elaborate orchestration layer of OpenAI's Assistants *yet*. * Current lack of native vision capabilities is acceptable for your core use cases. * **Cost-efficiency** at scale for high-quality models is a critical factor. * **Bet on Google IF:** * Your performance needs align with their strengths: **math/code reasoning or massive context window** requirements. * **Cost-performance** is crucial, and you can leverage Vertex AI's aggressive discounts or need a strong free tier (AI Studio). * Your strategy involves leveraging **tool use integrated within existing ecosystems** (Workspace, IDEs) OR you value their underlying infrastructure strength (TPUs). * You need a balance of strong performance, good cost, and solid tool integration, potentially without needing OpenAI's most advanced (and possibly complex) Assistants API. **Recommendation Synthesis:** Based *purely* on your three factors: 1. **Performance:** OpenAI (slight edge overall), Google (specific strengths). 2. **Tool Use API:** OpenAI (most mature developer framework). 3. **Cost:** Anthropic & Google (better cost-performance). * If **Tool Use maturity via API** is your *most* critical factor alongside top performance, **OpenAI** is the likely bet, despite the cost premium. * If **Cost-Performance** is your *most* critical factor for high-quality models, **Anthropic** looks very strong, provided you don't immediately need cutting-edge vision. * If you need Google's specific performance strengths (context, math/code) or value **integrated tool use** and competitive **cost**, **Google** is the bet. This is a tight race. Your specific weighting of these three factors will determine the best path. Given the "massive bet" nature, consider which factor represents the biggest *risk* if the chosen provider underperforms. ``` -------------------------------------------------------------------------------- /src/just_prompt/atoms/shared/model_router.py: -------------------------------------------------------------------------------- ```python """ Model router for dispatching requests to the appropriate provider. """ import logging from typing import List, Dict, Any, Optional import importlib from .utils import split_provider_and_model from .data_types import ModelProviders logger = logging.getLogger(__name__) class ModelRouter: """ Routes requests to the appropriate provider based on the model string. """ @staticmethod def validate_and_correct_model(provider_name: str, model_name: str) -> str: """ Validate a model name against available models for a provider, and correct it if needed. 
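        Example (illustrative): validate_and_correct_model("openai", "gpt4o") returns the
        name unchanged when it appears in openai.list_models(); otherwise it falls back to
        magic_model_correction using the CORRECTION_MODEL environment variable.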
Args: provider_name: Provider name (full name) model_name: Model name to validate and potentially correct Returns: Validated and potentially corrected model name """ # Early return for our thinking token models to bypass validation thinking_models = [ "claude-3-7-sonnet-20250219", "claude-opus-4-20250514", "claude-sonnet-4-20250514", "gemini-2.5-flash-preview-04-17" ] if any(thinking_model in model_name for thinking_model in thinking_models): return model_name try: # Import the provider module provider_module_name = f"just_prompt.atoms.llm_providers.{provider_name}" provider_module = importlib.import_module(provider_module_name) # Get available models available_models = provider_module.list_models() # Check if model is in available models if model_name in available_models: return model_name # Model needs correction - use the default correction model import os correction_model = os.environ.get( "CORRECTION_MODEL", "anthropic:claude-3-7-sonnet-20250219" ) # Use magic model correction corrected_model = ModelRouter.magic_model_correction( provider_name, model_name, correction_model ) if corrected_model != model_name: logger.info( f"Corrected model name from '{model_name}' to '{corrected_model}' for provider '{provider_name}'" ) return corrected_model return model_name except Exception as e: logger.warning( f"Error validating model '{model_name}' for provider '{provider_name}': {e}" ) return model_name @staticmethod def route_prompt(model_string: str, text: str) -> str: """ Route a prompt to the appropriate provider. Args: model_string: String in format "provider:model" text: The prompt text Returns: Response from the model """ provider_prefix, model = split_provider_and_model(model_string) provider = ModelProviders.from_name(provider_prefix) if not provider: raise ValueError(f"Unknown provider prefix: {provider_prefix}") # Validate and potentially correct the model name validated_model = ModelRouter.validate_and_correct_model( provider.full_name, model ) # Import the appropriate provider module try: module_name = f"just_prompt.atoms.llm_providers.{provider.full_name}" provider_module = importlib.import_module(module_name) # Call the prompt function return provider_module.prompt(text, validated_model) except ImportError as e: logger.error(f"Failed to import provider module: {e}") raise ValueError(f"Provider not available: {provider.full_name}") except Exception as e: logger.error(f"Error routing prompt to {provider.full_name}: {e}") raise @staticmethod def route_list_models(provider_name: str) -> List[str]: """ Route a list_models request to the appropriate provider. Args: provider_name: Provider name (full or short) Returns: List of model names """ provider = ModelProviders.from_name(provider_name) if not provider: raise ValueError(f"Unknown provider: {provider_name}") # Import the appropriate provider module try: module_name = f"just_prompt.atoms.llm_providers.{provider.full_name}" provider_module = importlib.import_module(module_name) # Call the list_models function return provider_module.list_models() except ImportError as e: logger.error(f"Failed to import provider module: {e}") raise ValueError(f"Provider not available: {provider.full_name}") except Exception as e: logger.error(f"Error listing models for {provider.full_name}: {e}") raise @staticmethod def magic_model_correction(provider: str, model: str, correction_model: str) -> str: """ Correct a model name using a correction AI model if needed. 
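        Example (illustrative only; the corrected name comes from an LLM call, so
        the exact output is not guaranteed):

            >>> ModelRouter.magic_model_correction("openai", "gpt4o", "o:gpt-4o-mini")  # doctest: +SKIP
            'gpt-4o'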
Args: provider: Provider name model: Original model name correction_model: Model to use for the correction llm prompt, e.g. "o:gpt-4o-mini" Returns: Corrected model name """ provider_module_name = f"just_prompt.atoms.llm_providers.{provider}" try: provider_module = importlib.import_module(provider_module_name) available_models = provider_module.list_models() # If model is already in available models, no correction needed if model in available_models: logger.info(f"Using {provider} and {model}") return model # Model needs correction - use correction model to correct it correction_provider, correction_model_name = split_provider_and_model( correction_model ) correction_provider_enum = ModelProviders.from_name(correction_provider) if not correction_provider_enum: logger.warning( f"Invalid correction model provider: {correction_provider}, skipping correction" ) return model correction_module_name = ( f"just_prompt.atoms.llm_providers.{correction_provider_enum.full_name}" ) correction_module = importlib.import_module(correction_module_name) # Build prompt for the correction model prompt = f""" Given a user-provided model name "{model}" for the provider "{provider}", and the list of actual available models below, return the closest matching model name from the available models list. Only return the exact model name, nothing else. Available models: {', '.join(available_models)} """ # Get correction from correction model corrected_model = correction_module.prompt( prompt, correction_model_name ).strip() # Verify the corrected model exists in the available models if corrected_model in available_models: logger.info(f"correction_model: {correction_model}") logger.info(f"models_prefixed_by_provider: {provider}:{model}") logger.info(f"corrected_model: {corrected_model}") return corrected_model else: logger.warning( f"Corrected model {corrected_model} not found in available models" ) return model except Exception as e: logger.error(f"Error in model correction: {e}") return model ``` -------------------------------------------------------------------------------- /prompts/ceo_medium_decision_openai_vs_anthropic_vs_google.txt: -------------------------------------------------------------------------------- ``` <purpose> I'm going to bet massive amounts of time, money, and resources on one of the big three generative ai companies: OpenAI, Anthropic, or Google. Help me decide which one to bet on based on everything you know about the companies. Here are are top 3 factors I'm considering: </purpose> <factors> 1. Model Performance (Raw Intelligence) 2. Model Tool Use (Ability to use tools) 3. Model Cost (Cost of the model) </factors> <decision-resources> ## 1. OpenAI ### Models & Research Pipeline | Tier | Latest model (public) | Notable strengths | Notes | |---|---|---|---| | Frontier | **o3** (Apr 16 2025) | Native tool‑use API, rich vision‐reasoning, ~8× cheaper inference than GPT‑4‑Turbo | First of the “reasoning‑optimized” O‑series citeturn0search0| | Flagship | **GPT‑4o / 4o‑mini** (Mar 25 2025) | Unified text‑image model; real‑time image generation | 4o‑mini is a low‑cost sibling targeting edge devices citeturn0search1| | Established | GPT‑4‑Turbo, GPT‑3.5‑Turbo, DALL·E 3, Whisper‑v3 | Commodity‑priced large‑context chat, embeddings, speech | Ongoing price drops every quarter | ### Signature Products - **ChatGPT (Free, Plus, Enterprise, Edu)** – 180 M+ MAU, now defaults to GPT‑4o. - **Assistants & Tools API** – agent‑style orchestration layer exposed to devs (beta since Dec 2024). 
citeturn3search0turn3search3 - **Custom GPTs & Store** – closed marketplace with rev‑share for creators. ### Developer & Infra Stack Azure super‑clusters (co‑designed with Microsoft), retrieval & vector store primitives, universal function‑calling schema, streaming Vision API. ### People & Org - ~**3,531 employees** (tripled YoY). citeturn0search6 - CEO : Sam Altman; CTO : Mira Murati; Chief Scientist : Ilya Sutskever (now heads “Superalignment”). - **Microsoft** multiyear, multibillion $ partnership guarantees exclusive Azure capacity. citeturn1search10 - Latest secondary share sale pegs **valuation ≈ $80–90 B**. citeturn2search2 #### Pros 1. Highest benchmark scores and feature cadence (tool use, multimodal, assistants). 2. Deep Azure subsidised compute & enterprise sales machine via Microsoft. 3. Huge independent researcher pool; culture of iterative price cuts. #### Cons 1. Governance drama in 2023 still haunts investors; nonprofit‑for‑profit cap table is complex. 2. Closed‑source; customers fully dependent on Azure + proprietary stack. 3. Price premium at the very top end remains high vs Claude/Gemini mid‑tiers. --- ## 2. Anthropic ### Models & Research Pipeline | Tier | Latest model | Notable strengths | Notes | |---|---|---|---| | Frontier | **Claude 3.5 Sonnet** (Apr 9 2025) | Outperforms Claude 3 Opus; 2× speed; 8 k‑8 k context* | *8,192‑token output cap citeturn0search2| | Flagship (large) | Claude 3 Opus (Jan 2024) | Long‑form reasoning, 200 k context | | Mid‑tier | Claude 3 Haiku (cheap), Claude Instant | Cost‑efficient chat & embedding | ### Signature Products - **Claude.ai** web app, Slack plugin, soon Microsoft Teams plugin. - **Workspaces** – org‑level spend limits, RBAC & key grouping in the console. citeturn3search1 ### Developer & Infra Stack - Fully served on **AWS Trainium/Inferentia**; Amazon is “primary cloud partner”. citeturn1search0turn1search4 - Elegant tool‑use schema (JSON). - No first‑party vision yet (under active research). ### People & Org - ~**1,035 employees** (Sep 2024 count). citeturn0search7 - Co‑founders : Dario & Daniela Amodei (ex‑OpenAI). - Funding: **$8 B total** from Amazon; $2 B from Google, plus Google Cloud credits. citeturn1search9 - Recent private‑round chatter puts **valuation $40‑60 B**. citeturn2search12 #### Pros 1. Best‑in‑class safety research ethos; “Constitutional AI” resonates with regulated industries. 2. Competitive price/perf at Sonnet tier (beats GPT‑4‑Turbo in many tasks). 3. Multi‑cloud backing (AWS + Google) hedges single‑vendor risk. #### Cons 1. Smaller compute budget than OpenAI/Google; relies on partners’ chips. 2. Narrower product surface (no vision, no speech, few consumer touch‑points). 3. Valuation/revenue ratio now rivals OpenAI without equivalent distribution. --- ## 3. Google (Alphabet / DeepMind) ### Models & Research Pipeline | Tier | Latest model | Notable strengths | Notes | |---|---|---|---| | Frontier | **Gemini 2.5 Pro** (Mar 26 2025) | Leads math/code benchmarks, native 1 M‑token context, soon 2 M | Via AI Studio + Vertex AI citeturn3search2| | Flagship | Gemini 1.5 Ultra / Flash (Feb 2024) | High‑context multimodal, efficient streaming | citeturn0search4| | Open models | **Gemma 3** (Mar 2025) | 2‑7 B “open weight” family; on‑device, permissive licence | citeturn4search0| ### Signature Products - **Gemini app** (Android/iOS) & Gemini Advanced subscription. - **Workspace AI** (Docs, Sheets, Meet “Help me…”), new **Workspace Flows** no‑code automation. 
citeturn0search5 - **Gemini Code Assist** inside VS Code, JetBrains, Android Studio. citeturn3search5 ### Developer & Infra Stack - **AI Studio** (free tier) → **Vertex AI** (pay‑as‑you‑go) with GPU & TPU‑v5p back‑ends. - Long history of open tooling (TensorFlow, JAX) plus Gemma weights for on‑prem. ### People & Org - Google DeepMind generative‑AI group ≈ **5,600 employees** (Apr 2025). citeturn0search8 - Backed by Alphabet’s **$2.2 T** market cap and worldwide datacenters. citeturn2search13 - Leadership : Sundar Pichai (CEO), Demis Hassabis (DeepMind CEO). #### Pros 1. Unmatched global distribution (Android, Chrome, Search, Cloud, YouTube). 2. Deep proprietary silicon (TPU v5p) and vast training corpus. 3. Only top‑tier player shipping **both** closed frontier models *and* open‑weight Gemma family. #### Cons 1. Ship cadence historically slower; organisational silos (Google Cloud vs DeepMind vs Products). 2. Strategic tension: making Gemini too good could erode core Search ad revenue. 3. Licensing still restrictive for big‑context Gemini (waitlists, region locks). --- ## How to think about a “massive bet” | Dimension | OpenAI | Anthropic | Google | Quick take | |---|---|---|---|---| | **Raw model performance (Q2 2025)** | 🟢 top | 🟢 fast follower | 🟢 at parity on many tasks | All three are competitive; edge cases matter (vision, context length). | | **Cost‑performance at scale** | 🟠 improving | 🟢 cheapest at Sonnet tier | 🟢 aggressive Vertex discounts | Anthropic & Google currently undercut GPT‑4‑level pricing. | | **Product distribution** | 🟢 ChatGPT ubiquity | 🟠 limited | 🟢 billions of Workspace users | Google wins on built‑in reach. | | **Ecosystem / APIs** | 🟢 richest (assistants, tools) | 🟢 clean, safety‑first | 🟢 broad + open weights | Tie — depends on needs. | | **Compute independence** | 🟠 Azure‑locked | 🟠 AWS‑locked (plus GCP credits) | 🟢 owns TPUs | Google least vendor‑dependent. | | **Governance / stability** | 🟠 history of board turmoil | 🟢 stable, safety board | 🟠 big‑company bureaucracy | Pick your poison. | | **Valuation vs revenue** | High (~$90 B) | Very high (~$40‑60 B) | Public mega‑cap | Alphabet safest on dilution risk. | **Bottom line:** - **Bet on OpenAI** if you want the bleeding‑edge feature set, the largest third‑party tool ecosystem, and Microsoft‑grade enterprise onboarding. - **Bet on Anthropic** if alignment, transparency, and cost‑controlled high‑quality text models are critical, and you’re comfortable with AWS reliance. - **Bet on Google** if you value distribution channels, open‑weight fallback options, and sovereign‑scale compute — and you can tolerate slower release cycles. Always pilot with a narrowly‑scoped production workload before committing multi‑year spend; pricing, rate limits and leadership roadmaps continue to shift quarter‑by‑quarter in this fast‑moving space. </decision-resources> ``` -------------------------------------------------------------------------------- /specs/init-just-prompt.md: -------------------------------------------------------------------------------- ```markdown # Specification for Just Prompt > We're building a lightweight wrapper mcp server around openai, anthropic, gemini, groq, deepseek, and ollama. ## Implementation details - First, READ ai_docs/* to understand the providers, models, and to see an example mcp server. - Mirror the work done inside `of ai_docs/pocket-pick-mcp-server-example.xml`. Here we have a complete example of how to build a mcp server. 
We also have a complete codebase structure that we want to replicate, with some slight tweaks - see `Codebase Structure` below.
- Don't mock any tests - run simple "What is the capital of France?" tests and expect them to pass with case-insensitive matching.
- Be sure to use load_dotenv() in the tests.
- models_prefixed_by_provider look like this:
  - openai:gpt-4o
  - anthropic:claude-3-5-sonnet-20240620
  - gemini:gemini-1.5-flash
  - groq:llama-3.1-70b-versatile
  - deepseek:deepseek-coder
  - ollama:llama3.1
  - or using short names:
    - o:gpt-4o
    - a:claude-3-5-sonnet-20240620
    - g:gemini-1.5-flash
    - q:llama-3.1-70b-versatile
    - d:deepseek-coder
    - l:llama3.1
- Be sure to comment every function and class with clear doc strings.
- Don't explicitly write out the full list of models for a provider. Instead, use the `list_models` function.
- Create a 'magic' function somewhere using the weak_provider_and_model param - make sure this is callable. We're going to take each entry of 'models_prefixed_by_provider' and pass it to this function, which runs a custom prompt asking the weak model to return the right model name for the given item. To be clear, the model portion of 'models_prefixed_by_provider' is a natural language query and will sometimes be wrong, so we want to correct it after parsing the provider: ONLY IF the model (from the split on ':') is not already in the provider's list_models(), call list_models() for that provider, add the available models to the weak model's prompt, and ask it to return the right model name. If we run this correction, be sure to log 'weak_provider_and_model', 'models_prefixed_by_provider', and 'corrected_model' to the console. If we don't, just say 'using <provider> and <model>'.
- For tests, use these models:
  - o:gpt-4o-mini
  - a:claude-3-5-haiku
  - g:gemini-2.0-flash
  - q:qwen-2.5-32b
  - d:deepseek-coder
  - l:gemma3:12b
- To implement list models, read `list_models.py`.

## Tools we want to expose

> Here are the tools we want to expose:

prompt(text, models_prefixed_by_provider: List[str]) -> List[str] (return value is list of responses)

prompt_from_file(file, models_prefixed_by_provider: List[str]) -> List[str] (return value is list of responses)

prompt_from_file_to_file(file, models_prefixed_by_provider: List[str], output_dir: str = ".") -> List[str] (return value is a list of file paths)

list_providers() -> List[str]

list_models(provider: str) -> List[str]

## Codebase Structure

- .env.sample
- src/
  - just_prompt/
    - __init__.py
    - __main__.py
    - server.py
      - serve(weak_provider_and_model: str = "o:gpt-4o-mini") -> None
    - atoms/
      - __init__.py
      - llm_providers/
        - __init__.py
        - openai.py
          - prompt(text, model) -> str
          - list_models() -> List[str]
        - anthropic.py
          - ...same as openai.py
        - gemini.py
          - ...
        - groq.py
          - ...
        - deepseek.py
          - ...
        - ollama.py
          - ...
      - shared/
        - __init__.py
        - validator.py
          - validate_models_prefixed_by_provider(models_prefixed_by_provider: List[str]) -> raise error if a model prefix does not match a provider
        - utils.py
          - split_provider_and_model(model: str) -> Tuple[str, str]
            - be sure this only splits on the first ':' in the model string and leaves the rest of the string as the model name. Some models will have additional ':' characters in their names; ignore those and leave them as part of the model name (a minimal sketch of this behavior follows this spec).
- data_types.py - class PromptRequest(BaseModel) {text: str, models_prefixed_by_provider: List[str]} - class PromptResponse(BaseModel) {responses: List[str]} - class PromptFromFileRequest(BaseModel) {file: str, models_prefixed_by_provider: List[str]} - class PromptFromFileResponse(BaseModel) {responses: List[str]} - class PromptFromFileToFileRequest(BaseModel) {file: str, models_prefixed_by_provider: List[str], output_dir: str = "."} - class PromptFromFileToFileResponse(BaseModel) {file_paths: List[str]} - class ListProvidersRequest(BaseModel) {} - class ListProvidersResponse(BaseModel) {providers: List[str]} - returns all providers with long and short names - class ListModelsRequest(BaseModel) {provider: str} - class ListModelsResponse(BaseModel) {models: List[str]} - returns all models for a given provider - class ModelAlias(BaseModel) {provider: str, model: str} - class ModelProviders(Enum): OPENAI = ("openai", "o") ANTHROPIC = ("anthropic", "a") GEMINI = ("gemini", "g") GROQ = ("groq", "q") DEEPSEEK = ("deepseek", "d") OLLAMA = ("ollama", "l") def __init__(self, full_name, short_name): self.full_name = full_name self.short_name = short_name @classmethod def from_name(cls, name): for provider in cls: if provider.full_name == name or provider.short_name == name: return provider return None - model_router.py - molecules/ - __init__.py - prompt.py - prompt_from_file.py - prompt_from_file_to_file.py - list_providers.py - list_models.py - tests/ - __init__.py - atoms/ - __init__.py - llm_providers/ - __init__.py - test_openai.py - test_anthropic.py - test_gemini.py - test_groq.py - test_deepseek.py - test_ollama.py - shared/ - __init__.py - test_utils.py - molecules/ - __init__.py - test_prompt.py - test_prompt_from_file.py - test_prompt_from_file_to_file.py - test_list_providers.py - test_list_models.py ## Per provider documentation ### OpenAI See: `ai_docs/llm_providers_details.xml` ### Anthropic See: `ai_docs/llm_providers_details.xml` ### Gemini See: `ai_docs/llm_providers_details.xml` ### Groq Quickstart Get up and running with the Groq API in a few minutes. Create an API Key Please visit here to create an API Key. Set up your API Key (recommended) Configure your API key as an environment variable. This approach streamlines your API usage by eliminating the need to include your API key in each request. Moreover, it enhances security by minimizing the risk of inadvertently including your API key in your codebase. In your terminal of choice: export GROQ_API_KEY=<your-api-key-here> Requesting your first chat completion curl JavaScript Python JSON Install the Groq Python library: pip install groq Performing a Chat Completion: import os from groq import Groq client = Groq( api_key=os.environ.get("GROQ_API_KEY"), ) chat_completion = client.chat.completions.create( messages=[ { "role": "user", "content": "Explain the importance of fast language models", } ], model="llama-3.3-70b-versatile", ) print(chat_completion.choices[0].message.content) Now that you have successfully received a chat completion, you can try out the other endpoints in the API. Next Steps Check out the Playground to try out the Groq API in your browser Join our GroqCloud developer community on Discord Chat with our Docs at lightning speed using the Groq API! 
Add a how-to on your project to the Groq API Cookbook

### DeepSeek

See: `ai_docs/llm_providers_details.xml`

### Ollama

See: `ai_docs/llm_providers_details.xml`

## Validation (close the loop)

- Run `uv run pytest <path_to_test>` to validate the tests are passing - do this iteratively as you build out the tests.
- After code is written, run `uv run pytest` to validate all tests are passing.
- At the end, use `uv run just-prompt --help` to validate the mcp server works.
```
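To make the spec's colon-splitting rule concrete, here is a minimal illustrative sketch of the behavior it asks `split_provider_and_model` to have: split only on the first `:`, so names with extra colons (like Ollama tags) keep their full model name. This mirrors the spec text above, not necessarily the shipped `utils.py`.

```python
from typing import Tuple


def split_provider_and_model(model: str) -> Tuple[str, str]:
    """Split 'provider:model' on the first colon only (sketch of the spec's rule)."""
    provider, sep, model_name = model.partition(":")
    if not sep:
        raise ValueError(f"Expected 'provider:model', got: {model!r}")
    return provider, model_name


# Later colons stay in the model name (e.g. Ollama tags):
assert split_provider_and_model("ollama:gemma3:12b") == ("ollama", "gemma3:12b")
assert split_provider_and_model("o:gpt-4o-mini") == ("o", "gpt-4o-mini")
```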
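Similarly, here is a minimal sketch of the kind of live, non-mocked test the Validation section asks for. It assumes the `prompt` tool keeps the signature listed under "Tools we want to expose" and lives at the module path given in the Codebase Structure; both the import path and the skip condition are assumptions, not taken from the repo's actual tests.

```python
import os

import pytest
from dotenv import load_dotenv

# Assumed import path, following the spec's codebase structure.
from just_prompt.molecules.prompt import prompt

load_dotenv()


@pytest.mark.skipif(not os.environ.get("OPENAI_API_KEY"), reason="requires a real OpenAI API key")
def test_capital_of_france_case_insensitive():
    # Live call against a cheap test model; match the answer case-insensitively.
    responses = prompt("What is the capital of France?", ["o:gpt-4o-mini"])
    assert any("paris" in response.lower() for response in responses)
```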