# Directory Structure
```
├── .env.example
├── .gitignore
├── Dockerfile
├── examples
│   ├── flying_pig_scifi_city.png
│   └── pig_cute_baby_whale.png
├── LICENSE
├── pyproject.toml
├── README.md
├── smithery.yaml
├── src
│   └── mcp_server_gemini_image_generator
│       ├── __init__.py
│       ├── prompts.py
│       ├── server.py
│       └── utils.py
└── uv.lock
```
# Files
--------------------------------------------------------------------------------
/.env.example:
--------------------------------------------------------------------------------
```
GEMINI_API_KEY=""
OUTPUT_IMAGE_PATH=""
```
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# VS Code
.vscode/
.idea/
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
<a href="https://glama.ai/mcp/servers/@qhdrl12/mcp-server-gemini-image-generator">
  <img width="380" height="200" src="https://glama.ai/mcp/servers/@qhdrl12/mcp-server-gemini-image-generator/badge" alt="Gemini Image Generator Server MCP server" />
</a>
# Gemini Image Generator MCP Server
Generate high-quality images from text prompts using Google's Gemini model through the Model Context Protocol (MCP).
## Overview
This MCP server allows any AI assistant to generate images using Google's Gemini AI model. The server handles prompt engineering, text-to-image conversion, filename generation, and local image storage, making it easy to create and manage AI-generated images through any MCP client.
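The flow described above can be sketched roughly as follows. This is only an illustration: `fake_translate`, `fake_generate`, `make_filename`, and `run_pipeline` are hypothetical stand-ins, and the Gemini calls are replaced with stubs so the example runs offline.

```python
import asyncio
import os
import tempfile


async def fake_translate(prompt: str) -> str:
    """Hypothetical stub: the real server asks Gemini to translate the prompt."""
    return prompt.strip()


async def fake_generate(detailed_prompt: str) -> bytes:
    """Hypothetical stub: the real server receives image bytes from Gemini."""
    return b"not-a-real-image:" + detailed_prompt.encode("utf-8")


def make_filename(prompt: str, max_words: int = 5) -> str:
    """Build a short underscore-separated name from the prompt (illustrative only)."""
    words = "".join(c if c.isalnum() else " " for c in prompt).split()
    return "_".join(words[:max_words]).lower() or "image"


async def run_pipeline(prompt: str, output_dir: str) -> tuple[bytes, str]:
    """Translate, generate, name, and store; returns (image bytes, saved path)."""
    english = await fake_translate(prompt)
    data = await fake_generate(f"Query: {english}")
    path = os.path.join(output_dir, make_filename(prompt) + ".png")
    with open(path, "wb") as fh:
        fh.write(data)
    return data, path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image_bytes, image_path = asyncio.run(run_pipeline("A sunset over mountains", tmp))
        print(len(image_bytes), os.path.basename(image_path))
```

The tuple returned by `run_pipeline` mirrors the (image data, file path) pair that the tools below return.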
## Features
- Text-to-image generation using Gemini 2.0 Flash
- Image-to-image transformation based on text prompts
- Support for both file-based and base64-encoded images
- Automatic intelligent filename generation based on prompts
- Automatic translation of non-English prompts
- Local image storage with configurable output path
- Strict text exclusion from generated images
- High-resolution image output
- Direct access to both image data and file path
## Available MCP Tools
The server provides the following MCP tools for AI assistants:
### 1. `generate_image_from_text`
Creates a new image from a text prompt description.
```
generate_image_from_text(prompt: str) -> Tuple[bytes, str]
```
**Parameters:**
- `prompt`: Text description of the image you want to generate
**Returns:**
- A tuple containing:
  - Raw image data (bytes)
  - Path to the saved image file (str)
This dual return format allows AI assistants to either work with the image data directly or reference the saved file path.
**Examples:**
- "Generate an image of a sunset over mountains"
- "Create a photorealistic flying pig in a sci-fi city"
#### Example Output
This image was generated using the prompt:
```
"Hi, can you create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?"
```

![Flying pig over a sci-fi city](examples/flying_pig_scifi_city.png)
*A 3D rendered pig with wings and a top hat flying over a futuristic sci-fi city filled with greenery*
### Known Issues
When using this MCP server with Claude Desktop Host:
1. **Performance Issues**: Using `transform_image_from_encoded` may take significantly longer to process compared to other methods. This is due to the overhead of transferring large base64-encoded image data through the MCP protocol.
2. **Path Resolution Problems**: There may be issues with correctly resolving image paths when using Claude Desktop Host. The host application might not properly interpret the returned file paths, making it difficult to access the generated images.
For the best experience, consider using alternative MCP clients or the `transform_image_from_file` method when possible. 
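To get a feel for the overhead mentioned above: base64 encodes every 3 bytes as 4 ASCII characters, so the payload grows by about a third before any protocol framing is added. A small sketch (the `base64_overhead` helper is hypothetical and not part of this server):

```python
import base64


def base64_overhead(raw: bytes) -> tuple[int, int]:
    """Return (raw size, base64-encoded size) in bytes for the given payload."""
    encoded = base64.b64encode(raw)
    return len(raw), len(encoded)


if __name__ == "__main__":
    # Roughly 3 MB of zero bytes stands in for an image file.
    raw_size, encoded_size = base64_overhead(bytes(3_000_000))
    print(raw_size, encoded_size)  # 3000000 4000000
```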
### 2. `transform_image_from_encoded`
Transforms an existing image based on a text prompt using base64-encoded image data.
```
transform_image_from_encoded(encoded_image: str, prompt: str) -> Tuple[bytes, str]
```
**Parameters:**
- `encoded_image`: Base64 encoded image data with format header (must be in format: "data:image/[format];base64,[data]")
- `prompt`: Text description of how you want to transform the image
**Returns:**
- A tuple containing:
  - Raw transformed image data (bytes)
  - Path to the saved transformed image file (str)
**Examples:**
- "Add snow to this landscape"
- "Change the background to a beach"
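The `data:image/[format];base64,[data]` string expected by `encoded_image` can be built with the standard library. The sketch below is an assumption about how a caller might prepare it; `to_data_url` and `from_data_url` are hypothetical helpers, not functions exported by this server.

```python
import base64


def to_data_url(image_bytes: bytes, image_format: str = "png") -> str:
    """Wrap raw image bytes in a data URL with the required header."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/{image_format};base64,{payload}"


def from_data_url(data_url: str) -> tuple[str, bytes]:
    """Split a data URL back into (MIME type, raw bytes)."""
    if not data_url.startswith("data:image/") or ";base64," not in data_url:
        raise ValueError("Expected data:image/[format];base64,[data]")
    header, payload = data_url.split(";base64,", 1)
    return header.removeprefix("data:"), base64.b64decode(payload)


if __name__ == "__main__":
    url = to_data_url(b"\x89PNG\r\n\x1a\n", "png")
    print(url[:22])            # data:image/png;base64,
    print(from_data_url(url))  # ('image/png', b'\x89PNG\r\n\x1a\n')
```

In practice the bytes would come from reading an image file in binary mode rather than from a literal.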
### 3. `transform_image_from_file`
Transforms an existing image file based on a text prompt.
```
transform_image_from_file(image_file_path: str, prompt: str) -> Tuple[bytes, str]
```
**Parameters:**
- `image_file_path`: Path to the image file to be transformed
- `prompt`: Text description of how you want to transform the image
**Returns:**
- A tuple containing:
  - Raw transformed image data (bytes)
  - Path to the saved transformed image file (str)
**Examples:**
- "Add a llama next to the person in this image"
- "Make this daytime scene look like night time"
#### Example Transformation
Using the flying pig image created above, we applied a transformation with the following prompt:
```
"Add a cute baby whale flying alongside the pig"
```
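**Before:**
![Flying pig over a sci-fi city](examples/flying_pig_scifi_city.png)
**After:**
![Flying pig with a baby whale](examples/pig_cute_baby_whale.png)
*The original flying pig image with a cute baby whale added flying alongside it*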
## Setup
### Prerequisites
- Python 3.11+
- Google AI API key (Gemini)
- MCP host application (Claude Desktop App, Cursor, or other MCP-compatible clients)
### Getting a Gemini API Key
1. Visit [Google AI Studio API Keys page](https://aistudio.google.com/apikey)
2. Sign in with your Google account
3. Click "Create API Key"
4. Copy your new API key for use in the configuration
5. Note: The API key includes a quota of free usage per month. You can check your usage in Google AI Studio.
### Installation
### Installing via Smithery
To install Gemini Image Generator MCP for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@qhdrl12/mcp-server-gemini-image-gen):
```bash
npx -y @smithery/cli install @qhdrl12/mcp-server-gemini-image-gen --client claude
```
### Manual Installation
1. Clone the repository:
```bash
git clone https://github.com/your-username/mcp-server-gemini-image-generator.git
cd mcp-server-gemini-image-generator
```
2. Create a virtual environment and install dependencies:
```bash
# Using uv (recommended)
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e .
# Or using regular venv
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .
```
3. Set up environment variables (choose one method):
**Method A: Using .env file (optional)**
```bash
# Create .env file in the project root
cat > .env << 'EOF'
GEMINI_API_KEY=your-gemini-api-key-here
OUTPUT_IMAGE_PATH=/path/to/save/images
EOF
```
**Method B: Set directly in Claude Desktop config (recommended)**
- Set environment variables directly in the `claude_desktop_config.json` (shown in configuration section below)
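For Method A, note that the source files listed in this repository read both variables from the process environment (`os.environ` / `os.getenv`), so the values in `.env` need to be exported before the server starts. One possible way to do that is sketched below with the standard library only; the `load_env_file` helper is hypothetical and handles just simple `KEY=value` lines.

```python
import os
import tempfile


def load_env_file(path: str, override: bool = False) -> dict[str, str]:
    """Parse simple KEY=value lines and copy them into os.environ."""
    loaded: dict[str, str] = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, value = line.split("=", 1)
            key, value = key.strip(), value.strip().strip("\"'")
            # Existing environment values win unless override is requested.
            if override or key not in os.environ:
                os.environ[key] = value
            loaded[key] = value
    return loaded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_path = os.path.join(tmp, ".env")
        with open(env_path, "w", encoding="utf-8") as fh:
            fh.write('GEMINI_API_KEY="demo-key"\nOUTPUT_IMAGE_PATH=/tmp/images\n')
        print(load_env_file(env_path, override=True))
```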
### Configure Claude Desktop
Add the following to your `claude_desktop_config.json`:
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
    "mcpServers": {
        "gemini-image-generator": {
            "command": "uv",
            "args": [
                "--directory",
                "/absolute/path/to/mcp-server-gemini-image-generator",
                "run",
                "mcp-server-gemini-image-generator"
            ],
            "env": {
                "GEMINI_API_KEY": "your-actual-gemini-api-key-here",
                "OUTPUT_IMAGE_PATH": "/absolute/path/to/your/images/directory"
            }
        }
    }
}
```
**Important Configuration Notes:**
1. **Replace paths with your actual paths:**
   - Change `/absolute/path/to/mcp-server-gemini-image-generator` to the actual location where you cloned this repository
   - Change `/absolute/path/to/your/images/directory` to where you want generated images to be saved
2. **Environment Variables:**
   - Replace `your-actual-gemini-api-key-here` with your real Gemini API key from Google AI Studio
   - Use absolute paths for `OUTPUT_IMAGE_PATH` to ensure images are saved correctly
3. **Example with real paths:**
```json
{
    "mcpServers": {
        "gemini-image-generator": {
            "command": "uv",
            "args": [
                "--directory",
                "/Users/username/Projects/mcp-server-gemini-image-generator",
                "run",
                "mcp-server-gemini-image-generator"
            ],
            "env": {
                "GEMINI_API_KEY": "GEMINI_API_KEY",
                "OUTPUT_IMAGE_PATH": "OUTPUT_IMAGE_PATH"
            }
        }
    }
}
```
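Before relying on the configuration, it can help to sanity-check the entry. The following sketch assumes the config layout shown above; `check_server_entry` is a hypothetical helper, and its checks (non-empty API key, absolute output path) simply mirror the configuration notes rather than anything Claude Desktop itself enforces.

```python
import json
import os


def check_server_entry(config_text: str, name: str = "gemini-image-generator") -> list[str]:
    """Return a list of human-readable problems found in the config entry."""
    problems: list[str] = []
    entry = json.loads(config_text).get("mcpServers", {}).get(name)
    if entry is None:
        return [f"no mcpServers entry named {name!r}"]
    env = entry.get("env", {})
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is missing or empty")
    output_path = env.get("OUTPUT_IMAGE_PATH", "")
    if not os.path.isabs(output_path):
        problems.append("OUTPUT_IMAGE_PATH should be an absolute path")
    return problems


if __name__ == "__main__":
    sample = json.dumps({
        "mcpServers": {
            "gemini-image-generator": {
                "command": "uv",
                "args": ["--directory", "/abs/path", "run", "mcp-server-gemini-image-generator"],
                "env": {"GEMINI_API_KEY": "key", "OUTPUT_IMAGE_PATH": "images"},
            }
        }
    })
    print(check_server_entry(sample))  # ['OUTPUT_IMAGE_PATH should be an absolute path']
```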
## Usage
Once installed and configured, you can ask Claude to generate or transform images using prompts like:
### Generating New Images
- "Generate an image of a sunset over mountains"
- "Create an illustration of a futuristic cityscape"
- "Make a picture of a cat wearing sunglasses"
### Transforming Existing Images
- "Transform this image by adding snow to the scene"
- "Edit this photo to make it look like it was taken at night"
- "Add a dragon flying in the background of this picture"
The generated/transformed images will be saved to your configured output path and displayed in Claude. With the updated return types, AI assistants can also work directly with the image data without needing to access the saved files.
## Testing
You can test the application by running the FastMCP development server:
```
fastmcp dev server.py
```
This command starts a local development server and makes the MCP Inspector available at http://localhost:5173/. 
The MCP Inspector provides a convenient web interface where you can directly test the image generation tool without needing to use Claude or another MCP client. 
You can enter text prompts, execute the tool, and see the results immediately, which is helpful for development and debugging.
## License
MIT License
```
--------------------------------------------------------------------------------
/src/mcp_server_gemini_image_generator/__init__.py:
--------------------------------------------------------------------------------
```python
from . import server
def main() -> None:
    """Start the MCP server."""
    server.main()
__all__ = [
    "main",
    "server",
]
```
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
```toml
[project]
name = "mcp-server-gemini-image-generator"
version = "0.1.0"
description = ""
readme = "README.md"
requires-python = ">=3.11"
authors = [{name = "bongki Lee",email = "[email protected]"}]
keywords = ["http", "mcp", "llm", "automation"]
license = { text = "MIT" }
dependencies = [
    "fastmcp>=0.4.1",
    "google>=3.0.0",
    "google-genai>=1.7.0",
    "pillow>=11.1.0",
]
[project.scripts]
mcp-server-gemini-image-generator = "mcp_server_gemini_image_generator:main"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.uv]
dev-dependencies = ["pyright>=1.1.389", "ruff>=0.7.3"]
```
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
```dockerfile
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim
ENV UV_COMPILE_BYTECODE=1
ENV UV_LINK_MODE=copy
ENV UV_CACHE_DIR=/opt/uv-cache/
RUN apt-get update && apt-get install -y --no-install-recommends git
WORKDIR /app
# Mount the cache at the same path that UV_CACHE_DIR points to
RUN --mount=type=cache,target=/opt/uv-cache/ \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --frozen --no-install-project --no-dev --no-editable
ADD . /app
RUN --mount=type=cache,target=/opt/uv-cache/ \
    uv sync --frozen --no-dev --no-editable
# Create image output directory
ARG OUTPUT_IMAGE_PATH=/images
RUN mkdir -p ${OUTPUT_IMAGE_PATH}
ENV OUTPUT_IMAGE_PATH=${OUTPUT_IMAGE_PATH}
# Add virtual environment to PATH
ENV PATH="/app/.venv/bin:$PATH"
# Set the entrypoint to the MCP server command
CMD ["mcp-server-gemini-image-generator"]
```
--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------
```yaml
# Smithery configuration file: https://smithery.ai/docs/build/project-config
startCommand:
  type: stdio
  configSchema:
    # JSON Schema defining the configuration options for the MCP.
    type: object
    required:
      - geminiApiKey
    properties:
      geminiApiKey:
        type: string
        description: Google Gemini API key
      outputImagePath:
        type: string
        default: /images
        description: Path to save generated images inside container
  commandFunction:
    # A JS function that produces the CLI command based on the given config to start the MCP on stdio.
    |-
    (config) => ({
      command: 'mcp-server-gemini-image-generator',
      args: [],
      env: {
        GEMINI_API_KEY: config.geminiApiKey,
        OUTPUT_IMAGE_PATH: config.outputImagePath
      }
    })
  exampleConfig:
    geminiApiKey: YOUR_GEMINI_API_KEY
    outputImagePath: /images
```
--------------------------------------------------------------------------------
/src/mcp_server_gemini_image_generator/utils.py:
--------------------------------------------------------------------------------
```python
import base64
import io
import logging
import os
import PIL.Image
logger = logging.getLogger(__name__)
OUTPUT_IMAGE_PATH = os.getenv("OUTPUT_IMAGE_PATH") or os.path.expanduser("~/gen_image")
# Create the output directory if it does not exist yet
os.makedirs(OUTPUT_IMAGE_PATH, exist_ok=True)
def validate_base64_image(base64_string: str) -> bool:
    """Validate if a string is a valid base64-encoded image.
    Args:
        base64_string: The base64 string to validate
    Returns:
        True if valid, False otherwise
    """
    try:
        # Try to decode base64
        image_data = base64.b64decode(base64_string)
        # Try to open as image
        with PIL.Image.open(io.BytesIO(image_data)) as img:
            logger.debug(
                f"Validated base64 image, format: {img.format}, size: {img.size}"
            )
            return True
    except Exception as e:
        logger.warning(f"Invalid base64 image: {str(e)}")
        return False
    
async def save_image(image_data: bytes, filename: str) -> str:
    """Save image data to disk with a descriptive filename.
    
    Args:
        image_data: Raw image data
        filename: Base string to use for generating filename
        
    Returns:
        Path to the saved image file
    """
    try:
        # Open image from bytes
        image = PIL.Image.open(io.BytesIO(image_data))
        
        # Save the image
        image_path = os.path.join(OUTPUT_IMAGE_PATH, f"{filename}.png")
        image.save(image_path)
        logger.info(f"Image saved to {image_path}")
        
        # Display the image
        image.show()
        
        return image_path
    except Exception as e:
        logger.error(f"Error saving image: {str(e)}")
        raise
```
--------------------------------------------------------------------------------
/src/mcp_server_gemini_image_generator/prompts.py:
--------------------------------------------------------------------------------
```python
###############################
# Image Transformation Prompt #
###############################
def get_image_transformation_prompt(prompt: str) -> str:
    """Create a detailed prompt for image transformation.
    
    Args:
        prompt: text prompt
        
    Returns:
        A comprehensive prompt for Gemini image transformation
    """
    return f"""You are an expert image editing AI. Please edit the provided image according to these instructions:
EDIT REQUEST: {prompt}
IMPORTANT REQUIREMENTS:
1. Make substantial and noticeable changes as requested
2. Maintain high image quality and coherence 
3. Ensure the edited elements blend naturally with the rest of the image
4. Do not add any text to the image
5. Focus on the specific edits requested while preserving other elements
The changes should be clear and obvious in the result."""
###########################
# Image Generation Prompt #
###########################
def get_image_generation_prompt(prompt: str) -> str:
    """Create a detailed prompt for image generation.
    
    Args:
        prompt: text prompt
        
    Returns:
        A comprehensive prompt for Gemini image generation
    """
    return f"""You are an expert image generation AI assistant specialized in creating visuals based on user requests. Your primary goal is to generate the most appropriate image without asking clarifying questions, even when faced with abstract or ambiguous prompts.
## CRITICAL REQUIREMENT: NO TEXT IN IMAGES
**ABSOLUTE PROHIBITION ON TEXT INCLUSION**
- Under NO CIRCUMSTANCES render ANY text from user queries in the generated images
- This is your HIGHEST PRIORITY requirement that OVERRIDES all other considerations
- Text from prompts must NEVER appear in any form, even stylized, obscured, or partial
- This includes words, phrases, sentences, or characters from the user's input
- If the user requests text in the image, interpret this as a request for the visual concept only
- The image should be 100% text-free regardless of what the prompt contains
## Core Principles
1. **Prioritize Image Generation Over Clarification**
   - When given vague requests, DO NOT ask follow-up questions
   - Instead, infer the most likely intent and generate accordingly
   - Use your knowledge to fill in missing details with the most probable elements
2. **Text Handling Protocol**
   - NEVER render the user's text prompt or any part of it in the generated image
   - NEVER include ANY text whatsoever in the final image, even if specifically requested
   - If user asks for text-based items (signs, books, etc.), show only the visual item without readable text
   - For concepts typically associated with text (like "newspaper" or "letter"), create visual representations without any legible writing
3. **Interpretation Guidelines**
   - Analyze context clues in the user's prompt
   - Consider cultural, seasonal, and trending references
   - When faced with ambiguity, choose the most mainstream or popular interpretation
   - For abstract concepts, visualize them in the most universally recognizable way
4. **Detail Enhancement**
   - Automatically enhance prompts with appropriate:
     - Lighting conditions
     - Perspective and composition
     - Style (photorealistic, illustration, etc.) based on context
     - Color palettes that best convey the intended mood
     - Environmental details that complement the subject
5. **Technical Excellence**
   - Maintain high image quality
   - Ensure proper composition and visual hierarchy
   - Balance simplicity with necessary detail
   - Maintain appropriate contrast and color harmony
6. **Handling Special Cases**
   - For creative requests: Lean toward artistic, visually striking interpretations
   - For informational requests: Prioritize clarity and accuracy
   - For emotional content: Focus on conveying the appropriate mood and tone
   - For locations: Include recognizable landmarks or characteristics
## Implementation Protocol
1. Parse user request
2. **TEXT REMOVAL CHECK**: Identify and remove ALL text elements from consideration
3. Identify core subjects and actions
4. Determine most likely interpretation if ambiguous
5. Enhance with appropriate details, style, and composition
6. **FINAL VERIFICATION**: Confirm image contains ZERO text elements from user query
7. Generate image immediately without asking for clarification
8. Present the completed image to the user
## Safety Measure
Before finalizing ANY image:
- Double-check that NO text from the user query appears in the image
- If ANY text is detected, regenerate the image without the text
- This verification is MANDATORY for every image generation
Remember: Your success is measured by your ability to produce satisfying images without requiring additional input from users AND without including ANY text from queries in the images. Be decisive and confident in your interpretations while maintaining absolute adherence to the no-text requirement.
Query: {prompt}
"""
####################
# Translate Prompt #
####################
def get_translate_prompt(prompt: str) -> str:
    """Translate the prompt into English if it's not already in English.
    
    Args:
        prompt: text prompt
        
    Returns:
        A comprehensive prompt for Gemini translation
    """
    return f"""Translate the following prompt into English if it's not already in English. Your task is ONLY to translate accurately while preserving:
1. EXACT original intent and meaning
2. All specific details and nuances
3. Style and tone of the original prompt
4. Technical terms and concepts
DO NOT:
- Add new details or creative elements not in the original
- Remove any details from the original
- Change the style or complexity level
- Reinterpret or assume what the user "really meant"
If the text is already in English, return it exactly as provided with no changes.
Original prompt: {prompt}
Return only the translated English prompt, nothing else."""
```
--------------------------------------------------------------------------------
/src/mcp_server_gemini_image_generator/server.py:
--------------------------------------------------------------------------------
```python
import base64
import os
import logging
import sys
import uuid
from io import BytesIO
from typing import Optional, Any, Union, List, Tuple
import PIL.Image
from google import genai
from google.genai import types
from mcp.server.fastmcp import FastMCP
from .prompts import get_image_generation_prompt, get_image_transformation_prompt, get_translate_prompt
from .utils import save_image
# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    stream=sys.stderr
)
logger = logging.getLogger(__name__)
# Initialize MCP server
mcp = FastMCP("mcp-server-gemini-image-generator")
# ==================== Gemini API Interaction ====================
async def call_gemini(
    contents: List[Any], 
    model: str = "gemini-2.0-flash-preview-image-generation", 
    config: Optional[types.GenerateContentConfig] = None, 
    text_only: bool = False
) -> Union[str, bytes]:
    """Call Gemini API with flexible configuration for different use cases.
    
    Args:
        contents: The content to send to Gemini, as a list containing text and/or images
        model: The Gemini model to use
        config: Optional configuration for the Gemini API call
        text_only: If True, extract and return only text from the response
        
    Returns:
        If text_only is True: str - The text response from Gemini
        Otherwise: bytes - The binary image data from Gemini
        
    Raises:
        Exception: If there's an error calling the Gemini API
    """
    try:
        # Initialize Gemini client
        api_key = os.environ.get("GEMINI_API_KEY")
        if not api_key:
            raise ValueError("GEMINI_API_KEY environment variable not set")
            
        client = genai.Client(api_key=api_key)
        
        # Generate content using Gemini
        response = client.models.generate_content(
            model=model,
            contents=contents,
            config=config
        )
        
        logger.info(f"Response received from Gemini API using model {model}")
        
        # For text-only calls, extract just the text
        if text_only:
            return response.candidates[0].content.parts[0].text.strip()
        
        # Return the image data
        for part in response.candidates[0].content.parts:
            if part.inline_data is not None:
                return part.inline_data.data
            
        raise ValueError("No image data found in Gemini response")
    except Exception as e:
        logger.error(f"Error calling Gemini API: {str(e)}")
        raise
# ==================== Text Utility Functions ====================
async def convert_prompt_to_filename(prompt: str) -> str:
    """Convert a text prompt into a suitable filename for the generated image using Gemini AI.
    
    Args:
        prompt: The text prompt used to generate the image
        
    Returns:
        A concise, descriptive filename generated based on the prompt
    """
    try:
        # Create a prompt for Gemini to generate a filename
        filename_prompt = f"""
        Based on this image description: "{prompt}"
        
        Generate a short, descriptive file name suitable for the requested image.
        The filename should:
        - Be concise (maximum 5 words)
        - Use underscores between words
        - Not include any file extension
        - Only return the filename, nothing else
        """
        
        # Call Gemini and get the filename
        generated_filename = await call_gemini(filename_prompt, text_only=True)
        logger.info(f"Generated filename: {generated_filename}")
        
        # Return the filename only, without path or extension
        return generated_filename
    
    except Exception as e:
        logger.error(f"Error generating filename with Gemini: {str(e)}")
        # Fallback to a simple filename if Gemini fails
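        # Keep only filename-safe characters from the start of the prompt
        safe_text = "".join(c if c.isalnum() else "_" for c in prompt[:12]).strip("_")
        return f"image_{safe_text}_{str(uuid.uuid4())[:8]}"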
async def translate_prompt(text: str) -> str:
    """Translate and optimize the user's prompt to English for better image generation results.
    
    Args:
        text: The original prompt in any language
        
    Returns:
        English translation of the prompt with preserved intent
    """
    try:
        # Create a prompt for translation with strict intent preservation
        prompt = get_translate_prompt(text)
        # Call Gemini and get the translated prompt
        translated_prompt = await call_gemini(prompt, text_only=True)
        logger.info(f"Original prompt: {text}")
        logger.info(f"Translated prompt: {translated_prompt}")
        
        return translated_prompt
    
    except Exception as e:
        logger.error(f"Error translating prompt: {str(e)}")
        # Return original text if translation fails
        return text
# ==================== Image Processing Functions ====================
async def process_image_with_gemini(
    contents: List[Any], 
    prompt: str, 
    model: str = "gemini-2.0-flash-preview-image-generation"
) -> Tuple[bytes, str]:
    """Process an image request with Gemini and save the result.
    
    Args:
        contents: List containing the prompt and optionally an image
        prompt: Original prompt for filename generation
        model: Gemini model to use
        
    Returns:
        Tuple of the raw image bytes and the path to the saved image file
    """
    # Call Gemini Vision API
    gemini_response = await call_gemini(
        contents,
        model=model,
        config=types.GenerateContentConfig(
            response_modalities=['Text', 'Image']
        )
    )
    
    # Generate a filename for the image
    filename = await convert_prompt_to_filename(prompt)
    
    # Save the image and return the path
    saved_image_path = await save_image(gemini_response, filename)
    return gemini_response, saved_image_path
async def process_image_transform(
    source_image: PIL.Image.Image, 
    optimized_edit_prompt: str, 
    original_edit_prompt: str
) -> Tuple[bytes, str]:
    """Process image transformation with Gemini.
    
    Args:
        source_image: PIL Image object to transform
        optimized_edit_prompt: Optimized text prompt for transformation
        original_edit_prompt: Original user prompt for naming
        
    Returns:
        Tuple of the raw transformed image bytes and the path to the saved image file
    """
    # Create prompt for image transformation
    edit_instructions = get_image_transformation_prompt(optimized_edit_prompt)
    
    # Process with Gemini and return the result
    return await process_image_with_gemini(
        [edit_instructions, source_image],
        original_edit_prompt
    )
async def load_image_from_base64(encoded_image: str) -> Tuple[PIL.Image.Image, str]:
    """Load an image from a base64-encoded string.
    
    Args:
        encoded_image: Base64 encoded image data with header
        
    Returns:
        Tuple containing the PIL Image object and the image format
    """
    if not encoded_image.startswith('data:image/'):
        raise ValueError("Invalid image format. Expected data:image/[format];base64,[data]")
    
    try:
        # Extract the base64 data from the data URL
        image_format, image_data = encoded_image.split(';base64,')
        image_format = image_format.replace('data:', '')  # Get the MIME type e.g., "image/png"
        image_bytes = base64.b64decode(image_data)
        source_image = PIL.Image.open(BytesIO(image_bytes))
        logger.info(f"Successfully loaded image with format: {image_format}")
        return source_image, image_format
    except base64.binascii.Error as e:
        # binascii.Error subclasses ValueError, so it must be caught first
        logger.error(f"Error: Invalid base64 encoding: {str(e)}")
        raise ValueError("Invalid base64 encoding. Please provide a valid base64 encoded image.")
    except ValueError as e:
        logger.error(f"Error: Invalid image data format: {str(e)}")
        raise ValueError("Invalid image data format. Image must be in format 'data:image/[format];base64,[data]'")
    except PIL.UnidentifiedImageError:
        logger.error("Error: Could not identify image format")
        raise ValueError("Could not identify image format. Supported formats include PNG, JPEG, GIF, WebP.")
    except Exception as e:
        logger.error(f"Error: Could not load image: {str(e)}")
        raise
# ==================== MCP Tools ====================
@mcp.tool()
async def generate_image_from_text(prompt: str) -> Tuple[bytes, str]:
    """Generate an image based on the given text prompt using Google's Gemini model.
    Args:
        prompt: User's text prompt describing the desired image to generate
        
    Returns:
        Tuple of the raw image bytes and the path to the generated image file.
        On failure, an error message string is returned instead.
    """
    try:
        # Translate the prompt to English
        translated_prompt = await translate_prompt(prompt)
        
        # Create detailed generation prompt
        contents = get_image_generation_prompt(translated_prompt)
        
        # Process with Gemini and return the result
        return await process_image_with_gemini([contents], prompt)
        
    except Exception as e:
        error_msg = f"Error generating image: {str(e)}"
        logger.error(error_msg)
        return error_msg
@mcp.tool()
async def transform_image_from_encoded(encoded_image: str, prompt: str) -> Tuple[bytes, str]:
    """Transform an existing image based on the given text prompt using Google's Gemini model.
    Args:
        encoded_image: Base64 encoded image data with header. Must be in format:
                    "data:image/[format];base64,[data]"
                    Where [format] can be: png, jpeg, jpg, gif, webp, etc.
        prompt: Text prompt describing the desired transformation or modifications
        
    Returns:
        Tuple of the raw transformed image bytes and the path to the saved image file.
        On failure, an error message string is returned instead.
    """
    try:
        logger.info(f"Processing transform_image_from_encoded request with prompt: {prompt}")
        # Load and validate the image
        source_image, _ = await load_image_from_base64(encoded_image)
        
        # Translate the prompt to English
        translated_prompt = await translate_prompt(prompt)
        
        # Process the transformation
        return await process_image_transform(source_image, translated_prompt, prompt)
        
    except Exception as e:
        error_msg = f"Error transforming image: {str(e)}"
        logger.error(error_msg)
        return error_msg
@mcp.tool()
async def transform_image_from_file(image_file_path: str, prompt: str) -> Tuple[bytes, str]:
    """Transform an existing image file based on the given text prompt using Google's Gemini model.
    Args:
        image_file_path: Path to the image file to be transformed
        prompt: Text prompt describing the desired transformation or modifications
        
    Returns:
        Tuple of the raw transformed image bytes and the path to the saved image file.
        On failure, an error message string is returned instead.
    """
    try:
        logger.info(f"Processing transform_image_from_file request with prompt: {prompt}")
        logger.info(f"Image file path: {image_file_path}")
        # Validate file path
        if not os.path.exists(image_file_path):
            raise ValueError(f"Image file not found: {image_file_path}")
        # Translate the prompt to English
        translated_prompt = await translate_prompt(prompt)
            
        # Load the source image directly using PIL
        try:
            source_image = PIL.Image.open(image_file_path)
            logger.info(f"Successfully loaded image from file: {image_file_path}")
        except PIL.UnidentifiedImageError:
            logger.error("Error: Could not identify image format")
            raise ValueError("Could not identify image format. Supported formats include PNG, JPEG, GIF, WebP.")
        except Exception as e:
            logger.error(f"Error: Could not load image: {str(e)}")
            raise 
        
        # Process the transformation
        return await process_image_transform(source_image, translated_prompt, prompt)
        
    except Exception as e:
        error_msg = f"Error transforming image: {str(e)}"
        logger.error(error_msg)
        return error_msg
def main():
    logger.info("Starting Gemini Image Generator MCP server...")
    mcp.run(transport="stdio")
    logger.info("Server stopped")
if __name__ == "__main__":
    main()
```
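The filename returned by Gemini in `convert_prompt_to_filename` is used as-is when building the output path, so a response containing characters such as `/`, spaces, or newlines could produce an awkward or invalid name. A possible sanitizing step is sketched below; `sanitize_filename` is a hypothetical helper that is not part of the server code above, and its ASCII-only character set is an assumption chosen for portability.

```python
import re


def sanitize_filename(name: str, max_length: int = 64) -> str:
    """Reduce arbitrary text to a conservative, filesystem-friendly stem."""
    # Replace every run of characters outside [A-Za-z0-9_-] with one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", name.strip())
    # Trim stray underscores and enforce a length limit.
    cleaned = cleaned.strip("_")[:max_length]
    # Fall back to a generic stem if nothing usable is left.
    return cleaned or "image"


if __name__ == "__main__":
    print(sanitize_filename("flying pig / sci-fi city\n"))  # flying_pig_sci-fi_city
```

Such a helper could be applied to the generated name before it is passed to `save_image`.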