# Directory Structure
```
├── .env.example
├── .gitignore
├── Dockerfile
├── examples
│   ├── flying_pig_scifi_city.png
│   └── pig_cute_baby_whale.png
├── LICENSE
├── pyproject.toml
├── README.md
├── smithery.yaml
├── src
│   └── mcp_server_gemini_image_generator
│       ├── __init__.py
│       ├── prompts.py
│       ├── server.py
│       └── utils.py
└── uv.lock
```
# Files
--------------------------------------------------------------------------------
/.env.example:
--------------------------------------------------------------------------------
```
1 | GEMINI_API_KEY=""
2 | OUTPUT_IMAGE_PATH=""
```
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 | MANIFEST
27 |
28 | # PyInstaller
29 | *.manifest
30 | *.spec
31 |
32 | # Installer logs
33 | pip-log.txt
34 | pip-delete-this-directory.txt
35 |
36 | # Unit test / coverage reports
37 | htmlcov/
38 | .tox/
39 | .nox/
40 | .coverage
41 | .coverage.*
42 | .cache
43 | nosetests.xml
44 | coverage.xml
45 | *.cover
46 | .hypothesis/
47 | .pytest_cache/
48 |
49 | # Translations
50 | *.mo
51 | *.pot
52 |
53 | # Django stuff:
54 | *.log
55 | local_settings.py
56 | db.sqlite3
57 |
58 | # Flask stuff:
59 | instance/
60 | .webassets-cache
61 |
62 | # Scrapy stuff:
63 | .scrapy
64 |
65 | # Sphinx documentation
66 | docs/_build/
67 |
68 | # PyBuilder
69 | target/
70 |
71 | # Jupyter Notebook
72 | .ipynb_checkpoints
73 |
74 | # IPython
75 | profile_default/
76 | ipython_config.py
77 |
78 | # pyenv
79 | .python-version
80 |
81 | # celery beat schedule file
82 | celerybeat-schedule
83 |
84 | # SageMath parsed files
85 | *.sage.py
86 |
87 | # Environments
88 | .env
89 | .venv
90 | env/
91 | venv/
92 | ENV/
93 | env.bak/
94 | venv.bak/
95 |
96 | # Spyder project settings
97 | .spyderproject
98 | .spyproject
99 |
100 | # Rope project settings
101 | .ropeproject
102 |
103 | # mkdocs documentation
104 | /site
105 |
106 | # mypy
107 | .mypy_cache/
108 | .dmypy.json
109 | dmypy.json
110 |
111 | # Pyre type checker
112 | .pyre/
113 |
114 | # VS Code
115 | .vscode/
116 | .idea/
117 |
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
1 | [](https://mseep.ai/app/qhdrl12-mcp-server-gemini-image-generator)
2 | [](https://smithery.ai/server/@qhdrl12/mcp-server-gemini-image-gen)
3 |
4 | <a href="https://glama.ai/mcp/servers/@qhdrl12/mcp-server-gemini-image-generator">
5 | <img width="380" height="200" src="https://glama.ai/mcp/servers/@qhdrl12/mcp-server-gemini-image-generator/badge" alt="Gemini Image Generator Server MCP server" />
6 | </a>
7 |
8 | # Gemini Image Generator MCP Server
9 |
10 | Generate high-quality images from text prompts using Google's Gemini model through the MCP protocol.
11 |
12 | ## Overview
13 |
14 | This MCP server allows any AI assistant to generate images using Google's Gemini AI model. The server handles prompt engineering, text-to-image conversion, filename generation, and local image storage, making it easy to create and manage AI-generated images through any MCP client.
15 |
16 | ## Features
17 |
18 | - Text-to-image generation using Gemini 2.0 Flash
19 | - Image-to-image transformation based on text prompts
20 | - Support for both file-based and base64-encoded images
21 | - Automatic intelligent filename generation based on prompts
22 | - Automatic translation of non-English prompts
23 | - Local image storage with configurable output path
24 | - Strict text exclusion from generated images
25 | - High-resolution image output
26 | - Direct access to both image data and file path
27 |
28 | ## Available MCP Tools
29 |
30 | The server provides the following MCP tools for AI assistants:
31 |
32 | ### 1. `generate_image_from_text`
33 |
34 | Creates a new image from a text prompt description.
35 |
36 | ```
37 | generate_image_from_text(prompt: str) -> Tuple[bytes, str]
38 | ```
39 |
40 | **Parameters:**
41 | - `prompt`: Text description of the image you want to generate
42 |
43 | **Returns:**
44 | - A tuple containing:
45 | - Raw image data (bytes)
46 | - Path to the saved image file (str)
47 |
48 | This dual return format allows AI assistants to either work with the image data directly or reference the saved file path.
49 |
50 | **Examples:**
51 | - "Generate an image of a sunset over mountains"
52 | - "Create a photorealistic flying pig in a sci-fi city"
53 |
54 | #### Example Output
55 |
56 | This image was generated using the prompt:
57 |
58 | ```
59 | "Hi, can you create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?"
60 | ```
61 |
62 | ![3D rendered flying pig over a futuristic sci-fi city](examples/flying_pig_scifi_city.png)
63 |
64 | *A 3D rendered pig with wings and a top hat flying over a futuristic sci-fi city filled with greenery*
65 |
66 | ### Known Issues
67 |
68 | When using this MCP server with Claude Desktop Host:
69 |
70 | 1. **Performance Issues**: Using `transform_image_from_encoded` may take significantly longer to process compared to other methods. This is due to the overhead of transferring large base64-encoded image data through the MCP protocol.
71 |
72 | 2. **Path Resolution Problems**: There may be issues with correctly resolving image paths when using Claude Desktop Host. The host application might not properly interpret the returned file paths, making it difficult to access the generated images.
73 |
74 | For the best experience, consider using alternative MCP clients or the `transform_image_from_file` method when possible.
75 |
76 | ### 2. `transform_image_from_encoded`
77 |
78 | Transforms an existing image based on a text prompt using base64-encoded image data.
79 |
80 | ```
81 | transform_image_from_encoded(encoded_image: str, prompt: str) -> Tuple[bytes, str]
82 | ```
83 |
84 | **Parameters:**
85 | - `encoded_image`: Base64 encoded image data with format header (must be in format: "data:image/[format];base64,[data]")
86 | - `prompt`: Text description of how you want to transform the image
87 |
88 | **Returns:**
89 | - A tuple containing:
90 | - Raw transformed image data (bytes)
91 | - Path to the saved transformed image file (str)
92 |
93 | **Example:**
94 | - "Add snow to this landscape"
95 | - "Change the background to a beach"
96 |
97 | ### 3. `transform_image_from_file`
98 |
99 | Transforms an existing image file based on a text prompt.
100 |
101 | ```
102 | transform_image_from_file(image_file_path: str, prompt: str) -> Tuple[bytes, str]
103 | ```
104 |
105 | **Parameters:**
106 | - `image_file_path`: Path to the image file to be transformed
107 | - `prompt`: Text description of how you want to transform the image
108 |
109 | **Returns:**
110 | - A tuple containing:
111 | - Raw transformed image data (bytes)
112 | - Path to the saved transformed image file (str)
113 |
114 | **Examples:**
115 | - "Add a llama next to the person in this image"
116 | - "Make this daytime scene look like night time"
117 |
118 | #### Example Transformation
119 |
120 | Using the flying pig image created above, we applied a transformation with the following prompt:
121 |
122 | ```
123 | "Add a cute baby whale flying alongside the pig"
124 | ```
125 |
126 | **Before:**
127 | ![Flying pig over a sci-fi city (before)](examples/flying_pig_scifi_city.png)
128 |
129 | **After:**
130 | ![Flying pig with a baby whale added (after)](examples/pig_cute_baby_whale.png)
131 |
132 | *The original flying pig image with a cute baby whale added flying alongside it*
133 |
134 | ## Setup
135 |
136 | ### Prerequisites
137 |
138 | - Python 3.11+
139 | - Google AI API key (Gemini)
140 | - MCP host application (Claude Desktop App, Cursor, or other MCP-compatible clients)
141 |
142 | ### Getting a Gemini API Key
143 |
144 | 1. Visit [Google AI Studio API Keys page](https://aistudio.google.com/apikey)
145 | 2. Sign in with your Google account
146 | 3. Click "Create API Key"
147 | 4. Copy your new API key for use in the configuration
148 | 5. Note: The API key provides a quota of free usage. You can check your usage in Google AI Studio.
149 |
150 | ### Installation
151 |
152 | ### Installing via Smithery
153 |
154 | To install Gemini Image Generator MCP for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@qhdrl12/mcp-server-gemini-image-gen):
155 |
156 | ```bash
157 | npx -y @smithery/cli install @qhdrl12/mcp-server-gemini-image-gen --client claude
158 | ```
159 |
160 | ### Manual Installation
161 | 1. Clone the repository:
162 | ```bash
163 | git clone https://github.com/your-username/mcp-server-gemini-image-generator.git
164 | cd mcp-server-gemini-image-generator
165 | ```
166 |
167 | 2. Create a virtual environment and install dependencies:
168 | ```bash
169 | # Using uv (recommended)
170 | uv venv
171 | source .venv/bin/activate # On Windows: .venv\Scripts\activate
172 | uv pip install -e .
173 |
174 | # Or using regular venv
175 | python -m venv .venv
176 | source .venv/bin/activate # On Windows: .venv\Scripts\activate
177 | pip install -e .
178 | ```
179 |
180 | 3. Set up environment variables (choose one method):
181 |
182 | **Method A: Using .env file (optional)**
183 | ```bash
184 | # Create .env file in the project root
185 | cat > .env << 'EOF'
186 | GEMINI_API_KEY=your-gemini-api-key-here
187 | OUTPUT_IMAGE_PATH=/path/to/save/images
188 | EOF
189 | ```
190 |
191 | **Method B: Set directly in Claude Desktop config (recommended)**
192 | - Set environment variables directly in the `claude_desktop_config.json` (shown in configuration section below)
193 |
194 | ### Configure Claude Desktop
195 |
196 | Add the following to your `claude_desktop_config.json`:
197 |
198 | - **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
199 | - **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
200 |
201 | ```json
202 | {
203 | "mcpServers": {
204 | "gemini-image-generator": {
205 | "command": "uv",
206 | "args": [
207 | "--directory",
208 | "/absolute/path/to/mcp-server-gemini-image-generator",
209 | "run",
210 | "mcp-server-gemini-image-generator"
211 | ],
212 | "env": {
213 | "GEMINI_API_KEY": "your-actual-gemini-api-key-here",
214 | "OUTPUT_IMAGE_PATH": "/absolute/path/to/your/images/directory"
215 | }
216 | }
217 | }
218 | }
219 | ```
220 |
221 | **Important Configuration Notes:**
222 |
223 | 1. **Replace paths with your actual paths:**
224 | - Change `/absolute/path/to/mcp-server-gemini-image-generator` to the actual location where you cloned this repository
225 | - Change `/absolute/path/to/your/images/directory` to where you want generated images to be saved
226 |
227 | 2. **Environment Variables:**
228 | - Replace `your-actual-gemini-api-key-here` with your real Gemini API key from Google AI Studio
229 | - Use absolute paths for `OUTPUT_IMAGE_PATH` to ensure images are saved correctly
230 |
231 | 3. **Example with real paths:**
232 | ```json
233 | {
234 | "mcpServers": {
235 | "gemini-image-generator": {
236 | "command": "uv",
237 | "args": [
238 | "--directory",
239 | "/Users/username/Projects/mcp-server-gemini-image-generator",
240 | "run",
241 | "mcp-server-gemini-image-generator"
242 | ],
243 | "env": {
244 | "GEMINI_API_KEY": "GEMINI_API_KEY",
245 | "OUTPUT_IMAGE_PATH": "OUTPUT_IMAGE_PATH"
246 | }
247 | }
248 | }
249 | }
250 | ```
251 |
252 | ## Usage
253 |
254 | Once installed and configured, you can ask Claude to generate or transform images using prompts like:
255 |
256 | ### Generating New Images
257 | - "Generate an image of a sunset over mountains"
258 | - "Create an illustration of a futuristic cityscape"
259 | - "Make a picture of a cat wearing sunglasses"
260 |
261 | ### Transforming Existing Images
262 | - "Transform this image by adding snow to the scene"
263 | - "Edit this photo to make it look like it was taken at night"
264 | - "Add a dragon flying in the background of this picture"
265 |
266 | The generated/transformed images will be saved to your configured output path and displayed in Claude. Because each tool returns both the raw image data and the saved file path, AI assistants can also work with the image data directly without needing to access the saved files.
267 |
268 | ## Testing
269 |
270 | You can test the application by running the FastMCP development server:
271 |
272 | ```
273 | fastmcp dev src/mcp_server_gemini_image_generator/server.py
274 | ```
275 |
276 | This command starts a local development server and makes the MCP Inspector available at http://localhost:5173/.
277 | The MCP Inspector provides a convenient web interface where you can directly test the image generation tool without needing to use Claude or another MCP client.
278 | You can enter text prompts, execute the tool, and see the results immediately, which is helpful for development and debugging.
279 |
280 | ## License
281 |
282 | MIT License
```
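
The tools documented above can also be exercised from a custom MCP client. Below is a minimal sketch, assuming the MCP Python SDK's stdio client (the `mcp` package); the repository path, API key, and output directory are placeholders to replace with your own values:

```python
# Hypothetical client-side sketch; paths and the API key are placeholders.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(
    command="uv",
    args=[
        "--directory",
        "/absolute/path/to/mcp-server-gemini-image-generator",  # placeholder
        "run",
        "mcp-server-gemini-image-generator",
    ],
    env={
        "GEMINI_API_KEY": "your-actual-gemini-api-key-here",   # placeholder
        "OUTPUT_IMAGE_PATH": "/absolute/path/to/your/images",  # placeholder
    },
)

async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "generate_image_from_text",
                arguments={"prompt": "a sunset over mountains"},
            )
            # The result carries the tool's return value (image data and saved path).
            print(result)

if __name__ == "__main__":
    asyncio.run(main())
```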
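
The `data:image/[format];base64,[data]` header required by `transform_image_from_encoded` can be produced with a small helper like the sketch below (illustrative only; the sample path points at one of the images shipped in `examples/`):

```python
# Illustrative helper for building the "data:image/[format];base64,[data]" string.
import base64
from pathlib import Path

def to_data_url(image_path: str) -> str:
    """Encode an image file as the data URL expected by transform_image_from_encoded."""
    path = Path(image_path)
    image_format = path.suffix.lstrip(".").lower() or "png"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:image/{image_format};base64,{encoded}"

if __name__ == "__main__":
    data_url = to_data_url("examples/flying_pig_scifi_city.png")  # illustrative input
    print(data_url[:60] + "...")
```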
--------------------------------------------------------------------------------
/src/mcp_server_gemini_image_generator/__init__.py:
--------------------------------------------------------------------------------
```python
1 | import asyncio
2 |
3 | from . import server
4 |
5 |
6 | def main() -> None:
7 | """Start the MCP server."""
8 | server.main()
9 |
10 | __all__ = [
11 | "main",
12 | "server",
13 | ]
```
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
```toml
1 | [project]
2 | name = "mcp-server-gemini-image-generator"
3 | version = "0.1.0"
4 | description = ""
5 | readme = "README.md"
6 | requires-python = ">=3.11"
7 | authors = [{name = "bongki Lee",email = "[email protected]"}]
8 | keywords = ["http", "mcp", "llm", "automation"]
9 | license = { text = "MIT" }
10 | dependencies = [
11 | "fastmcp>=0.4.1",
12 | "google>=3.0.0",
13 | "google-genai>=1.7.0",
14 | "pillow>=11.1.0",
15 | ]
16 |
17 | [project.scripts]
18 | mcp-server-gemini-image-generator = "mcp_server_gemini_image_generator:main"
19 |
20 | [build-system]
21 | requires = ["hatchling"]
22 | build-backend = "hatchling.build"
23 |
24 | [tool.uv]
25 | dev-dependencies = ["pyright>=1.1.389", "ruff>=0.7.3"]
26 |
```
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
```dockerfile
1 |
2 | FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim
3 |
4 | ENV UV_COMPILE_BYTECODE=1
5 | ENV UV_LINK_MODE=copy
6 | ENV UV_CACHE_DIR=/opt/uv-cache/
7 |
8 | RUN apt-get update && apt-get install -y --no-install-recommends git
9 |
10 | WORKDIR /app
11 |
12 | RUN --mount=type=cache,target=/opt/uv-cache/ \
13 | --mount=type=bind,source=uv.lock,target=uv.lock \
14 | --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
15 | uv sync --frozen --no-install-project --no-dev --no-editable
16 |
17 | ADD . /app
18 |
19 | RUN --mount=type=cache,target=/opt/uv-cache/ \
20 | uv sync --frozen --no-dev --no-editable
21 |
22 | # Create image output directory
23 | ARG OUTPUT_IMAGE_PATH=/images
24 | RUN mkdir -p ${OUTPUT_IMAGE_PATH}
25 | ENV OUTPUT_IMAGE_PATH=${OUTPUT_IMAGE_PATH}
26 |
27 | # Add virtual environment to PATH
28 | ENV PATH="/app/.venv/bin:$PATH"
29 |
30 | # Set the entrypoint to the MCP server command
31 | CMD ["mcp-server-gemini-image-generator"]
```
--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------
```yaml
1 | # Smithery configuration file: https://smithery.ai/docs/build/project-config
2 |
3 | startCommand:
4 | type: stdio
5 | configSchema:
6 | # JSON Schema defining the configuration options for the MCP.
7 | type: object
8 | required:
9 | - geminiApiKey
10 | properties:
11 | geminiApiKey:
12 | type: string
13 | description: Google Gemini API key
14 | outputImagePath:
15 | type: string
16 | default: /images
17 | description: Path to save generated images inside container
18 | commandFunction:
19 | # A JS function that produces the CLI command based on the given config to start the MCP on stdio.
20 | |-
21 | (config) => ({
22 | command: 'python',
23 | args: ['server.py'],
24 | env: {
25 | GEMINI_API_KEY: config.geminiApiKey,
26 | OUTPUT_IMAGE_PATH: config.outputImagePath
27 | }
28 | })
29 | exampleConfig:
30 | geminiApiKey: YOUR_GEMINI_API_KEY
31 | outputImagePath: /images
32 |
```
--------------------------------------------------------------------------------
/src/mcp_server_gemini_image_generator/utils.py:
--------------------------------------------------------------------------------
```python
1 | import base64
2 | import io
3 | import logging
4 | import os
5 |
6 | import PIL.Image
7 |
8 | logger = logging.getLogger(__name__)
9 |
10 | OUTPUT_IMAGE_PATH = os.getenv("OUTPUT_IMAGE_PATH") or os.path.expanduser("~/gen_image")
11 |
12 | if not os.path.exists(OUTPUT_IMAGE_PATH):
13 | os.makedirs(OUTPUT_IMAGE_PATH)
14 |
15 | def validate_base64_image(base64_string: str) -> bool:
16 | """Validate if a string is a valid base64-encoded image.
17 |
18 | Args:
19 | base64_string: The base64 string to validate
20 |
21 | Returns:
22 | True if valid, False otherwise
23 | """
24 | try:
25 | # Try to decode base64
26 | image_data = base64.b64decode(base64_string)
27 |
28 | # Try to open as image
29 | with PIL.Image.open(io.BytesIO(image_data)) as img:
30 | logger.debug(
31 | f"Validated base64 image, format: {img.format}, size: {img.size}"
32 | )
33 | return True
34 |
35 | except Exception as e:
36 | logger.warning(f"Invalid base64 image: {str(e)}")
37 | return False
38 |
39 | async def save_image(image_data: bytes, filename: str) -> str:
40 | """Save image data to disk with a descriptive filename.
41 |
42 | Args:
43 | image_data: Raw image data
44 | filename: Base string to use for generating filename
45 |
46 | Returns:
47 | Path to the saved image file
48 | """
49 | try:
50 | # Open image from bytes
51 | image = PIL.Image.open(io.BytesIO(image_data))
52 |
53 | # Save the image
54 | image_path = os.path.join(OUTPUT_IMAGE_PATH, f"{filename}.png")
55 | image.save(image_path)
56 | logger.info(f"Image saved to {image_path}")
57 |
58 | # Display the image
59 | image.show()
60 |
61 | return image_path
62 | except Exception as e:
63 | logger.error(f"Error saving image: {str(e)}")
64 | raise
65 |
```
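
A brief usage sketch for the helpers above. The input path is illustrative; note that `validate_base64_image` expects plain base64 without a data-URL header, and `save_image` writes into `OUTPUT_IMAGE_PATH` and also opens the saved image in the default viewer:

```python
import asyncio
import base64

from mcp_server_gemini_image_generator.utils import save_image, validate_base64_image

# Illustrative input file; replace with any local image.
with open("examples/flying_pig_scifi_city.png", "rb") as f:
    raw = f.read()

# validate_base64_image takes plain base64 (no "data:image/...;base64," header).
encoded = base64.b64encode(raw).decode("ascii")
print(validate_base64_image(encoded))  # True for a readable image

# save_image writes <OUTPUT_IMAGE_PATH>/<filename>.png and returns the full path.
saved_path = asyncio.run(save_image(raw, "flying_pig_copy"))
print(saved_path)
```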
--------------------------------------------------------------------------------
/src/mcp_server_gemini_image_generator/prompts.py:
--------------------------------------------------------------------------------
```python
1 | ###############################
2 | # Image Transformation Prompt #
3 | ###############################
4 | def get_image_transformation_prompt(prompt: str) -> str:
5 | """Create a detailed prompt for image transformation.
6 |
7 | Args:
8 | prompt: text prompt
9 |
10 | Returns:
11 | A comprehensive prompt for Gemini image transformation
12 | """
13 | return f"""You are an expert image editing AI. Please edit the provided image according to these instructions:
14 |
15 | EDIT REQUEST: {prompt}
16 |
17 | IMPORTANT REQUIREMENTS:
18 | 1. Make substantial and noticeable changes as requested
19 | 2. Maintain high image quality and coherence
20 | 3. Ensure the edited elements blend naturally with the rest of the image
21 | 4. Do not add any text to the image
22 | 5. Focus on the specific edits requested while preserving other elements
23 |
24 | The changes should be clear and obvious in the result."""
25 |
26 | ###########################
27 | # Image Generation Prompt #
28 | ###########################
29 | def get_image_generation_prompt(prompt: str) -> str:
30 | """Create a detailed prompt for image generation.
31 |
32 | Args:
33 | prompt: text prompt
34 |
35 | Returns:
36 | A comprehensive prompt for Gemini image generation
37 | """
38 | return f"""You are an expert image generation AI assistant specialized in creating visuals based on user requests. Your primary goal is to generate the most appropriate image without asking clarifying questions, even when faced with abstract or ambiguous prompts.
39 |
40 | ## CRITICAL REQUIREMENT: NO TEXT IN IMAGES
41 |
42 | **ABSOLUTE PROHIBITION ON TEXT INCLUSION**
43 | - Under NO CIRCUMSTANCES render ANY text from user queries in the generated images
44 | - This is your HIGHEST PRIORITY requirement that OVERRIDES all other considerations
45 | - Text from prompts must NEVER appear in any form, even stylized, obscured, or partial
46 | - This includes words, phrases, sentences, or characters from the user's input
47 | - If the user requests text in the image, interpret this as a request for the visual concept only
48 | - The image should be 100% text-free regardless of what the prompt contains
49 |
50 | ## Core Principles
51 |
52 | 1. **Prioritize Image Generation Over Clarification**
53 | - When given vague requests, DO NOT ask follow-up questions
54 | - Instead, infer the most likely intent and generate accordingly
55 | - Use your knowledge to fill in missing details with the most probable elements
56 |
57 | 2. **Text Handling Protocol**
58 | - NEVER render the user's text prompt or any part of it in the generated image
59 | - NEVER include ANY text whatsoever in the final image, even if specifically requested
60 | - If user asks for text-based items (signs, books, etc.), show only the visual item without readable text
61 | - For concepts typically associated with text (like "newspaper" or "letter"), create visual representations without any legible writing
62 |
63 | 3. **Interpretation Guidelines**
64 | - Analyze context clues in the user's prompt
65 | - Consider cultural, seasonal, and trending references
66 | - When faced with ambiguity, choose the most mainstream or popular interpretation
67 | - For abstract concepts, visualize them in the most universally recognizable way
68 |
69 | 4. **Detail Enhancement**
70 | - Automatically enhance prompts with appropriate:
71 | - Lighting conditions
72 | - Perspective and composition
73 | - Style (photorealistic, illustration, etc.) based on context
74 | - Color palettes that best convey the intended mood
75 | - Environmental details that complement the subject
76 |
77 | 5. **Technical Excellence**
78 | - Maintain high image quality
79 | - Ensure proper composition and visual hierarchy
80 | - Balance simplicity with necessary detail
81 | - Maintain appropriate contrast and color harmony
82 |
83 | 6. **Handling Special Cases**
84 | - For creative requests: Lean toward artistic, visually striking interpretations
85 | - For informational requests: Prioritize clarity and accuracy
86 | - For emotional content: Focus on conveying the appropriate mood and tone
87 | - For locations: Include recognizable landmarks or characteristics
88 |
89 | ## Implementation Protocol
90 |
91 | 1. Parse user request
92 | 2. **TEXT REMOVAL CHECK**: Identify and remove ALL text elements from consideration
93 | 3. Identify core subjects and actions
94 | 4. Determine most likely interpretation if ambiguous
95 | 5. Enhance with appropriate details, style, and composition
96 | 6. **FINAL VERIFICATION**: Confirm image contains ZERO text elements from user query
97 | 7. Generate image immediately without asking for clarification
98 | 8. Present the completed image to the user
99 |
100 | ## Safety Measure
101 |
102 | Before finalizing ANY image:
103 | - Double-check that NO text from the user query appears in the image
104 | - If ANY text is detected, regenerate the image without the text
105 | - This verification is MANDATORY for every image generation
106 |
107 | Remember: Your success is measured by your ability to produce satisfying images without requiring additional input from users AND without including ANY text from queries in the images. Be decisive and confident in your interpretations while maintaining absolute adherence to the no-text requirement.
108 |
109 | Query: {prompt}
110 | """
111 |
112 | ####################
113 | # Translate Prompt #
114 | ####################
115 | def get_translate_prompt(prompt: str) -> str:
116 | """Translate the prompt into English if it's not already in English.
117 |
118 | Args:
119 | prompt: text prompt
120 |
121 | Returns:
122 | A comprehensive prompt for Gemini translation
123 | """
124 | return f"""Translate the following prompt into English if it's not already in English. Your task is ONLY to translate accurately while preserving:
125 |
126 | 1. EXACT original intent and meaning
127 | 2. All specific details and nuances
128 | 3. Style and tone of the original prompt
129 | 4. Technical terms and concepts
130 |
131 | DO NOT:
132 | - Add new details or creative elements not in the original
133 | - Remove any details from the original
134 | - Change the style or complexity level
135 | - Reinterpret or assume what the user "really meant"
136 |
137 | If the text is already in English, return it exactly as provided with no changes.
138 |
139 | Original prompt: {prompt}
140 |
141 | Return only the translated English prompt, nothing else."""
```
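
For reference, a short sketch of how `server.py` chains these templates: the user prompt is first wrapped by `get_translate_prompt` for a text-only Gemini call, and the (translated) result is then wrapped by `get_image_generation_prompt` or `get_image_transformation_prompt` for the image call. The sample prompt is illustrative:

```python
from mcp_server_gemini_image_generator.prompts import (
    get_image_generation_prompt,
    get_translate_prompt,
)

user_prompt = "a watercolor fox in a snowy forest"  # illustrative prompt

# Step 1: wrap for the translation pass (sent to Gemini as a text-only call).
translation_request = get_translate_prompt(user_prompt)

# Step 2: wrap the translated prompt for the image-generation pass.
generation_request = get_image_generation_prompt(user_prompt)

print(translation_request.splitlines()[0])
print(generation_request.splitlines()[0])
```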
--------------------------------------------------------------------------------
/src/mcp_server_gemini_image_generator/server.py:
--------------------------------------------------------------------------------
```python
1 | import base64
2 | import os
3 | import logging
4 | import sys
5 | import uuid
6 | from io import BytesIO
7 | from typing import Optional, Any, Union, List, Tuple
8 |
9 | import PIL.Image
10 | from google import genai
11 | from google.genai import types
12 | from mcp.server.fastmcp import FastMCP
13 |
14 | from .prompts import get_image_generation_prompt, get_image_transformation_prompt, get_translate_prompt
15 | from .utils import save_image
16 |
17 |
18 | # Setup logging
19 | logging.basicConfig(
20 | level=logging.INFO,
21 | format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
22 | stream=sys.stderr
23 | )
24 | logger = logging.getLogger(__name__)
25 |
26 | # Initialize MCP server
27 | mcp = FastMCP("mcp-server-gemini-image-generator")
28 |
29 |
30 | # ==================== Gemini API Interaction ====================
31 |
32 | async def call_gemini(
33 | contents: Union[str, List[Any]],
34 | model: str = "gemini-2.0-flash-preview-image-generation",
35 | config: Optional[types.GenerateContentConfig] = None,
36 | text_only: bool = False
37 | ) -> Union[str, bytes]:
38 | """Call Gemini API with flexible configuration for different use cases.
39 |
40 | Args:
41 | contents: The content to send to Gemini: a text prompt, or a list containing text and/or images
42 | model: The Gemini model to use
43 | config: Optional configuration for the Gemini API call
44 | text_only: If True, extract and return only text from the response
45 |
46 | Returns:
47 | If text_only is True: str - The text response from Gemini
48 | Otherwise: bytes - The binary image data from Gemini
49 |
50 | Raises:
51 | Exception: If there's an error calling the Gemini API
52 | """
53 | try:
54 | # Initialize Gemini client
55 | api_key = os.environ.get("GEMINI_API_KEY")
56 | if not api_key:
57 | raise ValueError("GEMINI_API_KEY environment variable not set")
58 |
59 | client = genai.Client(api_key=api_key)
60 |
61 | # Generate content using Gemini
62 | response = client.models.generate_content(
63 | model=model,
64 | contents=contents,
65 | config=config
66 | )
67 |
68 | logger.info(f"Response received from Gemini API using model {model}")
69 |
70 | # For text-only calls, extract just the text
71 | if text_only:
72 | return response.candidates[0].content.parts[0].text.strip()
73 |
74 | # Return the image data
75 | for part in response.candidates[0].content.parts:
76 | if part.inline_data is not None:
77 | return part.inline_data.data
78 |
79 | raise ValueError("No image data found in Gemini response")
80 |
81 | except Exception as e:
82 | logger.error(f"Error calling Gemini API: {str(e)}")
83 | raise
84 |
85 |
86 | # ==================== Text Utility Functions ====================
87 |
88 | async def convert_prompt_to_filename(prompt: str) -> str:
89 | """Convert a text prompt into a suitable filename for the generated image using Gemini AI.
90 |
91 | Args:
92 | prompt: The text prompt used to generate the image
93 |
94 | Returns:
95 | A concise, descriptive filename generated based on the prompt
96 | """
97 | try:
98 | # Create a prompt for Gemini to generate a filename
99 | filename_prompt = f"""
100 | Based on this image description: "{prompt}"
101 |
102 | Generate a short, descriptive file name suitable for the requested image.
103 | The filename should:
104 | - Be concise (maximum 5 words)
105 | - Use underscores between words
106 | - Not include any file extension
107 | - Only return the filename, nothing else
108 | """
109 |
110 | # Call Gemini and get the filename
111 | generated_filename = await call_gemini(filename_prompt, text_only=True)
112 | logger.info(f"Generated filename: {generated_filename}")
113 |
114 | # Return the filename only, without path or extension
115 | return generated_filename
116 |
117 | except Exception as e:
118 | logger.error(f"Error generating filename with Gemini: {str(e)}")
119 | # Fallback to a simple filename if Gemini fails
120 | truncated_text = prompt[:12].strip()
121 | return f"image_{truncated_text}_{str(uuid.uuid4())[:8]}"
122 |
123 |
124 | async def translate_prompt(text: str) -> str:
125 | """Translate and optimize the user's prompt to English for better image generation results.
126 |
127 | Args:
128 | text: The original prompt in any language
129 |
130 | Returns:
131 | English translation of the prompt with preserved intent
132 | """
133 | try:
134 | # Create a prompt for translation with strict intent preservation
135 | prompt = get_translate_prompt(text)
136 |
137 | # Call Gemini and get the translated prompt
138 | translated_prompt = await call_gemini(prompt, text_only=True)
139 | logger.info(f"Original prompt: {text}")
140 | logger.info(f"Translated prompt: {translated_prompt}")
141 |
142 | return translated_prompt
143 |
144 | except Exception as e:
145 | logger.error(f"Error translating prompt: {str(e)}")
146 | # Return original text if translation fails
147 | return text
148 |
149 |
150 | # ==================== Image Processing Functions ====================
151 |
152 | async def process_image_with_gemini(
153 | contents: List[Any],
154 | prompt: str,
155 | model: str = "gemini-2.0-flash-preview-image-generation"
156 | ) -> Tuple[bytes, str]:
157 | """Process an image request with Gemini and save the result.
158 |
159 | Args:
160 | contents: List containing the prompt and optionally an image
161 | prompt: Original prompt for filename generation
162 | model: Gemini model to use
163 |
164 | Returns:
165 | Tuple of (raw image bytes, path to the saved image file)
166 | """
167 | # Call the Gemini image generation model
168 | gemini_response = await call_gemini(
169 | contents,
170 | model=model,
171 | config=types.GenerateContentConfig(
172 | response_modalities=['Text', 'Image']
173 | )
174 | )
175 |
176 | # Generate a filename for the image
177 | filename = await convert_prompt_to_filename(prompt)
178 |
179 | # Save the image and return the path
180 | saved_image_path = await save_image(gemini_response, filename)
181 |
182 | return gemini_response, saved_image_path
183 |
184 |
185 | async def process_image_transform(
186 | source_image: PIL.Image.Image,
187 | optimized_edit_prompt: str,
188 | original_edit_prompt: str
189 | ) -> Tuple[bytes, str]:
190 | """Process image transformation with Gemini.
191 |
192 | Args:
193 | source_image: PIL Image object to transform
194 | optimized_edit_prompt: Optimized text prompt for transformation
195 | original_edit_prompt: Original user prompt for naming
196 |
197 | Returns:
198 | Tuple of (raw transformed image bytes, path to the saved transformed image file)
199 | """
200 | # Create prompt for image transformation
201 | edit_instructions = get_image_transformation_prompt(optimized_edit_prompt)
202 |
203 | # Process with Gemini and return the result
204 | return await process_image_with_gemini(
205 | [edit_instructions, source_image],
206 | original_edit_prompt
207 | )
208 |
209 |
210 | async def load_image_from_base64(encoded_image: str) -> Tuple[PIL.Image.Image, str]:
211 | """Load an image from a base64-encoded string.
212 |
213 | Args:
214 | encoded_image: Base64 encoded image data with header
215 |
216 | Returns:
217 | Tuple containing the PIL Image object and the image format
218 | """
219 | if not encoded_image.startswith('data:image/'):
220 | raise ValueError("Invalid image format. Expected data:image/[format];base64,[data]")
221 |
222 | try:
223 | # Extract the base64 data from the data URL
224 | image_format, image_data = encoded_image.split(';base64,')
225 | image_format = image_format.replace('data:', '') # Get the MIME type e.g., "image/png"
226 | image_bytes = base64.b64decode(image_data)
227 | source_image = PIL.Image.open(BytesIO(image_bytes))
228 | logger.info(f"Successfully loaded image with format: {image_format}")
229 | return source_image, image_format
230 | except ValueError as e:
231 | logger.error(f"Error: Invalid image data format: {str(e)}")
232 | raise ValueError("Invalid image data format. Image must be in format 'data:image/[format];base64,[data]'")
233 | except base64.binascii.Error as e:
234 | logger.error(f"Error: Invalid base64 encoding: {str(e)}")
235 | raise ValueError("Invalid base64 encoding. Please provide a valid base64 encoded image.")
236 | except PIL.UnidentifiedImageError:
237 | logger.error("Error: Could not identify image format")
238 | raise ValueError("Could not identify image format. Supported formats include PNG, JPEG, GIF, WebP.")
239 | except Exception as e:
240 | logger.error(f"Error: Could not load image: {str(e)}")
241 | raise
242 |
243 |
244 | # ==================== MCP Tools ====================
245 |
246 | @mcp.tool()
247 | async def generate_image_from_text(prompt: str) -> Tuple[bytes, str]:
248 | """Generate an image based on the given text prompt using Google's Gemini model.
249 |
250 | Args:
251 | prompt: User's text prompt describing the desired image to generate
252 |
253 | Returns:
254 | Tuple of (raw image bytes, path to the generated image file)
255 | """
256 | try:
257 | # Translate the prompt to English
258 | translated_prompt = await translate_prompt(prompt)
259 |
260 | # Create detailed generation prompt
261 | contents = get_image_generation_prompt(translated_prompt)
262 |
263 | # Process with Gemini and return the result
264 | return await process_image_with_gemini([contents], prompt)
265 |
266 | except Exception as e:
267 | error_msg = f"Error generating image: {str(e)}"
268 | logger.error(error_msg)
269 | return error_msg
270 |
271 |
272 | @mcp.tool()
273 | async def transform_image_from_encoded(encoded_image: str, prompt: str) -> Tuple[bytes, str]:
274 | """Transform an existing image based on the given text prompt using Google's Gemini model.
275 |
276 | Args:
277 | encoded_image: Base64 encoded image data with header. Must be in format:
278 | "data:image/[format];base64,[data]"
279 | Where [format] can be: png, jpeg, jpg, gif, webp, etc.
280 | prompt: Text prompt describing the desired transformation or modifications
281 |
282 | Returns:
283 | Tuple of (raw transformed image bytes, path to the transformed image file saved on the server)
284 | """
285 | try:
286 | logger.info(f"Processing transform_image_from_encoded request with prompt: {prompt}")
287 |
288 | # Load and validate the image
289 | source_image, _ = await load_image_from_base64(encoded_image)
290 |
291 | # Translate the prompt to English
292 | translated_prompt = await translate_prompt(prompt)
293 |
294 | # Process the transformation
295 | return await process_image_transform(source_image, translated_prompt, prompt)
296 |
297 | except Exception as e:
298 | error_msg = f"Error transforming image: {str(e)}"
299 | logger.error(error_msg)
300 | return error_msg
301 |
302 |
303 | @mcp.tool()
304 | async def transform_image_from_file(image_file_path: str, prompt: str) -> Tuple[bytes, str]:
305 | """Transform an existing image file based on the given text prompt using Google's Gemini model.
306 |
307 | Args:
308 | image_file_path: Path to the image file to be transformed
309 | prompt: Text prompt describing the desired transformation or modifications
310 |
311 | Returns:
312 | Tuple of (raw transformed image bytes, path to the transformed image file saved on the server)
313 | """
314 | try:
315 | logger.info(f"Processing transform_image_from_file request with prompt: {prompt}")
316 | logger.info(f"Image file path: {image_file_path}")
317 |
318 | # Validate file path
319 | if not os.path.exists(image_file_path):
320 | raise ValueError(f"Image file not found: {image_file_path}")
321 |
322 | # Translate the prompt to English
323 | translated_prompt = await translate_prompt(prompt)
324 |
325 | # Load the source image directly using PIL
326 | try:
327 | source_image = PIL.Image.open(image_file_path)
328 | logger.info(f"Successfully loaded image from file: {image_file_path}")
329 | except PIL.UnidentifiedImageError:
330 | logger.error("Error: Could not identify image format")
331 | raise ValueError("Could not identify image format. Supported formats include PNG, JPEG, GIF, WebP.")
332 | except Exception as e:
333 | logger.error(f"Error: Could not load image: {str(e)}")
334 | raise
335 |
336 | # Process the transformation
337 | return await process_image_transform(source_image, translated_prompt, prompt)
338 |
339 | except Exception as e:
340 | error_msg = f"Error transforming image: {str(e)}"
341 | logger.error(error_msg)
342 | return error_msg
343 |
344 |
345 | def main():
346 | logger.info("Starting Gemini Image Generator MCP server...")
347 | mcp.run(transport="stdio")
348 | logger.info("Server stopped")
349 |
350 | if __name__ == "__main__":
351 | main()
```
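
A minimal end-to-end sketch of the pipeline above, calling the module helpers directly instead of going through the MCP transport. It assumes `GEMINI_API_KEY` is set and network access to the Gemini API is available; the prompt is illustrative:

```python
import asyncio

from mcp_server_gemini_image_generator.prompts import get_image_generation_prompt
from mcp_server_gemini_image_generator.server import (
    process_image_with_gemini,
    translate_prompt,
)

async def demo() -> None:
    user_prompt = "a paper-craft robot watering plants"   # illustrative prompt
    english_prompt = await translate_prompt(user_prompt)  # returned unchanged for English input
    contents = get_image_generation_prompt(english_prompt)
    image_bytes, saved_path = await process_image_with_gemini([contents], user_prompt)
    print(f"{len(image_bytes)} bytes saved to {saved_path}")

if __name__ == "__main__":
    asyncio.run(demo())
```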