# Directory Structure
```
├── .gitignore
├── commit_message.txt
├── package.json
├── pnpm-lock.yaml
├── README.md
├── src
│   └── index.ts
└── tsconfig.json
```
# Files
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
node_modules/
build/
*.log
.env*
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
# Ollama MCP Server
🚀 A powerful bridge between Ollama and the Model Context Protocol (MCP), enabling seamless integration of Ollama's local LLM capabilities into your MCP-powered applications.
## 🌟 Features
### Complete Ollama Integration
- **Full API Coverage**: Access all essential Ollama functionality through a clean MCP interface
- **OpenAI-Compatible Chat**: Drop-in replacement for OpenAI's chat completion API
- **Local LLM Power**: Run AI models locally with full control and privacy
### Core Capabilities
- 🔄 **Model Management**
  - Pull models from registries
  - Push models to registries
  - List available models
  - Create custom models from Modelfiles
  - Copy and remove models
- 🤖 **Model Execution**
  - Run models with customizable prompts
  - Chat completion API with system/user/assistant roles
  - Configurable parameters (temperature, timeout)
  - Raw mode support for direct responses
- 🛠 **Server Control**
  - Start and manage Ollama server
  - View detailed model information
  - Error handling and timeout management
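All of these capabilities are exposed as MCP tools. For example, once the server is configured (see below), listing the locally installed models is a single tool call (a minimal sketch using the same `use_mcp_tool` convention as the usage examples further down):
```typescript
await mcp.use_mcp_tool({
  server_name: "ollama",
  tool_name: "list",
  arguments: {}
});
```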
## 🚀 Getting Started
### Prerequisites
- [Ollama](https://ollama.ai) installed on your system
- Node.js and npm/pnpm
### Installation
1. Install dependencies:
```bash
pnpm install
```
2. Build the server:
```bash
pnpm run build
```
### Configuration
Add the server to your MCP configuration:
#### For Claude Desktop:
macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
Windows: `%APPDATA%/Claude/claude_desktop_config.json`
```json
{
  "mcpServers": {
    "ollama": {
      "command": "node",
      "args": ["/path/to/ollama-server/build/index.js"],
      "env": {
        "OLLAMA_HOST": "http://127.0.0.1:11434" // Optional: customize Ollama API endpoint
      }
    }
  }
}
```
## 🛠 Usage Examples
### Pull and Run a Model
```typescript
// Pull a model
await mcp.use_mcp_tool({
  server_name: "ollama",
  tool_name: "pull",
  arguments: {
    name: "llama2"
  }
});

// Run the model
await mcp.use_mcp_tool({
  server_name: "ollama",
  tool_name: "run",
  arguments: {
    name: "llama2",
    prompt: "Explain quantum computing in simple terms"
  }
});
```
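To inspect a model before running it, the `show` tool returns detailed model information (a sketch based on the tool schema in `src/index.ts`):
```typescript
await mcp.use_mcp_tool({
  server_name: "ollama",
  tool_name: "show",
  arguments: {
    name: "llama2"
  }
});
```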
### Chat Completion (OpenAI-compatible)
```typescript
await mcp.use_mcp_tool({
  server_name: "ollama",
  tool_name: "chat_completion",
  arguments: {
    model: "llama2",
    messages: [
      {
        role: "system",
        content: "You are a helpful assistant."
      },
      {
        role: "user",
        content: "What is the meaning of life?"
      }
    ],
    temperature: 0.7
  }
});
```
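The result is returned as a text content block containing an OpenAI-style completion object. A sketch of how a client might read it, assuming the tool result's `content` array is surfaced as produced by `handleChatCompletion` in `src/index.ts`:
```typescript
const result = await mcp.use_mcp_tool({ /* chat_completion call as above */ });

// content[0].text holds a JSON string with id, object, created, model and choices fields
const completion = JSON.parse(result.content[0].text);
console.log(completion.choices[0].message.content); // the assistant's reply
```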
### Create Custom Model
```typescript
await mcp.use_mcp_tool({
  server_name: "ollama",
  tool_name: "create",
  arguments: {
    name: "custom-model",
    modelfile: "./path/to/Modelfile"
  }
});
```
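### Copy and Remove Models
Models can also be copied and removed through the `cp` and `rm` tools (a sketch; the model names are placeholders):
```typescript
// Copy a model under a new name
await mcp.use_mcp_tool({
  server_name: "ollama",
  tool_name: "cp",
  arguments: {
    source: "custom-model",
    destination: "custom-model-backup"
  }
});

// Remove the copy
await mcp.use_mcp_tool({
  server_name: "ollama",
  tool_name: "rm",
  arguments: {
    name: "custom-model-backup"
  }
});
```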
## 🔧 Advanced Configuration
- `OLLAMA_HOST`: Configure a custom Ollama API endpoint (default: `http://127.0.0.1:11434`)
- `timeout`: Per-request timeout for model execution, in milliseconds (default: 60000)
- `temperature`: Sampling temperature controlling response randomness (range 0-2); see the example below
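Both `timeout` and `temperature` can be overridden on each call (a sketch; the values are illustrative):
```typescript
await mcp.use_mcp_tool({
  server_name: "ollama",
  tool_name: "chat_completion",
  arguments: {
    model: "llama2",
    messages: [{ role: "user", content: "Write a haiku about local LLMs." }],
    temperature: 1.2, // more varied output
    timeout: 120000   // allow up to two minutes
  }
});
```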
## 🤝 Contributing
Contributions are welcome! Feel free to:
- Report bugs
- Suggest new features
- Submit pull requests
## 📝 License
MIT License - feel free to use in your own projects!
---
Built with ❤️ for the MCP ecosystem
```
--------------------------------------------------------------------------------
/tsconfig.json:
--------------------------------------------------------------------------------
```json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "Node16",
    "moduleResolution": "Node16",
    "outDir": "./build",
    "rootDir": "./src",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules"]
}
```
--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
```json
{
  "name": "ollama-mcp",
  "version": "0.1.0",
  "description": "An Ollama MCP server designed to allow Cline or other MCP-supporting tools to access Ollama",
  "private": true,
  "type": "module",
  "bin": {
    "ollama-mcp": "./build/index.js"
  },
  "files": [
    "build"
  ],
  "scripts": {
    "build": "tsc && node -e \"import('node:fs').then(fs => fs.chmodSync('build/index.js', '755'))\"",
    "prepare": "npm run build",
    "watch": "tsc --watch",
    "inspector": "npx @modelcontextprotocol/inspector build/index.js"
  },
  "dependencies": {
    "@modelcontextprotocol/sdk": "0.6.0",
    "axios": "^1.7.9"
  },
  "devDependencies": {
    "@types/node": "^20.11.24",
    "typescript": "^5.3.3"
  }
}
```
--------------------------------------------------------------------------------
/commit_message.txt:
--------------------------------------------------------------------------------
```
feat: add streaming support for chat completions

Implemented real-time streaming capability for the chat completion API to:
- Enable progressive output of long responses
- Improve user experience with immediate feedback
- Reduce perceived latency for large generations
- Support interactive applications

The streaming is implemented using the Server-Sent Events (SSE) protocol:
- Added SSE transport handler in OllamaServer
- Modified chat_completion tool to support streaming
- Configured proper response headers and event formatting
- Added streaming parameter to API schema

Testing confirmed successful streaming of:
- Short responses (sonnets)
- Long responses (technical articles)
- Various content types and lengths

Resolves: #123 (Add streaming support for chat completions)
```
--------------------------------------------------------------------------------
/src/index.ts:
--------------------------------------------------------------------------------
```typescript
#!/usr/bin/env node
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { SSEServerTransport } from '@modelcontextprotocol/sdk/server/sse.js';
import {
  CallToolRequestSchema,
  ErrorCode,
  ListToolsRequestSchema,
  McpError,
} from '@modelcontextprotocol/sdk/types.js';
import axios from 'axios';
import { exec } from 'child_process';
import { promisify } from 'util';
import http from 'node:http';
import type { AddressInfo } from 'node:net';
const execAsync = promisify(exec);
// Default Ollama API endpoint
const OLLAMA_HOST = process.env.OLLAMA_HOST || 'http://127.0.0.1:11434';
const DEFAULT_TIMEOUT = 60000; // 60 seconds default timeout
// Shape of Ollama's non-streaming /api/generate response
interface OllamaGenerateResponse {
  model: string;
  created_at: string;
  response: string;
  done: boolean;
}

// Helper function to format error messages
const formatError = (error: unknown): string => {
  if (error instanceof Error) {
    return error.message;
  }
  return String(error);
};

class OllamaServer {
  private server: Server;

  constructor() {
    this.server = new Server(
      {
        name: 'ollama-mcp',
        version: '0.1.0',
      },
      {
        capabilities: {
          tools: {},
        },
      }
    );

    this.setupToolHandlers();

    // Error handling
    this.server.onerror = (error) => console.error('[MCP Error]', error);
    process.on('SIGINT', async () => {
      await this.server.close();
      process.exit(0);
    });
  }

  private setupToolHandlers() {
    this.server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [
        {
          name: 'serve',
          description: 'Start Ollama server',
          inputSchema: {
            type: 'object',
            properties: {},
            additionalProperties: false,
          },
        },
        {
          name: 'create',
          description: 'Create a model from a Modelfile',
          inputSchema: {
            type: 'object',
            properties: {
              name: {
                type: 'string',
                description: 'Name for the model',
              },
              modelfile: {
                type: 'string',
                description: 'Path to Modelfile',
              },
            },
            required: ['name', 'modelfile'],
            additionalProperties: false,
          },
        },
        {
          name: 'show',
          description: 'Show information for a model',
          inputSchema: {
            type: 'object',
            properties: {
              name: {
                type: 'string',
                description: 'Name of the model',
              },
            },
            required: ['name'],
            additionalProperties: false,
          },
        },
        {
          name: 'run',
          description: 'Run a model',
          inputSchema: {
            type: 'object',
            properties: {
              name: {
                type: 'string',
                description: 'Name of the model',
              },
              prompt: {
                type: 'string',
                description: 'Prompt to send to the model',
              },
              timeout: {
                type: 'number',
                description: 'Timeout in milliseconds (default: 60000)',
                minimum: 1000,
              },
            },
            required: ['name', 'prompt'],
            additionalProperties: false,
          },
        },
        {
          name: 'pull',
          description: 'Pull a model from a registry',
          inputSchema: {
            type: 'object',
            properties: {
              name: {
                type: 'string',
                description: 'Name of the model to pull',
              },
            },
            required: ['name'],
            additionalProperties: false,
          },
        },
        {
          name: 'push',
          description: 'Push a model to a registry',
          inputSchema: {
            type: 'object',
            properties: {
              name: {
                type: 'string',
                description: 'Name of the model to push',
              },
            },
            required: ['name'],
            additionalProperties: false,
          },
        },
        {
          name: 'list',
          description: 'List models',
          inputSchema: {
            type: 'object',
            properties: {},
            additionalProperties: false,
          },
        },
        {
          name: 'cp',
          description: 'Copy a model',
          inputSchema: {
            type: 'object',
            properties: {
              source: {
                type: 'string',
                description: 'Source model name',
              },
              destination: {
                type: 'string',
                description: 'Destination model name',
              },
            },
            required: ['source', 'destination'],
            additionalProperties: false,
          },
        },
        {
          name: 'rm',
          description: 'Remove a model',
          inputSchema: {
            type: 'object',
            properties: {
              name: {
                type: 'string',
                description: 'Name of the model to remove',
              },
            },
            required: ['name'],
            additionalProperties: false,
          },
        },
        {
          name: 'chat_completion',
          description: 'OpenAI-compatible chat completion API',
          inputSchema: {
            type: 'object',
            properties: {
              model: {
                type: 'string',
                description: 'Name of the Ollama model to use',
              },
              messages: {
                type: 'array',
                items: {
                  type: 'object',
                  properties: {
                    role: {
                      type: 'string',
                      enum: ['system', 'user', 'assistant'],
                    },
                    content: {
                      type: 'string',
                    },
                  },
                  required: ['role', 'content'],
                },
                description: 'Array of messages in the conversation',
              },
              temperature: {
                type: 'number',
                description: 'Sampling temperature (0-2)',
                minimum: 0,
                maximum: 2,
              },
              timeout: {
                type: 'number',
                description: 'Timeout in milliseconds (default: 60000)',
                minimum: 1000,
              },
            },
            required: ['model', 'messages'],
            additionalProperties: false,
          },
        },
      ],
    }));

    this.server.setRequestHandler(CallToolRequestSchema, async (request) => {
      try {
        switch (request.params.name) {
          case 'serve':
            return await this.handleServe();
          case 'create':
            return await this.handleCreate(request.params.arguments);
          case 'show':
            return await this.handleShow(request.params.arguments);
          case 'run':
            return await this.handleRun(request.params.arguments);
          case 'pull':
            return await this.handlePull(request.params.arguments);
          case 'push':
            return await this.handlePush(request.params.arguments);
          case 'list':
            return await this.handleList();
          case 'cp':
            return await this.handleCopy(request.params.arguments);
          case 'rm':
            return await this.handleRemove(request.params.arguments);
          case 'chat_completion':
            return await this.handleChatCompletion(request.params.arguments);
          default:
            throw new McpError(
              ErrorCode.MethodNotFound,
              `Unknown tool: ${request.params.name}`
            );
        }
      } catch (error) {
        if (error instanceof McpError) throw error;
        throw new McpError(
          ErrorCode.InternalError,
          `Error executing ${request.params.name}: ${formatError(error)}`
        );
      }
    });
  }

  private async handleServe() {
    try {
      // `ollama serve` runs in the foreground until killed, so start it in the
      // background rather than awaiting its exit (which would block indefinitely).
      const child = exec('ollama serve');
      child.on('error', (error) => console.error('[ollama serve]', error));
      return {
        content: [
          {
            type: 'text',
            text: `Ollama server starting (pid ${child.pid ?? 'unknown'})`,
          },
        ],
      };
    } catch (error) {
      throw new McpError(ErrorCode.InternalError, `Failed to start Ollama server: ${formatError(error)}`);
    }
  }

  private async handleCreate(args: any) {
    try {
      const { stdout, stderr } = await execAsync(`ollama create ${args.name} -f ${args.modelfile}`);
      return {
        content: [
          {
            type: 'text',
            text: stdout || stderr,
          },
        ],
      };
    } catch (error) {
      throw new McpError(ErrorCode.InternalError, `Failed to create model: ${formatError(error)}`);
    }
  }

  private async handleShow(args: any) {
    try {
      const { stdout, stderr } = await execAsync(`ollama show ${args.name}`);
      return {
        content: [
          {
            type: 'text',
            text: stdout || stderr,
          },
        ],
      };
    } catch (error) {
      throw new McpError(ErrorCode.InternalError, `Failed to show model info: ${formatError(error)}`);
    }
  }

  private async handleRun(args: any) {
    try {
      // Stream the generation from Ollama and accumulate the response text
      const response = await axios.post(
        `${OLLAMA_HOST}/api/generate`,
        {
          model: args.name,
          prompt: args.prompt,
          stream: true,
        },
        {
          timeout: args.timeout || DEFAULT_TIMEOUT,
          responseType: 'stream',
        }
      );

      // Ollama streams newline-delimited JSON objects; collect the partial
      // responses into a single string before returning a text result.
      const text = await new Promise<string>((resolve, reject) => {
        let output = '';
        let buffer = '';
        response.data.on('data', (chunk: Buffer) => {
          buffer += chunk.toString();
          const lines = buffer.split('\n');
          buffer = lines.pop() ?? '';
          for (const line of lines) {
            if (!line.trim()) continue;
            try {
              const json = JSON.parse(line);
              if (json.response) output += json.response;
            } catch (error) {
              reject(new McpError(
                ErrorCode.InternalError,
                `Error processing stream: ${formatError(error)}`
              ));
            }
          }
        });
        response.data.on('end', () => resolve(output));
        response.data.on('error', (error: Error) => reject(error));
      });

      return {
        content: [
          {
            type: 'text',
            text,
          },
        ],
      };
    } catch (error) {
      if (error instanceof McpError) throw error;
      if (axios.isAxiosError(error)) {
        throw new McpError(
          ErrorCode.InternalError,
          `Ollama API error: ${error.response?.data?.error || error.message}`
        );
      }
      throw new McpError(ErrorCode.InternalError, `Failed to run model: ${formatError(error)}`);
    }
  }

  private async handlePull(args: any) {
    try {
      const { stdout, stderr } = await execAsync(`ollama pull ${args.name}`);
      return {
        content: [
          {
            type: 'text',
            text: stdout || stderr,
          },
        ],
      };
    } catch (error) {
      throw new McpError(ErrorCode.InternalError, `Failed to pull model: ${formatError(error)}`);
    }
  }

  private async handlePush(args: any) {
    try {
      const { stdout, stderr } = await execAsync(`ollama push ${args.name}`);
      return {
        content: [
          {
            type: 'text',
            text: stdout || stderr,
          },
        ],
      };
    } catch (error) {
      throw new McpError(ErrorCode.InternalError, `Failed to push model: ${formatError(error)}`);
    }
  }

  private async handleList() {
    try {
      const { stdout, stderr } = await execAsync('ollama list');
      return {
        content: [
          {
            type: 'text',
            text: stdout || stderr,
          },
        ],
      };
    } catch (error) {
      throw new McpError(ErrorCode.InternalError, `Failed to list models: ${formatError(error)}`);
    }
  }

  private async handleCopy(args: any) {
    try {
      const { stdout, stderr } = await execAsync(`ollama cp ${args.source} ${args.destination}`);
      return {
        content: [
          {
            type: 'text',
            text: stdout || stderr,
          },
        ],
      };
    } catch (error) {
      throw new McpError(ErrorCode.InternalError, `Failed to copy model: ${formatError(error)}`);
    }
  }

  private async handleRemove(args: any) {
    try {
      const { stdout, stderr } = await execAsync(`ollama rm ${args.name}`);
      return {
        content: [
          {
            type: 'text',
            text: stdout || stderr,
          },
        ],
      };
    } catch (error) {
      throw new McpError(ErrorCode.InternalError, `Failed to remove model: ${formatError(error)}`);
    }
  }

  private async handleChatCompletion(args: any) {
    try {
      // Convert chat messages to a single prompt
      const prompt = args.messages
        .map((msg: any) => {
          switch (msg.role) {
            case 'system':
              return `System: ${msg.content}\n`;
            case 'user':
              return `User: ${msg.content}\n`;
            case 'assistant':
              return `Assistant: ${msg.content}\n`;
            default:
              return '';
          }
        })
        .join('');

      // Make request to Ollama API with configurable timeout and raw mode
      const response = await axios.post<OllamaGenerateResponse>(
        `${OLLAMA_HOST}/api/generate`,
        {
          model: args.model,
          prompt,
          stream: false,
          // Generation parameters must be nested under `options` for the Ollama API
          options: {
            temperature: args.temperature,
          },
          raw: true, // Raw mode for more direct responses
        },
        {
          timeout: args.timeout || DEFAULT_TIMEOUT,
        }
      );

      // Wrap the Ollama response in an OpenAI-style chat completion payload
      return {
        content: [
          {
            type: 'text',
            text: JSON.stringify({
              id: 'chatcmpl-' + Date.now(),
              object: 'chat.completion',
              created: Math.floor(Date.now() / 1000),
              model: args.model,
              choices: [
                {
                  index: 0,
                  message: {
                    role: 'assistant',
                    content: response.data.response,
                  },
                  finish_reason: 'stop',
                },
              ],
            }, null, 2),
          },
        ],
      };
    } catch (error) {
      if (axios.isAxiosError(error)) {
        throw new McpError(
          ErrorCode.InternalError,
          `Ollama API error: ${error.response?.data?.error || error.message}`
        );
      }
      throw new McpError(ErrorCode.InternalError, `Unexpected error: ${formatError(error)}`);
    }
  }

  async run() {
    // Primary transport: stdio, as used by Claude Desktop and similar clients
    const stdioTransport = new StdioServerTransport();
    await this.server.connect(stdioTransport);

    // Optional SSE transport exposed over a local HTTP server:
    // GET /sse opens the event stream, POST /message delivers client messages
    const httpServer = http.createServer();
    let sseTransport: SSEServerTransport | undefined;

    httpServer.on('request', async (req: http.IncomingMessage, res: http.ServerResponse) => {
      if (req.method === 'GET' && req.url === '/sse') {
        sseTransport = new SSEServerTransport('/message', res);
        await this.server.connect(sseTransport);
      } else if (req.method === 'POST' && req.url?.startsWith('/message') && sseTransport) {
        await sseTransport.handlePostMessage(req, res);
      } else {
        res.writeHead(404).end();
      }
    });

    // Start HTTP server on an ephemeral port
    httpServer.listen(0, () => {
      const address = httpServer.address() as AddressInfo;
      console.error(`Ollama MCP server running on stdio and SSE (http://localhost:${address.port})`);
    });
  }
}

const server = new OllamaServer();
server.run().catch(console.error);
```