# Directory Structure
```
├── .gitignore
├── .python-version
├── config.json
├── image_utils.py
├── LICENSE
├── ollama_mcp_server.py
├── ollama-api.md
├── pyproject.toml
├── README.md
└── text_utils.py
```
# Files
--------------------------------------------------------------------------------
/.python-version:
--------------------------------------------------------------------------------
```
1 | 3.10
2 | 
```
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
 1 | # Python
 2 | __pycache__/
 3 | *.py[cod]
 4 | *$py.class
 5 | *.so
 6 | .Python
 7 | build/
 8 | develop-eggs/
 9 | dist/
10 | downloads/
11 | eggs/
12 | .eggs/
13 | lib/
14 | lib64/
15 | parts/
16 | sdist/
17 | var/
18 | wheels/
19 | *.egg-info/
20 | .installed.cfg
21 | *.egg
22 | requirements.txt
23 | 
24 | # Virtual Environment
25 | .env
26 | .venv
27 | env/
28 | venv/
29 | ENV/
30 | 
31 | # IDE
32 | .idea/
33 | .vscode/
34 | *.swp
35 | *.swo
36 | .DS_Store
37 | 
38 | # Project specific
39 | *.log
40 | .coverage
41 | htmlcov/
42 | .pytest_cache/
43 | .ruff_cache/
44 | notes.md
45 | .cursor/
46 | ask-ollama-cli
47 | uv.lock
```
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
```toml
 1 | [project]
 2 | name = "Ollama_MCP_Guidance"
 3 | version = "0.1.0"
 4 | description = "An MCP-based Ollama API interaction service providing standardized interfaces and intelligent guidance for LLMs"
 5 | readme = "README.md"
 6 | requires-python = ">=3.10"
 7 | license = {text = "MIT"}
 8 | authors = [
 9 |     {name = "ShadowSinger", email = "[email protected]"},
10 | ]
11 | dependencies = [
12 |     "httpx>=0.28.1",
13 |     "mcp[cli]>=1.3.0",
14 | ]
15 | 
```
--------------------------------------------------------------------------------
/ollama-api.md:
--------------------------------------------------------------------------------
```markdown
   1 | # API
   2 | 
   3 | ## Endpoints
   4 | 
   5 | - [Generate a completion](#generate-a-completion)
   6 | - [Generate a chat completion](#generate-a-chat-completion)
   7 | - [Create a Model](#create-a-model)
   8 | - [List Local Models](#list-local-models)
   9 | - [Show Model Information](#show-model-information)
  10 | - [Copy a Model](#copy-a-model)
  11 | - [Delete a Model](#delete-a-model)
  12 | - [Pull a Model](#pull-a-model)
  13 | - [Push a Model](#push-a-model)
  14 | - [Generate Embeddings](#generate-embeddings)
  15 | - [List Running Models](#list-running-models)
  16 | - [Version](#version)
  17 | 
  18 | ## Conventions
  19 | 
  20 | ### Model names
  21 | 
  22 | Model names follow a `model:tag` format, where `model` can have an optional namespace such as `example/model`. Some examples are `orca-mini:3b-q4_1` and `llama3:70b`. The tag is optional and, if not provided, will default to `latest`. The tag is used to identify a specific version.
  23 | 
  24 | ### Durations
  25 | 
  26 | All durations are returned in nanoseconds.
  27 | 
  28 | ### Streaming responses
  29 | 
  30 | Certain endpoints stream responses as JSON objects. Streaming can be disabled by providing `{"stream": false}` for these endpoints.
  31 | 
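For example, a streaming response can be consumed line by line as newline-delimited JSON. The sketch below is a minimal illustration using `httpx` (a dependency of this project); it assumes a local Ollama server on the default port with `llama3.2` already pulled:

```python
import json

import httpx

# Assumption: a local Ollama server on the default port with llama3.2 pulled.
OLLAMA = "http://localhost:11434"

with httpx.stream("POST", f"{OLLAMA}/api/generate",
                  json={"model": "llama3.2", "prompt": "Why is the sky blue?"},
                  timeout=None) as response:
    response.raise_for_status()
    # Each non-empty line is one JSON object; the last one has "done": true.
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()
```
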
  32 | ## Generate a completion
  33 | 
  34 | ```
  35 | POST /api/generate
  36 | ```
  37 | 
  38 | Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.
  39 | 
  40 | ### Parameters
  41 | 
  42 | - `model`: (required) the [model name](#model-names)
  43 | - `prompt`: the prompt to generate a response for
  44 | - `suffix`: the text after the model response
  45 | - `images`: (optional) a list of base64-encoded images (for multimodal models such as `llava`)
  46 | 
  47 | Advanced parameters (optional):
  48 | 
  49 | - `format`: the format to return a response in. Format can be `json` or a JSON schema
  50 | - `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
  51 | - `system`: system message to use (overrides what is defined in the `Modelfile`)
  52 | - `template`: the prompt template to use (overrides what is defined in the `Modelfile`)
  53 | - `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects
  54 | - `raw`: if `true` no formatting will be applied to the prompt. You may choose to use the `raw` parameter if you are specifying a full templated prompt in your request to the API
  55 | - `keep_alive`: controls how long the model will stay loaded into memory following the request (default: `5m`)
  56 | - `context` (deprecated): the context parameter returned from a previous request to `/generate`; this can be used to keep a short conversational memory
  57 | 
  58 | #### Structured outputs
  59 | 
  60 | Structured outputs are supported by providing a JSON schema in the `format` parameter. The model will generate a response that matches the schema. See the [structured outputs](#request-structured-outputs) example below.
  61 | 
  62 | #### JSON mode
  63 | 
  64 | Enable JSON mode by setting the `format` parameter to `json`. This will structure the response as a valid JSON object. See the JSON mode [example](#request-json-mode) below.
  65 | 
  66 | > [!IMPORTANT]
  67 | > It's important to instruct the model to use JSON in the `prompt`. Otherwise, the model may generate large amounts of whitespace.
  68 | 
  69 | ### Examples
  70 | 
  71 | #### Generate request (Streaming)
  72 | 
  73 | ##### Request
  74 | 
  75 | ```shell
  76 | curl http://localhost:11434/api/generate -d '{
  77 |   "model": "llama3.2",
  78 |   "prompt": "Why is the sky blue?"
  79 | }'
  80 | ```
  81 | 
  82 | ##### Response
  83 | 
  84 | A stream of JSON objects is returned:
  85 | 
  86 | ```json
  87 | {
  88 |   "model": "llama3.2",
  89 |   "created_at": "2023-08-04T08:52:19.385406455-07:00",
  90 |   "response": "The",
  91 |   "done": false
  92 | }
  93 | ```
  94 | 
  95 | The final response in the stream also includes additional data about the generation:
  96 | 
  97 | - `total_duration`: time spent generating the response
  98 | - `load_duration`: time spent in nanoseconds loading the model
  99 | - `prompt_eval_count`: number of tokens in the prompt
 100 | - `prompt_eval_duration`: time spent in nanoseconds evaluating the prompt
 101 | - `eval_count`: number of tokens in the response
 102 | - `eval_duration`: time in nanoseconds spent generating the response
 103 | - `context`: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory
 104 | - `response`: empty if the response was streamed; if not streamed, this will contain the full response
 105 | 
 106 | To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` by `eval_duration` and multiply by `10^9`.
 107 | 
 108 | ```json
 109 | {
 110 |   "model": "llama3.2",
 111 |   "created_at": "2023-08-04T19:22:45.499127Z",
 112 |   "response": "",
 113 |   "done": true,
 114 |   "context": [1, 2, 3],
 115 |   "total_duration": 10706818083,
 116 |   "load_duration": 6338219291,
 117 |   "prompt_eval_count": 26,
 118 |   "prompt_eval_duration": 130079000,
 119 |   "eval_count": 259,
 120 |   "eval_duration": 4232710000
 121 | }
 122 | ```
 123 | 
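Plugging the values from the final response above into that formula gives roughly 61 tokens/s:

```python
eval_count = 259            # tokens generated (from the final response above)
eval_duration = 4232710000  # nanoseconds (from the final response above)

tokens_per_second = eval_count / eval_duration * 1e9
print(f"{tokens_per_second:.1f} tokens/s")  # ~61.2
```
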
 124 | #### Request (No streaming)
 125 | 
 126 | ##### Request
 127 | 
 128 | A response can be received in one reply when streaming is off.
 129 | 
 130 | ```shell
 131 | curl http://localhost:11434/api/generate -d '{
 132 |   "model": "llama3.2",
 133 |   "prompt": "Why is the sky blue?",
 134 |   "stream": false
 135 | }'
 136 | ```
 137 | 
 138 | ##### Response
 139 | 
 140 | If `stream` is set to `false`, the response will be a single JSON object:
 141 | 
 142 | ```json
 143 | {
 144 |   "model": "llama3.2",
 145 |   "created_at": "2023-08-04T19:22:45.499127Z",
 146 |   "response": "The sky is blue because it is the color of the sky.",
 147 |   "done": true,
 148 |   "context": [1, 2, 3],
 149 |   "total_duration": 5043500667,
 150 |   "load_duration": 5025959,
 151 |   "prompt_eval_count": 26,
 152 |   "prompt_eval_duration": 325953000,
 153 |   "eval_count": 290,
 154 |   "eval_duration": 4709213000
 155 | }
 156 | ```
 157 | 
 158 | #### Request (with suffix)
 159 | 
 160 | ##### Request
 161 | 
 162 | ```shell
 163 | curl http://localhost:11434/api/generate -d '{
 164 |   "model": "codellama:code",
 165 |   "prompt": "def compute_gcd(a, b):",
 166 |   "suffix": "    return result",
 167 |   "options": {
 168 |     "temperature": 0
 169 |   },
 170 |   "stream": false
 171 | }'
 172 | ```
 173 | 
 174 | ##### Response
 175 | 
 176 | ```json
 177 | {
 178 |   "model": "codellama:code",
 179 |   "created_at": "2024-07-22T20:47:51.147561Z",
 180 |   "response": "\n  if a == 0:\n    return b\n  else:\n    return compute_gcd(b % a, a)\n\ndef compute_lcm(a, b):\n  result = (a * b) / compute_gcd(a, b)\n",
 181 |   "done": true,
 182 |   "done_reason": "stop",
 183 |   "context": [...],
 184 |   "total_duration": 1162761250,
 185 |   "load_duration": 6683708,
 186 |   "prompt_eval_count": 17,
 187 |   "prompt_eval_duration": 201222000,
 188 |   "eval_count": 63,
 189 |   "eval_duration": 953997000
 190 | }
 191 | ```
 192 | 
 193 | #### Request (Structured outputs)
 194 | 
 195 | ##### Request
 196 | 
 197 | ```shell
 198 | curl -X POST http://localhost:11434/api/generate -H "Content-Type: application/json" -d '{
 199 |   "model": "llama3.1:8b",
 200 |   "prompt": "Ollama is 22 years old and is busy saving the world. Respond using JSON",
 201 |   "stream": false,
 202 |   "format": {
 203 |     "type": "object",
 204 |     "properties": {
 205 |       "age": {
 206 |         "type": "integer"
 207 |       },
 208 |       "available": {
 209 |         "type": "boolean"
 210 |       }
 211 |     },
 212 |     "required": [
 213 |       "age",
 214 |       "available"
 215 |     ]
 216 |   }
 217 | }'
 218 | ```
 219 | 
 220 | ##### Response
 221 | 
 222 | ```json
 223 | {
 224 |   "model": "llama3.1:8b",
 225 |   "created_at": "2024-12-06T00:48:09.983619Z",
 226 |   "response": "{\n  \"age\": 22,\n  \"available\": true\n}",
 227 |   "done": true,
 228 |   "done_reason": "stop",
 229 |   "context": [1, 2, 3],
 230 |   "total_duration": 1075509083,
 231 |   "load_duration": 567678166,
 232 |   "prompt_eval_count": 28,
 233 |   "prompt_eval_duration": 236000000,
 234 |   "eval_count": 16,
 235 |   "eval_duration": 269000000
 236 | }
 237 | ```
 238 | 
 239 | #### Request (JSON mode)
 240 | 
 241 | > [!IMPORTANT]
 242 | > When `format` is set to `json`, the output will always be a well-formed JSON object. It's important to also instruct the model to respond in JSON.
 243 | 
 244 | ##### Request
 245 | 
 246 | ```shell
 247 | curl http://localhost:11434/api/generate -d '{
 248 |   "model": "llama3.2",
 249 |   "prompt": "What color is the sky at different times of the day? Respond using JSON",
 250 |   "format": "json",
 251 |   "stream": false
 252 | }'
 253 | ```
 254 | 
 255 | ##### Response
 256 | 
 257 | ```json
 258 | {
 259 |   "model": "llama3.2",
 260 |   "created_at": "2023-11-09T21:07:55.186497Z",
 261 |   "response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
 262 |   "done": true,
 263 |   "context": [1, 2, 3],
 264 |   "total_duration": 4648158584,
 265 |   "load_duration": 4071084,
 266 |   "prompt_eval_count": 36,
 267 |   "prompt_eval_duration": 439038000,
 268 |   "eval_count": 180,
 269 |   "eval_duration": 4196918000
 270 | }
 271 | ```
 272 | 
 273 | The value of `response` will be a string containing JSON similar to:
 274 | 
 275 | ```json
 276 | {
 277 |   "morning": {
 278 |     "color": "blue"
 279 |   },
 280 |   "noon": {
 281 |     "color": "blue-gray"
 282 |   },
 283 |   "afternoon": {
 284 |     "color": "warm gray"
 285 |   },
 286 |   "evening": {
 287 |     "color": "orange"
 288 |   }
 289 | }
 290 | ```
 291 | 
 292 | #### Request (with images)
 293 | 
 294 | To submit images to multimodal models such as `llava` or `bakllava`, provide a list of base64-encoded `images`:
 295 | 
 296 | ##### Request
 297 | 
 298 | ```shell
 299 | curl http://localhost:11434/api/generate -d '{
 300 |   "model": "llava",
 301 |   "prompt":"What is in this picture?",
 302 |   "stream": false,
 303 |   "images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOnd
r2uuuR5rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
 304 | }'
 305 | ```
 306 | 
 307 | ##### Response
 308 | 
 309 | ```json
 310 | {
 311 |   "model": "llava",
 312 |   "created_at": "2023-11-03T15:36:02.583064Z",
 313 |   "response": "A happy cartoon character, which is cute and cheerful.",
 314 |   "done": true,
 315 |   "context": [1, 2, 3],
 316 |   "total_duration": 2938432250,
 317 |   "load_duration": 2559292,
 318 |   "prompt_eval_count": 1,
 319 |   "prompt_eval_duration": 2195557000,
 320 |   "eval_count": 44,
 321 |   "eval_duration": 736432000
 322 | }
 323 | ```
 324 | 
 325 | #### Request (Raw Mode)
 326 | 
 327 | In some cases, you may wish to bypass the templating system and provide a full prompt. In this case, you can use the `raw` parameter to disable templating. Also note that raw mode will not return a context.
 328 | 
 329 | ##### Request
 330 | 
 331 | ```shell
 332 | curl http://localhost:11434/api/generate -d '{
 333 |   "model": "mistral",
 334 |   "prompt": "[INST] why is the sky blue? [/INST]",
 335 |   "raw": true,
 336 |   "stream": false
 337 | }'
 338 | ```
 339 | 
 340 | #### Request (Reproducible outputs)
 341 | 
 342 | For reproducible outputs, set `seed` to a number:
 343 | 
 344 | ##### Request
 345 | 
 346 | ```shell
 347 | curl http://localhost:11434/api/generate -d '{
 348 |   "model": "mistral",
 349 |   "prompt": "Why is the sky blue?",
 350 |   "options": {
 351 |     "seed": 123
 352 |   }
 353 | }'
 354 | ```
 355 | 
 356 | ##### Response
 357 | 
 358 | ```json
 359 | {
 360 |   "model": "mistral",
 361 |   "created_at": "2023-11-03T15:36:02.583064Z",
 362 |   "response": " The sky appears blue because of a phenomenon called Rayleigh scattering.",
 363 |   "done": true,
 364 |   "total_duration": 8493852375,
 365 |   "load_duration": 6589624375,
 366 |   "prompt_eval_count": 14,
 367 |   "prompt_eval_duration": 119039000,
 368 |   "eval_count": 110,
 369 |   "eval_duration": 1779061000
 370 | }
 371 | ```
 372 | 
 373 | #### Generate request (With options)
 374 | 
 375 | If you want to set custom options for the model at runtime rather than in the Modelfile, you can do so with the `options` parameter. This example sets every available option, but you can set any of them individually and omit the ones you do not want to override.
 376 | 
 377 | ##### Request
 378 | 
 379 | ```shell
 380 | curl http://localhost:11434/api/generate -d '{
 381 |   "model": "llama3.2",
 382 |   "prompt": "Why is the sky blue?",
 383 |   "stream": false,
 384 |   "options": {
 385 |     "num_keep": 5,
 386 |     "seed": 42,
 387 |     "num_predict": 100,
 388 |     "top_k": 20,
 389 |     "top_p": 0.9,
 390 |     "min_p": 0.0,
 391 |     "typical_p": 0.7,
 392 |     "repeat_last_n": 33,
 393 |     "temperature": 0.8,
 394 |     "repeat_penalty": 1.2,
 395 |     "presence_penalty": 1.5,
 396 |     "frequency_penalty": 1.0,
 397 |     "mirostat": 1,
 398 |     "mirostat_tau": 0.8,
 399 |     "mirostat_eta": 0.6,
 400 |     "penalize_newline": true,
 401 |     "stop": ["\n", "user:"],
 402 |     "numa": false,
 403 |     "num_ctx": 1024,
 404 |     "num_batch": 2,
 405 |     "num_gpu": 1,
 406 |     "main_gpu": 0,
 407 |     "low_vram": false,
 408 |     "vocab_only": false,
 409 |     "use_mmap": true,
 410 |     "use_mlock": false,
 411 |     "num_thread": 8
 412 |   }
 413 | }'
 414 | ```
 415 | 
 416 | ##### Response
 417 | 
 418 | ```json
 419 | {
 420 |   "model": "llama3.2",
 421 |   "created_at": "2023-08-04T19:22:45.499127Z",
 422 |   "response": "The sky is blue because it is the color of the sky.",
 423 |   "done": true,
 424 |   "context": [1, 2, 3],
 425 |   "total_duration": 4935886791,
 426 |   "load_duration": 534986708,
 427 |   "prompt_eval_count": 26,
 428 |   "prompt_eval_duration": 107345000,
 429 |   "eval_count": 237,
 430 |   "eval_duration": 4289432000
 431 | }
 432 | ```
 433 | 
 434 | #### Load a model
 435 | 
 436 | If an empty prompt is provided, the model will be loaded into memory.
 437 | 
 438 | ##### Request
 439 | 
 440 | ```shell
 441 | curl http://localhost:11434/api/generate -d '{
 442 |   "model": "llama3.2"
 443 | }'
 444 | ```
 445 | 
 446 | ##### Response
 447 | 
 448 | A single JSON object is returned:
 449 | 
 450 | ```json
 451 | {
 452 |   "model": "llama3.2",
 453 |   "created_at": "2023-12-18T19:52:07.071755Z",
 454 |   "response": "",
 455 |   "done": true
 456 | }
 457 | ```
 458 | 
 459 | #### Unload a model
 460 | 
 461 | If an empty prompt is provided and the `keep_alive` parameter is set to `0`, a model will be unloaded from memory.
 462 | 
 463 | ##### Request
 464 | 
 465 | ```shell
 466 | curl http://localhost:11434/api/generate -d '{
 467 |   "model": "llama3.2",
 468 |   "keep_alive": 0
 469 | }'
 470 | ```
 471 | 
 472 | ##### Response
 473 | 
 474 | A single JSON object is returned:
 475 | 
 476 | ```json
 477 | {
 478 |   "model": "llama3.2",
 479 |   "created_at": "2024-09-12T03:54:03.516566Z",
 480 |   "response": "",
 481 |   "done": true,
 482 |   "done_reason": "unload"
 483 | }
 484 | ```
 485 | 
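A small helper sketch wrapping these two calls with `httpx`; it assumes a local Ollama server on the default port:

```python
import httpx

OLLAMA = "http://localhost:11434"

def load_model(model: str) -> None:
    # An empty prompt loads the model into memory; a single JSON object is returned.
    httpx.post(f"{OLLAMA}/api/generate", json={"model": model},
               timeout=None).raise_for_status()

def unload_model(model: str) -> None:
    # keep_alive=0 asks the server to unload the model immediately.
    httpx.post(f"{OLLAMA}/api/generate",
               json={"model": model, "keep_alive": 0},
               timeout=None).raise_for_status()

load_model("llama3.2")
unload_model("llama3.2")
```
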
 486 | ## Generate a chat completion
 487 | 
 488 | ```
 489 | POST /api/chat
 490 | ```
 491 | 
 492 | Generate the next message in a chat with a provided model. This is a streaming endpoint, so there will be a series of responses. Streaming can be disabled using `"stream": false`. The final response object will include statistics and additional data from the request.
 493 | 
 494 | ### Parameters
 495 | 
 496 | - `model`: (required) the [model name](#model-names)
 497 | - `messages`: the messages of the chat; this can be used to keep a chat memory
 498 | - `tools`: list of tools in JSON for the model to use if supported
 499 | 
 500 | The `message` object has the following fields:
 501 | 
 502 | - `role`: the role of the message, either `system`, `user`, `assistant`, or `tool`
 503 | - `content`: the content of the message
 504 | - `images` (optional): a list of images to include in the message (for multimodal models such as `llava`)
 505 | - `tool_calls` (optional): a list of tools in JSON that the model wants to use
 506 | 
 507 | Advanced parameters (optional):
 508 | 
 509 | - `format`: the format to return a response in. Format can be `json` or a JSON schema. 
 510 | - `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
 511 | - `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects
 512 | - `keep_alive`: controls how long the model will stay loaded into memory following the request (default: `5m`)
 513 | 
 514 | ### Structured outputs
 515 | 
 516 | Structured outputs are supported by providing a JSON schema in the `format` parameter. The model will generate a response that matches the schema. See the [Chat request (Structured outputs)](#chat-request-structured-outputs) example below.
 517 | 
 518 | ### Examples
 519 | 
 520 | #### Chat Request (Streaming)
 521 | 
 522 | ##### Request
 523 | 
 524 | Send a chat message with a streaming response.
 525 | 
 526 | ```shell
 527 | curl http://localhost:11434/api/chat -d '{
 528 |   "model": "llama3.2",
 529 |   "messages": [
 530 |     {
 531 |       "role": "user",
 532 |       "content": "why is the sky blue?"
 533 |     }
 534 |   ]
 535 | }'
 536 | ```
 537 | 
 538 | ##### Response
 539 | 
 540 | A stream of JSON objects is returned:
 541 | 
 542 | ```json
 543 | {
 544 |   "model": "llama3.2",
 545 |   "created_at": "2023-08-04T08:52:19.385406455-07:00",
 546 |   "message": {
 547 |     "role": "assistant",
 548 |     "content": "The",
 549 |     "images": null
 550 |   },
 551 |   "done": false
 552 | }
 553 | ```
 554 | 
 555 | Final response:
 556 | 
 557 | ```json
 558 | {
 559 |   "model": "llama3.2",
 560 |   "created_at": "2023-08-04T19:22:45.499127Z",
 561 |   "done": true,
 562 |   "total_duration": 4883583458,
 563 |   "load_duration": 1334875,
 564 |   "prompt_eval_count": 26,
 565 |   "prompt_eval_duration": 342546000,
 566 |   "eval_count": 282,
 567 |   "eval_duration": 4535599000
 568 | }
 569 | ```
 570 | 
 571 | #### Chat request (No streaming)
 572 | 
 573 | ##### Request
 574 | 
 575 | ```shell
 576 | curl http://localhost:11434/api/chat -d '{
 577 |   "model": "llama3.2",
 578 |   "messages": [
 579 |     {
 580 |       "role": "user",
 581 |       "content": "why is the sky blue?"
 582 |     }
 583 |   ],
 584 |   "stream": false
 585 | }'
 586 | ```
 587 | 
 588 | ##### Response
 589 | 
 590 | ```json
 591 | {
 592 |   "model": "llama3.2",
 593 |   "created_at": "2023-12-12T14:13:43.416799Z",
 594 |   "message": {
 595 |     "role": "assistant",
 596 |     "content": "Hello! How are you today?"
 597 |   },
 598 |   "done": true,
 599 |   "total_duration": 5191566416,
 600 |   "load_duration": 2154458,
 601 |   "prompt_eval_count": 26,
 602 |   "prompt_eval_duration": 383809000,
 603 |   "eval_count": 298,
 604 |   "eval_duration": 4799921000
 605 | }
 606 | ```
 607 | 
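The same non-streaming request can be made from Python with `httpx` (a dependency of this project); this minimal sketch assumes a local Ollama server on the default port:

```python
import httpx

resp = httpx.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "why is the sky blue?"}],
        "stream": False,
    },
    timeout=None,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```
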
 608 | #### Chat request (Structured outputs)
 609 | 
 610 | ##### Request
 611 | 
 612 | ```shell
 613 | curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
 614 |   "model": "llama3.1",
 615 |   "messages": [{"role": "user", "content": "Ollama is 22 years old and busy saving the world. Return a JSON object with the age and availability."}],
 616 |   "stream": false,
 617 |   "format": {
 618 |     "type": "object",
 619 |     "properties": {
 620 |       "age": {
 621 |         "type": "integer"
 622 |       },
 623 |       "available": {
 624 |         "type": "boolean"
 625 |       }
 626 |     },
 627 |     "required": [
 628 |       "age",
 629 |       "available"
 630 |     ]
 631 |   },
 632 |   "options": {
 633 |     "temperature": 0
 634 |   }
 635 | }'
 636 | ```
 637 | 
 638 | ##### Response
 639 | 
 640 | ```json
 641 | {
 642 |   "model": "llama3.1",
 643 |   "created_at": "2024-12-06T00:46:58.265747Z",
 644 |   "message": { "role": "assistant", "content": "{\"age\": 22, \"available\": false}" },
 645 |   "done_reason": "stop",
 646 |   "done": true,
 647 |   "total_duration": 2254970291,
 648 |   "load_duration": 574751416,
 649 |   "prompt_eval_count": 34,
 650 |   "prompt_eval_duration": 1502000000,
 651 |   "eval_count": 12,
 652 |   "eval_duration": 175000000
 653 | }
 654 | ```
 655 | 
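Since the constrained reply arrives as a JSON string inside `message.content`, the client still has to parse it. A minimal sketch of the request above plus that parsing step (same local-server assumptions as the other sketches):

```python
import json

import httpx

schema = {
    "type": "object",
    "properties": {"age": {"type": "integer"}, "available": {"type": "boolean"}},
    "required": ["age", "available"],
}

resp = httpx.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1",
        "messages": [{"role": "user",
                      "content": "Ollama is 22 years old and busy saving the world. "
                                 "Return a JSON object with the age and availability."}],
        "format": schema,
        "stream": False,
        "options": {"temperature": 0},
    },
    timeout=None,
)
resp.raise_for_status()
data = json.loads(resp.json()["message"]["content"])  # e.g. {"age": 22, "available": False}
print(data["age"], data["available"])
```
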
 656 | #### Chat request (With History)
 657 | 
 658 | Send a chat message with a conversation history. You can use this same approach to start the conversation using multi-shot or chain-of-thought prompting.
 659 | 
 660 | ##### Request
 661 | 
 662 | ```shell
 663 | curl http://localhost:11434/api/chat -d '{
 664 |   "model": "llama3.2",
 665 |   "messages": [
 666 |     {
 667 |       "role": "user",
 668 |       "content": "why is the sky blue?"
 669 |     },
 670 |     {
 671 |       "role": "assistant",
 672 |       "content": "due to rayleigh scattering."
 673 |     },
 674 |     {
 675 |       "role": "user",
 676 |       "content": "how is that different than mie scattering?"
 677 |     }
 678 |   ]
 679 | }'
 680 | ```
 681 | 
 682 | ##### Response
 683 | 
 684 | A stream of JSON objects is returned:
 685 | 
 686 | ```json
 687 | {
 688 |   "model": "llama3.2",
 689 |   "created_at": "2023-08-04T08:52:19.385406455-07:00",
 690 |   "message": {
 691 |     "role": "assistant",
 692 |     "content": "The"
 693 |   },
 694 |   "done": false
 695 | }
 696 | ```
 697 | 
 698 | Final response:
 699 | 
 700 | ```json
 701 | {
 702 |   "model": "llama3.2",
 703 |   "created_at": "2023-08-04T19:22:45.499127Z",
 704 |   "done": true,
 705 |   "total_duration": 8113331500,
 706 |   "load_duration": 6396458,
 707 |   "prompt_eval_count": 61,
 708 |   "prompt_eval_duration": 398801000,
 709 |   "eval_count": 468,
 710 |   "eval_duration": 7701267000
 711 | }
 712 | ```
 713 | 
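One way for a client to keep that memory going is to append each assistant reply to `messages` before the next request; a small sketch under the same local-server assumptions:

```python
import httpx

OLLAMA = "http://localhost:11434"
messages = []

def chat(user_text: str) -> str:
    messages.append({"role": "user", "content": user_text})
    resp = httpx.post(f"{OLLAMA}/api/chat",
                      json={"model": "llama3.2", "messages": messages, "stream": False},
                      timeout=None)
    resp.raise_for_status()
    reply = resp.json()["message"]
    messages.append(reply)  # keep the assistant turn so the model sees the full history
    return reply["content"]

print(chat("why is the sky blue?"))
print(chat("how is that different than mie scattering?"))
```
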
 714 | #### Chat request (with images)
 715 | 
 716 | ##### Request
 717 | 
 718 | Send a chat message with images. The images should be provided as an array, with the individual images encoded in Base64.
 719 | 
 720 | ```shell
 721 | curl http://localhost:11434/api/chat -d '{
 722 |   "model": "llava",
 723 |   "messages": [
 724 |     {
 725 |       "role": "user",
 726 |       "content": "what is in this image?",
 727 |       "images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGq
ZOndr2uuuR5rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
 728 |     }
 729 |   ]
 730 | }'
 731 | ```
 732 | 
 733 | ##### Response
 734 | 
 735 | ```json
 736 | {
 737 |   "model": "llava",
 738 |   "created_at": "2023-12-13T22:42:50.203334Z",
 739 |   "message": {
 740 |     "role": "assistant",
 741 |     "content": " The image features a cute, little pig with an angry facial expression. It's wearing a heart on its shirt and is waving in the air. This scene appears to be part of a drawing or sketching project.",
 742 |     "images": null
 743 |   },
 744 |   "done": true,
 745 |   "total_duration": 1668506709,
 746 |   "load_duration": 1986209,
 747 |   "prompt_eval_count": 26,
 748 |   "prompt_eval_duration": 359682000,
 749 |   "eval_count": 83,
 750 |   "eval_duration": 1303285000
 751 | }
 752 | ```
 753 | 
 754 | #### Chat request (Reproducible outputs)
 755 | 
 756 | ##### Request
 757 | 
 758 | ```shell
 759 | curl http://localhost:11434/api/chat -d '{
 760 |   "model": "llama3.2",
 761 |   "messages": [
 762 |     {
 763 |       "role": "user",
 764 |       "content": "Hello!"
 765 |     }
 766 |   ],
 767 |   "options": {
 768 |     "seed": 101,
 769 |     "temperature": 0
 770 |   }
 771 | }'
 772 | ```
 773 | 
 774 | ##### Response
 775 | 
 776 | ```json
 777 | {
 778 |   "model": "llama3.2",
 779 |   "created_at": "2023-12-12T14:13:43.416799Z",
 780 |   "message": {
 781 |     "role": "assistant",
 782 |     "content": "Hello! How are you today?"
 783 |   },
 784 |   "done": true,
 785 |   "total_duration": 5191566416,
 786 |   "load_duration": 2154458,
 787 |   "prompt_eval_count": 26,
 788 |   "prompt_eval_duration": 383809000,
 789 |   "eval_count": 298,
 790 |   "eval_duration": 4799921000
 791 | }
 792 | ```
 793 | 
 794 | #### Chat request (with tools)
 795 | 
 796 | ##### Request
 797 | 
 798 | ```shell
 799 | curl http://localhost:11434/api/chat -d '{
 800 |   "model": "llama3.2",
 801 |   "messages": [
 802 |     {
 803 |       "role": "user",
 804 |       "content": "What is the weather today in Paris?"
 805 |     }
 806 |   ],
 807 |   "stream": false,
 808 |   "tools": [
 809 |     {
 810 |       "type": "function",
 811 |       "function": {
 812 |         "name": "get_current_weather",
 813 |         "description": "Get the current weather for a location",
 814 |         "parameters": {
 815 |           "type": "object",
 816 |           "properties": {
 817 |             "location": {
 818 |               "type": "string",
 819 |               "description": "The location to get the weather for, e.g. San Francisco, CA"
 820 |             },
 821 |             "format": {
 822 |               "type": "string",
 823 |               "description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
 824 |               "enum": ["celsius", "fahrenheit"]
 825 |             }
 826 |           },
 827 |           "required": ["location", "format"]
 828 |         }
 829 |       }
 830 |     }
 831 |   ]
 832 | }'
 833 | ```
 834 | 
 835 | ##### Response
 836 | 
 837 | ```json
 838 | {
 839 |   "model": "llama3.2",
 840 |   "created_at": "2024-07-22T20:33:28.123648Z",
 841 |   "message": {
 842 |     "role": "assistant",
 843 |     "content": "",
 844 |     "tool_calls": [
 845 |       {
 846 |         "function": {
 847 |           "name": "get_current_weather",
 848 |           "arguments": {
 849 |             "format": "celsius",
 850 |             "location": "Paris, FR"
 851 |           }
 852 |         }
 853 |       }
 854 |     ]
 855 |   },
 856 |   "done_reason": "stop",
 857 |   "done": true,
 858 |   "total_duration": 885095291,
 859 |   "load_duration": 3753500,
 860 |   "prompt_eval_count": 122,
 861 |   "prompt_eval_duration": 328493000,
 862 |   "eval_count": 33,
 863 |   "eval_duration": 552222000
 864 | }
 865 | ```
 866 | 
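The API only returns the requested tool call; running the tool and sending its output back is up to the client. A minimal sketch of that round trip (the weather function here is a made-up stand-in, and the same local-server assumptions apply):

```python
import json

import httpx

OLLAMA = "http://localhost:11434"

# Hypothetical local implementation of the tool advertised in the request above.
def get_current_weather(location: str, format: str) -> str:
    return json.dumps({"location": location, "temperature": 22, "unit": format})

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "format"],
        },
    },
}

messages = [{"role": "user", "content": "What is the weather today in Paris?"}]
reply = httpx.post(f"{OLLAMA}/api/chat",
                   json={"model": "llama3.2", "messages": messages,
                         "stream": False, "tools": [weather_tool]},
                   timeout=None).json()["message"]
messages.append(reply)

# Execute each requested tool call and return its output as a "tool" message.
for call in reply.get("tool_calls", []):
    fn = call["function"]
    if fn["name"] == "get_current_weather":
        messages.append({"role": "tool",
                         "content": get_current_weather(**fn["arguments"])})

final = httpx.post(f"{OLLAMA}/api/chat",
                   json={"model": "llama3.2", "messages": messages, "stream": False},
                   timeout=None).json()["message"]["content"]
print(final)
```
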
 867 | #### Load a model
 868 | 
 869 | If the messages array is empty, the model will be loaded into memory.
 870 | 
 871 | ##### Request
 872 | 
 873 | ```shell
 874 | curl http://localhost:11434/api/chat -d '{
 875 |   "model": "llama3.2",
 876 |   "messages": []
 877 | }'
 878 | ```
 879 | 
 880 | ##### Response
 881 | 
 882 | ```json
 883 | {
 884 |   "model": "llama3.2",
 885 |   "created_at":"2024-09-12T21:17:29.110811Z",
 886 |   "message": {
 887 |     "role": "assistant",
 888 |     "content": ""
 889 |   },
 890 |   "done_reason": "load",
 891 |   "done": true
 892 | }
 893 | ```
 894 | 
 895 | #### Unload a model
 896 | 
 897 | If the messages array is empty and the `keep_alive` parameter is set to `0`, a model will be unloaded from memory.
 898 | 
 899 | ##### Request
 900 | 
 901 | ```shell
 902 | curl http://localhost:11434/api/chat -d '{
 903 |   "model": "llama3.2",
 904 |   "messages": [],
 905 |   "keep_alive": 0
 906 | }'
 907 | ```
 908 | 
 909 | ##### Response
 910 | 
 911 | A single JSON object is returned:
 912 | 
 913 | ```json
 914 | {
 915 |   "model": "llama3.2",
 916 |   "created_at":"2024-09-12T21:33:17.547535Z",
 917 |   "message": {
 918 |     "role": "assistant",
 919 |     "content": ""
 920 |   },
 921 |   "done_reason": "unload",
 922 |   "done": true
 923 | }
 924 | ```
 925 | 
 926 | ## Create a Model
 927 | 
 928 | ```
 929 | POST /api/create
 930 | ```
 931 | 
 932 | Create a model from:
 933 |  * another model;
 934 |  * a safetensors directory; or
 935 |  * a GGUF file.
 936 | 
 937 | If you are creating a model from a safetensors directory or from a GGUF file, you must [create a blob](#create-a-blob) for each of the files and then use the file name and SHA256 digest associated with each blob in the `files` field.
 938 | 
 939 | ### Parameters
 940 | 
 941 | - `model`: name of the model to create
 942 | - `from`: (optional) name of an existing model to create the new model from
 943 | - `files`: (optional) a dictionary of file names to SHA256 digests of blobs to create the model from
 944 | - `adapters`: (optional) a dictionary of file names to SHA256 digests of blobs for LoRA adapters
 945 | - `template`: (optional) the prompt template for the model
 946 | - `license`: (optional) a string or list of strings containing the license or licenses for the model
 947 | - `system`: (optional) a string containing the system prompt for the model
 948 | - `parameters`: (optional) a dictionary of parameters for the model (see [Modelfile](./modelfile.md#valid-parameters-and-values) for a list of parameters)
 949 | - `messages`: (optional) a list of message objects used to create a conversation
 950 | - `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects
 951 | - `quantize` (optional): quantize a non-quantized (e.g. float16) model
 952 | 
 953 | #### Quantization types
 954 | 
 955 | | Type | Recommended |
 956 | | --- | :-: |
 957 | | q2_K | |
 958 | | q3_K_L | |
 959 | | q3_K_M | |
 960 | | q3_K_S | |
 961 | | q4_0 | |
 962 | | q4_1 | |
 963 | | q4_K_M | * |
 964 | | q4_K_S | |
 965 | | q5_0 | |
 966 | | q5_1 | |
 967 | | q5_K_M | |
 968 | | q5_K_S | |
 969 | | q6_K | |
 970 | | q8_0 | * |
 971 | 
 972 | ### Examples
 973 | 
 974 | #### Create a new model
 975 | 
 976 | Create a new model from an existing model.
 977 | 
 978 | ##### Request
 979 | 
 980 | ```shell
 981 | curl http://localhost:11434/api/create -d '{
 982 |   "model": "mario",
 983 |   "from": "llama3.2",
 984 |   "system": "You are Mario from Super Mario Bros."
 985 | }'
 986 | ```
 987 | 
 988 | ##### Response
 989 | 
 990 | A stream of JSON objects is returned:
 991 | 
 992 | ```json
 993 | {"status":"reading model metadata"}
 994 | {"status":"creating system layer"}
 995 | {"status":"using already created layer sha256:22f7f8ef5f4c791c1b03d7eb414399294764d7cc82c7e94aa81a1feb80a983a2"}
 996 | {"status":"using already created layer sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b"}
 997 | {"status":"using already created layer sha256:7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d"}
 998 | {"status":"using already created layer sha256:2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988"}
 999 | {"status":"using already created layer sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9"}
1000 | {"status":"writing layer sha256:df30045fe90f0d750db82a058109cecd6d4de9c90a3d75b19c09e5f64580bb42"}
1001 | {"status":"writing layer sha256:f18a68eb09bf925bb1b669490407c1b1251c5db98dc4d3d81f3088498ea55690"}
1002 | {"status":"writing manifest"}
1003 | {"status":"success"}
1004 | ```
1005 | 
1006 | #### Quantize a model
1007 | 
1008 | Quantize a non-quantized model.
1009 | 
1010 | ##### Request
1011 | 
1012 | ```shell
1013 | curl http://localhost:11434/api/create -d '{
1014 |   "model": "llama3.1:quantized",
1015 |   "from": "llama3.1:8b-instruct-fp16",
1016 |   "quantize": "q4_K_M"
1017 | }'
1018 | ```
1019 | 
1020 | ##### Response
1021 | 
1022 | A stream of JSON objects is returned:
1023 | 
1024 | ```json
1025 | {"status":"quantizing F16 model to Q4_K_M"}
1026 | {"status":"creating new layer sha256:667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29"}
1027 | {"status":"using existing layer sha256:11ce4ee3e170f6adebac9a991c22e22ab3f8530e154ee669954c4bc73061c258"}
1028 | {"status":"using existing layer sha256:0ba8f0e314b4264dfd19df045cde9d4c394a52474bf92ed6a3de22a4ca31a177"}
1029 | {"status":"using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"}
1030 | {"status":"creating new layer sha256:455f34728c9b5dd3376378bfb809ee166c145b0b4c1f1a6feca069055066ef9a"}
1031 | {"status":"writing manifest"}
1032 | {"status":"success"}
1033 | ```
1034 | 
1035 | #### Create a model from GGUF
1036 | 
1037 | Create a model from a GGUF file. The `files` parameter should be filled out with the file name and SHA256 digest of the GGUF file you wish to use. Use [/api/blobs/:digest](#push-a-blob) to push the GGUF file to the server before calling this API.
1038 | 
1039 | 
1040 | ##### Request
1041 | 
1042 | ```shell
1043 | curl http://localhost:11434/api/create -d '{
1044 |   "model": "my-gguf-model",
1045 |   "files": {
1046 |     "test.gguf": "sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"
1047 |   }
1048 | }'
1049 | ```
1050 | 
1051 | ##### Response
1052 | 
1053 | A stream of JSON objects is returned:
1054 | 
1055 | ```json
1056 | {"status":"parsing GGUF"}
1057 | {"status":"using existing layer sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"}
1058 | {"status":"writing manifest"}
1059 | {"status":"success"}
1060 | ```
1061 | 
1062 | 
1063 | #### Create a model from a Safetensors directory
1064 | 
1065 | The `files` parameter should include a dictionary of files for the safetensors model which includes the file names and SHA256 digest of each file. Use [/api/blobs/:digest](#push-a-blob) to first push each of the files to the server before calling this API. Files will remain in the cache until the Ollama server is restarted.
1066 | 
1067 | ##### Request
1068 | 
1069 | ```shell
1070 | curl http://localhost:11434/api/create -d '{
1071 |   "model": "fred",
1072 |   "files": {
1073 |     "config.json": "sha256:dd3443e529fb2290423a0c65c2d633e67b419d273f170259e27297219828e389",
1074 |     "generation_config.json": "sha256:88effbb63300dbbc7390143fbbdd9d9fa50587b37e8bfd16c8c90d4970a74a36",
1075 |     "special_tokens_map.json": "sha256:b7455f0e8f00539108837bfa586c4fbf424e31f8717819a6798be74bef813d05",
1076 |     "tokenizer.json": "sha256:bbc1904d35169c542dffbe1f7589a5994ec7426d9e5b609d07bab876f32e97ab",
1077 |     "tokenizer_config.json": "sha256:24e8a6dc2547164b7002e3125f10b415105644fcf02bf9ad8b674c87b1eaaed6",
1078 |     "model.safetensors": "sha256:1ff795ff6a07e6a68085d206fb84417da2f083f68391c2843cd2b8ac6df8538f"
1079 |   }
1080 | }'
1081 | ```
1082 | 
1083 | ##### Response
1084 | 
1085 | A stream of JSON objects is returned:
1086 | 
1087 | ```json
1088 | {"status":"converting model"}
1089 | {"status":"creating new layer sha256:05ca5b813af4a53d2c2922933936e398958855c44ee534858fcfd830940618b6"}
1090 | {"status":"using autodetected template llama3-instruct"}
1091 | {"status":"using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"}
1092 | {"status":"writing manifest"}
1093 | {"status":"success"}
1094 | ```
1095 | 
1096 | ## Check if a Blob Exists
1097 | 
1098 | ```
1099 | HEAD /api/blobs/:digest
1100 | ```
1101 | 
1102 | Ensures that the file blob (Binary Large Object) used when creating a model exists on the server. This checks your Ollama server and not ollama.com.
1103 | 
1104 | ### Query Parameters
1105 | 
1106 | - `digest`: the SHA256 digest of the blob
1107 | 
1108 | ### Examples
1109 | 
1110 | #### Request
1111 | 
1112 | ```shell
1113 | curl -I http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
1114 | ```
1115 | 
1116 | #### Response
1117 | 
1118 | Return 200 OK if the blob exists, 404 Not Found if it does not.
1119 | 
1120 | ## Push a Blob
1121 | 
1122 | ```
1123 | POST /api/blobs/:digest
1124 | ```
1125 | 
1126 | Push a file to the Ollama server to create a "blob" (Binary Large Object).
1127 | 
1128 | ### Query Parameters
1129 | 
1130 | - `digest`: the expected SHA256 digest of the file
1131 | 
1132 | ### Examples
1133 | 
1134 | #### Request
1135 | 
1136 | ```shell
1137 | curl -T model.gguf -X POST http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
1138 | ```
1139 | 
1140 | #### Response
1141 | 
1142 | Return 201 Created if the blob was successfully created, 400 Bad Request if the digest used is not expected.
1143 | 
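Taken together with [Check if a Blob Exists](#check-if-a-blob-exists), a client can compute the digest locally, skip the upload if the blob already exists, and otherwise push it. A minimal sketch (the `model.gguf` path is a placeholder, and the file is read into memory for simplicity):

```python
import hashlib

import httpx

OLLAMA = "http://localhost:11434"
path = "model.gguf"  # placeholder path to a local GGUF file

# Compute the SHA256 digest the server expects in the URL.
with open(path, "rb") as f:
    data = f.read()
digest = "sha256:" + hashlib.sha256(data).hexdigest()

# Skip the upload if the blob is already on the server (200 OK), otherwise push it.
if httpx.head(f"{OLLAMA}/api/blobs/{digest}").status_code != 200:
    httpx.post(f"{OLLAMA}/api/blobs/{digest}", content=data, timeout=None).raise_for_status()

print(digest)  # use this digest in the "files" field of /api/create
```
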
1144 | ## List Local Models
1145 | 
1146 | ```
1147 | GET /api/tags
1148 | ```
1149 | 
1150 | List models that are available locally.
1151 | 
1152 | ### Examples
1153 | 
1154 | #### Request
1155 | 
1156 | ```shell
1157 | curl http://localhost:11434/api/tags
1158 | ```
1159 | 
1160 | #### Response
1161 | 
1162 | A single JSON object will be returned.
1163 | 
1164 | ```json
1165 | {
1166 |   "models": [
1167 |     {
1168 |       "name": "codellama:13b",
1169 |       "modified_at": "2023-11-04T14:56:49.277302595-07:00",
1170 |       "size": 7365960935,
1171 |       "digest": "9f438cb9cd581fc025612d27f7c1a6669ff83a8bb0ed86c94fcf4c5440555697",
1172 |       "details": {
1173 |         "format": "gguf",
1174 |         "family": "llama",
1175 |         "families": null,
1176 |         "parameter_size": "13B",
1177 |         "quantization_level": "Q4_0"
1178 |       }
1179 |     },
1180 |     {
1181 |       "name": "llama3:latest",
1182 |       "modified_at": "2023-12-07T09:32:18.757212583-08:00",
1183 |       "size": 3825819519,
1184 |       "digest": "fe938a131f40e6f6d40083c9f0f430a515233eb2edaa6d72eb85c50d64f2300e",
1185 |       "details": {
1186 |         "format": "gguf",
1187 |         "family": "llama",
1188 |         "families": null,
1189 |         "parameter_size": "7B",
1190 |         "quantization_level": "Q4_0"
1191 |       }
1192 |     }
1193 |   ]
1194 | }
1195 | ```
1196 | 
1197 | ## Show Model Information
1198 | 
1199 | ```
1200 | POST /api/show
1201 | ```
1202 | 
1203 | Show information about a model, including details, modelfile, template, parameters, license, and system prompt.
1204 | 
1205 | ### Parameters
1206 | 
1207 | - `model`: name of the model to show
1208 | - `verbose`: (optional) if set to `true`, returns full data for verbose response fields
1209 | 
1210 | ### Examples
1211 | 
1212 | #### Request
1213 | 
1214 | ```shell
1215 | curl http://localhost:11434/api/show -d '{
1216 |   "model": "llama3.2"
1217 | }'
1218 | ```
1219 | 
1220 | #### Response
1221 | 
1222 | ```json
1223 | {
1224 |   "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llava:latest\n\nFROM /Users/matt/.ollama/models/blobs/sha256:200765e1283640ffbd013184bf496e261032fa75b99498a9613be4e94d63ad52\nTEMPLATE \"\"\"{{ .System }}\nUSER: {{ .Prompt }}\nASSISTANT: \"\"\"\nPARAMETER num_ctx 4096\nPARAMETER stop \"\u003c/s\u003e\"\nPARAMETER stop \"USER:\"\nPARAMETER stop \"ASSISTANT:\"",
1225 |   "parameters": "num_keep                       24\nstop                           \"<|start_header_id|>\"\nstop                           \"<|end_header_id|>\"\nstop                           \"<|eot_id|>\"",
1226 |   "template": "{{ if .System }}<|start_header_id|>system<|end_header_id|>\n\n{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>\n\n{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>\n\n{{ .Response }}<|eot_id|>",
1227 |   "details": {
1228 |     "parent_model": "",
1229 |     "format": "gguf",
1230 |     "family": "llama",
1231 |     "families": [
1232 |       "llama"
1233 |     ],
1234 |     "parameter_size": "8.0B",
1235 |     "quantization_level": "Q4_0"
1236 |   },
1237 |   "model_info": {
1238 |     "general.architecture": "llama",
1239 |     "general.file_type": 2,
1240 |     "general.parameter_count": 8030261248,
1241 |     "general.quantization_version": 2,
1242 |     "llama.attention.head_count": 32,
1243 |     "llama.attention.head_count_kv": 8,
1244 |     "llama.attention.layer_norm_rms_epsilon": 0.00001,
1245 |     "llama.block_count": 32,
1246 |     "llama.context_length": 8192,
1247 |     "llama.embedding_length": 4096,
1248 |     "llama.feed_forward_length": 14336,
1249 |     "llama.rope.dimension_count": 128,
1250 |     "llama.rope.freq_base": 500000,
1251 |     "llama.vocab_size": 128256,
1252 |     "tokenizer.ggml.bos_token_id": 128000,
1253 |     "tokenizer.ggml.eos_token_id": 128009,
1254 |     "tokenizer.ggml.merges": [],            // populates if `verbose=true`
1255 |     "tokenizer.ggml.model": "gpt2",
1256 |     "tokenizer.ggml.pre": "llama-bpe",
1257 |     "tokenizer.ggml.token_type": [],        // populates if `verbose=true`
1258 |     "tokenizer.ggml.tokens": []             // populates if `verbose=true`
1259 |   }
1260 | }
1261 | ```
1262 | 
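A small sketch pulling a few fields out of that response; note that `model_info` keys are architecture-specific, so the `llama.*` key below is only an example (same local-server assumption as the earlier sketches):

```python
import httpx

info = httpx.post("http://localhost:11434/api/show",
                  json={"model": "llama3.2"}, timeout=None).json()

print(info["details"]["family"], info["details"]["parameter_size"])
# model_info keys vary by model family; "llama.context_length" appears for llama-family models.
print(info["model_info"].get("llama.context_length"))
```
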
1263 | ## Copy a Model
1264 | 
1265 | ```
1266 | POST /api/copy
1267 | ```
1268 | 
1269 | Copy a model. Creates a model with another name from an existing model.
1270 | 
1271 | ### Examples
1272 | 
1273 | #### Request
1274 | 
1275 | ```shell
1276 | curl http://localhost:11434/api/copy -d '{
1277 |   "source": "llama3.2",
1278 |   "destination": "llama3-backup"
1279 | }'
1280 | ```
1281 | 
1282 | #### Response
1283 | 
1284 | Returns a 200 OK if successful, or a 404 Not Found if the source model doesn't exist.
1285 | 
1286 | ## Delete a Model
1287 | 
1288 | ```
1289 | DELETE /api/delete
1290 | ```
1291 | 
1292 | Delete a model and its data.
1293 | 
1294 | ### Parameters
1295 | 
1296 | - `model`: model name to delete
1297 | 
1298 | ### Examples
1299 | 
1300 | #### Request
1301 | 
1302 | ```shell
1303 | curl -X DELETE http://localhost:11434/api/delete -d '{
1304 |   "model": "llama3:13b"
1305 | }'
1306 | ```
1307 | 
1308 | #### Response
1309 | 
1310 | Returns a 200 OK if successful, 404 Not Found if the model to be deleted doesn't exist.
1311 | 
1312 | ## Pull a Model
1313 | 
1314 | ```
1315 | POST /api/pull
1316 | ```
1317 | 
1318 | Download a model from the ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.
1319 | 
1320 | ### Parameters
1321 | 
1322 | - `model`: name of the model to pull
1323 | - `insecure`: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.
1324 | - `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects
1325 | 
1326 | ### Examples
1327 | 
1328 | #### Request
1329 | 
1330 | ```shell
1331 | curl http://localhost:11434/api/pull -d '{
1332 |   "model": "llama3.2"
1333 | }'
1334 | ```
1335 | 
1336 | #### Response
1337 | 
1338 | If `stream` is not specified, or set to `true`, a stream of JSON objects is returned:
1339 | 
1340 | The first object is the manifest:
1341 | 
1342 | ```json
1343 | {
1344 |   "status": "pulling manifest"
1345 | }
1346 | ```
1347 | 
1348 | Then there is a series of downloading responses. Until a download has completed, the `completed` key may not be included. The number of files to be downloaded depends on the number of layers specified in the manifest.
1349 | 
1350 | ```json
1351 | {
1352 |   "status": "downloading digestname",
1353 |   "digest": "digestname",
1354 |   "total": 2142590208,
1355 |   "completed": 241970
1356 | }
1357 | ```
1358 | 
1359 | After all the files are downloaded, the final responses are:
1360 | 
1361 | ```json
1362 | {
1363 |     "status": "verifying sha256 digest"
1364 | }
1365 | {
1366 |     "status": "writing manifest"
1367 | }
1368 | {
1369 |     "status": "removing any unused layers"
1370 | }
1371 | {
1372 |     "status": "success"
1373 | }
1374 | ```
1375 | 
1376 | If `stream` is set to `false`, then the response is a single JSON object:
1377 | 
1378 | ```json
1379 | {
1380 |   "status": "success"
1381 | }
1382 | ```
1383 | 
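A minimal sketch of following that progress stream from Python with `httpx` (same local-server assumption as the earlier sketches):

```python
import json

import httpx

with httpx.stream("POST", "http://localhost:11434/api/pull",
                  json={"model": "llama3.2"}, timeout=None) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue
        update = json.loads(line)
        status = update.get("status", "")
        if "total" in update:
            # "completed" may be absent until some bytes have been downloaded.
            done = update.get("completed", 0)
            print(f"{status}: {done / update['total']:.0%}")
        else:
            print(status)
```
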
1384 | ## Push a Model
1385 | 
1386 | ```
1387 | POST /api/push
1388 | ```
1389 | 
1390 | Upload a model to a model library. Requires registering for ollama.ai and adding a public key first.
1391 | 
1392 | ### Parameters
1393 | 
1394 | - `model`: name of the model to push in the form of `<namespace>/<model>:<tag>`
1395 | - `insecure`: (optional) allow insecure connections to the library. Only use this if you are pushing to your library during development.
1396 | - `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects
1397 | 
1398 | ### Examples
1399 | 
1400 | #### Request
1401 | 
1402 | ```shell
1403 | curl http://localhost:11434/api/push -d '{
1404 |   "model": "mattw/pygmalion:latest"
1405 | }'
1406 | ```
1407 | 
1408 | #### Response
1409 | 
1410 | If `stream` is not specified, or set to `true`, a stream of JSON objects is returned:
1411 | 
1412 | ```json
1413 | { "status": "retrieving manifest" }
1414 | ```
1415 | 
1416 | and then:
1417 | 
1418 | ```json
1419 | {
1420 |   "status": "starting upload",
1421 |   "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
1422 |   "total": 1928429856
1423 | }
1424 | ```
1425 | 
1426 | Then there is a series of uploading responses:
1427 | 
1428 | ```json
1429 | {
1430 |   "status": "starting upload",
1431 |   "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
1432 |   "total": 1928429856
1433 | }
1434 | ```
1435 | 
1436 | Finally, when the upload is complete:
1437 | 
1438 | ```json
1439 | {"status":"pushing manifest"}
1440 | {"status":"success"}
1441 | ```
1442 | 
1443 | If `stream` is set to `false`, then the response is a single JSON object:
1444 | 
1445 | ```json
1446 | { "status": "success" }
1447 | ```
1448 | 
1449 | ## Generate Embeddings
1450 | 
1451 | ```
1452 | POST /api/embed
1453 | ```
1454 | 
1455 | Generate embeddings from a model
1456 | 
1457 | ### Parameters
1458 | 
1459 | - `model`: name of model to generate embeddings from
1460 | - `input`: text or list of text to generate embeddings for
1461 | 
1462 | Advanced parameters:
1463 | 
1464 | - `truncate`: truncates the end of each input to fit within context length. Returns error if `false` and context length is exceeded. Defaults to `true`
1465 | - `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
1466 | - `keep_alive`: controls how long the model will stay loaded into memory following the request (default: `5m`)
1467 | 
1468 | ### Examples
1469 | 
1470 | #### Request
1471 | 
1472 | ```shell
1473 | curl http://localhost:11434/api/embed -d '{
1474 |   "model": "all-minilm",
1475 |   "input": "Why is the sky blue?"
1476 | }'
1477 | ```
1478 | 
1479 | #### Response
1480 | 
1481 | ```json
1482 | {
1483 |   "model": "all-minilm",
1484 |   "embeddings": [[
1485 |     0.010071029, -0.0017594862, 0.05007221, 0.04692972, 0.054916814,
1486 |     0.008599704, 0.105441414, -0.025878139, 0.12958129, 0.031952348
1487 |   ]],
1488 |   "total_duration": 14143917,
1489 |   "load_duration": 1019500,
1490 |   "prompt_eval_count": 8
1491 | }
1492 | ```
1493 | 
1494 | #### Request (Multiple input)
1495 | 
1496 | ```shell
1497 | curl http://localhost:11434/api/embed -d '{
1498 |   "model": "all-minilm",
1499 |   "input": ["Why is the sky blue?", "Why is the grass green?"]
1500 | }'
1501 | ```
1502 | 
1503 | #### Response
1504 | 
1505 | ```json
1506 | {
1507 |   "model": "all-minilm",
1508 |   "embeddings": [[
1509 |     0.010071029, -0.0017594862, 0.05007221, 0.04692972, 0.054916814,
1510 |     0.008599704, 0.105441414, -0.025878139, 0.12958129, 0.031952348
1511 |   ],[
1512 |     -0.0098027075, 0.06042469, 0.025257962, -0.006364387, 0.07272725,
1513 |     0.017194884, 0.09032035, -0.051705178, 0.09951512, 0.09072481
1514 |   ]]
1515 | }
1516 | ```
1517 | 
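As a usage sketch, the two embeddings from the example above can be compared directly, for instance with cosine similarity (same local-server assumption, `all-minilm` pulled):

```python
import math

import httpx

resp = httpx.post("http://localhost:11434/api/embed",
                  json={"model": "all-minilm",
                        "input": ["Why is the sky blue?", "Why is the grass green?"]},
                  timeout=None)
resp.raise_for_status()
a, b = resp.json()["embeddings"]

# Cosine similarity between the two embedding vectors.
dot = sum(x * y for x, y in zip(a, b))
cos = dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
print(f"cosine similarity: {cos:.3f}")
```
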
1518 | ## List Running Models
1519 | ```
1520 | GET /api/ps
1521 | ```
1522 | 
1523 | List models that are currently loaded into memory.
1524 | 
1525 | ### Examples
1526 | 
1527 | #### Request
1528 | 
1529 | ```shell
1530 | curl http://localhost:11434/api/ps
1531 | ```
1532 | 
1533 | #### Response
1534 | 
1535 | A single JSON object will be returned.
1536 | 
1537 | ```json
1538 | {
1539 |   "models": [
1540 |     {
1541 |       "name": "mistral:latest",
1542 |       "model": "mistral:latest",
1543 |       "size": 5137025024,
1544 |       "digest": "2ae6f6dd7a3dd734790bbbf58b8909a606e0e7e97e94b7604e0aa7ae4490e6d8",
1545 |       "details": {
1546 |         "parent_model": "",
1547 |         "format": "gguf",
1548 |         "family": "llama",
1549 |         "families": [
1550 |           "llama"
1551 |         ],
1552 |         "parameter_size": "7.2B",
1553 |         "quantization_level": "Q4_0"
1554 |       },
1555 |       "expires_at": "2024-06-04T14:38:31.83753-07:00",
1556 |       "size_vram": 5137025024
1557 |     }
1558 |   ]
1559 | }
1560 | ```
1561 | 
1562 | ## Generate Embedding
1563 | 
1564 | > Note: this endpoint has been superseded by `/api/embed`
1565 | 
1566 | ```
1567 | POST /api/embeddings
1568 | ```
1569 | 
1570 | Generate embeddings from a model
1571 | 
1572 | ### Parameters
1573 | 
1574 | - `model`: name of model to generate embeddings from
1575 | - `prompt`: text to generate embeddings for
1576 | 
1577 | Advanced parameters:
1578 | 
1579 | - `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
1580 | - `keep_alive`: controls how long the model will stay loaded into memory following the request (default: `5m`)
1581 | 
1582 | ### Examples
1583 | 
1584 | #### Request
1585 | 
1586 | ```shell
1587 | curl http://localhost:11434/api/embeddings -d '{
1588 |   "model": "all-minilm",
1589 |   "prompt": "Here is an article about llamas..."
1590 | }'
1591 | ```
1592 | 
1593 | #### Response
1594 | 
1595 | ```json
1596 | {
1597 |   "embedding": [
1598 |     0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
1599 |     0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
1600 |   ]
1601 | }
1602 | ```
1603 | 
1604 | ## Version
1605 | 
1606 | ```
1607 | GET /api/version
1608 | ```
1609 | 
1610 | Retrieve the Ollama version
1611 | 
1612 | ### Examples
1613 | 
1614 | #### Request
1615 | 
1616 | ```shell
1617 | curl http://localhost:11434/api/version
1618 | ```
1619 | 
1620 | #### Response
1621 | 
1622 | ```json
1623 | {
1624 |   "version": "0.5.1"
1625 | }
1626 | ```
1627 | 
1628 | 
1629 | 
```