Ollama

ollama/ollama last check 2026-06-19 01:01 UTC 191 releases recent

Notes

site

Release notes

v0.1.35 · 1y+

view on github

New models

Llama 3 ChatQA: A model from NVIDIA based on Llama 3 that excels at conversational question answering (QA) and retrieval-augmented generation (RAG).

What's Changed

Quantization: ollama create can now quantize models when importing them using the --quantize or -q flag:

ollama create -f Modelfile --quantize q4_0 mymodel

> [!NOTE] > --quantize works when importing float16 or float32 models: > * From a binary GGUF files (e.g. FROM ./model.gguf) > * From a library model (e.g. FROM llama3:8b-instruct-fp16)

Fixed issue where inference subprocesses wouldn't be cleaned up on shutdown.
Fixed a series out of memory errors when loading models on multi-GPU systems
<kbd>Ctrl+J</kbd> characters will now properly add newlines in ollama run
Fixed issues when running ollama show for vision models
OPTIONS requests to the Ollama API will no longer result in errors
Fixed issue where partially downloaded files wouldn't be cleaned up
Added a new done_reason field in responses describing why generation stopped responding
Ollama will now more accurately estimate how much memory is available on multi-GPU systems especially when running different models one after another

New Contributors

@fmaclen made their first contribution in https://github.com/ollama/ollama/pull/3884
@Renset made their first contribution in https://github.com/ollama/ollama/pull/3881
@glumia made their first contribution in https://github.com/ollama/ollama/pull/3043
@boessu made their first contribution in https://github.com/ollama/ollama/pull/4236
@gaardhus made their first contribution in https://github.com/ollama/ollama/pull/2307
@svilupp made their first contribution in https://github.com/ollama/ollama/pull/2192
@WolfTheDeveloper made their first contribution in https://github.com/ollama/ollama/pull/4300

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.34...v0.1.35