Ollama

ollama/ollama last check 2026-06-19 01:01 UTC 191 releases recent

Notes

site

Release notes

v0.1.21 · 1y+

view on github

New models

Qwen: Qwen is a series of large language models by Alibaba Cloud spanning from 1.8B to 72B parameters.
DuckDB-NSQL: A text-to-sql LLM for DuckDB
Stable Code: A new code completion model on par with Code Llama 7B and similar models.
Nous Hermes 2 Mixtral: The Nous Hermes 2 model from Nous Research, now trained over Mixtral.

Saving and loading models and messages

Models can now be saved and loaded with /save <model> and /load <model> when using ollama run. This will save or load conversations and any model changes with /set parameter, /set system and more as a new model with the provided name.

`MESSAGE` modelfile command

Messages can now be specified in a Modelfile ahead of time using the MESSAGE command:

# example Modelfile
FROM llama2
SYSTEM You are a friendly assistant that only answers with &#39;yes&#39; or &#39;no&#39;
MESSAGE user Is Toronto in Canada?
MESSAGE assistant yes
MESSAGE user Is Sacramento in Canada?
MESSAGE assistant no
MESSAGE user Is Ontario in Canada?
MESSAGE assistant yes

After creating this model, running it will restore the message history. This is useful for techniques such as Chain-Of-Thought prompting

ollama create -f Modelfile yesno
ollama run yesno
&gt;&gt;&gt; Is Toronto in Canada?
yes

&gt;&gt;&gt; Is Sacramento in Canada?
no

&gt;&gt;&gt; Is Ontario in Canada?
yes

&gt;&gt;&gt; Is Havana in Canada?
no

Python and Javascript libraries

The first versions of the Python and JavaScript libraries for Ollama are now available.

Intel & AMD CPU improvements

Ollama now supports CPUs without AVX. This means Ollama will now run on older CPUs and in environments (such as virtual machines, Rosetta, GitHub actions) that don't provide support for AVX instructions. For newer CPUs that support AVX2, Ollama will receive a small performance boost, running models about 10% faster.

What's Changed

Support for a much broader set of CPUs, including CPUs without AVX instruction set support
If a GPU detection error is hit when attempting to run a model, Ollama will fallback to CPU
Fixed issue where generating responses with the same prompt would hang after around 20 requests
New MESSAGE Modelfile command to set the conversation history when building a model
Ollama will now use AVX2 for faster performance if available
Improved detection of Nvidia GPUs, especially in WSL
Fixed issue where models with LoRA layers may not load
Fixed incorrect error that would occur when retrying network connections in ollama pull and ollama push
Fixed issue where /show parameter would round decimal numbers
Fixed issue where upon hitting the context window limit, requests would hang

New Contributors

@fpreiss made their first contribution in https://github.com/jmorganca/ollama/pull/1921
@eavanvalkenburg made their first contribution in https://github.com/jmorganca/ollama/pull/1931
@0atman made their first contribution in https://github.com/jmorganca/ollama/pull/1924
@sachinsachdeva made their first contribution in https://github.com/jmorganca/ollama/pull/2021
@Arrendy made their first contribution in https://github.com/jmorganca/ollama/pull/2016
@purificant made their first contribution in https://github.com/jmorganca/ollama/pull/1958
@lainedfles made their first contribution in https://github.com/jmorganca/ollama/pull/1999

Full Changelog: https://github.com/jmorganca/ollama/compare/v0.1.20...v0.1.21