Ollama

Release notes

v0.1.19

This release focuses on performance and on fixing a number of issues and crashes related to memory allocation.

New Models

LLaMa-Pro: An expansion of LLaMa by Tencent to an 8B model that specializes in language, programming and mathematics.

What's Changed

Fixed "out of memory" errors when running models such as llama2, mixtral or llama2:13b with limited GPU memory
Fixed CUDA errors when running on older GPUs that aren't yet supported
Increasing context size with num_ctx will now work (up to a model's supported context window).

To use a 32K context window with Mistral:

# ollama run
/set parameter num_ctx 32768

# api
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "options": {"num_ctx": 32768}
}'
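
For reference, a 32K context window is 32 × 1024 = 32768 tokens. The sketch below shows one way to derive that value and build the request body in a script. The helper names ctx_tokens and generate_payload are hypothetical (they are not part of Ollama), and the payload builder assumes the model name and prompt contain no characters that need JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ollama): convert a context size given in
# "K" tokens to a num_ctx value, taking 1K = 1024 tokens.
ctx_tokens() {
  echo $(( $1 * 1024 ))
}

# Hypothetical helper: print a JSON body for /api/generate.
# Assumption: model ($1) and prompt ($2) need no JSON escaping.
generate_payload() {
  printf '{"model": "%s", "prompt": "%s", "options": {"num_ctx": %d}}\n' \
    "$1" "$2" "$3"
}

# Usage (needs an Ollama server listening on the default port):
# curl http://localhost:11434/api/generate \
#   -d "$(generate_payload mistral 'Why is the sky blue?' "$(ctx_tokens 32)")"
```

The value is still capped by the model's supported context window, so a larger num_ctx than the model allows is not expected to help.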

Larger models such as mixtral can now be run on Macs with less memory
Fixed an issue where pressing the up or down arrow keys would cause the wrong prompt to show in ollama run
Fixed performance issues on Intel Macs
Fixed an error that would occur with old Nvidia GPUs
OLLAMA_ORIGINS now supports browser extension URLs
Ollama will now offload more processing to the GPU where possible
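
As an illustration of the OLLAMA_ORIGINS change, the sketch below builds a comma-separated origin list for browser extensions. It assumes that OLLAMA_ORIGINS takes a comma-separated list and that wildcard patterns such as chrome-extension://* and moz-extension://* are accepted; the helper name origins_list is hypothetical, and the exact patterns your extension needs may differ.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ollama): join its arguments with commas.
# The subshell keeps the IFS change from leaking into the caller.
origins_list() {
  ( IFS=,; printf '%s\n' "$*" )
}

# Usage (assumed patterns; starts the server with extension origins allowed):
# OLLAMA_ORIGINS="$(origins_list 'chrome-extension://*' 'moz-extension://*')" ollama serve
```

Listing only the specific extension origin you trust, rather than a wildcard, keeps the set of allowed callers narrower.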

New Contributors

@sublimator made their first contribution in https://github.com/jmorganca/ollama/pull/1797
@gbaptista made their first contribution in https://github.com/jmorganca/ollama/pull/1830

Full Changelog: https://github.com/jmorganca/ollama/compare/v0.1.18...v0.1.19