Ollama

ollama/ollama last check 191 releases recent
Notes
Release notes
v0.11.5 · 6m+
view on github

What's Changed

  • Performance improvements for the gpt-oss models
  • New memory management: this release of Ollama includes improved memory management for scheduling models on GPUs, leading to better VRAM utilization, model performance and less out of memory errors. These new memory estimations can be enabled with OLLAMA_NEW_ESTIMATES=1 ollama serve and will soon be enabled by default.
  • Improved multi-GPU scheduling and reduced VRAM allocation when using more than 2 GPUs
  • Ollama's new app will now remember default selections for default model, Turbo and Web Search between restarts
  • Fix error when parsing bad harmony tool calls
  • OLLAMA_FLASH_ATTENTION=1 will also enable flash attention for pure-CPU models
  • Fixed OpenAI-compatible API not supporting reasoning_effort
  • Reduced size of installation on Windows and Linux

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.11.4...v0.11.5-rc1