Ollama

Release notes (v0.12.4-rc3):

What's Changed

  • Flash attention is now enabled by default for Qwen 3 and Qwen 3 Coder
  • Fixed minor memory estimation issues when scheduling models on NVIDIA GPUs
  • Fixed an issue where keep_alive in the API would accept different values for the /api/chat and /api/generate endpoints
  • Fixed tool calling rendering with qwen3-coder
  • More reliable and accurate VRAM detection
  • OLLAMA_FLASH_ATTENTION can now be overridden to 0 for models that have flash attention enabled by default
  • macOS 12 Monterey and macOS 13 Ventura are no longer supported
  • Fixed a crash when templates were not correctly defined
  • Fixed memory calculations on NVIDIA iGPUs
  • AMD gfx900 and gfx906 (MI50, MI60, etc.) GPUs are no longer supported via ROCm. We're working to support these GPUs via Vulkan in a future release.
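The `keep_alive` fix above means both endpoints now accept the same value forms. A minimal usage sketch, assuming a local Ollama server on the default port 11434 with `qwen3-coder` pulled; the payload contents are illustrative:

```shell
# keep_alive accepts a duration string such as "5m" or a number of
# seconds; after this fix /api/chat and /api/generate parse it the same way.
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3-coder",
  "messages": [{"role": "user", "content": "hello"}],
  "keep_alive": "5m"
}'

curl http://localhost:11434/api/generate -d '{
  "model": "qwen3-coder",
  "prompt": "hello",
  "keep_alive": "5m"
}'
```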
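Since flash attention now defaults to on for Qwen 3 and Qwen 3 Coder, the `OLLAMA_FLASH_ATTENTION` override applies when starting the server. A minimal sketch:

```shell
# Force flash attention off for this server run, even for models
# that now enable it by default (e.g. the Qwen 3 family).
OLLAMA_FLASH_ATTENTION=0 ollama serve
```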

New Contributors

  • @Fachep made their first contribution in https://github.com/ollama/ollama/pull/12412

Full Changelog: https://github.com/ollama/ollama/compare/v0.12.3...v0.12.4-rc3
