Ollama

Release notes
v0.1.19

This release focuses on performance and on fixing a number of issues and crashes related to memory allocation.

New Models

  • LLaMa-Pro: An expansion of LLaMa by Tencent to an 8B-parameter model that specializes in language, programming and mathematics.

What's Changed

  • Fixed "out of memory" errors when running models such as llama2, mixtral or llama2:13b with limited GPU memory
  • Fixed CUDA errors when running on older GPUs that aren't yet supported
  • Increasing context size with num_ctx will now work (up to a model's supported context window).

    To use a 32K context window with Mistral:

    # inside an ollama run session
    /set parameter num_ctx 32768

    # api
    curl http://localhost:11434/api/generate -d '{
      "model": "mistral",
      "prompt": "Why is the sky blue?",
      "options": {"num_ctx": 32768}
    }'

  • Larger models such as mixtral can now be run on Macs with less memory
  • Fixed an issue where pressing up or down arrow keys would cause the wrong prompt to show in ollama run
  • Fixed performance issues on Intel Macs
  • Fixed an error that would occur with old Nvidia GPUs
  • OLLAMA_ORIGINS now supports browser extension URLs
  • Ollama will now offload more processing to the GPU where possible
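
The num_ctx option described above can also be set from a script. The following is a minimal Python sketch, not an official client: it assumes an Ollama server is listening on the default local address used in the curl example, and the helper names build_generate_request and generate are hypothetical. It sends 32768 (32 × 1024) as the context size for a 32K window and sets "stream" to false so the reply arrives as a single JSON object instead of a stream of lines.

```python
import json
import urllib.request

# Assumption: Ollama is reachable at the same local address as in the curl example.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, num_ctx: int) -> urllib.request.Request:
    """Build a POST request for /api/generate with a custom context size."""
    payload = {
        "model": model,
        "prompt": prompt,
        # num_ctx is capped by the model's supported context window.
        "options": {"num_ctx": num_ctx},
        # Ask for one JSON object instead of a line-by-line stream.
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, num_ctx: int = 32768) -> str:
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(model, prompt, num_ctx)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With a server running and the mistral model pulled, generate("mistral", "Why is the sky blue?") would return the completion as a string. Building the request in a separate function keeps the payload easy to inspect without making a network call.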

Full Changelog: https://github.com/jmorganca/ollama/compare/v0.1.18...v0.1.19