Ollama
Notes
Release notes
v0.1.35
New models
- Llama 3 ChatQA: A model from NVIDIA based on Llama 3 that excels at conversational question answering (QA) and retrieval-augmented generation (RAG).
What's Changed
- Quantization: `ollama create` can now quantize models when importing them using the `--quantize` or `-q` flag:
  `ollama create -f Modelfile --quantize q4_0 mymodel`
> [!NOTE]
> `--quantize` works when importing float16 or float32 models:
> * From a binary GGUF file (e.g. `FROM ./model.gguf`)
> * From a library model (e.g. `FROM llama3:8b-instruct-fp16`)
- Fixed issue where inference subprocesses wouldn't be cleaned up on shutdown.
- Fixed a series of out-of-memory errors when loading models on multi-GPU systems
- <kbd>Ctrl+J</kbd> characters will now properly add newlines in `ollama run`
- Fixed issues when running `ollama show` for vision models
- `OPTIONS` requests to the Ollama API will no longer result in errors
- Fixed issue where partially downloaded files wouldn't be cleaned up
- Added a new `done_reason` field in responses describing why generation stopped responding
- Ollama will now more accurately estimate how much memory is available on multi-GPU systems especially when running different models one after another
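
As a rough illustration of the new `done_reason` field, the sketch below reads it from a streamed response. It assumes the response arrives as newline-delimited JSON objects carrying `response` and `done` keys, with `done_reason` on the final object. The helper name `collect_stream`, the sample payload, and the `"stop"` value are illustrative assumptions, not output captured from a real server.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Join streamed text chunks and return (text, done_reason).

    Each line is assumed to hold one JSON object. done_reason stays None
    if the field never appears, for example with a server older than v0.1.35.
    """
    parts = []
    done_reason = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between objects
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            # The final object is assumed to carry the stop reason.
            done_reason = chunk.get("done_reason")
            break
    return "".join(parts), done_reason


# Hypothetical payload for illustration only.
SAMPLE_STREAM = [
    '{"response": "Hello", "done": false}',
    '{"response": " world", "done": false}',
    '{"response": "", "done": true, "done_reason": "stop"}',
]

if __name__ == "__main__":
    text, reason = collect_stream(SAMPLE_STREAM)
    print(f"text={text!r} done_reason={reason!r}")
```

Treating a missing `done_reason` as `None` rather than raising an error lets the same client code work against older servers that do not send the field.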
New Contributors
- @fmaclen made their first contribution in https://github.com/ollama/ollama/pull/3884
- @Renset made their first contribution in https://github.com/ollama/ollama/pull/3881
- @glumia made their first contribution in https://github.com/ollama/ollama/pull/3043
- @boessu made their first contribution in https://github.com/ollama/ollama/pull/4236
- @gaardhus made their first contribution in https://github.com/ollama/ollama/pull/2307
- @svilupp made their first contribution in https://github.com/ollama/ollama/pull/2192
- @WolfTheDeveloper made their first contribution in https://github.com/ollama/ollama/pull/4300
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.34...v0.1.35