Ollama

ollama/ollama last check 2026-06-18 20:01 UTC 191 releases recent

Notes

site

Release notes

v0.12.7 · 6m+

<img width="600" alt="Ollama screenshot 2025-10-29 at 13 56 55@2x" src="https://github.com/user-attachments/assets/4fea0b30-5d31-4da2-b99c-7f38606fc0a2" />

New models

Qwen3-VL: Qwen3-VL is now available in all parameter sizes ranging from 2B to 235B
MiniMax-M2: a 230 Billion parameter model built for coding & agentic workflows available on Ollama's cloud

Add files and adjust thinking levels in Ollama's new app

Ollama's new app now includes a way to add one or many files when prompting the model:

<img width="912" height="712" alt="Screenshot 2025-10-29 at 2 16 55 PM" src="https://github.com/user-attachments/assets/60b5fff7-8fab-4433-8183-06696a96042f" />

For better responses, thinking levels can now be adjusted for the gpt-oss models:

<img width="954" height="712" alt="Screenshot 2025-10-29 at 2 12 33 PM" src="https://github.com/user-attachments/assets/7025dcfb-51eb-421c-af2d-fd6f0524955c" />

New API documentation

New API documentation is available for Ollama's API: https://docs.ollama.com/api

What's Changed

Model load failures now include more information on Windows
Fixed embedding results being incorrect when running embeddinggemma
Fixed gemma3n on Vulkan backend
Increased time allocated for ROCm to discover devices
Fixed truncation error when generating embeddings
Fixed request status code when running cloud models
The OpenAI-compatible /v1/embeddings endpoint now supports encoding_format parameter
Ollama will now parse tool calls that don't conform to {"name": name, "arguments": args} (thanks @rick-github!)
Fixed prompt processing reporting in the llama runner
Increase speed when scheduling models
Fixed issue where FROM <model> would not inherit RENDERER or PARSER commands

New Contributors

@npardal made their first contribution in https://github.com/ollama/ollama/pull/12715

Full Changelog: https://github.com/ollama/ollama/compare/v0.12.6...v0.12.7-rc0