Ollama

ollama/ollama last check 191 releases recent
Notes
Release notes
v0.12.7 · 6m+
view on github

<img width="600" alt="Ollama screenshot 2025-10-29 at 13 56 55@2x" src="https://github.com/user-attachments/assets/4fea0b30-5d31-4da2-b99c-7f38606fc0a2" />

New models

  • Qwen3-VL: Qwen3-VL is now available in all parameter sizes ranging from 2B to 235B
  • MiniMax-M2: a 230 Billion parameter model built for coding & agentic workflows available on Ollama's cloud

Add files and adjust thinking levels in Ollama's new app

Ollama's new app now includes a way to add one or many files when prompting the model:

<img width="912" height="712" alt="Screenshot 2025-10-29 at 2 16 55 PM" src="https://github.com/user-attachments/assets/60b5fff7-8fab-4433-8183-06696a96042f" />

For better responses, thinking levels can now be adjusted for the gpt-oss models:

<img width="954" height="712" alt="Screenshot 2025-10-29 at 2 12 33 PM" src="https://github.com/user-attachments/assets/7025dcfb-51eb-421c-af2d-fd6f0524955c" />

New API documentation

New API documentation is available for Ollama's API: https://docs.ollama.com/api

What's Changed

  • Model load failures now include more information on Windows
  • Fixed embedding results being incorrect when running embeddinggemma
  • Fixed gemma3n on Vulkan backend
  • Increased time allocated for ROCm to discover devices
  • Fixed truncation error when generating embeddings
  • Fixed request status code when running cloud models
  • The OpenAI-compatible /v1/embeddings endpoint now supports encoding_format parameter
  • Ollama will now parse tool calls that don't conform to {&quot;name&quot;: name, &quot;arguments&quot;: args} (thanks @rick-github!)
  • Fixed prompt processing reporting in the llama runner
  • Increase speed when scheduling models
  • Fixed issue where FROM &lt;model&gt; would not inherit RENDERER or PARSER commands

New Contributors

Full Changelog: https://github.com/ollama/ollama/compare/v0.12.6...v0.12.7-rc0