Ollama

v0.3.4

New embedding models

BGE-M3: a large embedding model from BAAI distinguished for its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity.
BGE-Large: a large embedding model trained in English.
Paraphrase-Multilingual: A multilingual embedding model trained on parallel data for 50+ languages.

New embedding API with batch support

Ollama now supports a new API endpoint /api/embed for embedding generation:

curl http://localhost:11434/api/embed -d '{
  "model": "all-minilm",
  "input": ["Why is the sky blue?", "Why is the grass green?"]
}'
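The same batch request can be made from Python using only the standard library. This is a minimal sketch with several assumptions. It assumes a local server on the default port 11434, as in the curl example. It also assumes the response returns one vector per input, in input order, under an `embeddings` key. The helper names `build_embed_request` and `embed` are hypothetical and are not part of any official Ollama client.

```python
import json
import urllib.request

# Default local endpoint, matching the curl example above.
OLLAMA_URL = "http://localhost:11434/api/embed"


def build_embed_request(model, inputs):
    """Return the JSON body for /api/embed; `inputs` may be one string or a list (a batch)."""
    return json.dumps({"model": model, "input": inputs}).encode("utf-8")


def embed(model, inputs, url=OLLAMA_URL):
    """POST a batch to /api/embed and return the list of embedding vectors."""
    req = urllib.request.Request(
        url,
        data=build_embed_request(model, inputs),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Assumed response shape: {"model": ..., "embeddings": [[...], [...]], ...}
    return body["embeddings"]
```

With a server running, `embed("all-minilm", ["Why is the sky blue?", "Why is the grass green?"])` should return a list of two vectors.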

This API endpoint supports new features:

Batches: generate embeddings for several documents in one request
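Normalized embeddings: embeddings are returned normalized to unit length, improving similarity results
Truncation: a new truncate parameter; set it to false to get an error, rather than silent truncation, when an input exceeds the model's context length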
Metrics: responses include load_duration, total_duration and prompt_eval_count metrics
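The features above can be illustrated with a few small helpers. If the returned vectors have unit L2 length, cosine similarity between two embeddings reduces to a plain dot product, because the norm denominator is 1.

This sketch assumes L2 normalization. It also assumes the `*_duration` metrics are reported in nanoseconds, as the Ollama API documentation describes. It shows a request body with `truncate` set to false, so over-long inputs fail instead of being cut. All function names are hypothetical.

```python
import json
import math


def l2_norm(vec):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def is_normalized(vec, tol=1e-6):
    """True if the vector has unit L2 length (assumed to hold for /api/embed output)."""
    return abs(l2_norm(vec) - 1.0) <= tol


def cosine_similarity(a, b):
    """General cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (l2_norm(a) * l2_norm(b))


def similarity_of_normalized(a, b):
    """For unit-length vectors the denominator is 1, so the dot product is the cosine."""
    return sum(x * y for x, y in zip(a, b))


def strict_embed_body(model, inputs):
    """Request body asking the server to error rather than truncate over-long inputs."""
    return json.dumps({"model": model, "input": inputs, "truncate": False})


def durations_in_seconds(response):
    """Convert the (assumed nanosecond) duration metrics of a parsed response to seconds."""
    return {k: response[k] / 1e9 for k in ("load_duration", "total_duration") if k in response}
```

Skipping the norm computation is a small saving per comparison, but it adds up when ranking many documents against one query.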

See the API documentation for more details and examples.

What's Changed

Fixed slow initial download speeds on Windows
Ollama now autodetects NUMA support to improve performance
Fixed an issue where /api/embed would sometimes return embedding results out of order

New Contributors

@av made their first contribution in https://github.com/ollama/ollama/pull/6147
@sryu1 made their first contribution in https://github.com/ollama/ollama/pull/6151
@rick-github made their first contribution in https://github.com/ollama/ollama/pull/6154

Full Changelog: https://github.com/ollama/ollama/compare/v0.3.3...v0.3.4