rerank endpoint plugin

2026-01-20 22:01:23 +01:00
parent 6c7f96145b
commit 8149ac8c8b
3 changed files with 119 additions and 53 deletions

@@ -135,6 +135,14 @@ The script will:
- Ollama installed and available in PATH
- Internet connection for downloading models
### Plugins
#### Reranking Endpoint (`plugins/reranking-endpoint/`)
A FastAPI service that provides document reranking using cross-encoder models (BGE-reranker, Qwen3-Reranker, etc.) via Ollama.
**⚠️ Limitation:** This is a workaround that scores documents by embedding magnitude instead of the classification head that cross-encoder models are designed to use, because Ollama doesn't expose an `/api/rerank` endpoint or the classification layer. It is less accurate than sentence-transformers, but it is integrated with Ollama's GPU scheduling. See [plugins/reranking-endpoint/README.md](plugins/reranking-endpoint/README.md) for detailed limitations.
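To make the limitation concrete, here is a minimal sketch of magnitude-based scoring. It is not the plugin's actual code: the function names, the `embed` callable, and the way the query and document are joined into one input are all assumptions made for illustration. See the plugin README for what the service really does.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def l2_norm(vector: Sequence[float]) -> float:
    """Return the Euclidean length (magnitude) of an embedding vector."""
    return math.sqrt(sum(x * x for x in vector))


def rerank_by_magnitude(
    query: str,
    documents: Sequence[str],
    embed: Callable[[str], Sequence[float]],
    top_n: Optional[int] = None,
) -> List[Tuple[int, float]]:
    """Rank documents by the magnitude of a joint query+document embedding.

    `embed` is a hypothetical callable that returns one embedding per input
    string (for example, a thin wrapper around an embedding request to Ollama).
    Joining query and document with a newline is an assumption of this sketch.
    """
    scored = []
    for index, document in enumerate(documents):
        vector = embed(f"{query}\n{document}")
        scored.append((index, l2_norm(vector)))
    # Highest magnitude first; the score is a proxy, not a calibrated relevance.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_n is None else scored[:top_n]
```

The returned pairs are `(original_index, score)`. Because the score is only a vector length, it is a rough stand-in for the relevance logit a real classification head would produce, which is why results can differ from sentence-transformers.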
### Other Scripts
- `context-optimizer.py` - Find optimal num_ctx for models based on VRAM constraints