Ollama Utils - Web Interface

Available Models


Test a model's VRAM usage and CPU offloading. This loads the model with a minimal prompt and reports the actual VRAM consumption.

Select Model:
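As a rough illustration of what such a report might contain, the sketch below summarizes the GPU/CPU split for one loaded model. It assumes the input is one entry from the `models` list returned by Ollama's `GET /api/ps` endpoint, with `size` and `size_vram` given in bytes. The helper name `summarize_vram` and the output keys are hypothetical, and this is not necessarily how this tool computes its figures.

```python
def summarize_vram(ps_entry: dict) -> dict:
    """Summarize the GPU/CPU memory split for one running model.

    Assumption: `ps_entry` has `size` (total bytes) and `size_vram`
    (bytes resident in GPU memory), as in an /api/ps response entry.
    """
    total = int(ps_entry.get("size", 0))
    # Clamp so a malformed entry cannot report more VRAM than the total.
    in_vram = min(int(ps_entry.get("size_vram", 0)), total)
    offloaded = total - in_vram  # portion held in system RAM (CPU offload)
    gib = 1024 ** 3
    return {
        "total_gb": round(total / gib, 2),
        "vram_gb": round(in_vram / gib, 2),
        "cpu_gb": round(offloaded / gib, 2),
        "gpu_percent": round(100 * in_vram / total, 1) if total else 0.0,
    }
```

A model that reports a `gpu_percent` below 100 is partly offloaded to the CPU, which usually means slower generation.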

Find the optimal context size (num_ctx) for a model based on available VRAM. This iteratively tests different context sizes.

Select Model:

GPU Overhead (GB):

Max Iterations (0 = unlimited):
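The page does not say which search strategy is used, so the following is only one plausible sketch: a binary search over `num_ctx`, assuming that if a given context size fits in VRAM then every smaller one does too. The names `find_max_ctx`, `within_budget`, and `fits` are hypothetical. Reading "GPU Overhead" as VRAM headroom to keep free is an assumption as well. In a real run, `fits` would load the model with the candidate `num_ctx` and check the measured VRAM use.

```python
from typing import Callable, Optional


def within_budget(model_vram_gb: float, gpu_total_gb: float, overhead_gb: float) -> bool:
    """Assumed meaning of GPU overhead: headroom that must stay free."""
    return model_vram_gb + overhead_gb <= gpu_total_gb


def find_max_ctx(
    fits: Callable[[int], bool],
    low: int = 2048,
    high: int = 131072,
    step: int = 1024,
    max_iterations: int = 0,
) -> Optional[int]:
    """Return the largest multiple of `step` in [low, high] where fits() is True.

    max_iterations == 0 means unlimited, mirroring the form field above.
    Returns None if no tested size fits.
    """
    lo, hi = low // step, high // step
    best = None
    iterations = 0
    while lo <= hi:
        if max_iterations and iterations >= max_iterations:
            break  # stop early and keep the best size found so far
        iterations += 1
        mid = (lo + hi) // 2
        if fits(mid * step):
            best = mid * step
            lo = mid + 1  # fits: try a larger context
        else:
            hi = mid - 1  # does not fit: try a smaller context
    return best
```

With an iteration cap, the result is the best size found so far rather than the true maximum, so a low cap trades accuracy for a shorter test.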

Browse available models at ollama.com/library

Model Name: Enter the model name with optional tag (e.g., llama2:7b)
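A minimal sketch of how a name with an optional tag could be split before pulling. Ollama falls back to the `latest` tag when none is given. The helper name `split_model_name` is hypothetical, and the validation here is deliberately light.

```python
from typing import Tuple


def split_model_name(name: str) -> Tuple[str, str]:
    """Split 'model:tag' into (model, tag), defaulting the tag to 'latest'."""
    name = name.strip()
    if not name:
        raise ValueError("model name is empty")
    # Look for the tag separator only after the last '/', so a registry
    # host with a port (host:port/model) is not mistaken for a tag.
    last_segment_start = name.rfind("/") + 1
    colon = name.find(":", last_segment_start)
    if colon == -1:
        return name, "latest"
    model, tag = name[:colon], name[colon + 1:]
    if not model[last_segment_start:] or not tag:
        raise ValueError(f"malformed model name: {name!r}")
    return model, tag
```

For example, `split_model_name("llama2:7b")` yields the model `llama2` with the tag `7b`, while a bare `llama2` resolves to the tag `latest`.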

Paste a HuggingFace repository URL or direct link to a GGUF file.

HuggingFace URL: Repository URL or direct link to a .gguf file
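Because the field accepts two kinds of input, a handler has to tell them apart. The sketch below is one way to do that with the standard library. The function name `classify_hf_url`, the returned keys, and the accepted host list are assumptions, not this tool's actual behavior.

```python
from urllib.parse import urlparse

# Hosts treated as HuggingFace (an assumption for this sketch).
HF_HOSTS = ("huggingface.co", "www.huggingface.co", "hf.co")


def classify_hf_url(url: str) -> dict:
    """Classify a URL as a repository link or a direct GGUF file link."""
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in HF_HOSTS:
        raise ValueError(f"not a HuggingFace URL: {url!r}")
    # Path segments without empty pieces, e.g. ['user', 'repo', 'resolve', ...].
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("URL must include both a user and a repository")
    repo = f"{parts[0]}/{parts[1]}"
    # A path that ends in .gguf beyond the repo segments is a direct file link.
    if len(parts) > 2 and parts[-1].lower().endswith(".gguf"):
        return {"kind": "file", "repo": repo, "filename": parts[-1]}
    return {"kind": "repo", "repo": repo, "filename": None}
```

Query strings such as `?download=true` are ignored, since `urlparse` keeps them separate from the path.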