Ollama benchmark Q2 2024 - Exoscale A40


AI Ollama Performance GPU

The LLM world is booming, and a plethora of models are emerging on the open-source market. While a model's name and parameter count give a quick hint for categorizing it, they do not provide a real estimate of the performance the neural network actually delivers. This project tests a large panel of LLM models to measure the reading and writing speeds offered by a GPU-powered machine.

We use a Small-GPU3 from Exoscale with the following characteristics:

  • 12 vCPUs (AMD EPYC 7413)
  • 56GB of RAM
  • 800GB of root Block Storage
  • 1x NVIDIA A40 - 40GB of VRAM
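Reading and writing speeds can be measured directly from the statistics Ollama returns with each response: `prompt_eval_count`/`prompt_eval_duration` for prompt ingestion (reading) and `eval_count`/`eval_duration` for generation (writing), with durations reported in nanoseconds. The sketch below, which assumes an Ollama server on its default port (11434) and uses only the standard library, illustrates one way such a benchmark could be scripted; the function and variable names are our own.

```python
import json
import urllib.request

# Default Ollama endpoint (assumption: server runs locally on its standard port).
OLLAMA_URL = "http://localhost:11434/api/generate"

def tokens_per_second(resp: dict) -> tuple[float, float]:
    """Derive (reading, writing) speeds in tokens/s from an Ollama response.

    Ollama reports *_duration fields in nanoseconds, so we convert to seconds.
    """
    read = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / 1e9)
    write = resp["eval_count"] / (resp["eval_duration"] / 1e9)
    return read, write

def benchmark(model: str, prompt: str) -> tuple[float, float]:
    """Send a single non-streaming generation request and return token speeds."""
    payload = json.dumps({"model": model, "prompt": prompt,
                          "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return tokens_per_second(json.load(r))
```

For example, a response reporting 100 prompt tokens ingested in 1 s and 50 tokens generated in 2 s yields reading and writing speeds of 100 and 25 tokens/s respectively.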