Ollama benchmark Q2 2024 - Exoscale A40
Tags: AI, Ollama, Performance, GPU
The LLM world is boiling over, and a plethora of new models keeps appearing on the open-source market. While a model's name and parameter count are often used as a quick hint to categorize it, they do not give a real estimate of the performance the neural network actually delivers. This project aims to test a large panel of LLMs and measure the reading (prompt evaluation) and writing (token generation) speed offered by a GPU-powered machine.
We use a Small-GPU3 instance from Exoscale with the following characteristics:
- 12 CPUs AMD EPYC 7413
- 56GB of RAM
- 800GB of root Block Storage
- 1x NVIDIA A40 - 40GB of VRAM
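To turn Ollama's raw timings into the reading and writing speeds discussed above, one can use the metrics returned by its `/api/generate` endpoint: `prompt_eval_count`/`prompt_eval_duration` for prompt reading and `eval_count`/`eval_duration` for generation, with durations reported in nanoseconds. A minimal sketch (the sample numbers below are illustrative, not measured on this machine):

```python
def throughput(metrics: dict) -> tuple[float, float]:
    """Compute (read, write) speed in tokens/second from an Ollama
    /api/generate response. Durations are in nanoseconds."""
    read = metrics["prompt_eval_count"] / (metrics["prompt_eval_duration"] / 1e9)
    write = metrics["eval_count"] / (metrics["eval_duration"] / 1e9)
    return read, write


# Illustrative values, shaped like an Ollama response:
sample = {
    "prompt_eval_count": 26,            # tokens read from the prompt
    "prompt_eval_duration": 130_000_000,  # 0.13 s
    "eval_count": 290,                  # tokens generated
    "eval_duration": 4_709_213_000,     # ~4.71 s
}
read_tps, write_tps = throughput(sample)
print(f"read: {read_tps:.1f} tok/s, write: {write_tps:.1f} tok/s")
```

The same figures can be eyeballed directly from the CLI with `ollama run <model> --verbose`, which prints the prompt eval rate and eval rate after each answer.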