Ollama benchmark Q2 2024 - Exoscale A40
Tags: AI, Ollama, Performance, GPU
The LLM world is boiling over, and a plethora of new models keeps appearing on the open-source market. While a model's name and parameter count are often used as a quick hint to categorize it, they do not give a real estimate of the performance the neural network actually delivers. This project aims to test a large panel of LLMs and measure the reading (prompt evaluation) and writing (token generation) speed offered by a GPU-powered machine.
We use a Small-GPU3 instance from Exoscale with the following characteristics:
- 12 CPUs AMD EPYC 7413
- 56GB of RAM
- 800GB of root Block Storage
- 1x NVIDIA A40 - 40GB of VRAM
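To turn Ollama's raw timings into the reading and writing speeds discussed above, one can use the metrics returned by its `/api/generate` endpoint: `prompt_eval_count`/`prompt_eval_duration` for prompt reading and `eval_count`/`eval_duration` for generation, with durations reported in nanoseconds. A minimal sketch (the sample numbers below are illustrative, not measured on this machine):

```python
def throughput(metrics: dict) -> tuple[float, float]:
    """Compute (read, write) speed in tokens/second from an Ollama
    /api/generate response. Durations are in nanoseconds."""
    read = metrics["prompt_eval_count"] / (metrics["prompt_eval_duration"] / 1e9)
    write = metrics["eval_count"] / (metrics["eval_duration"] / 1e9)
    return read, write


# Illustrative values, shaped like an Ollama response:
sample = {
    "prompt_eval_count": 26,            # tokens read from the prompt
    "prompt_eval_duration": 130_000_000,  # 0.13 s
    "eval_count": 290,                  # tokens generated
    "eval_duration": 4_709_213_000,     # ~4.71 s
}
read_tps, write_tps = throughput(sample)
print(f"read: {read_tps:.1f} tok/s, write: {write_tps:.1f} tok/s")
```

The same figures can be eyeballed directly from the CLI with `ollama run <model> --verbose`, which prints the prompt eval rate and eval rate after each answer.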