Which language model performs best in Basque?

The Latxa model was an important milestone for artificial intelligence in the Basque language. Until then, we had no open-weight model that handled Basque decently. Since then, several other models specifically fine-tuned for Basque have appeared: Kimu (Orai) and the Latxa Qwen3 VL family of models recently released by HiTZ.

At the same time, new top-tier local models such as Qwen and Gemma have been emerging, and they can be especially interesting for agents that use external tools. These models are also noticeably improving their Basque language capabilities. But do they match the Latxa model?

This is precisely the question I ask myself every time an interesting new model appears. Beyond running a few manual tests, I felt it was necessary to build a proper ranking based on a rigorous evaluation. That is how EvalEU was born: a project to measure and compare the Basque language proficiency of language models.

To this end, I have used existing Basque evaluation benchmarks and datasets (many thanks to HiTZ and Orai).

[Charts: "Current ranking" and "How has it evolved over time?"]

Among the things I would like to add in the future:

Add machine translation evaluations
Beyond local models, also include large proprietary models offered via API (OpenAI, Google, Anthropic…)

The project has been published under the Itzune initiative. In addition to the website, there is a repository containing the tools used to run the evaluations. Improvements and ideas are very welcome!