New Text-to-Speech Models for Basque: Maider, Antton and OmniVoice

In recent years, text-to-speech (TTS) models have advanced enormously. Month after month, new releases have improved pronunciation, prosody, and overall audio quality. Unfortunately, most of those models were designed for English, or at best supported only a handful of languages. The tools available for generating Basque voices used to be quite limited: paid proprietary systems (such as Elhuyar’s neural TTS, developed by Orai), or older robotic voices that had clearly fallen behind. ...

April 4, 2026
Nvidia Parakeet in Basque

Nvidia Parakeet in Basque: fast and CPU-friendly

Over the last few days, I have been fine-tuning Nvidia Parakeet for Basque, and I have published the result here. The goal was simple: to have a lightweight Basque speech-to-text model that runs fast and is practical on modest hardware.

Accuracy vs speed

To be clear, this model is not as accurate as my best Basque Whisper model, xezpeleta/whisper-large-v3-eu. However, it has a major advantage: it is very fast and can run on CPU-only setups. ...

March 15, 2026
The Kimu model

Kimu

Orai has just released a language model called Kimu. Building on Gemma 2, they have created models with 2B and 9B parameters, and they have successfully injected the knowledge required to speak and understand Basque into the base model. I have converted both Kimu models to the GGUF format and published them on Hugging Face. This makes it possible to use Kimu with applications like llama.cpp or Ollama. ...

October 13, 2024
Speech-to-text

Basque from speech to text: improving transcription models

Speech-to-text (STT) technology converts spoken language into written text using natural language processing. These systems are becoming increasingly important in digital interfaces, accessibility solutions, and various communication platforms. Since 2022, I’ve been fine-tuning the original Whisper speech recognition model for the Basque language, using the Mozilla Common Voice dataset. Compared to the original models, I’ve seen significant improvements in performance. As the Mozilla Common Voice initiative has grown, the model’s accuracy has continued to improve. ...

February 27, 2024
Tulu-3 workflow

Local RAG in Basque using Tülu 3

In recent months, local LLMs have improved significantly, and some of them perform surprisingly well in Basque. Among them, I want to highlight Tülu 3 70B, which shows good results in Basque when using the quantized version (q4_K_M). Until the Latxa instruction model becomes available, this is probably the best option for having conversations or generating text in Basque. ...

February 9, 2024