<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Ltx2 on Xabi Ezpeleta</title><link>https://xezpeleta.github.io/en/tags/ltx2/</link><description>Recent content in Ltx2 on Xabi Ezpeleta</description><generator>Hugo -- 0.161.1</generator><language>en</language><copyright>Copyright © 2021, Xabier Ezpeleta. License CC BY-SA 4.0.</copyright><lastBuildDate>Sat, 16 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://xezpeleta.github.io/en/tags/ltx2/index.xml" rel="self" type="application/rss+xml"/><item><title>Antzoki-TTS: Adding Emotion and Acting Capabilities</title><link>https://xezpeleta.github.io/en/blog/antzoki-tts/</link><pubDate>Sat, 16 May 2026 00:00:00 +0000</pubDate><guid>https://xezpeleta.github.io/en/blog/antzoki-tts/</guid><description>&lt;p&gt;Traditional &lt;em&gt;Text-to-Speech&lt;/em&gt; technology gives us solid voices today, but offers no way to guide the interpretation of the audio. If we want to use these models to produce audiobooks, stories, game audio, etc., we soon realize the results are not ideal. We get cold, neutral voices. Robotic. Boring.&lt;/p&gt;
&lt;p&gt;Lately, however, models that let us specify paralinguistic cues (pauses, breathing, and emotional shifts) have begun to emerge. With this technology, script-like instructions mean &lt;strong&gt;we can create emotionally rich synthetic voices&lt;/strong&gt;.&lt;/p&gt;</description></item></channel></rss>