Category Latest News

Muon Optimizer Significantly Accelerates Grokking in Transformers: Microsoft Researchers Explore Optimizer Influence on Delayed Generalization

Revisiting the Grokking Challenge In recent years, the phenomenon of grokking—where deep learning models exhibit a delayed yet sudden transition from memorization to generalization—has prompted renewed investigation into training dynamics. Initially observed in small algorithmic tasks like modular arithmetic, grokking…

Open-Source TTS Reaches New Heights: Nari Labs Releases Dia, a 1.6B Parameter Model for Real-Time Voice Cloning and Expressive Speech Synthesis on Consumer Device

The development of text-to-speech (TTS) systems has seen significant advancements in recent years, particularly with the rise of large-scale neural models. Yet, most high-fidelity systems remain locked behind proprietary APIs and commercial platforms. Addressing this gap, Nari Labs has released…