SpeechLLMs for Speech Recognition and Understanding - Past, Present, and Future
Andreas Stolcke
Uniphore
Abstract
The talk will give an overview of how we use multimodal LLMs that interpret both text and speech input, also known as speechLLMs, for various speech-based recognition and understanding tasks at Uniphore. After a brief overview of the paradigm and history of speechLLMs, I will dive into some use cases, including: correction and arbitration of multiple ASR outputs for higher accuracy; named entity recognition and slot filling directly from speech input; emotion recognition from spoken input; and summarization of spoken conversations. While much progress has been made within this framework in the past few years, major challenges remain, which I will highlight as well.
Speaker biography
Andreas Stolcke is a distinguished scientist at Uniphore. He obtained his PhD from UC Berkeley and then worked as a researcher/scientist at SRI International, Microsoft, and Amazon. His research interests include computational linguistics, language modeling, speech recognition, speaker recognition and diarization (keeping track of multiple speakers), and paralinguistics (e.g., sentiment and emotion recognition), with over 300 papers and patents in these areas. His open-source SRI Language Modeling Toolkit is widely used in academia. For over 30 years, Andreas has had a strong track record of inventing novel algorithms for speech and language processing. He is a Fellow of the IEEE, the International Speech Communication Association, and the Asia-Pacific Artificial Intelligence Association.
Event information
Salón de Grados C (C-001), Edificio C, Escuela Politécnica Superior
Dates
Start: 11/05/2026, 15:00
End: 11/05/2026, 16:30