
SpeechLLMs for Speech Recognition and Understanding - Past, Present, and Future
Organized by Escuela Politécnica Superior

Andreas Stolcke

Uniphore

Abstract

The talk will give an overview of how we use multimodal LLMs that interpret both text and speech input, also known as speechLLMs, for various speech recognition and understanding tasks at Uniphore. After a brief overview of the paradigm and history of speechLLMs, I will dive into some use cases, including: correction and arbitration of multiple ASR outputs for higher accuracy; named entity recognition and slot-filling directly from speech input; emotion recognition from spoken input; and summarization of spoken conversations. While much progress has been made within this framework in the past few years, major challenges remain, which I will highlight as well.
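To make the arbitration use case concrete: given several competing ASR hypotheses, a combiner picks (or generates) the most likely transcript. The sketch below uses simple word-position majority voting as a hypothetical stand-in; the speechLLM approach described in the abstract would instead feed all hypotheses (and possibly the audio) to the model and let it generate the corrected output. The function name and example transcripts are illustrative, not from the talk.

```python
from collections import Counter


def arbitrate(hypotheses):
    """Combine multiple ASR hypotheses by word-position majority vote.

    A naive illustration of ASR output arbitration; a speechLLM would
    instead be prompted with all hypotheses and generate a correction.
    """
    tokenized = [h.split() for h in hypotheses]
    max_len = max(len(t) for t in tokenized)
    combined = []
    for i in range(max_len):
        # Collect the word each hypothesis proposes at position i
        candidates = [t[i] for t in tokenized if i < len(t)]
        combined.append(Counter(candidates).most_common(1)[0][0])
    return " ".join(combined)


hyps = [
    "please call stella at nine",
    "please fall stella at nine",  # one recognizer misheard "call"
    "please call stella at nine",
]
print(arbitrate(hyps))  # -> please call stella at nine
```

Real systems align hypotheses before voting (as in ROVER) rather than assuming word positions line up; the LLM-based approach sidesteps explicit alignment entirely.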

Speaker bio

Andreas Stolcke is a distinguished scientist at Uniphore. He obtained his PhD from UC Berkeley and then worked as a researcher/scientist at SRI International, Microsoft, and Amazon. His research interests include computational linguistics, language modeling, speech recognition, speaker recognition and diarization (keeping track of multiple speakers), and paralinguistics (e.g., sentiment and emotion recognition), with over 300 papers and patents in these areas. His open-source SRI Language Modeling Toolkit is widely used in academia. For over 30 years, Andreas has had a strong track record of inventing novel algorithms for speech and language processing. He is a Fellow of the IEEE, the International Speech Communication Association, and the Asia-Pacific Artificial Intelligence Association.

Event information

Dates