

Research Seminars: Model compression as constrained optimization, with application to neural nets

Organized by
Escuela Politécnica Superior
Speaker
Prof. Miguel A. Carreira Perpiñán
Home institution
Electrical Engineering & Computer Science, UC Merced
Date
09-01-2019
Time
12:00
Venue
Sala de Grados A (A-120), Escuela Politécnica Superior, Universidad Autónoma de Madrid
Description

In recent years, deep neural nets have become a widespread practical technology, with impressive performance in computer vision, speech recognition, natural language processing and many other applications. Deploying deep nets in mobile phones, robots, sensors and IoT devices is of great interest. However, state-of-the-art deep nets for tasks such as object recognition are too large to be deployed in these devices because of the limits the devices impose on CPU speed, memory, bandwidth, battery life or energy consumption. This has made compressing neural nets an active research problem. More generally, compression can be seen as a sophisticated form of regularization that allows the model designer to learn the structure of a model automatically.
Our aim is to provide a generic computational framework for the problem of model compression that can easily handle different types of compression (such as quantization, low-rank decomposition, pruning or lossless compression) and machine learning tasks (such as classification, regression, dimensionality reduction or clustering). We give a general formulation of model compression as constrained optimization, and a "meta-algorithm" to optimize this nonconvex problem based on the augmented Lagrangian and alternating optimization. This results in a "learning-compression" (LC) algorithm, which alternates a learning step of the uncompressed model, independent of the compression type, with a compression step of the model parameters, independent of the learning task. This simple, efficient algorithm is guaranteed to find the best compressed model for the task in a local sense under some assumptions. We then describe specializations of the LC algorithm for various types of compression, such as binarization, ternarization and other forms of quantization, pruning, low-rank decomposition, and other variations.
We show experimentally with large deep neural nets such as ResNets that the LC algorithm can achieve much higher compression rates than previous work on deep net compression for a given target classification accuracy. For example, we can often quantize down to just 1 bit per weight with negligible accuracy degradation. This is joint work with my PhD students Yerlan Idelbayev and Arman Zharmagambetov.
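The alternation described in the abstract can be illustrated with a toy sketch. This is not the speaker's implementation: it assumes a linear least-squares model in place of a deep net, binarization with a single learned scale as the compression type, and an augmented-Lagrangian schedule in which the penalty parameter mu grows geometrically. The function names (compress_binarize, learn_step, lc_algorithm) and all hyperparameter values are hypothetical choices made for this illustration.

```python
import numpy as np

def compress_binarize(w):
    """C step (assumed form): closest vector a*s to w with s in {-1,+1}^n, scale a >= 0."""
    a = np.mean(np.abs(w))                    # optimal scale for the sign pattern of w
    signs = np.where(w >= 0, 1.0, -1.0)
    return a * signs

def learn_step(X, y, target, mu):
    """L step for least squares: argmin_w 0.5*||Xw - y||^2 + (mu/2)*||w - target||^2."""
    n = X.shape[1]
    A = X.T @ X + mu * np.eye(n)
    b = X.T @ y + mu * target
    return np.linalg.solve(A, b)

def lc_algorithm(X, y, mu0=1e-3, growth=1.5, iters=60):
    """Alternate L and C steps while driving mu up so that w approaches theta."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]  # uncompressed reference model
    theta = compress_binarize(w)              # direct compression as the starting point
    lam = np.zeros_like(w)                    # Lagrange multiplier estimates
    mu = mu0
    for _ in range(iters):
        w = learn_step(X, y, theta + lam / mu, mu)   # learning step (task only)
        theta = compress_binarize(w - lam / mu)      # compression step (weights only)
        lam = lam - mu * (w - theta)                 # multiplier update
        mu *= growth                                 # tighten the constraint w = theta
    return w, theta

# Toy usage on synthetic data (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
w_true = 0.7 * np.where(rng.normal(size=8) >= 0, 1.0, -1.0)
y = X @ w_true + 0.05 * rng.normal(size=200)
w, theta = lc_algorithm(X, y)
print("binarized weights:", theta)
print("max |w - theta|:", np.max(np.abs(w - theta)))
```

Note that learn_step never looks at how theta was produced and compress_binarize never looks at the data, which mirrors the separation the abstract describes. Swapping in another compression type would, under these assumptions, only require replacing compress_binarize.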

Escuela Politécnica Superior | Universidad Autónoma de Madrid | Francisco Tomás y Valiente, 11 | 28049 Madrid | Tel.: +34 91 497 2222 | e-mail: informacion.eps@uam.es