3620 South Vermont Avenue, Los Angeles, CA 90089

View map


Mahdi Soltanolkotabi, USC

Abstract: One of the key mysteries in modern learning is that a variety of models such as deep neural networks when  trained via (stochastic) gradient descent can extract useful features and learn high quality representations directly from data simultaneously with fitting the labels. This feature learning capability is also at the forefront of the recent success of a variety of contemporary paradigms such as transformer architectures, self-supervised and transfer learning. Despite a flurry of exciting activity over the past few years, existing theoretical results are often too crude and/or pessimistic to explain feature/representation learning in practical regimes of operation or serve as a guiding principle for practitioners. Indeed, existing literature often requires unrealistic hyperparameter choices (e.g. very small step sizes, large initialization or wide models).  In this talk I will focus on demystifying this feature/representation learning phenomena for a variety of problems spanning single index models, low-rank factorization, matrix reconstruction, and neural networks. Our results are based on an intriguing spectral bias phenomena for gradient descent, that puts the iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well by simultaneously finding good features/representations of the data while fitting to the labels. The proofs combine ideas from high-dimensional probability/statistics, optimization and nonlinear control to develop a precise analysis of model generalization along the trajectory of gradient descent. Time permitting, I will explain the implications of these theoretical results for more contemporary use cases including transfer learning, self-attention, prompt-tuning via transformers and simple self-supervised learning settings.

Event Details