Spotlight
Nonlinear random matrix theory for deep learning
Jeffrey Pennington · Pratik Worah
Neural network configurations with random weights play an important role in the analysis of deep learning. They define the initial loss landscape and are closely related to kernel and random feature methods. Despite the fact that these networks are built out of random matrices, the vast and powerful machinery of random matrix theory has so far found limited success in studying them. A main obstacle in this direction is that neural networks are nonlinear, which prevents the straightforward utilization of many of the existing mathematical results. In this work, we open the door for direct applications of random matrix theory to deep learning by demonstrating that the pointwise nonlinearities typically applied in neural networks can be incorporated into a standard method of proof in random matrix theory known as the moments method. The test case for our study is the Gram matrix $Y^TY$, $Y=f(WX)$, where $W$ is a random weight matrix, $X$ is a random data matrix, and $f$ is a pointwise nonlinear activation function. We derive an explicit representation for the trace of the resolvent of this matrix, which defines its limiting spectral distribution. We apply these results to the computation of the asymptotic performance of single-layer random feature methods on a memorization task and to the analysis of the eigenvalues of the data covariance matrix as it propagates through a neural network. As a byproduct of our analysis, we identify an intriguing new class of activation functions with favorable properties.
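As a rough numerical illustration of the objects named in the abstract, the sketch below samples Gaussian $W$ and $X$, forms $Y=f(WX)$, and computes the eigenvalues of the Gram matrix $Y^TY$ together with an empirical normalized trace of its resolvent. The matrix sizes, the Gaussian entries, the $1/\sqrt{n_0}$ weight scaling, the $1/n_1$ normalization of the Gram matrix, the choice $f=\tanh$, and the sign convention $(M - zI)^{-1}$ are all assumptions made for this example, not details taken from the abstract. The function names `gram_spectrum` and `resolvent_trace` are hypothetical.

```python
import numpy as np

def gram_spectrum(n0=200, n1=300, m=400, f=np.tanh, seed=0):
    """Eigenvalues of M = Y^T Y / n1 with Y = f(W X) (sizes and scalings assumed)."""
    rng = np.random.default_rng(seed)
    # W: n1 x n0 random weights; variance 1/n0 keeps pre-activations O(1) (assumption)
    W = rng.normal(0.0, 1.0 / np.sqrt(n0), size=(n1, n0))
    # X: n0 x m random data matrix with standard normal entries (assumption)
    X = rng.normal(0.0, 1.0, size=(n0, m))
    Y = f(W @ X)                 # pointwise nonlinearity, shape n1 x m
    M = (Y.T @ Y) / n1           # m x m Gram matrix, normalized by n1 (assumption)
    return np.linalg.eigvalsh(M) # real eigenvalues of the symmetric matrix, ascending

def resolvent_trace(eigs, z):
    """Empirical normalized trace of (M - z I)^{-1}, computed from the eigenvalues."""
    return np.mean(1.0 / (eigs - z))

eigs = gram_spectrum()
g = resolvent_trace(eigs, z=1.0 + 0.1j)
```

A histogram of `eigs` approximates the spectral distribution at this finite size, and evaluating `resolvent_trace` at points slightly off the real axis gives a smoothed view of the same density.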
Author Information
Jeffrey Pennington (Google Brain)
Pratik Worah (Google)
Related Events (a corresponding poster, oral, or spotlight)

2017 Poster: Nonlinear random matrix theory for deep learning »
Thu Dec 7th 02:30 - 06:30 AM Room Pacific Ballroom #137
More from the Same Authors

2021 Poster: Covariate Shift in High-Dimensions and Overparameterized Models »
Nilesh Tripuraneni · Ben Adlam · Jeffrey Pennington 
2020 Poster: Finite Versus Infinite Neural Networks: an Empirical Study »
Jaehoon Lee · Samuel Schoenholz · Jeffrey Pennington · Ben Adlam · Lechao Xiao · Roman Novak · Jascha Sohl-Dickstein
2020 Spotlight: Finite Versus Infinite Neural Networks: an Empirical Study »
Jaehoon Lee · Samuel Schoenholz · Jeffrey Pennington · Ben Adlam · Lechao Xiao · Roman Novak · Jascha Sohl-Dickstein
2020 Poster: The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks »
Wei Hu · Lechao Xiao · Ben Adlam · Jeffrey Pennington 
2020 Spotlight: The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks »
Wei Hu · Lechao Xiao · Ben Adlam · Jeffrey Pennington 
2020 Poster: Understanding Double Descent Requires A Fine-Grained Bias-Variance Decomposition »
Ben Adlam · Jeffrey Pennington 
2019 Poster: Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent »
Jaehoon Lee · Lechao Xiao · Samuel Schoenholz · Yasaman Bahri · Roman Novak · Jascha Sohl-Dickstein · Jeffrey Pennington
2018 Poster: The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network »
Jeffrey Pennington · Pratik Worah 
2017 Poster: Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice »
Jeffrey Pennington · Samuel Schoenholz · Surya Ganguli 
2015 Poster: Spherical Random Features for Polynomial Kernels »
Jeffrey Pennington · Felix Yu · Sanjiv Kumar 
2015 Spotlight: Spherical Random Features for Polynomial Kernels »
Jeffrey Pennington · Felix Yu · Sanjiv Kumar