Welcome! I am a PhD student in the Department of Computer Science, Cornell University. I am very fortunate to be advised by Jon Kleinberg.
Research Interests: I am interested in AI, Machine Learning (particularly Deep Learning) and Theory. My broad research goal is to bring greater interpretability to Deep Neural Networks with a mixture of experiments and theory, and use the insights to guide principled architectural innovations.
I have also worked on evaluating deep models in social science contexts and on their trade-offs relative to more standard methods. A pure social science question was explored in work on Team Performance with Test Scores.
Email: mraghu (at) cs(dot)cornell(dot)edu
Before Cornell, I was at the University of Cambridge (Trinity College), where I completed my Bachelor's and Master's (Part III of the Tripos) in Mathematics. I spent the latter half of my life in the UK, but have also lived in France, South Africa, India and the US.
Abstract. We study the expressivity of deep neural networks with random weights. We provide several results, both theoretical and experimental, precisely characterizing their functional properties in terms of the depth and width of the network. In doing so, we illustrate inherent connections between the length of a latent trajectory, local neuron transitions, and network activation patterns. The latter, a notion defined in this paper, is further studied using properties of hyperplane arrangements, which also help precisely characterize the action of the neural network on the input space. We further show dualities between changes to the latent state and changes to the network weights, and between the number of achievable activation patterns and the number of achievable labelings over input data. We see that the depth of the network affects all of these quantities exponentially, while the width appears at most as a base. These results also suggest that the remaining depth of a neural network is an important determinant of expressivity, supported by experiments on MNIST and CIFAR-10.
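As a rough illustration of the trajectory-length idea in the abstract above, the sketch below pushes a circle through a random tanh network and records the length of its image after every layer. It is a minimal sketch under stated assumptions, not the paper's experimental setup: the function names, the width, the depth, the weight scale `sigma_w`, the absence of biases, and the discretisation of the circle into 200 segments are all choices made here for illustration.

```python
import math
import random

def random_layer(n_in, n_out, sigma_w, rng):
    """Weights drawn i.i.d. from N(0, sigma_w^2 / n_in); no biases (a simplification)."""
    scale = sigma_w / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]

def apply_layer(weights, x):
    """One fully connected layer followed by a tanh nonlinearity."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]

def path_length(points):
    """Sum of Euclidean distances between consecutive points on the trajectory."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def trajectory_lengths(width=32, depth=6, sigma_w=4.0, n_points=200, seed=0):
    """Return [input length, length after layer 1, ..., length after layer depth]."""
    rng = random.Random(seed)
    # Input trajectory: a unit circle embedded in the first two coordinates.
    points = []
    for i in range(n_points + 1):
        t = 2 * math.pi * i / n_points
        points.append([math.cos(t), math.sin(t)] + [0.0] * (width - 2))
    lengths = [path_length(points)]
    for _ in range(depth):
        weights = random_layer(width, width, sigma_w, rng)
        points = [apply_layer(weights, p) for p in points]
        lengths.append(path_length(points))
    return lengths

if __name__ == "__main__":
    for layer, length in enumerate(trajectory_lengths()):
        print(f"layer {layer}: trajectory length {length:.2f}")
```

With a large `sigma_w` the printed lengths should grow quickly with depth. Because the circle is sampled at finitely many points, the measured length is only a lower bound on the true length once the image becomes very tangled.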
Abstract. We combine Riemannian geometry with the mean field theory of high dimensional chaos to study the nature of signal propagation in generic, deep neural networks with random weights. Our results reveal an order-to-chaos expressivity phase transition, with networks in the chaotic phase computing nonlinear functions whose global curvature grows exponentially with depth but not width. We prove this generic class of deep random functions cannot be efficiently computed by any shallow network, going beyond prior work restricted to the analysis of single functions. Moreover, we formalize and quantitatively demonstrate the long conjectured idea that deep networks can disentangle highly curved manifolds in input space into flat manifolds in hidden space. Our theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides a quantitative underpinning for previously abstract notions about the geometry of deep functions.
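The order-to-chaos transition described above can be probed empirically by tracking how similar two inputs remain as they pass through a random network. The sketch below is a small finite-width simulation, not the mean field calculation itself: the name `final_similarity`, the tanh nonlinearity, the bias scale `sigma_b`, and the two weight scales used in the usage lines are assumptions chosen for illustration.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def final_similarity(sigma_w, sigma_b=0.3, width=100, depth=20, seed=0):
    """Cosine similarity of the last layer's pre-activations for two related inputs."""
    rng = random.Random(seed)
    # Two inputs that start out moderately similar (cosine roughly 0.7).
    x_a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    x_b = [a + rng.gauss(0.0, 1.0) for a in x_a]
    act_a, act_b = x_a, x_b
    scale = sigma_w / math.sqrt(width)
    for _ in range(depth):
        # Fresh random weights N(0, sigma_w^2 / width) and biases N(0, sigma_b^2).
        weights = [[rng.gauss(0.0, scale) for _ in range(width)] for _ in range(width)]
        biases = [rng.gauss(0.0, sigma_b) for _ in range(width)]
        pre_a = [sum(w * v for w, v in zip(row, act_a)) + b
                 for row, b in zip(weights, biases)]
        pre_b = [sum(w * v for w, v in zip(row, act_b)) + b
                 for row, b in zip(weights, biases)]
        act_a = [math.tanh(h) for h in pre_a]
        act_b = [math.tanh(h) for h in pre_b]
    return cosine(pre_a, pre_b)

if __name__ == "__main__":
    print("small weights (ordered):", final_similarity(sigma_w=0.5))
    print("large weights (chaotic):", final_similarity(sigma_w=4.0))
```

With small weights the two signals should collapse onto each other, while with large weights they should decorrelate. At width 100 the numbers fluctuate from seed to seed, so they are a qualitative check only.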
Abstract. Team performance is a ubiquitous area of inquiry in the social sciences, and it motivates the problem of team selection -- choosing the members of a team for maximum performance. Influential work of Hong and Page has argued that testing individuals in isolation and then assembling the highest-scoring ones into a team is not an effective method for team selection. For a broad class of performance measures, based on the expected maximum of random variables representing individual candidates, we show that tests directly measuring individual performance are indeed ineffective, but that a more subtle family of tests used in isolation can provide a constant-factor approximation for team performance. These new tests measure the 'potential' of individuals, in a precise sense, rather than performance; to our knowledge they represent the first time that individual tests have been shown to produce near-optimal teams for a non-trivial team performance measure. We also show families of submodular and supermodular team performance functions for which no test applied to individuals can produce near-optimal teams, and discuss implications for submodular maximization via hill-climbing.
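A toy example may help show why a test of average individual performance can mislead when team performance is the expected maximum over members. The sketch below is hypothetical throughout: the two candidate types, the team size, and in particular `potential_score` (the expected best of `k` independent attempts by one candidate) are stand-ins chosen for illustration and are not claimed to match the precise tests defined in the paper.

```python
def expected_max(team):
    """Exact E[max] for independent discrete variables, each a list of (value, prob)."""
    values = sorted({v for dist in team for v, _ in dist})
    total, prev_cdf = 0.0, 0.0
    for v in values:
        # P(max <= v) is the product of each member's P(X <= v).
        cdf = 1.0
        for dist in team:
            cdf *= sum(p for x, p in dist if x <= v)
        total += v * (cdf - prev_cdf)
        prev_cdf = cdf
    return total

def mean_score(dist):
    """Test that measures a candidate's average individual performance."""
    return sum(v * p for v, p in dist)

def potential_score(dist, k):
    """Illustrative 'potential' test: expected best of k independent attempts."""
    return expected_max([dist] * k)

def select(pool, k, score):
    """Pick the k candidates with the highest test score."""
    return sorted(pool, key=score, reverse=True)[:k]

# Hypothetical pool: steady candidates always score 1; risky ones score 10 rarely.
steady = [(1.0, 1.0)]
risky = [(0.0, 0.95), (10.0, 0.05)]
pool = [steady] * 5 + [risky] * 5
k = 5

team_by_mean = select(pool, k, mean_score)
team_by_potential = select(pool, k, lambda d: potential_score(d, k))

if __name__ == "__main__":
    print("selected by mean:     ", expected_max(team_by_mean))
    print("selected by potential:", expected_max(team_by_potential))
```

In this pool the mean test prefers the steady candidates, whose team can never exceed a score of 1, whereas the illustrative potential test prefers the risky candidates, whose team has a better expected maximum.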