DMS Applied and Computational Mathematics Seminar

Time: Sep 05, 2025 (02:00 PM)
Location: 328 Parker Hall

Details:
 
 
Speaker: Wenjing Liao (Georgia Tech)
 
Title: Exploiting Low-Dimensional Data Structures and Understanding Neural Scaling Laws of Transformers
 
 
Abstract: When training deep neural networks, a model’s generalization error is often observed to follow a power scaling law in the model size and the data size. A prominent example is transformer-based large language models (LLMs), where networks with billions of parameters are trained on trillions of tokens. A central theoretical question for LLMs is why these transformer scaling laws emerge. In this talk, we exploit low-dimensional structures in language datasets by estimating their intrinsic dimension, and we establish statistical estimation and mathematical approximation theories for transformers that predict the scaling laws. This perspective shows that transformer scaling laws can be explained in a manner consistent with the underlying data geometry. We further validate the theory against empirical observations of LLMs and find strong agreement between the observed scaling laws and our theoretical predictions. Finally, we turn to in-context learning, analyzing its scaling behavior by uncovering a connection between the attention mechanism in transformers and classical kernel methods in machine learning.
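
For readers curious about the last point, the attention-kernel connection has a simple concrete instance: the output of a single softmax attention head is exactly a Nadaraya-Watson kernel regression estimate with an exponential kernel over the context. The short Python sketch below illustrates that identity only; it is not code from the talk, and the data, dimensions, and variable names are made up for the example.

import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Toy in-context data (all values illustrative): context pairs (x_i, y_i)
# act as keys and values, and a fresh query x_q attends over them.
rng = np.random.default_rng(0)
d = 4
X = rng.normal(size=(32, d))          # keys: context inputs x_i
y = X @ rng.normal(size=(d, 1))       # values: context labels y_i
x_q = rng.normal(size=(1, d))         # query

# Single-head softmax attention: softmax(x_q X^T / sqrt(d)) y
scores = x_q @ X.T / np.sqrt(d)
attn_out = softmax(scores) @ y

# Nadaraya-Watson kernel regression with the exponential kernel
# k(q, x) = exp(<q, x> / sqrt(d)): a kernel-weighted average of the labels.
w = np.exp(scores)
kernel_out = (w @ y) / w.sum()

# The two coincide: softmax weights are exactly normalized kernel weights.
assert np.allclose(attn_out, kernel_out)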
 
 
Host: Yimin Zhong