This note gathers papers that use concepts from information theory and spectral theory for deep learning.

Hierarchical Tokenization for images (also relates to the Global Precedence Effect)

Other non-linear tokenizations

Coding Rate

Other

PS: This is a personally curated list. (1-sentence summaries by gpt-o3 because I was too lazy :p).