Document Details


2505.08727v1.pdf
Clip: Memorization-Compression Cycles Improve Generalization. Fangyuan Yu (Temus). Abstract: We prove theoretically that generalization improves not only through data scaling but also by compressing internal representations. To operationalize this insight, we introduce the Information Bottleneck Language Modeling (IBLM) objective, which reframes language modeling as a constrained optimization problem: minimizing representation entropy subject to optimal prediction performance. Empirically, we observe an emergent memorization–compression cycle during LLM pretraining, evidenced by oscillating positive/negative gradient alignment between cross-entropy and Matrix-Based Entropy (MBE), a measure of representation entropy. This pattern closely mirrors the predictive–compressive trade-off prescribed by IBLM and also parallels the biological alternation between active learning and sleep consolidation…
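
The clip refers to two quantities that drive the reported memorization–compression cycle: the Matrix-Based Entropy (MBE) of the model's internal representations and the gradient alignment between the cross-entropy loss and MBE. The sketch below is a minimal, hypothetical illustration of how such quantities could be computed in PyTorch; the linear-kernel Gram matrix, the alpha parameter, and the function names are assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed, not taken from the paper): Matrix-Based Entropy of a
# batch of hidden representations, and the gradient alignment between the
# cross-entropy loss and MBE.
import torch
import torch.nn.functional as F


def matrix_based_entropy(h: torch.Tensor, alpha: float = 1.0, eps: float = 1e-12) -> torch.Tensor:
    """Matrix-based (Renyi) entropy of representations h with shape (n, d).

    Builds a Gram matrix from the rows of h, normalizes it to unit trace so its
    eigenvalues form a probability distribution, and computes the alpha-order
    entropy of that spectrum (alpha -> 1 gives the von Neumann limit).
    """
    h = F.normalize(h, dim=-1)                     # unit-norm rows -> bounded kernel values
    gram = h @ h.T                                 # (n, n) linear-kernel Gram matrix (an assumption)
    gram = gram / gram.trace().clamp_min(eps)      # unit trace
    evals = torch.linalg.eigvalsh(gram).clamp_min(eps)
    if abs(alpha - 1.0) < 1e-6:
        return -(evals * evals.log()).sum()        # von Neumann / Shannon-like entropy
    return evals.pow(alpha).sum().log() / (1.0 - alpha)


def gradient_alignment(model: torch.nn.Module, loss_ce: torch.Tensor, loss_mbe: torch.Tensor) -> float:
    """Cosine similarity between the parameter gradients of two scalar losses.

    A positive value means the cross-entropy and MBE gradients point in similar
    directions; the clip reports this alignment oscillating in sign during
    pretraining.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    g_ce = torch.autograd.grad(loss_ce, params, retain_graph=True, allow_unused=True)
    g_mbe = torch.autograd.grad(loss_mbe, params, retain_graph=True, allow_unused=True)
    flat_ce, flat_mbe = [], []
    for p, a, b in zip(params, g_ce, g_mbe):
        flat_ce.append((torch.zeros_like(p) if a is None else a).flatten())
        flat_mbe.append((torch.zeros_like(p) if b is None else b).flatten())
    return F.cosine_similarity(torch.cat(flat_ce), torch.cat(flat_mbe), dim=0).item()
```

In use, one would log matrix_based_entropy on a batch of hidden states alongside gradient_alignment between the two losses at regular training intervals; the sign of that alignment is the oscillating signal the clip describes.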
Filename: 2505.08727v1.pdf
Filetype: application/pdf
Size: 1177032 bytes
Uploaded On: 2025-10-24
Abstract:
Summary:
Tags:
Notes:
Visible: 1
Status: Parsed
Author:
CreationDate: 2025-05-14T01:07:18+00:00
Creator: LaTeX with hyperref
Keywords:
ModDate: 2025-05-14T01:07:18+00:00
PTEX.Fullbanner: This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5
Producer: pdfTeX-1.40.25
Subject:
Title:
Trapped: False
Pages: 12
