Document Details


2508.13898v2.pdf
Clip: Fisher-Orthogonal Projection Methods for Natural Gradient Descent with Large Batches. Yishun Lu¹, Wesley Armour¹. ¹Department of Engineering Science, University of Oxford, Oxford, UK. Abstract: Modern GPUs are equipped with large amounts of high-bandwidth memory, enabling them to support mini-batch sizes of up to tens of thousands of training samples. However, most existing optimizers struggle to perform effectively at such a large batch size. As batch size increases, gradient noise decreases due to averaging over many samples, lim-
Filename: 2508.13898v2.pdf
Filetype: application/pdf
Size: 618928 bytes
Uploaded On: 2025-10-24
Abstract:
Summary:
Tags:
Notes:
Visible: 1
Status: Parsed
Author: Yishun Lu; Wesley Armour
Creator: arXiv GenPDF (tex2pdf:)
DOI: https://doi.org/10.48550/arXiv.2508.13898
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
PTEX.Fullbanner: This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5
Producer: pikepdf 8.15.1
TemplateVersion: 2026.1
Title: Fisher-Orthogonal Projection Methods for Natural Gradient Descent with Large Batches
Trapped: False
ArXivID: https://arxiv.org/abs/2508.13898v2
Pages: 15
