Document Details
Clip:
arXiv:2412.02595v2 [cs.CL] 30 May 2025. Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset. Dan Su*, Kezhi Kong*, Ying Lin*, Joseph Jennings, Brandon Norick, Markus Kliegl†, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro. NVIDIA
Filename:
2412.02595v2.pdf
Filetype:
application/pdf
Size:
400612 bytes
Uploaded On:
2025-10-24
Abstract:
Summary:
Tags:
Notes:
Visible:
1
Status:
Parsed
Author:
Dan Su; Kezhi Kong; Ying Lin; Joseph Jennings; Brandon Norick; Markus Kliegl; Mostofa Patwary; Mohammad Shoeybi; Bryan Catanzaro
Creator:
arXiv GenPDF (tex2pdf:)
DOI:
https://doi.org/10.48550/arXiv.2412.02595
License:
http://arxiv.org/licenses/nonexclusive-distrib/1.0/
PTEX.Fullbanner:
This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5
Producer:
pikepdf 8.15.1
Title:
Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset
Trapped:
False
ArXivID:
https://arxiv.org/abs/2412.02595v2
Pages:
17