Published as a conference paper at ICLR 2020
THE CURIOUS CASE OF NEURAL TEXT DEGENERATION

Ari Holtzman†‡, Jan Buys§†, Li Du†, Maxwell Forbes†‡, Yejin Choi†‡
†Paul G. Allen School of Computer Science & Engineering, University of Washington
‡Allen Institute for Artificial Intelligence
§Department of Computer Science, University of Cape Town
{ahai,dul2,mbforbes,yejin}@cs.washington.edu, jbuys@cs.uct.ac.za
ABSTRACT
Despite considerable advances in neural language modeling, it remains an open question what the best decoding strategy is for text generation from a language model (e.g., to generate a story). The counter-intuitive empirical observation is that even though the use of likelihood as a training objective leads to high quality models for a broad range of language understanding tasks, maximization-based decoding methods such as beam search lead to degeneration: output text that is bland, incoherent, or gets stuck in repetitive loops.

To address this we propose Nucleus Sampling, a simple but effective method to draw considerably higher quality text out of neural language models than previous decoding strategies. Our approach avoids text degeneration by truncating the unreliable tail of the probability distribution, sampling from the dynamic nucleus of tokens containing the vast majority of the probability mass.

To properly examine current maximization-based and stochastic decoding methods, we compare generations from each of these methods to the distribution of human text along several axes such as likelihood, diversity, and repetition. Our results show that (1) maximization is an inappropriate decoding objective for open-ended text generation, (2) the probability distributions of the best current language models have an unreliable tail which needs to be truncated during generation, and (3) Nucleus Sampling is currently the best available decoding strategy for generating long-form text that is both high-quality, as measured by human evaluation, and as diverse as human-written text.
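As a concrete illustration of the rule described in the abstract, the following is a minimal Python/numpy sketch of nucleus (top-p) sampling: sort tokens by probability, keep the smallest prefix whose cumulative mass reaches p, renormalize, and sample. The function name nucleus_sample, the default threshold p=0.95, and the numpy-based formulation are illustrative assumptions, not the authors' released implementation.

import numpy as np

def nucleus_sample(probs, p=0.95, rng=None):
    # Sample one token index from the smallest set of tokens whose
    # cumulative probability mass reaches p (the dynamic "nucleus").
    # probs: 1-D array of next-token probabilities summing to 1.
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]            # token ids, most to least probable
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    nucleus = order[:cutoff]                   # truncate the unreliable tail
    renormalized = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=renormalized))

For example, with probs = np.array([0.5, 0.3, 0.15, 0.05]) and p = 0.9, the nucleus contains the first three tokens (cumulative mass 0.95), so the low-probability fourth token in the tail is never sampled.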