<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
Ciprian Chelba asked me to forward the following information about a
recently launched initiative in large-scale language model (LM)
benchmarking. More information is available at <a
href="https://code.google.com/p/1-billion-word-language-modeling-benchmark/"
target="_blank">https://code.google.com/p/1-billion-word-language-modeling-benchmark/</a>.<br>
<br>
Andreas<br>
<br>
_________________________________________________________________________________________________________<br>
<div>Here is a brief description of the project.</div>
<div>
<p
style="line-height:1.25em;max-width:64em;font-size:13.194443702697754px">"The
purpose of the project is to make available a standard training
and test setup for language modeling experiments.</p>
<p
style="line-height:1.25em;max-width:64em;font-size:13.194443702697754px">The
training/held-out data was produced from a download at <a
href="http://statmt.org/" target="_blank">statmt.org</a> using
a combination of Bash shell and Perl scripts distributed here.</p>
<p
style="line-height:1.25em;max-width:64em;font-size:13.194443702697754px">This
also means that your results on this data set are reproducible
by the research community at large.</p>
<p
style="line-height:1.25em;max-width:64em;font-size:13.194443702697754px">
Besides the scripts needed to rebuild the training/held-out
data, the project also makes available log-probability values
for each word in each of ten held-out data sets, for each of
the following baseline models:</p>
<ul
style="padding-left:25px;max-width:62em;font-size:13.194443702697754px">
<li style="margin-left:15px;margin-bottom:0.3em">unpruned Katz
(1.1B n-grams),</li>
<li style="margin-left:15px;margin-bottom:0.3em">pruned Katz
(~15M n-grams),</li>
<li style="margin-left:15px;margin-bottom:0.3em">unpruned
Interpolated Kneser-Ney (1.1B n-grams),</li>
<li style="margin-left:15px;margin-bottom:0.3em">pruned
Interpolated Kneser-Ney (~15M n-grams).</li>
</ul>
<div><font color="#000000" face="arial, sans-serif">ArXiv paper: <a
href="http://arxiv.org/abs/1312.3005" target="_blank">http://arxiv.org/abs/1312.3005</a></font></div>
<p
style="line-height:1.25em;max-width:64em;font-size:13.194443702697754px">Happy
benchmarking!"</p>
</div>
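<div>
<p>The per-word log-probability values mentioned in the description can be
turned into a perplexity figure for a held-out set. Below is a minimal
Python sketch, assuming the values have already been loaded into a list of
floats and that the logarithm base is known. The announcement states neither
the file format nor the base, so the function name, the <code>base</code>
parameter, and the sample values are all hypothetical.</p>

```python
import math


def perplexity(log_probs, base=math.e):
    """Return perplexity given per-word log-probabilities in log base `base`.

    The base is an assumption the caller must supply; it is not stated
    in the announcement.
    """
    if not log_probs:
        raise ValueError("need at least one log-probability")
    # Average negative log-probability per word
    # (the cross-entropy, measured in units of log `base`).
    avg_neg_log_prob = -sum(log_probs) / len(log_probs)
    # Exponentiating in the same base gives the perplexity.
    return base ** avg_neg_log_prob


# Hypothetical values, not taken from the benchmark:
# four words, each assigned probability 1/8 by the model.
example = [math.log(1 / 8)] * 4
print(perplexity(example))
```

<p>If the distributed values turn out to be base-10 logarithms, the call
would be <code>perplexity(values, base=10)</code>. Using the wrong base
changes the result, so it is worth checking the project files before
comparing numbers across models.</p>
</div>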
<div>
-- <br>
-Ciprian
</div>
<br>
</body>
</html>