<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    Ciprian Chelba asked me to forward the following information about a
    recently launched initiative in large-scale LM benchmarking. More
    information at <a
      href="https://code.google.com/p/1-billion-word-language-modeling-benchmark/"
      target="_blank">https://code.google.com/p/1-billion-word-language-modeling-benchmark/</a>.<br>
    <br>
    Andreas<br>
    <br>
_________________________________________________________________________________________________________<br>
    <div>Here is a brief description of the project.</div>
    <div>
      <p
        style="line-height:1.25em;max-width:64em;font-size:13.194443702697754px">"The
        purpose of the project is to make available a standard training
        and test setup for language modeling experiments.</p>
      <p
        style="line-height:1.25em;max-width:64em;font-size:13.194443702697754px">The
        training/held-out data was produced from a download at <a
          href="http://statmt.org/" target="_blank">statmt.org</a> using
        a combination of Bash shell and Perl scripts distributed here.</p>
      <p
        style="line-height:1.25em;max-width:64em;font-size:13.194443702697754px">This
        also means that your results on this data set are reproducible
        by the research community at large.</p>
      <p>Besides the scripts needed to rebuild the training/held-out
        data, the project also makes available log-probability values
        for each word in each of the ten held-out data sets, for each of
        the following baseline models (a sketch of turning such per-word
        values into a perplexity figure follows the list):</p>
      <ul>
        <li>unpruned Katz (1.1B n-grams),</li>
        <li>pruned Katz (~15M n-grams),</li>
        <li>unpruned Interpolated Kneser-Ney (1.1B n-grams),</li>
        <li>pruned Interpolated Kneser-Ney (~15M n-grams).</li>
      </ul>
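      <p>As a minimal, illustrative sketch only (not part of the
        released tooling): per-word log-probabilities like the ones
        above can be combined into a perplexity figure as shown below.
        The input format assumed here (one base-10 log-probability per
        line) and the log base are illustrative assumptions; the actual
        format of the released log-probability files is not specified in
        this message, so check the project distribution.</p>
      <pre>
# Minimal sketch: perplexity from per-word log-probabilities.
# Assumed (hypothetical) input format: a plain-text file with one
# base-10 log-probability per line, one entry per held-out word.
# Adjust the parsing and the log base to match the released files.
import sys

def perplexity(logprob_file, log_base=10.0):
    """Perplexity = log_base ** (-(sum of log-probs) / (word count))."""
    total_logprob = 0.0
    num_words = 0
    with open(logprob_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            total_logprob += float(line)
            num_words += 1
    return log_base ** (-total_logprob / num_words)

if __name__ == "__main__":
    # Usage: python perplexity.py held_out.logprobs
    print(perplexity(sys.argv[1]))
      </pre>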
      <div>arXiv paper: <a href="http://arxiv.org/abs/1312.3005"
          target="_blank">http://arxiv.org/abs/1312.3005</a></div>
      <p>Happy benchmarking!"</p>
    </div>
    <div>
      -- <br>
      -Ciprian
    </div>
  </body>
</html>