<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    Ciprian Chelba asked me to forward the following information about a
    recently launched initiative in large-scale LM benchmarking. More
    information at <a
      href="https://code.google.com/p/1-billion-word-language-modeling-benchmark/"
      target="_blank">https://code.google.com/p/1-billion-word-language-modeling-benchmark/</a>.<br>
    <br>
    Andreas<br>
    <br>
_________________________________________________________________________________________________________<br>
    <div>Here is a brief description of the project.</div>
    <div>
      <p
        style="line-height:1.25em;max-width:64em;font-size:13.194443702697754px">"The
        purpose of the project is to make available a standard training
        and test setup for language modeling experiments.</p>
      <p
        style="line-height:1.25em;max-width:64em;font-size:13.194443702697754px">The
        training/held-out data was produced from a download at <a
          href="http://statmt.org/" target="_blank">statmt.org</a> using
        a combination of Bash shell and Perl scripts distributed here.</p>
      <p
        style="line-height:1.25em;max-width:64em;font-size:13.194443702697754px">This
        also means that your results on this data set are reproducible
        by the research community at large.</p>
      <p>Besides the scripts needed to rebuild the training/held-out
        data, the project also makes available log-probability values
        for each word in each of the ten held-out data sets, for each of
        the following baseline models (a sketch of turning such per-word
        values into a perplexity figure follows the list):</p>
      <ul>
        <li>unpruned Katz (1.1B n-grams),</li>
        <li>pruned Katz (~15M n-grams),</li>
        <li>unpruned Interpolated Kneser-Ney (1.1B n-grams),</li>
        <li>pruned Interpolated Kneser-Ney (~15M n-grams).</li>
      </ul>
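      <p>As a minimal, illustrative sketch only (not part of the
        released tooling): per-word log-probabilities like the ones
        above can be combined into a perplexity figure as shown below.
        The input format assumed here (one base-10 log-probability per
        line) and the log base are illustrative assumptions; the actual
        format of the released log-probability files is not specified in
        this message, so check the project distribution.</p>
      <pre>
# Minimal sketch: perplexity from per-word log-probabilities.
# Assumed (hypothetical) input format: a plain-text file with one
# base-10 log-probability per line, one entry per held-out word.
# Adjust the parsing and the log base to match the released files.
import sys

def perplexity(logprob_file, log_base=10.0):
    """Perplexity = log_base ** (-(sum of log-probs) / (word count))."""
    total_logprob = 0.0
    num_words = 0
    with open(logprob_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            total_logprob += float(line)
            num_words += 1
    return log_base ** (-total_logprob / num_words)

if __name__ == "__main__":
    # Usage: python perplexity.py held_out.logprobs
    print(perplexity(sys.argv[1]))
      </pre>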
      <div>arXiv paper: <a href="http://arxiv.org/abs/1312.3005"
          target="_blank">http://arxiv.org/abs/1312.3005</a></div>
      <p>Happy benchmarking!"</p>
    </div>
    <div>
      -- <br>
      -Ciprian
    </div>
  </body>
</html>