<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 3/28/2014 2:05 AM, Laatar Rim wrote:<br>
    </div>
    <blockquote
      cite="mid:1395997546.8767.YahooMailNeo@web173205.mail.ir2.yahoo.com"
      type="cite">
      <div style="color:#000; background-color:#fff; font-family:times
        new roman, new york, times, serif;font-size:12pt">thanks <br>
        So my file replace-word-with-class should not contain the words
        from the test data?<br>
      </div>
    </blockquote>
    <br>
    Knowing which words should be in which class is either part of the
    training process or comes from prior knowledge.<br>
    If your application gives you the class membership of the words in
    the test data, then you can add it; otherwise it would be "training
    on test data".<br>
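    <br>
    One way to audit a class file for this, as a minimal sketch: list
    the class members that occur in the test data but never in the
    training data. Any word it reports should be justified by prior
    knowledge rather than by having looked at the test set. The function
    name and the toy data are hypothetical, and the class definitions
    are assumed to be already loaded into a dictionary.<br>
<pre>
```python
def find_test_only_class_words(class_members, train_words, test_words):
    """Return class-member words seen in the test data but not in training.

    class_members: dict mapping a class name to an iterable of member words
    train_words, test_words: iterables of word tokens
    """
    train_vocab = set(train_words)
    test_vocab = set(test_words)
    suspicious = set()
    for members in class_members.values():
        for word in members:
            # A member that only the test data contains may indicate
            # that the class file was built with knowledge of the test set.
            if word in test_vocab and word not in train_vocab:
                suspicious.add(word)
    return sorted(suspicious)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    classes = {"CITY": ["paris", "tunis", "sfax"], "DAY": ["monday", "friday"]}
    train = "i fly to paris on monday".split()
    test = "i fly to sfax on friday".split()
    print(find_test_only_class_words(classes, train, test))
```
</pre>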
    <br>
    Andreas<br>
    <br>
    <blockquote
      cite="mid:1395997546.8767.YahooMailNeo@web173205.mail.ir2.yahoo.com"
      type="cite">
      <div style="color:#000; background-color:#fff; font-family:times
        new roman, new york, times, serif;font-size:12pt">
        <div>----<br>
          Regards,<br>
          <br>
          <b>Rim LAATAR</b><br>
          Computer engineer, École Nationale d'Ingénieurs de Sfax (<a
            moz-do-not-send="true" rel="nofollow" target="_blank"
            href="http://www.enis.rnu.tn/">ENIS</a>)<br>
          Research master's student, Information Systems &amp; New
          Technologies, <a moz-do-not-send="true" rel="nofollow"
            target="_blank" href="http://www.fsegs.rnu.tn/">FSEGS</a>,
          NLP (TALN) option<br>
          Website: <a moz-do-not-send="true" rel="nofollow"
            target="_blank"
            href="https://sites.google.com/site/rimlaatarbnsaid/">Rim
            LAATAR BEN SAID</a><br>
          Tel: (+216) 99 64 74 98<br>
          ----<br>
        </div>
        <div style="display: block;" class="yahoo_quoted"> <br>
          <br>
          <div style="font-family: times new roman, new york, times,
            serif; font-size: 12pt;">
            <div style="font-family: HelveticaNeue, Helvetica Neue,
              Helvetica, Arial, Lucida Grande, sans-serif; font-size:
              12pt;">
              <div dir="ltr"> <font face="Arial" size="2"> On Thursday,
                  27 March 2014 at 18:44, Andreas Stolcke
                  <a class="moz-txt-link-rfc2396E" href="mailto:stolcke@icsi.berkeley.edu">&lt;stolcke@icsi.berkeley.edu&gt;</a> wrote:<br>
                </font> </div>
              <div class="y_msg_container">
                <div id="yiv8874576582">
                  <div>
                    <div class="yiv8874576582moz-cite-prefix">On
                      3/27/2014 7:38 AM, Laatar Rim wrote:<br
                        clear="none">
                    </div>
                    <blockquote type="cite">
                      <div
                        style="color:#000;background-color:#fff;font-family:times
                        new roman, new york, times,
                        serif;font-size:12pt;">
                        <div><span>Dear Andreas,<br clear="none">
                          </span></div>
                        <div style="color:rgb(0, 0,
                          0);font-size:16px;font-family:times new roman,
                          new york, times,
                          serif;background-color:transparent;font-style:normal;"><span>To
                            calculate perplexity I do this:<br
                              clear="none">
                          </span></div>
                        <div style="color:rgb(0, 0,
                          0);font-size:16px;font-family:times new roman,
                          new york, times,
                          serif;background-color:transparent;font-style:normal;"><span>lenovo@ubuntu:~/Documents/srilm$

                            ngram -lm class_based_model
                            '/home/lenovo/Documents/srilm/ML_N_Class/IN_SRILM'
                            -ppl
                            '/home/lenovo/Documents/srilm/ML_N_Class/titi.txt' 
                            <br clear="none">
                          </span></div>
                        <div style="color:rgb(0, 0,
                          0);font-size:16px;font-family:times new roman,
                          new york, times,
                          serif;background-color:transparent;font-style:normal;"><span>titi.txt
                            is my training data <br clear="none">
                          </span></div>
                        <div style="color:rgb(0, 0,
                          0);font-size:16px;font-family:times new roman,
                          new york, times,
                          serif;background-color:transparent;font-style:normal;"><span>1-
                            Should I also calculate perplexity on my
                            test data?<br clear="none">
                          </span></div>
                      </div>
                    </blockquote>
                    Yes. In fact, perplexity is usually reported on test
                    data (data not used in training the model), since
                    otherwise you get a very biased (overly optimistic) estimate.<br
                      clear="none">
                    <br clear="none">
                    <blockquote type="cite">
                      <div
                        style="color:#000;background-color:#fff;font-family:times
                        new roman, new york, times,
                        serif;font-size:12pt;">
                        <div style="color:rgb(0, 0,
                          0);font-size:16px;font-family:times new roman,
                          new york, times,
                          serif;background-color:transparent;font-style:normal;"><span>2-
                            How can I interpret this result:</span></div>
                        <div style="color:rgb(0, 0,
                          0);font-size:16px;font-family:times new roman,
                          new york, times,
                          serif;background-color:transparent;font-style:normal;"><span>file

                            /home/lenovo/Documents/srilm/ML_N_Class/titi.txt:
                            18657 sentences, 66817 words, 5285 OOVs<br
                              clear="none">
                            0 zeroprobs, logprob= -259950 ppl= 1744.69
                            ppl1= 16773.8</span></div>
                        <div>What is the difference between ppl and
                          ppl1?<br clear="none">
                        </div>
                      </div>
                    </blockquote>
                    OOVs is the count of words that don't occur in the
                    vocabulary (technically, that are mapped to
                    &lt;unk&gt;) and have zero probability.<br
                      clear="none">
                    zeroprobs refers to any other words that have zero
                    probability.<br clear="none">
                    These counts are reported because they are not
                    included in the perplexity computation.<br
                      clear="none">
                    <br clear="none">
                    ppl is the standard perplexity, where end-of-sentence
                    tokens (&lt;/s&gt;) are counted in the denominator.<br clear="none">
                    ppl1 is the same thing, but &lt;/s&gt; tokens are not
                    counted in the denominator.
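                    <br clear="none">
                    <br clear="none">
                    As a rough check of these definitions against the
                    numbers quoted above, here is a minimal Python
                    sketch. It assumes that logprob is a base-10 log
                    probability and that OOVs and zeroprobs are
                    subtracted from the word count before normalizing;
                    the function name is hypothetical.<br clear="none">
<pre>
```python
def srilm_perplexities(logprob, sentences, words, oovs, zeroprobs):
    """Return (ppl, ppl1) from the counts printed by ngram -ppl.

    logprob is assumed to be a base-10 log probability.
    """
    # Words that actually received a probability.
    scored_words = words - oovs - zeroprobs
    # ppl: one end-of-sentence token per sentence is added to the denominator.
    ppl = 10.0 ** (-logprob / (scored_words + sentences))
    # ppl1: end-of-sentence tokens are left out of the denominator.
    ppl1 = 10.0 ** (-logprob / scored_words)
    return ppl, ppl1


if __name__ == "__main__":
    # Figures taken from the ngram output quoted in this thread.
    ppl, ppl1 = srilm_perplexities(-259950, 18657, 66817, 5285, 0)
    print(f"ppl={ppl:.2f} ppl1={ppl1:.1f}")
```
</pre>
                    With the figures from the output above, this should
                    reproduce the reported ppl and ppl1 up to the
                    rounding of logprob.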
                    <div class="yiv8874576582yqt7749481734"
                      id="yiv8874576582yqtfd17197"><br clear="none">
                      <br clear="none">
                      Andreas<br clear="none">
                      <br clear="none">
                    </div>
                  </div>
                </div>
                <br>
                <br>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>