<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 3/28/2014 2:05 AM, Laatar Rim wrote:<br>
</div>
<blockquote
cite="mid:1395997546.8767.YahooMailNeo@web173205.mail.ir2.yahoo.com"
type="cite">
<div style="color:#000; background-color:#fff; font-family:times
new roman, new york, times, serif;font-size:12pt">thanks <br>
so my file replace-word-with-class should not contain the words
from the test data?<br>
</div>
</blockquote>
<br>
Knowing which words belong to which class should be considered part of
the training process, or else come from prior knowledge.<br>
If your application gives you the class membership of the words in
the test data, then you can add it; otherwise it would be "training
on test data".<br>
<br>
Andreas<br>
<br>
<blockquote
cite="mid:1395997546.8767.YahooMailNeo@web173205.mail.ir2.yahoo.com"
type="cite">
<div style="color:#000; background-color:#fff; font-family:times
new roman, new york, times, serif;font-size:12pt">
<div> </div>
<div style="font-family:'times new roman', 'new york', times,
serif;"><font class="Apple-style-span" size="4">----</font></div>
<div style="font-family:'times new roman', 'new york', times,
serif;"><font size="3">Best regards<br>
<br>
</font>
<div style="font-family:arial, helvetica, sans-serif;"><font
size="3"><b>Rim LAATAR </b></font></div>
</div>
<div><font style="font-family:'times new roman', 'new york',
times, serif;" size="3"><font style="font-weight:normal;"
size="3"><span>Computer Engineer</span></font></font><font
style="font-family:'times new roman', 'new york', times,
serif;" size="3">, from the École Nationale d’Ingénieurs de Sfax</font><span
style="font-family:'times new roman', 'new york', times,
serif;"> (</span><font style="font-family:'times new roman',
'new york', times, serif;" size="3"><a
moz-do-not-send="true" rel="nofollow"
style="font-family:'times new roman', 'new york', times,
serif;" target="_blank" href="http://www.enis.rnu.tn/">ENIS</a></font><span
style="font-family:'times new roman', 'new york', times,
serif;">)</span><br style="font-family:'times new roman',
'new york', times, serif;">
<font style="font-family:'times new roman', 'new york', times,
serif;" size="3"><span style="font-family:'times new roman',
'new york', times, serif;">Student </span><span
style="font-family:'times new roman', 'new york', times,
serif;">in the research master's program</span></font><font
style="font-family:'times new roman', 'new york', times,
serif;" size="3"><span>, Information Systems &amp;
New Technologies, at the</span></font><span
style="font-family:'times new roman', 'new york', times,
serif;"> </span><font style="font-family:'times new roman',
'new york', times, serif;" size="3"><span
style="font-family:'times new roman', 'new york', times,
serif;"><a moz-do-not-send="true" rel="nofollow"
style="font-family:'times new roman', 'new york', times,
serif;" target="_blank" href="http://www.fsegs.rnu.tn/">FSEGS</a>, NLP
option</span></font><br style="font-family:'times new
roman', 'new york', times, serif;">
<font style="font-family:'times new roman', 'new york', times,
serif;" size="3">Website:<a moz-do-not-send="true"
rel="nofollow" style="font-family:'times new roman', 'new
york', times, serif;" target="_blank"
href="https://sites.google.com/site/rimlaatarbnsaid/"> Rim
LAATAR BEN SAID</a></font></div>
<div style="color:rgb(0, 0, 0);font-size:16px;font-family:'times
new roman', 'new york', times,
serif;background-color:transparent;font-style:normal;"><font
size="3"><span></span></font><font size="3"><span></span></font><font
face="times new roman, new york, times, serif" size="3"><span
style="line-height:19px;">Tel: (+216) 99 64 74 98 <br>
</span></font><font style="font-family:'times new roman',
'new york', times, serif;" class="Apple-style-span" size="4">----</font><br>
</div>
<div style="display: block;" class="yahoo_quoted"> <br>
<br>
<div style="font-family: times new roman, new york, times,
serif; font-size: 12pt;">
<div style="font-family: HelveticaNeue, Helvetica Neue,
Helvetica, Arial, Lucida Grande, sans-serif; font-size:
12pt;">
<div dir="ltr"> <font face="Arial" size="2"> On Thursday,
27 March 2014 at 18:44, Andreas Stolcke
<a class="moz-txt-link-rfc2396E" href="mailto:stolcke@icsi.berkeley.edu">&lt;stolcke@icsi.berkeley.edu&gt;</a> wrote:<br>
</font> </div>
<div class="y_msg_container">
<div id="yiv8874576582">
<div>
<div class="yiv8874576582moz-cite-prefix">On
3/27/2014 7:38 AM, Laatar Rim wrote:<br
clear="none">
</div>
<blockquote type="cite">
<div
style="color:#000;background-color:#fff;font-family:times
new roman, new york, times,
serif;font-size:12pt;">
<div><span>Dear Andreas,<br clear="none">
</span></div>
<div style="color:rgb(0, 0,
0);font-size:16px;font-family:times new roman,
new york, times,
serif;background-color:transparent;font-style:normal;"><span>To
calculate perplexity I do this:<br
clear="none">
</span></div>
<div style="color:rgb(0, 0,
0);font-size:16px;font-family:times new roman,
new york, times,
serif;background-color:transparent;font-style:normal;"><span>lenovo@ubuntu:~/Documents/srilm$
ngram -lm class_based_model
'/home/lenovo/Documents/srilm/ML_N_Class/IN_SRILM'
-ppl
'/home/lenovo/Documents/srilm/ML_N_Class/titi.txt'
<br clear="none">
</span></div>
<div style="color:rgb(0, 0,
0);font-size:16px;font-family:times new roman,
new york, times,
serif;background-color:transparent;font-style:normal;"><span>titi.txt
is my training data <br clear="none">
</span></div>
<div style="color:rgb(0, 0,
0);font-size:16px;font-family:times new roman,
new york, times,
serif;background-color:transparent;font-style:normal;"><span>1-
Should I also calculate perplexity on my
test data?<br clear="none">
</span></div>
</div>
</blockquote>
Yes, in fact, perplexity is usually reported on test
data (data not used in training the model) since
otherwise you get a very biased estimate.<br
clear="none">
<br clear="none">
<blockquote type="cite">
<div
style="color:#000;background-color:#fff;font-family:times
new roman, new york, times,
serif;font-size:12pt;">
<div style="color:rgb(0, 0,
0);font-size:16px;font-family:times new roman,
new york, times,
serif;background-color:transparent;font-style:normal;"><span>2-
How can I interpret this result:</span></div>
<div style="color:rgb(0, 0,
0);font-size:16px;font-family:times new roman,
new york, times,
serif;background-color:transparent;font-style:normal;"><span>file
/home/lenovo/Documents/srilm/ML_N_Class/titi.txt:
18657 sentences, 66817 words, 5285 OOVs<br
clear="none">
0 zeroprobs, logprob= -259950 ppl= 1744.69
ppl1= 16773.8</span></div>
<div> What is the difference between ppl and
ppl1?<br clear="none">
</div>
</div>
</blockquote>
OOVs is the count of words that don't occur in the
vocabulary (technically, that are mapped to
&lt;unk&gt;) and have zero probability.<br
clear="none">
zeroprobs refers to any other words that have zero
probability.<br clear="none">
These counts are reported because they are not
included in the perplexity computation.<br
clear="none">
<br clear="none">
ppl is the standard perplexity where end-of-sentence
tokens (&lt;/s&gt;) are counted in the denominator.
ppl1 is the same thing but &lt;/s&gt; tokens are not
counted in the denominator.
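<br clear="none">
<br clear="none">
As a rough check of these definitions, here is a minimal Python sketch that recomputes both values from the counts in the quoted output. It assumes logprob is a base-10 log probability and that OOVs and zeroprobs are simply subtracted from the word count; the function name srilm_perplexities is made up for this illustration and is not part of SRILM.<br clear="none">

```python
def srilm_perplexities(logprob, sentences, words, oovs, zeroprobs):
    """Return (ppl, ppl1) from the counts printed by ngram -ppl.

    Assumes logprob is the total base-10 log probability.
    """
    # Words that actually received a probability: OOVs and zeroprobs are left out.
    scored_words = words - oovs - zeroprobs
    # ppl counts one end-of-sentence token per sentence in the denominator.
    ppl = 10 ** (-logprob / (scored_words + sentences))
    # ppl1 leaves the end-of-sentence tokens out of the denominator.
    ppl1 = 10 ** (-logprob / scored_words)
    return ppl, ppl1


# Numbers taken from the quoted output for titi.txt.
ppl, ppl1 = srilm_perplexities(
    logprob=-259950, sentences=18657, words=66817, oovs=5285, zeroprobs=0
)
print(round(ppl, 1), round(ppl1, 1))
```

The results should land close to the reported ppl= 1744.69 and ppl1= 16773.8; small differences are expected because the printed logprob is rounded.<br clear="none">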
<div class="yiv8874576582yqt7749481734"
id="yiv8874576582yqtfd17197"><br clear="none">
<br clear="none">
Andreas<br clear="none">
<br clear="none">
</div>
</div>
</div>
<br>
<br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>