[SRILM]: -debug 2 info

From: ilya oparin <ioparin at ADDRESS HIDDEN>
Date: Wed, 31 May 2006 11:41:51 +0100 (BST)

Hi!

When I calculate the perplexity of my POS-based class model (a word can belong to many classes; I create the class-definition file myself, based on POS-tagged data) with "-debug 2", I get output I cannot fully understand. For testing purposes I measure ppl on the same data I trained the class model on (i.e. there should not be any OOVs). However, in the debug output, for every N-gram there is a string of the format
P(w| w...) = [OOV][n-gram][n-gram]...[OOV][n-gram][n-gram]...
As far as I understand, the [n-gram]s refer to different combinations of assigning words to classes. But why do those [OOV] entries appear at all (and they appear at equal intervals between the runs of [n-gram]s for each word)?
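If each bracketed [n-gram] entry really corresponds to one way of assigning the words of the N-gram to classes, the number of entries per line should grow as the product of the class counts of the words involved. The sketch below only illustrates that hypothesis; it is not how SRILM computes class-model probabilities internally, and the word-to-classes map is a hypothetical stand-in for a class-definition file.

```python
from itertools import product

# Hypothetical word-to-classes map (stand-in for a class-definition file
# built from POS-tagged data); a word may belong to several classes.
WORD_CLASSES = {
    "the": ["DET"],
    "can": ["MD", "NN", "VB"],
    "run": ["NN", "VB"],
}

def class_assignments(ngram):
    """Return every way of mapping each word of an n-gram to one of its classes."""
    # Words not in the map are kept as plain word tokens.
    choices = [WORD_CLASSES.get(w, [w]) for w in ngram]
    return list(product(*choices))

if __name__ == "__main__":
    ngram = ("the", "can", "run")
    for assignment in class_assignments(ngram):
        print(" ".join(assignment))
    # 1 * 3 * 2 = 6 combinations
    print(len(class_assignments(ngram)))
```

Comparing this count with the number of [n-gram] entries between consecutive [OOV] markers is one way to test the guess.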

I have only one guess: since the [OOV] entries are absent only for the last (</s> | ...) n-gram, they may correspond to a check of whether a word is present in the implicit stop-word vocabulary, or something similar...
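To check the "equal intervals" observation mechanically, one can pull the bracketed tags out of each debug line and measure the gaps between [OOV] markers. This is a minimal sketch that assumes the line layout shown above ("P(...) = " followed by bracketed tags); the function names are hypothetical, and "n-gram" is used literally as the placeholder tag from the format above rather than a real SRILM tag.

```python
import re

def bracket_tags(debug_line):
    """Extract the bracketed tags (e.g. OOV) from the right-hand side of '='."""
    _, _, rhs = debug_line.partition("=")
    return re.findall(r"\[([^\]]+)\]", rhs)

def oov_gaps(tags):
    """Return the number of non-OOV tags between consecutive [OOV] markers."""
    positions = [i for i, t in enumerate(tags) if t == "OOV"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]

if __name__ == "__main__":
    line = "P(w| w...) = [OOV][n-gram][n-gram][OOV][n-gram][n-gram]"
    tags = bracket_tags(line)
    gaps = oov_gaps(tags)
    print(tags)
    print(gaps)
    # Equal intervals means every gap has the same length.
    print(len(set(gaps)) <= 1)
```

If the gaps are constant and match the per-word class-assignment count, that would support reading each [OOV] as a per-group marker rather than a genuine out-of-vocabulary word.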

It would be great if anybody could comment on that.

best regards,
Ilya
