[SRILM]: -debug 2 info

ilya oparin ioparin at yahoo.co.uk
Wed May 31 03:41:51 PDT 2006


Hi!

When I calculate the perplexity of my POS-based class model (a word can belong to many classes; I create the class-definition file myself from POS-tagged data) with "-debug 2", I get output I cannot fully understand. For testing purposes I measure ppl on the same data I trained the class model on (i.e. there should not be any OOVs). However, in the debug output, for every N-gram there is a string of the format
P(w| w...) = [OOV][n-gram][n-gram]...[OOV][n-gram][n-gram]...
As far as I understand, the [n-gram]s refer to different combinations of assigning words to classes. But why do those [OOV]s appear (and they appear at equal intervals between the strings of [n-gram]s for each word)?
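
To illustrate what "different combinations of assigning words to classes" would mean here, below is a minimal Python sketch. It is not SRILM code and it does not explain the [OOV] entries; the class map, the example words, and the function name class_assignments are all hypothetical and only show how the number of combinations grows when a word can belong to several classes.

```python
from itertools import product

# Hypothetical class-definition data: each word may belong to several POS classes.
word_classes = {
    "the": ["DET"],
    "can": ["NOUN", "VERB", "MODAL"],
    "rusts": ["VERB", "NOUN"],
}


def class_assignments(words, word_classes):
    """Return every way of assigning exactly one class to each word in `words`."""
    options = [word_classes[w] for w in words]
    return [tuple(combo) for combo in product(*options)]


assignments = class_assignments(["the", "can", "rusts"], word_classes)
for assignment in assignments:
    print(assignment)
print(len(assignments))  # 1 * 3 * 2 = 6 combinations
```

With three words that have 1, 3 and 2 possible classes, there are 1 * 3 * 2 = 6 assignments. If the reading above is right, that would be consistent with seeing several [n-gram] entries per word in the debug output.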

I have only one guess: since the [OOV]s are missing only for the last (</s>| ...) n-gram, they may correspond to a check of whether a word is present in an implicit stop-word vocabulary or something like that.

It would be great if anybody could comment on that.


best regards,
Ilya
		

