<table cellspacing="0" cellpadding="0" border="0" ><tr><td valign="top" style="font: inherit;"><DIV>Hi,</DIV>
<DIV>I'm generating some models for guessing the next word in a sequence, using a text file to train the language model. Currently I'm using "Alice in Wonderland" as the training text. For example, the words "off with" are generally followed by "his" or "her" in the text:</DIV>
<DIV>&nbsp;</DIV>
<DIV><EM>&gt;grep -i "off with" ..\alice.phrases.txt<BR>Hes murdering the time Off with his head How dreadfully savage exclaimed Alice<BR>screamed Off with her head Off Nonsense said Alice<BR>Off with their heads and the procession moved on<BR>and shouting Off with his head or Off with her head about once in a minute<BR>Off with his head she said<BR>and shouting Off with his head or Off with her head Those whom she sentenced were taken in<BR>to custody by the soldiers<BR>Behead that Dormouse Turn that Dormouse out of court Suppress him Pinch him Off with his whiskers For some minutes the whole court was in confusion<BR>Off with her head the Queen shouted at the top of her voice</EM></DIV>
<DIV><EM></EM>&nbsp;</DIV>
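<DIV>To make that expectation concrete, here is a minimal Python sketch that counts which words directly follow "off with" in the grep lines above. It is only an illustration and is not part of my program; the helper name next_word_counts is made up for this example, and the relative frequencies it prints are plain unsmoothed estimates, not what a smoothed language model would report.</DIV>

```python
from collections import Counter

# Lines copied from the grep output above.
lines = [
    "Hes murdering the time Off with his head How dreadfully savage exclaimed Alice",
    "screamed Off with her head Off Nonsense said Alice",
    "Off with their heads and the procession moved on",
    "and shouting Off with his head or Off with her head about once in a minute",
    "Off with his head she said",
    "and shouting Off with his head or Off with her head Those whom she sentenced were taken in",
    "to custody by the soldiers",
    "Behead that Dormouse Turn that Dormouse out of court Suppress him Pinch him Off with his whiskers For some minutes the whole court was in confusion",
    "Off with her head the Queen shouted at the top of her voice",
]


def next_word_counts(lines, prefix):
    """Count the words that immediately follow `prefix` (a list of lowercase words)."""
    n = len(prefix)
    counts = Counter()
    for line in lines:
        words = line.lower().split()
        # Stop early enough that words[i + n] is always a valid index.
        for i in range(len(words) - n):
            if words[i:i + n] == prefix:
                counts[words[i + n]] += 1
    return counts


counts = next_word_counts(lines, ["off", "with"])
total = sum(counts.values())

# Unsmoothed relative frequency of each continuation of "off with".
for word, count in counts.most_common():
    print(word, count, count / total)
```

<DIV>On these lines it finds "his" 5 times, "her" 4 times and "their" once, so I would expect a model conditioned on "off with" to rank those words well above the rest of the vocabulary.</DIV>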
<DIV>I trained a model on this text and generate sentences by iterating through lm-&gt;vocab, calling wordProb() twice for each word: once with no context and once with the prefix words "off with".</DIV>
<DIV>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; p1 = wordProb(word, NULL);</DIV>
<DIV>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; p2 = wordProb(word, prefix);</DIV>
<DIV>I notice that the probability changes when I do this, but the difference in probabilities is the same regardless of the prefix (-sw = starting words):</DIV>
<DIV>&nbsp;</DIV>
<ADDRESS><SPAN lang=EN>guess&nbsp;-gen10 <SPAN style="COLOR: blue">-sw"off"</SPAN> alice.phrases.txt.lm </SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>COUNT: 2815</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>quarrelling,-2.0320</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>staring,-1.9070</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>outside,-1.8101</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>writing,-1.8101</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>together,-1.8101</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>sneezing,-1.6640</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>shouted,-1.5091</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>with,-1.2514</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>from,-1.2419</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>being,-1.2081</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN></SPAN>&nbsp;</ADDRESS>
<ADDRESS><SPAN lang=EN>guess&nbsp;-gen10 <SPAN style="COLOR: blue">-sw"off with "</SPAN> alice.phrases.txt.lm </SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>COUNT: 2815</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>quarrelling,-2.0320</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>staring,-1.9070</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>outside,-1.8101</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>writing,-1.8101</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>together,-1.8101</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>sneezing,-1.6640</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>shouted,-1.5091</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>with,-1.2514</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>from,-1.2419</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>being,-1.2081</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN></SPAN>&nbsp;</ADDRESS>
<ADDRESS><SPAN lang=EN>guess -gen10 <SPAN style="COLOR: blue">-sw"off with her "</SPAN> alice.phrases.txt.lm </SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>COUNT: 2815</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>quarrelling,-2.0320</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>staring,-1.9070</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>outside,-1.8101</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>writing,-1.8101</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>together,-1.8101</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>sneezing,-1.6640</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>shouted,-1.5091</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>with,-1.2514</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>from,-1.2419</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>being,-1.2081</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN></SPAN>&nbsp;</ADDRESS>
<ADDRESS><SPAN lang=EN>Why doesn't this work? Is the language model too small? Is this the correct way to compute the conditioning effect of prefix words on a word's probability? Is there a better way to do this?</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN></SPAN>&nbsp;</ADDRESS>
<ADDRESS><SPAN lang=EN>Thanks,</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>John Day</SPAN></ADDRESS>
<ADDRESS><SPAN lang=EN>Palm Bay, Florida</SPAN></ADDRESS></td></tr></table><br>