<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<META content="text/html; charset=iso-8859-2" http-equiv=Content-Type>

<META content="MSHTML 5.00.3103.1000" name=GENERATOR>

<STYLE></STYLE>

</HEAD>

<BODY bgColor=#ffffff>

<DIV><FONT face="Arial CE" size=2>Hi to all!</FONT></DIV>

<DIV><FONT face="Arial CE" size=2>I have the following problem with the 
<EM>segment</EM> tool. In its output, the &lt;unk&gt; token appears in place of 
words containing language-specific&nbsp;characters, although these words are 
stored correctly in the language model file and the input text file uses the 
same encoding (ISO Latin 2) as the&nbsp;training text.</FONT></DIV>

<DIV><FONT face="Arial CE" size=2>&nbsp;Does anybody know what's&nbsp;the 

problem?</FONT></DIV>

<DIV><FONT face="Arial CE" size=2></FONT>&nbsp;</DIV>

<DIV><FONT face="Arial CE" size=2>The language model was built using:</FONT></DIV>

<DIV><FONT face="Arial CE" size=2>ngram-count -write-vocab vocabulary -text 

train2.txt -write probs -lm lmfile2</FONT></DIV>

<DIV><FONT face="Arial CE" size=2></FONT>&nbsp;</DIV>

<DIV><FONT face="Arial CE" size=2>The segment tool was run with the following 
options:</FONT></DIV>

<DIV><FONT face="Arial CE" size=2>segment -lm lmfile2 -text test3.txt -unk 

-posteriors -continuous</FONT></DIV>

<DIV>&nbsp;</DIV>

<DIV><FONT face="Arial CE" size=2>With the -unk option disabled, I get the right 
words in the output, but the posteriors are probably not correct.</FONT></DIV>
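<DIV>&nbsp;</DIV>
<DIV><FONT face="Arial CE" size=2>To narrow this down, a check along the 
following lines might help. It is only a minimal sketch: the helper names 
(read_tokens, missing_words, non_ascii) are made up for illustration, and it 
assumes that both the vocabulary file written by ngram-count and the test text 
are whitespace-separated and encoded in ISO-8859-2. It only lists test words 
that are absent from the vocabulary; it says nothing about what segment does 
internally.</FONT></DIV>
<PRE>
```python
def read_tokens(path, encoding="iso-8859-2"):
    """Return the whitespace-separated tokens of a text file."""
    with open(path, encoding=encoding) as handle:
        return handle.read().split()


def missing_words(vocab_path, text_path, encoding="iso-8859-2"):
    """List the distinct tokens of the text that are not in the vocabulary.

    Tokens are returned in order of first appearance in the text.
    """
    vocab = set(read_tokens(vocab_path, encoding))
    missing = []
    for token in read_tokens(text_path, encoding):
        if token not in vocab and token not in missing:
            missing.append(token)
    return missing


def non_ascii(words):
    """Keep only the words that contain at least one non-ASCII character."""
    return [word for word in words if not word.isascii()]
```
</PRE>
<DIV><FONT face="Arial CE" size=2>For the files above, 
non_ascii(missing_words("vocabulary", "test3.txt")) would list the accented 
test words that the vocabulary does not contain. If that list is empty while 
segment still prints the unknown-word token for those words, the mismatch is 
presumably not in the files themselves.</FONT></DIV>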

<DIV>&nbsp;</DIV>

<DIV><FONT face="Arial CE" size=2>Jachym Kolar</FONT></DIV>

<DIV><FONT face="Arial CE" size=2>Department of Cybernetics</FONT></DIV>

<DIV><FONT face="Arial CE" size=2>University of West-Bohemia</FONT></DIV>

<DIV><FONT face="Arial CE" size=2>Pilsen, Czech Republic</FONT></DIV>

<DIV>&nbsp;</DIV></BODY></HTML>