<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 4/11/2012 5:48 AM, Saman Noorzadeh wrote:
<blockquote
cite="mid:1334148534.73034.YahooMailNeo@web162006.mail.bf1.yahoo.com"
type="cite">
<div style="color: rgb(0, 0, 0); background-color: rgb(255, 255,
255); font-family: verdana,helvetica,sans-serif; font-size:
12pt;">
<div><span>Thank you, </span></div>
<div><span>-cdiscount 0 works perfectly, but now that </span><span>I
have read about smoothing and different methods of
discounting I have </span><span>another question:</span><span><br>
</span></div>
<div><span><br>
</span></div>
<div><span>I want to know your ideas about this problem:</span></div>
<div><span>I want to build a model from a text and then
predict what the user is typing (a word prediction
approach). At any moment I will predict what the next
character will be according to my bigrams.</span></div>
<div><span>Do you think discounting and smoothing methods are
useful when estimating the model from the training data?</span></div>
<div><span>Or is it more appropriate to just disable them?</span></div>
</div>
</blockquote>
<br>
It probably won't make a difference, because in an application like
this you are interested in finding the most probable next tokens,
and smoothing mainly affects the estimates for the least probable
(rare or unseen) tokens. However, this type of LM application has
been studied extensively, and you should look online at what others
have done. Try<br>
<br>
<a class="moz-txt-link-freetext" href="http://scholar.google.com/scholar?q=character+prediction+typing&hl=en&btnG=Search&as_sdt=1%2C5&as_sdtp=on">http://scholar.google.com/scholar?q=character+prediction+typing&hl=en&btnG=Search&as_sdt=1%2C5&as_sdtp=on</a><br>
<br>
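To illustrate the point about smoothing, here is a minimal Python
sketch of a character-bigram predictor. It does not use SRILM; the
function names, the toy training text, and the add-k smoothing are
all assumptions chosen for illustration (add-k is a simple stand-in
for a real discounting method). It compares the top-ranked next
characters with and without smoothing:<br>
<br>
<pre>
```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count character bigrams: counts[prev][nxt] = number of occurrences."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def next_char_probs(counts, prev, alphabet, add_k=0.0):
    """Estimate P(next | prev); add_k=0 gives the unsmoothed estimate.

    Hypothetical helper: add-k smoothing stands in for real discounting.
    """
    row = counts.get(prev, Counter())
    total = sum(row.values()) + add_k * len(alphabet)
    if total == 0:
        return {}  # context never seen and no smoothing mass to share
    return {c: (row[c] + add_k) / total for c in alphabet}


def top_predictions(probs, n=3):
    """Return the n most probable characters; ties break alphabetically."""
    ranked = sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [c for c, _ in ranked[:n]]


# Toy training text (an assumption, not real data).
text = "the theory then thinned the thread"
alphabet = sorted(set(text))
counts = train_bigrams(text)

unsmoothed = next_char_probs(counts, "h", alphabet, add_k=0.0)
smoothed = next_char_probs(counts, "h", alphabet, add_k=1.0)

print(top_predictions(unsmoothed))
print(top_predictions(smoothed))
```
</pre>
On this toy text both calls rank 'e' first after 'h', followed by
'i' and 'r'. Smoothing only changes the probability given to
characters never seen after 'h', which matters for scoring text but
not for picking the best guesses.<br>
<br>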
Andreas<br>
<br>
</body>
</html>