<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 7/20/2012 5:04 AM, Nouf Al-Harbi

      wrote:<br>

    </div>

    <blockquote

      cite="mid:1342785895.7010.YahooMailNeo@web171304.mail.ir2.yahoo.com"

      type="cite">

      <div style="color:#000; background-color:#fff; font-family:arial,

        helvetica, sans-serif;font-size:12pt">

        <div>Hello,</div>

        <div><br>

        </div>

        <div>I am new to language modeling and was hoping that someone

          can help me with the following.<br>

          <br>

          I try to predict a word given an input sentence. For example,

          I would like to get a word replacing the ... that has the <br>

          highest probability in sentences such as ' A man is ...' (e.g.

          sitting).<br>

          <br>

          I tried to use the disambig tool, but I couldn't find any example

          illustrating how to use it, especially how I can create the

          map file and what type of file it is (e.g. txt, arpa,

          ...).<br>

        </div>

      </div>

    </blockquote>

    <br>

    Indeed, you can use disambig, at least in theory, to solve this

    problem.<br>

    <br>

    1. prepare a map file of the form:<br>

    <br>

        a       a<br>

        man    man<br>

        ...   [for all words occurring in your data]<br>

        UNKNOWN_WORD  word1 word2  ....  [list all words in the

    vocabulary here]<br>
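    <br>
    A map file of this shape could be generated with a short script. The
    following is only a sketch: the function names build_map_lines and
    write_map_file are hypothetical, and it assumes the vocabulary is
    already available as a list of words and that whitespace-separated
    fields are acceptable, as in the example above.<br>
    <pre>
```python
def build_map_lines(vocab, placeholder="UNKNOWN_WORD"):
    """Return map-file lines: one identity mapping per word, then one
    line mapping the placeholder to every word in the vocabulary."""
    # Deduplicate and sort so the output is deterministic.
    words = sorted(set(vocab))
    # Each known word maps only to itself, e.g. "man man".
    lines = [w + " " + w for w in words]
    # The placeholder may expand to any vocabulary word.
    lines.append(placeholder + " " + " ".join(words))
    return lines


def write_map_file(path, vocab, placeholder="UNKNOWN_WORD"):
    """Write the map file to the given path (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(build_map_lines(vocab, placeholder)) + "\n")
```
    </pre>
    For instance, write_map_file("MAPFILE", ["a", "man", "is",
    "sitting"]) would write four identity lines followed by a single
    UNKNOWN_WORD line listing all four words.<br>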

    <br>

    2. train an LM of word sequences.<br>

    <br>

    3. prepare disambig input of the form<br>

                <br>

                    a man is sitting UNKNOWN_WORD <br>

    <br>

       You can also add known words to the right of UNKNOWN_WORD if you

    have that information (see the note about -fw-only below).<br>

    <br>

    4. run disambig<br>

            <br>

                disambig -map MAPFILE -lm LMFILE -text INPUTFILE<br>

    <br>

    If you want to use only the left context of UNKNOWN_WORD, use the

    -fw-only option.<br>
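    <br>
    If the run is driven from a script, the command line from step 4
    could be assembled as follows. This is a sketch, not a tested
    recipe: build_disambig_cmd is a hypothetical helper, and it uses
    only the options mentioned in this message (-map, -lm, -text,
    -fw-only).<br>
    <pre>
```python
def build_disambig_cmd(map_file, lm_file, text_file, fw_only=False):
    """Assemble the disambig argument list from step 4."""
    cmd = ["disambig", "-map", map_file, "-lm", lm_file, "-text", text_file]
    if fw_only:
        # Use only the left context of UNKNOWN_WORD.
        cmd.append("-fw-only")
    return cmd
```
    </pre>
    The returned list can be passed to subprocess.run, assuming the
    disambig binary is on your PATH.<br>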

    <br>

    That is the theory.  If your vocabulary is large, this may be very

    slow and use too much memory, because UNKNOWN_WORD maps to every

    word in the vocabulary.  I haven't tried it, so let me know if it

    works for you.<br>

    <br>

    Andreas<br>

    <br>

  </body>

</html>