<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=UTF-8" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

ali sadiqui wrote:

<blockquote cite="mid:642280.27633.qm@web28604.mail.ukl.yahoo.com"

 type="cite">

  <pre wrap="">Thank you for your answer.

Indeed, I knew that ngram-count is the right command for building a language model, but my confusion comes from the following:

When an Arabic word is segmented according to the Prefix-Stem-Suffix model,

a word “B” can yield several results.

Suppose that the word B has 3 possible segmentations:

b1 = mot1 + suf1 (mot1 can be denoted stem1)

b2 = pref1 + mot2

b3 = mot3

Starting from the corpus “A B C D E”, I create a file (programmatically):

A mot1 suf1 C D E

A pref1 mot2 C D E

A mot3 C D E

(to enumerate all the possible segmentations)

Then, using SRILM, I will build a language model of order 3 (for example) and use it afterwards to support the segmentation of other text.

My question is:

- I assumed that I would need to create lattices; is that true or false?

- If it is true, how do I proceed to use lattice-tool?

I am very grateful for your help.

Ali Sadiqui

--- On Thu 22.4.10, Andreas Stolcke <a class="moz-txt-link-rfc2396E" href="mailto:stolcke@speech.sri.com"><stolcke@speech.sri.com></a> wrote:

  </pre>

</blockquote>

Ali,<br>

<br>

Sorry for not responding earlier. Your desire to use lattices now

makes sense.<br>

You need to encode your morphologically analyzed training data as

lattices in either the HTK or the PFSG format.<br>

PFSG is more limited but should be enough in your case.  See the

pfsg-format(5) man page for a description. There are also some examples

in <br>

$SRILM/lattice/test/tests/lattice-expansion/ .<br>
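<br>

As a rough illustration of the encoding step, the Python sketch below writes one sentence with alternative segmentations as a PFSG. The function name sentence_to_pfsg, the "slots" data structure, the uniform weighting of the alternatives, and the 10000.5 log scaling of transition costs are assumptions of this sketch and should be checked against pfsg-format(5) before use.<br>

```python
import math


def sentence_to_pfsg(name, slots, scale=10000.5):
    """Return PFSG text for one sentence (hypothetical helper, not part of SRILM).

    slots: list of word positions; each position is a list of alternative
    token sequences, e.g. [["mot1", "suf1"], ["pref1", "mot2"], ["mot3"]].
    Every alternative must contain at least one token.
    scale: assumed factor for turning natural-log probabilities into the
    integer costs used in PFSG files.
    """
    labels = ["NULL"]      # node 0: initial node, carries no word
    transitions = []       # (from_node, to_node, scaled log probability)
    frontier = [0]         # nodes that end the previous position
    for alternatives in slots:
        # Assumption: all alternatives of a word are equally likely.
        cost = int(round(scale * math.log(1.0 / len(alternatives))))
        new_frontier = []
        for tokens in alternatives:
            first = len(labels)
            labels.extend(tokens)
            for prev in frontier:
                transitions.append((prev, first, cost))
            # Chain the morphemes of this alternative with probability 1.
            for i in range(first, len(labels) - 1):
                transitions.append((i, i + 1, 0))
            new_frontier.append(len(labels) - 1)
        frontier = new_frontier
    final = len(labels)
    labels.append("NULL")  # final node, carries no word
    for prev in frontier:
        transitions.append((prev, final, 0))
    lines = [
        "name " + name,
        "nodes %d %s" % (len(labels), " ".join(labels)),
        "initial 0",
        "final %d" % final,
        "transitions %d" % len(transitions),
    ]
    lines += ["%d %d %d" % t for t in transitions]
    return "\n".join(lines) + "\n"


# The sentence "A B C D E" where B has three segmentations.
example = [
    [["A"]],
    [["mot1", "suf1"], ["pref1", "mot2"], ["mot3"]],
    [["C"]], [["D"]], [["E"]],
]
print(sentence_to_pfsg("sentence1", example))
```

Each sentence would be written to its own file, and the file names listed one per line in the file passed to -in-lattice-list.<br>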

<br>

After each sentence is encoded as a lattice, you would use <br>

    lattice-tool -in-lattice-list ... -write-ngrams NGRAMS<br>

to generate ngram counts from the corpus.  Then you can train the LM

using<br>

    ngram-count -float-counts -read NGRAMS -lm ...<br>

Note that the counts will be fractional, so you can only use certain

smoothing methods, like -wbdiscount.<br>

<br>

If you have trouble with the lattice generation, you can also generate

the (fractional) ngram counts yourself and pass them to ngram-count.<br>
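<br>

A minimal Python sketch of computing such counts directly is given below. It enumerates every combination of segmentations and gives each full path the same weight, which is an assumption of this sketch and not necessarily what lattice-tool computes. The names fractional_ngram_counts and format_counts are hypothetical, and no sentence-boundary tokens are added; they can be supplied as extra single-alternative positions.<br>

```python
from collections import defaultdict
from itertools import product


def fractional_ngram_counts(slots, order=3):
    """Count n-grams of length 1..order over all segmentation paths.

    slots: list of word positions; each position is a list of alternative
    token sequences. Every path through the alternatives gets weight
    1 / (number of paths), so the weights of one sentence sum to 1.
    """
    paths = [
        [tok for seq in choice for tok in seq]
        for choice in product(*slots)
    ]
    weight = 1.0 / len(paths)  # assumption: all paths equally likely
    counts = defaultdict(float)
    for path in paths:
        for n in range(1, order + 1):
            for i in range(len(path) - n + 1):
                counts[tuple(path[i:i + n])] += weight
    return counts


def format_counts(counts):
    """Render counts as lines of: n-gram words, a tab, then the count."""
    return "".join(
        "%s\t%g\n" % (" ".join(ngram), value)
        for ngram, value in sorted(counts.items())
    )


# The sentence "A B C D E" where B has three segmentations.
example = [
    [["A"]],
    [["mot1", "suf1"], ["pref1", "mot2"], ["mot3"]],
    [["C"]], [["D"]], [["E"]],
]
print(format_counts(fractional_ngram_counts(example, order=3)))
```

Counts from all sentences would be summed and written to the NGRAMS file, assuming the usual counts layout of one n-gram per line followed by a tab and its count.<br>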

<br>

Note there are more sophisticated ways to model Arabic morphology,

using factored LMs (FLMs).  See the work of Katrin Kirchhoff, who

developed FLMs partly for this purpose; FLM support is now incorporated in

SRILM (if you have questions about this approach, contact her directly).<br>

<br>

Andreas<br>

<br>

<br>

<blockquote cite="mid:642280.27633.qm@web28604.mail.ukl.yahoo.com"

 type="cite">

  <blockquote type="cite">

    <pre wrap="">From: Andreas Stolcke <a class="moz-txt-link-rfc2396E" href="mailto:stolcke@speech.sri.com"><stolcke@speech.sri.com></a>

Subject: Re: [SRILM User List] lattice-tool

To: "ali sadiqui" <a class="moz-txt-link-rfc2396E" href="mailto:sadiqui2000@yahoo.fr"><sadiqui2000@yahoo.fr></a>

Cc: <a class="moz-txt-link-abbreviated" href="mailto:srilm-user@speech.sri.com">srilm-user@speech.sri.com</a>

Date: Thursday, 22 April 2010, 6:42

ali sadiqui wrote:

    </pre>

    <blockquote type="cite">

      <pre wrap="">hi,

I am a beginner with SRILM.

I would like to create a lattice from a corpus

"A B{b1, b2, b3} C" and then create a language model.

I know I have to use the tool lattice-tool, but how do

      </pre>

    </blockquote>

    <pre wrap="">I proceed? I am stuck there.  I guess I should create

a file in pfsg format, but...

    </pre>

    <blockquote type="cite">

      <pre wrap="">If so:

      </pre>

    </blockquote>

    <pre wrap="">   How to define the nodes?

    </pre>

    <blockquote type="cite">

      <pre wrap="">         

      </pre>

    </blockquote>

    <pre wrap="">   Calculating the cost?

    </pre>

    <blockquote type="cite">

      <pre wrap="">Is this done manually or using a command?

In short, how to fill it?

I am very grateful for your help.

thank you for your help

      </pre>

    </blockquote>

    <pre wrap="">I think you are confused about how to build language

models.  You typically create LMs directly from ngram

counts extracted from a corpus, with no need to build

lattices.

Please consult the file $SRILM/doc/lm-intro for the most

basic procedures, and the FAQ file and recommended text

books for more details.

Andreas

    </pre>

    <blockquote type="cite">

      <pre wrap="">

_______________________________________________

SRILM-User site list

<a class="moz-txt-link-abbreviated" href="mailto:SRILM-User@speech.sri.com">SRILM-User@speech.sri.com</a>

<a class="moz-txt-link-freetext" href="http://www.speech.sri.com/mailman/listinfo/srilm-user">http://www.speech.sri.com/mailman/listinfo/srilm-user</a>

      </pre>

    </blockquote>

    <pre wrap="">

    </pre>

  </blockquote>


</blockquote>

<br>

</body>

</html>