From Antoine.Ghaoui at jinny.ie  Mon Apr  9 03:14:02 2007
From: Antoine.Ghaoui at jinny.ie (Antoine Ghaoui)
Date: Mon, 9 Apr 2007 13:14:02 +0300
Subject: FLM
Message-ID: <434213AD-8977-4AC6-9B07-C6B923CEC4DA@jinny.ie>

Hello,

I'm using FLM to test some models.

I'm using the same data and the same vocabulary in both tools, ngram-count
and fngram-count.
I'm not able to generate the same trigram model.
The numbers of bigrams and trigrams in the generated LM files are
different.

Using ngram-count, I get:
\data\
ngram 1=315
ngram 2=23800
ngram 3=120408

Using fngram-count, I get:
\data\
ngram 0x0=315
ngram 0x1=23523
ngram 0x2=0
ngram 0x3=86366

Note that ngram-count is run with the default parameters, and the
factor file for fngram-count is:

##rule trigram
1
U : 2 U(-1) U(-2) ntextfile.flm.cnt ntextfile.flm.lm 3
U1U2	U2	wbdiscount	gtmin 3	interpolate
U1	U1	wbdiscount	gtmin	1	interpolate
0	0

What parameters should I use in the factor file in order to get the
same LM output?


Thanks

Antoine


From stolcke at speech.sri.com  Mon Apr  9 22:47:45 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 09 Apr 2007 22:47:45 -0700
Subject: FLM
In-Reply-To: <434213AD-8977-4AC6-9B07-C6B923CEC4DA@jinny.ie>
References: <434213AD-8977-4AC6-9B07-C6B923CEC4DA@jinny.ie>
Message-ID: <461B2501.8010300@speech.sri.com>

Antoine Ghaoui wrote:
> Hello,
>
> I'm using FLM to test some models.
>
> I'm using the same data and the same vocabulary in both tools,
> ngram-count and fngram-count.
> I'm not able to generate the same trigram model.
> The numbers of bigrams and trigrams in the generated LM files are different.
>
> Using ngram-count, I get:
> \data\
> ngram 1=315
> ngram 2=23800
> ngram 3=120408
>
> Using fngram-count, I get:
> \data\
> ngram 0x0=315
> ngram 0x1=23523
> ngram 0x2=0
> ngram 0x3=86366
>
> Note that ngram-count is run with the default parameters, and the
> factor file for fngram-count is:
>
> ##rule trigram
> 1
> U : 2 U(-1) U(-2) ntextfile.flm.cnt ntextfile.flm.lm 3
> U1U2 U2 wbdiscount gtmin 3 interpolate
> U1 U1 wbdiscount gtmin 1 interpolate
> 0 0
>
> What parameters should I use in the factor file in order to get the
> same LM output?
For one thing, the default gtmin values in ngram-count are:

    unigrams   1
    bigrams    1
    trigrams   2
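To make the effect of these cutoffs concrete, here is a minimal Python sketch (not SRILM code) of a per-order minimum count, as set by -gt1min, -gt2min, -gt3min in ngram-count and by gtmin in an FLM factor file: n-grams whose count is below the cutoff for their order are dropped, and their probability mass is left to the backoff distribution. The toy corpus and function names are hypothetical, and details such as how SRILM treats <s> are ignored.

```python
from collections import Counter

# Defaults quoted above for ngram-count: minimum count per n-gram order.
DEFAULT_GTMIN = {1: 1, 2: 1, 3: 2}

def count_ngrams(sentences, order):
    """Count all n-grams up to `order`, padding each sentence with <s> and </s>."""
    counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def apply_cutoffs(counts, gtmin):
    """Keep only n-grams whose count reaches the minimum for their order."""
    return {ng: c for ng, c in counts.items() if c >= gtmin[len(ng)]}

def ngrams_per_order(counts):
    """Number of retained n-grams per order, like the header of an LM file."""
    return dict(sorted(Counter(len(ng) for ng in counts).items()))

corpus = ["a b c", "a b c", "a b d"]  # hypothetical toy training text
counts = count_ngrams(corpus, 3)
print(ngrams_per_order(apply_cutoffs(counts, DEFAULT_GTMIN)))       # {1: 6, 2: 6, 3: 3}
print(ngrams_per_order(apply_cutoffs(counts, {1: 1, 2: 1, 3: 3})))  # {1: 6, 2: 6, 3: 1}
```

Raising the trigram cutoff from 2 to 3, as the factor file above does, keeps only the trigram seen three times, so the trigram totals of the two tools cannot be expected to match.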

Andreas


From stolcke at speech.sri.com  Thu Apr 12 22:46:09 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 12 Apr 2007 22:46:09 -0700
Subject: SRILM.
In-Reply-To: <481682.37738.qm@web7606.mail.in.yahoo.com>
References: <481682.37738.qm@web7606.mail.in.yahoo.com>
Message-ID: <461F1921.3030409@speech.sri.com>

milu philip wrote:
> Sir,
>  
> I am a student of Amrita Vishwa Vidyapeetham doing my M.Tech in the
> field of computational engineering and networking. I am doing a
> project named Language Identification Using Statistical
> Approaches. The techniques used are PRLM and PPRLM. The software being
> used is a phone recogniser and SRILM. SRILM is used for language
> modelling, and the phone recogniser is used to create phones from a
> speech sample.
>  
> After the phones are obtained, they are given as input to SRILM for
> language modelling. Once the training language models are obtained, the
> phones of the test language are given. My doubt is about the
> command
>  
>         ngram -ppl TEST.text -lm TRAINING.lm
>  
> of SRILM: we get a perplexity along with a logprob value. Is
> "TEST.text" in the above command the phone sequence of the test
> language, or can we create a language model for the test sequence and
> then give it as

-ppl expects a text file containing the test data, one sentence per line.
>  
>         ngram -ppl TEST.lm -lm TRAINING.lm
>  
> In both cases the logprob values are different.
The second command would produce garbage, as the LM file would be
interpreted as test data.
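The logprob that -ppl reports for a test sentence is the sum of the per-word log10 probabilities, including the end-of-sentence tag. Below is a minimal sketch with a made-up backoff bigram model. The tables are hypothetical, and the backoff rule (backoff weight plus lower-order log probability) is the usual ARPA-style scheme, not a copy of SRILM's code.

```python
# Hypothetical toy backoff bigram model, stored as log10 values like an ARPA file.
UNIGRAM = {"<s>": -99.0, "a": -0.5, "b": -0.7, "</s>": -0.6}  # log10 P(w)
BACKOFF = {"<s>": -0.3, "a": -0.2, "b": -0.25, "</s>": 0.0}   # log10 bow(h)
BIGRAM = {("<s>", "a"): -0.1, ("a", "b"): -0.2, ("b", "</s>"): -0.15}

def log10_prob(prev, word):
    """log10 P(word | prev), backing off to the unigram when the bigram is absent."""
    if (prev, word) in BIGRAM:
        return BIGRAM[(prev, word)]
    return BACKOFF.get(prev, 0.0) + UNIGRAM[word]

def sentence_logprob(sentence):
    """Sum log10 probabilities over the words and the final </s>.

    <s> only serves as context; it is never predicted.
    """
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(log10_prob(p, w) for p, w in zip(tokens, tokens[1:]))

print(round(sentence_logprob("a b"), 2))  # -0.45
print(round(sentence_logprob("b a"), 2))  # -2.55
```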

Andreas

>  
> Is there any other command in SRILM, which can be used to obtain the 
> probability that a trained language model produces a particular test 
> sequence?
> Please help me out by clarifying this doubt.
>  
>                                Thanking you,
>  
> Milu.         
>


From stolcke at speech.sri.com  Thu Apr 12 22:44:32 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 12 Apr 2007 22:44:32 -0700
Subject: htk-words-on-nodes option in lattice-tool
In-Reply-To: <u0lO7NSD.1176411281.9364520.jpinto@idiap.ch>
References: <u0lO7NSD.1176411281.9364520.jpinto@idiap.ch>
Message-ID: <461F18C0.8040406@speech.sri.com>

jpinto at idiap.ch wrote:
> Hello,
>
> I have a phoneme lattice (obtained from NOWAY decoder) with phoneme
> tokens on the links (edges). I wish to convert this to HTK format with
> phoneme info on nodes and I do the following:
>
> lattice-tool -in-lattice input.lat -read-htk -write-htk -out-lattice output.lattice -htk-words-on-nodes
>
> I observe that the output lattice has more nodes and links
> (NODES=448 LINKS=766) than the input lattice (N=65 L=383).
>
> When I don't give the option -htk-words-on-nodes, the numbers of nodes
> and links remain the same.
>
> I don't understand why the number of nodes and links should increase. Am I
> missing something? Any help in this regard would be very helpful.
>   
That's because when you move attributes from links to nodes, you might
have to duplicate nodes to create an equivalent lattice. In fact, the way
SRILM reads HTK lattices is by converting each link to a node, thereby
enabling the -htk-words-on-nodes mapping. Unfortunately, the code is not
smart enough to avoid the duplication even when it is not really
necessary given how the links are originally labeled.
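Here is a small Python sketch of the link-to-node conversion described above. It is a generic illustration, not lattice-tool's implementation. Every labeled link becomes a node carrying the word, and two new nodes are connected whenever the first link ends where the second begins. A state with many incoming and outgoing links therefore yields one new link per (incoming, outgoing) pair, which is why the counts grow. The lattice is hypothetical.

```python
from itertools import product

def links_to_word_nodes(links):
    """Turn a lattice with words on links into one with words on nodes.

    `links` is a list of (start_node, end_node, word) triples. Each link becomes
    a new node carrying its word; new node i is connected to new node j whenever
    link i ends at the node where link j starts. The extra start/end nodes a
    real conversion needs are left out.
    """
    word_nodes = [word for _, _, word in links]
    incoming, outgoing = {}, {}
    for idx, (start, end, _) in enumerate(links):
        outgoing.setdefault(start, []).append(idx)
        incoming.setdefault(end, []).append(idx)
    new_links = []
    for state in sorted(set(incoming) & set(outgoing)):
        new_links.extend(product(incoming[state], outgoing[state]))
    return word_nodes, new_links

# Hypothetical phone lattice: 3 nodes and 6 links, with 3 links entering
# and 3 links leaving node 1.
links = [(0, 1, "p"), (0, 1, "b"), (0, 1, "t"),
         (1, 2, "a"), (1, 2, "e"), (1, 2, "i")]
nodes, new_links = links_to_word_nodes(links)
print(len(nodes), len(new_links))  # 6 9
```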

Note: lattice-tool is not meant to be a general HTK lattice format
manipulation tool. You would expect HTK to have better tools for that.

Andreas


From sara_abd_elhamed at yahoo.com  Fri Apr 13 09:26:22 2007
From: sara_abd_elhamed at yahoo.com (Sara Abd-ElHamed)
Date: Fri, 13 Apr 2007 09:26:22 -0700 (PDT)
Subject: Question
Message-ID: <197099.73124.qm@web90408.mail.mud.yahoo.com>

I want to know how to run the Factored Language Model (FLM) that is included in SRILM.

Thanks in advance for your help.

       

From stolcke at speech.sri.com  Thu Apr 19 16:09:37 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 19 Apr 2007 13:09:37 -1000
Subject: Problems about srilm
In-Reply-To: <20070419081442.M92426@cyut.edu.tw>
References: <20070419081442.M92426@cyut.edu.tw>
Message-ID: <4627F6B1.70507@speech.sri.com>

??? wrote:
> Hello!
> I am a student from Taiwan.
> I have run into some difficulties using SRILM. The problem is
> described in the attachment. I also encountered the same problem when
> building Google n-gram models. Would you please tell me what mistake
> I made? Thank you!
>   
It is impossible to read the entire Google 5-gram corpus into memory,
which is what you are trying to do.
You have to use the count-based LM, and estimate deleted interpolation
weights from a small amount of data, so that only a small portion of
the ngrams needs to be kept in memory.

I'm sorry there is no good documentation of this process at this point.
You can piece it together by reading the manual pages for ngram-count
and ngram, and by looking at the example in

$SRILM/test/tests/ngram-count-lm-limit-vocab/run-test

We will make complete instructions for Google ngram usage available in
the future.
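The weight-estimation step mentioned above can be illustrated with the standard EM update for linear interpolation weights on a small held-out set, which is the general idea behind deleted interpolation. This is a generic sketch, not the count-LM code. Each held-out token is represented by the probabilities that the component estimators (for example unigram, bigram and trigram relative frequencies) assign to it, and the numbers are invented.

```python
def em_interpolation_weights(component_probs, iterations=50):
    """Estimate linear interpolation weights by EM on held-out data.

    component_probs: one tuple per held-out token, giving the probability
    that each component estimator assigns to that token.
    """
    k = len(component_probs[0])
    weights = [1.0 / k] * k  # start from uniform weights
    for _ in range(iterations):
        expected = [0.0] * k
        for probs in component_probs:
            mix = sum(w * p for w, p in zip(weights, probs))
            for i in range(k):
                # E-step: posterior that component i generated this token
                expected[i] += weights[i] * probs[i] / mix
        # M-step: new weight = average posterior over the held-out tokens
        weights = [e / len(component_probs) for e in expected]
    return weights

# Hypothetical held-out tokens scored by (unigram, bigram, trigram) relative
# frequencies; a zero means the n-gram was unseen in training.
heldout = [(0.01, 0.20, 0.50), (0.02, 0.10, 0.0),
           (0.05, 0.30, 0.60), (0.01, 0.0, 0.0)]
weights = em_interpolation_weights(heldout)
print([round(w, 3) for w in weights])
```

Only a few weights are estimated, so a small amount of held-out data is enough, and the full n-gram tables never need to be loaded for this step.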

Andreas


> --
> Chaoyang University of Technology


From stolcke at speech.sri.com  Mon Apr 23 22:40:56 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 23 Apr 2007 22:40:56 -0700
Subject: ngram -nbest-files
In-Reply-To: <564887.5274.qm@web92012.mail.cnb.yahoo.com>
References: <564887.5274.qm@web92012.mail.cnb.yahoo.com>
Message-ID: <462D9868.8050106@speech.sri.com>

?? ? wrote:
> Hi,
>
> I have some problems rescoring multiple n-best lists. The ngram -ppl
> option can yield the language model probability of each sentence, but
> it can't deal with multiple n-best lists in one run. I then tried to use
> the ngram -nbest-files option to rescore multiple n-best lists, but the
> language model scores obtained were quite different from those from the
> -ppl option. Aren't they both log probabilities (base 10) of a sentence?
> Any help will be greatly appreciated.
>
> Regards,
> Wenxiao
You need to make sure the nbest lists are in the proper format, which is
different from the format accepted by -ppl. The nbest formats are
described in the nbest-format(5) man page.

N-best rescoring should give the same log-10 probabilities as -ppl.
If not, please send a minimal example to reproduce the problem.

Andreas


From stolcke at speech.sri.com  Mon Apr 23 22:44:45 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 23 Apr 2007 22:44:45 -0700
Subject: Problems about srilm
In-Reply-To: <46287DDA.9040707@cyut.edu.tw>
References: <20070419081442.M92426@cyut.edu.tw> <4627F6B1.70507@speech.sri.com> <46287DDA.9040707@cyut.edu.tw>
Message-ID: <462D994D.4040603@speech.sri.com>

?? wrote:
> Thank you very much for your answer.
> I have another question: if I only want to train a *google 3-gram
> language model*, what commands should I use?
> I have referred to the pages and tried the commands, but it still
> did not work.
> Is the reason the same, that there is not enough memory?
>   
Even just the google 3-grams will be way too big to fit into memory.

> Could you give me an *example* of building a google 3-gram LM file,
> please?
>   
Again, this will require using the -count-lm option with some tricks
that are not documented as yet. Please be patient (or read all the manual
pages carefully to figure it out yourself).
>
> I figured there may be two methods to resolve the problem:
> 1. Build the google 3-gram LM file by reading the google corpus in
> batches, and then build the complete google 3-gram LM file.
> But I need to know: is there any command to build the google
> 3-gram LM file by *reading the google corpus in batches*?
>   
This won't work because the smoothing methods for backoff LMs require
access to the entire ngram set to compute their discounting estimates.
> 2. Train small language models individually from the google files and
> then combine the pieces into a google 3-gram LM file.
> But I need to know: is there any command to *combine pieces of
> google 3-gram LM files*?
>   
Sorry, that won't work, for the same reason as above.
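Good-Turing discounting, the default in ngram-count, shows why both approaches fail. The discounted count of an n-gram seen r times depends on n_r and n_{r+1}, the numbers of distinct n-grams seen r and r+1 times in the whole data. The toy Python sketch below uses invented counts and the basic formula without cutoffs. It shows that each batch yields a different discount from the merged data, so per-batch estimates cannot simply be combined. The numbers are meaningless for such tiny counts; the point is only that they differ.

```python
from collections import Counter

def count_of_counts(ngram_counts):
    """n_r: number of distinct n-grams that occur exactly r times."""
    return Counter(ngram_counts.values())

def good_turing(r, n):
    """Basic Good-Turing adjusted count r* = (r + 1) * n_{r+1} / n_r."""
    return (r + 1) * n[r + 1] / n[r]

# Hypothetical trigram counts from two batches of the same corpus.
batch1 = Counter({"a b c": 1, "a b d": 1, "b c d": 1, "c d e": 2})
batch2 = Counter({"a b c": 1, "b c d": 1, "d e f": 1})
merged = batch1 + batch2

for name, counts in [("batch1", batch1), ("batch2", batch2), ("merged", merged)]:
    print(name, round(good_turing(1, count_of_counts(counts)), 3))
# batch1 0.667
# batch2 0.0
# merged 3.0
```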

Andreas


From alumae at gmail.com  Tue Apr 24 03:01:12 2007
From: alumae at gmail.com (Tanel)
Date: Tue, 24 Apr 2007 13:01:12 +0300
Subject: lattice-tool and noise probability
Message-ID: <8abdac980704240301yf711706w44052bc16f961643@mail.gmail.com>

Hello,

When rescoring lattices using lattice-tool, is there a possibility (or
a workaround) to assign an LM probability to noise words? The noise
words should still be skipped when calculating n-gram probabilities
for other words.

I understand that currently, noise words get an LM log probability of
zero (i.e., probability 1), which may make them too likely to be
inserted in place of other candidates.

Regards,
Tanel


From stolcke at speech.sri.com  Tue Apr 24 07:37:00 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 24 Apr 2007 07:37:00 -0700
Subject: lattice-tool and noise probability
In-Reply-To: <8abdac980704240301yf711706w44052bc16f961643@mail.gmail.com>
References: <8abdac980704240301yf711706w44052bc16f961643@mail.gmail.com>
Message-ID: <462E160C.7080804@speech.sri.com>

Tanel wrote:
> Hello,
>
> When rescoring lattices using lattice-tool, is there a possibility (or
> a workaround) to assign an LM probability to noise words? The noise
> words should still be skipped when calculating n-gram probabilities
> for other words.
>
> I understand that currently, noise words get an LM log probability of
> zero (i.e., probability 1), which may make them too likely to be
> inserted in place of other candidates.
No, there is no such provision in lattice-tool. It should be really easy
to write a perl script that reads a lattice file and inserts a constant
LM score of your choosing for noise words.
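Here is a sketch of such a script, written in Python rather than Perl. It assumes an HTK SLF lattice with words on links and abbreviated field names, so link lines look like `J=3 S=1 E=2 W=<noise> a=-120.5 l=0.0`. The noise-word set and the score value are hypothetical. In lattices written with -htk-words-on-nodes the words sit on the node lines instead, and the script would have to look them up there.

```python
import re

NOISE_WORDS = {"<noise>", "<breath>"}  # hypothetical noise tokens
NOISE_LM_SCORE = -5.0                  # hypothetical score, in the lattice's log base

def set_noise_scores(lines, noise_words=NOISE_WORDS, score=NOISE_LM_SCORE):
    """Give every link labeled with a noise word the LM score `score`.

    Assumes SLF link lines with abbreviated fields (J=, S=, E=, W=, l=)
    and words on links.
    """
    out = []
    for line in lines:
        body = line.rstrip("\n")
        fields = body.split()
        if fields and fields[0].startswith("J="):
            attrs = dict(f.split("=", 1) for f in fields if "=" in f)
            if attrs.get("W") in noise_words:
                if "l" in attrs:
                    # Replace the existing LM score field.
                    body = re.sub(r"(?<!\S)l=\S+", f"l={score}", body)
                else:
                    # No LM score on this link yet: append one.
                    body = f"{body} l={score}"
        out.append(body + line[len(line.rstrip("\n")):])  # keep the original newline
    return out

lattice = ["N=3 L=2\n",
           "J=0 S=0 E=1 W=hello a=-10.0 l=-2.5\n",
           "J=1 S=1 E=2 W=<noise> a=-3.0\n"]
print("".join(set_noise_scores(lattice)), end="")
```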

Andreas

>
> Regards,
> Tanel


From stolcke at speech.sri.com  Tue Apr 24 10:29:37 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 24 Apr 2007 10:29:37 PDT
Subject: Class-based LM using the SRILM toolkit? 
In-Reply-To: Your message of Wed, 18 Apr 2007 23:31:30 +0530.
             <d4929ad00704181101t30f6d973s986f692b0010e2ca@mail.gmail.com> 
Message-ID: <200704241729.l3OHTbJ29944@huge>


In message <d4929ad00704181101t30f6d973s986f692b0010e2ca at mail.gmail.com> you wrote:
> Dear Dr. Stolcke,
> 
> Thank you for your attention.
> 
> Is there no way to construct a class-based LM by pre-defining the
> classes to be used (vis-a-vis inducing them)? The class-format man
> page does mention how classes may be defined by hand, but this format
> requires the specification of the class expansion probabilities as
> well. Can these probabilities be calculated by a program in the
> toolkit? Correct me if I'm wrong, but these probabilities are given by
> (for a certain word wi, and class ci) : Number of times wi occurs in
> class ci/Number of times words in class ci occur.

You can

(1) define your classes by hand, using dummy probabilities;
(2) run the replace-words-with-classes script with the options
		outfile=FILE normalize=1
    on some training data. This is documented in the training-scripts(5)
    man page.
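Step (2), as I read the training-scripts(5) description, yields class membership probabilities P(w | c) estimated as in the formula from the question: the count of w divided by the total count of all members of c. The minimal Python sketch below computes that estimate and writes `class prob word` lines in the class-format(5) layout. The classes and text are hypothetical, and multi-word class members are not handled.

```python
from collections import Counter

def class_expansion_probs(classes, sentences):
    """Estimate P(word | class) = count(word) / total count of the class members.

    classes: dict mapping class label -> set of member words.
    """
    counts = Counter(w for s in sentences for w in s.split())
    probs = {}
    for label, members in classes.items():
        total = sum(counts[w] for w in members)
        if total == 0:
            continue  # class never seen in the data
        probs[label] = {w: counts[w] / total for w in sorted(members) if counts[w] > 0}
    return probs

def format_classes(probs):
    """Emit one 'class prob word' line per observed member."""
    return [f"{label} {p:.6g} {word}"
            for label, members in sorted(probs.items())
            for word, p in members.items()]

classes = {"CITY": {"paris", "rome", "oslo"}, "DAY": {"monday", "friday"}}
text = ["fly to paris on monday", "fly to rome on friday", "paris on friday"]
for line in format_classes(class_expansion_probs(classes, text)):
    print(line)
# CITY 0.666667 paris
# CITY 0.333333 rome
# DAY 0.666667 friday
# DAY 0.333333 monday
```

Members that never occur (oslo here) get no entry. In practice you may want to smooth these counts.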

> Also, is the file that is generated by the ngram-class -class-counts
> option in the same format as class-format? Can a file in the
> class-format format be used directly by the ngram-count program to
> learn a class-based LM?

The -class-counts output is in the right format to be used as a count
input file for ngram-count to estimate a bigram LM for the class labels.
However, this will only work for bigram LMs since ngram-class doesn't
use higher-order statistics.  The recommended procedure is to
again use the replace-words-with-classes command to insert class
labels in your LM training data, and then use ngram-count on
the transformed data to estimate the class ngram probabilities.

Andreas 


From stolcke at speech.sri.com  Wed Apr 25 16:27:24 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Wed, 25 Apr 2007 16:27:24 -0700
Subject: FLM in SRILM
In-Reply-To: <859958.67801.qm@web90404.mail.mud.yahoo.com>
References: <859958.67801.qm@web90404.mail.mud.yahoo.com>
Message-ID: <462FE3DC.5000306@speech.sri.com>

Sara Abd-ElHamed wrote:
> How can i run FLM that is in SRILM?
>


Sara,

Your question is much too general. You should first read the available
documents in $SRILM/flm/doc/.
Then look at the example in $SRILM/tests/fngram-count.
If you then have specific questions, you can direct them to the
srilm-user mailing list (the SRILM web page tells you how to join).

Andreas


From dianaduraiz at gmail.com  Thu Apr 26 10:39:35 2007
From: dianaduraiz at gmail.com (Diana Durán)
Date: Thu, 26 Apr 2007 19:39:35 +0200
Subject: Perplexity
Message-ID: <da7851910704261039w33e155b5u68bfc8c3e112830@mail.gmail.com>

Hello,

I would like to know the difference between ppl and ppl1 in the
output when executing ngram.

Thank you.

From ioparin at yahoo.co.uk  Thu Apr 26 11:30:23 2007
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Thu, 26 Apr 2007 19:30:23 +0100 (BST)
Subject: Perplexity
In-Reply-To: <da7851910704261039w33e155b5u68bfc8c3e112830@mail.gmail.com>
Message-ID: <627682.9664.qm@web25402.mail.ukl.yahoo.com>

Diana,

Here's an abstract from SRILM manpages on ngram:

"Perplexity is given with two different
normalizations: counting all input tokens (``ppl'')
and excluding end-of-sentence tags (``ppl1'')."

--- Diana Durán <dianaduraiz at gmail.com> wrote:

> Hello,
> 
> I would like to know the difference between
> ppl and ppl1 in the
> output when executing ngram.
> 
> Thank you.
> 


best regards,
Ilya




From marco.turchi at gmail.com  Fri May  4 08:12:14 2007
From: marco.turchi at gmail.com (marco turchi)
Date: Fri, 4 May 2007 16:12:14 +0100
Subject: question
Message-ID: <79a042480705040812h4912178dqeb24acf21bd26f84@mail.gmail.com>

Dear experts,
I have a strange question for you.
If I have two language models, LM1 and LM2, does SRILM have any
scripts to merge them into a single language model LM3?

Thanks a lot
Marco


From ioparin at yahoo.co.uk  Fri May  4 09:11:09 2007
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Fri, 4 May 2007 17:11:09 +0100 (BST)
Subject: question
In-Reply-To: <79a042480705040812h4912178dqeb24acf21bd26f84@mail.gmail.com>
Message-ID: <670739.73588.qm@web25410.mail.ukl.yahoo.com>

Dear Marco,

In ngram, use the -mix-lm option to interpolate the models
and then write the resulting LM with -write-lm
<file>.
http://www.speech.sri.com/projects/srilm/manpages/ngram.html
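For each word, -mix-lm combines the two models' conditional probabilities by linear interpolation. The -lambda option sets the weight of the main model (0.5 by default). Here is a sketch with hypothetical probabilities for one word in one context. Writing the merged model with -write-lm additionally requires building a single backoff structure, which this sketch ignores.

```python
import math

def interpolate(p_main, p_mix, lam=0.5):
    """Linear interpolation: lam * P_main(w|h) + (1 - lam) * P_mix(w|h)."""
    return lam * p_main + (1.0 - lam) * p_mix

# Hypothetical probabilities of the same word in the same context.
p_lm1, p_lm2 = 0.02, 0.08
p_lm3 = interpolate(p_lm1, p_lm2, lam=0.5)
print(round(p_lm3, 4), round(math.log10(p_lm3), 4))  # 0.05 -1.301
```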

--- marco turchi <marco.turchi at gmail.com> wrote:

> Dear experts,
> I have a  strange question for u.
> if I have two language models, LM1 and LM2, does
> Srilm have any
> scripts to merge them in only 1 language model LM3?
> 
> Thanks a lot
> Marco
> 


best regards,
Ilya



From bplank at science.uva.nl  Tue May 15 07:04:01 2007
From: bplank at science.uva.nl (B. Plank)
Date: Tue, 15 May 2007 16:04:01 +0200 (CEST)
Subject: write-vocab
Message-ID: <3169.145.116.12.178.1179237841.squirrel@webmail.science.uva.nl>

Dear SRILM-team,

is there a parameter to get the n most frequent words out of an LM? (i.e.,
like restricting the -write-vocab output of "ngram -order 1" to just the
n most frequent words?) I am sure there is; I just don't see it right now.

Thank you for any help,
Barbara


From stolcke at speech.sri.com  Tue May 15 09:28:24 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 15 May 2007 09:28:24 -0700
Subject: write-vocab
In-Reply-To: <3169.145.116.12.178.1179237841.squirrel@webmail.science.uva.nl>
References: <3169.145.116.12.178.1179237841.squirrel@webmail.science.uva.nl>
Message-ID: <4649DFA8.8080800@speech.sri.com>

B. Plank wrote:
> Dear SRILM-team,
>
> is there a parameter to get the n most frequent words out of an LM? (i.e.,
> like restricting the -write-vocab output of "ngram -order 1" to just the
> n most frequent words?) I am sure there is; I just don't see it right now.
>
> Thank you for any help,
> Barbara
>
>   
Actually, there is no such tool. The frequency of words is not generally
available in the LM, only their unigram probabilities. Since the unigram
probabilities are usually a monotonic function of the unigram frequencies,
you could write a small script that extracts the words from the unigram
section of the LM file and sorts them by their probabilities.
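Here is a small Python script of the kind suggested. It assumes a standard ARPA-format backoff LM, in which each line of the \1-grams: section holds the log10 probability, the word, and optionally a backoff weight. The script returns the n words with the highest unigram probability, skipping the sentence tags. The LM text is a toy example.

```python
import io

def top_unigrams(lm_file, n, skip=("<s>", "</s>")):
    """Return the n words with the highest unigram log10 probability.

    Reads an ARPA-format backoff LM; lines in the unigram section look like
    'logprob word [backoff-weight]'.
    """
    entries = []
    in_unigrams = False
    for line in lm_file:
        line = line.strip()
        if line.startswith("\\"):
            # Section markers: \data\, \1-grams:, \2-grams:, ..., \end\
            in_unigrams = line == "\\1-grams:"
            continue
        if in_unigrams and line:
            fields = line.split()
            if fields[1] not in skip:
                entries.append((float(fields[0]), fields[1]))
    entries.sort(key=lambda e: e[0], reverse=True)
    return [word for _, word in entries[:n]]

TOY_LM = """\\data\\
ngram 1=5

\\1-grams:
-99 <s> -0.5
-0.8 the -0.3
-1.2 cat -0.2
-2.0 sat -0.1
-1.0 </s>

\\end\\
"""

print(top_unigrams(io.StringIO(TOY_LM), 2))  # ['the', 'cat']
```

For a real model, pass an open file, e.g. `open("model.lm")` (a hypothetical name), or `gzip.open("model.lm.gz", "rt")` for a compressed one.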

Andreas


From stolcke at speech.sri.com  Mon May 21 09:00:35 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 21 May 2007 09:00:35 -0700
Subject: Inquiry about expanding interpolated LM (class model + word model)
 to word model
In-Reply-To: <e645fdc00705202058u56ae7134s5e95b0dbbf4c1010@mail.gmail.com>
References: <000701c79a43$d8150110$7c216b82@speech.sri.com>	 <200705200318.l4K3ITY04648@huge> <e645fdc00705202058u56ae7134s5e95b0dbbf4c1010@mail.gmail.com>
Message-ID: <4651C223.4070103@speech.sri.com>

Xiaodan Zhuang wrote:
> Dear Andreas,
>
> Thanks for your input.
>
> It looks to me that the output class ngram (either just
> -expand-classes or further interpolated with some word ngram) is a
> normal class ngram followed by the probability of a class emitting a
> particular word. Is that just the class ngram appended by the class
> definition file?
Correct.

>
> If I need to convert the class-ngram or the interpolated LM into a
> pure word ngram model for use elsewhere, shall I replace for example
> the lines in the following 2-grams as indicated:
>
> 2-grams:
> pp1(log) class-1 class-2
> ==> change to all possible word pairs, such as "pp1+log(p11)+log(p29)
> word-1 word-9" and "pp1+log(p12)+log(p29) word-2 word-9"
>
> pp2(log) class-1 word-5
> ==> change to "pp2+log(p11) word-1 word-5" and "pp2+log(p12) word-2
> word-5"
>
>
> [[class definition:
> class-1 p11 word-1
> class-1 p12 word-2
> class-2 p29 word-9
> ]]
Yes, except you need to take care to sum the probabilities of word ngrams
that are generated through multiple distinct class expansions. Also, in
SRILM, classes may have multi-word strings as members, further
complicating the situation.

The good news is that ngram -expand-classes already does all this for you.
Beware that expanding large or even moderate-sized class ngrams may not
be computationally feasible, depending on the cardinality of your classes.
Andreas


From stolcke at speech.sri.com  Tue May 29 20:56:03 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 29 May 2007 20:56:03 PDT
Subject: Class-based LM using the SRILM toolkit? 
In-Reply-To: Your message of Mon, 21 May 2007 23:13:33 +0530.
             <d4929ad00705211043v78272000odaef19023c4f1e41@mail.gmail.com> 
Message-ID: <200705300356.l4U3u3R26372@huge>


> 
> Dear Dr. Stolcke,
> 
> Thank you once again for your invaluable help.
> 
> I have now developed two LMs using your toolkit - a trigram word-based model
> and a class-based model (static models). I now want to interpolate them and
> then apply some form of smoothing on the resultant LM. The ngram program in
> the toolkit has a -mix-lm option which allows linear interpolation; the
> manpages for that option mention:
> 
> "*NOTE: *Unless *-bayes *(see below) is specified, *-mix-lm *triggers a
> static interpolation of the models in memory. In most cases a more
> efficient, dynamic interpolation is sufficient, requested by *-bayes
> 0*.**Also, mixing models of different type (
> e.g., word-based and class-based) will *only *work correctly with dynamic
> interpolation."
> 
> What is dynamic interpolation? Is it applicable in my case? Can

Dynamic interpolation means that the probabilities of the interpolated model
are computed on-the-fly, at test time.
Static interpolation, by contrast, means that a single model is created
ahead of testing, containing the interpolated probabilities in the 
usual backoff format.  This is only possible for models of the same type,
as explained in the note above.

> mixing/interpolation of these models be performed only with the -dynamic
> option? In that case, how?

The -dynamic option has nothing to do with dynamic interpolation of the
kind we are discussing here.
Dynamic interpolation is enabled by the -bayes option.

> 
> Also, what is the -bayes interpolation method about? The manpages say for
> the -bayes option:
> "Interpolate the second and the main model using posterior probabilities for
> local N-gram-contexts of length *length*."
> What are you referring to by "N-gram contexts"? Are only the posterior
> probabilities interpolated here? If possible, please provide me with a link
> to a reference text etc. where I can learn more about this.

For an explanation of Bayesian interpolation please consult the technical
report cited at the bottom of the ngram(1) man page.  You can get it at
http://www.speech.sri.com/cgi-bin/run-distill?papers/lm95-report.ps.gz
then check Section 2.3.
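Here is a rough illustration of the idea, sketching my reading of the description rather than SRILM's code. Each model's prior weight is multiplied by the probability that model assigns to the last L words of context. The products are renormalized into posterior weights, which then mix the two next-word probabilities. With L = 0 the context probability is 1 for both models and the posteriors equal the priors. That is plain dynamic linear interpolation, consistent with the man page note that -bayes 0 requests it. All numbers are hypothetical.

```python
def bayes_mix(p_next, p_context, priors):
    """Mix next-word probabilities with context-dependent posterior weights.

    p_next[i]    : P_i(w | h) from model i
    p_context[i] : P_i(last L words of h) under model i (1.0 when L = 0)
    priors[i]    : prior weight of model i
    """
    joint = [pr * pc for pr, pc in zip(priors, p_context)]
    norm = sum(joint)
    posteriors = [j / norm for j in joint]
    return sum(w * p for w, p in zip(posteriors, p_next)), posteriors

# Hypothetical numbers for a word-based and a class-based model.
prob, post = bayes_mix(p_next=[0.02, 0.08], p_context=[1e-4, 4e-4], priors=[0.5, 0.5])
print(round(prob, 4), [round(w, 2) for w in post])  # 0.068 [0.2, 0.8]
```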

Andreas 


From svmats at yahoo.com  Tue May 29 23:57:12 2007
From: svmats at yahoo.com (Mats Svenson)
Date: Tue, 29 May 2007 23:57:12 -0700 (PDT)
Subject: Perplexity in "ngram"
Message-ID: <130070.98900.qm@web31606.mail.mud.yahoo.com>

Hi,
 I have tried to use "ngram" to compute the perplexity of my
LMs. However, I am not sure how the SRILM
implementation treats OOVs when computing
perplexity. Is it that "log P(<unk>|history) != 0", or
are OOVs just ignored? If a model with a higher number
of OOVs has a lower perplexity than another LM, does
it mean that it is "better" in this -ppl
implementation?

Second, in some discussions, I have heard about a -ppl1
option, but the current version does not seem to have
it. How does -ppl1 differ from -ppl?

Third, is there a way to meaningfully compute
perplexity for a hidden event LM? Or another way
to evaluate hidden event LM quality?

Thanks for your help,
 Mats




From ioparin at yahoo.co.uk  Fri Jun  1 04:54:43 2007
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Fri, 1 Jun 2007 12:54:43 +0100 (BST)
Subject: [SRILM] lattice rescoring with FLM
Message-ID: <335592.97241.qm@web25411.mail.ukl.yahoo.com>

Hi, everybody,

I'm doing lattice rescoring with FLM (SRILM 1.5.0).
However, I somehow messed it up; maybe somebody
could give me some hints on possible inconsistencies?

I have lattices generated with a large bigram model and
then rescore them with a small domain-specific trigram
model (word and FLM). For FLM rescoring I convert all
word nodes (HTK format) to FLM representation (e.g.
W=HELLO -> W=W-HELLO:S-HELLO:I-NULL:T-Db--x-) and then
rescore with
lattice-tool -in-lattice-list [list] -unk -read-htk
-no-nulls -no-htk-nulls -htk-words-on-nodes -factored
-lm [FLM_specification_file] -write-htk -htk-logbase
2.71828 -htk-lmscale 12.0 -htk-wdpenalty -10.0
-out-lattice-dir [dir]
I get lots of "might produce unnormalized LM" warnings
(that might be expected due to the small size of the
domain-specific training data?).
So, even if I use an FLM that imitates a simple word
trigram LM, the final accuracy on rescored lattices is
3 times lower than with the conventional LM. When I check
the perplexities, they coincide. So there must be
something wrong with the way I do FLM lattice rescoring.
Did I miss something?

best regards,
Ilya




From stolcke at speech.sri.com  Fri Jun  1 12:39:08 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Fri, 01 Jun 2007 12:39:08 -0700
Subject: Perplexity in "ngram"
In-Reply-To: <130070.98900.qm@web31606.mail.mud.yahoo.com>
References: <130070.98900.qm@web31606.mail.mud.yahoo.com>
Message-ID: <466075DC.1090300@speech.sri.com>

Mats Svenson wrote:
> Hi,
>  I have tried to use "ngram" to compute the perplexity of my
> LMs. However, I am not sure how the SRILM
> implementation treats OOVs when computing
> perplexity. Is it that "log P(<unk>|history) != 0", or
> are OOVs just ignored? If a model with a higher number
>   
SRILM excludes words with zero probability from the perplexity
computation and reports their tally separately. That includes OOV words
when the LM doesn't contain an unknown word (<unk>) token.

> of OOVs has a lower perplexity than another LM, does
> it mean that it is "better" in this -ppl
> implementation?
>   
Possibly.  You should not compare perplexities of LMs with different 
vocabularies.
> Second, in some discussions, I have heard about a -ppl1
> option, but the current version does not seem to have
> it. How does -ppl1 differ from -ppl?
>   
There is no -ppl1 option. -ppl reports a statistic labeled "ppl1", which
is explained in the ngram man page.
> Third, is there a way to meaningfully compute
> perplexity for a hidden event LM? Or another way
> to evaluate hidden event LM quality?
>   
Hidden event LMs are LMs, so you can compute a word-based perplexity just
like for any other LM.  If the goal of the HE-LM is to decode hidden events
(like sentence boundaries) then you can obviously evaluate that task as 
well.

Andreas


From ioparin at yahoo.co.uk  Mon Jun  4 04:05:03 2007
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Mon, 4 Jun 2007 12:05:03 +0100 (BST)
Subject: [SRILM]
Message-ID: <823133.46992.qm@web25411.mail.ukl.yahoo.com>

As for my last question (regarding the lattice-tool
-factored option), there were memory-overflow problems
with the expansion of some lattices that I didn't notice at
first and that corrupted the results. Sorry for the
false alarm.

best regards,
Ilya




From yuan_wenxiao at yahoo.com.cn  Wed Jun  6 07:29:47 2007
From: yuan_wenxiao at yahoo.com.cn (Wenxiao Yuan)
Date: Wed, 6 Jun 2007 22:29:47 +0800 (CST)
Subject: Word Insertion Penalty
Message-ID: <434038.72632.qm@web92006.mail.cnb.yahoo.com>

Dear All,
 
 I am wondering whether I can set a word insertion penalty when rescoring an N-best list using the SRILM tools, and if so, how?
 
 Regards,
 
 Wenxiao
 
 	      

From sara_abd_elhamed at yahoo.com  Fri Jun  8 11:26:36 2007
From: sara_abd_elhamed at yahoo.com (Sara Abd-ElHamed)
Date: Fri, 8 Jun 2007 11:26:36 -0700 (PDT)
Subject: Guestion in FLM
Message-ID: <776540.25151.qm@web90407.mail.mud.yahoo.com>

I have a problem running FLM in SRILM.
I know there are two commands in it, "fngram-count" and "fngram", but I don't know the format of the input files. Could anyone please help?

Thanks in advance.

       

From ioparin at yahoo.co.uk  Fri Jun  8 14:03:09 2007
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Fri, 8 Jun 2007 22:03:09 +0100 (BST)
Subject: Guestion in FLM
In-Reply-To: <776540.25151.qm@web90407.mail.mud.yahoo.com>
Message-ID: <15520.62771.qm@web25409.mail.ukl.yahoo.com>

Hi,

Those functions are very well documented in the 
technical report you should find in your SRILM
distribution in flm/doc/arabic-final.pdf. If you
somehow don't have it, just write and I'll send you
the link.
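For a quick idea of the input: the FLM tools read ordinary text in which each token bundles its factors as `Tag-value` pairs joined by colons, as in the `W-HELLO:S-HELLO:...` tokens mentioned in an earlier thread on this list. The technical report gives the details. The Python sketch below builds such tokens from parallel factor annotations. The tag names W, P, S and the example analysis are hypothetical, and treating ":" as forbidden inside a value is an assumption based on its role as separator.

```python
def factored_token(factors):
    """Join (tag, value) pairs into one FLM token, e.g. W-dogs:P-NNS:S-dog."""
    for tag, value in factors:
        # Assumption: ':' separates factors, so it must not appear in a value.
        if ":" in value:
            raise ValueError(f"factor value may not contain ':': {value!r}")
    return ":".join(f"{tag}-{value}" for tag, value in factors)

def factored_sentence(words, pos_tags, stems):
    """Build one line of factored training text from parallel annotations."""
    return " ".join(
        factored_token([("W", w), ("P", p), ("S", s)])
        for w, p, s in zip(words, pos_tags, stems)
    )

# Hypothetical analysis of one sentence.
print(factored_sentence(["the", "dogs", "barked"],
                        ["DT", "NNS", "VBD"],
                        ["the", "dog", "bark"]))
# W-the:P-DT:S-the W-dogs:P-NNS:S-dog W-barked:P-VBD:S-bark
```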

--- Sara Abd-ElHamed <sara_abd_elhamed at yahoo.com>
wrote:

> I have a problem running FLM in SRILM.
> I know there are two commands in it, "fngram-count"
> and "fngram", but I don't know the format of the
> input files. Could anyone please help?
>
> Thanks in advance.
> 
>        


best regards,
Ilya




From stolcke at speech.sri.com  Fri Jun  8 16:12:53 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Fri, 08 Jun 2007 16:12:53 PDT
Subject: Kullback Leibler 
In-Reply-To: Your message of Fri, 08 Jun 2007 23:46:33 +0200.
             <1594.145.116.14.61.1181339193.squirrel@webmail.science.uva.nl> 
Message-ID: <200706082312.l58NCrX8012764@dylan.speech.sri.com>


In message <1594.145.116.14.61.1181339193.squirrel at webmail.science.uva.nl> you wrote:
> Dear Andreas Stolcke,
> 
> just a very small question for curiosity. Is there already some tool
> included in the SRILM toolkit to calculate the Kullback Leibler divergence
> between two LMs?

No. It would be cool to have such a tool.

For certain models (e.g., ngram models)
you could come up with an exact computation, similar to what the pruning
algorithm uses.

For the general case you could sample from one of the LMs and 
compute an empirical cross-entropy.
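A minimal sketch of the sampling approach, assuming toy bigram models stored as nested Python dicts (`{history: {word: prob}}`) rather than real SRILM LM files; `unigram_model`, `sample_sentence`, `sentence_logprob` and `kl_per_word` are hypothetical helpers. Averaging log P(s) - log Q(s) over sentences drawn from P gives a Monte Carlo estimate of D(P || Q), normalized here per token. It assumes Q gives nonzero probability to every word P can generate, which smoothed n-gram models do.

```python
import math
import random

BOS, EOS = "<s>", "</s>"


def unigram_model(dist):
    """Wrap a unigram distribution as a bigram table whose rows are all identical."""
    contexts = [BOS] + [w for w in dist if w != EOS]
    return {ctx: dict(dist) for ctx in contexts}


def sample_sentence(model, rng, max_len=100):
    """Draw one sentence (normally ending in </s>) from a bigram model."""
    prev, words = BOS, []
    while len(words) < max_len:
        r, acc = rng.random(), 0.0
        for word, prob in model[prev].items():
            acc += prob
            if r < acc:
                break
        # If rounding leaves r >= acc, the last word of the row is used.
        words.append(word)
        if word == EOS:
            break
        prev = word
    return words


def sentence_logprob(model, words):
    """Natural-log probability of a sentence under a bigram model."""
    prev, total = BOS, 0.0
    for word in words:
        total += math.log(model[prev][word])
        prev = word
    return total


def kl_per_word(p, q, n_samples=10000, seed=0):
    """Monte Carlo estimate of D(P || Q) in nats per token, sampling from P."""
    rng = random.Random(seed)
    log_ratio, tokens = 0.0, 0
    for _ in range(n_samples):
        sent = sample_sentence(p, rng)
        log_ratio += sentence_logprob(p, sent) - sentence_logprob(q, sent)
        tokens += len(sent)
    return log_ratio / tokens


if __name__ == "__main__":
    p = unigram_model({"a": 0.5, "b": 0.3, EOS: 0.2})
    q = unigram_model({"a": 0.3, "b": 0.3, EOS: 0.4})
    # For unigram models the exact per-token KL is about 0.117 nats,
    # so the estimate should land close to that.
    print(kl_per_word(p, q, n_samples=20000))
```

The unigram case is used only because its exact divergence is easy to check by hand; for real backoff n-gram models the same loop works once the two scoring functions call the actual LMs.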

Feel free to implement something ...

Andreas 


From miss_egypt2008 at yahoo.com  Sat Jun  9 09:27:48 2007
From: miss_egypt2008 at yahoo.com (dodo rafik)
Date: Sat, 9 Jun 2007 09:27:48 -0700 (PDT)
Subject: help
Message-ID: <503203.96630.qm@web51604.mail.re2.yahoo.com>

Hello
  I have a question about using SRILM. I am using the toolkit to tag text, but the tagging is based on a Factored Language Model (FLM), and I don't know how to do that with the SRILM toolkit, so please send me the information I need to do it.
  I also need to know the format of the contents of two files, those with the extensions .count and .count.lm.
  Please answer me as soon as possible.
  Thanks in advance,
  bye

       
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070609/8e9cc0e3/attachment.html>

From miss_egypt2008 at yahoo.com  Sat Jun  9 10:29:27 2007
From: miss_egypt2008 at yahoo.com (dodo rafik)
Date: Sat, 9 Jun 2007 10:29:27 -0700 (PDT)
Subject: help
Message-ID: <20070609172927.15523.qmail@web51601.mail.re2.yahoo.com>

Hello
  I have a question about using SRILM. I am using the toolkit to tag text, but the tagging is based on a Factored Language Model (FLM), and I don't know how to do that with the SRILM toolkit, so please send me the information I need to do it.
  I also need to know the format of the contents of two files, those with the extensions .count and .count.lm.
  Please answer me as soon as possible.
  Thanks in advance,
  bye

       
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070609/29366056/attachment.html>

From stolcke at speech.sri.com  Sat Jun  9 11:29:56 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Sat, 09 Jun 2007 11:29:56 -0700
Subject: FLM Tutorial available
Message-ID: <466AF1A4.3070707@speech.sri.com>


Kevin Duh kindly made available a very nice factored LM tutorial, which
should answer many of the questions posted on srilm-user recently.
I am including his original email below.

I also put links to this and other tutorials and overview publications
on our web server at
http://www.speech.sri.com/projects/srilm/manpages/ .
If you know of other useful material that can help a novice get started,
please let me know.

Happy modeling,

Andreas


-------- Original Message --------
Subject: Re: Guestion in FLM
Date: Fri, 08 Jun 2007 23:09:15 -0800
From: Kevin Duh <kevinduh at u.washington.edu>
To: Sara Abd-ElHamed <sara_abd_elhamed at yahoo.com>
CC: SRILM mailing list <srilm-user at speech.sri.com>
References: <15520.62771.qm at web25409.mail.ukl.yahoo.com>

We have written a Factored LM tutorial to help people get started with
using FLMs. It is available here:

http://ssli.ee.washington.edu/people/duh/papers/flm-manual.pdf

This tutorial includes:
1. Intro to FLM and generalized backoff
2. Specification and syntax for using FLM in SRILM
3. A step-by-step walk-through for first time users

Some of the material is taken from the "arabic-final" report mentioned
by Ilya Oparin, but updated and re-formatted in a clearer way.

If there are questions and suggestions, please feel free to email.

Thanks,
Kevin Duh

-----------------------------------------------
Signals, Speech, and Language Interpretation Lab
University of Washington, Seattle
http://ssli.ee.washington.edu/people/duh
----------------------------------------------


From marco.turchi at gmail.com  Mon Jun 11 06:48:09 2007
From: marco.turchi at gmail.com (marco turchi)
Date: Mon, 11 Jun 2007 14:48:09 +0100
Subject: Interpolation vs ngram-merge
Message-ID: <79a042480706110648o692375cnbd1f991a71db8ba5@mail.gmail.com>

Dear experts,
I have a question for you.
I have two datasets, and I want to build an LM that covers both of them.
SRILM gives me two different paths:
1) create two different LMs and then interpolate them;
2) count the n-grams in each dataset, merge these counts using
ngram-merge, and then build the final LM from the merged counts.
What are the differences between these methods?
Can you suggest a paper or book where I can understand these differences?

Thanks a lot
Marco


From ioparin at yahoo.co.uk  Mon Jun 11 12:11:36 2007
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Mon, 11 Jun 2007 20:11:36 +0100 (BST)
Subject: Interpolation vs ngram-merge
In-Reply-To: <79a042480706110648o692375cnbd1f991a71db8ba5@mail.gmail.com>
Message-ID: <325691.25850.qm@web25408.mail.ukl.yahoo.com>

I have experience with training LMs on huge data
(hundreds of millions of word forms). If that is the case for
you, it can actually be more efficient (or even the only
feasible option) to interpolate separately trained LMs than to
merge the counts and train one model, because of the time and
memory costs.
Moreover, interpolation lets you give the models different weights
and tune them according to perplexity results on some test
data, if the "target speech" for recognition is
already known.
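To illustrate the weight tuning mentioned above, here is a minimal sketch of linear interpolation, P(w|h) = lam * P_A(w|h) + (1 - lam) * P_B(w|h), where lam is fitted by EM to maximize the likelihood (equivalently, minimize the perplexity) of held-out text. It assumes you already have each model's probability for every held-out word, for instance from the per-word output of `ngram -debug 2 -ppl`; `best_mix_weight` and `mixture_perplexity` are hypothetical helpers, not SRILM code.

```python
import math


def best_mix_weight(probs_a, probs_b, init=0.5, max_iter=1000, tol=1e-9):
    """EM estimate of lam for lam*P_A + (1-lam)*P_B on held-out words.

    probs_a[i] and probs_b[i] are the probabilities the two models assign
    to the i-th held-out word given its history.
    """
    lam = init
    for _ in range(max_iter):
        # E-step: posterior probability that model A produced each word.
        posts = [lam * pa / (lam * pa + (1.0 - lam) * pb)
                 for pa, pb in zip(probs_a, probs_b)]
        # M-step: the new weight is the average posterior.
        new_lam = sum(posts) / len(posts)
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


def mixture_perplexity(probs_a, probs_b, lam):
    """Per-word perplexity of the interpolated model on the same held-out words."""
    log_sum = sum(math.log(lam * pa + (1.0 - lam) * pb)
                  for pa, pb in zip(probs_a, probs_b))
    return math.exp(-log_sum / len(probs_a))


if __name__ == "__main__":
    a = [0.4, 0.1, 0.3, 0.05]   # made-up per-word probabilities from model A
    b = [0.1, 0.4, 0.2, 0.10]   # and from model B on the same words
    lam = best_mix_weight(a, b)
    print(lam, mixture_perplexity(a, b, lam))
```

By contrast, merging the counts and training once treats the two datasets as one corpus, so each dataset's influence is roughly proportional to its size rather than set by a tuned weight.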

--- marco turchi <marco.turchi at gmail.com> wrote:

> Dear experts,
> i have a question for u.
> I have two dataset, and I want to construct a LM
> that contains both the dataset.
> srilm provides me two different paths:
> 1)to create 2 different LMs and then  interpolate
> them
> 2)to count the n-gram for each dataset, merge these
> counts using
> ngram-merge, and at the end construct the final LM.
> which are the differences of these methods?
> Can u suggest me a paper or book where I can
> understand these differences?
> 
> Thanks a lot
> Marco
> 


best regards,
Ilya




From sara_abd_elhamed at yahoo.com  Mon Jun 11 18:45:58 2007
From: sara_abd_elhamed at yahoo.com (Sara Abd-ElHamed)
Date: Mon, 11 Jun 2007 18:45:58 -0700 (PDT)
Subject: Q in Factor file
Message-ID: <820653.31594.qm@web90412.mail.mud.yahoo.com>

Hi,
  I have another question about FLMs.
  The factor file names two files:

   1. The count file (.count.gz)
   2. The language model file (.lm.gz)

  Where do these files come from?
  Are they the output of ngram and ngram-count?
  Thanks in advance.

       
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070611/1557e19d/attachment.html>

From kevinduh at u.washington.edu  Mon Jun 11 20:53:06 2007
From: kevinduh at u.washington.edu (Kevin Duh)
Date: Mon, 11 Jun 2007 19:53:06 -0800
Subject: Q in Factor file
In-Reply-To: <820653.31594.qm@web90412.mail.mud.yahoo.com>
References: <820653.31594.qm@web90412.mail.mud.yahoo.com>
Message-ID: <466E18A2.4080908@u.washington.edu>

Hi Sara,

The count and LM files are the outputs of fngram-count and inputs to 
fngram. The filenames are specified by the factor-file.

Hope that helps,
Kevin


Sara Abd-ElHamed wrote:
> Hi,
> I have another question in FLM.
> In the factor file we have two file:
> 
>    1. The count file(.count.gz)
>    2. The language model file (.lm.gz)
> 
> I want to know where these file come from?
> Are those file the output of ngram and ngram-count?
> Thanks in advance. 
> 


From svp at zuzino.net.ru  Mon Jun 11 23:20:38 2007
From: svp at zuzino.net.ru (Sergey Protasov)
Date: Tue, 12 Jun 2007 10:20:38 +0400
Subject: add new words to current classes
Message-ID: <150c31280706112320v4c0f0af0nd669850c67cce5d@mail.gmail.com>

Dear experts,

I have a small corpus with a 10K-word vocabulary that is split into 200 classes.

I also have a big corpus with a 30K-word vocabulary (20K of them are new words).

I want to assign the 20K new words to the 200 existing classes.

How can I do this using SRILM?

I don't want to move any of the old 10K words from one class to another.

Thanks in advance!


From stolcke at speech.sri.com  Tue Jun 12 10:20:13 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 12 Jun 2007 10:20:13 -0700
Subject: Word Insertion Penalty
In-Reply-To: <434038.72632.qm@web92006.mail.cnb.yahoo.com>
References: <434038.72632.qm@web92006.mail.cnb.yahoo.com>
Message-ID: <466ED5CD.4040107@speech.sri.com>

?? ? wrote:
> Dear All,
>
> I am wondering if I could set word insertion penalty when rescoring a
> N-best list using SRILM tool, and if yes, how?
Word insertion penalty and other score weights (like the LM weight) are
typically optimized on a held-out test set, by minimizing the empirical
word error.

The program nbest-optimize serves exactly this purpose.
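What gets optimized can be sketched as follows: each hypothesis is scored as acoustic score + LM weight * LM score + insertion weight * number of words, the top hypothesis of each N-best list is compared with its reference, and the weight setting with the fewest total word errors wins. This is a simplified grid search for illustration only; nbest-optimize uses its own optimization methods, and the `Hyp` record, scores and weight grid here are made up.

```python
from collections import namedtuple
from itertools import product

# Hypothetical N-best entry: acoustic and LM log scores plus the word string.
Hyp = namedtuple("Hyp", ["acoustic", "lm", "words"])


def word_errors(hyp, ref):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # extra hypothesis word (insertion)
                           cur[j - 1] + 1,           # missing reference word (deletion)
                           prev[j - 1] + (h != r)))  # match or substitution
        prev = cur
    return prev[-1]


def rescore(nbest, lm_weight, wip):
    """Return the hypothesis with the best combined score."""
    return max(nbest,
               key=lambda h: h.acoustic + lm_weight * h.lm + wip * len(h.words))


def grid_search(nbest_lists, refs, lm_weights, wips):
    """Return (total_errors, lm_weight, wip) with the fewest errors on held-out lists."""
    best = None
    for lmw, wip in product(lm_weights, wips):
        errors = sum(word_errors(rescore(nb, lmw, wip).words, ref)
                     for nb, ref in zip(nbest_lists, refs))
        if best is None or errors < best[0]:
            best = (errors, lmw, wip)
    return best


if __name__ == "__main__":
    nbest = [Hyp(-10.0, -2.0, ["a", "b"]), Hyp(-10.0, -3.0, ["a", "b", "c"])]
    print(grid_search([nbest], [["a", "b", "c"]], [0.5, 1.0], [0.0, 1.0, 2.0]))
```

With these signs a positive insertion weight favors longer hypotheses; in this toy list it is what recovers the missing word "c".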

Andreas


From stolcke at speech.sri.com  Tue Jun 12 10:49:30 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 12 Jun 2007 10:49:30 -0700
Subject: add new words to current classes
In-Reply-To: <150c31280706112320v4c0f0af0nd669850c67cce5d@mail.gmail.com>
References: <150c31280706112320v4c0f0af0nd669850c67cce5d@mail.gmail.com>
Message-ID: <466EDCAA.2090202@speech.sri.com>

Sergey Protasov wrote:
> Dear experts,
>
> I have small corpora with dictionary of 10K words that split on 200 
> classes.
>
> And I have big corpora with dictionary of 30K words (20K of new words).
>
> I want to split 20K new words to the 200 classes that exist.
>
> How can I do it? (using srilm)
>
> I dont want to move any of old 10K words from class to class. 

I agree this would be a useful function to have, but unfortunately it is
not currently implemented.
It should be fairly straightforward to do based on the existing code.

You basically need to load an existing class definition, then create
singleton classes for the new words, and start incremental merging with
the number of classes limited to the original set.

If you care about this problem you should try to modify ngram-class.cc
and share the results with the rest of us! I'd be happy to give some
guidance and review changes if you are willing to do the work.
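The incremental-merging idea outlined above can be sketched as follows. This is a simplified stand-in, not ngram-class code: it assumes the corpus fits in memory as one token list and uses the maximum-likelihood class-bigram log-likelihood as the objective. Words without a class act as singleton classes, each new word is merged (one at a time, in the order given) into whichever original class leaves the likelihood highest, and old words never move. `class_bigram_loglik` and `assign_new_words` are hypothetical helpers, and recomputing the likelihood from scratch for every candidate is far too slow for 20K words; it only keeps the logic visible.

```python
import math
from collections import Counter


def class_bigram_loglik(tokens, word2class):
    """ML log-likelihood of tokens under P(c_i | c_{i-1}) * P(w_i | c_i).

    Words missing from word2class are treated as singleton classes.
    """
    def cls(w):
        return word2class.get(w, "<singleton>" + w)

    classes = [cls(w) for w in tokens]
    bigrams = Counter(zip(classes, classes[1:]))
    history = Counter(classes[:-1])
    emitted_words = Counter(tokens[1:])     # every token after the first is predicted
    emitted_classes = Counter(classes[1:])
    ll = sum(n * math.log(n / history[c1]) for (c1, _), n in bigrams.items())
    ll += sum(n * math.log(n / emitted_classes[cls(w)])
              for w, n in emitted_words.items())
    return ll


def assign_new_words(word2class, tokens, new_words):
    """Greedily merge each new word into the original class that maximizes likelihood."""
    mapping = dict(word2class)
    labels = sorted(set(word2class.values()))  # only the original classes are targets
    for w in new_words:
        mapping[w] = max(labels,
                         key=lambda c: class_bigram_loglik(tokens, {**mapping, w: c}))
    return mapping


if __name__ == "__main__":
    old = {"the": "D", "cat": "N", "dog": "N", "runs": "V", "eats": "V"}
    text = "the cat runs the dog eats the cow runs the cow eats".split()
    print(assign_new_words(old, text, ["cow"])["cow"])  # N
```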

Andreas




From svp at zuzino.net.ru  Tue Jun 12 23:42:13 2007
From: svp at zuzino.net.ru (Sergey Protasov)
Date: Wed, 13 Jun 2007 10:42:13 +0400
Subject: add new words to current classes
In-Reply-To: <466EDCAA.2090202@speech.sri.com>
References: <150c31280706112320v4c0f0af0nd669850c67cce5d@mail.gmail.com>
	 <466EDCAA.2090202@speech.sri.com>
Message-ID: <150c31280706122342k7d4433b9kc5493a585b58423@mail.gmail.com>

Thank you, Andreas, for your answer.

Unfortunately, I don't have good C++ skills at the moment.

But I can try to write a Perl script for this idea.

> I agree this would be a useful function to have, but unfortunately it is
> not currently implemented.
> It should be fairly straightforward to do based on the existing code.
>
> You basically  need to load an existing class definition, then create
> singleton classes for the
> new words, and start incremental merging with the number of classes
> limited to the original set.
>
> If you care about this problem you should try to modify ngram-class.cc
> and share the results with
> the rest of us! I'd be happy to give some guidance and review changes if
> you are willing to do the work.
>
> Andreas
>
>
>
>
>


From sara_abd_elhamed at yahoo.com  Wed Jun 13 13:29:22 2007
From: sara_abd_elhamed at yahoo.com (Sara Abd-ElHamed)
Date: Wed, 13 Jun 2007 13:29:22 -0700 (PDT)
Subject: Question
Message-ID: <12474.38130.qm@web90413.mail.mud.yahoo.com>


  Hi,
  When I tried to run FLM, it gave me an error:
  Error: couldn't form int for number of factored LMs in when reading FLM spec file
  What is the cause of this error?
  I tried the examples in the papers you gave me, and they also give me the same error.
  For example, I tried the unigram example:

## word unigram
W : 0 word_1gram.count.gz word_1gram.lm.gz 1
0b0 0b0 kndiscount gtmin 1

  Sorry for the disturbance.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070613/12d30ff6/attachment.html>

From sara_abd_elhamed at yahoo.com  Wed Jun 13 13:30:23 2007
From: sara_abd_elhamed at yahoo.com (Sara Abd-ElHamed)
Date: Wed, 13 Jun 2007 13:30:23 -0700 (PDT)
Subject: Fwd: Question
Message-ID: <649062.28869.qm@web90404.mail.mud.yahoo.com>

  Hi,
  When I tried to run FLM, it gave me an error:
  Error: couldn't form int for number of factored LMs in when reading FLM spec file
  What is the cause of this error?
  I tried the examples in the papers you gave me, and they also give me the same error.
  For example, I tried the unigram example:

## word unigram
W : 0 word_1gram.count.gz word_1gram.lm.gz 1
0b0 0b0 kndiscount gtmin 1

  Sorry for the disturbance.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070613/10aa5917/attachment.html>

From ioparin at yahoo.co.uk  Wed Jun 13 23:08:43 2007
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Thu, 14 Jun 2007 07:08:43 +0100 (BST)
Subject: Fwd: Question
In-Reply-To: <649062.28869.qm@web90404.mail.mud.yahoo.com>
Message-ID: <831489.57332.qm@web25401.mail.ukl.yahoo.com>

Hi,

You should put the number of FLMs in the FLM specification
file (in your case, 1) right after the comment line, e.g.:
## word unigram
1
W : 0 word_1gram.count.gz word_1gram.lm.gz 1
...
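A rough sanity check for spec files, assuming the layout used in this thread: `##` comment lines, then the number of models, then for each model a header `CHILD : NPARENTS PARENTS... COUNTFILE LMFILE NNODES` followed by NNODES backoff-node lines. `parse_flm_spec` is a hypothetical helper, not SRILM's own parser, and it does not validate the node lines themselves; it does report the count and LM file names each model will use.

```python
def parse_flm_spec(text):
    """Split an FLM spec into model headers; raise ValueError on layout errors."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("##")]
    if not lines or not lines[0].isdigit():
        raise ValueError("first non-comment line must be the number of FLMs")
    n_models, pos, models = int(lines[0]), 1, []
    for _ in range(n_models):
        if pos >= len(lines):
            raise ValueError("fewer model specifications than declared")
        fields = lines[pos].split()
        if len(fields) < 6 or fields[1] != ":":
            raise ValueError("malformed model header: " + lines[pos])
        n_parents = int(fields[2])
        parents = fields[3:3 + n_parents]
        rest = fields[3 + n_parents:]
        # After the parents come exactly: count file, LM file, number of node lines.
        if len(parents) != n_parents or len(rest) != 3:
            raise ValueError("parent count does not match header: " + lines[pos])
        count_file, lm_file, n_nodes = rest[0], rest[1], int(rest[2])
        nodes = lines[pos + 1:pos + 1 + n_nodes]
        if len(nodes) != n_nodes:
            raise ValueError("missing backoff-node lines for " + fields[0])
        models.append({"child": fields[0], "parents": parents,
                       "count_file": count_file, "lm_file": lm_file,
                       "nodes": nodes})
        pos += 1 + n_nodes
    return models


if __name__ == "__main__":
    spec = """## word unigram
1
W : 0 word_1gram.count.gz word_1gram.lm.gz 1
0b0 0b0 kndiscount gtmin 1
"""
    for model in parse_flm_spec(spec):
        print(model["child"], model["count_file"], model["lm_file"])
```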

--- Sara Abd-ElHamed <sara_abd_elhamed at yahoo.com>
wrote:

>   Hi,
>   When i tried to run the FLM it give me an error.
>   Error: couldn't form int for number of factored
> LMs in when reading FLM spec file
>   What is the cause of this error?
>   I tried the examples in the papers you give to me
> and it also give me the same error.
>   For example i tried the example of unigram
>    
>   ## word unigram
> W : 0 word_1gram.count.gz word_1gram.lm.gz 1
> 0b0 0b0 kndiscount gtmin 1
>    
>   Sorry for disturbance.
> 
> 
>        


best regards,
Ilya




From miss_egypt2008 at yahoo.com  Fri Jun 15 05:35:01 2007
From: miss_egypt2008 at yahoo.com (dodo rafik)
Date: Fri, 15 Jun 2007 05:35:01 -0700 (PDT)
Subject: FLM
Message-ID: <576668.66895.qm@web51605.mail.re2.yahoo.com>

hi,
   
  I tried to run the FLM, but I still have a problem even after putting the number of FLMs in the FLM specification file: it doesn't give any errors, but it doesn't work and doesn't produce any result.

  Sorry for the disturbance.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070615/34b0cd07/attachment.html>

From miss_egypt2008 at yahoo.com  Fri Jun 15 05:33:54 2007
From: miss_egypt2008 at yahoo.com (dodo rafik)
Date: Fri, 15 Jun 2007 05:33:54 -0700 (PDT)
Subject: FLM
Message-ID: <765043.51068.qm@web51611.mail.re2.yahoo.com>

hi,
   
  I tried to run the FLM, but I still have a problem even after putting the number of FLMs in the FLM specification file: it doesn't give any errors, but it doesn't work and doesn't produce any result.
  Sorry for the disturbance.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070615/027d4c99/attachment.html>

From ioparin at yahoo.co.uk  Fri Jun 15 07:30:44 2007
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Fri, 15 Jun 2007 15:30:44 +0100 (BST)
Subject: FLM
In-Reply-To: <576668.66895.qm@web51605.mail.re2.yahoo.com>
Message-ID: <462791.81485.qm@web25409.mail.ukl.yahoo.com>

Then you are probably doing something wrong; it should work.
Please check your FLM specification file and the
fngram-count -text input file (it must contain
your training text in FLM format (!)).
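For example, word/tag pairs can be written in the factored format used elsewhere in this thread, where each token is a `:`-separated bundle of `Tag-value` factors (`W-the:P-article`) and tokens are separated by spaces. A minimal sketch, assuming the factor tags `W` and `P` from the example spec; `to_flm_token` and `to_flm_line` are hypothetical helpers, and rejecting values that contain `:` is a precaution of this sketch, since `:` separates factors.

```python
def to_flm_token(factors):
    """Join (tag, value) pairs into one factored token, e.g. W-dog:P-noun."""
    for tag, value in factors:
        if ":" in value or " " in value:
            raise ValueError("factor values must not contain ':' or spaces: " + value)
    return ":".join(f"{tag}-{value}" for tag, value in factors)


def to_flm_line(tagged_words, word_tag="W", pos_tag="P"):
    """Write one sentence of (word, part-of-speech) pairs as a line of FLM text."""
    return " ".join(to_flm_token([(word_tag, w), (pos_tag, p)])
                    for w, p in tagged_words)


if __name__ == "__main__":
    sentence = [("the", "article"), ("brown", "adjective"), ("dog", "noun"),
                ("ate", "verb"), ("a", "article"), ("bone", "noun")]
    print(to_flm_line(sentence))
```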

P.S. You know, there is little point in writing "it
didn't give errors but it didn't work and didn't give
any result". You must specify the problem precisely if
you want any precise answers.

--- dodo rafik <miss_egypt2008 at yahoo.com> wrote:

> hi,
>    
>   i tried to run the FLM but I've a problem even
> after i put the number corresponding to the number
> of FLMs in the FLM specification file , it didn't
> give errors but it didn't work and didn't give any
> result 
>    
>   Sorry for disturbance.
> 
> 
>        


best regards,
Ilya




From miss_egypt2008 at yahoo.com  Fri Jun 15 09:57:29 2007
From: miss_egypt2008 at yahoo.com (dodo rafik)
Date: Fri, 15 Jun 2007 09:57:29 -0700 (PDT)
Subject: FLM
Message-ID: <6536.69211.qm@web51611.mail.re2.yahoo.com>

Sorry again for the disturbance.

  My FLM specification file contains the following:
## word unigram
1
W : 0 word_1gram.count.gz word_1gram.lm.gz 1
0b0 0b0 kndiscount gtmin 1

  The text file contains the following:
  W-the:P-article W-brown:P-adjective W-dog:P-noun W-ate:P-verb
  W-a:P-article W-bone:P-noun

  and the command I use is:

  fngram-count.exe -factor-file my.flm -text train.txt

  where my.flm is the FLM specification file and train.txt is the text file.

  Can you please tell me what I need to add to create the LM file and the count file?

  I hope you can answer me as soon as possible.
  Thanks in advance

       
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070615/a15ad382/attachment.html>

From miss_egypt2008 at yahoo.com  Fri Jun 15 10:03:03 2007
From: miss_egypt2008 at yahoo.com (dodo rafik)
Date: Fri, 15 Jun 2007 10:03:03 -0700 (PDT)
Subject: FLM
Message-ID: <827471.74730.qm@web51610.mail.re2.yahoo.com>

Sorry again for the disturbance.

  My FLM specification file contains the following:
## word unigram
1
W : 0 word_1gram.count.gz word_1gram.lm.gz 1
0b0 0b0 kndiscount gtmin 1

  The text file contains the following:
  W-the:P-article W-brown:P-adjective W-dog:P-noun W-ate:P-verb
  W-a:P-article W-bone:P-noun

  and the command I use is:

  fngram-count.exe -factor-file my.flm -text train.txt

  where my.flm is the FLM specification file and train.txt is the text file.

  Can you please tell me what I need to add to create the LM file and the count file?

  I hope you can answer me as soon as possible.
  Thanks in advance

       
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070615/91531fa8/attachment.html>

From miss_egypt2008 at yahoo.com  Fri Jun 15 10:03:45 2007
From: miss_egypt2008 at yahoo.com (dodo rafik)
Date: Fri, 15 Jun 2007 10:03:45 -0700 (PDT)
Subject: FLM
Message-ID: <20070615170345.19056.qmail@web51602.mail.re2.yahoo.com>

Sorry again for the disturbance.

  My FLM specification file contains the following:
## word unigram
1
W : 0 word_1gram.count.gz word_1gram.lm.gz 1
0b0 0b0 kndiscount gtmin 1

  The text file contains the following:
  W-the:P-article W-brown:P-adjective W-dog:P-noun W-ate:P-verb
  W-a:P-article W-bone:P-noun

  and the command I use is:

  fngram-count.exe -factor-file my.flm -text train.txt

  where my.flm is the FLM specification file and train.txt is the text file.

  Can you please tell me what I need to add to create the LM file and the count file?

  I hope you can answer me as soon as possible.
  Thanks in advance

       
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070615/4e8881b6/attachment.html>

From ioparin at yahoo.co.uk  Fri Jun 15 12:25:11 2007
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Fri, 15 Jun 2007 20:25:11 +0100 (BST)
Subject: FLM
In-Reply-To: <20070615170345.19056.qmail@web51602.mail.re2.yahoo.com>
Message-ID: <29831.33330.qm@web25404.mail.ukl.yahoo.com>

See the FLM "manual", flm/doc/arabic-final.pdf, page 55.
It just needs a bit of careful reading:
use -lm and -write-counts.
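A small sketch of the resulting invocation, assuming (as Kevin explained elsewhere in the thread) that the count and LM file names come from the factor file, so `-write-counts` and `-lm` are given here without file arguments. `fngram_count_argv` and `run_fngram_count` are hypothetical wrappers: they only assemble the command, run it if the binary is on the PATH, and list any expected output files that did not appear.

```python
import os
import shutil
import subprocess


def fngram_count_argv(factor_file, text_file, binary="fngram-count"):
    """Assemble an fngram-count command that writes both the counts and the LM."""
    return [binary, "-factor-file", factor_file, "-text", text_file,
            "-write-counts", "-lm"]


def run_fngram_count(argv, expected_outputs):
    """Run the command if installed; return the expected outputs that are missing."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(argv[0] + " not found on PATH")
    subprocess.run(argv, check=True)
    return [path for path in expected_outputs if not os.path.exists(path)]


if __name__ == "__main__":
    # On Windows, pass binary="fngram-count.exe" as in the message above.
    print(" ".join(fngram_count_argv("my.flm", "train.txt")))
```

The expected outputs for the unigram spec in this thread would be the two names in its header line, word_1gram.count.gz and word_1gram.lm.gz.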

--- dodo rafik <miss_egypt2008 at yahoo.com> wrote:

> sorry again for disturbance
>    
>   my FLM specefication file contains the following :
>   ## word unigram
> 1
> W : 0 word_1gram.count.gz word_1gram.lm.gz 1
> 0b0 0b0 kndiscount gtmin 1
> 
>   the text file contains the following :
>   W-the:P-article W-brown:P-adjective W-dog:P-noun
> W-ate:P-verb 
>   W-a:P-article W-bone:P-noun
>    
>   and the syntax i use is:
>    
>   " fngram-count.exe -factor-file my.flm -text
> train.txt "
>    
>   as my.flm is the FLM specefication file and
> train.txt is the text file
>    
>   so can you please tell me what are the
> modifications that need to be added
>   to create LM file and count file
>    
>   hope you answer me as soon as possible 
>   thanks in advance
> 
>        


best regards,
Ilya




From miss_egypt2008 at yahoo.com  Fri Jun 15 14:12:02 2007
From: miss_egypt2008 at yahoo.com (dodo rafik)
Date: Fri, 15 Jun 2007 14:12:02 -0700 (PDT)
Subject: please
Message-ID: <221170.78405.qm@web51611.mail.re2.yahoo.com>

Thanks a lot for your reply.
  I had already done what you said, but the output was only the count file.
  What about the LM file?
Sorry again for the disturbance.

       
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070615/6b5d9b3c/attachment.html>

From miss_egypt2008 at yahoo.com  Fri Jun 15 14:12:46 2007
From: miss_egypt2008 at yahoo.com (dodo rafik)
Date: Fri, 15 Jun 2007 14:12:46 -0700 (PDT)
Subject: No subject
Message-ID: <20070615211246.60643.qmail@web51603.mail.re2.yahoo.com>

Thanks a lot for your reply.
  I had already done what you said, but the output was only the count file.
  What about the LM file?
Sorry again for the disturbance.

       
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070615/8e5366cc/attachment.html>

From miss_egypt2008 at yahoo.com  Fri Jun 15 14:13:00 2007
From: miss_egypt2008 at yahoo.com (dodo rafik)
Date: Fri, 15 Jun 2007 14:13:00 -0700 (PDT)
Subject: FLM
Message-ID: <20070615211300.60708.qmail@web51603.mail.re2.yahoo.com>

Thanks a lot for your reply.
  I had already done what you said, but the output was only the count file.
  What about the LM file?
Sorry again for the disturbance.

       
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070615/1e54c8b7/attachment.html>

From miss_egypt2008 at yahoo.com  Fri Jun 15 14:39:27 2007
From: miss_egypt2008 at yahoo.com (dodo rafik)
Date: Fri, 15 Jun 2007 14:39:27 -0700 (PDT)
Subject: FLM
Message-ID: <228364.3654.qm@web51609.mail.re2.yahoo.com>

Thanks a lot for your reply.
  I had already done what you said, but the output was only the count file.
  What about the LM file?
Sorry again for the disturbance.

       
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070615/3bba135c/attachment.html>

From miss_egypt2008 at yahoo.com  Fri Jun 15 14:40:21 2007
From: miss_egypt2008 at yahoo.com (dodo rafik)
Date: Fri, 15 Jun 2007 14:40:21 -0700 (PDT)
Subject: FLM
Message-ID: <282560.90176.qm@web51611.mail.re2.yahoo.com>

Thanks a lot for your reply.
  I had already done what you said, but the output was only the count file.
  What about the LM file?
Sorry again for the disturbance.

 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070615/0c5a323c/attachment.html>

From zeman at ufal.ms.mff.cuni.cz  Sat Jun 16 00:39:02 2007
From: zeman at ufal.ms.mff.cuni.cz (Daniel Zeman)
Date: Sat, 16 Jun 2007 09:39:02 +0200
Subject: FLM
In-Reply-To: <228364.3654.qm@web51609.mail.re2.yahoo.com>
References: <228364.3654.qm@web51609.mail.re2.yahoo.com>
Message-ID: <46739396.90301@ufal.mff.cuni.cz>

Dear dodo,
don't apologize for disturbance. Rather, please don't post every 
question three times or so.
Dan

dodo rafik wrote:
> thnx alot for your reply
> i already did what you said before but the output was only the count file
> what about LM file?
> sorry again for disturbance
>


From sahar_magdy_mansor at yahoo.com  Sat Jun 16 10:37:35 2007
From: sahar_magdy_mansor at yahoo.com (sahar magdy)
Date: Sat, 16 Jun 2007 10:37:35 -0700 (PDT)
Subject: help
Message-ID: <336200.28335.qm@web35715.mail.mud.yahoo.com>

hi,
  I downloaded SRILM version 1.5.4, but when I decompress the package it gives me the following errors. I also tried downloading it more than once, and it gives me the same result:

  !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open mips-elf (bin\sgi --> mips-elf)
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf (bin\sun4 --> sparc-elf)
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf (bin\sun4_solaris --> sparc-elf)
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open i386-solaris_g (lib\i386-solaris_m --> i386-solaris_g)
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open i386-solaris_c (lib\i386-solaris-p4_c --> i386-solaris_c)
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file

       
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070616/bdbc6f59/attachment.html>

From stolcke at speech.sri.com  Sat Jun 16 10:51:26 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Sat, 16 Jun 2007 10:51:26 PDT
Subject: help 
In-Reply-To: Your message of Sat, 16 Jun 2007 10:37:35 -0700.
             <336200.28335.qm@web35715.mail.mud.yahoo.com> 
Message-ID: <200706161751.l5GHpRN20224@huge>


You can ignore these errors; they don't affect building SRILM
on a Windows system.

--Andreas

In message <336200.28335.qm at web35715.mail.mud.yahoo.com> you wrote:
> 
> hi ,
>   I downloaded SRILM version 1.5.4 but when i decompress the package it gives me the following errors i also tried to downlaod it more than one time and it gives me tha same result
>    
>    
>   !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open mips-elf (bin\sgi --> mips-elf)
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf (bin\sun4 --> sparc-elf)
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf (bin\sun4_solaris --> sparc-elf)
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open i386-solaris_g (lib\i386-solaris_m --> i386-solaris_g)
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open i386-solaris_c (lib\i386-solaris-p4_c --> i386-solaris_c)
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
> 
>        
> ---------------------------------
> Park yourself in front of a world of choices in alternative vehicles.
> Visit the Yahoo! Auto Green Center.
> --0-1019792502-1182015455=:28335
> Content-Type: text/html; charset=iso-8859-1
> Content-Transfer-Encoding: 8bit
> 
> <DIV>hi ,</DIV>  <DIV>I&nbsp;downloaded SRILM&nbsp;version 1.5.4 but when i d
> ecompress the package it gives me the following errors i also tried to downla
> od it more than one time and it gives me tha same result</DIV>  <DIV>&nbsp;</
> DIV>  <DIV>&nbsp;</DIV>  <DIV>!&nbsp;&nbsp; C:\Documents and Settings\sahar\D
> esktop\srilm.tgz: Cannot open mips-elf (bin\sgi --&gt; mips-elf)<BR>!&nbsp;&n
> bsp; C:\Documents and Settings\sahar\Desktop\srilm.tgz: <SPAN id=lw_118201532
> 5_0 style="CURSOR: hand; BORDER-BOTTOM: #0066cc 1px dashed; HEIGHT: 1em">Symb
> olic link points</SPAN> to missing file<BR>!&nbsp;&nbsp; C:\Documents and Set
> tings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf (bin\sun4 --&gt; sparc-e
> lf)<BR>!&nbsp;&nbsp; C:\Documents and Settings\sahar\Desktop\srilm.tgz: <SPAN
>  id=lw_1182015325_1 style="CURSOR: hand; BORDER-BOTTOM: #0066cc 1px dashed; H
> EIGHT: 1em">Symbolic link points</SPAN> to missing file<BR>!&nbsp;&nbsp; C:\D
> ocuments and Settings\sahar\Desktop\srilm.tgz: Cannot open
>  sparc-elf (bin\sun4_solaris --&gt; sparc-elf)<BR>!&nbsp;&nbsp; C:\Documents 
> and Settings\sahar\Desktop\srilm.tgz: <SPAN id=lw_1182015325_2 style="CURSOR:
>  hand; BORDER-BOTTOM: #0066cc 1px dashed; HEIGHT: 1em">Symbolic link points</
> SPAN> to missing file<BR>!&nbsp;&nbsp; C:\Documents and Settings\sahar\Deskto
> p\srilm.tgz: Cannot open i386-solaris_g (lib\i386-solaris_m --&gt; i386-solar
> is_g)<BR>!&nbsp;&nbsp; C:\Documents and Settings\sahar\Desktop\srilm.tgz: <SP
> AN id=lw_1182015325_3 style="CURSOR: hand; BORDER-BOTTOM: #0066cc 1px dashed;
>  HEIGHT: 1em">Symbolic link points</SPAN> to missing file<BR>!&nbsp;&nbsp; C:
> \Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open i386-solaris_c (
> lib\i386-solaris-p4_c --&gt; i386-solaris_c)<BR>!&nbsp;&nbsp; C:\Documents an
> d Settings\sahar\Desktop\srilm.tgz: <SPAN id=lw_1182015325_4 style="BACKGROUN
> D: none transparent scroll repeat 0% 0%; CURSOR: hand; BORDER-BOTTOM: #0066cc
>  1px dashed; HEIGHT: 1em">Symbolic link points</SPAN> to missing
>  file</DIV><p>&#32;
> 
>       <hr size=1>Park yourself in front of a world of choices in alternative 
> vehicles.<br><a href="http://us.rd.yahoo.com/evt=48246/*http://autos.yahoo.co
> m/green_center/;_ylc=X3oDMTE5cDF2bXZzBF9TAzk3MTA3MDc2BHNlYwNtYWlsdGFncwRzbGsD
> Z3JlZW4tY2VudGVy">Visit the Yahoo! Auto Green Center.</a>
> --0-1019792502-1182015455=:28335--


From sahar_magdy_mansor at yahoo.com  Sat Jun 16 14:54:09 2007
From: sahar_magdy_mansor at yahoo.com (sahar magdy)
Date: Sat, 16 Jun 2007 14:54:09 -0700 (PDT)
Subject: help
Message-ID: <541903.49958.qm@web35702.mail.mud.yahoo.com>

Hi,
When I run SRILM version 1.3, I have a problem: fngram-count outputs only the count file, and that file is empty. When I opened it, it gave me an error ("the archive is either in unknown format or damaged"). Then I downloaded SRILM version 1.5.4, but when I decompress the package it gives me the following errors. I also tried downloading it more than once, and it gives me the same result:

!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open mips-elf (bin\sgi --> mips-elf)
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf (bin\sun4 --> sparc-elf)
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf (bin\sun4_solaris --> sparc-elf)
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open i386-solaris_g (lib\i386-solaris_m --> i386-solaris_g)
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open i386-solaris_c (lib\i386-solaris-p4_c --> i386-solaris_c)
!   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file

 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070616/62b56cfd/attachment.html>

From stolcke at speech.sri.com  Sun Jun 17 13:58:11 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Sun, 17 Jun 2007 13:58:11 -0700
Subject: [Fwd: SRILM]
Message-ID: <4675A063.3070604@speech.sri.com>


Can anyone give J. some pointers on how to build class-based LMs?

Andreas

-------------- next part --------------
An embedded message was scrubbed...
From: "J.Sashank" <sashank at cse.iitb.ac.in>
Subject: SRILM
Date: Sat, 16 Jun 2007 16:20:18 +0530 (IST)
Size: 2961
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070617/9bf2ca90/attachment.mht>

From ioparin at yahoo.co.uk  Sun Jun 17 22:52:50 2007
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Mon, 18 Jun 2007 06:52:50 +0100 (BST)
Subject: [Fwd: SRILM]
In-Reply-To: <4675A063.3070604@speech.sri.com>
Message-ID: <963905.98762.qm@web25404.mail.ukl.yahoo.com>

Hi, J.

You can use ngram-class (see the manpages) to generate
classes from text automatically. It produces two files:
class counts (a standard N-gram count file with class
labels as units) and class definitions (the class
assignments for words). Use those to train LMs with
ngram-count as usual; you just need to add the -classes
option to refer to the class-definition file.
If you want to use classes of your own, it's a bit
trickier, since you have to take care to form the
class-definition file correctly.
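For hand-made classes, the class-definition file (its format is described in the classes-format(5) manpage) has one line per class member, of the form CLASS [PROB] WORD. Below is a minimal Python sketch that builds such lines from a hand-made word-to-class assignment. It assumes single-word class members and that the member probabilities of each class should sum to one; the add-one smoothing of within-class counts is an illustrative choice, not something SRILM prescribes, and the function name is hypothetical.

```python
from collections import Counter


def class_definition_lines(word_classes, word_counts):
    """Build 'CLASS PROB WORD' lines for a hand-made class assignment.

    word_classes: dict mapping word -> class label
    word_counts:  dict mapping word -> frequency in the training data
    PROB is P(word | class), estimated with add-one smoothing so that
    unseen members get a nonzero probability and each class sums to 1.
    """
    class_total = Counter()  # summed counts of the members of each class
    class_size = Counter()   # number of members in each class
    for word, cls in word_classes.items():
        class_total[cls] += word_counts.get(word, 0)
        class_size[cls] += 1

    lines = []
    # sort by class, then word, so the lines of one class stay together
    for word, cls in sorted(word_classes.items(), key=lambda kv: (kv[1], kv[0])):
        prob = (word_counts.get(word, 0) + 1) / (class_total[cls] + class_size[cls])
        lines.append(f"{cls} {prob:.6g} {word}")
    return lines
```

For instance, classes {"monday": "DAY", "tuesday": "DAY", "paris": "CITY"} with counts 3, 1 and 5 give the lines "CITY 1 paris", "DAY 0.666667 monday" and "DAY 0.333333 tuesday", which you would then write one per line to your class-definition file.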

regards,
Ilya

--- Andreas Stolcke <stolcke at speech.sri.com> wrote:

> 
> Can anyone give J. some pointers on how to build
> class-based LMs?
> 
> Andreas
> 
> > Date: Sat, 16 Jun 2007 16:20:18 +0530 (IST)
> Subject: SRILM
> From: "J.Sashank" <sashank at cse.iitb.ac.in>
> To: stolcke at speech.sri.com
> 
> Sir,
>     I am an undergraduate student studying at IIT
> Bombay. I am working on a research project which
> involves a trigram model. I want to use a
> class-based trigram model, but I cannot find how to
> use it in the SRILM package. Can you please tell me
> how to use the package for this model?
> 
> Thanking You,
> 
> J.Sashank
> Junior Undergraduate
> Computer Science and Engineering
> IIT Bombay
> 
> 




From stolcke at speech.sri.com  Sun Jun 17 23:24:19 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Sun, 17 Jun 2007 23:24:19 -0700
Subject: help
In-Reply-To: <541903.49958.qm@web35702.mail.mud.yahoo.com>
References: <541903.49958.qm@web35702.mail.mud.yahoo.com>
Message-ID: <46762513.7000002@speech.sri.com>

sahar magdy wrote:
> Hi,
> When I run SRILM version 1.3, I have a problem: fngram-count
> outputs only the count file, and that file is empty. When I opened
> it, it gave me an error ("the archive is either in unknown format or
> damaged"). Then I downloaded SRILM version 1.5.4, but when I
> decompress the package it gives me the following errors. I also tried
> downloading it more than once, and it gives me the same result:
>  
I don't know what tool you're using to unpack the compressed tar file,
but I know for a fact that GNU tar (part of the Cygwin utilities)
will work.
I suggest you use that.

Andreas

>  
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open mips-elf (bin\sgi --> mips-elf)
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf (bin\sun4 --> sparc-elf)
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf (bin\sun4_solaris --> sparc-elf)
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open i386-solaris_g (lib\i386-solaris_m --> i386-solaris_g)
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open i386-solaris_c (lib\i386-solaris-p4_c --> i386-solaris_c)
> !   C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
>


From stolcke at speech.sri.com  Sun Jun 17 23:29:40 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Sun, 17 Jun 2007 23:29:40 PDT
Subject: [Fwd: SRILM] 
In-Reply-To: Your message of Mon, 18 Jun 2007 06:52:50 +0100.
             <963905.98762.qm@web25404.mail.ukl.yahoo.com> 
Message-ID: <200706180629.l5I6TeJ11816@huge>


In message <963905.98762.qm at web25404.mail.ukl.yahoo.com> you wrote:
> Hi, J.
> 
> You can use ngram-class (see the manpages) to generate
> classes from text automatically. It produces two files:
> class counts (a standard N-gram count file with class
> labels as units) and class definitions (the class
> assignments for words). Use those to train LMs with
> ngram-count as usual; you just need to add the -classes
> option to refer to the class-definition file.
> If you want to use classes of your own, it's a bit
> trickier, since you have to take care to form the
> class-definition file correctly.

I would add that once you have your class definitions (by hand, or as
the result of ngram-class), the recommended procedure is to
filter your training data through

	replace-words-with-classes classes=CLASS-DEFINITIONS-FILE

and then train the ngram model on the output (see the training-scripts(1) manpage).
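To make the filtering step concrete, here is a toy Python version of the simplest case of what replace-words-with-classes does: each word that is a single-word member of a class is replaced by its class label, and all other words pass through unchanged. It is only an illustration under that assumption; it ignores multi-word class expansions and any options the real script offers, so use the SRILM script on actual training data. The function names are hypothetical.

```python
def load_word_classes(class_lines):
    """Map each single-word class member to its class label.

    Assumes lines of the form 'CLASS [PROB] WORD ...'; multi-word
    expansions are skipped in this sketch.
    """
    word2class = {}
    for line in class_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        cls, rest = fields[0], fields[1:]
        try:
            float(rest[0])  # optional expansion probability
            rest = rest[1:]
        except ValueError:
            pass  # no probability given on this line
        if len(rest) == 1:
            word2class[rest[0]] = cls
    return word2class


def replace_words_with_classes(text_lines, word2class):
    """Yield each sentence with class members replaced by their class label."""
    for line in text_lines:
        yield " ".join(word2class.get(w, w) for w in line.split())
```

The class-mapped sentences are what you would then feed to ngram-count in place of the original training text.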

In testing you use that model together with the ngram -classes option 
as Ilya said.
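At test time, a class-based trigram is usually understood as scoring a word through its class: P(w | u v) = P(C(w) | C(u) C(v)) * P(w | C(w)), where the second factor comes from the class-definition probabilities. The sketch below shows only that standard factorization with toy dictionaries and no backoff; it is not SRILM's implementation, and details such as how ngram -classes treats words outside any class are described in the ngram manpage rather than reproduced here. All names are hypothetical.

```python
def class_trigram_prob(word, prev2, prev1, word2class, class_trigram, member_prob):
    """Toy class-based trigram: P(word | prev2 prev1) = P(c | c2 c1) * P(word | c).

    word2class:    dict word -> class label; unclassed words act as their own class
    class_trigram: dict (c2, c1, c) -> P(c | c2 c1), a toy table without backoff
    member_prob:   dict word -> P(word | its class), e.g. from a class-definition file
    """
    c2 = word2class.get(prev2, prev2)
    c1 = word2class.get(prev1, prev1)
    c = word2class.get(word, word)
    p_class = class_trigram.get((c2, c1, c), 0.0)
    # a word that is its own class has P(word | class) = 1
    p_member = member_prob[word] if word in word2class else 1.0
    return p_class * p_member
```

With P(DAY | see you) = 0.4 and P(monday | DAY) = 0.75, the sketch gives P(monday | see you) = 0.3; the shared class lets "tuesday" get probability mass in that context even if "see you tuesday" never occurred in training.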

Andreas 

> 
> regards,
> Ilya
> 
> --- Andreas Stolcke <stolcke at speech.sri.com> wrote:
> 
> > 
> > Can anyone give J. some pointers on how to build
> > class-based LMs?
> > 
> > Andreas
> > 
> > > Date: Sat, 16 Jun 2007 16:20:18 +0530 (IST)
> > Subject: SRILM
> > From: "J.Sashank" <sashank at cse.iitb.ac.in>
> > To: stolcke at speech.sri.com
> > 
> > Sir,
> >     I am an undergraduate student studying at IIT
> > Bombay. I am working on a research project which
> > involves a trigram model. I want to use a
> > class-based trigram model, but I cannot find how to
> > use it in the SRILM package. Can you please tell me
> > how to use the package for this model?
> > 
> > Thanking You,
> > 
> > J.Sashank
> > Junior Undergraduate
> > Computer Science and Engineering
> > IIT Bombay
> > 
> > 
> 
> 
> 


From svp at zuzino.net.ru  Wed Jun 20 12:33:40 2007
From: svp at zuzino.net.ru (Sergey Protasov)
Date: Wed, 20 Jun 2007 23:33:40 +0400
Subject: using negative data
Message-ID: <150c31280706201233q3d4ebb60haa64c5675928da62@mail.gmail.com>

Dear Experts...

Suppose I have a big corpus of incorrect sentences.

*The event that he smiled at me gives me hope
*But my presents to win his heart have failed
*Absence to comply may result in dismissal
*The party is who we should invite
*I really like the fashion you do your hair
...


Can I (and if so, how) improve a language model using these negative examples?

Can I improve an ngram model this way using SRILM?