From abbas.malik at gmail.com  Wed Jul  1 07:17:22 2009
From: abbas.malik at gmail.com (Abbas Malik)
Date: Wed, 1 Jul 2009 16:17:22 +0200
Subject: MAP File
Message-ID: <5462500907010717l2121903fm9e6ac0c305db88a3@mail.gmail.com>

Dear All,

I am running a statistical system using the 'disambig' command with a map
file.

In the map file, I want to map an empty string onto words from the corpus:

nothing w1 w2 w3...

That is, I want to link the empty string of V1 (nothing) to multiple
choices from V2.

A normal entry in the map file looks like this:

w w1 w2 ...

Is it possible to simply delete the w, put a space at the start of the
line, and list the possible words after this first space? I do not know
whether such a line would link the EMPTY STRING to the word list that
follows the leading space. I hope that one of you can help me.

Thank you in advance,

best regards,
-- 
---
M G Abbas Malik
Doctorant (PhD Student)
Université Joseph Fourier,
Groupe d'Etude pour la Traduction Automatique et le Traitement Automatisé
des Langues et de la Parole (GETALP)
Laboratoire d'Informatique de Grenoble (LIG) / Grenoble Informatics
Laboratory

GETALP, LIG-Campus, BP53
385 Rue de la Bibliothèque,
38041 Grenoble Cedex 9, France
Off:      +33 (0)4 76 51 48 17
Mob:    +33 (0)6 74 50 46 01
e-mail: abbas.malik at imag.fr abbas.malik at gmail.com
URL:    www.puran.info

From abbas.malik at gmail.com  Wed Jul  1 07:41:07 2009
From: abbas.malik at gmail.com (Abbas Malik)
Date: Wed, 1 Jul 2009 16:41:07 +0200
Subject: MAP File
In-Reply-To: <5462500907010717l2121903fm9e6ac0c305db88a3@mail.gmail.com>
References: <5462500907010717l2121903fm9e6ac0c305db88a3@mail.gmail.com>
Message-ID: <5462500907010741v4bf1b139p756917973dd64467@mail.gmail.com>

Dear All,

Issue 1:

I am running a statistical system using the 'disambig' command with a map
file.

In the map file, I want to map an empty string onto words from the corpus:

nothing w1 w2 w3...

That is, I want to link the empty string of V1 (nothing) to multiple
choices from V2.

A normal entry in the map file looks like this:

w w1 w2 ...

Is it possible to simply delete the w, put a space at the start of the
line, and list the possible words after this first space? I do not know
whether such a line would link the EMPTY STRING to the word list that
follows the leading space. I hope that one of you can help me.

Issue 2:

In the map file, is it possible to map one word of V1 onto multiple words
of V2? I mean that if we encounter a word w from V1, it is replaced
or transformed by both words [w1 w2] of V2, such that w maps onto the set
[w1 w2] as a unit and not onto w1 and w2 separately. I hope that I have
made my point clear.

Thank you in advance,


---
M G Abbas Malik
Doctorant (PhD Student)
Université Joseph Fourier,
Groupe d'Etude pour la Traduction Automatique et le Traitement Automatisé
des Langues et de la Parole (GETALP)
Laboratoire d'Informatique de Grenoble (LIG) / Grenoble Informatics
Laboratory

GETALP, LIG-Campus, BP53
385 Rue de la Bibliothèque,
38041 Grenoble Cedex 9, France
Off:      +33 (0)4 76 51 48 17
Mob:    +33 (0)6 74 50 46 01
e-mail: abbas.malik at imag.fr abbas.malik at gmail.com
URL:    www.puran.info

From fsanchez at dlsi.ua.es  Wed Jul  1 08:50:52 2009
From: fsanchez at dlsi.ua.es (Felipe Sánchez Martínez)
Date: Wed, 01 Jul 2009 16:50:52 +0100
Subject: Lattice Viterbi decoding
Message-ID: <1246463452.6600.43.camel@pipe>

Hi all,

I am using SRILM to score a set of translation candidates of a given
sentence. 

The sentence is divided into chunks, some of them having a fixed
translation and others having several alternatives:

text1 | text2 | text3.1 or text3.2 | text4 | text5.1 or text5.2

As the number of combinations is exponential in the length of the
sentence, I have been trying to use lattice-tool to compute the Viterbi
path, but I am not able to make it work. I am using the following command
line:

$ lattice-tool -viterbi-decode -in-lattice lattice.pfsg -lm model.lm \
    -order 5 -debug 1

but I get exactly the same result with an n-gram order of 5, 3, or even 0.

In addition, for the example sentence I am working with, I get a
different path if I use SRILM in the usual way, by scoring all possible
translations of the sentence.
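
The exhaustive approach mentioned above (building every full translation so that each can be scored) can be sketched as follows. This is only an illustration: the chunk contents are the placeholders from the example, and the function name is hypothetical, not part of SRILM.

```python
from itertools import product

# Each chunk lists its alternative translations (placeholders from the example).
chunks = [
    ["text1"],
    ["text2"],
    ["text3.1", "text3.2"],
    ["text4"],
    ["text5.1", "text5.2"],
]

def enumerate_candidates(chunks):
    """Yield every full sentence obtained by picking one alternative per chunk."""
    for choice in product(*chunks):
        yield " ".join(choice)

candidates = list(enumerate_candidates(chunks))
for sentence in candidates:
    print(sentence)
```

The number of candidates is the product of the number of alternatives per chunk, which is why this only works for short sentences.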

What am I doing wrong? Thank you very much in advance.

PS: I am using srilm 1.5.7
-- 
Felipe Sánchez Martínez
Departamento de Lenguajes y Sistemas Informáticos
Universidad de Alicante, E-03071 Alicante (Spain)
Tel.: +34 965 903 400, ext: 2966 Fax: +34 965 909 326
http://www.dlsi.ua.es/~fsanchez


From stolcke at speech.sri.com  Tue Jul 28 16:21:00 2009
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 28 Jul 2009 16:21:00 PDT
Subject: Question Concerning ARPA-Format 
In-Reply-To: Your message of Thu, 23 Jul 2009 01:49:10 -0700.
             <454548.84797.qm@web63405.mail.re1.yahoo.com> 
Message-ID: <200907282321.n6SNL0d23219@ns2>


In message <454548.84797.qm at web63405.mail.re1.yahoo.com> you wrote:
> 
> Dear Andreas Stolcke,
> I have a question concerning your toolkit/ARPA format. I know that this
> question could probably be answered by doing research, but after
> exhaustive research I found no real answer...
> 
> I want to include a list of, say, syntactically equal words in an ARPA SLM,
> if possible as an external file. With this, my input sentences would look
> like this, e.g.:
> 
> "Please give me the OBJECT"
> "Can I have the OBJECT"
> 
> OBJECT: spoon, book, remote-control ... (these in an external file)
> 
> 
> Can you have such an external reference with ARPA and your toolkit, or do
> you have to copy the sentences, like this:
> 
> "Please give me the spoon"
> "Please give me the book"
> "Please give me the remote-control"
> 
> "Can I have the spoon"
> "Can I have the book"
> "Can I have the remote-control"
> 
> 
> It would be great, if you could give me a brief answer.

What you are describing is known as a "class-based" ngram LM.
It is supported by SRILM.

The steps are roughly:

1. Define the classes and their membership.
   The format is defined in the classes-format(5) man page.
   You can create one by hand, or induce word classes from a corpus based on 
   bigram cooccurrence statistics, using the ngram-class(1) program.

2. Preprocess your training corpus to replace words with classes.  
   See the replace-words-with-classes script described in the training-scripts(1)
   man page.

3. Train a standard ngram LM on the processed data, using ngram-count(1).

4. Test the class-based LM using ngram or another tool, supplying both the LM file
   and the class definitions file (from step 1), via the -classes option. 
   See the ngram(1) man page.
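
The idea behind step 2 can be sketched in a few lines. This is not the replace-words-with-classes script itself, and the in-memory dictionary below is a hypothetical stand-in for a class definitions file (see classes-format(5) for the real format); it only illustrates mapping class members back to their class label before training.

```python
# Hypothetical class membership, taken from the example in the question.
classes = {"OBJECT": ["spoon", "book", "remote-control"]}

# Invert the mapping: word -> class label.
word_to_class = {
    word: label for label, members in classes.items() for word in members
}

def replace_words_with_classes(sentence, word_to_class):
    """Replace every token that belongs to a class with the class label."""
    return " ".join(word_to_class.get(tok, tok) for tok in sentence.split())

corpus = [
    "Please give me the spoon",
    "Can I have the remote-control",
]
for line in corpus:
    print(replace_words_with_classes(line, word_to_class))
```

Both input sentences collapse onto the class-level patterns from the question, so each pattern only has to be observed with one member of the class.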

Andreas 


From kereoz at kereoz.org  Thu Aug  6 00:11:14 2009
From: kereoz at kereoz.org (Christophe)
Date: Thu, 6 Aug 2009 16:11:14 +0900
Subject: Acoustic model
Message-ID: <20090806071110.GL10626@puredyne.hil.t.u-tokyo.ac.jp>

(I might have sent a similar message last week, but I don't think it 
actually worked - sorry if it did).

Hello,

I would like to use SRILM to compute the most likely sequence of words
given an acoustic model and a language model.

My acoustic model is a simple matrix of observations. It corresponds to
a sequence of observations along with the probabilities that they match words from the vocabulary.
The vocabulary itself is composed of 24 words, so the matrix is Nx24
with N being the length of the sequence.
In other words, I can get the probability of each node from this matrix.

My language model is a n-gram model generated by ngram-count. It gives
me transition probabilities between nodes.

I think that the lattice-tool from SRILM can do the viterbi decoding
stuff, but I can't figure out how to import the acoustic model in it.

As far as I understand, it is expected to be in the pfsg format. Is
there any tool that would allow me to generate such a lattice from what
I have?
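
Independently of SRILM and of the pfsg format, the decoding problem described above can be sketched as a plain Viterbi search. The sketch assumes a bigram (first-order) language model, log-probabilities throughout, and toy values for a 2-word vocabulary instead of 24; all names are hypothetical.

```python
import math

def viterbi(obs_logprob, trans_logprob, init_logprob):
    """Return the most likely word index sequence.

    obs_logprob[t][s]   : log P(observation t | word s)   (the N x V matrix)
    trans_logprob[r][s] : log P(word s | previous word r) (bigram LM)
    init_logprob[s]     : log P(word s at the first position)
    """
    n_states = len(init_logprob)
    score = [init_logprob[s] + obs_logprob[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(obs_logprob)):
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor for word s at position t.
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + trans_logprob[r][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + trans_logprob[best_prev][s]
                             + obs_logprob[t][s])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final word.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

log = math.log
obs = [[log(0.9), log(0.1)], [log(0.2), log(0.8)]]
trans = [[log(0.5), log(0.5)], [log(0.5), log(0.5)]]
init = [log(0.5), log(0.5)]
best_path = viterbi(obs, trans, init)
print(best_path)
```

With uniform transitions the search simply follows the observation matrix; a real bigram LM would replace the toy `trans` table.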

Thank you.

Kind regards,

-- 
Christophe
http://www.kereoz.org


From stolcke at speech.sri.com  Wed Aug 12 10:47:11 2009
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Wed, 12 Aug 2009 10:47:11 -0700
Subject: language models
In-Reply-To: <939770.48418.qm@web38005.mail.mud.yahoo.com>
References: <821248.89886.qm@web38007.mail.mud.yahoo.com>
 <4A53321D.9050807@speech.sri.com>
 <785361.72829.qm@web38004.mail.mud.yahoo.com>
 <4A664D26.5000805@speech.sri.com> <939770.48418.qm@web38005.mail.mud.yahoo.com>
Message-ID: <4A83001F.30006@speech.sri.com>

Md. Akmal Haidar wrote:
> Dear Andreas,
> Thanks for your reply.
> Should the sum of n-gram probabilities sharing a common (n-1)-gram be 
> equal to 1?
No, because smoothing results in some probability mass being assigned to 
ngrams not observed in the training data (and hence in the LM).  This 
probability mass is then assigned to the unobserved ngrams via the 
backoff formula.
> if yes,
> Is there any tool to normalize the language model probabilities such 
> that sum of n-gram probabilities sharing common (n-1) gram is equal to 1?
To make the probabilities of only the observed ngrams add up to 1 you 
need to disable smoothing, and also make sure all observed ngrams are 
included in the model.  Try ngram-count with these options:

-gt3min 1 -gt4min 1  (etc.)
-gt1max 0 -gt2max 0 -gt3max 0 -gt4max 0 (etc. up to the order of ngram 
you need)

For more details on smoothing check
http://www.speech.sri.com/projects/srilm/manpages/ngram-discount.7.html
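
The effect of smoothing on the per-context sums can be illustrated with a toy bigram model. The sketch below uses a simple absolute discount purely for illustration (it is not the Good-Turing discounting controlled by the options above), and the function names are hypothetical.

```python
from collections import Counter

def bigram_probs(tokens, discount=0.0):
    """Bigram estimates (count - discount) / context count; discount=0 is ML."""
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter()
    for (history, _), count in bigram_counts.items():
        context_counts[history] += count
    return {
        (history, word): (count - discount) / context_counts[history]
        for (history, word), count in bigram_counts.items()
    }

def mass_per_context(probs):
    """Sum the explicit bigram probabilities for each history word."""
    totals = Counter()
    for (history, _), p in probs.items():
        totals[history] += p
    return dict(totals)

tokens = "a b a c a b".split()
ml_mass = mass_per_context(bigram_probs(tokens))
discounted_mass = mass_per_context(bigram_probs(tokens, discount=0.5))
print(ml_mass)
print(discounted_mass)
```

Without discounting, the observed bigrams of each context sum to 1; with discounting, the sum drops below 1 and the remainder is what the backoff formula redistributes.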

Andreas



> 1
> Thanks
> Best Regards
> Akmal
>
>
> ------------------------------------------------------------------------
> *From:* Andreas Stolcke <stolcke at speech.sri.com>
> *To:* Md. Akmal Haidar <akmalcuet00 at yahoo.com>
> *Sent:* Tuesday, July 21, 2009 7:20:06 PM
> *Subject:* Re: language models
>
> Md. Akmal Haidar wrote:
> > Dear Andreas,
> >  Thanks for your reply.
> >  What is the difference between creating a language model from a text 
> file and from a count file?
> > If I use -text textfile -lm lmfile versus -read countfile -vocab 
> vocabfile -lm lmfile, the first one gives a smaller perplexity.
> The difference is probably due to use of the -vocab option.  It limits 
> the vocabulary of the LM.
> If you use it in both cases, or not at all, you should get the same 
> results.
>
> Andreas
>
> >  Could you please tell me what's the reason?
> > Thanks & Regards
> > Akmal
> >
> > ------------------------------------------------------------------------
> > *From:* Andreas Stolcke <stolcke at speech.sri.com 
> <mailto:stolcke at speech.sri.com>>
> > *To:* Md. Akmal Haidar <akmalcuet00 at yahoo.com 
> <mailto:akmalcuet00 at yahoo.com>>
> > *Sent:* Tuesday, July 7, 2009 7:31:41 AM
> > *Subject:* Re: Mixing several topic models
> >
> > Md. Akmal Haidar wrote:
> > > Hi,
> > >
> > > I am new in srilm.
> > >
> > > I am working on language model adaptation using LDA. I need to mix
> > > several topic models using weighting factors.
> > > Is there any way in srilm to mix several language models?
> > Read the ngram(1) man page, specifically about the options -mix-lm,
> > -mix-lm2, etc.
> >
> > Andreas
> >
> > >
> > > Thanks
> > >
> > > Kind Regards
> > > Akmal
> > >
> > >
> >
> >
>
>


From jmcrego at limsi.fr  Mon Aug 17 06:05:53 2009
From: jmcrego at limsi.fr (Josep Maria Crego)
Date: Mon, 17 Aug 2009 15:05:53 +0200
Subject: using FLM library
Message-ID: <409a8e0c0908170605y2f542550i215c724b5cbaa768@mail.gmail.com>

dear users,

I am trying to use the factored LM library classes (version 1.5.8) directly
in my code. Mainly, I would like to use the FLM function that is equivalent
to *wordProb(ixWord, history)* in the standard LMs of SRILM.

So, my question is: is there a function for FLMs equivalent to
*wordProb(ixWord, history)*, where ixWord and history each consist of a
vector of factors? Does anyone have example code that obtains ngram
probabilities for a sequence of factors
(e.g., W-wrd1:T-pos1 W-wrd2:T-pos2 ... W-wrdN:T-posN) according to
an FLM description file?

thanks in advance,
jm

From stolcke at speech.sri.com  Mon Aug 17 14:26:18 2009
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 17 Aug 2009 14:26:18 -0700
Subject: srilm-user list changes
Message-ID: <4A89CAFA.6000709@speech.sri.com>


Hi all,

we'll be switching the srilm-user mailing list management software from 
Majordomo to GNU Mailman some time later today.
The list will be offline until then.  All current members will be added 
to the new list, and you'll be getting a welcome message with 
instructions on how to manage your list membership.

Sorry for the troubles we've been having with the software in the recent 
past, and for the temporary disruption.

Andreas



From stolcke at speech.sri.com  Mon Aug 17 15:47:00 2009
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 17 Aug 2009 15:47:00 -0700
Subject: [SRILM User List] Testing new srilm-user list
Message-ID: <4A89DDE4.5070709@speech.sri.com>


From stolcke at speech.sri.com  Wed Aug 19 17:05:40 2009
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Wed, 19 Aug 2009 17:05:40 -0700
Subject: [SRILM User List] language models
In-Reply-To: <595005.19488.qm@web38006.mail.mud.yahoo.com>
References: <200908131724.n7DHOfW29493@ns2>
	<595005.19488.qm@web38006.mail.mud.yahoo.com>
Message-ID: <4A8C9354.4010901@speech.sri.com>

Md. Akmal Haidar wrote:
> Hi,
> I have three LM files.
> I got the first one from ngram-count.
> The second one results from applying some MATLAB programming to the first.
> The third one results from renormalizing the second one using the 
> ngram -renorm option.
>  
> When creating the third one, I got messages like: BOW denominator 
> for context "been has" is -0.382151<=0, numerator is 0.846874
That's expected if you changed the probabilities such that they sum to > 
1 for a given context.
ngram -renorm cannot deal with this.  It simply recomputes the backoff 
weights to normalize the model, but it won't change the existing ngram 
probabilities.  Obviously if just the explicit ngram probabilities sum 
to > 1 there is no way to assign backoff weights such that the model is 
normalized, hence the above message.
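
To make the numerator and denominator in that message concrete, here is a minimal sketch of the textbook back-off normalization. It is an assumption that this matches what ngram -renorm computes internally; the function name and the toy probabilities are hypothetical and not taken from the SRILM source.

```python
def backoff_weight(higher_order_probs, lower_order_probs):
    """Back-off weight for one context.

    higher_order_probs: p(w | context) for the words with explicit n-grams
    lower_order_probs : p(w | shortened context), at least for those words
    """
    numerator = 1.0 - sum(higher_order_probs.values())
    denominator = 1.0 - sum(lower_order_probs[w] for w in higher_order_probs)
    if numerator < 0 or denominator <= 0:
        # Explicit probabilities already use up all the mass: no valid weight.
        raise ValueError(
            f"cannot normalize: numerator={numerator:.6f}, "
            f"denominator={denominator:.6f}"
        )
    return numerator / denominator

# A well-formed context: mass is left over on both levels.
good = backoff_weight({"a": 0.5, "b": 0.2}, {"a": 0.3, "b": 0.1, "c": 0.6})
print(good)

# A context whose lower-order probabilities were edited to sum to more than 1.
try:
    backoff_weight({"a": 0.1, "b": 0.05}, {"a": 0.9, "b": 0.5})
except ValueError as err:
    print(err)
```

In the second case no choice of back-off weight can make the distribution sum to 1, which is the situation the warning reports.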
>  
> The second and third ones give much lower perplexities (7.53 & 5.70). 
> The first one gives 73.73.
That's right, if your probabilities don't sum to 1 (over the entire 
vocabulary, for all contexts) perplexities are meaningless.

You can run ngram -debug 3 -ppl to check that probabilities are 
normalized for all contexts occurring in your test set.

I don't have a simple solution for your problem.  Since you manipulated 
the probabilities, you have to figure out a way to get them normalized!  
I suggest you use the srilm-user mailing list if you want to seek 
further advice on this.  But you would first have to explain in more detail 
how you assign your probabilities.

Andreas

>  
> Could you please tell me what this message means?
>  
> Thanks & Regards
> Haidar
>
>  
> ------------------------------------------------------------------------
> *From:* Andreas Stolcke <stolcke at speech.sri.com>
> *To:* Md. Akmal Haidar akmalcuet00 at yahoo.com 
> <mailto:akmalcuet00 at yahoo.com>
> *Sent:* Thursday, August 13, 2009 1:24:41 PM
> *Subject:* Re: language models
>
>
> In message <92580.94445.qm at web38002.mail.mud.yahoo.com 
> <mailto:92580.94445.qm at web38002.mail.mud.yahoo.com>> you wrote:
> >
> > Dear Andreas,
> > I attached 2 LM files.
> > Here, train3.lm is the original LM file, which I got by applying 
> > ngram-count.
>
> So does that file have probabilities summing to 1?
> I would think not.
>
> > ntrain3.lm is the modified LM, which I got by some MATLAB programming. 
> > But here the sum of the seen 2-gram probabilities sharing a common 
> > 1-gram is greater than 1.
>
> I cannot help you debug your MATLAB script if that's what's giving
> you unnormalized probabilities.
>
> >
> > If I change the 1-gram backoff weight so that the sum of the 2-gram 
> > (seen & unseen) probabilities sharing a common 1-gram equals 1, will 
> > the method be correct?
>
> yes.
>
> ngram -renorm will also do this for you.
>
> Andreas
>
>


From stolcke at speech.sri.com  Fri Aug 21 10:45:03 2009
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Fri, 21 Aug 2009 10:45:03 -0700
Subject: [SRILM User List] a question about FLMs and SRILM
In-Reply-To: <409a8e0c0908210905q6e3be46bi4200c48915d5d95d@mail.gmail.com>
References: <409a8e0c0908210905q6e3be46bi4200c48915d5d95d@mail.gmail.com>
Message-ID: <4A8EDD1F.8080200@speech.sri.com>

Josep Maria Crego wrote:
> dear Andreas,
>
> My name is Josep M. Crego, a post-doc working on SMT at LIMSI-CNRS 
> (France)
>
> I am trying to use the factored LM library classes (version 1.5.8) 
> directly in my code. Mainly, I would like to use an FLM function 
> equivalent to wordProb(ixWord, history) for the standard LMs of SRILM.
>
> So, my question is: is there a function for FLMs equivalent to 
> wordProb(ixWord, history), where ixWord and history each consist of a 
> vector of factors? It would be perfect if you could send me an 
> example of code that obtains ngram probabilities for a sequence of 
> factors (e.g., W-wrd1:T-pos1 W-wrd2:T-pos2 ... W-wrdN:T-posN) according 
> to an FLM description file.
There is a wordProb function for FLMs, since FLMs are just a special 
kind of LM class.
You need to create an LM object of class ProductNgram and then invoke the 
wordProb function on it.
Look in lm/src/ngram.cc for an example (look in the places where the 
variable "factored" is used).

>
> thanks in advance,
> jm
>
> PS: sorry for sending directly the question to you... I don't know why 
> I couldn't use the srilm mailing list
There were problems with the mailing list admin software.  We solved 
those and there is now an easy way to join/leave the list.  Just go to 
http://www.speech.sri.com/mailman/listinfo/srilm-user/ and follow the 
instructions there.

Andreas

>
> -- 
> Josep-Maria Crego
> LIMSI-CNRS
> Phone: +33/0 1 69 85 80 68
> Postmail: BP 133, 91403 Orsay (France) 


From shl.thcn at yahoo.com.cn  Thu Aug 27 00:21:25 2009
From: shl.thcn at yahoo.com.cn (海龙 史)
Date: Thu, 27 Aug 2009 00:21:25 -0700 (PDT)
Subject: [SRILM User List] A confusion of the interpolated language model
Message-ID: <942355.92368.qm@web15307.mail.cnb.yahoo.com>


I am a new student user of SRILM from Asia. I used the command below to construct an interpolated modified Kneser-Ney discounted language model:
~ ngram-count -read merge_counts_1994-2003.gz -gt1min 0 -gt2min 0 -gt3min 2 -kndiscount -interpolate -order 3 -vocab ChWord.lexno -lm 1994-2003_lm_all_pruned.lm


However, in my model the back-off weight (bow) of several N-grams appears to be greater than 1. That is, in the text LM file I have a line:
-6.457229    <s> 1635    0.1270406
(Here we just use a kind of index to represent a Chinese word.)
in which log10(bow) is greater than 0. We don't think a normal interpolated discounting method can produce an N-gram bow greater than 1; besides, this only occurred for a few (fewer than 5) different N-grams. So I am confused and would like to ask whether anyone has encountered this or happens to know what is wrong.
Thank you very much!

Hailoon Shi
w63,EE Dpt.Thu Univ.PRC


From yannick.esteve at lium.univ-lemans.fr  Thu Aug 27 01:19:44 2009
From: yannick.esteve at lium.univ-lemans.fr (Yannick Estève)
Date: Thu, 27 Aug 2009 10:19:44 +0200
Subject: [SRILM User List] A confusion of the interpolated language model
In-Reply-To: <942355.92368.qm@web15307.mail.cnb.yahoo.com>
References: <942355.92368.qm@web15307.mail.cnb.yahoo.com>
Message-ID: <42586A5A-C583-4404-9018-EF2C0193C5F9@lium.univ-lemans.fr>

Hi,

Back-off weights are not probabilities: they can be greater than 1.
So, your values are normal. You can find an explanation of how back-off
weights are computed, particularly for the modified Kneser-Ney
discounting method, here:
http://www.speech.sri.com/projects/srilm/manpages/pdfs/chen-goodman-tr-10-98.pdf
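
As a quick check of the line quoted in the question, the log10 values in an ARPA-format entry can be converted back to linear scale. The parser below is a hypothetical helper written for this one line (it assumes whitespace-separated fields and that the n-gram order is known); it is not an SRILM function.

```python
def parse_arpa_ngram_line(line, order):
    """Split an ARPA n-gram line into (words, probability, back-off weight).

    The first field is log10(prob), followed by `order` words and an
    optional log10(bow). A missing bow is treated as log10(bow) = 0.
    """
    fields = line.split()
    log_prob = float(fields[0])
    words = fields[1:1 + order]
    log_bow = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
    return words, 10 ** log_prob, 10 ** log_bow

words, prob, bow = parse_arpa_ngram_line("-6.457229    <s> 1635    0.1270406", order=2)
print(words, prob, bow)
```

The probability itself is far below 1, while the back-off weight comes out at roughly 1.34, which is legitimate because it is a scaling factor and not a probability.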

Regards,
Yannick Estève
LIUM - University of Le Mans
France



From stolcke at speech.sri.com  Thu Aug 27 13:38:35 2009
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 27 Aug 2009 13:38:35 -0700
Subject: [SRILM User List] language models
In-Reply-To: <2037.50270.qm@web38003.mail.mud.yahoo.com>
References: <200908131724.n7DHOfW29493@ns2>
	<595005.19488.qm@web38006.mail.mud.yahoo.com>
	<4A8C9354.4010901@speech.sri.com>
	<2037.50270.qm@web38003.mail.mud.yahoo.com>
Message-ID: <4A96EECB.4090107@speech.sri.com>

Md. Akmal Haidar wrote:
>
>  Hi,
> Thanks for your reply.
> I need to mix 20 topic models. SRILM allows mixing up to 10 LM files at a time.
> I use the following commands (t: topic, w: topic weight):
> ngram -lm t1.lm w1 -mix-lm t2.lm w2 -mix-lm2 t3.lm w3 
> .............-mix-lm9 t10.lm w10 -write-lm t1to10.lm
> ngram -lm t11.lm w11 -mix-lm t12.lm w12 -mix-lm2 t13.lm w13 
> .............-mix-lm9 t20.lm w20 -write-lm t11to20.lm
> ngram -lm t1to10.lm .5 -mix-lm t11to20.lm .5 -write-lm t1to20.lm
You can mix the models recursively.  To mix three models L1, L2, L3 with 
weights w1, w2, w3 (w1 + w2 + w3 = 1),
you first build

       L12 = w1/(w1+w2) L1 + w2/(w1+w2) L2

and then

       L = (w1 + w2) L12 + w3 L3.

I'll leave it to you to generalize this to a larger number of models.
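
One way to generalize the recursion is to compute, for each pairwise mixing step, the weight given to the mixture built so far. The sketch below only does the arithmetic; the function names are hypothetical, and how each weight maps onto ngram options (for example -lambda as the weight of the first model) should be checked against the ngram(1) man page.

```python
def pairwise_lambdas(weights):
    """For target weights w1..wn (summing to 1), return the weight of the
    accumulated mixture at each step when models are added one at a time."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    lambdas = []
    running = weights[0]
    for w in weights[1:]:
        lambdas.append(running / (running + w))
        running += w
    return lambdas

def effective_weights(lambdas):
    """Expand the pairwise steps back into per-model weights (sanity check)."""
    weights = [1.0]
    for lam in lambdas:
        weights = [x * lam for x in weights] + [1.0 - lam]
    return weights

target = [0.5, 0.3, 0.2]
steps = pairwise_lambdas(target)
print(steps)
print(effective_weights(steps))
```

For three models the first step reproduces w1/(w1+w2) and the second step reproduces (w1 + w2), matching the two formulas above.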

Please direct future questions of this nature to the srilm-user mailing 
list.

Andreas

>
> Could you please tell me whether these commands are correct for mixing LM files?
>
> Thanks
> Akmal
>


From akmalcuet00 at yahoo.com  Fri Aug 28 09:50:07 2009
From: akmalcuet00 at yahoo.com (Md. Akmal Haidar)
Date: Fri, 28 Aug 2009 09:50:07 -0700 (PDT)
Subject: [SRILM User List] different perplexity
Message-ID: <553811.97317.qm@web38005.mail.mud.yahoo.com>

Hi,

I ran into a problem with perplexity calculation.
When I use the commands:
1) ngram -lm l1.lm -ppl t.txt
2) ngram -lm l2.lm -lambda 0 -mix-lm l1.lm -ppl t.txt

the first gives a lower perplexity than the second.
Should the above commands give different perplexities?

thanks

Akmal



From stolcke at speech.sri.com  Fri Aug 28 10:39:45 2009
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Fri, 28 Aug 2009 10:39:45 -0700
Subject: [SRILM User List] different perplexity
In-Reply-To: <553811.97317.qm@web38005.mail.mud.yahoo.com>
References: <553811.97317.qm@web38005.mail.mud.yahoo.com>
Message-ID: <4A981661.9050306@speech.sri.com>

Md. Akmal Haidar wrote:
> Hi,
>  
> I ran into a problem with perplexity calculation.
> When I use the commands:
> 1) ngram -lm l1.lm -ppl t.txt
> 2) ngram -lm l2.lm -lambda 0 -mix-lm l1.lm -ppl t.txt
>  
> the first gives a lower perplexity than the second.
> Should the above commands give different perplexities?
They may, though not by much.

Realize that ngram -mix-lm WITHOUT the -bayes option performs an "ngram 
merging" that APPROXIMATES the result of interpolating the two LMs 
according to the classical formula.  This is described in the SRILM 
paper:
> The ability to approximate class-based and interpolated Ngram
> LMs by a single word N-gram model deserves some discussion.
> Both of these operations are useful in situations where
> other software (e.g., a speech recognizer) supports only standard
> N-grams. Class N-grams are approximated by expanding class labels
> into their members (which can contain multiword strings) and
> then computing the marginal probabilities of word N-gram strings.
> This operation increases the number of N-grams combinatorially,
> and is therefore feasible only for relatively small models.
> An interpolated backoff model is obtained by taking the union
> of N-grams of the input models, assigning each N-gram the
> weighted average of the probabilities from those models (in some
> of the models this probability might be computed by backoff), and
> then renormalizing the new model. We found that such interpolated
> backoff models consistently give slightly lower perplexities
> than the corresponding standard word-level interpolated models.
> The reason could be that the backoff distributions are themselves
> obtained by interpolation, unlike in standard interpolation, where
> each component model backs off individually.
So the result may differ because the merging process introduces 
new backoff nodes into the LM and that may change some probabilities 
arrived at through backing off. However, if you use

    ngram -lm l2.lm -lambda 0 -mix-lm l1.lm -ppl  t.txt -bayes 0

you get exact interpolation and then the perplexities should be identical.
But you cannot save such an interpolated model back into a single ngram LM.

In practice the difference should not matter (at least in my experience).
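
The "classical formula" and its behaviour at a weight of 0 can be sketched with toy numbers. The per-word probabilities below are made up for illustration, and the perplexity helper assumes one probability per scored token; neither is SRILM code.

```python
import math

def interpolate(p_main, p_mix, lam):
    """Classical linear interpolation: lam weights the main model."""
    return lam * p_main + (1.0 - lam) * p_mix

def perplexity(token_probs):
    """Perplexity from per-token probabilities, using log10 as ngram does."""
    log_prob = sum(math.log10(p) for p in token_probs)
    return 10 ** (-log_prob / len(token_probs))

# Hypothetical per-token probabilities from two models on the same text.
main_probs = [0.10, 0.20]   # plays the role of l2.lm
mix_probs = [0.25, 0.25]    # plays the role of l1.lm

mixed = [interpolate(a, b, lam=0.0) for a, b in zip(main_probs, mix_probs)]
print(perplexity(mixed), perplexity(mix_probs))
```

With exact interpolation and a weight of 0 on the main model, every token probability equals that of the mixed-in model, so the two perplexities coincide.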

Andreas


>  
> thanks
>  
> Akmal
>
>  
>


From akmalcuet00 at yahoo.com  Fri Aug 28 12:17:10 2009
From: akmalcuet00 at yahoo.com (Md. Akmal Haidar)
Date: Fri, 28 Aug 2009 12:17:10 -0700 (PDT)
Subject: [SRILM User List] different perplexity
In-Reply-To: <4A981661.9050306@speech.sri.com>
References: <553811.97317.qm@web38005.mail.mud.yahoo.com>
	<4A981661.9050306@speech.sri.com>
Message-ID: <309353.14233.qm@web38002.mail.mud.yahoo.com>


Hi,
Thanks for your reply.
I need to compare two LM files by perplexity evaluation.

1. i)  ngram -lm general.lm -lambda .5 -mix-lm l1.lm -ppl test1.txt
   ii) ngram -lm general.lm -lambda .5 -mix-lm l1.lm -ppl test1.txt -bayes 0
       Both commands give the same perplexity, but:
2. i)  ngram -lm general.lm -lambda .5 -mix-lm l2.lm -ppl test1.txt
       ppl=460
   ii) ngram -lm general.lm -lambda .5 -mix-lm l2.lm -ppl test1.txt -bayes 0
       ppl=148
   The 2(ii) command gives a lower perplexity.

Could you please tell me why the second one gives a lower perplexity?

thanks
akmal


From akmalcuet00 at yahoo.com  Fri Aug 28 14:01:36 2009
From: akmalcuet00 at yahoo.com (Md. Akmal Haidar)
Date: Fri, 28 Aug 2009 14:01:36 -0700 (PDT)
Subject: [SRILM User List] different perplexity
In-Reply-To: <4A984434.9090909@speech.sri.com>
References: <553811.97317.qm@web38005.mail.mud.yahoo.com>
	<4A981661.9050306@speech.sri.com>
	<309353.14233.qm@web38002.mail.mud.yahoo.com>
	<4A984434.9090909@speech.sri.com>
Message-ID: <477380.80961.qm@web38002.mail.mud.yahoo.com>

The perplexity is 450 for both 1(i) and 1(ii).

by the way, some back-off weights for l2.lm are greater than 1.

thanks
Akmal


________________________________
From: Andreas Stolcke <stolcke at speech.sri.com>
To: Md. Akmal Haidar <akmalcuet00 at yahoo.com>
Sent: Friday, August 28, 2009 4:55:16 PM
Subject: Re: [SRILM User List] different perplexity

Md. Akmal Haidar wrote:
> 
>  Hi,
> Thanks for your reply.
> I need to compare two lm file by perplexity evaluation.
>  1. i) ngram -lm general.lm -lambda .5 -mix-lm l1.lm -ppl test1.txt
>     ii) ngram -lm general.lm -lambda .5 -mix-lm l1.lm -ppl test1.txt -bayes 0
>         in both commands it gives same perplexity but when
> 2. i) ngram -lm general.lm -lambda .5 -mix-lm l2.lm -ppl test1.txt
>         ppl=460
>    ii)ngram -lm general.lm -lambda .5 -mix-lm l2.lm -ppl test1.txt -bayes 0
>       ppl=148
>     the 2(ii)  command gives lower perplexity.
that is quite odd.  What is the perplexity for 1(i) and 1(ii) ?

andreas

>  could you please tell me why the second one gives lower perplexity?
>  thanks
> akmal

From stolcke at speech.sri.com  Fri Aug 28 18:04:55 2009
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Fri, 28 Aug 2009 18:04:55 -0700
Subject: [SRILM User List] different perplexity
In-Reply-To: <477380.80961.qm@web38002.mail.mud.yahoo.com>
References: <553811.97317.qm@web38005.mail.mud.yahoo.com>
	<4A981661.9050306@speech.sri.com>
	<309353.14233.qm@web38002.mail.mud.yahoo.com>
	<4A984434.9090909@speech.sri.com>
	<477380.80961.qm@web38002.mail.mud.yahoo.com>
Message-ID: <4A987EB7.5000003@speech.sri.com>

Md. Akmal Haidar wrote:
> the perplexity for 1(i)=450, 1(ii)=450. both are same
>
> by the way, some back-off weights for l2.lm are greater than 1.
My guess would be that l2.lm is not properly normalized.
Try running it with ngram -debug 3 -ppl on some test data.

When you interpolate with -bayes 0, no normalization is applied to the 
resulting model (it should be automatically normalized, assuming the 
component models are normalized), so the resulting model will also be 
unnormalized and give a bogus low perplexity.
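
The effect can be illustrated with a small Python sketch. It is not SRILM code: the toy unigram distributions and the helper names `total_mass` and `perplexity` are assumptions made for the example. A model whose probabilities sum to more than 1 hands out extra probability mass, so its perplexity looks better than it should.

```python
import math

def total_mass(dist):
    """Sum of the probabilities a model assigns to all words in one context."""
    return sum(dist.values())

def perplexity(dist, words):
    """Perplexity of a word sequence under a single context-free distribution."""
    log_sum = sum(math.log(dist[w]) for w in words)
    return math.exp(-log_sum / len(words))

# Toy unigram models over a three-word vocabulary (made-up numbers).
normalized = {"a": 0.5, "b": 0.3, "c": 0.2}       # sums to 1
unnormalized = {"a": 0.75, "b": 0.45, "c": 0.3}   # same shape scaled by 1.5

test_words = ["a", "b", "a", "c"]

for name, model in [("normalized", normalized), ("unnormalized", unnormalized)]:
    print(name, "mass =", total_mass(model), "ppl =", perplexity(model, test_words))
```

The second model ranks the words exactly as the first one does, yet reports a lower perplexity purely because its mass exceeds 1. This is why perplexities are only comparable between properly normalized models.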

Andreas



From shl.thcn at yahoo.com.cn  Fri Aug 28 22:21:25 2009
From: shl.thcn at yahoo.com.cn (海龙 史)
Date: Fri, 28 Aug 2009 22:21:25 -0700 (PDT)
Subject: [SRILM User List] Re: A confusion of the interpolated language model
In-Reply-To: <42586A5A-C583-4404-9018-EF2C0193C5F9@lium.univ-lemans.fr>
References: <942355.92368.qm@web15307.mail.cnb.yahoo.com>
	<42586A5A-C583-4404-9018-EF2C0193C5F9@lium.univ-lemans.fr>
Message-ID: <549193.81220.qm@web15303.mail.cnb.yahoo.com>

Hi, thanks for your reply!
I do know that a back-off weight is not a probability, but in the interpolated mod-kn smoothing method, bows are not supposed to be greater than 1.
In the SRILM man page ngram-discount.7.html, I found this:
For back-off smoothing, there is
(1)   p(a_z) = (c(a_z) > 0) ? f(a_z) : bow(a_) p(_z)
where f(a_z) depends on the smoothing method and bow(a_) is calculated as follows:
      Sum_Z p(a_z) = 1
      Sum_Z1 f(a_z) + Sum_Z0 bow(a_) p(_z) = 1
(2)   bow(a_) = (1 - Sum_Z1 f(a_z)) / Sum_Z0 p(_z)
              = (1 - Sum_Z1 f(a_z)) / (1 - Sum_Z1 p(_z))
              = (1 - Sum_Z1 f(a_z)) / (1 - Sum_Z1 f(_z))
but for interpolated smoothing, there is
(3)   f(a_z) = g(a_z) + bow(a_) p(_z)
(4)   p(a_z) = (c(a_z) > 0) ? f(a_z) : bow(a_) p(_z)
and
      Sum_Z p(a_z) = 1
      Sum_Z1 g(a_z) + Sum_Z bow(a_) p(_z) = 1
(5)   bow(a_) = 1 - Sum_Z1 g(a_z)

(where Z is the set of all words in the vocabulary, Z0 is the set of
all words with c(a_z) = 0, and Z1 is the set of all
words with c(a_z) > 0)
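
As a worked illustration of equation (2), the following Python sketch computes a back-off weight for a toy model. The numbers and the helper name `backoff_weight` are assumptions made for the example, not anything taken from SRILM. It shows that a weight recomputed with (2) can exceed 1 whenever the seen words get less probability in context a than they get from the lower-order distribution, while the conditional distribution from (1) still sums to 1.

```python
def backoff_weight(f_high, p_low):
    """Equation (2): bow(a_) = (1 - Sum_Z1 f(a_z)) / (1 - Sum_Z1 p(_z)).

    f_high maps each word z seen after context a (the set Z1) to f(a_z);
    p_low maps every vocabulary word z to the lower-order probability p(_z).
    """
    seen_high = sum(f_high.values())
    seen_low = sum(p_low[z] for z in f_high)
    return (1.0 - seen_high) / (1.0 - seen_low)

# Made-up lower-order (unigram) distribution over a three-word vocabulary.
p_low = {"x": 0.5, "y": 0.3, "z": 0.2}

# Context a was seen only before "x", and gives it LESS probability
# than the unigram distribution does.
f_high = {"x": 0.4}

bow = backoff_weight(f_high, p_low)
print(bow)  # greater than 1: the unseen words y and z are scaled up

# The conditional distribution p(a_z) from equation (1) still sums to 1.
p_cond = {z: f_high[z] if z in f_high else bow * p_low[z] for z in p_low}
print(sum(p_cond.values()))
```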

However, in the SRILM source code, it seems that the interpolated bows are calculated using (5), then the probs and bows are transferred into a back-off model using (3), and then the back-off version of the bows is recomputed using (2). I just don't understand why SRILM does not use the bow calculated with (5) directly.
Besides, I previously used the entropy-pruning method to construct a language model:
~ ngram-count -read merge_counts_1994-2003.gz -gt1min 0 -gt2min 0 -gt3min 0 -kndiscount -interpolate -prune 0.000000001 -order 3 -vocab ChWord.lexno -lm 1994-2003_lm_pruned1e-9.lm
and there was definitely no bow greater than 1.
So this problem is weird, and I wonder if any of you knows the cause. Also, was the command I used to build the mod-kn discount language model (where I want to exclude the 3-grams with a count of 1) correct?
~ ngram-count -read merge_counts_1994-2003.gz -gt1min 0 -gt2min 0 -gt3min 2 -kndiscount -interpolate -order 3 -vocab ChWord.lexno -lm 1994-2003_lm_all_pruned.lm

Thank you very much!


Hailoon Shi
w63, EE Dept., Tsinghua Univ., Beijing, China


________________________________
From: Yannick Estève <yannick.esteve at lium.univ-lemans.fr>
To: 海龙 史 <shl.thcn at yahoo.com.cn>
Cc: srilm-user at speech.sri.com
Date: 2009/8/27 (Thu), 4:19:44
Subject: Re: [SRILM User List] A confusion of the interpolated language model

Hi,

Back-off weights are not probabilities: they can be greater than 1.
So your values are normal. You can find some explanation of back-off weight computation here, particularly for the modified Kneser-Ney discounting method:
http://www.speech.sri.com/projects/srilm/manpages/pdfs/chen-goodman-tr-10-98.pdf

Regards,
Yannick Estève
LIUM - University of Le Mans
France

On 27 Aug 09 at 09:21, 海龙 史 wrote:

> I am a new student user of SRILM from Asia. I used the command below to construct an interpolated mod-kn discount language model:
> ~ ngram-count -read merge_counts_1994-2003.gz -gt1min 0 -gt2min 0 -gt3min 2 -kndiscount -interpolate -order 3 -vocab ChWord.lexno -lm 1994-2003_lm_all_pruned.lm
>
> However, in my model the back-off weight (bow) of several N-grams appears to be greater than 1. That is, in the text LM file, I've got a line:
> -6.457229    <s> 1635    0.1270406
> (Here we just use a kind of index to represent a Chinese word)
> in which the log10(bow) is greater than 0. We don't think a normal interpolated discounting method can produce an N-gram bow greater than 1; besides, this only occurred for a few (fewer than 5) different N-grams. So I am confused and would like to ask if anyone has encountered this or happens to know what is wrong.
> Thank you very much!
>
> Hailoon Shi
> w63,EE Dpt.Thu Univ.PRC

From fsanchez at dlsi.ua.es  Mon Aug 31 00:43:32 2009
From: fsanchez at dlsi.ua.es (Felipe Sánchez Martínez)
Date: Mon, 31 Aug 2009 09:43:32 +0200
Subject: [SRILM User List] Lattice Viterbi decoding
Message-ID: <1251704612.8359.2.camel@pipe>

Hi all,

I am using SRILM to score a set of translation candidates of a given
sentence. 

The sentence is divided into chunks, some of them having a fixed
translation and others having different alternatives:

text1 | text2 | text3.1 or text3.2 | text4 | text5.1 or text5.2

As the number of combinations is exponential in the length of the
sentences I have been trying to use lattice-tool to compute the Viterbi
path but I am not able to make it work. I am using the following command
line:

$ lattice-tool -viterbi-decode -in-lattice lattice.pfsg -lm model.lm
-order 5 -debug 1

but I get exactly the same result with 5, 3 or even 0 n-gram order.

In addition, with the example sentence I am working with I get a
different path if I use SRILM in the usual way by scoring all possible
translations of the sentence.

What am I doing wrong? Thank you very much in advance.

PS: I am using srilm 1.5.7
-- 
Felipe


From beleira at gmail.com  Tue Sep  8 10:34:31 2009
From: beleira at gmail.com (Manuel Alves)
Date: Tue, 8 Sep 2009 18:34:31 +0100
Subject: [SRILM User List] Ngram Command
Message-ID: <495c9ccd0909081034j60f99470lb1cbd97dcfe1534a@mail.gmail.com>

Thanks Andreas.

From stolcke at speech.sri.com  Thu Sep 10 08:25:54 2009
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 10 Sep 2009 08:25:54 -0700
Subject: [SRILM User List] Question on SRILM Toolkit
In-Reply-To: <b674f7310909100221g4d086b8bk1b1e498c8ba2ec39@mail.gmail.com>
References: <b674f7310909100221g4d086b8bk1b1e498c8ba2ec39@mail.gmail.com>
Message-ID: <4AA91A82.50703@speech.sri.com>

Saeedeh Momtazi wrote:
> Dear Andreas Stolcke,
>
> I, Saeedeh Momtazi, use the SRILM toolkit for a while. The main part 
> that I use from this toolkit is "ngram-class". So far, I had no 
> problem with this toolkit. However, recently I tried to cluster the 
> terms that I have based on a count file which is about 6 GB. I faced 
> an error message that I copy here:
>
> ngram-class: ../../include/LHash.cc:138: void LHash<KeyT, 
> DataT>::alloc(unsigned int) [with KeyT = unsigned int, DataT = 
> Trie<unsigned int,  long unsigned int>]: Assertion `body != 0' failed.
> /var/torque/mom_priv/jobs/53195.maste.SC: line 
> 39: 25464 Aborted
You are simply running out of memory.  You need more memory or swap 
space, and probably you need to switch
to a 64bit machine.  However, first you should make sure to use the 
memory-optimized version of the tools (compiled with make OPTION=_c).

You can always sample your data, or simply prune the count file by 
eliminating low-count ngrams.  This might not change your results much.  
When inducing word classes the words with low counts are not handled 
robustly anyway.  I found it best to replace all words with low counts 
with an "Infrequent word" class label ahead of time. As a by-product, 
this will dramatically reduce the number of distinct bigrams because 
most of the bigrams involve rare words (Zipf's law etc.).
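
A rough Python sketch of that preprocessing step follows. It is only an illustration under stated assumptions: the function name `collapse_rare_words`, the label `<INFREQUENT>`, the threshold, and the toy sentences are all made up, and a real run would apply the mapping to the training text or count file before calling ngram-class.

```python
from collections import Counter

def collapse_rare_words(sentences, min_count, label="<INFREQUENT>"):
    """Replace words seen fewer than min_count times by one label, then count bigrams."""
    word_counts = Counter(word for sent in sentences for word in sent)
    bigrams = Counter()
    for sent in sentences:
        mapped = [w if word_counts[w] >= min_count else label for w in sent]
        bigrams.update(zip(mapped, mapped[1:]))
    return bigrams

sentences = [s.split() for s in
             ["the cat sat", "the dog sat", "the bird flew", "the fish swam"]]

raw = collapse_rare_words(sentences, min_count=1)        # nothing is replaced
collapsed = collapse_rare_words(sentences, min_count=2)  # singletons are merged

print(len(raw), "distinct bigrams before,", len(collapsed), "after")
```

Even on this tiny sample most distinct bigrams contain a word seen only once, so mapping those words to one label shrinks the bigram table considerably.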

Andreas
>
>
> I appreciate in advance if you let me know how I can solve this problem.
> To be more precise, my vocabulary is about 35000 words and I want to 
> cluster them into 3000 classes. The input items that I use when 
> calling ngram-class are the vocab file (-vocab), the count file 
> (-counts) and the number of classes (-numclasses). The only output 
> that I need is a mapping between words and classes (-classes).
>
>
> Looking forward to hearing from you.
>
> Thanks in advance,
> Saeedeh Momtazi


From heintz.38 at osu.edu  Sun Sep 20 12:03:55 2009
From: heintz.38 at osu.edu (Ilana Heintz)
Date: Sun, 20 Sep 2009 15:03:55 -0400 (EDT)
Subject: [SRILM User List] vocab size from make-batch-counts
Message-ID: <alpine.DEB.1.10.0909201426520.29851@brutus.ling.ohio-state.edu>

Hello,

I am wondering about what type of ngram pruning is done in the 
make-batch-counts training script, and if it can be handled with flags. 
I've looked through the code and man pages but I'm not sure whether I can 
pass the right argument.  I discovered that the pruning happens because, 
when I vary the batch size, the resulting vocabulary size changes.  For 
instance, on a small development corpus:

> make-batch-counts files.list 10 xmlfilter.sh counts_10perbatch
> merge-batch-counts counts_10perbatch
> ngram-count -read counts_10perbatch/files.list-1.ngrams.gz -write-vocab 
10perbatch.vocab
> wc 10perbatch.vocab
   2763  2763 32999 10perbatch.vocab

> make-batch-counts files.list 1 xmlfilter.sh counts_1perbatch
> merge-batch-counts counts_1perbatch
> ngram-count -read counts_1perbatch/merge-iter2-1.ngrams.gz -write-vocab 
1perbatch.vocab
> wc 1perbatch.vocab
   5923  5923 72237 1perbatch.vocab

Same sort of result when I use a larger corpus or other batch sizes; the 
vocab decreases with an increase in the size of the batch.  I have tried 
experimenting with -gtmin to change the output, without success.  I'm 
confused as to why batch size would make a difference here.

I am using version 1.5.5.

Thanks,
Ilana


Ilana Heintz
Department of Linguistics
Ohio State University
http://www.ling.ohio-state.edu/~bromberg


From stolcke at speech.sri.com  Mon Sep 21 01:07:36 2009
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 21 Sep 2009 01:07:36 -0700
Subject: [SRILM User List] vocab size from make-batch-counts
In-Reply-To: <alpine.DEB.1.10.0909201426520.29851@brutus.ling.ohio-state.edu>
References: <alpine.DEB.1.10.0909201426520.29851@brutus.ling.ohio-state.edu>
Message-ID: <4AB73448.6010902@speech.sri.com>

Ilana Heintz wrote:
> Hello,
>
> I am wondering about what type of ngram pruning is done in the 
> make-batch-counts training script, and if it can be handled with 
> flags. I've looked through the code and man pages but I'm not sure 
> whether I can pass the right argument.  I discovered that the pruning 
> happens because, when I vary the batch size, the resulting vocabulary 
> size changes.  For instance, on a small development corpus:
>
>> make-batch-counts files.list 10 xmlfilter.sh counts_10perbatch
>> merge-batch-counts counts_10perbatch
>> ngram-count -read counts_10perbatch/files.list-1.ngrams.gz -write-vocab
What you are doing is not working as intended.  make-batch-counts passes 
the -write-vocab option to ngram-count,
but each ngram-count invocation will dump only the vocabulary of the 
batch it is seeing (hence the result you observed).

To get the combined vocab of your data, run

ngram-count -order 1 -read COUNTS -write-vocab VOCAB

on the final count file.
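
The difference between a per-batch vocabulary and the combined one can be sketched in Python as follows. This is only an illustration: `vocab_from_counts` is a hypothetical helper, the count lines are made up, and the "words, then a tab, then the count" layout is an assumption about the count file format.

```python
def vocab_from_counts(count_lines):
    """Collect the words appearing in count lines of the form 'w1 [w2 ...]<TAB>count'."""
    vocab = set()
    for line in count_lines:
        ngram, _, _count = line.rstrip("\n").rpartition("\t")
        vocab.update(ngram.split())
    return vocab

# Made-up counts from two separate batches.
batch1 = ["the\t3", "cat\t1", "the cat\t1"]
batch2 = ["the\t2", "dog\t2", "the dog\t2"]

print(sorted(vocab_from_counts(batch1)))           # vocabulary of one batch only
print(sorted(vocab_from_counts(batch1 + batch2)))  # vocabulary of the merged counts
```

A vocabulary written while processing a single batch covers only that batch, so its size depends on how the files were grouped; only the merged counts yield the full vocabulary.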

Andreas

> 10perbatch.vocab
>> wc 10perbatch.vocab
>   2763  2763 32999 10perbatch.vocab
>
>> make-batch-counts files.list 1 xmlfilter.sh counts_1perbatch
>> merge-batch-counts counts_1perbatch
>> ngram-count -read counts_1perbatch/merge-iter2-1.ngrams.gz -write-vocab 
> 1perbatch.vocab
>> wc 1perbatch.vocab
>   5923  5923 72237 1perbatch.vocab
>
> Same sort of result when I use a larger corpus or other batch sizes; 
> the vocab decreases with an increase in the size of the batch.  I have 
> tried experimenting with -gtmin to change the output, without 
> success.  I'm confused as to why batch size would make a difference here.
>
> I am using version 1.5.5.
>
> Thanks,
> Ilana
>
>
> Ilana Heintz
> Department of Linguistics
> Ohio State University
> http://www.ling.ohio-state.edu/~bromberg
>


From sylvain.raybaud at crans.org  Mon Sep 21 05:43:16 2009
From: sylvain.raybaud at crans.org (Sylvain Raybaud)
Date: Mon, 21 Sep 2009 14:43:16 +0200
Subject: [SRILM User List] problem getting srilm
Message-ID: <200909211443.16924.sylvain.raybaud@crans.org>

Hello everyone

Getting an archive of srilm toolkit from the website has been near-impossible 
for me for some months now... After I fill in the form at 
http://www.speech.sri.com/projects/srilm/download.html and accept the license 
the download begins but is incredibly slow (throughput varying between 0 and 
1 kilo bytes per second). After a few hours I get a message saying that the 
connection was closed or something like that. Did I miss something? thanks a 
lot.

regards,

-- 
Sylvain Raybaud

LORIA/Nancy/France

From stolcke at speech.sri.com  Mon Sep 21 07:49:22 2009
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 21 Sep 2009 07:49:22 -0700
Subject: [SRILM User List] problem getting srilm
In-Reply-To: <200909211443.16924.sylvain.raybaud@crans.org>
References: <200909211443.16924.sylvain.raybaud@crans.org>
Message-ID: <4AB79272.6000102@speech.sri.com>

Sylvain Raybaud wrote:
> Hello everyone
>
> Getting an archive of srilm toolkit from the website has been near-impossible 
> for me for some months now... After I fill in the form at 
> http://www.speech.sri.com/projects/srilm/download.html and accept the license 
> the download begins but is incredibly slow (throughput varying between 0 and 
> 1 kilo bytes per second). After a few hours I get a message saying that the 
> connection was closed or something like that. Did I miss something? thanks a 
> lot.
>   
I'm sure I would have heard lots of complaints if this was a general 
issue.  Also, I just downloaded the package from my home cable 
connection, at about 300kB/s.   So I would first investigate possible 
issues with your internet connection.
A good site for measuring network bandwidth is 
http://www.speakeasy.net/speedtest/ .

Andreas

> regards,
>
>   


From sylvain.raybaud at crans.org  Tue Sep 22 01:45:09 2009
From: sylvain.raybaud at crans.org (Sylvain Raybaud)
Date: Tue, 22 Sep 2009 10:45:09 +0200
Subject: [SRILM User List] problem getting srilm
In-Reply-To: <4AB79272.6000102@speech.sri.com>
References: <200909211443.16924.sylvain.raybaud@crans.org>
	<4AB79272.6000102@speech.sri.com>
Message-ID: <200909221045.09723.sylvain.raybaud@crans.org>

On Monday 21 September 2009 16:49:22 Andreas Stolcke wrote:
> Sylvain Raybaud wrote:
> > Hello everyone
> >
> > Getting an archive of srilm toolkit from the website has been
> > near-impossible for me for some months now... After I fill in the form at
> > http://www.speech.sri.com/projects/srilm/download.html and accept the
> > license the download begins but is incredibly slow (throughput varying
> > between 0 and 1 kilo bytes per second). After a few hours I get a message
> > saying that the connection was closed or something like that. Did I miss
> > something? thanks a lot.
>
> I'm sure I would have heard lots of complaints if this was a general
> issue.  Also, I just downloaded the package from my home cable
> connection, at about 300kB/s.   So I would first investigate possible
> issues with your internet connection.
> A good site to for measuring network bandwidth is
> http://www.speakeasy.net/speedtest/ .
>
> Andreas
>
> > regards,

Hello

  You are right, it seems that the problem only occurs when attempting to 
download from my institute... puzzling... maybe it is because we use IPv6? 
I've encountered problems with IPv6-enabled servers before. Anyway, the 
problem is probably not related to SRI, sorry about the fuss over nothing :)

thanks a lot, keep up the good work,

-- 
Sylvain

From paul.a.johnston at manchester.ac.uk  Wed Sep 23 00:52:32 2009
From: paul.a.johnston at manchester.ac.uk (Paul Johnston)
Date: Wed, 23 Sep 2009 08:52:32 +0100
Subject: [SRILM User List] Compiling srilm
Message-ID: <F3A7CE98C787DE4ABAE4E50090CC3D1B03E46FFE@dalek.co.umist.ac.uk>

Hi, on my first attempt to build SRILM version 1.5.9

I get lots of messages like this:

 
gcc -mtune=pentium3 -Wreturn-type -Wimplicit -Wimplicit-int
-D_FILE_OFFSET_BITS=64    -I. -I../../include   -c -g -O3 -o
../obj/i686/option.o option.c

option.c:1: error: CPU you selected does not support x86-64 instruction
set

 
As a result, make World fails. Looking into it, the return value of
machine-type is

 
>/home/CO/mcasspj/srilm_dir/sbin/machine-type 

i686

 
And the actual machine is 

 
uname -a

Linux servalan.humanities.manchester.ac.uk 2.6.18-128.el5 #1 SMP Wed Dec
17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

 
More specifically 

 
cat /etc/redhat-release

Red Hat Enterprise Linux Server release 5.3 (Tikanga)

 
Has anyone seen this, and does anyone have ideas for solving the problem?

As I intend to build a machine dedicated to this system, can anyone
recommend a system they use, e.g. Solaris, any of the BSDs, or a variety
of Linux (just not Windows or a Mac)? :-)

 
Many thanks!

 
Paul Johnston

Humanities ICT (Infrastructure)

Samuel Alexander Building

Room W1.19

 
e-mail Paul.Johnston at manchester.ac.uk

web http://web-1.humanities.manchester.ac.uk/prjs/mcasspj/

 
Are the traps made from grenades, high explosives, or
other things?

 

From sylvain.raybaud at crans.org  Wed Sep 23 01:15:43 2009
From: sylvain.raybaud at crans.org (Sylvain Raybaud)
Date: Wed, 23 Sep 2009 10:15:43 +0200
Subject: [SRILM User List] Compiling srilm
In-Reply-To: <F3A7CE98C787DE4ABAE4E50090CC3D1B03E46FFE@dalek.co.umist.ac.uk>
References: <F3A7CE98C787DE4ABAE4E50090CC3D1B03E46FFE@dalek.co.umist.ac.uk>
Message-ID: <200909231015.43259.sylvain.raybaud@crans.org>

On Wednesday 23 September 2009 09:52:32 Paul Johnston wrote:
> Hi on my first attempt to build srilm version 1.5.9
>
> I get lots of messages like
>
>
>
> gcc -mtune=pentium3 -Wreturn-type -Wimplicit -Wimplicit-int
> -D_FILE_OFFSET_BITS=64    -I. -I../../include   -c -g -O3 -o
> ../obj/i686/option.o option.c
>
> option.c:1: error: CPU you selected does not support x86-64 instruction
> set
>
>
>
> Therefore make World fails, looking into it, the return value of
> machine-type is
>
> >/home/CO/mcasspj/srilm_dir/sbin/machine-type
>
> i686
>
>
>
> And the actual machine is
>
>
>
> uname -a
>
> Linux servalan.humanities.manchester.ac.uk 2.6.18-128.el5 #1 SMP Wed Dec
> 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
>
>
>
> More specifically
>
>
>
> cat /etc/redhat-release
>
> Red Hat Enterprise Linux Server release 5.3 (Tikanga)
>
>
>
> Anyone seen this and have any ideas as to solving the problem?
>
> As I intend to build a machine dedicated to this system can anyone
> recommend a system they use i.e. Solaris, any of the BSDs or a variety
> of Linux just not Windows or a Mac :-)
>
>
>
> Many thanks!
>

Hello

I modified sbin/machine-type and added a common/Makefile.machine.x86_64-gcc4 
so that it compiles for 64 bits. It works on my machine, no guarantee it will 
work anywhere else... You will find them attached. Hope this helps.

regards,


-- 
Sylvain

A: You see !
> Q: You think ?
>> A: Because it reverses the logical flow of conversation.
>>> Q: Why is top posting annoying in email?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: machine-type
Type: application/x-shellscript
Size: 4126 bytes
Desc: not available
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20090923/7b171c67/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Makefile.machine.x86_64-gcc4
Type: text/x-makefile
Size: 1744 bytes
Desc: not available
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20090923/7b171c67/attachment-0001.bin>

From stolcke at speech.sri.com  Wed Sep 23 09:56:49 2009
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Wed, 23 Sep 2009 09:56:49 -0700
Subject: [SRILM User List] Compiling srilm
In-Reply-To: <200909231015.43259.sylvain.raybaud@crans.org>
References: <F3A7CE98C787DE4ABAE4E50090CC3D1B03E46FFE@dalek.co.umist.ac.uk>
	<200909231015.43259.sylvain.raybaud@crans.org>
Message-ID: <4ABA5351.50403@speech.sri.com>

Sylvain Raybaud wrote:
> On Wednesday 23 September 2009 09:52:32 Paul Johnston wrote:
>   
>> Hi on my first attempt to build srilm version 1.5.9
>>
>> I get lots of messages like
>>
>>
>>
>> gcc -mtune=pentium3 -Wreturn-type -Wimplicit -Wimplicit-int
>> -D_FILE_OFFSET_BITS=64    -I. -I../../include   -c -g -O3 -o
>> ../obj/i686/option.o option.c
>>
>> option.c:1: error: CPU you selected does not support x86-64 instruction
>> set
>>
>>
>>
>> Therefore make World fails, looking into it, the return value of
>> machine-type is
>>
>>     
>>> /home/CO/mcasspj/srilm_dir/sbin/machine-type
>>>       
>> i686
>>
>>
>>
>> And the actual machine is
>>
>>
>>
>> uname -a
>>
>> Linux servalan.humanities.manchester.ac.uk 2.6.18-128.el5 #1 SMP Wed Dec
>> 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
>>
>>
>>
>> More specifically
>>
>>
>>
>> cat /etc/redhat-release
>>
>> Red Hat Enterprise Linux Server release 5.3 (Tikanga)
>>
>>
>>
>> Anyone seen this and have any ideas as to solving the problem?
>>
>> As I intend to build a machine dedicated to this system can anyone
>> recommend a system they use i.e. Solaris, any of the BSDs or a variety
>> of Linux just not Windows or a Mac :-)
>>
>>
>>
>> Many thanks!
>>
>>     
>
> Hello
>
> I modified sbin/machine-type and added a common/Makefile.machine.x86_64-gcc4 
> so that it compiles for 64 bits. It works on my machine, no guarantee it will 
> work anywhere else... You will find them attached. Hope this helps.
>   
The idea is that 32-bit compilation is the default even on 64-bit i686
machines. That's why machine-type returns i686 by default.
I think the problem you saw can be avoided by adding -m32 to the compiler
flags in common/Makefile.machine.i686.
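
A minimal sketch of the -m32 change described above. It runs against a
throwaway sample file rather than the real makefile. The GCC_FLAGS variable
name and the add_m32 helper are assumptions made for illustration; the sample
flags are taken from the compile command quoted earlier in the thread. Check
how common/Makefile.machine.i686 actually defines its compiler flags before
applying anything like this.

```shell
#!/bin/sh
# Sketch only: add -m32 to the gcc flags of a makefile fragment.
# ASSUMPTION: the flags live in a variable named GCC_FLAGS; verify this
# against the real common/Makefile.machine.i686 before editing it.
add_m32() {
    # On the GCC_FLAGS line, insert -m32 right after "=" unless the line
    # already contains -m32. The edited text goes to stdout; the input
    # file itself is left untouched.
    sed '/^GCC_FLAGS[[:space:]]*=/{/-m32/!s/=[[:space:]]*/= -m32 /;}' "$1"
}

# Demo on a temporary file that mimics the flags seen in the failing build
demo=$(mktemp)
printf 'GCC_FLAGS = -mtune=pentium3 -Wreturn-type -Wimplicit\n' > "$demo"
add_m32 "$demo"
rm -f "$demo"
```

The helper writes to stdout so the result can be reviewed before it replaces
the original file.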

To build 64-bit binaries, use

    make MACHINE_TYPE=i686-m64

Andreas
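
The choice between the two builds discussed in this thread can be sketched as
a small shell helper. The function name choose_machine_type and its arguments
are hypothetical and not part of SRILM. Only the values i686 and i686-m64 and
the make World target come from the messages above, and combining them on one
command line is an assumption. The script prints the make command instead of
running it.

```shell
#!/bin/sh
# Sketch only: pick the MACHINE_TYPE value to pass to make, given the
# architecture reported by `uname -m` and the desired word size.
# choose_machine_type is a hypothetical helper, not an SRILM script.
choose_machine_type() {
    arch=$1   # e.g. x86_64 or i686
    bits=$2   # 32 or 64
    case "$arch:$bits" in
        x86_64:64)         echo "i686-m64" ;;  # 64-bit binaries
        x86_64:32|i?86:32) echo "i686" ;;      # default 32-bit build
        *) echo "unsupported combination: $arch/$bits" >&2
           return 1 ;;
    esac
}

# Print (do not run) the build command for a 64-bit build on this host
if mt=$(choose_machine_type "$(uname -m)" 64); then
    echo "make MACHINE_TYPE=$mt World"
fi
```

On a 32-bit x86 host, the 32-bit case yields i686, which matches what
machine-type reports by default according to the reply above.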
