From nickakosu at yahoo.com Mon Oct 24 22:08:07 2011 From: nickakosu at yahoo.com (Nick Akosu) Date: Tue, 25 Oct 2011 06:08:07 +0100 (BST) Subject: [SRILM User List] Difficulty installing SRILM Message-ID: <1319519287.76246.YahooMailNeo@web24603.mail.ird.yahoo.com> Hi everyone, I need to use SRILM to do some language modeling research, but I am having difficulty installing the package. I am using Windows 7. I need advice/directions; kindly help. Nicholas Akosu From burkay at mit.edu Tue Oct 25 12:41:43 2011 From: burkay at mit.edu (Burkay Gur) Date: Tue, 25 Oct 2011 15:41:43 -0400 Subject: [SRILM User List] Question about 3-gram Language Model with OOV triplets Message-ID: <4EA710F7.7020802@mit.edu> Hi, I have just started using SRILM, and it is a great tool. But I ran across this issue. The situation is that I have: corpusA.txt corpusB.txt What I want to do is create two different 3-gram language models, one for each corpus. But I want to make sure that if a triplet is non-existent in the other corpus, a smoothed probability is still assigned to it. For example: if corpusA has triplet counts: this is a 1 is a test 1 and corpusB has triplet counts: that is a 1 is a test 1 then the final counts for corpusA should be: this is a 1 is a test 1 that is a 0 because "that is a" is in B but not A. Similarly, corpusB should be: that is a 1 is a test 1 this is a 0 After the counts are set up, some smoothing algorithm might be used. I have manually tried to set the triple word counts to 0, but it does not seem to work, as they are omitted from the 3-grams. Can you recommend any other ways of doing this?
Thank you, Burkay From stolcke at icsi.berkeley.edu Tue Oct 25 15:13:05 2011 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 25 Oct 2011 15:13:05 -0700 Subject: [SRILM User List] Difficulty installing SRILM In-Reply-To: <1319519287.76246.YahooMailNeo@web24603.mail.ird.yahoo.com> References: <1319519287.76246.YahooMailNeo@web24603.mail.ird.yahoo.com> Message-ID: <4EA73471.7080202@icsi.berkeley.edu> Nick Akosu wrote: > Hi everyone, > I need to use SRILM to do some language modeling research but I am > having difficulty installing the package. I am using windows 7. Please > I need advice/directions. Please try to follow the directions in the INSTALL and doc/README.windows files; then, if you run into problems, ask again with specific information (error messages, output from make). Andreas From burkay at mit.edu Tue Oct 25 15:16:49 2011 From: burkay at mit.edu (Burkay Gur) Date: Tue, 25 Oct 2011 18:16:49 -0400 Subject: [SRILM User List] Follow Up: Question about 3-gram Language Model with OOV triplets In-Reply-To: <4EA710F7.7020802@mit.edu> References: <4EA710F7.7020802@mit.edu> Message-ID: <4EA73551.6070104@mit.edu> To follow up: when I edit the .count file and add 0 counts for some trigrams, they are not included in the final .lm file when I read from the .count file and create the language model. On 10/25/11 3:41 PM, Burkay Gur wrote: > Hi, > > I have just started using SRILM, and it is a great tool. But I ran > across this issue. The situation is that I have: > > corpusA.txt > corpusB.txt > > What I want to do is create two different 3-gram language models for > both corpora. But I want to make sure that if a triplet is > non-existent in the other corpus, then a smoothed probability should > be assigned to that.
For example; > > if corpusA has triplet counts: > > this is a 1 > is a test 1 > > and corpusB has triplet counts: > > that is a 1 > is a test 1 > > then the final counts for corpusA should be: > > this is a 1 > is a test 1 > that is a 0 > > because "that is a" is in B but not A. > > similarly corpusB should be: > > that is a 1 > is a test 1 > this is a 0 > > After the counts are setup, some smoothing algorithm might be used. I > have manually tried to make the triple word counts 0, however it does > not seem to work. As they are omitted from 3-grams. > > Can you recommend any other ways of doing this? > > Thank you, > Burkay > From stolcke at icsi.berkeley.edu Tue Oct 25 15:38:41 2011 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 25 Oct 2011 15:38:41 -0700 Subject: [SRILM User List] Follow Up: Question about 3-gram Language Model with OOV triplets In-Reply-To: <4EA73551.6070104@mit.edu> References: <4EA710F7.7020802@mit.edu> <4EA73551.6070104@mit.edu> Message-ID: <4EA73A71.1020002@icsi.berkeley.edu> Burkay Gur wrote: > To follow up, basically, when I edit the .count file and add 0 counts > for some trigrams, they will not be included in the final .lm file, > when I try to read from the .count file and create a language model. A zero count is completely equivalent to a non-existent count, so what you're seeing is expected. It is not clear what precisely you want to happen. As a result of discounting and backing off, your LM, even without the unobserved trigram, will already assign a non-zero probability to that trigram. That's exactly what the ngram smoothing algorithms are for. If you want to inject some specific statistical information from another dataset into your target LM, you could interpolate (mix) the two LMs to obtain a third LM. See the description of the ngram -mix-lm option. Andreas > > On 10/25/11 3:41 PM, Burkay Gur wrote: >> Hi, >> >> I have just started using SRILM, and it is a great tool. But I ran >> across this issue.
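The -mix-lm interpolation Andreas suggests amounts to linearly mixing the two models' probabilities. The sketch below is not SRILM code; it is a toy Python model of that operation, using made-up trigram probabilities purely for illustration:

```python
# Linear interpolation of two trigram models, the operation behind
# ngram -mix-lm.  The probability tables are hand-made toy values,
# not real SRILM output; SRILM mixes full backoff models, while this
# sketch mixes plain probability tables.

def interpolate(p_a, p_b, lam=0.5):
    """p_mix(t) = lam * p_a(t) + (1 - lam) * p_b(t) over the union of
    trigrams.  Here a trigram absent from one table contributes 0;
    SRILM would use that model's backoff estimate instead, so every
    trigram stays non-zero."""
    return {t: lam * p_a.get(t, 0.0) + (1 - lam) * p_b.get(t, 0.0)
            for t in set(p_a) | set(p_b)}

p_a = {("this", "is", "a"): 0.5, ("is", "a", "test"): 0.5}   # "corpusA"
p_b = {("that", "is", "a"): 0.5, ("is", "a", "test"): 0.5}   # "corpusB"

p_mix = interpolate(p_a, p_b)
# "that is a" never occurred in corpusA, yet the mixture gives it
# probability 0.25, and the mixed distribution still sums to 1.
assert p_mix[("that", "is", "a")] == 0.25
assert abs(sum(p_mix.values()) - 1.0) < 1e-12
```

With real models, the analogous SRILM invocation would be along the lines of `ngram -lm A.lm -mix-lm B.lm -lambda 0.5 -write-lm mix.lm`.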
The situation is that I have: >> >> corpusA.txt >> corpusB.txt >> >> What I want to do is create two different 3-gram language models for >> both corpora. But I want to make sure that if a triplet is >> non-existent in the other corpus, then a smoothed probability should >> be assigned to that. For example; >> >> if corpusA has triplet counts: >> >> this is a 1 >> is a test 1 >> >> and corpusB has triplet counts: >> >> that is a 1 >> is a test 1 >> >> then the final counts for corpusA should be: >> >> this is a 1 >> is a test 1 >> that is a 0 >> >> because "that is a" is in B but not A. >> >> similarly corpusB should be: >> >> that is a 1 >> is a test 1 >> this is a 0 >> >> After the counts are setup, some smoothing algorithm might be used. I >> have manually tried to make the triple word counts 0, however it does >> not seem to work. As they are omitted from 3-grams. >> >> Can you recommend any other ways of doing this? >> >> Thank you, >> Burkay >> > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user From burkay at mit.edu Tue Oct 25 17:29:35 2011 From: burkay at mit.edu (Burkay Gur) Date: Tue, 25 Oct 2011 20:29:35 -0400 Subject: [SRILM User List] Follow Up: Question about 3-gram Language Model with OOV triplets In-Reply-To: <4EA73A71.1020002@icsi.berkeley.edu> References: <4EA710F7.7020802@mit.edu> <4EA73551.6070104@mit.edu> <4EA73A71.1020002@icsi.berkeley.edu> Message-ID: <4EA7546F.6020506@mit.edu> thank you, i understand that. but the problem is, like you said, how do we introduce these "unobserved trigrams" into the language model. i ll give another example if it helps: say you have this test.count file: 1-gram this is a test 2-gram this is is a a test 3-gram this is a is a test then, say you want to extend this language model with this trigram: "this is not" which basically has no previous count. 
and without smoothing in the 3-gram model, it will have zero probability. but how do we make sure that the smoothed language model has a non-zero probability for this additional trigram? i thought i could do this manually by updating the test.count with "this is not" with count 0. but apparently this is not working... On 10/25/11 6:38 PM, Andreas Stolcke wrote: > Burkay Gur wrote: >> To follow up, basically, when I edit the .count file and add 0 counts >> for some trigrams, they will not be included in the final .lm file, >> when I try to read from the .count file and create a language model. > A zero count is complete equivalent to a non-existent count, so what > you're seeing it expected. > > It is not clear what precisely you want to happen. As a result of > discounting and backing off, your LM, even without the unobserved > trigram, will already assign a non-zero probability to that trigram. > That's exactly what the ngram smoothing algorithms are for. > > If you want to inject some specific statistical information rom > another dataset into your target LM you could interpolate (mix) the > two LMs to obtain a third LM. See the description of the ngram > -mix-lm option. > > Andreas > >> >> On 10/25/11 3:41 PM, Burkay Gur wrote: >>> Hi, >>> >>> I have just started using SRILM, and it is a great tool. But I ran >>> across this issue. The situation is that I have: >>> >>> corpusA.txt >>> corpusB.txt >>> >>> What I want to do is create two different 3-gram language models for >>> both corpora. But I want to make sure that if a triplet is >>> non-existent in the other corpus, then a smoothed probability should >>> be assigned to that. For example; >>> >>> if corpusA has triplet counts: >>> >>> this is a 1 >>> is a test 1 >>> >>> and corpusB has triplet counts: >>> >>> that is a 1 >>> is a test 1 >>> >>> then the final counts for corpusA should be: >>> >>> this is a 1 >>> is a test 1 >>> that is a 0 >>> >>> because "that is a" is in B but not A.
>>> >>> similarly corpusB should be: >>> >>> that is a 1 >>> is a test 1 >>> this is a 0 >>> >>> After the counts are setup, some smoothing algorithm might be used. >>> I have manually tried to make the triple word counts 0, however it >>> does not seem to work. As they are omitted from 3-grams. >>> >>> Can you recommend any other ways of doing this? >>> >>> Thank you, >>> Burkay >>> >> >> _______________________________________________ >> SRILM-User site list >> SRILM-User at speech.sri.com >> http://www.speech.sri.com/mailman/listinfo/srilm-user > From stolcke at icsi.berkeley.edu Tue Oct 25 19:10:40 2011 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 25 Oct 2011 19:10:40 -0700 Subject: [SRILM User List] Follow Up: Question about 3-gram Language Model with OOV triplets In-Reply-To: <4EA7546F.6020506@mit.edu> References: <4EA710F7.7020802@mit.edu> <4EA73551.6070104@mit.edu> <4EA73A71.1020002@icsi.berkeley.edu> <4EA7546F.6020506@mit.edu> Message-ID: <4EA76C20.6010005@icsi.berkeley.edu> Burkay Gur wrote: > thank you, i understand that. but the problem is, like you said, how > do we introduce these "unobserved trigrams" into the language model. i > ll give another example if it helps: > > say you have this test.count file: > > 1-gram > this > is > a > test > > 2-gram > this is > is a > a test > > 3-gram > this is a > is a test > > then, say you want to extend this language model with this trigram: > > "this is not" > > which basically has no previous count. and without smoothing in the > 3-gram model, it will have zero probability. but how do we make sure > that the smooth language model has a non-zero probability for this > additional trigram? > > i thought i could do this my manually by updating the test.count with > "this is not" with count 0. but apparently this is not working.. The smoothed 3gram LM will have a non-zero probability, for ALL trigrams, trust me ;-) Try echo "this is not" | ngram -lm LM -ppl - -debug 2 to see it in action. 
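To see why a smoothed model covers every trigram, here is a toy interpolated model in Python. It uses Witten-Bell-style interpolation as a stand-in for SRILM's default Good-Turing/Katz backoff (the numbers therefore differ from real SRILM output, but the qualitative behavior is the same): the unseen trigram "this is not" gets a non-zero probability, and the distribution over the whole vocabulary still sums to 1.

```python
# Toy interpolated Witten-Bell smoothing.  This is NOT SRILM's default
# scheme, just a minimal stand-in showing why a smoothed trigram model
# assigns every trigram a non-zero probability.
from collections import Counter

tokens = "this is a test".split()
vocab = sorted(set(tokens) | {"not"})   # closed vocabulary including "not"

# n-gram counts for n = 1..3, as ngram-count would collect them
counts = {n: Counter(tuple(tokens[i:i + n])
                     for i in range(len(tokens) - n + 1))
          for n in (1, 2, 3)}

def p_wb(word, history=()):
    """P(word | history), interpolating each order with the next lower
    one; the recursion bottoms out in a uniform distribution."""
    if not history:
        lower = 1.0 / len(vocab)                 # uniform base case
        c_h = sum(counts[1].values())            # total tokens
        t_h = len(counts[1])                     # distinct word types
        c_hw = counts[1].get((word,), 0)
    else:
        lower = p_wb(word, history[1:])          # back off to shorter history
        n = len(history) + 1
        c_h = sum(c for g, c in counts[n].items() if g[:-1] == history)
        t_h = sum(1 for g in counts[n] if g[:-1] == history)
        c_hw = counts[n].get(history + (word,), 0)
    if c_h == 0:                                 # history never seen at all
        return lower
    return (c_hw + t_h * lower) / (c_h + t_h)

p = p_wb("not", ("this", "is"))    # trigram never seen in training
assert p > 0                       # ...yet it gets probability mass
assert abs(sum(p_wb(w, ("this", "is")) for w in vocab) - 1.0) < 1e-9
```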
Andreas > > On 10/25/11 6:38 PM, Andreas Stolcke wrote: >> Burkay Gur wrote: >>> To follow up, basically, when I edit the .count file and add 0 >>> counts for some trigrams, they will not be included in the final .lm >>> file, when I try to read from the .count file and create a language >>> model. >> A zero count is complete equivalent to a non-existent count, so >> what you're seeing it expected. >> >> It is not clear what precisely you want to happen. As a result of >> discounting and backing off, your LM, even without the unobserved >> trigram, will already assign a non-zero probability to that trigram. >> That's exactly what the ngram smoothing algorithms are for. >> >> If you want to inject some specific statistical information rom >> another dataset into your target LM you could interpolate (mix) the >> two LMs to obtain a third LM. See the description of the ngram >> -mix-lm option. >> >> Andreas >> >>> >>> On 10/25/11 3:41 PM, Burkay Gur wrote: >>>> Hi, >>>> >>>> I have just started using SRILM, and it is a great tool. But I ran >>>> across this issue. The situation is that I have: >>>> >>>> corpusA.txt >>>> corpusB.txt >>>> >>>> What I want to do is create two different 3-gram language models >>>> for both corpora. But I want to make sure that if a triplet is >>>> non-existent in the other corpus, then a smoothed probability >>>> should be assigned to that. For example; >>>> >>>> if corpusA has triplet counts: >>>> >>>> this is a 1 >>>> is a test 1 >>>> >>>> and corpusB has triplet counts: >>>> >>>> that is a 1 >>>> is a test 1 >>>> >>>> then the final counts for corpusA should be: >>>> >>>> this is a 1 >>>> is a test 1 >>>> that is a 0 >>>> >>>> because "that is a" is in B but not A. >>>> >>>> similarly corpusB should be: >>>> >>>> that is a 1 >>>> is a test 1 >>>> this is a 0 >>>> >>>> After the counts are setup, some smoothing algorithm might be used. >>>> I have manually tried to make the triple word counts 0, however it >>>> does not seem to work. 
As they are omitted from 3-grams. >>>> >>>> Can you recommend any other ways of doing this? >>>> >>>> Thank you, >>>> Burkay >>>> >>> >>> _______________________________________________ >>> SRILM-User site list >>> SRILM-User at speech.sri.com >>> http://www.speech.sri.com/mailman/listinfo/srilm-user >> > From burkay at mit.edu Tue Oct 25 19:53:06 2011 From: burkay at mit.edu (Burkay Gur) Date: Tue, 25 Oct 2011 22:53:06 -0400 Subject: [SRILM User List] Follow Up: Question about 3-gram Language Model with OOV triplets In-Reply-To: <4EA76C20.6010005@icsi.berkeley.edu> References: <4EA710F7.7020802@mit.edu> <4EA73551.6070104@mit.edu> <4EA73A71.1020002@icsi.berkeley.edu> <4EA7546F.6020506@mit.edu> <4EA76C20.6010005@icsi.berkeley.edu> Message-ID: <50DBF2C0-634E-4391-8379-FD5017CF198E@mit.edu> But we have not even added "this is not" into the language model yet. If it is not a hard task, can you write a sample to show me how this works? On Oct 25, 2011, at 10:10 PM, Andreas Stolcke wrote: > Burkay Gur wrote: >> thank you, i understand that. but the problem is, like you said, how do we introduce these "unobserved trigrams" into the language model. i ll give another example if it helps: >> >> say you have this test.count file: >> >> 1-gram >> this >> is >> a >> test >> >> 2-gram >> this is >> is a >> a test >> >> 3-gram >> this is a >> is a test >> >> then, say you want to extend this language model with this trigram: >> >> "this is not" >> >> which basically has no previous count. and without smoothing in the 3-gram model, it will have zero probability. but how do we make sure that the smooth language model has a non-zero probability for this additional trigram? >> >> i thought i could do this my manually by updating the test.count with "this is not" with count 0. but apparently this is not working.. > The smoothed 3gram LM will have a non-zero probability, for ALL trigrams, trust me ;-) > > Try > echo "this is not" | ngram -lm LM -ppl - -debug 2 > > to see it in action. 
> > Andreas > >> >> On 10/25/11 6:38 PM, Andreas Stolcke wrote: >>> Burkay Gur wrote: >>>> To follow up, basically, when I edit the .count file and add 0 counts for some trigrams, they will not be included in the final .lm file, when I try to read from the .count file and create a language model. >>> A zero count is complete equivalent to a non-existent count, so what you're seeing it expected. >>> >>> It is not clear what precisely you want to happen. As a result of discounting and backing off, your LM, even without the unobserved trigram, will already assign a non-zero probability to that trigram. That's exactly what the ngram smoothing algorithms are for. >>> >>> If you want to inject some specific statistical information rom another dataset into your target LM you could interpolate (mix) the two LMs to obtain a third LM. See the description of the ngram -mix-lm option. >>> >>> Andreas >>> >>>> >>>> On 10/25/11 3:41 PM, Burkay Gur wrote: >>>>> Hi, >>>>> >>>>> I have just started using SRILM, and it is a great tool. But I ran across this issue. The situation is that I have: >>>>> >>>>> corpusA.txt >>>>> corpusB.txt >>>>> >>>>> What I want to do is create two different 3-gram language models for both corpora. But I want to make sure that if a triplet is non-existent in the other corpus, then a smoothed probability should be assigned to that. For example; >>>>> >>>>> if corpusA has triplet counts: >>>>> >>>>> this is a 1 >>>>> is a test 1 >>>>> >>>>> and corpusB has triplet counts: >>>>> >>>>> that is a 1 >>>>> is a test 1 >>>>> >>>>> then the final counts for corpusA should be: >>>>> >>>>> this is a 1 >>>>> is a test 1 >>>>> that is a 0 >>>>> >>>>> because "that is a" is in B but not A. >>>>> >>>>> similarly corpusB should be: >>>>> >>>>> that is a 1 >>>>> is a test 1 >>>>> this is a 0 >>>>> >>>>> After the counts are setup, some smoothing algorithm might be used. I have manually tried to make the triple word counts 0, however it does not seem to work. 
As they are omitted from 3-grams. >>>>> >>>>> Can you recommend any other ways of doing this? >>>>> >>>>> Thank you, >>>>> Burkay >>>>> >>>> >>>> _______________________________________________ >>>> SRILM-User site list >>>> SRILM-User at speech.sri.com >>>> http://www.speech.sri.com/mailman/listinfo/srilm-user >>> >> > From stolcke at icsi.berkeley.edu Tue Oct 25 20:54:41 2011 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 25 Oct 2011 20:54:41 -0700 Subject: [SRILM User List] Follow Up: Question about 3-gram Language Model with OOV triplets In-Reply-To: Your message of Tue, 25 Oct 2011 22:53:06 -0400. <50DBF2C0-634E-4391-8379-FD5017CF198E@mit.edu> Message-ID: <201110260354.p9Q3sfS9009896@fruitcake.ICSI.Berkeley.EDU> In message <50DBF2C0-634E-4391-8379-FD5017CF198E at mit.edu> you wrote: > But we have not even added "this is not" into the language model yet. If it is not a hard task, can you write a sample to show me how this works? There is no need to "add" this trigram to the LM. It can compute a non-zero probability for it even if it hasn't occurred in the training data. I suggest you review the basics of N-gram LM smoothing as described in the two textbook chapters referenced at http://www.speech.sri.com/projects/srilm/ . Andreas > > On Oct 25, 2011, at 10:10 PM, Andreas Stolcke wrote: > > > Burkay Gur wrote: > >> thank you, i understand that. but the problem is, like you said, how do we introduce these "unobserved trigrams" into the language model. i ll give another example if it helps: > >> > >> say you have this test.count file: > >> > >> 1-gram > >> this > >> is > >> a > >> test > >> > >> 2-gram > >> this is > >> is a > >> a test > >> > >> 3-gram > >> this is a > >> is a test > >> > >> then, say you want to extend this language model with this trigram: > >> > >> "this is not" > >> > >> which basically has no previous count. and without smoothing in the 3-gram model, it will have zero probability.
but how do we make sure that the smooth language model has a non-zero probability for this additional trigram? > >> > >> i thought i could do this my manually by updating the test.count with "this is not" with count 0. but apparently this is not working.. > > The smoothed 3gram LM will have a non-zero probability, for ALL trigrams, trust me ;-) > > > > Try > > echo "this is not" | ngram -lm LM -ppl - -debug 2 > > > > to see it in action. > > > > Andreas > > > >> > >> On 10/25/11 6:38 PM, Andreas Stolcke wrote: > >>> Burkay Gur wrote: > >>>> To follow up, basically, when I edit the .count file and add 0 counts for some trigrams, they will not be included in the final .lm file, when I try to read from the .count file and create a language model. > >>> A zero count is complete equivalent to a non-existent count, so what you're seeing it expected. > >>> > >>> It is not clear what precisely you want to happen. As a result of discounting and backing off, your LM, even without the unobserved trigram, will already assign a non-zero probability to that trigram. That's exactly what the ngram smoothing algorithms are for. > >>> > >>> If you want to inject some specific statistical information rom another dataset into your target LM you could interpolate (mix) the two LMs to obtain a third LM. See the description of the ngram -mix-lm option. > >>> > >>> Andreas > >>> > >>>> > >>>> On 10/25/11 3:41 PM, Burkay Gur wrote: > >>>>> Hi, > >>>>> > >>>>> I have just started using SRILM, and it is a great tool. But I ran across this issue. The situation is that I have: > >>>>> > >>>>> corpusA.txt > >>>>> corpusB.txt > >>>>> > >>>>> What I want to do is create two different 3-gram language models for both corpora. But I want to make sure that if a triplet is non-existent in the other corpus, then a smoothed probability should be assigned to that.
For example; > >>>>> > >>>>> if corpusA has triplet counts: > >>>>> > >>>>> this is a 1 > >>>>> is a test 1 > >>>>> > >>>>> and corpusB has triplet counts: > >>>>> > >>>>> that is a 1 > >>>>> is a test 1 > >>>>> > >>>>> then the final counts for corpusA should be: > >>>>> > >>>>> this is a 1 > >>>>> is a test 1 > >>>>> that is a 0 > >>>>> > >>>>> because "that is a" is in B but not A. > >>>>> > >>>>> similarly corpusB should be: > >>>>> > >>>>> that is a 1 > >>>>> is a test 1 > >>>>> this is a 0 > >>>>> > >>>>> After the counts are setup, some smoothing algorithm might be used. I have manually tried to make the triple word counts 0, however it does not seem to work. As they are omitted from 3-grams. > >>>>> > >>>>> Can you recommend any other ways of doing this? > >>>>> > >>>>> Thank you, > >>>>> Burkay > >>>>> > >>>> > >>>> _______________________________________________ > >>>> SRILM-User site list > >>>> SRILM-User at speech.sri.com > >>>> http://www.speech.sri.com/mailman/listinfo/srilm-user > >>> > >> > > > --Andreas From stolcke at icsi.berkeley.edu Wed Oct 26 15:52:33 2011 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 26 Oct 2011 15:52:33 -0700 Subject: [SRILM User List] Follow Up: Question about 3-gram Language Model with OOV triplets In-Reply-To: <4EA82FA5.6070103@mit.edu> References: <4EA710F7.7020802@mit.edu> <4EA73551.6070104@mit.edu> <4EA73A71.1020002@icsi.berkeley.edu> <4EA7546F.6020506@mit.edu> <4EA76C20.6010005@icsi.berkeley.edu> <4EA82FA5.6070103@mit.edu> Message-ID: <4EA88F31.10004@icsi.berkeley.edu> Burkay Gur wrote: > Try > echo "this is not" | ngram -lm LM -ppl - -debug 2 > > > ok, this returns a non-zero probability. but i want to now include > "this is not" in the language model. and still have all the > probabilities in the language model sum up to 1. > > in other words i want to expand my language model with multiple > tri-grams that are unseen events.
> > maybe if i tell you the main reason why i want to do this, it will be > more clear. > > i am trying to find the symmetric KL divergence of two distributions, > and these two distributions will be two language models. > > the formula for symmetric KL divergence is: > > i being all trigrams in both models: > > sum[ p(i) * log(p(i) / q(i)) ] + sum[ q(i) * > log(q(i) / p(i)) ] > > sums are over all i's. > > p(i) is the probability in language model 1, and q(i) is the > probability in language model 2. > > since we are doing this over all i's, it means we have to include the > probabilities of trigrams that occur in one LM and not the other in > that particular LM. otherwise we will get a log(0) error. so we will > need some kind of smoothing. But you don't get log(0), because the LM is smoothed and therefore the p's and q's are all > 0. BTW, you only get a problem when the probability in the denominator is 0 while the weight in front of the log is not, because 0 * log(0) = 0. So you can sum over the UNION of all ngrams in both models, and when you need to compute the p(i) or q(i) for an ngram that is not in the particular model, you use the backoff estimate (i.e., just what SRILM will compute when you ask it for a probability that is not explicitly represented in the model). BTW, for this type of thing you want to use ngram -counts, and then postprocess the output. Andreas > > say LM1 has these trigrams: > > a 1/3 > b 1/3 > c 1/3 > > and LM2 has these: > > a 1/2 > d 1/2 > > now when we're doing the KL divergence calculation, we need to make > sure "d" is in LM1, and also "b" and "c" are in LM2. otherwise we'll > get log(0). so we'll need to modify LM1 and LM2 by smoothing, so they > include non-zero probabilities for b, c and d, and still each sum > up to 1. > > if we use the test-training approach, and try to see the probabilities > of unseen events, we are not updating our current LM to include those > unseen events. in fact that is what i want to do.
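The recipe Andreas gives (sum over the union of trigrams, take smoothed probabilities from each model, rely on the 0 * log 0 = 0 convention) can be sketched in Python. The probability tables below are illustrative toy values, not SRILM output; in practice they would come from each model, e.g. via ngram -counts:

```python
# Symmetric KL divergence over the union of trigrams from two smoothed
# models.  The toy tables p and q below are hand-made for illustration.
import math

def sym_kl(p, q):
    """D(p||q) + D(q||p) over the union of n-grams.  A term whose
    weight is 0 contributes 0 (the 0 * log 0 = 0 convention); if one
    model assigns probability 0 to an n-gram the other model needs,
    the division fails -- which is exactly why both models must be
    smoothed first."""
    total = 0.0
    for g in set(p) | set(q):
        pi, qi = p.get(g, 0.0), q.get(g, 0.0)
        if pi > 0:
            total += pi * math.log(pi / qi)
        if qi > 0:
            total += qi * math.log(qi / pi)
    return total

# Smoothed toy distributions: each sums to 1 and covers both models' trigrams.
p = {("this", "is", "a"): 0.45, ("is", "a", "test"): 0.45, ("that", "is", "a"): 0.10}
q = {("that", "is", "a"): 0.45, ("is", "a", "test"): 0.45, ("this", "is", "a"): 0.10}

assert sym_kl(p, p) == 0.0   # identical distributions: zero divergence
assert sym_kl(p, q) > 0      # differing distributions: positive divergence
```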
include a list of > unseen trigrams, (that might might possibly have lower orders of > n-grams in the model) in that language model. > > > On 10/25/11 10:10 PM, Andreas Stolcke wrote: >> Burkay Gur wrote: >>> thank you, i understand that. but the problem is, like you said, how >>> do we introduce these "unobserved trigrams" into the language model. >>> i ll give another example if it helps: >>> >>> say you have this test.count file: >>> >>> 1-gram >>> this >>> is >>> a >>> test >>> >>> 2-gram >>> this is >>> is a >>> a test >>> >>> 3-gram >>> this is a >>> is a test >>> >>> then, say you want to extend this language model with this trigram: >>> >>> "this is not" >>> >>> which basically has no previous count. and without smoothing in the >>> 3-gram model, it will have zero probability. but how do we make sure >>> that the smooth language model has a non-zero probability for this >>> additional trigram? >>> >>> i thought i could do this my manually by updating the test.count >>> with "this is not" with count 0. but apparently this is not working.. >> The smoothed 3gram LM will have a non-zero probability, for ALL >> trigrams, trust me ;-) >> >> Try >> echo "this is not" | ngram -lm LM -ppl - -debug 2 >> >> to see it in action. >> >> Andreas >> >>> >>> On 10/25/11 6:38 PM, Andreas Stolcke wrote: >>>> Burkay Gur wrote: >>>>> To follow up, basically, when I edit the .count file and add 0 >>>>> counts for some trigrams, they will not be included in the final >>>>> .lm file, when I try to read from the .count file and create a >>>>> language model. >>>> A zero count is complete equivalent to a non-existent count, so >>>> what you're seeing it expected. >>>> >>>> It is not clear what precisely you want to happen. As a result of >>>> discounting and backing off, your LM, even without the unobserved >>>> trigram, will already assign a non-zero probability to that >>>> trigram. That's exactly what the ngram smoothing algorithms are for. 
>>>> >>>> If you want to inject some specific statistical information rom >>>> another dataset into your target LM you could interpolate (mix) the >>>> two LMs to obtain a third LM. See the description of the ngram >>>> -mix-lm option. >>>> >>>> Andreas >>>> >>>>> >>>>> On 10/25/11 3:41 PM, Burkay Gur wrote: >>>>>> Hi, >>>>>> >>>>>> I have just started using SRILM, and it is a great tool. But I >>>>>> ran across this issue. The situation is that I have: >>>>>> >>>>>> corpusA.txt >>>>>> corpusB.txt >>>>>> >>>>>> What I want to do is create two different 3-gram language models >>>>>> for both corpora. But I want to make sure that if a triplet is >>>>>> non-existent in the other corpus, then a smoothed probability >>>>>> should be assigned to that. For example; >>>>>> >>>>>> if corpusA has triplet counts: >>>>>> >>>>>> this is a 1 >>>>>> is a test 1 >>>>>> >>>>>> and corpusB has triplet counts: >>>>>> >>>>>> that is a 1 >>>>>> is a test 1 >>>>>> >>>>>> then the final counts for corpusA should be: >>>>>> >>>>>> this is a 1 >>>>>> is a test 1 >>>>>> that is a 0 >>>>>> >>>>>> because "that is a" is in B but not A. >>>>>> >>>>>> similarly corpusB should be: >>>>>> >>>>>> that is a 1 >>>>>> is a test 1 >>>>>> this is a 0 >>>>>> >>>>>> After the counts are setup, some smoothing algorithm might be >>>>>> used. I have manually tried to make the triple word counts 0, >>>>>> however it does not seem to work. As they are omitted from 3-grams. >>>>>> >>>>>> Can you recommend any other ways of doing this? 
>>>>>> >>>>>> Thank you, >>>>>> Burkay >>>>>> >>>>> >>>>> _______________________________________________ >>>>> SRILM-User site list >>>>> SRILM-User at speech.sri.com >>>>> http://www.speech.sri.com/mailman/listinfo/srilm-user >>>> >>> >> > From wuxichuan.go at gmail.com Tue Nov 1 10:36:39 2011 From: wuxichuan.go at gmail.com (Xichuan Wu) Date: Tue, 1 Nov 2011 18:36:39 +0100 Subject: [SRILM User List] Problem on Installing SRILM Message-ID: Hi All, I have been trying to install SRILM but have run into a problem that googling has not helped with. Some info about the platform: Win7, 32-bit, Cygwin including '*csh*' and '*tcsh*'. I am working with the *Joshua decoder*. After downloading and unzipping *srilm.tgz*, I tried the *make* command and got the following: make: /sbin/machine-type: Command not found mkdir include lib bin mkdir: cannot create directory `include': File exists mkdir: cannot create directory `lib': File exists mkdir: cannot create directory `bin': File exists make: [dirs] Error 1 (ignored) make init make[1]: /sbin/machine-type: Command not found make[1]: Entering directory `/cygdrive/f/CL/Drei/Project/Joshua/srilm' make[1]: Entering directory `/cygdrive/f/CL/Drei/Project/Joshua/srilm' for subdir in misc dstruct lm flm lattice utils; do \ (cd $subdir/src; make SRILM= MACHINE_TYPE= OPTION= MAKE_PIC= init) || exit 1; \ done make[2]: Entering directory `/cygdrive/f/CL/Drei/Project/Joshua/srilm/misc/src' Makefile:24: /common/Makefile.common.variables: No such file or directory Makefile:139: /common/Makefile.common.targets: No such file or directory make[2]: *** No rule to make target `/common/Makefile.common.targets'. Stop.
make[2]: Leaving directory `/cygdrive/f/CL/Drei/Project/Joshua/srilm/misc/src' make[1]: *** [init] Error 1 make[1]: Leaving directory `/cygdrive/f/CL/Drei/Project/Joshua/srilm' make: *** [World] Error 2 I then changed the top-level *Makefile*, specifically setting *PACKAGE_DIR = F:/CL/Drei/Project/Joshua/srilm*, which is the directory *srilm.tgz* was unzipped into. When I try the make command again, I get the following: Makefile:100: *** target pattern contains no `%'. Stop. I know there is some problem with line 100 in the *Makefile*, which is: package: $(PACKAGE_DIR)/EXCLUDE $(TAR) cvzXf $(PACKAGE_DIR)/EXCLUDE $(PACKAGE_DIR)/srilm-$(RELEASE).tar.gz . Where should I add `%'? Or is there some other problem? Please help. Thanks. Xichuan From stolcke at icsi.berkeley.edu Tue Nov 1 14:27:37 2011 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 01 Nov 2011 14:27:37 -0700 Subject: [SRILM User List] Problem on Installing SRILM In-Reply-To: Your message of Tue, 01 Nov 2011 18:36:39 +0100. Message-ID: <201111012128.pA1LRbXX022986@fruitcake.ICSI.Berkeley.EDU> You didn't set the SRILM variable correctly. Either edit the top-level Makefile or invoke make with make SRILM=/absolute/path/to/srilm World Do not change PACKAGE_DIR; change the SRILM variable instead. Do not use DOS-style path names (F:\...). Use Cygwin paths, like /home/username/srilm. Andreas In message you wrote: > > Hi All, > > I have been trying to install SRILM but confronted with one problem, which > googling does not help. Some infos about the platform: Win7, 32bit, Cygwin > including '*csh*' and '*tcsh*'. I am working with *Joshua decoder*.
> > After downloading and unzipping *srilm.tgz*, I tried *make* command and got > the following: > make: /sbin/machine-type: Command not found > mkdir include lib bin > mkdir: cannot create directory `include': File exists > mkdir: cannot create directory `lib': File exists > mkdir: cannot create directory `bin': File exists > make: [dirs] Error 1 (ignored) > make init > make[1]: /sbin/machine-type: Command not found > make[1]: Entering directory `/cygdrive/f/CL/Drei/Project/Joshua/srilm' > make[1]: Entering directory `/cygdrive/f/CL/Drei/Project/Joshua/srilm' > for subdir in misc dstruct lm flm lattice utils; do \ > (cd $subdir/src; make SRILM= MACHINE_TYPE= OPTION= MAKE_PIC= init) || exit > 1; \ > done > make[2]: Entering directory > `/cygdrive/f/CL/Drei/Project/Joshua/srilm/misc/src' > Makefile:24: /common/Makefile.common.variables: No such file or directory > Makefile:139: /common/Makefile.common.targets: No such file or directory > make[2]: *** No rule to make target `/common/Makefile.common.targets'. > Stop. > make[2]: Leaving directory > `/cygdrive/f/CL/Drei/Project/Joshua/srilm/misc/src' > make[1]: *** [init] Error 1 > make[1]: Leaving directory `/cygdrive/f/CL/Drei/Project/Joshua/srilm' > make: *** [World] Error 2 > > > After I changed *Makefile* in the top level, specifically *PACKAGE_DIR = > F:/CL/Drei/Project/Joshua/srilm*, where the directory is the one > *srilm.tgz*unzipped into. When I try make command, what I get is the > following: > > Makefile:100: *** target pattern contains no `%'. Stop. > From d_emps at yahoo.com Sat Nov 19 07:13:02 2011 From: d_emps at yahoo.com (Simon h s) Date: Sat, 19 Nov 2011 07:13:02 -0800 (PST) Subject: [SRILM User List] problem installing ubuntu 11.10 Message-ID: <1321715582.63739.YahooMailNeo@web110608.mail.gq1.yahoo.com> Dear all,? 
I have a problem when compiling SRILM in Ubuntu 11.10. After installing all the required packages mentioned in INSTALL, I have the following error when running make World: make[2]: Entering directory `/home/ndriks/Thesis/tools/atools/srilm/misc/src' /usr/bin/gcc -march=athlon64 -m64 -Wall -Wno-unused-variable -Wno-uninitialized -D_FILE_OFFSET_BITS=64 /usr/include/tcl8.5/tcl.h -I. -I../../include -c -g -O3 -o ../obj/i686/option.o option.c gcc: fatal error: cannot specify -o with -c, -S or -E with multiple files compilation terminated. make[2]: *** [../obj/i686/option.o] Error 4 make[2]: Leaving directory `/home/ndriks/Thesis/tools/atools/srilm/misc/src' make[1]: *** [release-libraries] Error 1 make[1]: Leaving directory `/home/ndriks/Thesis/tools/atools/srilm' make: *** [World] Error 2 The full error is attached. FYI I'm using srilm 1.5.12 Please help? Thanks in advance -- Simon H S -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: makeworld.out Type: application/octet-stream Size: 13574 bytes Desc: not available URL: From wuxichuan.go at gmail.com Sat Nov 19 10:00:59 2011 From: wuxichuan.go at gmail.com (Xichuan Wu) Date: Sat, 19 Nov 2011 19:00:59 +0100 Subject: [SRILM User List] problem installing ubuntu 11.10 In-Reply-To: <1321715582.63739.YahooMailNeo@web110608.mail.gq1.yahoo.com> References: <1321715582.63739.YahooMailNeo@web110608.mail.gq1.yahoo.com> Message-ID: Hi Simon, I got this working recently on Ubuntu 11.10. Here is a suggestion you can follow: 1.
In the file common/Makefile.machine.$MACHINE_TYPE (under the SRILM top-level directory), change GCC_FLAGS = -march=athlon64 -m64 -Wall -Wno-unused-variable -Wno-uninitialized to GCC_FLAGS = -march=athlon64 -m64 -Wall -Wno-unused-variable -Wno-uninitialized -fPIC Then run with the command: make MAKE_PIC=yes MACHINE_TYPE=$MACHINE_TYPE World Note that 1) here $MACHINE_TYPE refers to your machine type (you can try either "i686-m64" or "i686-ubuntu"); 2) before the make command, you might need to use "make clean" to clean what's left from the previous compile. 2. Consult the "Joshua technical support" Google group, where you will find more info. Good luck! Xichuan On Sat, Nov 19, 2011 at 4:13 PM, Simon h s wrote: > Dear all, > > I have a problem when compiling SRILM in Ubuntu 11.10 > > after installing all required package mentioned in INSTALL, I have the > following error when running make World: > > make[2]: Entering directory > `/home/ndriks/Thesis/tools/atools/srilm/misc/src' > /usr/bin/gcc -march=athlon64 -m64 -Wall -Wno-unused-variable > -Wno-uninitialized -D_FILE_OFFSET_BITS=64 /usr/include/tcl8.5/tcl.h -I. > -I../../include -c -g -O3 -o ../obj/i686/option.o option.c > gcc: fatal error: cannot specify -o with -c, -S or -E with multiple files > compilation terminated. > make[2]: *** [../obj/i686/option.o] Error 4 > make[2]: Leaving directory > `/home/ndriks/Thesis/tools/atools/srilm/misc/src' > make[1]: *** [release-libraries] Error 1 > make[1]: Leaving directory `/home/ndriks/Thesis/tools/atools/srilm' > make: *** [World] Error 2 > > full error is attached. > > FYI I'm using srilm 1.5.12 > > Please help? > Thanks before > > -- > Simon H S > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user -------------- next part -------------- An HTML attachment was scrubbed...
URL: From stolcke at icsi.berkeley.edu Sat Nov 19 14:39:35 2011 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Sat, 19 Nov 2011 14:39:35 -0800 Subject: [SRILM User List] problem installing ubuntu 11.10 In-Reply-To: <1321715582.63739.YahooMailNeo@web110608.mail.gq1.yahoo.com> References: <1321715582.63739.YahooMailNeo@web110608.mail.gq1.yahoo.com> Message-ID: <4EC83027.3000507@icsi.berkeley.edu> Simon h s wrote: > Dear all, > > I have a problem when compiling SRILM in Ubuntu 11.10 > > after installing all required package mentioned in INSTALL, I have the > following error when running make World: > > make[2]: Entering directory > `/home/ndriks/Thesis/tools/atools/srilm/misc/src' > /usr/bin/gcc -march=athlon64 -m64 -Wall -Wno-unused-variable > -Wno-uninitialized -D_FILE_OFFSET_BITS=64 /usr/include/tcl8.5/tcl.h > -I. -I../../include -c -g -O3 -o ../obj/i686/option.o option.c > gcc: fatal error: cannot specify -o with -c, -S or -E with multiple files > compilation terminated. > make[2]: *** [../obj/i686/option.o] Error 4 > make[2]: Leaving directory > `/home/ndriks/Thesis/tools/atools/srilm/misc/src' > make[1]: *** [release-libraries] Error 1 > make[1]: Leaving directory `/home/ndriks/Thesis/tools/atools/srilm' > make: *** [World] Error 2 I believe the problem is caused by having TCL_INCLUDE = /usr/include/tcl8.5/tcl.h You should use TCL_INCLUDE = -I/usr/include/tcl8.5 instead in Makefile.machine.i686 (or Makefile.machine.i686-m64). 
Andreas From dmytro.prylipko at ovgu.de Fri Dec 16 01:47:57 2011 From: dmytro.prylipko at ovgu.de (Dmytro Prylipko) Date: Fri, 16 Dec 2011 10:47:57 +0100 Subject: [SRILM User List] A problem with expanding class-based LMs Message-ID: <4EEB13CD.1050007@ovgu.de> Hi Andreas, I have a class-based LM, which gives a particular perplexity value on the test set: ngram -ppl test.fold3.txt -lm 2-gram.class.dd150.fold3.lm -classes class.dd150.fold3.defs -order 2 -vocab ../all.wlist file test.fold3.txt: 1397 sentences, 37403 words, 0 OOVs 427 zeroprobs, logprob= -72617.1 ppl= 78.0551 ppl1= 92.0235 I expanded it and got a word-level model: ngram -lm 2-gram.class.dd150.fold3.lm -classes class.dd150.fold3.defs -order 2 -write-lm 2-gram.class.dd150.expanded_exact.fold3.lm -expand-classes 2 -expand-exact 2 -vocab ../all.wlist But the new model gives a different result: ngram -ppl test.fold3.txt -lm 2-gram.class.dd150.expanded_exact.fold3.lm -order 2 -vocab ../all.wlist file test.fold3.txt: 1397 sentences, 37403 words, 0 OOVs 0 zeroprobs, logprob= -78108.4 ppl= 103.063 ppl1= 122.544 You can see there are no more zeroprobs in the new one, which affects the perplexity. I can show you detailed output from both models: Class-based: gruess gott frau traub p( gruess | <s> ) = [OOV][2gram] 0.0167159 [ -1.77687 ] p( gott | gruess ...) = [OOV][1gram][OOV][2gram] 0.658525 [ -0.181428 ] p( frau | gott ...) = [OOV][1gram][OOV][2gram] 0.119973 [ -0.920917 ] p( traub | frau ...) = [OOV][OOV] 0 [ -inf ] p( </s> | traub ...) = [1gram] 0.0377397 [ -1.4232 ] 1 sentences, 4 words, 0 OOVs 1 zeroprobs, logprob= -4.30242 ppl= 11.9016 ppl1= 27.1731 And the same sentence with expanded LM: gruess gott frau traub p( gruess | <s> ) = [2gram] 0.0167159 [ -1.77687 ] p( gott | gruess ...) = [2gram] 0.658525 [ -0.181428 ] p( frau | gott ...) = [2gram] 0.119973 [ -0.920917 ] p( traub | frau ...) = [1gram] 3.84699e-14 [ -13.4149 ] p( </s> | traub ...)
= [1gram] 0.0377397 [ -1.4232 ] 1 sentences, 4 words, 0 OOVs 0 zeroprobs, logprob= -17.7173 ppl= 3495.1 ppl1= 26873.5 From my point of view it looks like a computational error; such small probabilities should be treated as zero. BTW, how can zero probabilities appear there? They should be smoothed, right? I divided my corpus into 10 folds and performed these actions on all of them. With 6 folds everything is fine, perplexities are almost the same for both models, but with the other 4 parts I have such a problem. I would greatly appreciate any help. Sincerely yours, Dmytro Prylipko. From dyuret at ku.edu.tr Mon Dec 19 03:52:13 2011 From: dyuret at ku.edu.tr (Deniz Yuret) Date: Mon, 19 Dec 2011 13:52:13 +0200 Subject: [SRILM User List] lines starting with ## skipped Message-ID: Hi, I was working on the reuters rcv1 corpus and while investigating a discrepancy in the language model output I realized that the ngram command skips lines in the test file that start with '##'. Is this a documented feature or a bug? best, deniz From stolcke at icsi.berkeley.edu Mon Dec 19 11:32:19 2011 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Mon, 19 Dec 2011 11:32:19 -0800 Subject: [SRILM User List] lines starting with ## skipped In-Reply-To: References: Message-ID: <4EEF9143.2050804@icsi.berkeley.edu> Deniz Yuret wrote: > Hi, > > I was working on the reuters rcv1 corpus and while investigating a > discrepancy in the language model output I realized that the ngram > command skips lines in the test file that start with '##'. Is this a > documented feature or a bug? > Yes, it's a feature of the File::getline() function, but not documented. In the API you can disable this by setting the skipComments variable in the File object to false. There is currently no way to do it at the command line (but would be easy to add an option). A workaround is to insert a space character at the beginning of each input line.
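That command-line workaround is easy to script with sed; the file names below are made up for illustration, and the final ngram call is left as a comment since it needs a real LM file:

```shell
# Shift every line right by one space so no line begins with '##'
# (a leading space does not change whitespace-based tokenization).
printf '## reuters doc id\nthe quick brown fox\n' > /tmp/ppl_input.txt
sed 's/^/ /' /tmp/ppl_input.txt > /tmp/ppl_input.spaced.txt
cat /tmp/ppl_input.spaced.txt
# then score as usual, e.g.:  ngram -lm reuters.lm -ppl /tmp/ppl_input.spaced.txt
```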
Andreas From fmang at ieee.org Tue Dec 20 19:26:28 2011 From: fmang at ieee.org (Federico Ang) Date: Wed, 21 Dec 2011 11:26:28 +0800 Subject: [SRILM User List] lattice-ngram test seg fault on Ubuntu with x86_64 Message-ID: Hello, I successfully compiled SRILM 1.6.0 with Ubuntu 11.04 on an Intel Core i5 with -march=core2 -m64 (I edited the i686-m64 makefile) and with Tcl 8.5, Gawk 3.1.8, and gcc/g++ 4.6.1. On make test, all tests give IDENTICAL results except for the lattice-ngram test, which gives DIFFERS for both stdout and stderr. Investigating the problem, I found that the stdout for the output is empty. On the other hand, stderr output is exactly the same as the stderr reference except that there's a Segmentation Fault on the last line. I don't know how to investigate further. Please advise so I can have all tests pass. Best, Federico Ang DSP Laboratory, EEE Institute Univ. of the Phils., Diliman -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Tue Dec 20 23:22:05 2011 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 20 Dec 2011 23:22:05 -0800 Subject: [SRILM User List] lattice-ngram test seg fault on Ubuntu with x86_64 In-Reply-To: References: Message-ID: <4EF1891D.2090601@icsi.berkeley.edu> Federico Ang wrote: > Hello, > > I successfully compiled SRILM 1.6.0 with Ubuntu 11.04 on an Intel Core > i5 with -march=core2 -m64 (I edited the i686-m64 makefile) and with > Tcl 8.5, Gawk 3.1.8, and gcc/g++ 4.6.1 . On make test, all test gives > IDENTICAL results except for the lattice-ngram test, which gives > DIFFERS for both stdout and stderr. Investigating the problem, I > found that the stdout for the output is empty. On the other hand, > stderr output is exactly the same as stderr reference except that > there's Segmentation Fault on the last line. I don't know how to > investigate further. Please advise so I can have all test passed.
Check if you get the same problem with default compiler options (without -march=core2) and, if possible, with older versions of gcc. I have not seen core dumps on any tests, including with Ubuntu systems I have access to, though the compiler versions might have been less recent. Andreas From stolcke at icsi.berkeley.edu Wed Dec 21 14:47:42 2011 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 21 Dec 2011 14:47:42 -0800 Subject: [SRILM User List] lattice-ngram test seg fault on Ubuntu with x86_64 In-Reply-To: References: <4EF1891D.2090601@icsi.berkeley.edu> Message-ID: <4EF2620E.6080808@icsi.berkeley.edu> Federico Ang wrote: > You were right. gcc-4.5 did the trick, and it's not about the > architecture/instruction set. Thank you so much! :) Glad to hear it. cc-ing srilm-user for the record. Andreas > > Best, > Fed Ang > > On Wed, Dec 21, 2011 at 3:22 PM, Andreas Stolcke > > wrote: > > Federico Ang wrote: > > Hello, > > I successfully compiled SRILM 1.6.0 with Ubuntu 11.04 on an > Intel Core i5 with -march=core2 -m64 (I edited the i686-m64 > makefile) and with Tcl 8.5, Gawk 3.1.8, and gcc/g++ 4.6.1 . > On make test, all test gives IDENTICAL results except for the > lattice-ngram test, which gives DIFFERS for both stdout and > stderr. Investigating the problem, I found that the stdout > for the output is empty. On the other hand, stderr output is > exactly the same as stderr reference except that there's > Segmentation Fault on the last line. I don't know how to > investigate further. Please advise so I can have all test passed. > > Check if you get the same problem with default compiler options > (without -march=core2) and, if possible, with older versions of > gcc. I have not seen core dumps on any tests, including with > Ubuntu systems I have access to, though the compiler versions > might have been less recent.
> > Andreas > > From stolcke at icsi.berkeley.edu Wed Dec 21 17:03:51 2011 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 21 Dec 2011 17:03:51 -0800 Subject: [SRILM User List] A problem with expanding class-based LMs In-Reply-To: <4EEB13CD.1050007@ovgu.de> References: <4EEB13CD.1050007@ovgu.de> Message-ID: <4EF281F7.5080805@icsi.berkeley.edu> My guess is that your class definitions contain multiple words per expansion, such as "GREETING" expanding to "gruess gott". In that case a bigram expansion of the LM will not have as much predictive power as the original class bigram LM. Try using -expand-classes 3 (or even higher). Andreas Dmytro Prylipko wrote: > Hi Andreas, > > I have a class-based LM, which gives a particular perplexity value on > the test set: > > ngram -ppl test.fold3.txt -lm 2-gram.class.dd150.fold3.lm -classes > class.dd150.fold3.defs -order 2 -vocab ../all.wlist > > file test.fold3.txt: 1397 sentences, 37403 words, 0 OOVs > 427 zeroprobs, logprob= -72617.1 ppl= 78.0551 ppl1= 92.0235 > > I expanded it and got a word-level model: > > ngram -lm 2-gram.class.dd150.fold3.lm -classes class.dd150.fold3.defs > -order 2 -write-lm 2-gram.class.dd150.expanded_exact.fold3.lm > -expand-classes 2 -expand-exact 2 -vocab ../all.wlist > > > But the new model provides different result: > > ngram -ppl test.fold3.txt -lm > 2-gram.class.dd150.expanded_exact.fold3.lm -order 2 -vocab ../all.wlist > > file test.fold3.txt: 1397 sentences, 37403 words, 0 OOVs > 0 zeroprobs, logprob= -78108.4 ppl= 103.063 ppl1= 122.544 > > You can see there is no more zeroprobs in the new one, which .affects > the perplexity. > > > I can show you detailed output from both models: > > Class-based: > > gruess gott frau traub > p( gruess | ) = [OOV][2gram] 0.0167159 [ -1.77687 ] > p( gott | gruess ...) = [OOV][1gram][OOV][2gram] 0.658525 [ > -0.181428 ] > p( frau | gott ...) = [OOV][1gram][OOV][2gram] 0.119973 [ > -0.920917 ] > p( traub | frau ...) 
= [OOV][OOV] 0 [ -inf ] > p( | traub ...) = [1gram] 0.0377397 [ -1.4232 ] > 1 sentences, 4 words, 0 OOVs > 1 zeroprobs, logprob= -4.30242 ppl= 11.9016 ppl1= 27.1731 > > > And the same sentence with expanded LM: > > gruess gott frau traub > p( gruess | ) = [2gram] 0.0167159 [ -1.77687 ] > p( gott | gruess ...) = [2gram] 0.658525 [ -0.181428 ] > p( frau | gott ...) = [2gram] 0.119973 [ -0.920917 ] > p( traub | frau ...) = [1gram] 3.84699e-14 [ -13.4149 ] > p( | traub ...) = [1gram] 0.0377397 [ -1.4232 ] > 1 sentences, 4 words, 0 OOVs > 0 zeroprobs, logprob= -17.7173 ppl= 3495.1 ppl1= 26873.5 > > > From my point of view it looks like a computational error, such a > small probabilities should be treated as zero. > BTW, how can zero probabilities appear there? They should be smoothed, > right? > > I divided my corpus on 10 folds and performed these actions on all of > them. With 6 folds everything is fine, perplexities are almost the > same for both models, but with other 4 parts I have such a problem. > > I would be greatly appreciated for any help. > > Sincerely yours, > Dmytro Prylipko. > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user From stolcke at icsi.berkeley.edu Thu Dec 22 15:19:49 2011 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Thu, 22 Dec 2011 15:19:49 -0800 Subject: [SRILM User List] A problem with expanding class-based LMs In-Reply-To: Your message of Thu, 22 Dec 2011 20:50:43 +0100. <4EF38A13.7020309@ovgu.de> Message-ID: <201112222319.pBMNJnV5024999@fruitcake.ICSI.Berkeley.EDU> The problem turns out to be a sensitivity in the backoff computation to sums of probabilities that are exactly zero versus numerically equal to zero (less than Prob_Epsilon). 
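The exactly-zero versus numerically-zero distinction is easy to see with ordinary doubles. Here is a toy awk check, not SRILM code; the 3e-06 value is an assumption, taken from my reading of the Prob_Epsilon constant in SRILM's Prob.h:

```shell
# Subtract a total mass of 1.0 in ten 0.1-sized steps: rounding leaves a
# tiny residue, so an exact '== 0' test misfires while an epsilon test holds.
awk 'BEGIN {
  eps = 3e-06                       # assumed value of SRILM Prob_Epsilon
  s = 1.0
  for (i = 0; i < 10; i++) s -= 0.1
  printf "residue=%g exactly_zero=%d below_epsilon=%d\n", s, (s == 0.0), (s < eps && s > -eps)
}'
```

The residue depends on the order of summation, which is exactly why iterating over sorted arrays versus hash tables produced different perplexities before the patch.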
In your case, the sum of unigram probs of the expanded LM is sometimes very slightly less than 1, causing some probability mass to be distributed over all the unseen words, and the perplexity to be changed noticeably. The patch below will catch these cases and produce consistent results independent of these small numerical differences (which result from probabilities being summed in a different order, depending on whether the iteration is over sorted arrays or hash tables). Andreas

diff -c -r1.122 NgramLM.cc
*** lm/src/NgramLM.cc	30 May 2011 23:46:38 -0000	1.122
--- lm/src/NgramLM.cc	22 Dec 2011 22:27:58 -0000
***************
*** 2118,2125 ****
       * unigrams, which we achieve by giving them zero probability.
       */
      if (order == 0 /*&& numerator > 0.0*/) {
	  distributeProb(numerator, context);
!     } else if (numerator == 0.0 && denominator == 0.0) {
	  node->bow = LogP_One;
      } else {
	  node->bow = ProbToLogP(numerator) - ProbToLogP(denominator);
--- 2118,2131 ----
       * unigrams, which we achieve by giving them zero probability.
       */
      if (order == 0 /*&& numerator > 0.0*/) {
+	  if (numerator < Prob_Epsilon) {
+	      /*
+	       * Avoid spurious non-zero unigram probabilities
+	       */
+	      numerator = 0.0;
+	  }
	  distributeProb(numerator, context);
!     } else if (numerator < Prob_Epsilon && denominator < Prob_Epsilon) {
	  node->bow = LogP_One;
      } else {
	  node->bow = ProbToLogP(numerator) - ProbToLogP(denominator);

In message <4EF38A13.7020309 at ovgu.de> you wrote: > > I had repeated expansion with different binaries and got different > results again. > I attached the source files and corresponding scripts to this e-mail. I > did not include the expanded models since they are too large, but they > are also available. > > I hope this will help you to investigate the problem. > > Sincerely yours, > Dmytro Prylipko. > > On 12/22/2011 7:38 PM, Andreas Stolcke wrote: > > Dmytro Prylipko wrote: > >> I tried expansion also on trigrams with the same problem. > >> Actually I managed to cope with it.
I compiled the SRILM with the "_c" > >> option and expanded my bigrams with that binary. It helped > >> (perplexity measures became the same), however in this case another > >> bigrams (expanded ok with usual binary) had the problem described > >> before. Is it a bug? > > You should never get different results (other than sorting order, > > e.g., in counts files) with the regular and the _c version. > > Can you send me the inputs involved? > > > > Andreas > > From ghenryww at roadrunner.com Fri Dec 23 12:21:56 2011 From: ghenryww at roadrunner.com (Gil Henry) Date: Fri, 23 Dec 2011 12:21:56 -0800 Subject: [SRILM User List] ngram count Message-ID: <000d01ccc1b0$84633200$8d299600$@com> I have subscribed! I am getting the message ngram count no command when I execute ngram count with all of the necessary parameters and proper syntax. Reference SRILM FAQ A1, I have tried scripts; nothing works. Srilm/bin has commands for ngram and ngram-count (console display). make World, make all, make cleanest run with no errors. make test runs with with all "identical", no "differs". Thanks, any help will be appreciated. Gilbert L. Henry ghenryww at roadrunner.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Fri Dec 23 23:59:47 2011 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Fri, 23 Dec 2011 23:59:47 -0800 Subject: [SRILM User List] ngram count In-Reply-To: <000d01ccc1b0$84633200$8d299600$@com> References: <000d01ccc1b0$84633200$8d299600$@com> Message-ID: <4EF58673.8060709@icsi.berkeley.edu> Gil Henry wrote: > > I have subscribed! I am getting the message ngram count no command > when I execute ngram count with all of the necessary parameters and > proper syntax. Reference SRILM FAQ A1, I have tried scripts; nothing > works. Srilm/bin has commands for ngram and ngram-count (console > display). make World, make all, make cleanest run with no errors. 
make > test runs with all "identical", no "differs". > If the tests succeed and the bin directory is populated, then the build was successful, and your only problem is that you cannot find the binaries in your executable search path. Make sure the PATH variable includes $SRILM/bin/$MACHINE_TYPE, where $MACHINE_TYPE is the platform name you built for. If you can't manage that, ask for help from a local Linux or Windows expert. Andreas > Thanks, any help will be appreciated. > > Gilbert L. Henry > > ghenryww at roadrunner.com > > ------------------------------------------------------------------------ > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user From saman_2004 at yahoo.com Mon Dec 26 20:26:28 2011 From: saman_2004 at yahoo.com (Saman Noorzadeh) Date: Mon, 26 Dec 2011 20:26:28 -0800 (PST) Subject: [SRILM User List] big difference between ppl and ppl1 Message-ID: <1324959988.52750.YahooMailNeo@web162006.mail.bf1.yahoo.com> I made 2 models of 2 languages, Dutch and English, to do language recognition. I got the following perplexities:

Model: Dutch    Test: English   ppl: 55    ppl1: 2*10^18
Model: Dutch    Test: Dutch     ppl: 303   ppl1: 400
Model: English  Test: Dutch     ppl: 600   ppl1: 3122
Model: English  Test: English   ppl: 227   ppl1: 1897

I think it is reasonable if I have a large perplexity when my model and test are different, but why ppl=55 when having a Dutch model and an English test? and Why is there a BIG difference in their ppl and ppl1? Thanks in advance -------------- next part -------------- An HTML attachment was scrubbed...
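On the ppl versus ppl1 question: as I read the ngram(1) man page, the two differ only in the normalization, with end-of-sentence tokens counted for ppl but not for ppl1, and OOVs and zeroprobs excluded from both. The awk check below reproduces the class-based LM figures reported earlier in this digest:

```shell
# ppl  = 10^(-logprob / (words - OOVs - zeroprobs + sentences))
# ppl1 = 10^(-logprob / (words - OOVs - zeroprobs))
awk 'BEGIN {
  logprob = -72617.1; sentences = 1397; words = 37403; oovs = 0; zeroprobs = 427
  denom = words - oovs - zeroprobs
  printf "ppl=%.1f ppl1=%.1f\n", 10 ^ (-logprob / (denom + sentences)), 10 ^ (-logprob / denom)
}'
# ngram reported ppl= 78.0551 ppl1= 92.0235 for these statistics
```

A tiny ppl next to an astronomical ppl1, as in the Dutch-model/English-test row, likely means nearly every test word was an OOV and therefore skipped: little besides the end-of-sentence events is left in the denominators, so neither number reflects a genuinely good fit.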
URL: From burkay at mit.edu Tue Dec 27 00:56:32 2011 From: burkay at mit.edu (Burkay Gur) Date: Tue, 27 Dec 2011 10:56:32 +0200 Subject: [SRILM User List] big difference between ppl and ppl1 In-Reply-To: <1324959988.52750.YahooMailNeo@web162006.mail.bf1.yahoo.com> References: <1324959988.52750.YahooMailNeo@web162006.mail.bf1.yahoo.com> Message-ID: <0B3009A4-3E4E-4982-A4DF-D52FAC17A9F6@mit.edu> Is your Dutch model arranged so that there is one sentence on each line? Also which command are you using? I recommend using -gt1max 1 -gt2max 1 -gt3max 1 and -ukndiscount for kneser ney smoothing. These will give you more accurate perplexities. -Burkay Sent from my iPad On Dec 27, 2011, at 6:26 AM, Saman Noorzadeh wrote: > > I made 2 models of 2 languages, Dutch and English, to make a language recognition. > I got the following perplexities: > > Model: Dutch Test: English ppl:55 ppl2: 2* 10^18 > Model: Dutch Test: Dutch ppl:303 ppl2: 400 > Model: English Test: Dutch ppl: 600 ppl2: 3122ses n > Model: English Test: English ppl: 227 ppl2: 1897 > > I think it is reasonable if I have a large perplexity when my model and test are different but why ppl=55 when having a Duch model and an English test? > and > Why is there a BIG difference in their ppl and ppl1 ? > > Thanks in advance > > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From saman_2004 at yahoo.com Tue Dec 27 03:58:34 2011 From: saman_2004 at yahoo.com (Saman Noorzadeh) Date: Tue, 27 Dec 2011 03:58:34 -0800 (PST) Subject: [SRILM User List] big difference between ppl and ppl1 In-Reply-To: <0B3009A4-3E4E-4982-A4DF-D52FAC17A9F6@mit.edu> References: <1324959988.52750.YahooMailNeo@web162006.mail.bf1.yahoo.com> <0B3009A4-3E4E-4982-A4DF-D52FAC17A9F6@mit.edu> Message-ID: <1324987114.82504.YahooMailNeo@web162004.mail.bf1.yahoo.com> Yes, both of my texts are 1 sentence per line (but some sentences are a little long!). I used the gtmax options but the results were almost the same. The commands I use are the following: to count: ngram-count -order 3 -write-vocab language.voc -text language_tain.txt -write language.bo to make the model: ngram-count -order 3 language.bo -lm language.BO -gt2min 1 -gt3min 2 testing Perplexity: ngram -lm language.BO -ppl language_test.txt Thank you Saman ________________________________ From: Burkay Gur To: Saman Noorzadeh Cc: Srilm group Sent: Tuesday, December 27, 2011 12:56 AM Subject: Re: [SRILM User List] big difference between ppl and ppl1 Is your Dutch model arranged so that there is one sentence on each line? Also which command are you using? I recommend using -gt1max 1 -gt2max 1 -gt3max 1 and -ukndiscount for Kneser-Ney smoothing. These will give you more accurate perplexities. -Burkay Sent from my iPad On Dec 27, 2011, at 6:26 AM, Saman Noorzadeh wrote: > >I made 2 models of 2 languages, Dutch and English, to do language recognition. >I got the following perplexities: > > >Model: Dutch Test: English ppl: 55 ppl1: 2*10^18 >Model: Dutch Test: Dutch ppl: 303 ppl1: 400 >Model: English Test: Dutch ppl: 600 ppl1: 3122 > >Model: English Test: English ppl: 227 ppl1: 1897 > > >I think it is reasonable if I have a large perplexity when my model and test are different but why ppl=55 when having a Dutch model and an English test?
>and > >Why is there a BIG difference in their ppl and ppl1? > > >Thanks in advance > > > > > _______________________________________________ >SRILM-User site list >SRILM-User at speech.sri.com >http://www.speech.sri.com/mailman/listinfo/srilm-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From burkay at mit.edu Tue Dec 27 05:32:16 2011 From: burkay at mit.edu (Burkay Gur) Date: Tue, 27 Dec 2011 15:32:16 +0200 Subject: [SRILM User List] big difference between ppl and ppl1 In-Reply-To: <1324987114.82504.YahooMailNeo@web162004.mail.bf1.yahoo.com> References: <1324959988.52750.YahooMailNeo@web162006.mail.bf1.yahoo.com> <0B3009A4-3E4E-4982-A4DF-D52FAC17A9F6@mit.edu> <1324987114.82504.YahooMailNeo@web162004.mail.bf1.yahoo.com> Message-ID: <5E431A43-7D70-4C07-BD2B-AE86B2B5C145@mit.edu> To get lower and more relevant perplexities I'd recommend getting rid of the -order 3 and adding Kneser-Ney smoothing. Also make sure the corpora are not too small. Sent from my iPad On Dec 27, 2011, at 1:58 PM, Saman Noorzadeh wrote: > Yes both of my texts are 1 sentence per line, (but some sentences are a little long!) > I used gtmax options but the results were almost the same > the commands I use are as following: > > to count: > ngram-count -order 3 -write-vocab language.voc -text language_tain.txt -write language.bo > > to make the model: > ngram-count -order 3 language.bo -lm language.BO -gt2min 1 -gt3min 2 > > testing Perplexity: > ngram -lm language.BO -ppl language_test.txt > > Thank you > Saman > From: Burkay Gur > To: Saman Noorzadeh > Cc: Srilm group > Sent: Tuesday, December 27, 2011 12:56 AM > Subject: Re: [SRILM User List] big difference between ppl and ppl1 > > Is your Dutch model arranged so that there is one sentence on each line? Also which command are you using? I recommend using -gt1max 1 -gt2max 1 -gt3max 1 and -ukndiscount for Kneser-Ney smoothing. These will give you more accurate perplexities.
> > -Burkay > > Sent from my iPad > > On Dec 27, 2011, at 6:26 AM, Saman Noorzadeh wrote: > >> >> I made 2 models of 2 languages, Dutch and English, to do language recognition. >> I got the following perplexities: >> >> Model: Dutch Test: English ppl: 55 ppl1: 2*10^18 >> Model: Dutch Test: Dutch ppl: 303 ppl1: 400 >> Model: English Test: Dutch ppl: 600 ppl1: 3122 >> Model: English Test: English ppl: 227 ppl1: 1897 >> >> I think it is reasonable if I have a large perplexity when my model and test are different but why ppl=55 when having a Dutch model and an English test? >> and >> Why is there a BIG difference in their ppl and ppl1? >> >> Thanks in advance >> >> >> _______________________________________________ >> SRILM-User site list >> SRILM-User at speech.sri.com >> http://www.speech.sri.com/mailman/listinfo/srilm-user > > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From eragani at gmail.com Thu Dec 29 21:13:36 2011 From: eragani at gmail.com (anil krishna eragani) Date: Fri, 30 Dec 2011 04:13:36 -0100 Subject: [SRILM User List] Difficulty installing SRILM Message-ID: make[2]: Entering directory `/home/eragani/Documents/Nlp_Tools/srilm/misc/src' gcc -m32 -mtune=pentium3 -Wall -Wno-unused-variable -Wno-uninitialized -D_FILE_OFFSET_BITS=64 -I/usr/include -I. -I../../include -c -g -O3 -o ../obj/i686/option.o option.c In file included from /usr/include/time.h:4:0, from /usr/include/sys/types.h:133, from /usr/include/stdlib.h:320, from option.c:23: /usr/include/v8.h:79:1: error: unknown type name 'namespace' /usr/include/v8.h:79:14: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token In file included from /usr/include/sys/types.h:133:0, from /usr/include/stdlib.h:320, from option.c:23: /usr/include/time.h:6:1: error: unknown type name 'namespace'
/usr/include/time.h:6:14: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token option.c:34:57: error: unknown type name 'time_t' option.c: In function 'Opt_Parse': option.c:195:5: warning: implicit declaration of function 'ParseTime' [-Wimplicit-function-declaration] option.c:196:9: error: 'time_t' undeclared (first use in this function) option.c:196:9: note: each undeclared identifier is reported only once for each function it appears in option.c:196:17: error: expected expression before ')' token option.c: At top level: option.c:400:5: error: unknown type name 'time_t' make[2]: *** [../obj/i686/option.o] Error 1 make[2]: Leaving directory `/home/eragani/Documents/Nlp_Tools/srilm/misc/src' make[1]: *** [release-libraries] Error 1 make[1]: Leaving directory `/home/eragani/Documents/Nlp_Tools/srilm' make: *** [World] Error 2 -------------- next part -------------- An HTML attachment was scrubbed... URL: From eragani at gmail.com Thu Dec 29 21:22:49 2011 From: eragani at gmail.com (anil krishna eragani) Date: Fri, 30 Dec 2011 04:22:49 -0100 Subject: [SRILM User List] Difficulty installing SRILM Message-ID: uname -a Linux anil-laptop 2.6.40.6-0.fc15.i686.PAE #1 SMP Tue Oct 4 00:44:38 UTC 2011 i686 i686 i386 GNU/Linux gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC) make[2]: Entering directory `/home/eragani/Documents/Nlp_Tools/srilm/misc/src' gcc -m32 -mtune=pentium3 -Wall -Wno-unused-variable -Wno-uninitialized -D_FILE_OFFSET_BITS=64 -I/usr/include -I. -I../../include -c -g -O3 -o ../obj/i686/option.o option.c In file included from /usr/include/time.h:4:0, from /usr/include/sys/types.h:133, from /usr/include/stdlib.h:320, from option.c:23: /usr/include/v8.h:79:1: error: unknown type name 'namespace' /usr/include/v8.h:79:14: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{'
token In file included from /usr/include/sys/types.h:133:0, from /usr/include/stdlib.h:320, from option.c:23: /usr/include/time.h:6:1: error: unknown type name 'namespace' /usr/include/time.h:6:14: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token option.c:34:57: error: unknown type name 'time_t' option.c: In function 'Opt_Parse': option.c:195:5: warning: implicit declaration of function 'ParseTime' [-Wimplicit-function-declaration] option.c:196:9: error: 'time_t' undeclared (first use in this function) option.c:196:9: note: each undeclared identifier is reported only once for each function it appears in option.c:196:17: error: expected expression before ')' token option.c: At top level: option.c:400:5: error: unknown type name 'time_t' make[2]: *** [../obj/i686/option.o] Error 1 make[2]: Leaving directory `/home/eragani/Documents/Nlp_Tools/srilm/misc/src' make[1]: *** [release-libraries] Error 1 make[1]: Leaving directory `/home/eragani/Documents/Nlp_Tools/srilm' make: *** [World] Error 2 -------------- next part -------------- An HTML attachment was scrubbed... URL: