From stolcke at speech.sri.com Fri Jan 6 14:43:54 2006
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Fri, 06 Jan 2006 14:43:54 PST
Subject: Another lattice rescoring problem
In-Reply-To: Your message of Tue, 18 Oct 2005 15:02:36 +0300.
<4354E45C.706@hut.fi>
Message-ID: <200601062243.OAA03798@tonga>
Teemu,
thanks for pointing out this bug, and sorry for taking so
long to get back to you. In the process of fixing
this I actually found a number of issues, one of them
affecting the lattice-tool -ppl function itself.
As you found out, the -no-nulls option masks the
problem to some extent, and that was the reason I hadn't noticed
these problems before.
That should all be fixed now.
You can download the current beta version from the web site.
If no further problems surface in the coming weeks I'll
release this as the next version.
--Andreas
In message <4354E45C.706 at hut.fi> you wrote:
> I ran into another problem with the lattice rescoring. I have two
> simple HTK lattices (acoustic log-probabilities in parentheses):
>
> test0.htk:
>
> a(-1) --+--> c(-2) ------+--> b(-3)
>         |                |
>         +--> !NULL(-2) --+
>
> test1.htk:
>
> a(-1) -----> !NULL(-2) -----> b(-3)
>
> If I rescore the above lattices with a simple 2-gram language model
> test.arpa (see the end of the mail for the example files), the
> language model probability of the path "a b" is computed incorrectly
> for the first lattice. In the second case, the probability is
> correct:
>
> $ echo "a b" | lattice-tool -in-lattice test0.htk -read-htk \
> -lm test.arpa -ppl - -debug 2
> ...
> p( a | <s> ) = [10] 7.43548e-13 [ -12.1287 ]
> p( b | a ...) = [16] 8.00959e-07 [ -6.09639 ]
> p( </s> | b ...) = [9] 6.88685e-14 [ -13.162 ]
> 0 zeroprobs, logprob= -31.3871 ppl= 2.8997e+10 ppl1= 4.93776e+15
>
> $ echo "a b" | lattice-tool -in-lattice test1.htk -read-htk \
> -lm test.arpa -ppl - -debug 2
> ...
> p( a | <s> ) = [9] 2.57573e-17 [ -16.5891 ]
> p( b | a ...) = [13] 8.00959e-07 [ -6.09639 ]
> p( </s> | b ...) = [8] 6.88685e-14 [ -13.162 ]
> 0 zeroprobs, logprob= -35.8475 ppl= 8.89522e+11 ppl1= 8.38947e+17
>
> It seems that the backoff probability BO(a) is missing from the first
> case.
>
> Next I tried to use the -no-nulls flag. Then I get correct language
> model probabilities for both lattices, but the acoustic probability is
> incorrect, as the acoustic probability of the !NULL edge is discarded.
> Should the general LM expansion handle !NULL edges correctly?
>
> I also tried changing the !NULL words to a distinct word symbol and
> specifying it with the -ignore-vocab flag to lattice-tool (tried
> versions 1.4.5 and 1.4.6 beta). Then the acoustic probabilities are
> preserved nicely, but again the backoff probability BO(a) is missing
> from the first rescored lattice.
>
> Did I miss something again, or is the above expected behaviour?
>
> -Teemu
>
>
> Here are the example files:
>
> test0.htk:
>
> VERSION=1.1
> base=10
> dir=f
> lmscale=1 wdpenalty=0
> start=0 end=3
> N=4 L=4
> I=0
> I=1
> I=2
> I=3
> J=0 S=0 E=1 W=a a=-1
> J=1 S=1 E=2 W=!NULL a=-2
> J=2 S=1 E=2 W=c a=-2
> J=3 S=2 E=3 W=b a=-3
>
>
> test1.htk:
>
> VERSION=1.1
> base=10
> dir=f
> lmscale=1 wdpenalty=0
> start=0 end=3
> N=4 L=3
> I=0
> I=1
> I=2
> I=3
> J=0 S=0 E=1 W=a a=-1
> J=1 S=1 E=2 W=!NULL a=-2
> J=2 S=2 E=3 W=b a=-3
>
>
> test.arpa:
>
> \data\
> ngram 1=5
> ngram 2=5
>
> \1-grams:
> -99 <s> -7.34882
> -2.10718 c -4.28966
> -4.77987 a -4.46041
> -5.81316 </s> -7.34882
> -4.02326 b -2.07313
>
> \2-grams:
> -3.33947 c a
> -1.08518 c
> -4.58511 c b
> -0.000484286 a c
> -1.67833 b c
>
> \end\
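A quick arithmetic check of the report above, using only numbers quoted in the message. If BO(a) is what the first lattice lacks, then the gap between the two reported totals, and between the two first-word scores, should equal the backoff weight listed for "a" in test.arpa. The variable names are made up for this sketch, and reading the -99 unigram line as the sentence-start entry is an assumption.

```python
import math

# Values copied from the quoted lattice-tool output.
LOGPROB_TEST0 = -31.3871      # total logprob reported for test0.htk
LOGPROB_TEST1 = -35.8475      # total logprob reported for test1.htk
FIRST_WORD_TEST0 = -12.1287   # first-word score reported for test0.htk
FIRST_WORD_TEST1 = -16.5891   # first-word score reported for test1.htk

# Values copied from the quoted test.arpa.
LOGP_A = -4.77987             # unigram log10 probability of "a"
BOW_A = -4.46041              # backoff weight listed for "a"
BOW_START = -7.34882          # backoff weight on the -99 line (assumed sentence start)


def close(x, y, tol=1e-3):
    """Compare two log10 values up to the rounding of the printed output."""
    return math.isclose(x, y, abs_tol=tol)


# Difference between the two runs, overall and on the first word only.
total_gap = LOGPROB_TEST1 - LOGPROB_TEST0
first_word_gap = FIRST_WORD_TEST1 - FIRST_WORD_TEST0

# Backed-off estimate for "a" after the sentence start: BO(start) + p(a).
backed_off_a = BOW_START + LOGP_A

print("gap between totals:      %.4f" % total_gap)
print("gap on the first word:   %.4f" % first_word_gap)
print("BO(a) from test.arpa:    %.4f" % BOW_A)
print("BO(start) + p(a):        %.4f" % backed_off_a)
print("BO(start) + p(a) + BO(a): %.4f" % (backed_off_a + BOW_A))
```

Both gaps match BO(a) to the precision of the printed output, and the first-word score for test0.htk equals BO(start) + p(a) without the extra BO(a) term, which agrees with the diagnosis in the report.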
From gtg781p at mail.gatech.edu Wed Jan 18 16:31:20 2006
From: gtg781p at mail.gatech.edu (Jinyu Li)
Date: Wed, 18 Jan 2006 19:31:20 -0500
Subject: Is there any SRILM tutorial?
Message-ID: <01f501c61c8f$ac0d6d60$0ae54dc7@eceint.gatech.edu>
Hello,
I am a new user of SRILM. I found that it was difficult for me to build an LM just with the command manual page on the SRILM website. Is there any SRILM tutorial for the new user? I appreciate your help a lot.
Thanks.
Jinyu
From stolcke at speech.sri.com Wed Jan 18 16:56:53 2006
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Wed, 18 Jan 2006 16:56:53 PST
Subject: Is there any SRILM tutorial?
In-Reply-To: Your message of Wed, 18 Jan 2006 19:31:20 -0500.
<01f501c61c8f$ac0d6d60$0ae54dc7@eceint.gatech.edu>
Message-ID: <200601190056.k0J0urL16069@huge>
>
> I am a new user of SRILM. I found that it was difficult for me to build
> an LM just with the command manual page on the SRILM website. Is there
> any SRILM tutorial for the new user? I appreciate your help a lot.
There is $SRILM/doc/lm-intro, but it's not much.
There is some stuff on the web, like
http://ssli.ee.washington.edu/ssli/people/sarahs/srilm.html .
SRILM users, here is your chance to attain fame, if not fortune!
Write a good intro to language modeling based on SRILM ...
If anybody knows of such a document please share it.
--Andreas
>
> Thanks.
> Jinyu
From barabbas at gmail.com Sat Jan 21 01:15:18 2006
From: barabbas at gmail.com (Barabbas Jiang@Gmail)
Date: Sat, 21 Jan 2006 17:15:18 +0800
Subject: Is there any SRILM tutorial?
In-Reply-To: <200601190056.k0J0urL16069@huge>
References: <200601190056.k0J0urL16069@huge>
Message-ID: <43D1FBA6.9090401@gmail.com>
For Chinese users, I suggest giving these lectures a chance:
http://berlin.csie.ntnu.edu.tw/PastCourses/2003F-SpeechSignalProcessing/Slides/SP2003F_Lecture12_LM%20Training%20Toolkit%20-SRILM.pdf
http://berlin.csie.ntnu.edu.tw/Courses/2004F-SpeechRecognition/Slides/SP2004F_Lecture06-02_SRILM%20Toolkit.pdf
> SRILM users, here is your chance to attain fame, if not fortune!
> Write a good intro to language modeling based on SRILM ...
> If anybody knows of such a document please share it.
>
> --Andreas
>
--
/Tian-Jian "Barabbas" Jiang A.K.A Tai-Ming/
*Research Assistant*
_Institute of Information Science_
_Academia SINICA_
*Doctoral Student*
_Department of Computer Science_
_National Tsing-Hua University_
From George.Foster at cnrc-nrc.gc.ca Sat Jan 21 07:39:28 2006
From: George.Foster at cnrc-nrc.gc.ca (Foster, George)
Date: Sat, 21 Jan 2006 10:39:28 -0500
Subject: Trouble building srilm project (macosx)
References: <4aa50c3a8498b5fbc113151eff9ec9ee@gmail.com>
Message-ID:
Peter,
I'm running a version of OSX 10.3.9 and gcc that's almost identical to yours:
% uname -a
Darwin 0547-CRTL.local 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power Macintosh powerpc
% gcc -v
Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs
Thread model: posix
gcc version 3.3 20030304 (Apple Computer, Inc. build 1495)
I had no problems building srilm 1.4.5 using the INSTALL instructions
(no code changes necessary).
George
-----Original Message-----
From: owner-srilm-user at speech.sri.com on behalf of P McIlroy
Sent: Mon 11/28/2005 11:39 PM
To: srilm-user at speech.sri.com
Subject: Trouble building srilm project (macosx)
I joined this group in hopes of finding some help on the MacOS compile.
I'm getting a handful of uninstantiated templates in the out-of-the-box
compile on OS X 10.3.9, with compiler version:
gcc version 3.3 20030304 (Apple Computer, Inc. build 1666)
The undefined functions are, in various executables:
FNgramCounts::FNgramCounts(FactoredVocab&,
FNgramSpecs&)
Map2::clear()
NgramCounts::NgramCounts(Vocab&, unsigned int)
I was able to 'fix' ngram-counts by adding
#include "NgramStats.cc"
to the main program.
The same fix does not work for ngram.cc.
Additional investigation shows that the instance file
NGramStatsInt.o
does not include an instantiation of the NGramCount constructor.
Other attempts like adding this to the main program:
static template NGramCounts;
lead to multiple definition errors in the linker.
Is there a known configuration or compiler option that works on OS X
10.3? Or will upgrading to 10.4 fix the problems?
thanks,
Peter McIlroy
Begin forwarded message:
> From: P McIlroy
> Date: November 28, 2005 3:17:14 PM PST
> To: stolcke at speech.sri.com
> Subject: Fwd: Trouble building your srilm project (macosx)
>
> I was able to compile one of the failed executables (ngram-count), but
> it required adding
>
> #include "NgramStats.cc"
>
> to the end of the list of includes in the main source file
> ngram-stats.cc. (This is not the preferred way to force compilation
> of templates, but it's working for now.) I also tried forcing
> instantiation by creating a NgramStats_inst.cc file, but this led to
> horrible multiple definitions.
>
> I'm still getting a warning for multiple definitions of _qsort(), but
> I don't think this is a problem.
>
From rus at sonic.net Sat Jan 21 10:48:10 2006
From: rus at sonic.net (Russell Sheptak)
Date: Sat, 21 Jan 2006 10:48:10 -0800
Subject: Trouble building srilm project (macosx)
In-Reply-To: <4aa50c3a8498b5fbc113151eff9ec9ee@gmail.com>
References: <4aa50c3a8498b5fbc113151eff9ec9ee@gmail.com>
Message-ID: <2573d84b3d296226abcb363f9a2fc748@sonic.net>
Peter,
There's at least one newer version of the gcc 3.3 compiler available on
the Apple developer site (free download). I suspect you're missing a
necessary patch, since Apple supplied 3-4 revisions to gcc 3.3 to fix
bugs before switching to gcc 4. My version is gcc 3.3 20030304 build
1671, so slightly later than yours, and I had no problem building SRILM
as downloaded and described in the install document on MacOS X 10.3.9.
I ran into one problem on the self test. You're going to run out of
open file descriptors and the ngram tests will fail as a result.
You'll need to reconfigure your kernel to allow more open file
descriptors and reboot, and the self test will pass.
rus
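A small sketch for checking how many file descriptors a single process may open before running the self test. It uses Python's standard `resource` module (Unix only). It assumes the per-process limit is the one being hit, which the message does not state, and the helper names are invented for illustration.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) per-process limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, spelling out the 'no limit' sentinel."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = open_file_limits()
print("soft limit on open files:", describe(soft))
print("hard limit on open files:", describe(hard))
```

The soft limit can usually be raised up to the hard limit from the shell before starting a program. A system-wide ceiling is a separate setting, and that appears to be what the kernel reconfiguration mentioned above addresses.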
On Nov 28, 2005, at 8:39 PM, P McIlroy wrote:
> I joined this group in hopes of finding some help on the MacOS compile.
>
> I'm getting a handful of uninstantiated templates in the
> out-of-the-box compile on OS X 10.3.9, with compiler version:
>
> gcc version 3.3 20030304 (Apple Computer, Inc. build 1666)
>
> The undefined functions are, in various executables:
>
> FNgramCounts::FNgramCounts(FactoredVocab&,
> FNgramSpecs&)
> Map2::clear()
> NgramCounts::NgramCounts(Vocab&, unsigned int)
>
> I was able to 'fix' ngram-counts by adding
>
> #include "NgramStats.cc"
>
> to the main program.
>
> The same fix does not work for ngram.cc.
>
> Additional investigation shows that the instance file
>
> NGramStatsInt.o
>
> does not include an instantiation of the NGramCount constructor.
>
> Other attempts like adding this to the main program:
>
> static template NGramCounts;
>
> lead to multiple definition errors in the linker.
>
> Is there a known configuration or compiler option that works on OS X
> 10.3? Or will upgrading to 10.4 fix the problems?
>
> thanks,
>
> Peter McIlroy
>
>
>
> Begin forwarded message:
>
>> From: P McIlroy
>> Date: November 28, 2005 3:17:14 PM PST
>> To: stolcke at speech.sri.com
>> Subject: Fwd: Trouble building your srilm project (macosx)
>>
>> I was able to compile one of the failed executables (ngram-count),
>> but it required adding
>>
>> #include "NgramStats.cc"
>>
>> to the end of the list of includes in the main source file
>> ngram-stats.cc. (This is not the preferred way to force compilation
>> of templates, but it's working for now.) I also tried forcing
>> instantiation by creating a NgramStats_inst.cc file, but this led to
>> horrible multiple definitions.
>>
>> I'm still getting a warning for multiple definitions of _qsort(), but
>> I don't think this is a problem.
>>
From rix at netcabo.pt Mon Feb 6 07:39:45 2006
From: rix at netcabo.pt (rix)
Date: Mon, 6 Feb 2006 15:39:45 -0000
Subject: limitations in ngram-merge
Message-ID: <159323335F97074D9A594D676652B06754A028@VS3.hdi.tvcabo>
Hi
We are currently having a problem with the merging of count files using ngram-merge.
It seems that there is a limitation in the size of the resulting file of 2GB.
Can you give us some information if this limitation is due to the program or if it is a limitation due to the configuration of our system? We are running ngram-merge on a PIV 2,66GHz with 1GB RAM under Suse 10.0.
Best regards
Ricardo Nunes
Luis Neves
From stolcke at speech.sri.com Wed Feb 8 23:18:03 2006
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Wed, 08 Feb 2006 23:18:03 PST
Subject: limitations in ngram-merge
In-Reply-To: Your message of Mon, 06 Feb 2006 15:39:45 +0000.
<159323335F97074D9A594D676652B06754A028@VS3.hdi.tvcabo>
Message-ID: <200602090718.k197I3F07068@huge>
In message <159323335F97074D9A594D676652B06754A028 at VS3.hdi.tvcabo> you wrote:
> Hi
>
> We are currently having a problem with the merging of count files using
> ngram-merge.
> It seems that there is a limitation in the size of the resulting file of 2GB.
> Can you give us some information if this limitation is due to the program
> or if it is a limitation due to the configuration of our system. We are
> running ngram-merge in a PIV 2,66GHz 1GB RAM in Suse 10.0.
It's probably an OS limitation. SRILM uses level-2 I/O functions
(see fopen(3)).
We have certainly handled files larger than 2 GB on our Linux machines.
But those files are usually gzipped (ending in .gz). SRILM
doesn't read or write those directly, since the I/O is to a pipe
that talks to the gzip program. Maybe you can try using gzipped files
in your case too.
--Andreas
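To illustrate what "I/O to a pipe that talks to the gzip program" means, here is a minimal sketch of that pattern. It is not SRILM's code. It assumes a `gzip` executable is on the PATH, and the function names, file name and count lines are invented for the example. Whether this avoids a 2 GB limit on a given system is not something the sketch can show. A 2 GB ceiling is what a signed 32-bit file offset would give (2^31 - 1 bytes).

```python
import gzip
import os
import subprocess
import tempfile


def write_through_gzip(path, lines):
    """Write text lines to `path` via a pipe to an external gzip process."""
    with open(path, "wb") as out:
        proc = subprocess.Popen(["gzip", "-c"], stdin=subprocess.PIPE, stdout=out)
        for line in lines:
            proc.stdin.write((line + "\n").encode("utf-8"))
        proc.stdin.close()  # end of input lets gzip flush and exit
        if proc.wait() != 0:
            raise RuntimeError("gzip failed while writing " + path)


def read_through_gzip(path):
    """Read text lines from `path` via a pipe from an external gzip process."""
    result = subprocess.run(["gzip", "-dc", path], stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("utf-8").splitlines()


# Hypothetical count lines; the file name is made up for the demo.
demo_lines = ["a b\t3", "b c\t1"]
demo_path = os.path.join(tempfile.mkdtemp(), "merged.counts.gz")
write_through_gzip(demo_path, demo_lines)
print(read_through_gzip(demo_path))

# The result is an ordinary gzip file, readable without the pipe as well.
with gzip.open(demo_path, "rt", encoding="utf-8") as handle:
    print(handle.read().splitlines())
```

In this arrangement the calling program only ever reads from or writes to a pipe, and the gzip process is the one that opens the file on disk.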
From hassan at mimos.my Tue Feb 21 22:38:46 2006
From: hassan at mimos.my (Hassan Mohamed)
Date: Wed, 22 Feb 2006 14:38:46 +0800 (MYT)
Subject: Compile SRILM in Knoppix 4.0
Message-ID: <6876495.1140590326329.SLOX.WebMail.wwwrun@openx.mimos.my>
Is there anyone who has successfully compiled and installed the latest version of SRILM in Knoppix 4.0? Please share your experience :)
I have a problem compiling the code. Previously I could compile it under Red Hat 9.
From stolcke at speech.sri.com Mon Mar 27 10:27:58 2006
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 27 Mar 2006 10:27:58 PST
Subject: Info on class-based LMs requested
Message-ID: <200603271827.k2RIRwe06042@huge>
Can someone give him some hints?
Thanks
--Andreas
-------------- next part --------------
An embedded message was scrubbed...
From: Suha Kwak
Subject: SRILM user - I want to obtain more information!!
Date: Mon, 27 Mar 2006 19:35:32 +0900 (KST)
Size: 5380
URL: