From stolcke at speech.sri.com Fri Jan 6 14:43:54 2006
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Fri, 06 Jan 2006 14:43:54 PST
Subject: Another lattice rescoring problem
In-Reply-To: Your message of Tue, 18 Oct 2005 15:02:36 +0300. <4354E45C.706@hut.fi>
Message-ID: <200601062243.OAA03798@tonga>

Teemu,

thanks for pointing out this bug, and sorry for taking so long to get
back to you. In the process of fixing this I actually found a number of
issues, one of them affecting the lattice-tool -ppl function itself.
As you found out, the -no-nulls option masks the problem to some extent,
and that was the reason I hadn't noticed these problems before.

That should all be fixed now. You can download the current beta version
from the web site. If no further problems surface in the coming weeks
I'll release this as the next version.

--Andreas

In message <4354E45C.706 at hut.fi> you wrote:
> I ran into another problem with the lattice rescoring. I have two
> simple HTK lattices (acoustic log-probabilities in parentheses):
>
> test0.htk:
>
>   a(-1) --+--> c(-2) ------+--> b(-3)
>           |                |
>           +--> !NULL(-2) --+
>
> test1.htk:
>
>   a(-1) -----> !NULL(-2) -----> b(-3)
>
> If I rescore the above lattices with a simple 2-gram language model
> test.arpa (see the end of the mail for the example files), the
> language model probability of the path "a b" is computed incorrectly
> for the first lattice. In the second case, the probability is
> correct:
>
> $ echo "a b" | lattice-tool -in-lattice test0.htk -read-htk \
>       -lm test.arpa -ppl - -debug 2
> ...
> p( a | ) = [10] 7.43548e-13 [ -12.1287 ]
> p( b | a ...) = [16] 8.00959e-07 [ -6.09639 ]
> p( | b ...) = [9] 6.88685e-14 [ -13.162 ]
> 0 zeroprobs, logprob= -31.3871 ppl= 2.8997e+10 ppl1= 4.93776e+15
>
> $ echo "a b" | lattice-tool -in-lattice test1.htk -read-htk \
>       -lm test.arpa -ppl - -debug 2
> ...
> p( a | ) = [9] 2.57573e-17 [ -16.5891 ]
> p( b | a ...) = [13] 8.00959e-07 [ -6.09639 ]
> p( | b ...) = [8] 6.88685e-14 [ -13.162 ]
> 0 zeroprobs, logprob= -35.8475 ppl= 8.89522e+11 ppl1= 8.38947e+17
>
> It seems that the backoff probability BO(a) is missing from the first
> case.
>
> Next I tried to use the -no-nulls flag. Then I get correct language
> model probabilities for both lattices, but the acoustic probability is
> incorrect, as the acoustic probability of the !NULL edge is discarded.
> Should the general LM expansion handle !NULL edges correctly?
>
> I also tried changing the !NULL words to a distinct word symbol and
> specifying it with the -ignore-vocab flag to lattice-tool (tried
> versions 1.4.5 and 1.4.6 beta). Then the acoustic probabilities are
> preserved nicely, but again the backoff probability BO(a) is missing
> from the first rescored lattice.
>
> Did I miss something again, or is the above expected behaviour?
>
> -Teemu
>
>
> Here are the example files:
>
> test0.htk:
>
> VERSION=1.1
> base=10
> dir=f
> lmscale=1 wdpenalty=0
> start=0 end=3
> N=4 L=4
> I=0
> I=1
> I=2
> I=3
> J=0 S=0 E=1 W=a a=-1
> J=1 S=1 E=2 W=!NULL a=-2
> J=2 S=1 E=2 W=c a=-2
> J=3 S=2 E=3 W=b a=-3
>
>
> test1.htk:
>
> VERSION=1.1
> base=10
> dir=f
> lmscale=1 wdpenalty=0
> start=0 end=3
> N=4 L=3
> I=0
> I=1
> I=2
> I=3
> J=0 S=0 E=1 W=a a=-1
> J=1 S=1 E=2 W=!NULL a=-2
> J=2 S=2 E=3 W=b a=-3
>
>
> test.arpa:
>
> \data\
> ngram 1=5
> ngram 2=5
>
> \1-grams:
> -99 -7.34882
> -2.10718 c -4.28966
> -4.77987 a -4.46041
> -5.81316 -7.34882
> -4.02326 b -2.07313
>
> \2-grams:
> -3.33947 c a
> -1.08518 c
> -4.58511 c b
> -0.000484286 a c
> -1.67833 b c
>
> \end\

From gtg781p at mail.gatech.edu Wed Jan 18 16:31:20 2006
From: gtg781p at mail.gatech.edu (Jinyu Li)
Date: Wed, 18 Jan 2006 19:31:20 -0500
Subject: Is there any SRILM tutorial?
Message-ID: <01f501c61c8f$ac0d6d60$0ae54dc7@eceint.gatech.edu>

Hello,

I am a new user of SRILM. I found it difficult to build an LM with
only the command manual pages on the SRILM website.
Is there an SRILM tutorial for new users? I appreciate your help a lot.

Thanks.
Jinyu

From stolcke at speech.sri.com Wed Jan 18 16:56:53 2006
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Wed, 18 Jan 2006 16:56:53 PST
Subject: Is there any SRILM tutorial?
In-Reply-To: Your message of Wed, 18 Jan 2006 19:31:20 -0500. <01f501c61c8f$ac0d6d60$0ae54dc7@eceint.gatech.edu>
Message-ID: <200601190056.k0J0urL16069@huge>

>
> I am a new user of SRILM. I found that it was difficult for me to build
> a LM just with the command manual page on the SRILM website. Is there
> any SRILM tutorial for the new user? I appreciate your help a lot.

There is $SRILM/doc/lm-intro, but it's not much.
There is some stuff on the web, like
http://ssli.ee.washington.edu/ssli/people/sarahs/srilm.html .

SRILM users, here is your chance to attain fame, if not fortune!
Write a good intro to language modeling based on SRILM ...
If anybody knows of such a document please share it.

--Andreas

>
> Thanks.
> Jinyu

From barabbas at gmail.com Sat Jan 21 01:15:18 2006
From: barabbas at gmail.com (Barabbas Jiang@Gmail)
Date: Sat, 21 Jan 2006 17:15:18 +0800
Subject: Is there any SRILM tutorial?
In-Reply-To: <200601190056.k0J0urL16069@huge>
References: <200601190056.k0J0urL16069@huge>
Message-ID: <43D1FBA6.9090401@gmail.com>

For Chinese users, I suggest giving these lectures a chance:

http://berlin.csie.ntnu.edu.tw/PastCourses/2003F-SpeechSignalProcessing/Slides/SP2003F_Lecture12_LM%20Training%20Toolkit%20-SRILM.pdf
http://berlin.csie.ntnu.edu.tw/Courses/2004F-SpeechRecognition/Slides/SP2004F_Lecture06-02_SRILM%20Toolkit.pdf

> SRILM users, here is your chance to attain fame, if not fortune!
> Write a good intro to language modeling based on SRILM ...
> If anybody knows of such a document please share it.
>
> --Andreas
>

--
/Tian-Jian "Barabbas" Jiang A.K.A Tai-Ming/
*Research Assistant*
_Institute of Information Science_
_Academia SINICA_
*Doctoral Student*
_Department of Computer Science_
_National Tsing-Hua University_

From George.Foster at cnrc-nrc.gc.ca Sat Jan 21 07:39:28 2006
From: George.Foster at cnrc-nrc.gc.ca (Foster, George)
Date: Sat, 21 Jan 2006 10:39:28 -0500
Subject: Trouble building srilm project (macosx)
References: <4aa50c3a8498b5fbc113151eff9ec9ee@gmail.com>
Message-ID:

Peter,

I'm running a version of OSX 10.3.9 and gcc that's almost identical to yours:

% uname -a
Darwin 0547-CRTL.local 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power Macintosh powerpc
% gcc -v
Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs
Thread model: posix
gcc version 3.3 20030304 (Apple Computer, Inc. build 1495)

I had no problems building srilm 1.4.5 using the INSTALL instructions
(no code changes necessary).

George

-----Original Message-----
From: owner-srilm-user at speech.sri.com on behalf of P McIlroy
Sent: Mon 11/28/2005 11:39 PM
To: srilm-user at speech.sri.com
Subject: Trouble building srilm project (macosx)

I joined this group in hopes of finding some help on the MacOS compile.

I'm getting a handful of uninstantiated templates in the out-of-the-box
compile on OS X 10.3.9, with compiler version:

    gcc version 3.3 20030304 (Apple Computer, Inc. build 1666)

The undefined functions are, in various executables:

    FNgramCounts::FNgramCounts(FactoredVocab&, FNgramSpecs&)
    Map2::clear()
    NgramCounts::NgramCounts(Vocab&, unsigned int)

I was able to 'fix' ngram-counts by adding

    #include "NgramStats.cc"

to the main program.

The same fix does not work for ngram.cc.

Additional investigation shows that the instance file

    NGramStatsInt.o

does not include an instantiation of the NGramCount constructor.

Other attempts like adding this to the main program:

    static template NGramCounts;

lead to multiple definition errors in the linker.

Is there a known configuration or compiler option that works on OS X
10.3? Or will upgrading to 10.4 fix the problems?

thanks,

Peter McIlroy

Begin forwarded message:

> From: P McIlroy
> Date: November 28, 2005 3:17:14 PM PST
> To: stolcke at speech.sri.com
> Subject: Fwd: Trouble building your srilm project (macosx)
>
> I was able to compile one of the failed executables (ngram-count), but
> it required adding
>
> #include "NgramStats.cc"
>
> to the end of the list of includes in the main source file
> ngram-stats.cc. (This is not the preferred way to force compilation
> of templates, but it's working for now.) I also tried forcing
> instantiation by creating a NgramStats_inst.cc file, but this led to
> horrible multiple definitions.
>
> I'm still getting a warning for multiple definitions of _qsort(), but
> I don't think this is a problem.
>

From rus at sonic.net Sat Jan 21 10:48:10 2006
From: rus at sonic.net (Russell Sheptak)
Date: Sat, 21 Jan 2006 10:48:10 -0800
Subject: Trouble building srilm project (macosx)
In-Reply-To: <4aa50c3a8498b5fbc113151eff9ec9ee@gmail.com>
References: <4aa50c3a8498b5fbc113151eff9ec9ee@gmail.com>
Message-ID: <2573d84b3d296226abcb363f9a2fc748@sonic.net>

Peter,

There's at least one newer version of the gcc 3.3 compiler available on
the Apple developer site (free download). I suspect you're missing a
necessary patch, since Apple supplied 3-4 revisions to gcc 3.3 to fix
bugs before switching to gcc 4.

My version is gcc 3.3 20030304 build 1671, so slightly later than yours,
and I had no problem building SRILM as downloaded and described in the
install document on MacOS X 10.3.9.

I ran into one problem on the self test. You're going to run out of open
file descriptors and the ngram tests will fail as a result.
You'll need to reconfigure your kernel to allow more open file
descriptors and reboot, and the self test will pass.

rus

On Nov 28, 2005, at 8:39 PM, P McIlroy wrote:

> I joined this group in hopes of finding some help on the MacOS compile.
>
> I'm getting a handful of uninstantiated templates in the
> out-of-the-box compile on OS X 10.3.9, with compiler version:
>
> gcc version 3.3 20030304 (Apple Computer, Inc. build 1666)
>
> The undefined functions are, in various executables:
>
> FNgramCounts::FNgramCounts(FactoredVocab&,
> FNgramSpecs&)
> Map2::clear()
> NgramCounts::NgramCounts(Vocab&, unsigned int)
>
> I was able to 'fix' ngram-counts by adding
>
> #include "NgramStats.cc"
>
> to the main program.
>
> The same fix does not work for ngram.cc.
>
> Additional investigation shows that the instance file
>
> NGramStatsInt.o
>
> does not include an instantiation of the NGramCount constructor.
>
> Other attempts like adding this to the main program:
>
> static template NGramCounts;
>
> lead to multiple definition errors in the linker.
>
> Is there a known configuration or compiler option that works on OS X
> 10.3? Or will upgrading to 10.4 fix the problems?
>
> thanks,
>
> Peter McIlroy
>
>
>
> Begin forwarded message:
>
>> From: P McIlroy
>> Date: November 28, 2005 3:17:14 PM PST
>> To: stolcke at speech.sri.com
>> Subject: Fwd: Trouble building your srilm project (macosx)
>>
>> I was able to compile one of the failed executables (ngram-count),
>> but it required adding
>>
>> #include "NgramStats.cc"
>>
>> to the end of the list of includes in the main source file
>> ngram-stats.cc. (This is not the preferred way to force compilation
>> of templates, but it's working for now.) I also tried forcing
>> instantiation by creating a NgramStats_inst.cc file, but this led to
>> horrible multiple definitions.
>>
>> I'm still getting a warning for multiple definitions of _qsort(), but
>> I don't think this is a problem.
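
The open-file-descriptor limit mentioned in the reply above can be inspected before running the self test. The following is only a minimal sketch, assuming a Unix-like system with Python's standard `resource` module. It reports the per-process limit and tries to raise the soft limit up to the hard limit; it is not the kernel reconfiguration described in the message. The function name `raise_fd_limit` and the target value 1024 are illustrative assumptions.

```python
import resource


def raise_fd_limit(target=1024):
    """Try to raise the soft limit on open file descriptors.

    `target` is a hypothetical value chosen for illustration. The soft
    limit can never exceed the hard limit imposed by the system.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)
    if new_soft <= soft:
        return soft  # never lower an existing limit
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    except (ValueError, OSError):
        return soft  # the system refused; leave the limit unchanged
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    before = resource.getrlimit(resource.RLIMIT_NOFILE)
    after = raise_fd_limit()
    print("limits before (soft, hard):", before)
    print("soft limit now:", after)
```

A limit changed this way applies only to the current process and the children it starts, so any test run would have to be launched from the same process. If the hard limit itself is too low, a system-level change such as the one described above is still needed.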

From rix at netcabo.pt Mon Feb 6 07:39:45 2006
From: rix at netcabo.pt (rix)
Date: Mon, 6 Feb 2006 15:39:45 -0000
Subject: limitations in ngram-merge
Message-ID: <159323335F97074D9A594D676652B06754A028@VS3.hdi.tvcabo>

Hi

We are currently having a problem with the merging of count files using
ngram-merge. It seems that the size of the resulting file is limited to
2GB. Can you tell us whether this limitation is due to the program or
to the configuration of our system? We are running ngram-merge on a
PIV 2.66GHz with 1GB RAM under Suse 10.0.

Best regards
Ricardo Nunes
Luis Neves

From stolcke at speech.sri.com Wed Feb 8 23:18:03 2006
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Wed, 08 Feb 2006 23:18:03 PST
Subject: limitations in ngram-merge
In-Reply-To: Your message of Mon, 06 Feb 2006 15:39:45 +0000. <159323335F97074D9A594D676652B06754A028@VS3.hdi.tvcabo>
Message-ID: <200602090718.k197I3F07068@huge>

In message <159323335F97074D9A594D676652B06754A028 at VS3.hdi.tvcabo> you wrote:
> Hi
>
> We are currently having a problem with the merging of count files using
> ngram-merge. It seems that the size of the resulting file is limited to
> 2GB. Can you tell us whether this limitation is due to the program or
> to the configuration of our system? We are running ngram-merge on a
> PIV 2.66GHz with 1GB RAM under Suse 10.0.

It's probably an OS limitation. SRILM uses level-2 I/O functions
(see fopen(3)). We have certainly handled files larger than 2 GB on our
Linux machines. But those files are usually gzipped (ending in .gz).
SRILM doesn't read or write those directly, since the I/O is to a pipe
that talks to the gzip program.
Maybe you can try using gzipped files in your case too.
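
The suggestion above amounts to keeping count files compressed, so that the file on disk stays well below the size limit. As a rough illustration only, the sketch below writes and re-reads a small count file in gzip format using Python's standard `gzip` module. This stands in for the external gzip pipe described above and does not call SRILM. The file name `counts.gz`, the helper names, the one-line-per-N-gram layout with a tab before the count, and the sample counts are all assumptions made for the example.

```python
import gzip
import os
import tempfile


def write_counts(path, counts):
    """Write 'ngram<TAB>count' lines to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for ngram, count in sorted(counts.items()):
            out.write(f"{ngram}\t{count}\n")


def read_counts(path):
    """Read the lines back, decompressing on the fly."""
    counts = {}
    with gzip.open(path, "rt", encoding="utf-8") as src:
        for line in src:
            ngram, count = line.rstrip("\n").split("\t")
            counts[ngram] = int(count)
    return counts


if __name__ == "__main__":
    sample = {"a b": 3, "a c": 1, "c b": 2}  # made-up counts
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "counts.gz")
        write_counts(path, sample)
        print("round trip ok:", read_counts(path) == sample)
```

Because the data is compressed and decompressed as a stream, neither function needs to hold the whole file in memory or seek within it.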

--Andreas

From hassan at mimos.my Tue Feb 21 22:38:46 2006
From: hassan at mimos.my (Hassan Mohamed)
Date: Wed, 22 Feb 2006 14:38:46 +0800 (MYT)
Subject: Compile SRILM in Knoppix 4.0
Message-ID: <6876495.1140590326329.SLOX.WebMail.wwwrun@openx.mimos.my>

Is there anyone who has successfully compiled and installed the latest
version of SRILM on Knoppix 4.0? Please share your experience :)
I have a problem compiling the code. Previously I could compile it
under Red Hat 9.

From stolcke at speech.sri.com Mon Mar 27 10:27:58 2006
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 27 Mar 2006 10:27:58 PST
Subject: Info on class-based LMs requested
Message-ID: <200603271827.k2RIRwe06042@huge>

Can someone give him some hints?

Thanks

--Andreas

-------------- next part --------------
An embedded message was scrubbed...
From: Suha Kwak
Subject: SRILM user - I want to obtain more information!!
Date: Mon, 27 Mar 2006 19:35:32 +0900 (KST)
Size: 5380
URL: