From stolcke at speech.sri.com Thu Jan 20 17:27:09 2011 From: stolcke at speech.sri.com (Andreas Stolcke) Date: Thu, 20 Jan 2011 17:27:09 -0800 Subject: [SRILM User List] SRILM 1.5.12 released Message-ID: <201101210127.p0L1R9M12432@huge> The latest version of SRILM is now available from http://www.speech.sri.com/projects/srilm/download.html . A list of changes appears below. Enjoy, Andreas 1.5.12 20 Jan 2011 Functionality: * Enable lattice-tool -old-decoding if -nbest-duplicates is specified (and warn about it). * Support make-big-lm -wbdiscount option. * New option ngram -prune-history-lm, for specifying a separate LM that computes the history marginal probablities needed for N-gram pruning purposes. Inspired by C. Chelba et al., "Study on Interaction Between Entropy Pruning and Kneser-Ney Smoothing", Proc. Interspeech-2010. * Added optional limitVocab argument to VocabMultiMap::read() function. This is now used by lattice-tool -limit-vocab to avoid reading parts of the dictionary that are not used in the input. * Added an option -zeroprob-word to ngram and lattice-tool. It specifies a word that should be used as a replacement if the current word has probability zero. This is different from -map-unk which only applies to OOV words and actually replaces the word label in the output lattice, if any. * Added new wrapper LM class NonzeroLM, to implement the above. Portability: * New MACHINE_TYPE values for Android-ARM platform: android-armeabi and android-armeabi-v7a (from Mike Frandsen). * Deleted the htk directory from distribution; it was obsolete and not documented. Bug fixes: * Prob.h: guard against under/overflow in intlog and bytelog conversions. * Replaced gunzip with gzip -d in all scripts (for efficiency). * Better option checking in make-big-lm, disallowing mixing of discounting methods and use of discounting flags that are not supported. * Undefine max() macro in Trellis.h to avoid conflict with some system header files. * Better support for recent MSVC versions in common/Makefile.machine.msvc (from Mile Frandsen). * add-pauses-to-pfsg: prevent existing pause nodes from being processed. From suzuki at ks.cs.titech.ac.jp Mon Jan 24 00:10:50 2011 From: suzuki at ks.cs.titech.ac.jp (suzuki yasuo) Date: Mon, 24 Jan 2011 17:10:50 +0900 Subject: [SRILM User List] Question about output of "ngram -ppl -debug 2" for class-based LM model Message-ID: <20110124171050.d87347e1.suzuki@ks.cs.titech.ac.jp> Hello, all. I made a class LM(bigram) and caluculated ppl of some testdata by this command in shell script, "ngram -order 2 -lm ${CLASS_LM_NAME} -ppl ${TEST} -debug 2 -classes ${CLASS_FILE}". I can get output of -debug 2. A part of that is like this.. The term is generally applied to behavior within civil governments , but politics has been observed in other grou p interactions , including corporate , academic , and religious institutions . p( The | ) = [OOV][2gram] 0.00520962 [ -2.28319 ] p( term | The ...) = [OOV][1gram][OOV][2gram] 0.000536365 [ -3.27054 ] p( is | term ...) = [OOV][1gram][OOV][2gram] 0.0139987 [ -1.85391 ] p( generally | is ...) = [OOV][1gram][OOV][2gram] 0.000171588 [ -3.76551 ] p( applied | generally ...) = [OOV][1gram][OOV][2gram] 0.000122932 [ -3.91033 ] p( to | applied ...) = [OOV][1gram][OOV][2gram] 0.0811208 [ -1.09087 ] p( behavior | to ...) = [OOV][1gram][OOV][2gram] 6.12967e-05 [ -4.21256 ] p( within | behavior ...) = [OOV][1gram][OOV][2gram] 0.000763519 [ -3.11718 ] p( civil | within ...) = [OOV][1gram][OOV][2gram] 4.96081e-05 [ -4.30445 ] p( | civil ...) 
= [1gram][1gram] 0.0156937 [ -1.80427 ] p( , | ...) = [OOV][1gram] 0.0149661 [ -1.82489 ] p( but | , ...) = [OOV][1gram][OOV][2gram] 0.00500311 [ -2.30076 ] p( politics | but ...) = [OOV][1gram][OOV][2gram] 4.8048e-05 [ -4.31833 ] p( has | politics ...) = [OOV][1gram][OOV][1gram] 0.000661878 [ -3.17922 ] p( been | has ...) = [OOV][1gram][OOV][2gram] 0.00721624 [ -2.14169 ] p( observed | been ...) = [OOV][1gram][OOV][1gram] 1.12884e-05 [ -4.94737 ] p( in | observed ...) = [OOV][1gram][1gram][OOV][2gram][1gram] 0.0144335 [ -1.84063 ] p( other | in ...) = [OOV][1gram][OOV][2gram][OOV][2gram] 0.00162061 [ -2.79032 ] p( group | other ...) = [OOV][1gram][OOV][2gram] 0.000567602 [ -3.24596 ] p( | group ...) = [1gram][1gram] 0.0150167 [ -1.82343 ] p( , | ...) = [OOV][1gram] 0.0149661 [ -1.82489 ] p( including | , ...) = [OOV][1gram][OOV][2gram] 0.000755534 [ -3.12175 ] p( corporate | including ...) = [OOV][1gram][OOV][2gram] 5.59105e-05 [ -4.25251 ] p( , | corporate ...) = [OOV][1gram][OOV][1gram] 0.0222226 [ -1.65321 ] p( academic | , ...) = [OOV][1gram][OOV][2gram] 4.36976e-05 [ -4.35954 ] p( , | academic ...) = [OOV][1gram][OOV][1gram] 0.0222226 [ -1.65321 ] p( and | , ...) = [OOV][1gram][OOV][2gram] 0.0787025 [ -1.10401 ] p( religious | and ...) = [OOV][1gram][OOV][2gram] 6.80949e-05 [ -4.16689 ] p( institutions | religious ...) = [OOV][1gram][OOV][2gram] 0.000141801 [ -3.84832 ] p( . | institutions ...) = [OOV][1gram][OOV][2gram] 0.0110882 [ -1.95514 ] p( | . ...) = [1gram][2gram] 0.979002 [ -0.00921631 ] 1 sentences, 30 words, 0 OOVs 0 zeroprobs, logprob= -85.9741 ppl= 593.414 ppl1= 734.18 I can understand how these probs were caluculated for most of the lines, but I can't analyze this line p( in | observed ...) = [OOV][1gram][1gram][OOV][2gram][1gram] 0.0144335 [ -1.84063 ] Will you tell me the meaning of this line? How this prob were caluculated from my class-based LM? -- Yasuo Suzuki 4th year undergrad at Shinoda Laboratory Department of Computer Science Tokyo Institute of Technology suzuki at ks.cs.titech.ac.jp From pawang.iitk at gmail.com Mon Jan 24 11:12:12 2011 From: pawang.iitk at gmail.com (Pawan Goyal) Date: Mon, 24 Jan 2011 19:12:12 +0000 Subject: [SRILM User List] error message while running make World Message-ID: Hi all, uname -a Linux pawan-laptop 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 23:42:43 UTC 2011 x86_64 GNU/Linux gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) error message while make World ......................... /usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-linux-gnu/4.4.3/libstdc++.so when searching for -lstdc++ /usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-linux-gnu/4.4.3/libstdc++.a when searching for -lstdc++ /usr/bin/ld: cannot find -lstdc++ collect2: ld returned 1 exit status /home/pawan/Documents/PhD/summarization/srilm/sbin/decipher-install 0555 ../bin/i686/maxalloc ../../bin/i686 ERROR: File to be installed (../bin/i686/maxalloc) does not exist. ERROR: File to be installed (../bin/i686/maxalloc) is not a plain file. Usage: decipher-install ... mode: file permission mode, in octal file1 ... fileN: files to be installed directory: where the files should be installed .................................................................................. Thanks in advance Pawan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stolcke at speech.sri.com Mon Jan 24 14:07:57 2011 From: stolcke at speech.sri.com (Andreas Stolcke) Date: Mon, 24 Jan 2011 14:07:57 -0800 Subject: [SRILM User List] error message while running make World In-Reply-To: References: Message-ID: <4D3DF83D.30307@speech.sri.com> Pawan Goyal wrote: > Hi all, > > uname -a > Linux pawan-laptop 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 > 23:42:43 UTC 2011 x86_64 GNU/Linux > > gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) The probable reason is that you're trying to compile 32-bit binaries (the default for the i686 platform), but you Ubuntu system doesn't have the required libraries installed (only the 64bit ones are installed on most systems). Two solutions: 1) install the optional 32bit binaries using commands such as apt-get install ia32-libs (if you want to build with Tcl support you'd also need the 32bit version of libtcl -- don't know the name of the package). 2) Compile 64bit binaries. You can copy common/Makefile.machine.i686-ubuntu to common/Makefile.machine.i686, or edit the file by hand. Andreas > > error message while > > make World > > ......................... > /usr/bin/ld: skipping incompatible > /usr/lib/gcc/x86_64-linux-gnu/4.4.3/libstdc++.so when searching for > -lstdc++ > /usr/bin/ld: skipping incompatible > /usr/lib/gcc/x86_64-linux-gnu/4.4.3/libstdc++.a when searching for > -lstdc++ > > /usr/bin/ld: cannot find -lstdc++ > collect2: ld returned 1 exit status > /home/pawan/Documents/PhD/summarization/srilm/sbin/decipher-install > 0555 ../bin/i686/maxalloc ../../bin/i686 > ERROR: File to be installed (../bin/i686/maxalloc) does not exist. > ERROR: File to be installed (../bin/i686/maxalloc) is not a plain file. > Usage: decipher-install ... > mode: file permission mode, in octal > file1 ... fileN: files to be installed > directory: where the files should be installed > > .................................................................................. > > Thanks in advance > Pawan > ------------------------------------------------------------------------ > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user From stolcke at speech.sri.com Mon Jan 24 14:21:50 2011 From: stolcke at speech.sri.com (Andreas Stolcke) Date: Mon, 24 Jan 2011 14:21:50 -0800 Subject: [SRILM User List] Question about output of "ngram -ppl -debug 2" for class-based LM model In-Reply-To: <20110124171050.d87347e1.suzuki@ks.cs.titech.ac.jp> References: <20110124171050.d87347e1.suzuki@ks.cs.titech.ac.jp> Message-ID: <4D3DFB7E.4030603@speech.sri.com> suzuki yasuo wrote: > Hello, all. > > I made a class LM(bigram) and caluculated ppl of some testdata by this command in shell script, > > "ngram -order 2 -lm ${CLASS_LM_NAME} -ppl ${TEST} -debug 2 -classes ${CLASS_FILE}". > > I can get output of -debug 2. A part of that is like this.. > > > The term is generally applied to behavior within civil governments , but politics has been observed in other grou > p interactions , including corporate , academic , and religious institutions . > p( The | ) = [OOV][2gram] 0.00520962 [ -2.28319 ] > p( term | The ...) = [OOV][1gram][OOV][2gram] 0.000536365 [ -3.27054 ] > p( is | term ...) = [OOV][1gram][OOV][2gram] 0.0139987 [ -1.85391 ] > p( generally | is ...) = [OOV][1gram][OOV][2gram] 0.000171588 [ -3.76551 ] > p( applied | generally ...) = [OOV][1gram][OOV][2gram] 0.000122932 [ -3.91033 ] > p( to | applied ...) 
= [OOV][1gram][OOV][2gram] 0.0811208 [ -1.09087 ] > p( behavior | to ...) = [OOV][1gram][OOV][2gram] 6.12967e-05 [ -4.21256 ] > p( within | behavior ...) = [OOV][1gram][OOV][2gram] 0.000763519 [ -3.11718 ] > p( civil | within ...) = [OOV][1gram][OOV][2gram] 4.96081e-05 [ -4.30445 ] > p( | civil ...) = [1gram][1gram] 0.0156937 [ -1.80427 ] > p( , | ...) = [OOV][1gram] 0.0149661 [ -1.82489 ] > p( but | , ...) = [OOV][1gram][OOV][2gram] 0.00500311 [ -2.30076 ] > p( politics | but ...) = [OOV][1gram][OOV][2gram] 4.8048e-05 [ -4.31833 ] > p( has | politics ...) = [OOV][1gram][OOV][1gram] 0.000661878 [ -3.17922 ] > p( been | has ...) = [OOV][1gram][OOV][2gram] 0.00721624 [ -2.14169 ] > p( observed | been ...) = [OOV][1gram][OOV][1gram] 1.12884e-05 [ -4.94737 ] > p( in | observed ...) = [OOV][1gram][1gram][OOV][2gram][1gram] 0.0144335 [ -1.84063 ] > p( other | in ...) = [OOV][1gram][OOV][2gram][OOV][2gram] 0.00162061 [ -2.79032 ] > p( group | other ...) = [OOV][1gram][OOV][2gram] 0.000567602 [ -3.24596 ] > p( | group ...) = [1gram][1gram] 0.0150167 [ -1.82343 ] > p( , | ...) = [OOV][1gram] 0.0149661 [ -1.82489 ] > p( including | , ...) = [OOV][1gram][OOV][2gram] 0.000755534 [ -3.12175 ] > p( corporate | including ...) = [OOV][1gram][OOV][2gram] 5.59105e-05 [ -4.25251 ] > p( , | corporate ...) = [OOV][1gram][OOV][1gram] 0.0222226 [ -1.65321 ] > p( academic | , ...) = [OOV][1gram][OOV][2gram] 4.36976e-05 [ -4.35954 ] > p( , | academic ...) = [OOV][1gram][OOV][1gram] 0.0222226 [ -1.65321 ] > p( and | , ...) = [OOV][1gram][OOV][2gram] 0.0787025 [ -1.10401 ] > p( religious | and ...) = [OOV][1gram][OOV][2gram] 6.80949e-05 [ -4.16689 ] > p( institutions | religious ...) = [OOV][1gram][OOV][2gram] 0.000141801 [ -3.84832 ] > p( . | institutions ...) = [OOV][1gram][OOV][2gram] 0.0110882 [ -1.95514 ] > p( | . ...) = [1gram][2gram] 0.979002 [ -0.00921631 ] > 1 sentences, 30 words, 0 OOVs > 0 zeroprobs, logprob= -85.9741 ppl= 593.414 ppl1= 734.18 > > I can understand how these probs were caluculated for most of the lines, but I can't analyze this line > > p( in | observed ...) = [OOV][1gram][1gram][OOV][2gram][1gram] 0.0144335 [ -1.84063 ] > > Will you tell me the meaning of this line? How this prob were caluculated from my class-based LM? > Each term in brackets [OOV] [1gram] ... corresponds to one way to parse the word, either as part of a class expansion or as a plain word. For example, you see p( The | ) = [OOV][2gram] 0.00520962 [ -2.28319 ] because the first word could be generated by the LM as a bigram The, or as CLASS with "The" being a member of CLASS. I suspect your LM doesn't contain "The" as a vocabulary item independent of CLASS, hence the first parse yields the [OOV] label. Once you get to the second word you have more ways to predict the next word, because now the history also has multiple parses. In general, the predicted probabilities for all parses are added up to arrive at the total conditional probability. To disable this type of processing (multiple parses) you can use the -simple-classes option, but that only works if word-class membership is unambiguous. Andreas -classes newlabels+spell.classes > > > From pawang.iitk at gmail.com Mon Jan 24 14:45:36 2011 From: pawang.iitk at gmail.com (Pawan Goyal) Date: Mon, 24 Jan 2011 22:45:36 +0000 Subject: [SRILM User List] error message while running make World In-Reply-To: <4D3DF83D.30307@speech.sri.com> References: <4D3DF83D.30307@speech.sri.com> Message-ID: Hi Andreas, Thanks for pointing out the incompatibility problem.
I had the 32-bit binaries installed already, so tried the second option. I am not using tcl support, i.e. NO_TCL = X TCL_INCLUDE = TCL_LIBRARY = I am still getting the problems and sorry but not able to figure out the solution. Part of the error message during make World: .................................................................................................................. /usr/bin/g++ -march=athlon64 -m64 -Wall -Wno-unused-variable -Wno-uninitialized -DINSTANTIATE_TEMPLATES -D_FILE_OFFSET_BITS=64 -I. -I../../include -L../../lib/i686 -g -O3 -o ../bin/i686/maxalloc ../obj/i686/maxalloc.o ../obj/i686/libdstruct.a -lm -ldl ../../lib/i686/libmisc.a -lm 2>&1 | c++filt /usr/bin/ld: i386 architecture of input file `../obj/i686/maxalloc.o' is incompatible with i386:x86-64 output /usr/bin/ld: i386 architecture of input file `../../lib/i686/libmisc.a(option.o)' is incompatible with i386:x86-64 output collect2: ld returned 1 exit status /home/pawan/Documents/PhD/summarization/srilm/sbin/decipher-install 0555 ../bin/i686/maxalloc ../../bin/i686 ERROR: File to be installed (../bin/i686/maxalloc) does not exist. ERROR: File to be installed (../bin/i686/maxalloc) is not a plain file. Usage: decipher-install ... mode: file permission mode, in octal file1 ... fileN: files to be installed directory: where the files should be installed files = ../bin/i686/maxalloc directory = ../../bin/i686 mode = 0555 make[2]: [../../bin/i686/maxalloc] Error 1 (ignored) make[2]: Leaving directory `/home/pawan/Documents/PhD/summarization/srilm/dstruct/src' make[2]: Entering directory `/home/pawan/Documents/PhD/summarization/srilm/lm/src' /usr/bin/g++ -march=athlon64 -m64 -Wall -Wno-unused-variable -Wno-uninitialized -DINSTANTIATE_TEMPLATES -D_FILE_OFFSET_BITS=64 -I. -I../../include -u matherr -L../../lib/i686 -g -O3 -o ../bin/i686/ngram ../obj/i686/ngram.o ../obj/i686/liboolm.a -lm -ldl ../../lib/i686/libflm.a ../../lib/i686/libdstruct.a ../../lib/i686/libmisc.a -lm 2>&1 | c++filt collect2: ld terminated with signal 11 [Segmentation fault] /usr/bin/ld: i386 architecture of input file `../obj/i686/ngram.o' is incompatible with i386:x86-64 output /usr/bin/ld: i386 architecture of input file `../obj/i686/liboolm.a(matherr.o)' is incompatible with i386:x86-64 output ...................................................................................... /home/pawan/Documents/PhD/summarization/srilm/sbin/decipher-install 0555 ../bin/i686/ngram ../../bin/i686 ERROR: File to be installed (../bin/i686/ngram) does not exist. ERROR: File to be installed (../bin/i686/ngram) is not a plain file. Usage: decipher-install ... mode: file permission mode, in octal file1 ... fileN: files to be installed directory: where the files should be installed files = ../bin/i686/ngram directory = ../../bin/i686 mode = 0555 make[2]: [../../bin/i686/ngram] Error 1 (ignored) ............................................................................... Thanks Pawan On Mon, Jan 24, 2011 at 10:07 PM, Andreas Stolcke wrote: > Pawan Goyal wrote: > >> Hi all, >> >> uname -a >> Linux pawan-laptop 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 23:42:43 >> UTC 2011 x86_64 GNU/Linux >> >> gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) >> > The probable reason is that you're trying to compile 32-bit binaries (the > default for the i686 platform), but you Ubuntu system doesn't have the > required libraries installed (only the 64bit ones are installed on most > systems). 
> > Two solutions: 1) install the optional 32bit binaries using commands such > as > > apt-get install ia32-libs > > (if you want to build with Tcl support you'd also need the 32bit version of > libtcl -- don't know the name of the package). > > 2) Compile 64bit binaries. You can copy > common/Makefile.machine.i686-ubuntu to common/Makefile.machine.i686, or edit > the file by hand. > > Andreas > > > >> error message while >> make World >> >> ......................... >> /usr/bin/ld: skipping incompatible >> /usr/lib/gcc/x86_64-linux-gnu/4.4.3/libstdc++.so when searching for -lstdc++ >> /usr/bin/ld: skipping incompatible >> /usr/lib/gcc/x86_64-linux-gnu/4.4.3/libstdc++.a when searching for -lstdc++ >> >> /usr/bin/ld: cannot find -lstdc++ >> collect2: ld returned 1 exit status >> /home/pawan/Documents/PhD/summarization/srilm/sbin/decipher-install 0555 >> ../bin/i686/maxalloc ../../bin/i686 >> ERROR: File to be installed (../bin/i686/maxalloc) does not exist. >> ERROR: File to be installed (../bin/i686/maxalloc) is not a plain file. >> Usage: decipher-install ... >> mode: file permission mode, in octal >> file1 ... fileN: files to be installed >> directory: where the files should be installed >> >> >> .................................................................................. >> >> Thanks in advance >> Pawan >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> SRILM-User site list >> SRILM-User at speech.sri.com >> http://www.speech.sri.com/mailman/listinfo/srilm-user >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at speech.sri.com Mon Jan 24 17:27:20 2011 From: stolcke at speech.sri.com (Andreas Stolcke) Date: Mon, 24 Jan 2011 17:27:20 -0800 Subject: [SRILM User List] error message while running make World In-Reply-To: References: <4D3DF83D.30307@speech.sri.com> Message-ID: <4D3E26F8.6070307@speech.sri.com> Pawan Goyal wrote: > Hi Andreas, > > Thanks for pointing out the incompatibly problem. I had the 32-bit > binaries installed already, so tried the second option. I am not using > tcl support, i.e. > NO_TCL = X > TCL_INCLUDE = > TCL_LIBRARY = > I am still getting the problems and sorry but not able to figure out the solution. Part of the error message during make World: But now you are trying a 64bit compile! (look at your g++ options. That is fine, but you need to completely remove all old .o files because you cannot mix 32bit and 64bit .o files and libraries. Andreas > > .................................................................................................................. > /usr/bin/g++ -march=athlon64 -m64 -Wall -Wno-unused-variable -Wno-uninitialized -DINSTANTIATE_TEMPLATES -D_FILE_OFFSET_BITS=64 -I. -I../../include -L../../lib/i686 -g -O3 -o ../bin/i686/maxalloc ../obj/i686/maxalloc.o ../obj/i686/libdstruct.a -lm -ldl ../../lib/i686/libmisc.a -lm 2>&1 | c++filt > /usr/bin/ld: i386 architecture of input file `../obj/i686/maxalloc.o' is incompatible with i386:x86-64 output > /usr/bin/ld: i386 architecture of input file `../../lib/i686/libmisc.a(option.o)' is incompatible with i386:x86-64 output > collect2: ld returned 1 exit status > /home/pawan/Documents/PhD/summarization/srilm/sbin/decipher-install 0555 ../bin/i686/maxalloc ../../bin/i686 > ERROR: File to be installed (../bin/i686/maxalloc) does not exist. > ERROR: File to be installed (../bin/i686/maxalloc) is not a plain file. > Usage: decipher-install ... 
> mode: file permission mode, in octal > file1 ... fileN: files to be installed > directory: where the files should be installed > > files = ../bin/i686/maxalloc > directory = ../../bin/i686 > mode = 0555 > > make[2]: [../../bin/i686/maxalloc] Error 1 (ignored) > make[2]: Leaving directory `/home/pawan/Documents/PhD/summarization/srilm/dstruct/src' > make[2]: Entering directory `/home/pawan/Documents/PhD/summarization/srilm/lm/src' > /usr/bin/g++ -march=athlon64 -m64 -Wall -Wno-unused-variable -Wno-uninitialized -DINSTANTIATE_TEMPLATES -D_FILE_OFFSET_BITS=64 -I. -I../../include -u matherr -L../../lib/i686 -g -O3 -o ../bin/i686/ngram ../obj/i686/ngram.o ../obj/i686/liboolm.a -lm -ldl ../../lib/i686/libflm.a ../../lib/i686/libdstruct.a ../../lib/i686/libmisc.a -lm 2>&1 | c++filt > collect2: ld terminated with signal 11 [Segmentation fault] > /usr/bin/ld: i386 architecture of input file `../obj/i686/ngram.o' is incompatible with i386:x86-64 output > /usr/bin/ld: i386 architecture of input file `../obj/i686/liboolm.a(matherr.o)' is incompatible with i386:x86-64 output > ...................................................................................... > /home/pawan/Documents/PhD/summarization/srilm/sbin/decipher-install 0555 ../bin/i686/ngram ../../bin/i686 > ERROR: File to be installed (../bin/i686/ngram) does not exist. > ERROR: File to be installed (../bin/i686/ngram) is not a plain file. > Usage: decipher-install ... > mode: file permission mode, in octal > file1 ... fileN: files to be installed > directory: where the files should be installed > > files = ../bin/i686/ngram > directory = ../../bin/i686 > mode = 0555 > > make[2]: [../../bin/i686/ngram] Error 1 (ignored) > > ............................................................................... > > Thanks > Pawan > > On Mon, Jan 24, 2011 at 10:07 PM, Andreas Stolcke > > wrote: > > Pawan Goyal wrote: > > Hi all, > > uname -a > Linux pawan-laptop 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 > 23:42:43 UTC 2011 x86_64 GNU/Linux > > gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) > > The probable reason is that you're trying to compile 32-bit > binaries (the default for the i686 platform), but you Ubuntu > system doesn't have the required libraries installed (only the > 64bit ones are installed on most systems). > > Two solutions: 1) install the optional 32bit binaries using > commands such as > > apt-get install ia32-libs > > (if you want to build with Tcl support you'd also need the 32bit > version of libtcl -- don't know the name of the package). > > 2) Compile 64bit binaries. You can copy > common/Makefile.machine.i686-ubuntu to > common/Makefile.machine.i686, or edit the file by hand. > > Andreas > > > > error message while > make World > > ......................... > /usr/bin/ld: skipping incompatible > /usr/lib/gcc/x86_64-linux-gnu/4.4.3/libstdc++.so when > searching for -lstdc++ > /usr/bin/ld: skipping incompatible > /usr/lib/gcc/x86_64-linux-gnu/4.4.3/libstdc++.a when searching > for -lstdc++ > > /usr/bin/ld: cannot find -lstdc++ > collect2: ld returned 1 exit status > /home/pawan/Documents/PhD/summarization/srilm/sbin/decipher-install > 0555 ../bin/i686/maxalloc ../../bin/i686 > ERROR: File to be installed (../bin/i686/maxalloc) does not > exist. > ERROR: File to be installed (../bin/i686/maxalloc) is not a > plain file. > Usage: decipher-install ... > mode: file permission mode, in octal > file1 ... 
fileN: files to be installed > directory: where the files should be installed > > .................................................................................. > > Thanks in advance > Pawan > ------------------------------------------------------------------------ > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user > > > From mshamsuddeen2 at gmail.com Mon Jan 24 23:57:05 2011 From: mshamsuddeen2 at gmail.com (Muhammad Shamsuddeen Muhammad) Date: Tue, 25 Jan 2011 15:57:05 +0800 Subject: [SRILM User List] ngram-count missing Message-ID: I get this error when running the following command =>> integ at integ-desktop:~/tools/ demo$ ../../tools/srilm/bin/i686-gcc4/ngram-count -order 3 -interpolate -kndiscount -unk -text work/lm/news-commentary.lowercased.en -lm work/lm/news-commentary.lowercased.lm bash: ../../tools/srilm/bin/i686-gcc4/ngram-count: No such file or directory Up until this point, the installation process has been smooth, i have checked the directory and the file is no where to be found. Can anyone shed any light onto the situation and possibly guide me on how to fix it. Thanks in Advance. -- Muhammad Shamsuddeen Muhammad "There is no knowledge that is not power". -------------- next part -------------- An HTML attachment was scrubbed... URL: From pawang.iitk at gmail.com Tue Jan 25 03:03:58 2011 From: pawang.iitk at gmail.com (Pawan Goyal) Date: Tue, 25 Jan 2011 11:03:58 +0000 Subject: [SRILM User List] error message while running make World In-Reply-To: <4D3E26F8.6070307@speech.sri.com> References: <4D3DF83D.30307@speech.sri.com> <4D3E26F8.6070307@speech.sri.com> Message-ID: Thanks Andreas. I did everything from the start again and it was successful! Regards Pawan On Tue, Jan 25, 2011 at 1:27 AM, Andreas Stolcke wrote: > Pawan Goyal wrote: > >> Hi Andreas, >> >> Thanks for pointing out the incompatibly problem. I had the 32-bit >> binaries installed already, so tried the second option. I am not using tcl >> support, i.e. NO_TCL = X >> TCL_INCLUDE = TCL_LIBRARY = I am still getting the problems and >> sorry but not able to figure out the solution. Part of the error message >> during make World: >> > But now you are trying a 64bit compile! (look at your g++ options. > > That is fine, but you need to completely remove all old .o files because > you cannot mix 32bit and 64bit .o files and libraries. > > Andreas > >> >> .................................................................................................................. >> /usr/bin/g++ -march=athlon64 -m64 -Wall -Wno-unused-variable >> -Wno-uninitialized -DINSTANTIATE_TEMPLATES -D_FILE_OFFSET_BITS=64 -I. >> -I../../include -L../../lib/i686 -g -O3 -o ../bin/i686/maxalloc >> ../obj/i686/maxalloc.o ../obj/i686/libdstruct.a -lm -ldl >> ../../lib/i686/libmisc.a -lm 2>&1 | c++filt >> /usr/bin/ld: i386 architecture of input file `../obj/i686/maxalloc.o' is >> incompatible with i386:x86-64 output >> /usr/bin/ld: i386 architecture of input file >> `../../lib/i686/libmisc.a(option.o)' is incompatible with i386:x86-64 output >> collect2: ld returned 1 exit status >> /home/pawan/Documents/PhD/summarization/srilm/sbin/decipher-install 0555 >> ../bin/i686/maxalloc ../../bin/i686 >> ERROR: File to be installed (../bin/i686/maxalloc) does not exist. >> ERROR: File to be installed (../bin/i686/maxalloc) is not a plain file. >> Usage: decipher-install ... >> mode: file permission mode, in octal >> file1 ... 
fileN: files to be installed >> directory: where the files should be installed >> >> files = ../bin/i686/maxalloc >> directory = ../../bin/i686 >> mode = 0555 >> >> make[2]: [../../bin/i686/maxalloc] Error 1 (ignored) >> make[2]: Leaving directory >> `/home/pawan/Documents/PhD/summarization/srilm/dstruct/src' >> make[2]: Entering directory >> `/home/pawan/Documents/PhD/summarization/srilm/lm/src' >> /usr/bin/g++ -march=athlon64 -m64 -Wall -Wno-unused-variable >> -Wno-uninitialized -DINSTANTIATE_TEMPLATES -D_FILE_OFFSET_BITS=64 -I. >> -I../../include -u matherr -L../../lib/i686 -g -O3 -o ../bin/i686/ngram >> ../obj/i686/ngram.o ../obj/i686/liboolm.a -lm -ldl ../../lib/i686/libflm.a >> ../../lib/i686/libdstruct.a ../../lib/i686/libmisc.a -lm 2>&1 | c++filt >> collect2: ld terminated with signal 11 [Segmentation fault] >> /usr/bin/ld: i386 architecture of input file `../obj/i686/ngram.o' is >> incompatible with i386:x86-64 output >> /usr/bin/ld: i386 architecture of input file >> `../obj/i686/liboolm.a(matherr.o)' is incompatible with i386:x86-64 output >> >> ...................................................................................... >> /home/pawan/Documents/PhD/summarization/srilm/sbin/decipher-install 0555 >> ../bin/i686/ngram ../../bin/i686 >> ERROR: File to be installed (../bin/i686/ngram) does not exist. >> ERROR: File to be installed (../bin/i686/ngram) is not a plain file. >> Usage: decipher-install ... >> mode: file permission mode, in octal >> file1 ... fileN: files to be installed >> directory: where the files should be installed >> >> files = ../bin/i686/ngram >> directory = ../../bin/i686 >> mode = 0555 >> >> make[2]: [../../bin/i686/ngram] Error 1 (ignored) >> >> >> ............................................................................... >> >> Thanks Pawan >> >> On Mon, Jan 24, 2011 at 10:07 PM, Andreas Stolcke > stolcke at speech.sri.com>> wrote: >> >> Pawan Goyal wrote: >> >> Hi all, >> >> uname -a >> Linux pawan-laptop 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 >> 23:42:43 UTC 2011 x86_64 GNU/Linux >> >> gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) >> >> The probable reason is that you're trying to compile 32-bit >> binaries (the default for the i686 platform), but you Ubuntu >> system doesn't have the required libraries installed (only the >> 64bit ones are installed on most systems). >> >> Two solutions: 1) install the optional 32bit binaries using >> commands such as >> >> apt-get install ia32-libs >> >> (if you want to build with Tcl support you'd also need the 32bit >> version of libtcl -- don't know the name of the package). >> >> 2) Compile 64bit binaries. You can copy >> common/Makefile.machine.i686-ubuntu to >> common/Makefile.machine.i686, or edit the file by hand. >> >> Andreas >> >> >> >> error message while >> make World >> >> ......................... >> /usr/bin/ld: skipping incompatible >> /usr/lib/gcc/x86_64-linux-gnu/4.4.3/libstdc++.so when >> searching for -lstdc++ >> /usr/bin/ld: skipping incompatible >> /usr/lib/gcc/x86_64-linux-gnu/4.4.3/libstdc++.a when searching >> for -lstdc++ >> >> /usr/bin/ld: cannot find -lstdc++ >> collect2: ld returned 1 exit status >> /home/pawan/Documents/PhD/summarization/srilm/sbin/decipher-install >> 0555 ../bin/i686/maxalloc ../../bin/i686 >> ERROR: File to be installed (../bin/i686/maxalloc) does not >> exist. >> ERROR: File to be installed (../bin/i686/maxalloc) is not a >> plain file. >> Usage: decipher-install ... >> mode: file permission mode, in octal >> file1 ... 
fileN: files to be installed >> directory: where the files should be installed >> >> >> .................................................................................. >> >> Thanks in advance >> Pawan >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> SRILM-User site list >> SRILM-User at speech.sri.com >> >> http://www.speech.sri.com/mailman/listinfo/srilm-user >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mehdi_hoseini at comp.iust.ac.ir Wed Jan 26 05:47:51 2011 From: mehdi_hoseini at comp.iust.ac.ir (Mehdi hoseini) Date: Wed, 26 Jan 2011 17:17:51 +0330 Subject: [SRILM User List] Problem in using Language model Message-ID: hi all I made a simple trigram in ARPA format with SRILM and I made a ASR with HTK. but I have problems with use this trigram model in HTK. Does anybody in here use language models in HTK? if so I have some questions to ask. best regards Mehdi Hoseini -------------- next part -------------- An HTML attachment was scrubbed... URL: From zeeshankhans at gmail.com Thu Jan 27 13:59:59 2011 From: zeeshankhans at gmail.com (zeeshan khan) Date: Thu, 27 Jan 2011 22:59:59 +0100 Subject: [SRILM User List] dynamic Loglinear mix for lattice rescoring Message-ID: Hi all, Is there a way to rescore htk lattices using dynamic log-linear interpolation of more than one language models, using SRILM. Ideally, the command should look like lattice-tool -read-htk -in-lattice $SRC_LATTICE -lm $LM_FILE -order $LM_ORDER -bayes 0 -lambda $LAMBDA -mix-lm $LM2_FILE -loglinear-mix -write-htk -out-lattice $TMP_TRG_LATTIC -unk -map-unk $UNK_WORD -keep-unk but there is no loglinear-mix option in lattice-tool, like in ngram. Thanks in advance, Zeeshan. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at speech.sri.com Fri Jan 28 21:49:48 2011 From: stolcke at speech.sri.com (Andreas Stolcke) Date: Sat, 29 Jan 2011 00:49:48 -0500 Subject: [SRILM User List] lattice-tool related issues In-Reply-To: References: <201008032329.o73NTtj15149@huge> Message-ID: <4D43AA7C.6010106@speech.sri.com> Sorry for not responding earlier to this. The latest version has a new lattice-tool option : -zeroprob-word . It allows you to avoid assigning zero probabilities to OOV words without mapping them to . Andreas Anoop Deoras wrote: > > On Aug 3, 2010, at 7:29 PM, Andreas Stolcke wrote: > >> >> In message <5D1CA95A-F9E4-417E-B276-DE8056B3F254 at jhu.edu>you wrote: >>> Hello, >>> >>> I am trying to rescore htk lattices using lattice-tool and am >>> running into following issues: >>> >>> 1. I pass a 3gm language model and a vocabulary file to rescore the >>> lattice (encoding bigram information) and >>> then write back the updated and expanded lattice back in the htk >>> format. >>> >>> However, when I specify -unk and -keep-unk flags, the OOV words gets >>> mapped to unk without preserving the >>> original label. I was under the impression that -keep-unk would >>> preserve the label of the OOV word, but it does not do so. >> >> I just looked at the code, and it seems that -keep-unk is only >> implemented >> when reading HTK format lattices, not for PFSGs. >> Is that what you are using? >> >> If you are using HTK lattices then please prepare some small input data >> files that demonstrate the problem, and I can look into it when I get >> a chance. >> > > Hi Andreas, > > I am, infact, using HTK lattices. 
I was doing some debugging myself > and noticed > that when the rescoring LM is of the same order as that of the lattice > (i.e. if the > lattice expansion is not required), then -keep-unk works fine. When I > use a higher > order LM, it fails. I have uploaded the data at: > > > > Please run RescoreLattice.sh to process the HTK lattice file. I have > kept the > necessary vocabulary and trigram and bigram LM files too (Note: input > lattices > encodes bigram history and hence a trigram rescoring LM expands the > lattice) > > The word 'slash' is out of vocabulary. A bigram rescoring keeps it intact > while trigram rescoring maps it to > > >>> >>> 2. Before I rescore the lattice, I want to split some words (multiword >>> units). The multiwords are connected by an >>> underscore character. I hence use the flags, -split-multiwords -multi- >>> char _ >>> >>> All goes well, as long as I do not use -unk -keep-unk flag in >>> conjunction with -split-multiwords . If I use -unk -keep-unk flag >>> (for point 1 above) and also use -split-multiwords flags, then the >>> multiword functionality does not work moreover the OOV >>> words get mapped to . >>> >>> I should point out that the multi-word unit is NOT in my vocabulary >>> but after the split, all the individual words are found >>> in the vocabulary. Hence, I am suspecting that the functionality for >>> the flag -unk takes place before the splitting >>> and since no multiword unit is in the vocabulary, the -split- >>> multiwords functionality does not have >>> anything to split. >>> >>> I was wondering if there is anyway we can invoke split-multiword >>> functionality before mapping >>> unk words ? >> >> The way it works is that upon reading the lattice (before any operation >> on them), word labels are converted to integers. Normally a new word >> generates a new integer autoamtically, but with -unk and -keep-unk >> unknown words are mapped to the integer code. >> >> So therefore, the splitting won't work if the multiwords themselves >> are not in the vocabulary. >> >> A workaround is to do the multiword splitting in a separate processing >> pass, where lattice-tool is invoked WITHOUT -unk. >> >> Andreas > > Yes, that makes sense. Thank you. > > -Anoop > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user From stolcke at speech.sri.com Fri Jan 28 22:38:24 2011 From: stolcke at speech.sri.com (Andreas Stolcke) Date: Fri, 28 Jan 2011 22:38:24 -0800 Subject: [SRILM User List] dynamic Loglinear mix for lattice rescoring In-Reply-To: Your message of Thu, 27 Jan 2011 22:59:59 +0100. Message-ID: <201101290638.p0T6cOM06163@huge> > > Hi all, > > Is there a way to rescore htk lattices using dynamic log-linear > interpolation of more than one language models, using SRILM. > > Ideally, the command should look like > > lattice-tool -read-htk -in-lattice $SRC_LATTICE -lm $LM_FILE -order > $LM_ORDER -bayes 0 -lambda $LAMBDA -mix-lm $LM2_FILE -loglinear-mix > -write-htk -out-lattice $TMP_TRG_LATTIC -unk -map-unk $UNK_WORD > -keep-unk > > but there is no loglinear-mix option in lattice-tool, like in ngram. > > Thanks in advance, The patch below will add the -loglinear-mix option to lattice-tool. 
Andreas Index: lattice/src/lattice-tool.cc =================================================================== RCS file: /home/srilm/CVS/srilm/lattice/src/lattice-tool.cc,v retrieving revision 1.154 retrieving revision 1.155 diff -c -r1.154 -r1.155 *** lattice/src/lattice-tool.cc 14 Jan 2011 01:07:54 -0000 1.154 --- lattice/src/lattice-tool.cc 29 Jan 2011 05:56:35 -0000 1.155 *************** *** 5,11 **** #ifndef lint static char Copyright[] = "Copyright (c) 1997-2011 SRI International. All Rights Reserved."; ! static char RcsId[] = "@(#)$Id: lattice-tool.cc,v 1.154 2011/01/14 01:07:54 stolcke Exp $"; #endif #ifdef PRE_ISO_CXX --- 5,11 ---- #ifndef lint static char Copyright[] = "Copyright (c) 1997-2011 SRI International. All Rights Reserved."; ! static char RcsId[] = "@(#)$Id: lattice-tool.cc,v 1.155 2011/01/29 05:56:35 stolcke Exp $"; #endif #ifdef PRE_ISO_CXX *************** *** 43,48 **** --- 43,49 ---- #include "SimpleClassNgram.h" #include "ProductNgram.h" #include "BayesMix.h" + #include "LoglinearMix.h" #include "RefList.h" #include "LatticeLM.h" #include "WordMesh.h" *************** *** 138,143 **** --- 139,145 ---- static double mixLambda7 = 0.0; static double mixLambda8 = 0.0; static double mixLambda9 = 0.0; + static int loglinearMix = 0; static char *inLattice = 0; static char *inLattice2 = 0; static char *inLatticeList = 0; *************** *** 231,236 **** --- 233,239 ---- { OPT_FLOAT, "mix-lambda8", &mixLambda8, "mixture weight for -mix-lm8" }, { OPT_STRING, "mix-lm9", &mixFile9, "ninth LM to mix in" }, { OPT_FLOAT, "mix-lambda9", &mixLambda9, "mixture weight for -mix-lm9" }, + { OPT_TRUE, "loglinear-mix", &loglinearMix, "use log-linear mixture LM" }, { OPT_INT, "order", &order, "ngram order used for expansion or bigram weight substitution" }, { OPT_TRUE, "no-expansion", &noExpansion, "do not apply expansion with LM" }, { OPT_STRING, "ref-list", &refList, "reference file used for computing WER (lines starting with utterance id)" }, *************** *** 1090,1095 **** --- 1093,1144 ---- } } + LM * + makeLoglinearMixLM(Array filenames, Vocab &vocab, + SubVocab *classVocab, unsigned order, + LM *oldLM, Array lambdas) + { + Array allLMs; + allLMs[0] = oldLM; + + for (unsigned i = 1; i < filenames.size(); i++) { + const char *filename = filenames[i]; + File file(filename, "r"); + + /* + * create factored LM if -factored was specified, + * class-ngram if -classes were specified, + * and otherwise a regular ngram + */ + Ngram *lm = factored ? + new ProductNgram((ProductVocab &)vocab, order) : + (classVocab != 0) ? + (simpleClasses ? + new SimpleClassNgram(vocab, *classVocab, order) : + new ClassNgram(vocab, *classVocab, order)) : + new Ngram(vocab, order); + assert(lm != 0); + + if (!lm->read(file, limitVocab)) { + cerr << "format error in mix-lm file " << filename << endl; + exit(1); + } + + /* + * Each class LM needs to read the class definitions + */ + if (classesFile != 0) { + File file(classesFile, "r"); + ((ClassNgram *)lm)->readClasses(file); + } + allLMs[i] = lm; + } + + LM *newLM = new LoglinearMix(vocab, allLMs, lambdas); + assert(newLM != 0); + + return newLM; + } int main (int argc, char *argv[]) { *************** *** 1310,1316 **** useLM = ngram; } ! if (mixFile) { /* * create a Bayes mixture LM */ --- 1359,1365 ---- useLM = ngram; } ! 
if (mixFile && !loglinearMix) { /* * create a Bayes mixture LM */ *************** *** 1370,1375 **** --- 1419,1476 ---- useLM = makeMixLM(mixFile9, *vocab, classVocab, order, useLM, mixLambda9, 1.0); } + } else if (mixFile && loglinearMix) { + /* + * Create log-linear mixture LM + */ + double mixLambda1 = 1.0 - mixLambda - mixLambda2 - mixLambda3 + - mixLambda4 - mixLambda5 - mixLambda6 - mixLambda7 + - mixLambda8 - mixLambda9; + + Array filenames; + Array lambdas; + + /* Add redundant filename entry for base LM to make filenames array + * symmetric with lambdas */ + filenames[0] = ""; + filenames[1] = mixFile; + lambdas[0] = mixLambda; + lambdas[1] = mixLambda1; + + if (mixFile2) { + filenames[2] = mixFile2; + lambdas[2] = mixLambda2; + } + if (mixFile3) { + filenames[3] = mixFile3; + lambdas[3] = mixLambda3; + } + if (mixFile4) { + filenames[4] = mixFile4; + lambdas[4] = mixLambda4; + } + if (mixFile5) { + filenames[5] = mixFile5; + lambdas[5] = mixLambda5; + } + if (mixFile6) { + filenames[6] = mixFile6; + lambdas[6] = mixLambda6; + } + if (mixFile7) { + filenames[7] = mixFile7; + lambdas[7] = mixLambda7; + } + if (mixFile8) { + filenames[8] = mixFile8; + lambdas[8] = mixLambda8; + } + if (mixFile9) { + filenames[9] = mixFile9; + lambdas[9] = mixLambda9; + } + useLM = makeLoglinearMixLM(filenames, *vocab, classVocab, order, + useLM, lambdas); } /* From mshamsuddeen2 at gmail.com Mon Jan 31 17:57:59 2011 From: mshamsuddeen2 at gmail.com (Muhammad Shamsuddeen Muhammad) Date: Tue, 1 Feb 2011 09:57:59 +0800 Subject: [SRILM User List] SRILM Missing FIles Message-ID: I compiled srilm but faced an error while trying to build a language model using a tutorial. The error was that i had the file "ngram-count" missing, while running this command >> $ $SRILM?HOME/bin/i686/ngram?count ?order 3 ?interpolate ?kndiscount ?unk ? text lm/corpus.lowercased.en ?lm lm/corpus.lm I tried compiling from scratch all over again and it still is missing. According to another tutorial thou, if the following files >> liboolm.a libdstruct.a libflm.a liblattice.a libmisc.a are created then the installation was successful, and they are all present in my installation. So what may be the issue here, and since "ngram-count" is missing from the $SRILM?HOME/bin/i686/ directory there could be other missing files. Can someone possibly send me a list of all the files present in that directory of a working installation so that i could compare also. Best Regards -- Muhammad Shamsuddeen Muhammad "There is no knowledge that is not power". -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at speech.sri.com Tue Feb 1 11:48:36 2011 From: stolcke at speech.sri.com (Andreas Stolcke) Date: Tue, 01 Feb 2011 11:48:36 -0800 Subject: [SRILM User List] SRILM Missing FIles In-Reply-To: References: Message-ID: <4D486394.5000102@speech.sri.com> Muhammad Shamsuddeen Muhammad wrote: > I compiled srilm but faced an error while trying to build a language > model using a tutorial. The error was that i had the file > "ngram-count" missing, while running this command >> > > $ $SRILM?HOME/bin/i686/ngram?count ?order 3 ?interpolate ?kndiscount > ?unk ? > text lm/corpus.lowercased.en ?lm lm/corpus.lm > > I tried compiling from scratch all over again and it still is missing. > According to another tutorial thou, if the following files >> > > liboolm.a > libdstruct.a > libflm.a > liblattice.a > libmisc.a > > are created then the installation was successful, and they are all > present in my installation. 
> So what may be the issue here, and since "ngram-count" is missing from > the $SRILM?HOME/bin/i686/ directory there could be other missing > files. Can someone possibly send me a list of all the files present in > that directory of a working installation so that i could compare also. If there are no binaries generated in $SRILM/bin/$MACHINE_TYPE (with ngram-count being one of them) then you need to follow the checklist under frequently asked question A1) at http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html Andreas > > Best Regards > > -- > Muhammad Shamsuddeen Muhammad > > "There is no knowledge that is not power". > > ------------------------------------------------------------------------ > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user From mshamsuddeen2 at gmail.com Tue Feb 1 20:04:36 2011 From: mshamsuddeen2 at gmail.com (Muhammad Shamsuddeen Muhammad) Date: Wed, 2 Feb 2011 12:04:36 +0800 Subject: [SRILM User List] make release not working Message-ID: Upon trying to 'compile moses support scripts' after editing the Makefile to the relevant directories, when i enter the " $ make release" command, i get a response saying 'make: release is up-to date' and the time stamped folder is not created when i look into the directory. Here is the output of the command... integ at integ-desktop:~/mosesdecoder/trunk/scripts$ make release make: `release' is up to date. Any suggestions as to what im doing wrong? Regards -- Muhammad Shamsuddeen Muhammad "There is No Knowledge That is Not Power". -------------- next part -------------- An HTML attachment was scrubbed... URL: From marco.turchi at gmail.com Sun Feb 6 15:32:20 2011 From: marco.turchi at gmail.com (marco turchi) Date: Mon, 7 Feb 2011 00:32:20 +0100 Subject: [SRILM User List] Query srilm from Java Message-ID: Dear All, I need to query srilm from java, do you know any free available wrappers? Best Regards Marco -------------- next part -------------- An HTML attachment was scrubbed... URL: From zeeshankhans at gmail.com Mon Feb 7 15:24:39 2011 From: zeeshankhans at gmail.com (zeeshan khan) Date: Tue, 8 Feb 2011 00:24:39 +0100 Subject: [SRILM User List] effect of ngram -vocab and -limit-vocab on ppl calculations Message-ID: Hi all, I wanted to share my observation regarding the SRILM toolkit's calculation of perplexities and the effect of -vocab and -limit-vocab on it, and wanted to know why this happens. SRILM toolkit's ngram tool gives 3 different perplexities of the SAME text if these options are used as follows. P1: ngram -unk -map-unk '[UNKNOWN]' -order 4 -lm -ppl : gives the highest perplexity value P2: ngram -unk -map-unk '[UNKNOWN]' -vocab -order 4 -lm -ppl : gives perplexity value lesser than P1 and greater than P3. P3: ngram -unk -map-unk '[UNKNOWN]' -vocab -limit-vocab -order 4 -lm -ppl : gives perplexity value smaller than both P1 and P2. Can anyone tell me why this happens ? I thought the effect of -vocab and -limit-vocab options is only on memory usage. Just for information, the VOCAB files are generated from lattice files generated during a recognition process. Thanks and Regards, Zeeshan. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stolcke at speech.sri.com Mon Feb 7 23:26:02 2011 From: stolcke at speech.sri.com (Andreas Stolcke) Date: Mon, 07 Feb 2011 23:26:02 -0800 Subject: [SRILM User List] effect of ngram -vocab and -limit-vocab on ppl calculations In-Reply-To: References: Message-ID: <4D50F00A.7030807@speech.sri.com> zeeshan khan wrote: > Hi all, > > I wanted to share my observation regarding the SRILM toolkit's > calculation of perplexities and the effect of -vocab and -limit-vocab > on it, and wanted to know why this happens. > > > SRILM toolkit's ngram tool gives 3 different perplexities of the SAME > text if these options are used as follows. > > P1: ngram -unk -map-unk '[UNKNOWN]' -order 4 -lm -ppl > : gives the highest perplexity value > > P2: ngram -unk -map-unk '[UNKNOWN]' -vocab -order 4 -lm > -ppl : gives perplexity value lesser than P1 and > greater than P3. That's probably because your contains more words than the LM itself. That means fewer words are mapped to '[UNKNOWN]' and this changes which probabilities are looked up in the LM. If however your contains a subset of the vocabulary in the LM itself then there should be no change in perplexity. > > P3: ngram -unk -map-unk '[UNKNOWN]' -vocab -limit-vocab > -order 4 -lm -ppl : gives perplexity value > smaller than both P1 and P2. This has the effect that only ngrams covered by the words in are read from the LM. Presumably more words are now mapped to [UNKNOWN], but it's hard to predict what happens to perplexity because you don't say what the relationship between the vocabulary and the data in is. The purpose of -limit-vocab is to load all and only the portions of the LM that are needed by the input data. Therefore, to make meaningful use of this option you need to generate the vocabulary from the in this case. > > Can anyone tell me why this happens ? I thought the effect of -vocab > and -limit-vocab options is only on memory usage. A good way to track down the differences is to use -debug 2, capture the output in files, and use diff to see where they differ. Andreas > > > Just for information, the VOCAB files are generated from lattice files > generated during a recognition process. > > > Thanks and Regards, > > > Zeeshan. > ------------------------------------------------------------------------ > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user From mehdi_hoseini at comp.iust.ac.ir Tue Feb 8 04:03:20 2011 From: mehdi_hoseini at comp.iust.ac.ir (Mehdi hoseini) Date: Tue, 08 Feb 2011 15:33:20 +0330 Subject: [SRILM User List] Variable N-grams Message-ID: hi all, I read a paper titled "Variable N-grams and Extensions for Conversational Speech Language modeling". I wonder if there is any option in SRILM that helps me make a Variable N-grams language model? thanks. M. Hoseini -------------- next part -------------- An HTML attachment was scrubbed... URL: From fabian_in_hongkong at hotmail.com Wed Feb 9 03:05:11 2011 From: fabian_in_hongkong at hotmail.com (Fabian -) Date: Wed, 9 Feb 2011 12:05:11 +0100 Subject: [SRILM User List] Expand class-based LM PPL Message-ID: Hi, I have a language model interpolated from a class LM and a word LM. If I compute the PPL on my dev set with ngram -classes .. it gives a reasonable PPL; if I expand the interpolated LM and compute the PPL (without the -classes parameter), I get a very high PPL. Can anyone tell me why? Best, Fabian -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stolcke at speech.sri.com Wed Feb 9 09:13:27 2011 From: stolcke at speech.sri.com (Andreas Stolcke) Date: Wed, 09 Feb 2011 09:13:27 -0800 Subject: [SRILM User List] Expand class-based LM PPL In-Reply-To: References: Message-ID: <4D52CB37.2010401@speech.sri.com> Fabian - wrote: > Hi, > > I have a language model interpolated from a class LM and a word LM. If > I compute the PPL on my dev set with ngram -classes .. it gives a > reasonable PPL, if I expand the interpolated LM and compute the PPL > (without the -classes parameter) I get a very high PPL. Can anyone > tell me why? Have you tried expanding the class LM before interpolation, and verifying that it has a reasonable PPL ? Andreas > > Best, > Fabian > ------------------------------------------------------------------------ > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user From lfu20 at hotmail.com Sun Feb 13 13:05:51 2011 From: lfu20 at hotmail.com (Luis Uebel) Date: Sun, 13 Feb 2011 21:05:51 +0000 Subject: [SRILM User List] Compacting language models In-Reply-To: <4D52CB37.2010401@speech.sri.com> References: , <4D52CB37.2010401@speech.sri.com> Message-ID: I am using SRI to produce some reverse language models and are quite big. Stats: training data: 1.1G words 88M sentences but system was limited to 39k words (wordlist.txt) by: ngram-count -memuse -order 3 -interpolate -kndiscount -unk -vocab ../lang-data/wordlist.txt -limit-vocab -text ../lang-data/${training}-${reverse}.xml -lm ${training}-reverse-lm${trigram} Is there other options to reduce LM size since trigrams are 1.7G? (without so much lost in performance)? Thanks, Luis -------------- next part -------------- An HTML attachment was scrubbed... URL: From ammansik at cis.hut.fi Mon Feb 14 03:36:53 2011 From: ammansik at cis.hut.fi (Andre Mansikkaniemi) Date: Mon, 14 Feb 2011 13:36:53 +0200 Subject: [SRILM User List] Python wrapper Message-ID: <4D5913D5.4010404@cis.hut.fi> Hi! I'm trying to compile a Python wrapper for SRILM as described in [1]. I run into problems when compiling the Python shared library module: g++ -shared srilm.o srilm_wrap.o -loolm -ldstruct -lmisc -L/home/ammansik/srilm-1.5.12/lib/i686-m64 -o _srilm.so I get the following error messages: /usr/bin/ld: /home/andre/srilm-1.5.12/lib/i686-m64/liboolm.a(Prob.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC /home/andre/srilm-1.5.12/lib/i686-m64/liboolm.a: could not read symbols: Bad value collect2: ld returned 1 exit status The system I'm using is x86_64 GNU/Linux. Any ideas how to solve this? Andr? [1] Nitin Madnani. Source Code: Querying and Serving N -gram Language Models with Python From stolcke at icsi.berkeley.edu Tue Feb 15 15:35:50 2011 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 15 Feb 2011 15:35:50 -0800 Subject: [SRILM User List] Python wrapper In-Reply-To: <4D5913D5.4010404@cis.hut.fi> References: <4D5913D5.4010404@cis.hut.fi> Message-ID: <4D5B0DD6.60005@icsi.berkeley.edu> Andre Mansikkaniemi wrote: > Hi! > I'm trying to compile a Python wrapper for SRILM as described in [1]. 
I > run into problems when compiling the Python shared library module: > g++ -shared srilm.o srilm_wrap.o -loolm -ldstruct -lmisc > -L/home/ammansik/srilm-1.5.12/lib/i686-m64 -o _srilm.so > I get the following error messages: > /usr/bin/ld: /home/andre/srilm-1.5.12/lib/i686-m64/liboolm.a(Prob.o): > relocation R_X86_64_32 against `a local symbol' can not be used when > making a shared object; recompile with -fPIC > /home/andre/srilm-1.5.12/lib/i686-m64/liboolm.a: could not read symbols: > Bad value > collect2: ld returned 1 exit status > The system I'm using is x86_64 GNU/Linux. > Any ideas how to solve this? > PIC compilation should be the default in recent releases. Run "make cleanest" , then rebuild and make sure -fPIC appears inn all compiler commands. Make sure the PIC_FLAG variable is not modified in any in common/Makefile.machine.i686-m64 or Makefile.site.i686-m64 . Andreas > Andr? > > [1] Nitin Madnani. Source Code: Querying and Serving N -gram Language > Models with Python > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user > > From ammansik at cis.hut.fi Wed Feb 16 00:13:15 2011 From: ammansik at cis.hut.fi (Andre Mansikkaniemi) Date: Wed, 16 Feb 2011 10:13:15 +0200 Subject: [SRILM User List] Python wrapper In-Reply-To: <4D5B0DD6.60005@icsi.berkeley.edu> References: <4D5913D5.4010404@cis.hut.fi> <4D5B0DD6.60005@icsi.berkeley.edu> Message-ID: <4D5B871B.4020002@cis.hut.fi> Andreas Stolcke wrote: > Andre Mansikkaniemi wrote: >> Hi! >> I'm trying to compile a Python wrapper for SRILM as described in [1]. I >> run into problems when compiling the Python shared library module: >> g++ -shared srilm.o srilm_wrap.o -loolm -ldstruct -lmisc >> -L/home/ammansik/srilm-1.5.12/lib/i686-m64 -o _srilm.so >> I get the following error messages: >> /usr/bin/ld: /home/andre/srilm-1.5.12/lib/i686-m64/liboolm.a(Prob.o): >> relocation R_X86_64_32 against `a local symbol' can not be used when >> making a shared object; recompile with -fPIC >> /home/andre/srilm-1.5.12/lib/i686-m64/liboolm.a: could not read symbols: >> Bad value >> collect2: ld returned 1 exit status >> The system I'm using is x86_64 GNU/Linux. >> Any ideas how to solve this? >> > PIC compilation should be the default in recent releases. > Run "make cleanest" , then rebuild and make sure -fPIC appears inn all > compiler commands. Make sure the PIC_FLAG variable is not modified in > any in common/Makefile.machine.i686-m64 or Makefile.site.i686-m64 . > > Andreas Hi, I got it working now. Many thanks! Andr? > >> Andr? >> >> [1] Nitin Madnani. Source Code: Querying and Serving N -gram Language >> Models with Python >> >> _______________________________________________ >> SRILM-User site list >> SRILM-User at speech.sri.com >> http://www.speech.sri.com/mailman/listinfo/srilm-user >> >> > From mehdi_hoseini at comp.iust.ac.ir Wed Feb 16 03:04:52 2011 From: mehdi_hoseini at comp.iust.ac.ir (Mehdi hoseini) Date: Wed, 16 Feb 2011 14:34:52 +0330 Subject: [SRILM User List] I have problem in building new version Message-ID: hi I build new version files successfully except ngram and lattice-tool. I don't know how to deal with. if I use ngram.exe and lattice-tool.exe from previous version (1.5.11), do I lose so many things? 
From mehdi_hoseini at comp.iust.ac.ir  Wed Feb 16 03:04:52 2011
From: mehdi_hoseini at comp.iust.ac.ir (Mehdi hoseini)
Date: Wed, 16 Feb 2011 14:34:52 +0330
Subject: [SRILM User List] Problem building the new version
Message-ID: 

Hi,

I built the new version successfully except for ngram and lattice-tool, and I don't know how to deal with this. If I use ngram.exe and lattice-tool.exe from the previous version (1.5.11), do I lose very much?

Thanks


From stolcke at icsi.berkeley.edu  Wed Feb 23 06:28:56 2011
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Wed, 23 Feb 2011 06:28:56 -0800
Subject: [SRILM User List] Compacting language models
In-Reply-To: 
References: <4D52CB37.2010401@speech.sri.com>
Message-ID: <4D6519A8.4090901@icsi.berkeley.edu>

Luis Uebel wrote:
> I am using SRILM to produce some reverse language models, and they are quite big.
>
> Stats: training data: 1.1G words, 88M sentences
>
> The system was limited to 39k words (wordlist.txt) by:
> ngram-count -memuse -order 3 -interpolate -kndiscount -unk -vocab ../lang-data/wordlist.txt -limit-vocab -text ../lang-data/${training}-${reverse}.xml -lm ${training}-reverse-lm${trigram}
>
> Are there other options to reduce the LM size, since the trigram model is 1.7G (without losing too much performance)?

Luis,

if the issue is that training takes too much memory, please see the FAQ on memory issues. If you already have a (large) LM and want to reduce its size for test purposes, use the ngram -prune option. You want to read the following papers to understand how LM pruning works:

A. Stolcke, "Entropy-based Pruning of Backoff Language Models," Proc. DARPA Broadcast News Transcription and Understanding Workshop, pp. 270-274, Lansdowne, VA, 1998.

C. Chelba, T. Brants, W. Neveitt, and P. Xu, "Study on Interaction Between Entropy Pruning and Kneser-Ney Smoothing," Proc. Interspeech, pp. 2422-2425, Makuhari, Japan, 2010.

Andreas

> Thanks,
>
> Luis
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user


From mehdi_hoseini at comp.iust.ac.ir  Mon Mar 7 12:03:23 2011
From: mehdi_hoseini at comp.iust.ac.ir (Mehdi hoseini)
Date: Mon, 07 Mar 2011 23:33:23 +0330
Subject: [SRILM User List] Error Compiling SRILM
Message-ID: 

Hi all,

I use cygwin to compile SRILM, but I get the errors shown in the attached picture.

Regards,
Mehdi
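A minimal sketch of the pruning step suggested in the "Compacting language models" reply above. The threshold 1e-8 and the file names are illustrative placeholders only; the threshold has to be tuned on held-out data, since more aggressive pruning trades model size against perplexity:

    # Prune a large trigram LM and write out a smaller one
    ngram -order 3 -lm big-reverse.lm -prune 1e-8 -write-lm pruned-reverse.lm

    # Check the effect on a held-out set
    ngram -order 3 -lm pruned-reverse.lm -ppl heldout.txt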
From stolcke at icsi.berkeley.edu  Tue Mar 22 10:37:46 2011
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Tue, 22 Mar 2011 10:37:46 -0700
Subject: [SRILM User List] threshold on maximal counts for LM estimation
In-Reply-To: 
References: 
Message-ID: <4D88DE6A.1030402@icsi.berkeley.edu>

zeeshan khan wrote:
> Thanks a lot, Andreas, for your answers!
> I have another question.
>
> Using the ngram-count tool, is there a way to generate a count file which contains only counts lower than a certain limit? For example, if I want to generate a count file which contains only those N-grams which occurred less than 50 times in a corpus, how can I do it with ngram-count? Maybe it is very simple to do, but I couldn't find it. Currently, I do it manually, but it is cumbersome and time-consuming.
>
> There is a way to set the maximal count of N-grams of an order n that are discounted under Good-Turing, but I couldn't find a way to set a maximal count limit on which N-grams are considered at all.

There is no way to do it using existing functions in ngram-count. Even if there were a way to do it with a built-in function, you would not really gain any efficiency, because to know whether something occurs more than N times you need to keep track of all counts to begin with. So you're not going to be able to do much better than

ngram-count -text .... -write - | gawk '$NF < 50' | gzip > counts-less-than-50.gz

Andreas

> Best Regards,
> Zeeshan.
>
> On Tue, Feb 8, 2011 at 8:26 AM, Andreas Stolcke wrote:
>
>     zeeshan khan wrote:
>
>         Hi all,
>         I wanted to share my observation regarding the SRILM toolkit's calculation of perplexities and the effect of -vocab and -limit-vocab on it, and wanted to know why this happens.
>
>         SRILM toolkit's ngram tool gives 3 different perplexities of the SAME text if these options are used as follows.
>
>         P1: ngram -unk -map-unk '[UNKNOWN]' -order 4 -lm LM -ppl TESTFILE : gives the highest perplexity value
>
>         P2: ngram -unk -map-unk '[UNKNOWN]' -vocab VOCAB -order 4 -lm LM -ppl TESTFILE : gives a perplexity value less than P1 and greater than P3.
>
>     That's probably because your VOCAB contains more words than the LM itself. That means fewer words are mapped to '[UNKNOWN]', and this changes which probabilities are looked up in the LM. If, however, your VOCAB contains a subset of the vocabulary in the LM itself, then there should be no change in perplexity.
>
>         P3: ngram -unk -map-unk '[UNKNOWN]' -vocab VOCAB -limit-vocab -order 4 -lm LM -ppl TESTFILE : gives a perplexity value smaller than both P1 and P2.
>
>     This has the effect that only ngrams covered by the words in VOCAB are read from the LM. Presumably more words are now mapped to [UNKNOWN], but it's hard to predict what happens to perplexity because you don't say what the relationship between the vocabulary and the data in TESTFILE is. The purpose of -limit-vocab is to load all and only the portions of the LM that are needed by the input data. Therefore, to make meaningful use of this option you need to generate the vocabulary from the test data in this case.
>
>         Can anyone tell me why this happens? I thought the effect of the -vocab and -limit-vocab options is only on memory usage.
>
>     A good way to track down the differences is to use -debug 2, capture the output in files, and use diff to see where they differ.
>
>     Andreas
>
>         Just for information, the VOCAB files are generated from lattice files generated during a recognition process.
>
>         Thanks and Regards,
>
>         Zeeshan.
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user
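A concrete version of the -debug 2 comparison suggested in the quoted exchange above. LM, VOCAB, and TESTFILE are placeholders for the actual language model, vocabulary, and test files, not names from the original messages:

    # Per-word probabilities without and with -vocab/-limit-vocab
    ngram -unk -map-unk '[UNKNOWN]' -order 4 -lm LM -ppl TESTFILE -debug 2 > ppl.P1
    ngram -unk -map-unk '[UNKNOWN]' -vocab VOCAB -limit-vocab -order 4 -lm LM -ppl TESTFILE -debug 2 > ppl.P3

    # See exactly which word probabilities differ between the two setups
    diff ppl.P1 ppl.P3 | less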
From andersson at disi.unitn.it  Thu Mar 31 06:01:14 2011
From: andersson at disi.unitn.it (Simon Andersson)
Date: Thu, 31 Mar 2011 15:01:14 +0200 (CEST)
Subject: [SRILM User List] fngram-count test doesn't terminate
Message-ID: <53728.127.0.0.1.1301576474.squirrel@mail.disi.unitn.it>

Hello,

I just installed SRILM 1.5.12 and ran

make test

The test results were O.K. for all modules except flm (I removed flm from the test and reran it). The fngram-count test doesn't terminate (I terminated it after an hour). Has somebody experienced this?

Thanks,

- Simon Andersson
University of Trento, Italy


From stolcke at icsi.berkeley.edu  Thu Mar 31 18:04:35 2011
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Thu, 31 Mar 2011 18:04:35 -0700
Subject: [SRILM User List] fngram-count test doesn't terminate
In-Reply-To: <53728.127.0.0.1.1301576474.squirrel@mail.disi.unitn.it>
References: <53728.127.0.0.1.1301576474.squirrel@mail.disi.unitn.it>
Message-ID: <4D9524A3.6090401@icsi.berkeley.edu>

Simon Andersson wrote:
> Hello,
>
> I just installed SRILM 1.5.12 and ran
>
> make test
>
> The test results were O.K. for all modules except flm (I removed flm from the test and reran it). The fngram-count test doesn't terminate (I terminated it after an hour). Has somebody experienced this?

Two ideas:

- try a different compiler (e.g., a different version of gcc, or update the version installed by default on your system)
- disable optimization

I have heard about issues specifically with fngram-count.

Andreas

> Thanks,
>
> - Simon Andersson
> University of Trento, Italy
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user
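A sketch of what the two suggestions amount to in practice. It assumes, as on typical installs, that the compiler (CC/CXX) and the -O optimization level are set in the machine-specific makefile; the i686-m64 file name below follows the earlier Python-wrapper thread and may differ on other platforms:

    # Point CC/CXX at a different gcc/g++ and/or lower the -O level in
    # common/Makefile.machine.i686-m64 (or the file for your machine type), then:
    make cleanest
    make World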