From lfu20 at hotmail.com Sat Jul 2 00:07:18 2011
From: lfu20 at hotmail.com (Luis Uebel)
Date: Sat, 2 Jul 2011 07:07:18 +0000
Subject: [SRILM User List] Language Model Adaptation
In-Reply-To: <4E08F15A.3010503@icsi.berkeley.edu>
References: , <4E08F15A.3010503@icsi.berkeley.edu>
Message-ID:

How can I produce a LM adapted for a particular speaker?
What are the SRILM tools and parameters that I need to use?

I produce a LM restricted for words that appear in test set.
Model was quite big and now I would like to adapt for a particular speaker
using a set of sentence (text) that he pronunciated.

Thanks,

Luis
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stolcke at icsi.berkeley.edu Sun Jul 3 13:42:04 2011
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Sun, 03 Jul 2011 13:42:04 -0700
Subject: [SRILM User List] Language Model Adaptation
In-Reply-To:
References: , <4E08F15A.3010503@icsi.berkeley.edu>
Message-ID: <4E10D41C.9050100@icsi.berkeley.edu>

Luis Uebel wrote:
> How can I produce a LM adapted for a particular speaker?
You need (a lot) of speaker-specific data. Then you adapt as you would
to other LM conditions, like specific genres, topics, etc.

> What are the SRILM tools and parameters that I need to use?
Your question is too general. Start by reviewing the literature on LM
adaptation, or the overview paper by Bellegarda
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4893&rep=rep1&type=pdf
and figure out what methods look promising.
SRILM implements several, but not all of the techniques described.
The options in the ngram(1) tool that are relevant include

    -mix-lm
    -bayes
    -cache
    -adapt-marginals

(and associated options described nearby in the man page).

Andreas

>
> I produce a LM restricted for words that appear in test set.
> Model was quite big and now I would like to adapt for a particular speaker
> using a set of sentence (text) that he pronunciated.
>
> Thanks,
>
>
> Luis
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user

From stolcke at icsi.berkeley.edu Sun Jul 3 13:45:34 2011
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Sun, 03 Jul 2011 13:45:34 -0700
Subject: [SRILM User List] Adaptation in SRILM
In-Reply-To:
References:
Message-ID: <4E10D4EE.90804@icsi.berkeley.edu>

Mehdi hoseini wrote:
> hi,
> I have questions in two Model adaptation with SRILM:
>
> 1. referring to "statistical language model adaptation:review and
> perspectives"(Jerome R. Bellegarda 2003), what does *-mix-lm
> *option in *ngram* does?* *is it just *interpolate* adaptation
> language model with background language model and lambda as an
> interpolate coefficient? or it use* MAP adaptation* method to
> merge two language models?
>
-mix-lm (together with the -bayes 0 option) implements the interpolation
(mixing) of multiple LMs at the probability level. (eqn. 4 in the paper).
Without the -bayes option an approximation to the interpolated LM is
generated in the form of a single combined backoff LM.

> 1. how can i merge two language model in SRILM, using Back-off
> technique (Besling and Meier, 1995)
>
The fill-in adaptation method is not currently implemented.

Andreas

From mehdi_hoseini at comp.iust.ac.ir Wed Jul 6 02:24:19 2011
From: mehdi_hoseini at comp.iust.ac.ir (Mehdi hoseini)
Date: Wed, 06 Jul 2011 12:54:19 +0330
Subject: [SRILM User List] Trigger Language Model/Adaptation
Message-ID:

hi,
Is there any way to create trigger language models in SRILM? or perform
trigger adaptation with SRILM?
Thanks a lot
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eahogue at gmail.com Thu Jul 14 16:10:56 2011
From: eahogue at gmail.com (Alan Hogue)
Date: Thu, 14 Jul 2011 16:10:56 -0700
Subject: [SRILM User List] Installation problem
Message-ID:

Hello,

I have searched the archive for this list carefully and cannot find
anything that quite addresses the problem I'm having. I'd really
appreciate any advice.

I am trying to install on a machine running Ubuntu 11.04. Here are the
details:

uname -m:
i686

uname -r
2.6.38-10-generic-pae

gcc -v
gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4)

make -v
GNU Make 3.81

I have set the path in the top-level Makefile.

Since I have gcc4 I am using the following Makefile:
Makefile.machine.i686-gcc4

So in that file I have added this:

TCL_INCLUDE =-I/usr/include/tcl8.4
TCL_LIBRARY =-L/usr/lib/tcl8.4 -ltcl

I have checked all of this many times, so I am pretty sure I am
following all the directions in the various install and README files.

When I run make:

make MACHINE_TYPE=i686-gcc4 World

I get errors like the following:

ERROR: File to be installed (../bin/i686-gcc4/maxalloc) does not exist.
ERROR: File to be installed (../bin/i686-gcc4/maxalloc) is not a plain file.
Usage: decipher-install ...
mode: file permission mode, in octal
file1 ... fileN: files to be installed
directory: where the files should be installed

files = ../bin/i686-gcc4/maxalloc
directory = ../../bin/i686-gcc4
mode = 0555
make[2]: [../../bin/i686-gcc4/maxalloc] Error 1 (ignored)

The same basic message pops up for lots of other files (maybe all of
them?), including: ngram, ngram-count, ngram-merge, ngram-class,
disambig, etc.

At this point I cannot tell what could be wrong. Can anyone suggest
anything I might try? I hope I've given enough information; if not, I
will be glad to respond with more!

Sincerely,

Alan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From shamsaabid at yahoo.com Fri Jul 15 01:20:59 2011
From: shamsaabid at yahoo.com (s a)
Date: Fri, 15 Jul 2011 01:20:59 -0700 (PDT)
Subject: [SRILM User List] Fw: Problem building SRILM with cygwin
Message-ID: <1310718059.79987.YahooMailClassic@web161004.mail.bf1.yahoo.com>

--- On Fri, 7/15/11, s a wrote:

From: s a
Subject: Problem building SRILM with cygwin
To: srilm-user at speech.sri.com
Date: Friday, July 15, 2011, 1:04 PM

I have attached the make output file. And also the
Makefile.machine.cygwin whose tcl values i edited.
Im not sure how to change the CC values. Please guide
My operating system and gcc version are as follows

$ uname -a
CYGWIN_NT-5.1 home 1.7.9(0.237/5/3) 2011-03-29 10:10 i686 Cygwin

$ gcc -v
Using built-in specs.
Target: i686-pc-cygwin
Configured with: /gnu/gcc/releases/respins/4.3.4-4/gcc4-4.3.4-4/src/gcc-4.3.4/configure --srcdir=/gnu/gcc/releases/respins/4.3.4-4/gcc4-4.3.4-4/src/gcc-4.3.4 --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --libexecdir=/usr/lib --datadir=/usr/share --localstatedir=/var --sysconfdir=/etc --infodir=/usr/share/info --mandir=/usr/share/man -C --datadir=/usr/share --infodir=/usr/share/info --mandir=/usr/share/man -v --with-gmp=/usr --with-mpfr=/usr --enable-bootstrap --enable-version-specific-runtime-libs --with-slibdir=/usr/bin --libexecdir=/usr/lib --enable-static --enable-shared --enable-shared-libgcc --disable-__cxa_atexit --with-gnu-ld --with-gnu-as --with-dwarf2 --disable-sjlj-exceptions --enable-languages=ada,c,c++,fortran,java,objc,obj-c++ --disable-symvers --enable-libjava --program-suffix=-4 --enable-libgomp --enable-libssp --enable-libada --enable-threads=posix --with-arch=i686 --with-tune=generic --enable-libgcj-sublibs CC=gcc-4 CXX=g++-4 CC_FOR_TARGET=gcc-4 CXX_FOR_TARGET=g++-4 GNATMAKE_FOR_TARGET=gnatmake GNATBIND_FOR_TARGET=gnatbind --with-ecj-jar=/usr/share/java/ecj.jar
Thread model: posix
gcc version 4.3.4 20090804 (release) 1 (GCC)

Regards,
Shamsa.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make.output
Type: application/octet-stream
Size: 19439 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Makefile.machine.cygwin
Type: application/octet-stream
Size: 1951 bytes
Desc: not available
URL:

From stolcke at icsi.berkeley.edu Fri Jul 15 22:50:17 2011
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Fri, 15 Jul 2011 22:50:17 -0700
Subject: [SRILM User List] Fw: Problem building SRILM with cygwin
In-Reply-To: Your message of Fri, 15 Jul 2011 01:20:59 -0700. <1310718059.79987.YahooMailClassic@web161004.mail.bf1.yahoo.com>
Message-ID: <201107160550.p6G5oH08015479@fruitcake.ICSI.Berkeley.EDU>

Your output shows:

    SRILMversion.h:1:23: warning: missing terminating " character

so something went wrong in the generation of this file.
Try

    % cd misc/src
    % rm SRILMversion.h
    % make SRILMversion.h

The file should look similar to this:

    #define SRILM_RELEASE "1.6.0-beta"
    #define SRILM_COPYRIGHT "\n\
    This software is subject to the SRILM Community Research License Version\n\
    1.0 (the \"License\"); you may not use this software except in compliance\n\
    with the License. A copy of the License is included in the SRILM root\n\
    directory. Software distributed under the License is distributed on an\n\
    \"AS IS\" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied.\n\
    See the License for the specific language governing rights and\n\
    limitations under the License. This software is Copyright (c) SRI\n\
    International, 1995-2011. All rights reserved.\n\
    \n\
    If this software was obtained under a commercial license agreement with\n\
    SRI then the provisions therein govern the use of the software and the\n\
    above notice does not apply.\n\
    "

Notice the double quotes starting and ending all string values, and the
use of backslashes.
If this is not the case for you then your cygwin installation might have
a problem. In that case try fixing the file by hand, then go back to
the top-level directory and run make World again.

Andreas

From shamsaabid at yahoo.com Sat Jul 16 09:30:56 2011
From: shamsaabid at yahoo.com (s a)
Date: Sat, 16 Jul 2011 09:30:56 -0700 (PDT)
Subject: [SRILM User List] Fw: Problem building SRILM with cygwin
In-Reply-To: <201107160550.p6G5oH08015479@fruitcake.ICSI.Berkeley.EDU>
Message-ID: <1310833856.85005.YahooMailClassic@web161003.mail.bf1.yahoo.com>

Thanks, the problem was that in the first line
#define SRILM_RELEASE "1.5.12"
the ending quotes were on the next line
I fixed it manually and ran make world and the build was thankfully error free.
Thanks.

--- On Sat, 7/16/11, Andreas Stolcke wrote:

From: Andreas Stolcke
Subject: Re: [SRILM User List] Fw: Problem building SRILM with cygwin
To: "s a"
Cc: srilm-user at speech.sri.com
Date: Saturday, July 16, 2011, 10:50 AM

Your output shows:

    SRILMversion.h:1:23: warning: missing terminating " character

so something went wrong in the generation of this file.
Try

    % cd misc/src
    % rm SRILMversion.h
    % make SRILMversion.h

The file should look similar to this:

    #define SRILM_RELEASE "1.6.0-beta"
    #define SRILM_COPYRIGHT "\n\
    This software is subject to the SRILM Community Research License Version\n\
    1.0 (the \"License\"); you may not use this software except in compliance\n\
    with the License. A copy of the License is included in the SRILM root\n\
    directory. Software distributed under the License is distributed on an\n\
    \"AS IS\" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied.\n\
    See the License for the specific language governing rights and\n\
    limitations under the License. This software is Copyright (c) SRI\n\
    International, 1995-2011. All rights reserved.\n\
    \n\
    If this software was obtained under a commercial license agreement with\n\
    SRI then the provisions therein govern the use of the software and the\n\
    above notice does not apply.\n\
    "

Notice the double quotes starting and ending all string values, and the
use of backslashes.
If this is not the case for you then your cygwin installation might have
a problem. In that case try fixing the file by hand, then go back to
the top-level directory and run make World again.

Andreas
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stolcke at icsi.berkeley.edu Sat Jul 16 12:42:29 2011
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Sat, 16 Jul 2011 12:42:29 -0700
Subject: [SRILM User List] Fw: Problem building SRILM with cygwin
In-Reply-To: Your message of Sat, 16 Jul 2011 09:30:56 -0700. <1310833856.85005.YahooMailClassic@web161003.mail.bf1.yahoo.com>
Message-ID: <201107161942.p6GJgTAn022327@fruitcake.ICSI.Berkeley.EDU>

In message <1310833856.85005.YahooMailClassic at web161003.mail.bf1.yahoo.com> you wrote:
>
> Thanks, the problem was that in the first line
> #define SRILM_RELEASE "1.5.12"
> the ending quotes were on the next line
> I fixed it manually and ran make world and the build was thankfully error free.
> Thanks.
>

Glad to hear it.
The question though is: why did this happen?
When you regenerate the SRILMversion.h file it should run the commands:

    read version < /home/anstolck/srilm/RELEASE; echo "#define SRILM_RELEASE \"$version\"" > SRILMversion.h
    sed -f /home/anstolck/srilm/sbin/stringify-copyright /home/anstolck/srilm/Copyright >> SRILMversion.h

which, under a proper Cygwin installation, should produce the correct
output. (The first command is the one that apparently isn't working in
your case.)

So you might want to experiment with your PATH and other environment
settings, and let us know if you find a fix.

Andreas

From maralthemoral at gmail.com Thu Jul 21 03:23:27 2011
From: maralthemoral at gmail.com (Maral Sh.)
Date: Thu, 21 Jul 2011 06:23:27 -0400
Subject: [SRILM User List] Class-based LM
Message-ID:

Dear SRILM users,
I am trying to train a class-based LM. I was hoping there is an
step-by-step guide for doing this, but I couldn't find any.
I have to create two different LM. my corpus is POS tagged and one of
LMs should be based on POS tags. I should also create an LM based on
automatic clustering(I removed the tags and I should perform this
automatic clustering on this untagged corpus).
The format of my tagged corpus is one word per line along with its tag,
which are tab-separated.
I first excluded the tags in a separate text file and performed the
following command on it ->

./ngram-class -text tag.txt -full -classes output.cls -class-counts output.counts

then I tried

./replace-word-with-classes classes=output.cls corpus.txt > tag.txt

in the end the tag.txt file was someting like the corpus.txt file (it
was a word -space- tag per line format).

The thing is I don't know what to do next, and if I have done correctly
up to now.
I appreciate it if anyone can help me ASAP. I have deadlines on Monday.

Maral
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stolcke at icsi.berkeley.edu Thu Jul 21 10:26:37 2011
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Thu, 21 Jul 2011 10:26:37 -0700
Subject: [SRILM User List] Class-based LM
In-Reply-To:
References:
Message-ID: <4E28614D.2070409@icsi.berkeley.edu>

You can find some tutorial information on how to use induced-class-based
LMs at http://ssli.ee.washington.edu/courses/ee517/srilm.html

The basic mistake in your case is that you are trying to feed tag.txt
instead of corpus.txt to ngram-class. replace-word-with-classes is
typically used to prepare training data once the class definitions exist
(either from ngram-class or by hand crafting).

Andreas

Maral Sh. wrote:
> Dear SRILM users,
> I am trying to train a class-based LM.
> I was hoping there is an
> step-by-step guide for doing this, but I couldn't find any.
> I have to create two different LM. my corpus is POS tagged and one of
> LMs should be based on POS tags. I should also create an LM based on
> automatic clustering(I removed the tags and I should perform this
> automatic clustering on this untagged corpus).
> The format of my tagged corpus is one word per line along with its
> tag, which are tab-separated.
> I first excluded the tags in a separate text file and performed the
> following command on it ->
>
> ./ngram-class -text tag.txt -full -classes output.cls -class-counts
> output.counts
>
> then I tried
>
> ./replace-word-with-classes classes=output.cls corpus.txt > tag.txt
>
> in the end the tag.txt file was someting like the corpus.txt file (it
> was a word -space- tag per line format).
>
> The thing is I don't know what to do next, and if I have done
> correctly up to now.
> I appreciate it if anyone can help me ASAP. I have deadlines on Monday.
>
>
> Maral
> ------------------------------------------------------------------------
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user

From stolcke at icsi.berkeley.edu Wed Jul 27 14:30:16 2011
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Wed, 27 Jul 2011 14:30:16 -0700
Subject: [SRILM User List] Class-based LM
In-Reply-To:
References: <4E28614D.2070409@icsi.berkeley.edu> <4E288E2F.2010401@icsi.berkeley.edu>
Message-ID: <4E308368.7060701@icsi.berkeley.edu>

Maral Sh. wrote:
> Dear Andreas,
> I finally trained both my LMs. The funny thing is, with POS-tagged LM
> the perplexity is about 22 , with a word-based LM the perplexity is
> 303 and with automatic clustering the perplexity goes up to 325. I was
> wondering if this is normal or have I done something wrong in the
> process of training my models?! how can I know what mistake I have
> made and where?!
>
> Best regards,
> Maral
>
It is possible that you just don't have enough data to learn good word
classes.
As a sanity check you could include the test set in your training set
for class induction.
You might also get better results if you exclude the least frequent
words (say, all words occurring less than 5 times) from the induction,
and put them into a separate class (which you have to define and add to
the eventual class definition file by hand).

Andreas

From cyrine.nasri at gmail.com Fri Jul 29 11:44:01 2011
From: cyrine.nasri at gmail.com (Cyrine NASRI)
Date: Fri, 29 Jul 2011 20:44:01 +0200
Subject: [SRILM User List] problem when installing srilm on ubuntu machine
Message-ID:

Hello everyone,
I have a problem when installing SRILM on my machine ubuntu 11.04.
I downloaded version 6.0 SRILM and I unpacked
I changed the makefile like this:
SRILM = /users/parole/cnasri/tools/srilm/srilm
MACHINE_TYPE := $(shell $(SRILM)/sbin/machine-type)
RELEASE := $(shell cat RELEASE)

and I did make world,
But it gives me an error message

/usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: Aucun fichier ou dossier de ce type
compilation terminated.
make[2]: *** [../obj/i686/option.o] Erreur 1

Any idea please?

Thank you in advance

Best regards

--
*Cyrine
Ph.D. Student in Computer Science*
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kermorvant at gmail.com Sun Jul 31 03:24:05 2011
From: kermorvant at gmail.com (Christopher Kermorvant)
Date: Sun, 31 Jul 2011 12:24:05 +0200
Subject: [SRILM User List] problem when installing srilm on ubuntu machine
In-Reply-To:
References:
Message-ID:

google stubs-32.h missing ubuntu -> You need to install the glibc-devel package
--
Chris

On Fri, Jul 29, 2011 at 8:44 PM, Cyrine NASRI wrote:
> Hello everyone,
> I have a problem when installing SRILM on my machine ubuntu 11.04.
> I downloaded version 6.0 SRILM and I unpacked
> I changed the makefile like this:
> SRILM = /users/parole/cnasri/tools/srilm/srilm
> MACHINE_TYPE := $(shell $(SRILM)/sbin/machine-type)
> RELEASE := $(shell cat RELEASE)
>
> and I did make world,
> But it gives me an error message
>
> /usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: Aucun fichier ou
> dossier de ce type
> compilation terminated.
> make[2]: *** [../obj/i686/option.o] Erreur 1
>
>
> Any idea please?
>
> Thank you in advance
>
> Best regards
>
>
> --
> Cyrine
> Ph.D. Student in Computer Science
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user
>

From mehdi_hoseini at comp.iust.ac.ir Sun Jul 31 06:47:09 2011
From: mehdi_hoseini at comp.iust.ac.ir (Mehdi hoseini)
Date: Sun, 31 Jul 2011 17:17:09 +0330
Subject: [SRILM User List] -adapt-marginals-beta
Message-ID:

hi all,
can anybody introduce me the paper that "-adapt-marginals-beta" option
in "ngram" is implemented based on that?
best regards
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stolcke at icsi.berkeley.edu Sun Jul 31 15:55:04 2011
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Sun, 31 Jul 2011 17:55:04 -0500
Subject: [SRILM User List] -adapt-marginals-beta
In-Reply-To:
References:
Message-ID: <4E35DD48.6070805@icsi.berkeley.edu>

Mehdi hoseini wrote:
> hi all,
> can anybody introduce me the paper that "-adapt-marginals-beta" option
> in "ngram" is implemented based on that?
> best regards
>
Please read the documentation before posting to this list.

The ngram man page says

> -adapt-marginals LM
> Use an LM obtained by adapting the unigram marginals to the values
> specified in the LM in
> ngram-format(5), using the method described in Kneser et al. (1997).
> The LM to be adapted is
> that constructed according to the other options.
and the full reference is given at the end of the man page.

Andreas

From shamsaabid at yahoo.com Tue Aug 2 10:52:06 2011
From: shamsaabid at yahoo.com (s a)
Date: Tue, 2 Aug 2011 10:52:06 -0700 (PDT)
Subject: [SRILM User List] converting pocketsphinx lattice file to HTK or PFSG format
Message-ID: <1312307526.56147.YahooMailClassic@web161010.mail.bf1.yahoo.com>

Hi
Im using lattice-tool to generate a WCN from my pocketsphinx lattice.
However, the format of pocketsphinx lattice is not compatible with the
lattice-tool.
How do i go about converting my pocketsphinx lattice into HTK or PFSG
format?
Is there any other way to achieve what im trying to do?

Regards,
Shamsa.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From lfu20 at hotmail.com Tue Aug 9 14:59:59 2011
From: lfu20 at hotmail.com (Luis Uebel)
Date: Tue, 9 Aug 2011 21:59:59 +0000
Subject: [SRILM User List] Configuration for best language models
Message-ID:

I am producing some language models (3-grams) for HTK.
What is the best configuration for produce the best language models
using SRILM?
My configuration is:

$SRILM/ngram-count -memuse -order ${trigram} -interpolate -kndiscount -unk -vocab $wordlist -limit-vocab -text ${training} -lm ${train}-lm ${trigram}

The script line is above and I am using -kndiscount
Is there a better type of discount or parameters to produce better
language models using SRILM?

Number of words (unique): 38k
Size: 93Mbytes
Number of lines: 550656
Number of words (total): 17166049 (17M)

Thanks.

Luis
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From 537333 at unizar.es Wed Aug 10 01:52:59 2011
From: 537333 at unizar.es (Julia Olcoz Martínez)
Date: Wed, 10 Aug 2011 10:52:59 +0200
Subject: [SRILM User List] basic LM with GT discount and Katz backoff: evaluating probabilities
Message-ID: <20110810105259.bhqpv3gbvccgccww@webmail.unizar.es>

Hello,

I am trying to verify if I understand smoothing and backoff techniques
when generating a Language Model (LM). Thus, I use SRILM Toolkit as
follows:

ngram-count -text train.txt -no-sos -no-eos -order 2 -lm LM_2gram_train.arpa

in order to create a LM, trained by the train.txt and without eos token.

My goal is to calculate by my own the resulting probabilities when
applying GT smoothing, and verify if results are correct by comparing to
the ones obtained by SRILM. The fact is that I do not obtain the same
values..

In order to do it easy, I have edited a train corpus so simple, compound
only by 3 items (w1, w2, w3) and the eos ``.``:

line  content
1     w1 w2 w2 w1 .
2     w1 w2 w2 w1 w1 w1 .
3     w2 w1 w1 w2 .
4     w1 .
5     w1 w1 w1 .
6     w3 w3 w3 w3 w3 w3 w3 w3 w3 w3 w3 .
7     w1 w3
8     w2 w3
9     w3 w1
10    w3 w2

and I calculate log probabilities as follows:

x     r   n_r  n_{r+1}  r*    PGT(x)  PMLE(x)  PGT'(x)  log10[PGT'(x)]
w1w1  5   1    0        0,00  0,00    0,19     0,19     -0,73
w1w2  3   2    1        2,00  0,07    0,11     0,07     -1,13
w1w3  1   4    1        0,50  0,02    0,04     0,02     -1,73
w2w1  3   2    1        2,00  0,07    0,11     0,07     -1,13
w2w2  2   1    2        6,00  0,22    0,07     0,22     -0,65
w2w3  1   4    1        0,50  0,02    0,04     0,02     -1,73
w3w1  1   4    1        0,50  0,02    0,04     0,02     -1,73
w3w2  1   4    1        0,50  0,02    0,04     0,02     -1,73
w3w3  10  1    0        0,00  0,00    0,37     0,37     -0,43
sum   27                              1,00     1,00

where: x is the bigram, r the number of counts of x, n_r the number of
bigrams with r counts, n_{r+1} the number of bigrams with r+1 counts,
r* = (r+1) * n_{r+1} / n_r, PGT(x) = r*/sum(r), PMLE(x) = r/sum(r),
PGT'(x) = (r>=k ? PMLE(x) : PGT(x)) with k=5.
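The count-of-counts step described above can be checked mechanically. The following is only a minimal sketch of the textbook formula r* = (r+1) * n_{r+1} / n_r applied to the bigram counts listed in the table; it is not SRILM's implementation, and the names `bigram_counts` and `good_turing_adjusted_counts` are made up for illustration.

```python
from collections import Counter

# Bigram counts r as listed in the table above; the pairs "w3 w1" and
# "w3 w2" each occur once (corpus lines 9 and 10).
bigram_counts = {
    "w1 w1": 5, "w1 w2": 3, "w1 w3": 1,
    "w2 w1": 3, "w2 w2": 2, "w2 w3": 1,
    "w3 w1": 1, "w3 w2": 1, "w3 w3": 10,
}

def good_turing_adjusted_counts(counts):
    """Return {r: r*} using r* = (r + 1) * n_{r+1} / n_r.

    n_r is the number of distinct bigrams observed exactly r times.
    """
    n = Counter(counts.values())
    return {r: (r + 1) * n.get(r + 1, 0) / n[r] for r in sorted(n)}

adjusted = good_turing_adjusted_counts(bigram_counts)
total = sum(bigram_counts.values())

for r, r_star in adjusted.items():
    # P_GT for a single bigram with count r is r* divided by the total count.
    print(f"r={r:2d}  r*={r_star:.2f}  P_GT={r_star / total:.4f}")
```

With these counts no bigram occurs exactly 4 times, so the strict formula yields r* = 0 for r = 3, which differs from the 2,00 listed for r = 3 in the table. Zero counts-of-counts like this are one reason the raw estimate is normally applied only to small r, and practical toolkits add further renormalization on top of it.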
SRILM results are:

\data\
ngram 1=5
ngram 2=9

\1-grams:
-99
-99
-0.4220737 w1 0
-0.6651117 w2 0
-0.3921105 w3 0

\2-grams:
-0.6627578 w1 w1
-0.1856366 w1 w2
-0.8846066 w1 w3
-0.1249387 w2 w1
-1 w2 w2
-0.8239087 w2 w3
-0.7269987 w3 w1
-0.7269987 w3 w2
-0.20412 w3 w3

\end\

I see that backoff values are null because of all possible bigrams are
seen in the train corpus. (Later, I would like to do all this, but
removing lines 7 to 10 from the corpus and trying to calculate Katz
backoff weights).

For instance, I obtain SRILM results of probabilities only for 1-grams,
using MLE as shown:

y    C(y)  PMV(y)  log[PMV(y)]
w1   14    0,38    -0,4220737
w2   8     0,22    -0,6651117
w3   15    0,41    -0,3921105
sum  37    1,00

but why not with 2-grams?

Thank you in advance.

From gwenole.lecorve at gmail.com Wed Aug 10 02:23:30 2011
From: gwenole.lecorve at gmail.com (Gwénolé Lecorvé)
Date: Wed, 10 Aug 2011 11:23:30 +0200
Subject: [SRILM User List] Configuration for best language models
In-Reply-To:
References:
Message-ID:

Luis,

I wouldn't say there is one absolute good recipe to build a language
model (though there are some good practices). Regarding the smoothing,
many papers have studied different techniques and have highlighted their
respective strengths and weaknesses. In particular, it has recently been
shown that KN smoothing does not behave well together with strong
entropy-based pruning (even if you don't seem to use it). As for the
other parameters, this may depend on your target task.

Thus, I would just say:
- read papers about smoothing techniques, eg:
  [1] Chen, S. F. & Goodman, J.
      An Empirical Study of Smoothing Techniques for Language Modeling
      Harvard University, 1998
  [2] Chelba, C.; Brants, T.; Neveitt, W. & Xu, P.
      Study on Interaction Between Entropy Pruning and Kneser-Ney Smoothing
      Proc.
      of Interspeech, 2010, 2422-2425
- and compare the effect of different parameters/options (see the
  manual) in terms of perplexity or whatever measure you're seeking to
  minimize in the end. In particular, try to toggle on/off -interpolate,
  -gtnmin N (cutoff) as well as pruning options.

Best regards,
Gwenole.

2011/8/9 Luis Uebel

> I am producing some language models (3-grams) for HTK.
> What is the best configuration for produce the best language models using
> SRILM?
> My configuration is:
> $SRILM/ngram-count -memuse -order ${trigram} -interpolate -kndiscount -unk
> -vocab $wordlist -limit-vocab -text ${training} -lm ${train}-lm
> ${trigram}
>
>
> The script line is above and I am using -kndiscount
> Is there a better type of discount or parameters to produce better language
> models using SRILM?
>
> Number of words (unique): 38k
> Size: 93Mbytes
> Number of lines: 550656
> Number of words (total): 17166049 (17M)
>
> Thanks.
>
>
> Luis
>
>
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dimbabaniotis at gmail.com Mon Aug 15 03:31:31 2011
From: dimbabaniotis at gmail.com (Dimitris Babaniotis)
Date: Mon, 15 Aug 2011 13:31:31 +0300
Subject: [SRILM User List] FLM problem
Message-ID: <4E48F583.1010903@gmail.com>

Hello, I have a problem with fngram-count and fngram commands.

I create a language model with fngram command gives me an error.

This is factor file:

1
## Best perplexity found
## logprob= -84709 ppl= 166.097 ppl1= 431.488
W : 4 W(-1) W(-2) M(0) S(0) /home/dimbaba/test.counts /home/dimbaba/M0S0.txt 4
    W1,W2, W2 ndiscount
    M0 M0 ndiscount
    S0 S0 ndiscount
    0 0 ndiscount

This is an example input for train:

W-???v?????:S-???:M-??? W-???:S-???:M-??? W-??v??o?:S-??v:M-?o? W-???????:S-???:M-??? W-???:S-???:M-??? W-?????????:S-???:M-??? W-???:S-???:M-??? W-???????:S-???:M-???
W-???:S-???:M-??? W-??????????:S-???:M-??? W-????????????:S-???:M-??? W-?:S-?:M-? W-?????:S-???:M-??? W-????:S-???:M-??? W-????????:S-???:M-??? W-???:S-???:M-??? W-?????????:S-???:M-??? W-17:S-17:M-17 W-??????????:S-???:M-??? W-???:S-???:M-??? W-???:S-???:M-??? W-????????:S-???:M-??? W-????:S-???:M-??? W-???:S-???:M-??? W-??????:S-???:M-??? W-?????:S-???:M-??? W-???:S-???:M-??? W-,:S-,:M-, W-??????????:S-???:M-??? W-??:S-??:M-?? W-????????:S-???:M-??? W-????:S-???:M-??? W-????:S-???:M-??? W-????????:S-???:M-??? W-.:S-.:M-.

These are the commands:

fngram -factor-file /home/dimbaba/test.ff -ppl aligned/el-de/el-test.txt -nonull -no-virtual-begin-sentence

fngram-count -factor-file /home/dimbaba/test.ff -text /home/dimbaba/factoredExample.txt -nonull -no-virtual-begin-sentence -lm ghu

This is the output of fngram command:

/home/dimbaba/test.counts: line 643: malformed N-gram count or more than 100 words per line
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
/home/dimbaba/M0S0.txt: line 21: error, ngram line has invalid number (1) of fields, expecting either 2 or 3
format error in lm file

Where is the problem?
Thanks
Dimitris

From utebachmeier at gmail.com Thu Aug 18 13:09:34 2011
From: utebachmeier at gmail.com (Sabrina Friedman)
Date: Thu, 18 Aug 2011 23:09:34 +0300
Subject: [SRILM User List] Language Model Adaptationfrom Sabrina
Message-ID:

How can I produce a LM adapted for a particular speaker?
What are the SRILM tools and parameters that I need to use?

I produce a LM restricted for words that appear in test set.
Model was quite big and now I would like to adapt for a particular speaker
using a set of sentence (text) that he pronunciated.

Thanks,

Luis

Sabrina Friedman
Billige Flüge Marketing GmbH
Emanuelstr. 3,
10317 Berlin
Deutschland
Telefon: +49 (33) 5310967
Email: utebachmeier at gmail.com
Site: http://flug.airego.de - Billige Flüge vergleichen

From stolcke at icsi.berkeley.edu Fri Aug 19 13:16:43 2011
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Fri, 19 Aug 2011 13:16:43 -0700
Subject: [SRILM User List] Language Model Adaptationfrom Sabrina
In-Reply-To:
References:
Message-ID: <4E4EC4AB.4080604@icsi.berkeley.edu>

Sabrina Friedman wrote:
> How can I produce a LM adapted for a particular speaker?
> What are the SRILM tools and parameters that I need to use?
>
> I produce a LM restricted for words that appear in test set.
> Model was quite big and now I would like to adapt for a particular speaker
> using a set of sentence (text) that he pronunciated.
>
> Thanks,
>
>
> Luis
>
>
Are you the same Luis that asked this very question back in July?
http://www.speech.sri.com/pipermail/srilm-user/2011q3/001068.html

My reply is still at
http://www.speech.sri.com/pipermail/srilm-user/2011q3/001069.html

Andreas

>
>
>
> Sabrina Friedman
> Billige Flüge Marketing GmbH
> Emanuelstr.
> 3,
> 10317 Berlin
> Deutschland
> Telefon: +49 (33) 5310967
> Email: utebachmeier at gmail.com
> Site: http://flug.airego.de - Billige Flüge vergleichen
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user
>
>

From stolcke at icsi.berkeley.edu Mon Aug 22 09:08:11 2011
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Mon, 22 Aug 2011 09:08:11 -0700
Subject: [SRILM User List] SRILM manipulates lattices as files only?
In-Reply-To: <1313920761.47367.YahooMailClassic@web161003.mail.bf1.yahoo.com>
References: <1313920761.47367.YahooMailClassic@web161003.mail.bf1.yahoo.com>
Message-ID: <4E527EEB.6000000@icsi.berkeley.edu>

s a wrote:
>
> I am using pocketsphinx as decoder for my android application.
> and I intend to use SRILM for sausage generation.
>
> I was able to generate the htk format lattice using the -outlatfmt
> htk parameter in the pocketsphinxbatch command
>
> i later gave that file as input to the srilm lattice-tool (on
> cygwin prompt) and got a mesh file
>
> Now i want this mesh available in my android application. In my
> android app i have got a lattice object and i am able to access
> the words on the nodes of the lattice.
>
> I need to know which srilm file has the method that would take my
> lattice object as input and return me a lattice which is in
> sausage form. (or whether it is possible or not? because i studied
> the code and i could see lattice being manipulated as a file)
>
You would have to construct the lattice data structure in memory.
The best way to learn how to do that (I admit it is somewhat involved)
is to read the code that reads an HTK lattice file into memory. You can
find this in $SRILM/lattice/src/HTKLattice.cc, function
Lattice::readHTK().
A more straightforward, if somewhat less efficient, way would be to write the textual HTK lattice representation to a string object, and then "read" from that string using Lattice::readHTK(). For this purpose the File object can be constructed from a string; see the comment in $SRILM/misc/src/File.h . I believe you need to get the very latest (beta) version of SRILM for this to work.

Andreas

From stolcke at icsi.berkeley.edu Sun Aug 28 22:47:21 2011
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Sun, 28 Aug 2011 22:47:21 -0700
Subject: [SRILM User List] cashe adaptation
In-Reply-To:
References:
Message-ID: <4E5B27E9.5000900@icsi.berkeley.edu>

iman emamgholipour wrote:
>
> Dear Dr. Stolcke,
>
> I used SRILM for cache adapting in language models. and I used
> this command to reach that goal.
>
> /*ngram -rescore-ngram CacheAdaptedLanguageModel.txt -lm
> Unigram.txt -mix-lm BackgroundLM.txt -cache-lambda
> "lambda_value" -cache "lenght" -write-lm
> CacheAdaptedLanguageModel_Output.txt*/
>
> But unconventionally the results show that cache length has no
> effect on /Word Error Rate /. I test it till 10000 for length.
> Would you please tell me where my mistake is?
>
> Best regards
>
>
You cannot use -rescore-ngram in conjunction with -cache. Cache-based adaptation only makes sense when processing words sequentially. The purpose of -rescore-ngram is to reassign ngram conditional probabilities in an LM, without providing more distant context. So a cache is useless here. Also, -write-lm is only supported for pure ngram LMs and a few other specialized LM formats.

ngram -cache only makes sense in combination with options that evaluate the LM on input text, such as -ppl, -rescore, etc. (nbest rescoring is also problematic since you probably don't want one nbest hypothesis for a sentence to affect the cache for another alternate hypothesis.)
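
To make the point about sequential processing concrete, here is a minimal Python sketch of a unigram cache interpolated with a background model. It is not SRILM code: the function names, the toy uniform background model, and the handling of an empty cache are assumptions made for illustration. The window length and weight only play the roles that -cache and -cache-lambda play in ngram.

```python
from collections import Counter, deque
import math


def cache_adapted_logprobs(words, background_prob, cache_length, cache_lambda):
    """Score words left to right with a background LM plus a unigram cache.

    background_prob: hypothetical callable (word, history) -> probability > 0.
    cache_length:    number of most recent words kept in the cache.
    cache_lambda:    interpolation weight given to the cache estimate.
    """
    cache = deque()     # the most recent words, oldest first
    counts = Counter()  # word counts inside the current cache window
    logprobs = []
    for i, word in enumerate(words):
        p_bg = background_prob(word, words[:i])
        if cache:
            p_cache = counts[word] / len(cache)
            p = (1.0 - cache_lambda) * p_bg + cache_lambda * p_cache
        else:
            # Assumption: with an empty cache, use the background model alone.
            p = p_bg
        logprobs.append(math.log10(p))
        # The word enters the cache only after it has been scored, so the
        # result depends on the order in which the text is processed.
        cache.append(word)
        counts[word] += 1
        if len(cache) > cache_length:
            counts[cache.popleft()] -= 1
    return logprobs


def uniform_background(word, history):
    """Toy stand-in for a real n-gram model: every word gets probability 0.01."""
    return 0.01


text = "the cat saw the cat".split()
no_cache = cache_adapted_logprobs(text, uniform_background, 100, 0.0)
with_cache = cache_adapted_logprobs(text, uniform_background, 100, 0.2)
short_cache = cache_adapted_logprobs(text, uniform_background, 1, 0.2)
```

In this toy run the second occurrence of "the" scores higher with the cache than without it, and a cache of length 1 loses that benefit. The effect exists only because the words are scored in sequence, which is why a static operation on the n-grams of an LM file has no history for a cache to draw on.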
Andreas

From danielshout at hotmail.com Wed Sep 7 06:03:54 2011
From: danielshout at hotmail.com (Daniel Schaut)
Date: Wed, 7 Sep 2011 15:03:54 +0200
Subject: [SRILM User List] SRILM 1.5.12: Compiling Issues on Ubuntu 10.4
Message-ID:

Hi all,

I'm a pretty new user of SRILM and have run into some problems while compiling the binaries. I would very much appreciate it if you could give me a hint on how to solve my compiling problem.

My machine:

Oracle VM VB 4.0.12 r72916
Ubuntu 10.4 (32bit)
Kernel 2.6.32-33-generic
GCC 4.4.3 (i486-linux-gnu)
Intel C2D T7700 @2.4GHz
2.00 GB RAM

1. Changes done to Top-level Makefile:

SRILM = /home/user/srilm-1.5.12

2. Changes done to default Makefile.machine.i686-ubuntu:

GCC_FLAGS = -mtune=pentium3 -Wreturn-type -Wimplicit
CC = $(GCC_PATH)gcc-4.4 $(GCC_FLAGS)
CXX = $(GCC_PATH)g++-4.4 $(GCC_FLAGS) -DINSTANTIATE_TEMPLATES
TCL_INCLUDE = -I /usr/include/tcl8.5
TCL_LIBRARY = -ltcl
GAWK = /usr/bin/gawk

By default, C-Shell and TC-Shell are installed in /usr/bin/ on my machine. According to README.linux, INSTALL and the FAQ, csh/tcsh should be located in /bin/. I suppose I have to link csh/tcsh to /bin/ to make the compiling process work properly?

GCC_FLAGS is obviously not correct for my machine, is it? Which would be the correct GCC_FLAGS setting?

Furthermore, I already tried to set variables with SETENV and created a new machine-specific makefile to compile the binaries, but it still fails. I also searched the user archives and found some answers regarding my problem, but I can't fix it. Tutorials on the internet did not work for me.

Could you please give advice on how to solve the following compiling issue?
When I try to compile SRILM in csh/tcsh I get a bunch of errors:

make: /home/user/srilm-1.5.12/sbin/machine-type: Command not found
mkdir include lib bin
make init
make[1]: /home/user/srilm-1.5.12/sbin/machine-type: Command not found
make[1]: Entering directory `/home/user/srilm-1.5.12'
for subdir in misc dstruct lm flm lattice utils; do \
(cd $subdir/src; make SRILM=/home/user/srilm-1.5.12 MACHINE_TYPE= OPTION= MAKE_PIC= init) || exit 1; \
done
make[2]: /home/user/srilm-1.5.12/sbin/machine-type: Command not found
make[2]: Entering directory `/home/user/srilm-1.5.12/misc/src'
/home/user/srilm-1.5.12/common/Makefile.common.variables:96: /home/user/srilm-1.5.12/common/Makefile.machine.: No such file or directory
make[2]: *** No rule to make target `/home/user/srilm-1.5.12/common/Makefile.machine.'. Stop.
make[2]: Leaving directory `/home/user/mosestools/srilm-1.5.12/misc/src'
make[1]: *** [init] Error 1
make[1]: Leaving directory `/home/user/srilm-1.5.12'
make: *** [World] Error 2

Best,
Dan

EDIT: I've renamed Makefile.machine.i686-ubuntu to Makefile.machine. accordingly in the meantime.
However, the whole compiling process still fails with lots of errors, e.g.:

/home/user/srilm-1.5.12/sbin/machine-type: Command not found
/home/user/srilm-1.5.12/sbin/decipher-install: Command not found
In file included from ./Map.cc:12:
./Map.h:63:21: error: Boolean.h: No such file or directory
./Vocab.h:98:20: error: SArray.h: No such file or directory
In file included from Trie.h:103,
from Trie.cc:27,
from SArrayTrie.cc:17:
SArray.h:54: error: 'Boolean' has not been declared
SArray.h:55: error: 'Boolean' has not been declared
SArray.h:56: error: 'Boolean' has not been declared
SArray.h:57: error: 'Boolean' has not been declared
SArray.h:70: error: 'Boolean' does not name a type
SArray.h:54: error: 'foundP' is not a member of '_Map'
SArray.h:55: error: 'foundP' is not a member of '_Map'
SArray.h:56: error: 'foundP' is not a member of '_Map'
SArray.h:57: error: 'foundP' is not a member of '_Map'

Is this most likely due to csh/tcsh being located in /usr/bin/ and not in /bin/?

Best,
Dan

From s.bakhshaei at yahoo.com Wed Sep 21 06:52:18 2011
From: s.bakhshaei at yahoo.com (Somayeh Bakhshaei)
Date: Wed, 21 Sep 2011 06:52:18 -0700 (PDT)
Subject: [SRILM User List] installing problem
Message-ID: <1316613138.72558.YahooMailClassic@web111709.mail.gq1.yahoo.com>

Hello all,

I am trying to install SRILM on a 64-bit CPU and it gives this error:

make[2]: Entering directory `/home/bakhshaei/IWSLT2011/srilm/misc/src'
make[2]: Warning: File `Dependencies.i686' has modification time 597 s in the future
gcc -mtune=pentium3 -Wreturn-type -Wimplicit -Wimplicit-int -D_FILE_OFFSET_BITS=64 -I. -I../../include -c -g -O3 -o ../obj/i686/option.o option.c
option.c:1:0: error: CPU you selected does not support x86-64 instruction set
make[2]: *** [../obj/i686/option.o] Error 1
make[2]: Leaving directory `/home/bakhshaei/IWSLT2011/srilm/misc/src'
make[1]: *** [release-libraries] Error 1
make[1]: Leaving directory `/home/bakhshaei/IWSLT2011/srilm'
make: *** [World] Error 2

How can this be solved?
How can I download a 64-bit Srilm?

------------------
Best Regards,
S.Bakhshaei