From cristinaguerreroflores at gmail.com Tue Oct 1 05:09:51 2013
From: cristinaguerreroflores at gmail.com (Cristina Guerrero)
Date: Tue, 1 Oct 2013 14:09:51 +0200
Subject: [SRILM User List] Confusion network combination
Message-ID: 

Hello,

I am looking for information on how to accomplish confusion network combination with the SRILM toolkit. I want to use the lattices generated by different speech recognizers over the same speech segment. I haven't found a detailed description of the steps to follow, so here is what I'm doing right now:

1- Extract lattices from the various recognizers (with HTK in my case).
2- Take one of these lattices as a starting point and convert it into a confusion network MESH0 (lattice-tool -read-htk -in-lattice LATTICE0 -write-mesh MESH0). *I know -posterior-prune can be applied to the lattice before building the mesh for better results, according to the "Finding consensus.." paper.
3- Then, take the next lattice (LATTICE1) and merge it with the previously generated mesh (lattice-tool -in-lattice LATTICE1 -init-mesh MESH0 -write-mesh MESH1).
4- Repeat the merging step (3), using the previous mesh to initialize the next lattice.

I'd really appreciate it if anyone could tell me whether the described procedure is the correct one, or provide me with more information about it.

Thanks a lot in advance,
Cristina
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ammansik at cis.hut.fi Tue Oct 1 06:49:05 2013
From: ammansik at cis.hut.fi (=?ISO-8859-1?Q?Andr=E9_Mansikkaniemi?=)
Date: Tue, 01 Oct 2013 16:49:05 +0300
Subject: [SRILM User List] Lattice-tool and -word-posteriors-for-sentences
Message-ID: <524AD2D1.4080600@cis.hut.fi>

Hi,

I've been trying to use lattice-tool and the '-word-posteriors-for-sentences' option to calculate posterior probabilities for words in an ASR output hypothesis. So I have a lattice and a hypothesis file to begin with, and run the following commands to generate the posterior probabilities.
lattice-tool -read-htk -in-lattice test.lat -write-mesh test.mesh
lattice-tool -read-mesh -in-lattice test.mesh -word-posteriors-for-sentences test.hyp > test.posteriors

Is this the correct way to do it, since I always end up getting 0 posterior probabilities for all words in the sentence? I tried to replace test.hyp with output generated from lattice-tool's own -viterbi-decode, but the result is still the same.

BR,
André

From stolcke at icsi.berkeley.edu Tue Oct 1 10:05:55 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Tue, 01 Oct 2013 10:05:55 -0700
Subject: [SRILM User List] Confusion network combination
In-Reply-To: 
References: 
Message-ID: <524B00F3.9080006@icsi.berkeley.edu>

On 10/1/2013 5:09 AM, Cristina Guerrero wrote:
> Hello,
> I am looking for information to accomplish confusion network combination with the SRILM toolkit. I want to use the lattices generated by different speech recognizers over the same speech segment. I haven't found a detailed description of the steps to follow, so here is what I'm doing right now:
> 1- Extract lattices from the various recognizers (With HTK in my case)
> 2- Take one of these lattices as a starting point and convert it into a confusion network MESH0 (lattice-tool -read-htk -in-lattice LATTICE0 -write-mesh MESH0). *I know -posterior-prune can be applied to the lattice before building the mesh for better results according to the "Finding consensus.." paper.
> 3- Then, take the next lattice (LATTICE1) and merge it with the previously generated mesh (lattice-tool -in-lattice LATTICE1 -init-mesh MESH0 -write-mesh MESH1).
> 4- And repeat the merging step (3) using the previous mesh to initialize the next lattice.

What you're doing works, but it is a roundabout way to perform confusion network combination, and you don't have control over the weighting of the posterior probabilities in each input CN.
A more straightforward approach is to dump the various CNs for each utterance, then combine them in one step using the command

    nbest-lattice -use-mesh -lattice-files FILE

where FILE contains a list of CN files and associated weights (see man page for details).

Andreas

From stolcke at icsi.berkeley.edu Tue Oct 1 10:11:26 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Tue, 01 Oct 2013 10:11:26 -0700
Subject: [SRILM User List] Lattice-tool and -word-posteriors-for-sentences
In-Reply-To: <524AD2D1.4080600@cis.hut.fi>
References: <524AD2D1.4080600@cis.hut.fi>
Message-ID: <524B023E.7000307@icsi.berkeley.edu>

On 10/1/2013 6:49 AM, André Mansikkaniemi wrote:
> Hi,
>
> Been trying to use the lattice-tool and the '-word-posteriors-for-sentences' option to calculate posterior probabilities for words in an ASR output hypothesis.
>
> So I have a lattice and hypothesis file to begin with, and run the following commands to generate the posterior probabilities.
>
> lattice-tool -read-htk -in-lattice test.lat -write-mesh test.mesh
> lattice-tool -read-mesh -in-lattice test.mesh -word-posteriors-for-sentences test.hyp > test.posteriors

Try

    lattice-tool -read-htk -in-lattice test.lat -word-posteriors-for-sentences test.hyp > test.posteriors

The -word-posteriors-for-sentences option triggers CN construction from the input lattice, and then aligns each line in test.hyp to that CN.

Andreas

From stolcke at icsi.berkeley.edu Tue Oct 1 10:24:41 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Tue, 01 Oct 2013 10:24:41 -0700
Subject: [SRILM User List] Count-lm reference request
In-Reply-To: <8D08C80B82D9757-1094-339A7@webmail-d268.sysops.aol.com>
References: <8D08C80B82D9757-1094-339A7@webmail-d268.sysops.aol.com>
Message-ID: <524B0559.1080200@icsi.berkeley.edu>

On 9/30/2013 10:46 PM, E wrote:
> Hello,
>
> I'm trying to understand the meaning of the "google.count.lm0" file as given in the FAQ section on creating an LM from the Web1T corpus.
> From what I read in Sec. 11.4.1, "Deleted Interpolation Smoothing", in Spoken Language Processing by Huang et al. (equation 11.22), the bigram case is
>
> P(w_i | w_{i-1}) = \lambda * P_{MLE}(w_i | w_{i-1}) + (1 - \lambda) * P(w_i)
>
> They call the \lambda's the mixture weights. I wonder if they are conceptually the same as the ones used in google.countlm. If so, why are they arranged in a 15x5 matrix? Where can I read more about the same?

I don't have access to the book chapter you cite, but from the equation it looks like a single fixed interpolation weight is used. In the SRILM count-lm implementation you have separate lambdas assigned to different groups of context ngrams, as a function of the frequency of those contexts. This is what is called "Jelinek-Mercer" smoothing in http://acl.ldc.upenn.edu/P/P96/P96-1041.pdf , where the bucketing of the contexts is done based on frequency (as suggested in the paper). The specifics are spelled out in the ngram(1) man page. The relevant bits are:

    mixweights M
     w01 w02 ... w0N
     w11 w12 ... w1N
     ...
     wM1 wM2 ... wMN

    countmodulus m

M specifies the number of mixture weight bins (minus 1). m is the width of a mixture weight bin. Thus, wij is the mixture weight used to interpolate a j-th order maximum-likelihood estimate with lower-order estimates, given that the (j-1)-gram context has been seen with a frequency between i*m and (i+1)*m-1 times. (For contexts with frequency greater than M*m, the i=M weights are used.)

Andreas
-------------- next part --------------
An HTML attachment was scrubbed...
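The bucketing scheme described above can be sketched in a few lines of Python. This is an illustration only, not SRILM code; it collapses the weight matrix to a single ngram order (SRILM keeps one weight per (bin, order) pair), and all weights and counts below are invented examples:

```python
# Frequency-bucketed Jelinek-Mercer interpolation, for one ngram order.
# Illustration only, not SRILM's implementation; numbers are made up.

def bucket(context_count, m, M):
    """Weight bin for a context seen context_count times: bin i covers
    counts i*m .. (i+1)*m - 1, and counts >= M*m fall into bin M."""
    return min(context_count // m, M)

def interpolate(p_mle, p_lower, context_count, weights, m):
    """P(w|h) = lam * P_MLE(w|h) + (1 - lam) * P_lower(w), where lam is
    picked from `weights` (indexed 0..M) by the context frequency bin."""
    M = len(weights) - 1
    lam = weights[bucket(context_count, m, M)]
    return lam * p_mle + (1 - lam) * p_lower

weights = [0.1, 0.5, 0.9]   # hypothetical tuned lambdas, so M = 2
m = 40                      # bin width, as in 'countmodulus 40'
p = interpolate(p_mle=0.25, p_lower=0.01, context_count=85,
                weights=weights, m=m)
print(round(p, 6))          # context seen 85 times -> bin 2, lam = 0.9
```

Frequent contexts thus trust the maximum-likelihood estimate more (larger lambda), while rare contexts fall back to the lower-order distribution.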
URL: 

From stolcke at icsi.berkeley.edu Tue Oct 1 21:20:15 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Tue, 01 Oct 2013 21:20:15 -0700
Subject: [SRILM User List] 1-count Higher order ngrams not excluded by gtmin
In-Reply-To: 
References: 
Message-ID: <524B9EFF.9050307@icsi.berkeley.edu>

On 9/28/2013 12:21 AM, Mohammed Mediani wrote:
> Dear Andreas,
> I noticed that when I train a 6-gram KN LM, I get some 1-count ngrams which are not prefixes of any higher-order ngrams in the 4- and 3-gram models. Are those another exception besides the one stated in Warning 4 (http://www.speech.sri.com/projects/srilm/manpages/ngram-discount.7.html)?

SRILM always includes unigrams for all words in the LM vocabulary. This is to make up for some limitations of the ARPA format: it does not allow a separate definition of what the LM vocabulary is, so the vocabulary is implicitly defined by the unigram list. Also, there is no way to specify a backoff to "zero-grams" (the uniform distribution), so unigram probabilities for all words (whether observed in the training set or not) are given explicitly.

Andreas

From otheremailid at aol.com Wed Oct 2 01:16:03 2013
From: otheremailid at aol.com (E)
Date: Wed, 2 Oct 2013 04:16:03 -0400 (EDT)
Subject: [SRILM User List] Count-lm reference request
In-Reply-To: <524B0559.1080200@icsi.berkeley.edu>
References: <8D08C80B82D9757-1094-339A7@webmail-d268.sysops.aol.com> <524B0559.1080200@icsi.berkeley.edu>
Message-ID: <8D08D5EC52F045F-1094-3E82E@webmail-d268.sysops.aol.com>

Thanks for the pointers! Three questions:

1. The same number of bins is used for all n-grams even though the number of ngrams for each N may differ. In Web1T:
Number of unigrams: 13,588,391
Number of fivegrams: 1,176,470,663
Would it make any improvement if fivegrams were binned more finely than unigrams?

2. For a particular ngram in the test data, the algorithm will decide which bin of Wij's to use based on how many times that n-gram occurred in the training data. Is this right?

3.
What does it mean when some weights are zero after tuning them. I used just 10 sentences (5 repeated) in tune.txt and got google.countlm as at the bottom. For ex. w01, w02 are non-zero but w03 is zero. Does this mean that in the development set, there were no trigrams that corresponded to counts in bin 0? order 5 mixweights 15 0.5 0.5 0 0 0 0.5 0.5 0 0 0 0.5 0.5 0 0 0 0.5 0.5 0.5 0.5 0.198641 0.5 0.5 0 0 0 0.5 0.5 0.5 0 0.5 0.5 0.5 0.5 0.5 0 0.5 0.5 0.5 0 0.5 0.5 0.5 0.5 0.5 0 0.5 0.5 0 0 0.5 0.5 0.5 0.054722 0 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0 0.5 1 1.97997e-05 0.0844577 0.030065 3.44131e-06 countmodulus 40 vocabsize 13588391 totalcount 4294967295 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Wed Oct 2 08:55:51 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 02 Oct 2013 08:55:51 -0700 Subject: [SRILM User List] Count-lm reference request In-Reply-To: <8D08D5EC52F045F-1094-3E82E@webmail-d268.sysops.aol.com> References: <8D08C80B82D9757-1094-339A7@webmail-d268.sysops.aol.com> <524B0559.1080200@icsi.berkeley.edu> <8D08D5EC52F045F-1094-3E82E@webmail-d268.sysops.aol.com> Message-ID: <524C4207.8070209@icsi.berkeley.edu> On 10/2/2013 1:16 AM, E wrote: > Thanks for the pointers! Three questions - > > 1. The same number of bins are used for all n-grams even though number > of ngrams for each N may differ. In web1T, > Number of unigrams: 13,588,391 > Number of fivegrams: 1,176,470,663 > Would it make any improvement if fivegrams were binned more number of > times than unigrams? That's good idea, but I haven't tried it, so I cannot say how much it would help. It might also help to just have more bins for lower-order ngrams since there are more samples of them (more data, hence more parameters can be estimated). > > 2. 
For a particular ngram in the test data, the algorithm will decide which bin of Wij's to use based on how many times that n-gram occurred in the training data. Is this right?

Right.

> 3. What does it mean when some weights are zero after tuning them? I used just 10 sentences (5 repeated) in tune.txt and got google.countlm as at the bottom.
>
> For example, w01 and w02 are non-zero but w03 is zero. Does this mean that in the development set, there were no trigrams that corresponded to counts in bin 0?

Correct.

Andreas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From otheremailid at aol.com Wed Oct 9 02:05:55 2013
From: otheremailid at aol.com (E)
Date: Wed, 9 Oct 2013 05:05:55 -0400 (EDT)
Subject: [SRILM User List] ngram-count hangs and other problems
Message-ID: <8D092E5E574EFD0-1114-1F1@webmail-m231.sysops.aol.com>

Hello,

Please find my files here: http://goo.gl/WVMEcw (to keep the file size small I've only shared unigram counts). When I run the following command

    ngram-count -order 1 -vocab wordList -read ngramCounts -lm ug.lm

I get the output below:

    warning: no singleton counts
    GT discounting disabled
    BOW numerator for context "" is -126.947 < 0

I understand that the "singleton" warning is because there are no ngrams that occur only once. Still, the "ug.lm" file is generated. Two issues:

1. If I use the following command, suggested elsewhere on the mailing list, to fix the "BOW numerator ..." warning, I get more warnings and the original warning is still present.

    ngram -lm ug.lm -renorm -write-lm ug_norm.lm

2. If, to fix the "singleton" warning, I use Witten-Bell smoothing (as advised in another thread here), ngram-count hangs indefinitely.

    ngram-count -order 1 -vocab wordList -read ngramCounts -lm ug.lm -wbdiscount1

How do I debug this issue?
-------------- next part --------------
An HTML attachment was scrubbed...
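For background on the -wbdiscount option mentioned above: Witten-Bell smoothing reserves probability mass for unseen events in proportion to the number of distinct observed types. Here is a simplified unigram sketch of that idea (not SRILM's implementation; the counts and vocabulary are invented):

```python
# Simplified Witten-Bell unigram smoothing (an illustration of the idea
# behind -wbdiscount, not SRILM's implementation). A fraction N/(N+T) of
# the probability mass goes to the observed counts and T/(N+T) is backed
# off to the uniform "zerogram" distribution, where N is the total token
# count and T the number of distinct observed words.

def witten_bell_unigram(counts, vocab):
    N = sum(counts.values())   # total observed tokens
    T = len(counts)            # distinct observed words
    V = len(vocab)             # vocabulary size (may include unseen words)
    uniform = 1.0 / V          # uniform backoff distribution
    return {w: (counts.get(w, 0) + T * uniform) / (N + T) for w in vocab}

counts = {"the": 5, "cat": 2, "sat": 1}   # made-up counts
vocab = ["the", "cat", "sat", "mat"]      # "mat" is unseen but in the vocab
probs = witten_bell_unigram(counts, vocab)
print(round(sum(probs.values()), 10))     # a proper distribution: 1.0
```

Unlike Good-Turing discounting, this needs no singleton counts, which is why it is the usual suggestion for count sets like Web1T where low-count ngrams have been pruned away.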
URL: 

From otheremailid at aol.com Wed Oct 9 07:31:43 2013
From: otheremailid at aol.com (E)
Date: Wed, 9 Oct 2013 10:31:43 -0400 (EDT)
Subject: [SRILM User List] ngram-count hangs and other problems
In-Reply-To: <8D092E5E574EFD0-1114-1F1@webmail-m231.sysops.aol.com>
References: <8D092E5E574EFD0-1114-1F1@webmail-m231.sysops.aol.com>
Message-ID: <8D0931368982F22-1114-246A@webmail-m231.sysops.aol.com>

Perhaps the ngramCount file I used exceeds some limit on the count of a particular ngram, because some words with very large counts have positive log probabilities in the "ug.lm" file. BTW, I used the bin/i686/ngram-count executable. I used Web1T to obtain these counts. Is there a workaround, like assigning artificial counts (= an upper limit) to the troublesome ngrams?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stolcke at icsi.berkeley.edu Wed Oct 9 09:43:37 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Wed, 09 Oct 2013 09:43:37 -0700
Subject: [SRILM User List] ngram-count hangs and other problems
In-Reply-To: <8D0931368982F22-1114-246A@webmail-m231.sysops.aol.com>
References: <8D092E5E574EFD0-1114-1F1@webmail-m231.sysops.aol.com> <8D0931368982F22-1114-246A@webmail-m231.sysops.aol.com>
Message-ID: <525587B9.8000607@icsi.berkeley.edu>

On 10/9/2013 7:31 AM, E wrote:
> Perhaps the ngramCount file I used crosses some limit on count of a particular ngram. Because some very large count words have positive log probability in the "ug.lm" file. BTW I used bin/i686/ngram-count executable.
> I used Web1T to obtain these counts. Is there a workaround, like assigning artificial counts (= upperlimit) to the troublesome ngrams?

My suspicion is that you're exceeding memory limits with this data. Possibly you are also exceeding the range of 32-bit integers with some large unigram counts.

1) Make sure you're building 64-bit executables.
If "file bin/i686/ngram-count" says that it's a 32-bit executable, do a "make clean" and rebuild with "make MACHINE_TYPE=i686-m64 ...".

2) To find out what the memory demand of your job is, try scaling back the data size (say, take 1/100 or 1/10 of it), and monitor the memory usage with "top" or a similar utility. Then extrapolate (linearly) to the full data size.

3) If you find your computer doesn't have enough memory, try the memory-saving techniques discussed at http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html under "Large data and memory issues".

Good luck!

Andreas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rimlaatar at yahoo.fr Thu Oct 10 01:45:24 2013
From: rimlaatar at yahoo.fr (Laatar Rim)
Date: Thu, 10 Oct 2013 09:45:24 +0100 (BST)
Subject: [SRILM User List] installation srilm
Message-ID: <1381394724.72897.YahooMailNeo@web133001.mail.ir2.yahoo.com>

Hello,

I tried to install SRILM on my Ubuntu 12.04 i686 machine. I followed these steps:
1. I downloaded the file.
2. I decompressed it.
3. I edited the file Makefile.machine.i686:

    # Tcl support (standard in Linux)
    TCL_INCLUDE = /usr/include/tcl8.5
    TCL_LIBRARY = -ltcl8.5

and

    GCC_FLAGS = -mtune=pentium3 -Wall -Wno-unused-variable -Wno-uninitialized
    CC = /usr/bin/gcc $(GCC_FLAGS)
    CXX = /usr/bin/gcc/g++ $(GCC_FLAGS)

but when I execute the command "make World" it shows me the following error:

    hp at ubuntu:~/SRILM/srilm$ make World
    make: /home/hp/SRILM/srilm/bin/i686/sbin/machine-type : commande introuvable [command not found]
    Makefile:13: /home/hp/SRILM/srilm/bin/i686/common/Makefile.common.variables: Aucun fichier ou dossier de ce type [No such file or directory]
    make: *** Pas de règle pour fabriquer la cible « /home/hp/SRILM/srilm/bin/i686/common/Makefile.common.variables ». Arrêt. [No rule to make target ... Stop.]
    hp at ubuntu:~/SRILM/srilm$

Please help me!

----
Cordialement,
Rim LAATAR
Ingénieur
Informatique, de l'École Nationale d'Ingénieurs de Sfax (ENIS)
Étudiante en mastère de recherche, Système d'Information & Nouvelles Technologies à la FSEGS -- Option TALN
Site web: Rim LAATAR BEN SAID
Tel: (+216) 99 64 74 98
----
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From otheremailid at aol.com Thu Oct 10 05:37:34 2013
From: otheremailid at aol.com (E)
Date: Thu, 10 Oct 2013 08:37:34 -0400 (EDT)
Subject: [SRILM User List] ngram-count hangs and other problems
Message-ID: <8D093CCA0EFA02B-1114-B92D@webmail-m231.sysops.aol.com>

Thanks!

> 1) Make sure you're building 64-bit executables. If "file bin/i686/ngram-count" says that it's a 32-bit executable, do a "make clean" and rebuild with "make MACHINE_TYPE=i686-m64 ...".

This worked. I had to use "make OPTION=_l" though. Now there is no problem of ngrams with positive log probability. But when I run the command below

    bin/i686_l/ngram-count -order 1 -vocab wordList -read ngramCounts -lm ug.lm -wbdiscount1

the memory usage is not much (~5 MB) but the CPU usage is in the high 90s. I tried your suggestion to scale down the data: with just 100 unigrams the *.lm file was created within minutes, and for the complete data, using -wbdiscount took about 2 hours.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stolcke at icsi.berkeley.edu Thu Oct 10 07:40:56 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Thu, 10 Oct 2013 07:40:56 -0700
Subject: [SRILM User List] installation srilm
In-Reply-To: <1381394724.72897.YahooMailNeo@web133001.mail.ir2.yahoo.com>
References: <1381394724.72897.YahooMailNeo@web133001.mail.ir2.yahoo.com>
Message-ID: <5256BC78.2050000@icsi.berkeley.edu>

On 10/10/2013 1:45 AM, Laatar Rim wrote:
> Hello,
> I tried to install SRILM on my Ubuntu 12.04 i686 machine, I followed the following steps:
> 1. I downloaded the file
> 2. I decompressed it
> 3.
I edited the file Makefile.machine.i686:
>
> # Tcl support (standard in Linux)
> TCL_INCLUDE = /usr/include/tcl8.5

This needs to be -I/usr/include/tcl8.5 . Make sure there is a tcl.h file in that directory.

> TCL_LIBRARY = -ltcl8.5
>
> and
>
> GCC_FLAGS = -mtune=pentium3 -Wall -Wno-unused-variable -Wno-uninitialized
> CC = /usr/bin/gcc $(GCC_FLAGS)
> CXX = /usr/bin/gcc/g++ $(GCC_FLAGS)
>
> but when I execute the command make World it shows me the following error:
>
> hp at ubuntu:~/SRILM/srilm$ make World
> make: /home/hp/SRILM/srilm/bin/i686/sbin/machine-type : commande introuvable

The SRILM variable needs to point to the top of the directory tree, not the bin/i686 directory. The machine-type script lives in $SRILM/sbin .

Andreas

> Makefile:13: /home/hp/SRILM/srilm/bin/i686/common/Makefile.common.variables: Aucun fichier ou dossier de ce type
> make: *** Pas de règle pour fabriquer la cible « /home/hp/SRILM/srilm/bin/i686/common/Makefile.common.variables ». Arrêt.
> hp at ubuntu:~/SRILM/srilm$

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stolcke at icsi.berkeley.edu Thu Oct 10 07:51:14 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Thu, 10 Oct 2013 07:51:14 -0700
Subject: [SRILM User List] ngram-count hangs and other problems
In-Reply-To: <8D093CCA0EFA02B-1114-B92D@webmail-m231.sysops.aol.com>
References: <8D093CCA0EFA02B-1114-B92D@webmail-m231.sysops.aol.com>
Message-ID: <5256BEE2.5080504@icsi.berkeley.edu>

On 10/10/2013 5:37 AM, E wrote:
>
> Thanks!
>
> > 1) Make sure you're building 64-bit executables. If "file bin/i686/ngram-count" says that it's a 32-bit executable, do a "make clean" and rebuild with "make MACHINE_TYPE=i686-m64 ...".
>
> This worked. I had to use "make OPTION=_l" though. Now there is no problem of ngrams with positive log probability.
FYI, OPTION=_l triggers the use of 64-bit integer counts stored in a lookup table so that each instance takes only 32 bits (assuming the counts are used sparsely). This is a way to support large counts on 32-bit machines, but it doesn't really make sense on 64-bit machines.

Andreas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From xulikui123321 at 163.com Mon Oct 14 18:22:08 2013
From: xulikui123321 at 163.com (=?GBK?B?0Ow=?=)
Date: Tue, 15 Oct 2013 09:22:08 +0800 (CST)
Subject: [SRILM User List] ngam stdin/stdout
Message-ID: <748c760b.1f52.141b9b4b0e6.Coremail.xulikui123321@163.com>

When computing perplexity with ngram, the usage is: ngram -lm language.lm -order 4 -ppl test.txt. But now I want to compute perplexity with ngram from stdin; what command should I use?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From venkataraman.anand at gmail.com Mon Oct 14 18:29:52 2013
From: venkataraman.anand at gmail.com (Anand Venkataraman)
Date: Mon, 14 Oct 2013 18:29:52 -0700
Subject: [SRILM User List] ngam stdin/stdout
In-Reply-To: <748c760b.1f52.141b9b4b0e6.Coremail.xulikui123321@163.com>
References: <748c760b.1f52.141b9b4b0e6.Coremail.xulikui123321@163.com>
Message-ID: 

Pipe into the command and use - (hyphen) for the arg to -ppl

&
--
Sent from my Google Nexus

On Oct 14, 2013 6:26 PM, "?" wrote:
> when compute perplexity with ngram, the usage is ngram -lm language.lm -order 4 -ppl test.txt.
> but now I want compute perplexity with ngram from stdin, what's the command should i use?
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From venkataraman.anand at gmail.com Mon Oct 14 21:26:16 2013 From: venkataraman.anand at gmail.com (Anand Venkataraman) Date: Mon, 14 Oct 2013 21:26:16 -0700 Subject: [SRILM User List] ngam stdin/stdout In-Reply-To: <32a5560f.74ec.141ba375c87.Coremail.xulikui123321@163.com> References: <748c760b.1f52.141b9b4b0e6.Coremail.xulikui123321@163.com> <32a5560f.74ec.141ba375c87.Coremail.xulikui123321@163.com> Message-ID: bash$ echo $SENTENCE | ngram ... -ppl - However, if you're planning to do this on a per-sentence basis, it's inefficient. You should ideally compute it on whole files using ngram -debug 1 and post-process the output to extract ppls for individual sentences. That way you can get away with not having to invoke ngram/load the lm multiple times. & On Mon, Oct 14, 2013 at 8:44 PM, ? wrote: > thank you very much, the problem is solved! if I want to computer > perplexity of sentence, not a file, like ngram -lm language.lm -order 4 > -ppl " I MISS YOU" ,what's the command should i use( i want to > compute perplexity with ngram on hadoop)? > > > > > > > At 2013-10-15 09:29:52,"Anand Venkataraman" > wrote: > > Pipe into the command and use - (hyphen) for the arg to -ppl > > & > -- > Sent from my Google Nexus > On Oct 14, 2013 6:26 PM, "?" wrote: > >> when compute perplexity with ngram, the usage is ngram -lm language.lm >> -order 4 -ppl test.txt. >> but now I want compute perplexity with ngram from stdin, what's the >> command should i use? >> >> >> >> _______________________________________________ >> SRILM-User site list >> SRILM-User at speech.sri.com >> http://www.speech.sri.com/mailman/listinfo/srilm-user >> > > > -------------- next part -------------- An HTML attachment was scrubbed... 
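The post-processing step suggested above can be sketched as follows. The sample text mimics the per-sentence blocks that `ngram -debug 1 -ppl` prints (the sentence, a counts line, then a logprob/ppl line, with a whole-file summary at the end); the sentences and numbers here are invented for illustration:

```python
import re

# Sketch of post-processing `ngram -debug 1 -ppl` output to recover one
# perplexity per sentence. The sample mimics SRILM's ppl output format;
# the sentences and numbers are invented.

sample_output = """\
this is a test
1 sentences, 4 words, 0 OOVs
0 zeroprobs, logprob= -8.1 ppl= 107.5 ppl1= 321.2

another test sentence
1 sentences, 3 words, 0 OOVs
0 zeroprobs, logprob= -6.2 ppl= 98.3 ppl1= 201.4

file test.txt: 2 sentences, 7 words, 0 OOVs
0 zeroprobs, logprob= -14.3 ppl= 103.1 ppl1= 254.9
"""

def sentence_ppls(ppl_output):
    """Return per-sentence ppl values, skipping the trailing whole-file
    summary (the block introduced by a line starting with 'file ')."""
    ppls = []
    in_file_summary = False
    for line in ppl_output.splitlines():
        if line.startswith("file "):
            in_file_summary = True
        m = re.search(r"\bppl=\s*(\S+)", line)
        if m and not in_file_summary:
            ppls.append(float(m.group(1)))
    return ppls

print(sentence_ppls(sample_output))  # [107.5, 98.3]
```

With this approach the LM is loaded once for the whole file, and each output block is matched back to its input sentence by position (or by -escape markers, as Andreas notes below in this thread's archive).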
URL: 

From stolcke at icsi.berkeley.edu Mon Oct 14 21:42:44 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Mon, 14 Oct 2013 21:42:44 -0700
Subject: [SRILM User List] ngam stdin/stdout
In-Reply-To: 
References: <748c760b.1f52.141b9b4b0e6.Coremail.xulikui123321@163.com> <32a5560f.74ec.141ba375c87.Coremail.xulikui123321@163.com>
Message-ID: <525CC7C4.4070809@icsi.berkeley.edu>

On 10/14/2013 9:26 PM, Anand Venkataraman wrote:
> bash$ echo $SENTENCE | ngram ... -ppl -
>
> However, if you're planning to do this on a per-sentence basis, it's inefficient.
>
> You should ideally compute it on whole files using ngram -debug 1 and post-process the output to extract ppls for individual sentences. That way you can get away with not having to invoke ngram/load the lm multiple times.

FYI, the ngram -escape option was created to embed useful metainformation in the input that is passed through to the output. This allows you to post-process the output and associate the ppl information with subdivisions of the input stream, if needed.

Andreas

> &
>
> On Mon, Oct 14, 2013 at 8:44 PM, ? wrote:
>> thank you very much, the problem is solved! if I want to compute the perplexity of a sentence, not a file, like ngram -lm language.lm -order 4 -ppl "I MISS YOU", what command should I use? (I want to compute perplexity with ngram on hadoop)
>>
>> At 2013-10-15 09:29:52, "Anand Venkataraman" wrote:
>>> Pipe into the command and use - (hyphen) for the arg to -ppl
>>>
>>> &
>>> --
>>> Sent from my Google Nexus
>>> On Oct 14, 2013 6:26 PM, "?" wrote:
>>>> when compute perplexity with ngram, the usage is ngram -lm language.lm -order 4 -ppl test.txt.
>>>> but now I want compute perplexity with ngram from stdin, what's the command should i use?
>>>> _______________________________________________
>>>> SRILM-User site list
>>>> SRILM-User at speech.sri.com
>>>> http://www.speech.sri.com/mailman/listinfo/srilm-user
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From xulikui123321 at 163.com Tue Oct 15 03:59:39 2013
From: xulikui123321 at 163.com (=?GBK?B?0Ow=?=)
Date: Tue, 15 Oct 2013 18:59:39 +0800 (CST)
Subject: [SRILM User List] use ngram computer perplexity of per line string on hadoop
Message-ID: <5a51f5b2.114d9.141bbc56c35.Coremail.xulikui123321@163.com>

I have solved the problem of computing the perplexity of each line with ngram, as follows (Perl script):

    foreach my $line (@lines) {
        $str = `echo $line | ngram -lm news.lm -ppl - -debug 1`;
        print $str . "\n";
    }

But if the language model is too big, I have to load the language model for every line, which is a waste of time. Is there some method so that I only have to load the language model once, like replacing the LM file (news.lm) with a file pointer?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From christophe.servan at gmail.com Tue Oct 15 05:56:48 2013
From: christophe.servan at gmail.com (Christophe Servan)
Date: Tue, 15 Oct 2013 14:56:48 +0200
Subject: [SRILM User List] use ngram computer perplexity of per line string on hadoop
In-Reply-To: <5a51f5b2.114d9.141bbc56c35.Coremail.xulikui123321@163.com>
References: <5a51f5b2.114d9.141bbc56c35.Coremail.xulikui123321@163.com>
Message-ID: 

Hi,

Long ago I used the ngram program as a server. It is related to the switch -server-port. This may be a solution.

Best,

Christophe

2013/10/15 ?
> I have solved the problem of computing the perplexity of each line with ngram, as follows (Perl script):
> foreach my $line (@lines) {
>     $str = `echo $line | ngram -lm news.lm -ppl - -debug 1`;
>     print $str . "\n";
> }
> But if the language model is too big, I have to load the language model for every line, which is a waste of time. Is there some method so that I only have to load the language model once, like replacing the LM file (news.lm) with a file pointer?
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From cristinaguerreroflores at gmail.com Wed Oct 16 06:35:57 2013
From: cristinaguerreroflores at gmail.com (Cristina Guerrero)
Date: Wed, 16 Oct 2013 15:35:57 +0200
Subject: [SRILM User List] Hypothesis from a mesh
Message-ID: 

I'm working with confusion networks/sausages. Observing the posteriors in each 'align', it seems that the command

    lattice-tool -read-mesh -in-lattice MY.MESH -viterbi-decode

extracts the hypothesis with the lowest WER (i.e., the one with the highest posteriors per alignment). Is this correct? From my observation, using "-posterior-decode" doesn't extract the best hypothesis out of a mesh.

Cristina
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stolcke at icsi.berkeley.edu Wed Oct 16 09:28:22 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Wed, 16 Oct 2013 09:28:22 -0700
Subject: [SRILM User List] Hypothesis from a mesh
In-Reply-To: 
References: 
Message-ID: <525EBEA6.8030906@icsi.berkeley.edu>

On 10/16/2013 6:35 AM, Cristina Guerrero wrote:
> I'm working with confusion networks/sausages.
> Observing the posteriors in each 'align', it seems that the command:
> lattice-tool -read-mesh -in-lattice MY.MESH -viterbi-decode
> extracts the hypothesis with the lowest WER (i.e., the one with the highest posteriors per alignment).
Is this correct? From my observation, using "-posterior-decode" doesn't extract the best hypothesis out of a mesh.

Posterior decoding will extract the words with the highest posterior probability estimates for each alignment position. Of course these are ESTIMATES, and even if you had accurate posterior probabilities, the actual words spoken could be different. That's the nature of a probabilistic classifier!

Andreas

From prashant.mathur at xrce.xerox.com Fri Oct 18 08:38:50 2013
From: prashant.mathur at xrce.xerox.com (MATHUR, Prashant)
Date: Fri, 18 Oct 2013 15:38:50 +0000
Subject: [SRILM User List] linear interpolation of LM
Message-ID: <6507F4CC05459348A4F0D2F41256C11E2B7DBB3F@engins.xrce.xeroxlabs.com>

Hi,

I wanted to know how to do linear interpolation of several models given their weights. Also, can I interpolate more than 10 models at once?

I have tried several things so far; nothing seems to work for me.

    $ngram -lm small.lm.1 -lambda 0.7 -mix-lm big.lm.1 -unk

It doesn't throw any output or error.

When I try the -write option

    $ngram -lm small.lm.1 -lambda 0.7 -mix-lm big.lm.1 -unk -write-lm mixed.lm.1
    write() method not implemented
    error writing mixed.lm.1

but when I try

    $ngram -lm small.lm.1 -lambda 0.7 -mix-lm2 big.lm.1 -mix-lambda2 0.3 -unk -write-lm mixed.lm.1

then the mixed.lm.1 file is the same as small.lm.1.

My SRILM version is 1.5.3. I read that there are many ways of interpolation, such as count-based and log-linear interpolation. I tried the options -count-lm (throws a format error) and -loglinear-mix (didn't do anything).

I am out of options. Please help!

Thanks,
--
Prashant
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From stolcke at icsi.berkeley.edu Fri Oct 18 16:40:24 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Fri, 18 Oct 2013 16:40:24 -0700 Subject: [SRILM User List] linear interpolation of LM In-Reply-To: <6507F4CC05459348A4F0D2F41256C11E2B7DBB3F@engins.xrce.xeroxlabs.com> References: <6507F4CC05459348A4F0D2F41256C11E2B7DBB3F@engins.xrce.xeroxlabs.com> Message-ID: <5261C6E8.7090700@icsi.berkeley.edu> On 10/18/2013 8:38 AM, MATHUR, Prashant wrote: > Hi, > > I wanted to know how do I do linear interpolation of several models > given their weights? > Also, can I interpolated more than 10 models at once? > > I tried several hit/trials so far. Nothing seems to work for me. > > $ngram -lm small.lm.1 -lambda 0.7 -mix-lm big.lm.1 -unk > It doesn't throw any output or error. > > when I try -write option > $ngram -lm small.lm.1 -lambda 0.7 -mix-lm big.lm.1 -unk -write-lm > mixed.lm.1 > write() method not implemented > error writing mixed.lm.1 The above command should work, unless you are not giving the complete command line in your example (if you add the -bayes option then you will see the "not implemented" error). > > but when I try > $ngram -lm small.lm.1 -lambda 0.7 -mix-lm2 big.lm.1 -mix-lambda2 0.3 > -unk -write-lm mixed.lm.1 > Then the mixed.lm.1 file is the same as small.lm.1 > > My SRILM version is 1.5.3. > I read that there are many ways of interpolation such as count-based > and log-linear interpolation. > I tried the options -count-lm (throws a format error), -loglinear-mix > (didn't do anything) > > I am out of options. Please help! You are using a very old version of SRILM. Please get the latest stable version (1.7.0). If you want to try the current beta version (1.7.1) you will find a new option (ngram -read-mix-lms) that allows you to specify the mixture component LMs in a separate file, and also allows an arbitrary number of components. Andreas -------------- next part -------------- An HTML attachment was scrubbed... 
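Conceptually, the static interpolation that `ngram -mix-lm` performs is a per-ngram weighted sum of the component models' probabilities. Here is a sketch of just that arithmetic (an illustration only; actually writing out a merged backoff model, as SRILM does with -write-lm, additionally requires recomputing backoff weights, and the probabilities below are invented):

```python
# Linear interpolation of language model probabilities: the mixture
# probability of a word in context is a weighted sum of the component
# model probabilities, with weights summing to 1. Illustration only;
# the component probabilities below are made up.

def mix_prob(probs, lambdas):
    """probs: P_i(w | h) from each component LM; lambdas: their weights."""
    assert abs(sum(lambdas) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(lam * p for lam, p in zip(lambdas, probs))

# Hypothetical probabilities of the same word under two LMs, combined
# with lambda = 0.7 as in the command-line example above.
p = mix_prob([0.02, 0.005], [0.7, 0.3])
print(round(p, 6))  # 0.7*0.02 + 0.3*0.005 = 0.0155
```

The same formula extends to any number of components, which is what the -read-mix-lms option mentioned below makes convenient for more than a handful of models.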
URL: From rimlaatar at yahoo.fr Fri Oct 25 06:17:59 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Fri, 25 Oct 2013 14:17:59 +0100 (BST) Subject: [SRILM User List] installation srilm Message-ID: <1382707079.14682.YahooMailNeo@web133002.mail.ir2.yahoo.com> when i run: ??? make World > make.output 2>&1 the result : ?make: /sbin/machine-type : commande introuvable Makefile:13: /common/Makefile.common.variables: Aucun fichier ou dossier de ce type make: *** Pas de r?gle pour fabriquer la cible ? /common/Makefile.common.variables ?. Arr?t. can you help me plz !!! ---- Cordialement Rim LAATAR? Ing?nieur? Informatique, de l'?cole Nationale d?Ing?nieurs de Sfax(ENIS) ?tudiante en mast?re de recherche, Syst?me d'Information & Nouvelles Technologies ? laFSEGS?--Option TALN Site web:Rim LAATAR BEN SAID Tel: (+216) 99 64 74 98? ---- -------------- next part -------------- An HTML attachment was scrubbed... URL: From christophe.servan at gmail.com Fri Oct 25 06:24:52 2013 From: christophe.servan at gmail.com (Christophe Servan) Date: Fri, 25 Oct 2013 15:24:52 +0200 Subject: [SRILM User List] installation srilm In-Reply-To: <1382707079.14682.YahooMailNeo@web133002.mail.ir2.yahoo.com> References: <1382707079.14682.YahooMailNeo@web133002.mail.ir2.yahoo.com> Message-ID: Hi, you have to set the SRILM environment variable before launching the compilation process. Best, Christophe Le 25 octobre 2013 15:17, Laatar Rim a ?crit : > when i run: > > make World > make.output 2>&1 > > the result : > make: /sbin/machine-type : commande introuvable > Makefile:13: /common/Makefile.common.variables: Aucun fichier ou dossier > de ce type > make: *** Pas de r?gle pour fabriquer la cible ? > /common/Makefile.common.variables ?. Arr?t. > > can you help me plz !!! > ---- > Cordialement > > *Rim LAATAR * > Ing?nieur Informatique, de l'?cole Nationale d?Ing?nieurs de Sfax (ENIS > ) > ?tudiante en mast?re de recherche, Syst?me d'Information & Nouvelles > Technologies ? 
la FSEGS --Option TALN > Site web: Rim LAATAR BEN SAID > Tel: (+216) 99 64 74 98 > ---- > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christophe.servan at gmail.com Fri Oct 25 06:31:25 2013 From: christophe.servan at gmail.com (Christophe Servan) Date: Fri, 25 Oct 2013 15:31:25 +0200 Subject: [SRILM User List] installation srilm In-Reply-To: <1382707742.7358.YahooMailNeo@web133006.mail.ir2.yahoo.com> References: <1382707079.14682.YahooMailNeo@web133002.mail.ir2.yahoo.com> <1382707742.7358.YahooMailNeo@web133006.mail.ir2.yahoo.com> Message-ID: You have to set your SRILM variable like this: export SRILM=/usr/local/srilm-1.4.5 You don't have to add it to your path. Best, Christophe 2013/10/25 Laatar Rim > yes I executed these two commands: > export SRILM=/usr/local/srilm-1.4.5/bin/i686/ > export PATH=$PATH:$SRILM > > > ---- > Cordialement > > *Rim LAATAR * > Ingénieur Informatique, de l'École Nationale d'Ingénieurs de Sfax (ENIS) > Étudiante en mastère de recherche, Système d'Information & Nouvelles > Technologies à la FSEGS --Option TALN > Site web: Rim LAATAR BEN SAID > Tel: (+216) 99 64 74 98 > ---- > > > Le Vendredi 25 octobre 2013 13h24, Christophe Servan < > christophe.servan at gmail.com> a écrit : > Hi, > you have to set the SRILM environment variable before launching the > compilation process. > > Best, > > Christophe > > > Le 25 octobre 2013 15:17, Laatar Rim a écrit : > > when I run: > > make World > make.output 2>&1 > > the result is: > make: /sbin/machine-type : commande introuvable > Makefile:13: /common/Makefile.common.variables: Aucun fichier ou dossier > de ce type > make: *** Pas de règle pour fabriquer la cible « > /common/Makefile.common.variables ». Arrêt. > > can you help me please?
> ---- > Cordialement > > *Rim LAATAR * > Ingénieur Informatique, de l'École Nationale d'Ingénieurs de Sfax (ENIS) > Étudiante en mastère de recherche, Système d'Information & Nouvelles > Technologies à la FSEGS --Option TALN > Site web: Rim LAATAR BEN SAID > Tel: (+216) 99 64 74 98 > ---- > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rimlaatar at yahoo.fr Fri Oct 25 07:10:50 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Fri, 25 Oct 2013 15:10:50 +0100 (BST) Subject: [SRILM User List] role of make World Message-ID: <1382710250.46434.YahooMailNeo@web133006.mail.ir2.yahoo.com> Hi, what is the role of make World, and how can I know that SRILM is properly installed on my machine? Thanks ---- Cordialement Rim LAATAR Ingénieur Informatique, de l'École Nationale d'Ingénieurs de Sfax (ENIS) Étudiante en mastère de recherche, Système d'Information & Nouvelles Technologies à la FSEGS --Option TALN Site web: Rim LAATAR BEN SAID Tel: (+216) 99 64 74 98 ---- -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Fri Oct 25 09:22:35 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Fri, 25 Oct 2013 09:22:35 -0700 Subject: [SRILM User List] role of make World In-Reply-To: <1382710250.46434.YahooMailNeo@web133006.mail.ir2.yahoo.com> References: <1382710250.46434.YahooMailNeo@web133006.mail.ir2.yahoo.com> Message-ID: <526A9ACB.5080105@icsi.berkeley.edu> On 10/25/2013 7:10 AM, Laatar Rim wrote: > Hi, > what is the role of make World, and how can I know that SRILM is > properly installed on my machine? > Thanks make World builds the SRILM binary libraries and executables and some scripts from source files, and installs them in $SRILM/lib and $SRILM/bin.
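A minimal sketch of the environment setup that recurs throughout these installation threads; the install path is a hypothetical example, and MACHINE_TYPE is only approximated here with `uname -m` ($SRILM/sbin/machine-type is the authoritative way to obtain it):

```shell
# Hypothetical install location -- adjust to where you unpacked SRILM.
# SRILM must point at the top of the source tree, NOT at .../bin/i686.
export SRILM=/usr/local/srilm
MACHINE_TYPE=$(uname -m)        # rough stand-in for $SRILM/sbin/machine-type
export PATH="$PATH:$SRILM/bin/$MACHINE_TYPE"
echo "SRILM=$SRILM"
echo "binaries expected in $SRILM/bin/$MACHINE_TYPE"
# Build and run the test suite from the top directory:
#   cd "$SRILM" && make SRILM="$SRILM" World && make SRILM="$SRILM" test
```

Passing SRILM=... explicitly on the make command line, as Andreas suggests later in this digest, sidesteps any problems with the exported variable not reaching make.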
To verify that it works first see if $SRILM/bin/$MACHINE_TYPE is populated with executable files. ($MACHINE_TYPE is a string identifying your platform, like i686 for Intel-based Linux). make test will run a suite of tests of the SRILM tools and tell you if any unexpected results are found. Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From vkepuska at fit.edu Mon Oct 28 09:53:26 2013 From: vkepuska at fit.edu (Veton Kepuska) Date: Mon, 28 Oct 2013 16:53:26 +0000 Subject: [SRILM User List] My makefile fails in cygwin? Message-ID: <1D4DDF8036F9CD4ABBCF12DECB691B126E320961@EX10-BE1.fit.edu> Here is the error message: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> make release-libraries make[1]: Entering directory '/cygdrive/u/public_html/ece5527/srilm' for subdir in misc dstruct lm flm lattice utils; do \ (cd $subdir/src; make SRILM=/cygdrive/u/public_html/ece5527/srilm MACHIN E_TYPE=cygwin OPTION= MAKE_PIC= release-libraries) || exit 1; \ done make[2]: Entering directory '/cygdrive/u/public_html/ece5527/srilm/misc/src' g++ -Wall -Wno-unused-variable -Wno-uninitialized -DINSTANTIATE_TEMPLATES -I. -I../../include -c -g -O2 -o ../obj/cygwin/File.o File.cc In file included from File.cc:27:0: srilm_iconv.h:15:25: fatal error: iconv.h: No such file or directory # include_next ^ compilation terminated. /cygdrive/u/public_html/ece5527/srilm/common/Makefile.common.targets:93: recipe for target '../obj/cygwin/File.o' failed make[2]: *** [../obj/cygwin/File.o] Error 1 make[2]: Leaving directory '/cygdrive/u/public_html/ece5527/srilm/misc/src' Makefile:105: recipe for target 'release-libraries' failed make[1]: *** [release-libraries] Error 1 make[1]: Leaving directory '/cygdrive/u/public_html/ece5527/srilm' Makefile:54: recipe for target 'World' failed make: *** [World] Error 2 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Can someone help? 
Thanks --Veton [Google Groups] SmartPhoneE Visit this group The learning and knowledge that we have, is, at the most, but little compared with that of which we are ignorant. - Plato "Those that would give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." - Benjamin Franklin, A Historical Review of Pennsylvania, 1759 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Dr. Veton K?puska, Associate Professor ECE Department Florida Institute of Technology Olin Engineering Building 150 West University Blvd. Melbourne, FL 32901-6975 Tel. (321) 674-7183 Mob. (321) 759-3157 E-mail: vkepuska at fit.edu ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The information transmitted (including attachments) is covered by the Electronic Communications Privacy Act, 18 U.S.C. 2510-2521, is intended only for the person(s) or entity/entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient(s) is prohibited. If you received this in error, please contact the sender and delete the material from any computer. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 2676 bytes Desc: image001.gif URL: From stolcke at icsi.berkeley.edu Mon Oct 28 11:57:06 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 29 Oct 2013 02:57:06 +0800 Subject: [SRILM User List] My makefile fails in cygwin? In-Reply-To: <1D4DDF8036F9CD4ABBCF12DECB691B126E320961@EX10-BE1.fit.edu> References: <1D4DDF8036F9CD4ABBCF12DECB691B126E320961@EX10-BE1.fit.edu> Message-ID: <526EB382.90203@icsi.berkeley.edu> Your cygwin installation is missing the iconv package, it seems. 
Fire up the cygwin setup.exe , and when you get to the screen where you can modify what's to be installed, search for "iconv" and add it. Andreas On 10/29/2013 12:53 AM, Veton Kepuska wrote: > > Here is the error message: > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > make release-libraries > > make[1]: Entering directory '/cygdrive/u/public_html/ece5527/srilm' > > for subdir in misc dstruct lm flm lattice utils; do \ > > (cd $subdir/src; make > SRILM=/cygdrive/u/public_html/ece5527/srilm MACHIN > > E_TYPE=cygwin OPTION= MAKE_PIC= release-libraries) || exit 1; \ > > done > > make[2]: Entering directory > '/cygdrive/u/public_html/ece5527/srilm/misc/src' > > g++ -Wall -Wno-unused-variable -Wno-uninitialized > -DINSTANTIATE_TEMPLATES -I. > > -I../../include -c -g -O2 -o ../obj/cygwin/File.o File.cc > > In file included from File.cc:27:0: > > srilm_iconv.h:15:25: fatal error: iconv.h: No such file or directory > > # include_next > > ^ > > compilation terminated. > > /cygdrive/u/public_html/ece5527/srilm/common/Makefile.common.targets:93: > recipe > > for target '../obj/cygwin/File.o' failed > > make[2]: *** [../obj/cygwin/File.o] Error 1 > > make[2]: Leaving directory > '/cygdrive/u/public_html/ece5527/srilm/misc/src' > > Makefile:105: recipe for target 'release-libraries' failed > > make[1]: *** [release-libraries] Error 1 > > make[1]: Leaving directory '/cygdrive/u/public_html/ece5527/srilm' > > Makefile:54: recipe for target 'World' failed > > make: *** [World] Error 2 > > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< > > Can someone help? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vkepuska at fit.edu Mon Oct 28 13:53:47 2013 From: vkepuska at fit.edu (Veton Kepuska) Date: Mon, 28 Oct 2013 20:53:47 +0000 Subject: [SRILM User List] My makefile fails in cygwin? 
In-Reply-To: <526EB382.90203@icsi.berkeley.edu> References: <1D4DDF8036F9CD4ABBCF12DECB691B126E320961@EX10-BE1.fit.edu> <526EB382.90203@icsi.berkeley.edu> Message-ID: <1D4DDF8036F9CD4ABBCF12DECB691B126E321B01@EX10-BE1.fit.edu> Andreas, Thank you very much for your information. I did that but I am getting this error message which hinders my installation even thought I did include (serveral times) the stddef package in Cygwin. >>>>>>>>>>>>>>>>>>>>> make[1]: Entering directory '/cygdrive/u/public_html/ece5527/srilm' for subdir in misc dstruct lm flm lattice utils; do \ (cd $subdir/src; make SRILM=/cygdrive/u/public_html/ece5527/srilm MACHIN E_TYPE=cygwin OPTION= MAKE_PIC= release-libraries) || exit 1; \ done make[2]: Entering directory '/cygdrive/u/public_html/ece5527/srilm/misc/src' g++ -Wall -Wno-unused-variable -Wno-uninitialized -DINSTANTIATE_TEMPLATES -I. -I../../include -c -g -O2 -o ../obj/cygwin/File.o File.cc In file included from /usr/include/sys/reent.h:14:0, from /usr/include/string.h:11, from File.cc:12: /usr/include/sys/_types.h:72:20: fatal error: stddef.h: No such file or director y #include ^ compilation terminated. /cygdrive/u/public_html/ece5527/srilm/common/Makefile.common.targets:93: recipe for target '../obj/cygwin/File.o' failed make[2]: *** [../obj/cygwin/File.o] Error 1 make[2]: Leaving directory '/cygdrive/u/public_html/ece5527/srilm/misc/src' Makefile:106: recipe for target 'release-libraries' failed make[1]: *** [release-libraries] Error 1 make[1]: Leaving directory '/cygdrive/u/public_html/ece5527/srilm' Makefile:55: recipe for target 'World' failed make: *** [World] Error 2 <<<<<<<<<<<<<<<<<<<< Thanks --Veton From: Andreas Stolcke [mailto:stolcke at icsi.berkeley.edu] Sent: Monday, October 28, 2013 2:57 PM To: Veton Kepuska; 'srilm-user at speech.sri.com' Subject: Re: [SRILM User List] My makefile fails in cygwin? Your cygwin installation is missing the iconv package, it seems. 
Fire up the cygwin setup.exe , and when you get to the screen where you can modify what's to be installed, search for "iconv" and add it. Andreas On 10/29/2013 12:53 AM, Veton Kepuska wrote: Here is the error message: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> make release-libraries make[1]: Entering directory '/cygdrive/u/public_html/ece5527/srilm' for subdir in misc dstruct lm flm lattice utils; do \ (cd $subdir/src; make SRILM=/cygdrive/u/public_html/ece5527/srilm MACHIN E_TYPE=cygwin OPTION= MAKE_PIC= release-libraries) || exit 1; \ done make[2]: Entering directory '/cygdrive/u/public_html/ece5527/srilm/misc/src' g++ -Wall -Wno-unused-variable -Wno-uninitialized -DINSTANTIATE_TEMPLATES -I. -I../../include -c -g -O2 -o ../obj/cygwin/File.o File.cc In file included from File.cc:27:0: srilm_iconv.h:15:25: fatal error: iconv.h: No such file or directory # include_next ^ compilation terminated. /cygdrive/u/public_html/ece5527/srilm/common/Makefile.common.targets:93: recipe for target '../obj/cygwin/File.o' failed make[2]: *** [../obj/cygwin/File.o] Error 1 make[2]: Leaving directory '/cygdrive/u/public_html/ece5527/srilm/misc/src' Makefile:105: recipe for target 'release-libraries' failed make[1]: *** [release-libraries] Error 1 make[1]: Leaving directory '/cygdrive/u/public_html/ece5527/srilm' Makefile:54: recipe for target 'World' failed make: *** [World] Error 2 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Can someone help? -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Mon Oct 28 14:29:50 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 29 Oct 2013 05:29:50 +0800 Subject: [SRILM User List] My makefile fails in cygwin? 
In-Reply-To: <1D4DDF8036F9CD4ABBCF12DECB691B126E321B01@EX10-BE1.fit.edu> References: <1D4DDF8036F9CD4ABBCF12DECB691B126E320961@EX10-BE1.fit.edu> <526EB382.90203@icsi.berkeley.edu> <1D4DDF8036F9CD4ABBCF12DECB691B126E321B01@EX10-BE1.fit.edu> Message-ID: <526ED74E.2000702@icsi.berkeley.edu> On 10/29/2013 4:53 AM, Veton Kepuska wrote: > > Andreas, > > Thank you very much for your information. I did that, but I am getting > this error message which hinders my installation even though I did > include (several times) the stddef package in Cygwin. > > >>>>>>>>>>>>>>>>>>>>> > > make[1]: Entering directory '/cygdrive/u/public_html/ece5527/srilm' > > for subdir in misc dstruct lm flm lattice utils; do \ > > (cd $subdir/src; make SRILM=/cygdrive/u/public_html/ece5527/srilm MACHIN > > E_TYPE=cygwin OPTION= MAKE_PIC= release-libraries) || exit 1; \ > > done > > make[2]: Entering directory > '/cygdrive/u/public_html/ece5527/srilm/misc/src' > > g++ -Wall -Wno-unused-variable -Wno-uninitialized > -DINSTANTIATE_TEMPLATES -I. > > -I../../include -c -g -O2 -o ../obj/cygwin/File.o File.cc > > In file included from /usr/include/sys/reent.h:14:0, > > from /usr/include/string.h:11, > > from File.cc:12: > > /usr/include/sys/_types.h:72:20: fatal error: stddef.h: No such file > or directory > > #include <stddef.h> > Very odd. I also have the #include <stddef.h> in sys/_types.h, but no such file exists on my system. But I don't get this error, so the conditionals in this header don't fire on my system. I'm using gcc 4.7.3. I don't really understand how these system header files are supposed to interact. You could try creating a dummy stddef.h in /usr/include . Andreas -------------- next part -------------- An HTML attachment was scrubbed...
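Since both failures in this thread are missing-header errors, here is a small diagnostic sketch that reports which headers are present before rebuilding; the script name and header list are illustrative, not part of SRILM:

```shell
# check_headers.sh (hypothetical helper) -- report which of the headers
# named in the build errors exist in an include directory.
incdir="${1:-/usr/include}"
for h in iconv.h stddef.h; do
    if [ -e "$incdir/$h" ]; then
        echo "found   $h"
    else
        echo "MISSING $h"
    fi
done
```

Any header reported MISSING points at a Cygwin package that still needs to be installed via setup.exe (the iconv headers, for example, come with Cygwin's libiconv development package).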
URL: From rimlaatar at yahoo.fr Tue Oct 29 03:13:41 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Tue, 29 Oct 2013 10:13:41 +0000 (GMT) Subject: [SRILM User List] problem with /sbin/machine-type Message-ID: <1383041621.54905.YahooMailNeo@web133005.mail.ir2.yahoo.com> To install SRILM I followed these steps: 1 - download the package 2 - in srilm/common I changed: # Tcl support (standard in Linux) TCL_INCLUDE = -I/usr/include/tcl8.5 TCL_LIBRARY = -ltcl8.5 # Use the GNU C compiler. GCC_FLAGS = -march=i686 -Wreturn-type -Wimplicit CC = /usr/bin/gcc $(GCC_FLAGS) CXX = /usr/bin/g++ -Wno-deprecated $(GCC_FLAGS) -DINSTANTIATE_TEMPLATES 3 - run /home/hp/SRILM/srilm/sbin/machine-type set environment variables: export SRILM=/home/hp/SRILM/srilm export PATH=$PATH:$SRILM but I still have these errors: hp at ubuntu:~/SRILM/srilm$ make test make: /sbin/machine-type : commande introuvable Makefile:13: /common/Makefile.common.variables: Aucun fichier ou dossier de ce type make: *** Pas de règle pour fabriquer la cible « /common/Makefile.common.variables ». Arrêt. Please help me! ---- Cordialement Rim LAATAR Ingénieur Informatique, de l'École Nationale d'Ingénieurs de Sfax (ENIS) Étudiante en mastère de recherche, Système d'Information & Nouvelles Technologies à la FSEGS --Option TALN Site web: Rim LAATAR BEN SAID Tel: (+216) 99 64 74 98 ---- -------------- next part -------------- An HTML attachment was scrubbed... URL: From fsegs.fatmamallek at gmail.com Tue Oct 29 12:31:05 2013 From: fsegs.fatmamallek at gmail.com (fatma mallek) Date: Tue, 29 Oct 2013 20:31:05 +0100 Subject: [SRILM User List] problem with n-gram command Message-ID: Hi, I'm using SRILM with Cygwin and I can't generate the n-gram counts. $ ngram-count -text corpus1.txt -order 3 -write corpus2.txt -bash: ngram-count : commande introuvable Can someone help me please?
Best regards, Fatma -- * ----------------------------------------------------------------------------------------------------------------------- * Fatma MALLEK https://sites.google.com/site/fatmamallek87/home * * Email: fsegs.fatmamallek at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From okuru13 at ku.edu.tr Tue Oct 29 12:35:22 2013 From: okuru13 at ku.edu.tr (Onur Kuru) Date: Tue, 29 Oct 2013 21:35:22 +0200 Subject: [SRILM User List] problem with n-gram command In-Reply-To: References: Message-ID: <090CCEC4-4213-41F8-9847-A2046D97D85C@my.ku.edu.tr> I think it should have been: ngram-count -text corpus1.txt -write-order 3 On Oct 29, 2013, at 9:31 PM, fatma mallek wrote: > Hi , > > i'm using SRILM with Cygwin and i can't generate the n-gram count. > > $ ngram-count -text corpus1.txt -order 3 -write corpus2.txt > -bash: ngram-count : commande introuvable > > Can someone help me please? > > Best regards, > Fatma > -- > ----------------------------------------------------------------------------------------------------------------------- > Fatma MALLEK > https://sites.google.com/site/fatmamallek87/home > > Email: fsegs.fatmamallek at gmail.com > > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From venkataraman.anand at gmail.com Tue Oct 29 12:55:18 2013 From: venkataraman.anand at gmail.com (Anand Venkataraman) Date: Tue, 29 Oct 2013 12:55:18 -0700 Subject: [SRILM User List] problem with n-gram command In-Reply-To: References: Message-ID: On Tue, Oct 29, 2013 at 12:31 PM, fatma mallek wrote: > commande introuvable This is a bash message that it couldn't find the ngram-count. Please check and fix your $PATH environment variable, or invoke ngram-count with its full path. 
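"commande introuvable" is the French-locale bash message for "command not found", i.e. ngram-count is simply not on $PATH. A sketch of the check Anand suggests (the bin directory shown is an example, not a fixed SRILM path):

```shell
# Is ngram-count reachable via $PATH?  (command -v is standard POSIX shell.)
if command -v ngram-count >/dev/null 2>&1; then
    echo "ngram-count found at: $(command -v ngram-count)"
else
    echo "ngram-count is not on PATH"
    echo "either fix PATH, e.g.  export PATH=\$PATH:\$SRILM/bin/cygwin"
    echo "or invoke it by full path, e.g.  \$SRILM/bin/cygwin/ngram-count -text corpus1.txt ..."
fi
```

Note that .bashrc edits only take effect in new shells; after editing it, open a fresh terminal or run `source ~/.bashrc` before retrying.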
& -------------- next part -------------- An HTML attachment was scrubbed... URL: From fsegs.fatmamallek at gmail.com Tue Oct 29 13:13:53 2013 From: fsegs.fatmamallek at gmail.com (fatma mallek) Date: Tue, 29 Oct 2013 21:13:53 +0100 Subject: [SRILM User List] problem with n-gram command In-Reply-To: References: Message-ID: http://www.cs.brandeis.edu/~cs114/CS114_docs/SRILM_Tutorial_20080512.pdf this is the link of the document. 2013/10/29 fatma mallek > thanks Anand, > > i actually edited in this file "?c:\cygwin\home\yourname\.bashrc? > *export SRILM=/srilm* > *export MACHINE_TYPE=cygwin* > *export PATH=$PATH:$pwd:$SRILM/bin/cygwin* > *export MANPATH=$MANPATH:$SRILM/man* > > exactly like the doc joined explain > I maked the same things step by step! but i don't know where is the > problem! > > > > 2013/10/29 Anand Venkataraman > >> >> On Tue, Oct 29, 2013 at 12:31 PM, fatma mallek < >> fsegs.fatmamallek at gmail.com> wrote: >> >>> commande introuvable >> >> >> This is a bash message that it couldn't find the ngram-count. Please >> check and fix your $PATH environment variable, or invoke ngram-count with >> its full path. >> >> & >> > > > > -- > * > ----------------------------------------------------------------------------------------------------------------------- > * > Fatma MALLEK > https://sites.google.com/site/fatmamallek87/home > * * > Email: fsegs.fatmamallek at gmail.com > > > -- * ----------------------------------------------------------------------------------------------------------------------- * Fatma MALLEK https://sites.google.com/site/fatmamallek87/home * * Email: fsegs.fatmamallek at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stolcke at icsi.berkeley.edu Tue Oct 29 16:01:28 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 30 Oct 2013 07:01:28 +0800 Subject: [SRILM User List] problem with /sbin/machine-type In-Reply-To: <1383041621.54905.YahooMailNeo@web133005.mail.ir2.yahoo.com> References: <1383041621.54905.YahooMailNeo@web133005.mail.ir2.yahoo.com> Message-ID: <52703E48.5000803@icsi.berkeley.edu> On 10/29/2013 6:13 PM, Laatar Rim wrote: > > to install SRILM I followed the steps: > 1 - downolad package > 2 - srilm / common I CAHNGE: > # Tcl support (standard in Linux) > TCL_INCLUDE = -I/usr/include/tcl8.5 > > TCL_LIBRARY = -ltcl8.5 > > # Use the GNU C compiler. > GCC_FLAGS = -march=i686 -Wreturn-type -Wimplicit > CC = /usr/bin/gcc $(GCC_FLAGS) > CXX = /usr/bin/g++ -Wno-deprecated $(GCC_FLAGS) -DINSTANTIATE_TEMPLATES > > 3 - run /home/hp/SRILM/srilm/sbin/machine-type > set environment variables: > export SRILM=/home/hp/SRILM/srilm > export PATH=$PATH:$SRILM > but I still have these errors: hp at ubuntu:~/SRILM/srilm$ make test > make: /sbin/machine-type : commande introuvable > Makefile:13: /common/Makefile.common.variables: Aucun fichier ou > dossier de ce type > make: *** Pas de r?gle pour fabriquer la cible ? > /common/Makefile.common.variables ?. Arr?t. Try running make SRILM=/home/hp/SRILM/srilm World If that goes well (no error message) make SRILM=/home/hp/SRILM/srilm test Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From rimlaatar at yahoo.fr Wed Oct 30 06:14:59 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Wed, 30 Oct 2013 13:14:59 +0000 (GMT) Subject: [SRILM User List] problem with installig srilm Message-ID: <1383138899.70167.YahooMailNeo@web133001.mail.ir2.yahoo.com> hi, i run? make SRILM=/home/hp/SRILM/srilm? World the result is : cd ..; /home/hp/SRILM/srilm/sbin/make-standard-directories make ../obj/i686/STAMP ../bin/i686/STAMP make[3]: entrant dans le r?pertoire ? 
/home/hp/SRILM/srilm/lattice/src ? make[3]: ? ../obj/i686/STAMP ? est ? jour. make[3]: ? ../bin/i686/STAMP ? est ? jour. make[3]: quittant le r?pertoire ? /home/hp/SRILM/srilm/lattice/src ? make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/lattice/src ? make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/utils/src ? cd ..; /home/hp/SRILM/srilm/sbin/make-standard-directories make ../obj/i686/STAMP ../bin/i686/STAMP make[3]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/utils/src ? make[3]: ? ../obj/i686/STAMP ? est ? jour. make[3]: ? ../bin/i686/STAMP ? est ? jour. make[3]: quittant le r?pertoire ? /home/hp/SRILM/srilm/utils/src ? make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/utils/src ? make[1]: quittant le r?pertoire ? /home/hp/SRILM/srilm ? make release-headers make[1]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm ? for subdir in misc dstruct lm flm lattice utils; do \ ??? ??? (cd $subdir/src; make SRILM=/home/hp/SRILM/srilm MACHINE_TYPE=i686 OPTION= MAKE_PIC= release-headers) || exit 1; \ ??? done make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/misc/src ? make[2]: Rien ? faire pour ? release-headers ?. make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/misc/src ? make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/dstruct/src ? make[2]: Rien ? faire pour ? release-headers ?. make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/dstruct/src ? make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/lm/src ? make[2]: Rien ? faire pour ? release-headers ?. make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/lm/src ? make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/flm/src ? make[2]: Rien ? faire pour ? release-headers ?. make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/flm/src ? make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/lattice/src ? make[2]: Rien ? faire pour ? release-headers ?. make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/lattice/src ? 
make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/utils/src ? make[2]: Rien ? faire pour ? release-headers ?. make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/utils/src ? make[1]: quittant le r?pertoire ? /home/hp/SRILM/srilm ? make depend make[1]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm ? for subdir in misc dstruct lm flm lattice utils; do \ ??? ??? (cd $subdir/src; make SRILM=/home/hp/SRILM/srilm MACHINE_TYPE=i686 OPTION= MAKE_PIC= depend) || exit 1; \ ??? done make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/misc/src ? rm -f Dependencies.i686 /usr/bin/gcc -march=i686 -Wreturn-type -Wimplicit -D_FILE_OFFSET_BITS=64?? -I/usr/include/tcl8.5?? -I. -I../../include -MM? ./option.c ./zio.c ./fcheck.c ./fake-rand48.c ./version.c ./ztest.c | sed -e "s&^\([^ ]\)&../obj/i686"'$(OBJ_OPTION)'"/\1&g" -e "s&\.o&.o&g" >> Dependencies.i686 /usr/bin/g++ -Wno-deprecated -march=i686 -Wreturn-type -Wimplicit -DINSTANTIATE_TEMPLATES -D_FILE_OFFSET_BITS=64?? -I/usr/include/tcl8.5?? -I. -I../../include -MM? ./Debug.cc ./File.cc ./MStringTokUtil.cc ./tls.cc ./tserror.cc ./tclmain.cc ./testFile.cc | sed -e "s&^\([^ ]\)&../obj/i686"'$(OBJ_OPTION)'"/\1&g" -e "s&\.o&.o&g" >> Dependencies.i686 cc1plus: attention : command line option ?-Wimplicit? is valid for C/ObjC but not for C++ [enabled by default] cc1plus: attention : command line option ?-Wimplicit? is valid for C/ObjC but not for C++ [enabled by default] cc1plus: attention : command line option ?-Wimplicit? is valid for C/ObjC but not for C++ [enabled by default] cc1plus: attention : command line option ?-Wimplicit? is valid for C/ObjC but not for C++ [enabled by default] cc1plus: attention : command line option ?-Wimplicit? is valid for C/ObjC but not for C++ [enabled by default] cc1plus: attention : command line option ?-Wimplicit? is valid for C/ObjC but not for C++ [enabled by default] cc1plus: attention : command line option ?-Wimplicit? 
is valid for C/ObjC but not for C++ [enabled by default]
/home/hp/SRILM/srilm/sbin/generate-program-dependencies ../bin/i686 ../obj/i686 "" ztest testFile | sed -e "s&\.o&.o&g" >> Dependencies.i686
make[2]: quittant le répertoire « /home/hp/SRILM/srilm/misc/src »
make[2]: entrant dans le répertoire « /home/hp/SRILM/srilm/dstruct/src »
rm -f Dependencies.i686
/usr/bin/gcc -march=i686 -Wreturn-type -Wimplicit -D_FILE_OFFSET_BITS=64 -I/usr/include/tcl8.5 -I. -I../../include -MM ./qsort.c ./maxalloc.c | sed -e "s&^\([^ ]\)&../obj/i686"'$(OBJ_OPTION)'"/\1&g" -e "s&\.o&.o&g" >> Dependencies.i686
/usr/bin/g++ -Wno-deprecated -march=i686 -Wreturn-type -Wimplicit -DINSTANTIATE_TEMPLATES -D_FILE_OFFSET_BITS=64 -I/usr/include/tcl8.5 -I. -I../../include -MM ./MemStats.cc ./LHashTrie.cc ./SArrayTrie.cc ./BlockMalloc.cc [and the remaining dstruct .cc files] | sed -e "s&^\([^ ]\)&../obj/i686"'$(OBJ_OPTION)'"/\1&g" -e "s&\.o&.o&g" >> Dependencies.i686
cc1plus: attention : command line option « -Wimplicit » is valid for C/ObjC but not for C++ [enabled by default]
[the same cc1plus warning is printed once per C++ source file, in this and in every directory below; the repeats are omitted here]
/home/hp/SRILM/srilm/sbin/generate-program-dependencies ../bin/i686 ../obj/i686 "" maxalloc testArray testMap benchHash testHash testSizes testCachedMem testBlockMalloc testMap2 testTrie | sed -e "s&\.o&.o&g" >> Dependencies.i686
make[2]: quittant le répertoire « /home/hp/SRILM/srilm/dstruct/src »
[the same dependency-generation step, with the same repeated warning, then runs in lm/src, flm/src, lattice/src, and utils/src]
make[1]: quittant le répertoire « /home/hp/SRILM/srilm »
make release-libraries
make[1]: entrant dans le répertoire « /home/hp/SRILM/srilm »
for subdir in misc dstruct lm flm lattice utils; do \
    (cd $subdir/src; make SRILM=/home/hp/SRILM/srilm MACHINE_TYPE=i686 OPTION= MAKE_PIC= release-libraries) || exit 1; \
done
make[2]: entrant dans le répertoire « /home/hp/SRILM/srilm/misc/src »
make[2]: Rien à faire pour « release-libraries ».
[make reports « Rien à faire » (nothing to be done) for release-libraries in each of the six subdirectories, then repeats the same loop for the release-programs and release-scripts targets with the same result; the tail of the log is:]
make[2]: entrant dans le répertoire «
/home/hp/SRILM/srilm/utils/src » make[2]: Rien à faire pour « release-scripts ». make[2]: quittant le répertoire « /home/hp/SRILM/srilm/utils/src » make[1]: quittant le répertoire « /home/hp/SRILM/srilm » Can someone tell me what the problem is? ---- Best regards, Rim LAATAR Computer engineer, École Nationale d'Ingénieurs de Sfax (ENIS) Master's research student, Information Systems & New Technologies, FSEGS, Option TALN (natural language processing) Website: Rim LAATAR BEN SAID Tel: (+216) 99 64 74 98 ---- -------------- next part -------------- An HTML attachment was scrubbed... URL: From sergey.zablotskiy at uni-ulm.de Thu Oct 31 03:21:36 2013 From: sergey.zablotskiy at uni-ulm.de (Sergey Zablotskiy) Date: Thu, 31 Oct 2013 11:21:36 +0100 Subject: [SRILM User List] make-big-lm with kn-/wb-discount Message-ID: <52722F30.7050205@uni-ulm.de> Hi everybody, is there any workaround for combining modified Kneser-Ney smoothing for lower-order n-grams with Witten-Bell smoothing for higher-order n-grams using the make-big-lm training script? I am getting the following error message: make-big-lm: must use one of GT, KN, or WB discounting for all orders while executing: >> make-big-lm -read ${count_file} -vocab ${vocab} -unk -order 4 \ -kndiscount1 -kndiscount2 -kndiscount3 -wbdiscount4 \ -interpolate -lm name.lm I cannot use -kndiscount for the 4-grams because some counts-of-counts are zero in my case. Thank you very much in advance, Regards Sergey. -- M.Sc.
Sergey Zablotskiy Institute of Communications Engineering University of Ulm Albert-Einstein-Allee 43, Room 43.1.225 89081 Ulm, Germany Phone: +49 731 50-26275 Fax: +49 731 50-26259 http://www.uni-ulm.de/in/nt/staff/research-assistants-external/zablotskiy.html From stolcke at icsi.berkeley.edu Thu Oct 31 16:02:03 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Fri, 01 Nov 2013 07:02:03 +0800 Subject: [SRILM User List] make-big-lm with kn-/wb-discount In-Reply-To: <52722F30.7050205@uni-ulm.de> References: <52722F30.7050205@uni-ulm.de> Message-ID: <5272E16B.9040405@icsi.berkeley.edu> On 10/31/2013 6:21 PM, Sergey Zablotskiy wrote: > Hi Everybody, > > is there any workaround to combine modified Kneser-Ney smoothing for > lower-order n-grams along with Witten-Bell smoothing for higher-order > n-grams using the make-big-lm training script? > > I am getting the following error message: > make-big-lm: must use one of GT, KN, or WB discounting for all orders > > while executing: > >> make-big-lm -read ${count_file} -vocab ${vocab} -unk -order 4 \ > -kndiscount1 -kndiscount2 -kndiscount3 -wbdiscount4 \ > -interpolate -lm name.lm > > I cannot use -kndiscount for the 4-grams because some counts-of-counts > are zero in my case. > 1) It does not make sense to combine KN discounting for lower-order ngrams with some other method, since the KN method of discounting the lower-order ngrams is designed precisely to complement the discounting of the highest-order ngrams. 2) make-big-lm invokes a helper script called make-kn-discounts to compute the discounting factors from the counts-of-counts. It tries to fill in missing (zero) counts-of-counts based on an empirical regularity in the counts-of-counts (the details are in Section 4 of this paper). If that mechanism doesn't work for some reason, we should try to fix it. Andreas -------------- next part -------------- An HTML attachment was scrubbed...
URL: From Joris.Pelemans at esat.kuleuven.be Fri Nov 1 17:00:26 2013 From: Joris.Pelemans at esat.kuleuven.be (Joris Pelemans) Date: Sat, 02 Nov 2013 01:00:26 +0100 Subject: [SRILM User List] Adding n-grams to an existing LM Message-ID: <5274409A.7020003@esat.kuleuven.be> Hello, I have an existing 5-gram LM with KN discounting and I would like to add new words to it. To estimate reasonable n-gram probabilities for a new word, I am now using (a fraction of) the probabilities of a synonym of the word. I am simply replacing every occurrence of the synonym with the new word, copying the logprob (or slightly altering it in the case of a fraction) and alpha, and adding the new line to the LM. Obviously the resulting LM is no longer normalized. I thought I would be able to fix this relatively easily with: ngram -lm src.arpa -order 5 -renorm -write-lm dest.arpa but I get a lot of errors of the type "BOW numerator for context is ... < 0" and "BOW denominator for context is ... <= 0". What do these errors mean? Can I ignore them, or is there a better way to renormalize my new LMs? Thanks in advance, Joris From stolcke at icsi.berkeley.edu Fri Nov 1 18:07:00 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Sat, 02 Nov 2013 09:07:00 +0800 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <5274409A.7020003@esat.kuleuven.be> References: <5274409A.7020003@esat.kuleuven.be> Message-ID: <52745034.1050402@icsi.berkeley.edu> On 11/2/2013 8:00 AM, Joris Pelemans wrote: > Hello, > > I have an existing 5-gram LM with KN discounting and I would like to > add new words to it. To estimate reasonable n-gram probabilities for a > new word, I am now using (a fraction of) the probabilities of a > synonym of the word. I am simply replacing every occurrence of the > synonym with the new word, copying the logprob (or slightly altering > it in case of a fraction) and alpha and adding the new line to the LM.
> Obviously the resulting LM is no longer normalized. I thought I > would be able to fix this relatively easily with: > > ngram -lm src.arpa -order 5 -renorm -write-lm dest.arpa > > but I get a lot of errors of the type "BOW numerator for context is > ... < 0" and "BOW denominator for context is ... <= 0". The BOW for a given context is computed as 1 - sum of all higher-order probabilities (in a given context), divided by 1 - sum of all backoff probabilities for those same ngrams. So, if you're adding ngrams to a context, those sums can exceed 1, and you end up with negative numerators and/or denominators. The ngram -renorm option only recomputes the backoff weights to achieve normalization; it does not modify the explicitly given ngram probabilities. > > What do these errors mean, can I ignore them or is there a better way > to renormalize my new LMs? I think you should split the existing ngram probabilities among all the synonyms, when the synonym occurs in the final position of the ngram. That would not add anything to the sums of probabilities involved in the BOW computation. For example, if you have p(c | a b) = x and c and d are synonyms, you set p(c | a b ) = x/2 p(d | a b) = x/2 If, however, the synonyms occur in the context portion of the ngram, you can just copy the parameter (as you have been doing). p( e | a c) = p(e | a d) Then, use -renorm to recompute the BOWs. Andreas From Joris.Pelemans at esat.kuleuven.be Sat Nov 2 06:16:16 2013 From: Joris.Pelemans at esat.kuleuven.be (Joris Pelemans) Date: Sat, 02 Nov 2013 14:16:16 +0100 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <52745034.1050402@icsi.berkeley.edu> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> Message-ID: <5274FB20.9020908@esat.kuleuven.be> On 11/02/13 02:07, Andreas Stolcke wrote: > On 11/2/2013 8:00 AM, Joris Pelemans wrote: >> but I get a lot of errors of the type "BOW numerator for context is >> ...
< 0" and "BOW denominator for context is ... <= 0. > > The BOW for a given context is is computed as 1 - sum of all > higher-order probabilities (in a given context), divided by 1 - sum of > all backoff probabilities for those same ngrams. So, if you're adding > ngrams to a context, those sums can exceed 1, and you end up with > negative numerators and/or denominators. I can see how that happens for the numerators, but aren't the backoff weights recomputed and thus this not prevent the denominators from ending up negative? What if I remove all the backoff weights and then renormalize? I'm just asking out of interest, I got rid of all the denominator complaints (see below). >> What do these errors mean, can I ignore them or is there a better way >> to renormalize my new LMs? > > I think you should split the existing ngram probabilities among all > the synonyms, when the synonym occurs in the final position of the > ngram. That would not add anything to the sums of probabilities > involved in the BOW computation. That did take care of most of the errors. Only a handful of numerator complaints left, but I guess that might be due to bad scripting on my behalf. I find it strange though that the complaints I get, concern n-grams that aren't in the LM at all. The following is the first complaint that I get: BOW numerator for context "negentig Hills" is -0.0120325 < 0 But if I grep the LM (before and after renormalization) for "negentig Hills" it gives me nothing? If there are no 3-grams with this context, how can 1 - (sum of all higher-order probabilities with this context) be negative? > For example, if have p(c | a b) = x and d and c synonyms, you set > > p(c | a b ) = x/2 > p(d | a b) = x/2 OK, that makes sense. 
And just to be complete (in case others might want to know), if I want to map d onto c with a certainty of, say, 0.1, then I just do: p(c | a b ) = 0.9*x p(d | a b) = 0.1*x > If, however, the synonyms occur in the context portion of the ngram, > you can just copy the parameter (as you have been doing). > > p( e | a c) = p(e | a d) And this stays the same for the 0.1 example. Thanks already! Joris From Joris.Pelemans at esat.kuleuven.be Sat Nov 2 07:46:31 2013 From: Joris.Pelemans at esat.kuleuven.be (Joris Pelemans) Date: Sat, 02 Nov 2013 15:46:31 +0100 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <52745034.1050402@icsi.berkeley.edu> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> Message-ID: <52751047.1040901@esat.kuleuven.be> On 11/02/13 02:07, Andreas Stolcke wrote: > On 11/2/2013 8:00 AM, Joris Pelemans wrote: >> >> What do these errors mean, can I ignore them or is there a better way >> to renormalize my new LMs? > > I think you should split the existing ngram probabilities among all > the synonyms, when the synonym occurs in the final position of the > ngram. That would not add anything to the sums of probabilities > involved in the BOW computation. > > For example, if you have p(c | a b) = x and c and d are synonyms, you set > > p(c | a b ) = x/2 > p(d | a b) = x/2 Another question with regard to this problem. Say, I don't know a good synonym for d, but I still want to include it by mapping it onto <unk> (what else, right?), obviously with only a very small fraction of the probability, since it's a class. The above technique would lead to gigantic LMs, since <unk> is all over the place. Is there a smart way in the SRILM toolkit that lets you specify that some words should be modeled as <unk>?
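The probability-splitting scheme discussed in this thread can be sketched numerically (a toy illustration with made-up numbers in plain Python, not SRILM internals): splitting an existing n-gram probability between a word and its synonym leaves the context's total explicit probability mass, and hence the BOW numerator, unchanged.

```python
# Toy sketch (made-up numbers): explicit probs p(w | a b) for one context "a b".
x = 0.2
probs = {"c": x, "e": 0.5}          # before adding the synonym d
mass_before = sum(probs.values())

# Split p(c | a b) = x between c and its synonym d, e.g. 0.9/0.1;
# an even x/2, x/2 split works the same way.
probs["c"] = 0.9 * x
probs["d"] = 0.1 * x
mass_after = sum(probs.values())

# The explicit probability mass in the context is unchanged, so the
# BOW numerator 1 - sum(probs) cannot go negative because of the split.
assert abs(mass_before - mass_after) < 1e-12
```

Simply copying p(d | a b) = 0.1*x in without reducing p(c | a b) would grow the sum instead, which is exactly what triggers the negative-numerator messages.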
Regards, Joris From stolcke at icsi.berkeley.edu Sat Nov 2 18:32:10 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Sat, 02 Nov 2013 18:32:10 -0700 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <5274FB20.9020908@esat.kuleuven.be> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> <5274FB20.9020908@esat.kuleuven.be> Message-ID: <5275A79A.3070309@icsi.berkeley.edu> On 11/2/2013 6:16 AM, Joris Pelemans wrote: > On 11/02/13 02:07, Andreas Stolcke wrote: >> On 11/2/2013 8:00 AM, Joris Pelemans wrote: >>> but I get a lot of errors of the type "BOW numerator for context is >>> ... < 0" and "BOW denominator for context is ... <= 0. >> >> The BOW for a given context is computed as 1 - sum of all >> higher-order probabilities (in a given context), divided by 1 - sum >> of all backoff probabilities for those same ngrams. So, if you're >> adding ngrams to a context, those sums can exceed 1, and you end up >> with negative numerators and/or denominators. > I can see how that happens for the numerators, but aren't the backoff > weights recomputed, and wouldn't that prevent the denominators from > ending up negative? What if I remove all the backoff weights and then > renormalize? I'm just asking out of interest; I got rid of all the > denominator complaints (see below). The same reasoning applies to the denominator, since it is obtained by summing over ngrams of one order lower. If you're adding trigrams and bigrams, say, then the denominator for bigram BOWs will be affected by the added bigrams. >>> What do these errors mean, can I ignore them or is there a better >>> way to renormalize my new LMs? >> >> I think you should split the existing ngram probabilities among all >> the synonyms, when the synonym occurs in the final position of the >> ngram. That would not add anything to the sums of probabilities >> involved in the BOW computation. > That did take care of most of the errors.
Only a handful of numerator > complaints left, but I guess that might be due to bad scripting on my > part. I find it strange though that the complaints I get concern > n-grams that aren't in the LM at all. The following is the first > complaint that I get: > > BOW numerator for context "negentig Hills" is -0.0120325 < 0 > > But if I grep the LM (before and after renormalization) for "negentig > Hills" it gives me nothing? If there are no 3-grams with this context, > how can 1 - (sum of all higher-order probabilities with this context) > be negative? The ngrams in these messages are printed in reverse order. That's because the contexts are stored in a trie that's indexed most-recent-word-first. Andreas > >> For example, if you have p(c | a b) = x and d and c are synonyms, you set >> >> p(c | a b ) = x/2 >> p(d | a b) = x/2 > OK, that makes sense. And just to be complete (in case others might > want to know), if I want to map d onto c with a certainty of say 0.1, > then I just do: > > p(c | a b ) = 0.9*x > p(d | a b) = 0.1*x > >> If, however, the synonyms occur in the context portion of the ngram, >> you can just copy the parameter (as you have been doing). >> >> p( e | a c) = p(e | a d) > > And this stays the same for the 0.1 example? > > Thanks already! > > Joris From stolcke at icsi.berkeley.edu Sat Nov 2 18:35:07 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Sat, 02 Nov 2013 18:35:07 -0700 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <52751047.1040901@esat.kuleuven.be> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> <52751047.1040901@esat.kuleuven.be> Message-ID: <5275A84B.8060401@icsi.berkeley.edu> On 11/2/2013 7:46 AM, Joris Pelemans wrote: > On 11/02/13 02:07, Andreas Stolcke wrote: >> >> For example, if you have p(c | a b) = x and d and c are synonyms, you set >> >> p(c | a b ) = x/2 >> p(d | a b) = x/2 > > Another question with regards to this problem.
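The backoff-weight computation discussed in this thread can be sketched numerically. The function and data layout below are illustrative (SRILM computes this on its internal trie), but the arithmetic follows the formula quoted in the thread:

```python
def bow(higher_order_probs, backoff_probs):
    """BOW for a context = (1 - sum of higher-order probabilities in
    that context) / (1 - sum of the backed-off probabilities for those
    same n-grams). If added n-grams push either sum past 1, the
    numerator or denominator goes negative -- the condition behind the
    "BOW numerator ... < 0" warnings."""
    num = 1.0 - sum(higher_order_probs)
    den = 1.0 - sum(backoff_probs)
    if num < 0:
        raise ValueError("BOW numerator < 0")
    if den <= 0:
        raise ValueError("BOW denominator <= 0")
    return num / den

# A healthy context: some probability mass is left over for backoff.
w = bow([0.5, 0.3], [0.4, 0.2])

# Adding extra n-gram mass (0.25) without rescaling pushes the sum
# past 1 and triggers the numerator complaint.
try:
    bow([0.5, 0.3, 0.25], [0.4, 0.2, 0.1])
except ValueError as err:
    msg = str(err)
```

This also shows why splitting existing probability mass among synonyms, rather than adding new mass, keeps both sums (and hence the BOWs) valid.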
Say, I don't know a > good synonym for d, but I still want to include it by mapping it onto > <unk> (what else, right?), obviously by a very small fraction of the > probability, since it's a class. The above technique would lead > to gigantic LMs, since <unk> is all over the place. Is there a smart > way in the SRILM toolkit that lets you specify that some words should > be modeled as <unk>? I'm not sure I understand what you mean. <unk> is a special word that all words not in the vocabulary are mapped to at test time. So the way you 'model' a word by <unk> is to not include it in the vocabulary of your LM. Andreas From Joris.Pelemans at esat.kuleuven.be Sun Nov 3 01:43:55 2013 From: Joris.Pelemans at esat.kuleuven.be (Joris Pelemans) Date: Sun, 03 Nov 2013 10:43:55 +0100 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <5275A84B.8060401@icsi.berkeley.edu> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> <52751047.1040901@esat.kuleuven.be> <5275A84B.8060401@icsi.berkeley.edu> Message-ID: <52761ADB.50906@esat.kuleuven.be> On 11/03/13 02:35, Andreas Stolcke wrote: > On 11/2/2013 7:46 AM, Joris Pelemans wrote: >> On 11/02/13 02:07, Andreas Stolcke wrote: >>> >>> For example, if you have p(c | a b) = x and d and c are synonyms, you set >>> >>> p(c | a b ) = x/2 >>> p(d | a b) = x/2 >> >> Another question with regards to this problem. Say, I don't know a >> good synonym for d, but I still want to include it by mapping it onto >> <unk> (what else, right?), obviously by a very small fraction of the >> probability, since it's a class. The above technique would lead >> to gigantic LMs, since <unk> is all over the place. Is there a smart >> way in the SRILM toolkit that lets you specify that some words should >> be modeled as <unk>?
I am investigating different techniques to introduce new words to the vocabulary. Say I have a vocabulary of 100,000 words and I want to introduce 1 new word X (for the sake of simplicity). I could do one of 3 options: 1. use the contexts in which X appears in some training data (but sometimes X may not appear (enough)) 2. estimate the probability of X by taking a fraction of the prob mass of a synonym of X (which I described earlier) 3. estimate the probability of X by taking a fraction of the prob mass of the <unk> class (if e.g. no good synonym is at hand) I could then compare the perplexities of these 3 LMs with a vocabulary of size 100,001 words to see which technique is best for a given word/situation. Joris -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Sun Nov 3 16:01:40 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Sun, 03 Nov 2013 16:01:40 -0800 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <52761ADB.50906@esat.kuleuven.be> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> <52751047.1040901@esat.kuleuven.be> <5275A84B.8060401@icsi.berkeley.edu> <52761ADB.50906@esat.kuleuven.be> Message-ID: <5276E3E4.7010801@icsi.berkeley.edu> On 11/3/2013 1:43 AM, Joris Pelemans wrote: > On 11/03/13 02:35, Andreas Stolcke wrote: >> On 11/2/2013 7:46 AM, Joris Pelemans wrote: >>> On 11/02/13 02:07, Andreas Stolcke wrote: >>>> >>>> For example, if you have p(c | a b) = x and d and c are synonyms, you set >>>> >>>> p(c | a b ) = x/2 >>>> p(d | a b) = x/2 >>> >>> Another question with regards to this problem. Say, I don't know a >>> good synonym for d, but I still want to include it by mapping it >>> onto <unk> (what else, right?), obviously by a very small fraction >>> of the probability, since it's a class. The above technique >>> would lead to gigantic LMs, since <unk> is all over the place.
Is >>> there a smart way in the SRILM toolkit that lets you specify that >>> some words should be modeled as <unk>? >> >> I'm not sure I understand what you mean. <unk> is a special word >> that all words not in the vocabulary are mapped to at test time. So >> the way you 'model' a word by <unk> is to not include it in the >> vocabulary of your LM. > I am investigating different techniques to introduce new words to the > vocabulary. Say I have a vocabulary of 100,000 words and I want to > introduce 1 new word X (for the sake of simplicity). I could do one of > 3 options: > > 1. use the contexts in which X appears in some training data (but > sometimes X may not appear (enough)) > 2. estimate the probability of X by taking a fraction of the prob > mass of a synonym of X (which I described earlier) > 3. estimate the probability of X by taking a fraction of the prob > mass of the <unk> class (if e.g. no good synonym is at hand) > > I could then compare the perplexities of these 3 LMs with a vocabulary > of size 100,001 words to see which technique is best for a given > word/situation. > And option 3 is effectively already implemented by the way unseen words are mapped to <unk>. If you want to compute perplexity in a fair way you would take the LM containing <unk> and for every occurrence of X you add log p(X | <unk>) (the share of unk-probability mass you want to give to X). That way you don't need to add any ngrams to the LM. What this effectively does is simulate a class-based Ngram model where <unk> is a class and X is one of its members. Andreas -------------- next part -------------- An HTML attachment was scrubbed...
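The "fair" perplexity computation described in this exchange, adding log p(X | unk) for every occurrence of the new word X on top of the per-word log probabilities the LM assigns, can be sketched as follows. The numbers and the helper function are illustrative; in practice the per-word log10 probabilities would come from the LM's scoring output:

```python
import math

def fair_ppl(logprobs, num_x_occurrences, log_p_x_given_unk):
    """logprobs: per-word log10 probabilities from the LM, where each
    occurrence of the new word X was scored as the unknown-word class.
    For each such occurrence, add log10 p(X | unk) -- the share of
    unk-probability mass given to X -- then convert to perplexity."""
    total = sum(logprobs) + num_x_occurrences * log_p_x_given_unk
    n = len(logprobs)
    return 10 ** (-total / n)

# 4-word test set; one of the words was an occurrence of X, and we
# give X a 0.1 share of the unk mass.
ppl = fair_ppl([-1.0, -2.0, -1.5, -0.5],
               num_x_occurrences=1,
               log_p_x_given_unk=math.log10(0.1))
```

No n-grams are added to the LM; the correction only adjusts the total log probability, exactly as if X were a member of an unknown-word class.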
URL: From Joris.Pelemans at esat.kuleuven.be Mon Nov 4 01:01:26 2013 From: Joris.Pelemans at esat.kuleuven.be (Joris Pelemans) Date: Mon, 04 Nov 2013 10:01:26 +0100 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <5276E3E4.7010801@icsi.berkeley.edu> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> <52751047.1040901@esat.kuleuven.be> <5275A84B.8060401@icsi.berkeley.edu> <52761ADB.50906@esat.kuleuven.be> <5276E3E4.7010801@icsi.berkeley.edu> Message-ID: <52776266.3020409@esat.kuleuven.be> On 11/04/13 01:01, Andreas Stolcke wrote: > On 11/3/2013 1:43 AM, Joris Pelemans wrote: >> I am investigating different techniques to introduce new words to the >> vocabulary. Say I have a vocabulary of 100,000 words and I want to >> introduce 1 new word X (for the sake of simplicity). I could do one >> of 3 options: >> >> 1. use the contexts in which X appears in some training data (but >> sometimes X may not appear (enough)) >> 2. estimate the probability of X by taking a fraction of the prob >> mass of a synonym of X (which I described earlier) >> 3. estimate the probability of X by taking a fraction of the prob >> mass of the <unk> class (if e.g. no good synonym is at hand) >> >> I could then compare the perplexities of these 3 LMs with a >> vocabulary of size 100,001 words to see which technique is best for a >> given word/situation. >> > And option 3 is effectively already implemented by the way unseen > words are mapped to <unk>. If you want to compute perplexity in a > fair way you would take the LM containing <unk> and for every > occurrence of X you add log p(X | <unk>) (the share of > unk-probability mass you want to give to X). That way you don't need > to add any ngrams to the LM. What this effectively does is simulate a > class-based Ngram model where <unk> is a class and X is one of its members. Yes, this is exactly what I meant when I asked for a "smart way in the SRILM toolkit", so I assume this is included.
I looked up how to use class-based models and I think I found what I need to do. Is the following the correct way to calculate perplexity for these models? ngram -lm class_lm.arpa -ppl test.txt -order n -classes expansions.class where expansions.class contains lines like this: <unk> p(X | <unk>) X <unk> p(Y | <unk>) Y <unk> 1-p(X | <unk>)-p(Y | <unk>) not_mapped I assume the last line is necessary since the man page for "classes-format" says "All expansion probabilities for a given class should sum to one, although this is not necessarily enforced by the software and would lead to improper models." Joris -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Mon Nov 4 09:16:25 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Mon, 04 Nov 2013 09:16:25 -0800 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <52776266.3020409@esat.kuleuven.be> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> <52751047.1040901@esat.kuleuven.be> <5275A84B.8060401@icsi.berkeley.edu> <52761ADB.50906@esat.kuleuven.be> <5276E3E4.7010801@icsi.berkeley.edu> <52776266.3020409@esat.kuleuven.be> Message-ID: <5277D669.1070500@icsi.berkeley.edu> On 11/4/2013 1:01 AM, Joris Pelemans wrote: > On 11/04/13 01:01, Andreas Stolcke wrote: >> On 11/3/2013 1:43 AM, Joris Pelemans wrote: >>> I am investigating different techniques to introduce new words to >>> the vocabulary. Say I have a vocabulary of 100,000 words and I want >>> to introduce 1 new word X (for the sake of simplicity). I could do >>> one of 3 options: >>> >>> 1. use the contexts in which X appears in some training data (but >>> sometimes X may not appear (enough)) >>> 2. estimate the probability of X by taking a fraction of the prob >>> mass of a synonym of X (which I described earlier) >>> 3. estimate the probability of X by taking a fraction of the prob >>> mass of the <unk> class (if e.g.
no good synonym is at hand) >>> >>> I could then compare the perplexities of these 3 LMs with a >>> vocabulary of size 100,001 words to see which technique is best for >>> a given word/situation. >>> >> And option 3 is effectively already implemented by the way unseen >> words are mapped to <unk>. If you want to compute perplexity in a >> fair way you would take the LM containing <unk> and for every >> occurrence of X you add log p(X | <unk>) (the share of >> unk-probability mass you want to give to X). That way you don't need >> to add any ngrams to the LM. What this effectively does is simulate >> a class-based Ngram model where <unk> is a class and X is one of its >> members. > Yes, this is exactly what I meant when I asked for a "smart way in the > SRILM toolkit", so I assume this is included. I looked up how to use > class-based models and I think I found what I need to do. Is the > following the correct way to calculate perplexity for these models? > > ngram -lm class_lm.arpa -ppl test.txt -order n -classes expansions.class > > where expansions.class contains lines like this: > > <unk> p(X | <unk>) X > <unk> p(Y | <unk>) Y > <unk> 1-p(X | <unk>)-p(Y | <unk>) not_mapped Yes, except you have to use a new class symbol, like UNKWORD, and replace the "not_mapped" with the standard <unk>. Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From rimlaatar at yahoo.fr Tue Nov 26 02:28:31 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Tue, 26 Nov 2013 10:28:31 +0000 (GMT) Subject: [SRILM User List] commands used to build a ML type N-Class Message-ID: <1385461711.76816.YahooMailNeo@web173202.mail.ir2.yahoo.com> Hello, what are the commands used to build an ML-type N-class model? Thanks a lot. ---- Regards, Rim LAATAR, Computer Science Engineer, graduate of the National Engineering School of Sfax (ENIS), research master's student in Information Systems & New Technologies at FSEGS, NLP track. Website: Rim LAATAR BEN SAID Tel: (+216) 99 64 74 98
---- -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Wed Nov 27 00:37:11 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 27 Nov 2013 00:37:11 -0800 Subject: [SRILM User List] commands used to build a ML type N-Class In-Reply-To: <1385539922.30780.YahooMailNeo@web173203.mail.ir2.yahoo.com> References: <1385461711.76816.YahooMailNeo@web173202.mail.ir2.yahoo.com> <5294DE68.108@icsi.berkeley.edu> <1385536581.45235.YahooMailNeo@web173205.mail.ir2.yahoo.com> <1385539922.30780.YahooMailNeo@web173203.mail.ir2.yahoo.com> Message-ID: <5295AF37.3080807@icsi.berkeley.edu> On 11/27/2013 12:12 AM, Laatar Rim wrote: > Hi, > I am trying to train a class-based LM. I was hoping there is a step-by-step guide for doing this! See the thread at http://www.speech.sri.com/pipermail/srilm-user/2011q3/001078.html on this mailing list (and the link to the tutorial page that is given there). Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Wed Nov 27 10:04:02 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 27 Nov 2013 10:04:02 -0800 Subject: [SRILM User List] commands used to build a ML type N-Class In-Reply-To: <1385541844.41439.YahooMailNeo@web173202.mail.ir2.yahoo.com> References: <1385461711.76816.YahooMailNeo@web173202.mail.ir2.yahoo.com> <5294DE68.108@icsi.berkeley.edu> <1385536581.45235.YahooMailNeo@web173205.mail.ir2.yahoo.com> <1385539922.30780.YahooMailNeo@web173203.mail.ir2.yahoo.com> <5295AF37.3080807@icsi.berkeley.edu> <1385541844.41439.YahooMailNeo@web173202.mail.ir2.yahoo.com> Message-ID: <52963412.1050401@icsi.berkeley.edu> On 11/27/2013 12:44 AM, Laatar Rim wrote: > I have already seen the link, and I also tested these commands: > ngram-class -vocab vocab_file \ > -text input_file \ > -numclasses num \ > -class-counts output.class-counts \ > -classes output.classes > My question is: to build an N-class language model, do we only need numclasses (the number of classes), and what is replace-words-with-classes for? > Thanks in advance for your clarifications. replace-words-with-classes replaces word labels with class labels, so that you can train a class-level ngram model. It is described in the training-scripts(1) man page. Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From rimlaatar at yahoo.fr Mon Dec 2 23:41:37 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Tue, 3 Dec 2013 07:41:37 +0000 (GMT) Subject: [SRILM User List] class based language model s Message-ID: <1386056497.13953.YahooMailNeo@web173204.mail.ir2.yahoo.com> Hello, To build a class-based language model with SRILM, do I use the same commands as for an n-gram LM and just replace the corpus of words with a corpus of classes? ---- Regards, Rim LAATAR ---- -------------- next part -------------- An HTML attachment was scrubbed...
URL: From stolcke at icsi.berkeley.edu Tue Dec 3 00:12:29 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 03 Dec 2013 00:12:29 -0800 Subject: [SRILM User List] class based language model s In-Reply-To: <1386056497.13953.YahooMailNeo@web173204.mail.ir2.yahoo.com> References: <1386056497.13953.YahooMailNeo@web173204.mail.ir2.yahoo.com> Message-ID: <529D926D.20502@icsi.berkeley.edu> On 12/2/2013 11:41 PM, Laatar Rim wrote: > Hello, > To build a class based language model with srilm , I use the same > commands specific to LM type n-gram and just replace the corpus of > words with a corpus of classes ?? Yes. replace-words-with-classes automates the replacement of word strings by class labels. Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Tue Dec 3 14:48:57 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 03 Dec 2013 14:48:57 -0800 Subject: [SRILM User List] class based language model s In-Reply-To: <1386103837.59641.YahooMailNeo@web173205.mail.ir2.yahoo.com> References: <1386056497.13953.YahooMailNeo@web173204.mail.ir2.yahoo.com> <529D926D.20502@icsi.berkeley.edu> <1386063351.55568.YahooMailNeo@web173203.mail.ir2.yahoo.com> <529E205C.40302@icsi.berkeley.edu> <1386103837.59641.YahooMailNeo@web173205.mail.ir2.yahoo.com> Message-ID: <529E5FD9.9010705@icsi.berkeley.edu> On 12/3/2013 12:50 PM, Laatar Rim wrote: > in the class format: > class [p] word1 word2 ... > how can I calculate p? Use replace-words-with-classes with the outfile= option. This is explained in a previous post.
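The expansion probability p asked about above is a class-membership probability. A natural estimate, which the word-to-class replacement step can collect as a byproduct, is the relative frequency of each expansion within its class. This sketch is illustrative only and is not SRILM code:

```python
from collections import Counter

def class_expansion_probs(tagged_tokens):
    """tagged_tokens: list of (class_label, word) occurrences observed
    while replacing words with classes in the training text.
    Returns {class: {word: p}} with p = count(class, word) / count(class),
    so the expansion probabilities of each class sum to one, as the
    classes-format man page requires."""
    class_totals = Counter(c for c, _ in tagged_tokens)
    pair_counts = Counter(tagged_tokens)
    probs = {}
    for (c, w), n in pair_counts.items():
        probs.setdefault(c, {})[w] = n / class_totals[c]
    return probs

p = class_expansion_probs([("CITY", "paris"), ("CITY", "london"),
                           ("CITY", "paris"), ("DAY", "monday")])
```

Here p["CITY"]["paris"] = 2/3 and p["CITY"]["london"] = 1/3; written out in classes-format, those become the `p` column of the CITY lines.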
Andreas > > On Tuesday, 3 December 2013 at 18:18, Andreas Stolcke wrote: > On 12/3/2013 1:35 AM, Laatar Rim wrote: >> hello, >> >> on the internet I found this: >> to build and use a simple class language model: >> Induce classes: >> ngram-class -vocab vocab_file \ >> -text input_file \ >> -numclasses num \ >> -class-counts output.class-counts \ >> -classes output.classes >> in this example we only need the number of classes; how can I use a corpus of classes? > The steps for building a class-based LM are: > > 1. prepare a class definition file in the format described in the > classes-format(5) manual page. This can be done by hand or from other > knowledge sources, or automatically using word clustering algorithms > (see ngram-class(1)). > > 2. condition the training data or counts to replace words with class > labels, > using the "replace-words-with-classes" filter (see training-scripts(1) > man page). > > 3. run ngram-count on the result of step 2. > > > Andreas > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rimlaatar at yahoo.fr Tue Dec 10 22:20:26 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Wed, 11 Dec 2013 06:20:26 +0000 (GMT) Subject: [SRILM User List] class-based model Message-ID: <1386742826.82557.YahooMailNeo@web173205.mail.ir2.yahoo.com> Dear Andreas, First, I'm so sorry for disturbing you. For many days I have been trying to train a class-based model, so I use (1), (2) and (3) to create the class model. (1) ngram-class -vocab '/home/hp/Documents/SRILM/tata.txt' -text '/home/hp/Documents/SRILM/trainingData.txt' -numclasses 37 -class-counts output.class-counts -classes '/home/hp/Documents/SRILM/Replace_word_with_class_SRILM' (2) replace-words-with-classes
classes='/home/hp/Documents/SRILM/Replace_word_with_class_SRILM' '/home/hp/Documents/SRILM/trainingData.txt' > output_text_with_classes_2 (3) ngram-count -tolower -text '/home/hp/Documents/SRILM/trainingData.txt' -lm class_based_model_2 * tata.txt : a list of all the words in the vocabulary * trainingData.txt : my training data * Replace_word_with_class: my class definition file (format like this: class p word1 word2 ...), for example: [Quantite 0.21 ...] [Promotion 0.245 ...] (the Arabic member words of these classes were lost in the archive encoding) The result: class_based_model_2 looks like this (the Arabic word fields are likewise garbled): -2.44486 ... -0.1822249 -4.447026 ... -0.0797594 -4.447026 ... -0.282028 -3.075958 ... -0.3957056 -4.748056 ... -0.1852052 -4.748056 ... -0.07981876 -4.748056 ... -0.1853914 -4.447026 ... -0.1845916 I want to know if these commands and the result are correct, and why my class_based_model_2 contains only words and not classes? Please help me! Thanks a lot. ---- Regards, Rim LAATAR
URL: From stolcke at icsi.berkeley.edu Wed Dec 11 00:53:19 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 11 Dec 2013 00:53:19 -0800 Subject: [SRILM User List] class-based model In-Reply-To: <1386742826.82557.YahooMailNeo@web173205.mail.ir2.yahoo.com> References: <1386742826.82557.YahooMailNeo@web173205.mail.ir2.yahoo.com> Message-ID: <52A827FF.1090004@icsi.berkeley.edu> On 12/10/2013 10:20 PM, Laatar Rim wrote: > Dear Andreas, > First i'm so sorry for disturbing you, > > from many day I want to train a class-based model ,So I use (1), (2) > and (3) to create the class model. > > (1) ngram-class -vocab '/home/hp/Documents/SRILM/tata.txt' -text > '/home/hp/Documents/SRILM/trainingData.txt' -numclasses 37 > -class-counts output.class-counts -classes > '/home/hp/Documents/SRILM/Replace_word_with_class_SRILM' > > > (2) replace-words-with-classes > classes='/home/hp/Documents/SRILM/Replace_word_with_class_SRILM' > '/home/hp/Documents/SRILM/trainingData.txt' > output_text_with_classes_2 > > (3) ngram-count -tolower -text > '/home/hp/Documents/SRILM/trainingData.txt' -lm class_based_model_2 For step 3 you need to use ngram-count -text output_text_with_classes_2 -lm class_based_model_2 To evaluate the LM you would then use ngram -lm class_based_model_2 -classes '/home/hp/Documents/SRILM/Replace_word_with_class_SRILM' -ppl ... (or other options that use the LM) Andreas -------------- next part -------------- An HTML attachment was scrubbed... 
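Andreas's correction above is the key point: step 3 must consume the class-tagged text from step 2, not the original trainingData.txt, otherwise the LM never sees any class labels. The effect of step 2 can be illustrated with a minimal stand-in for the replace-words-with-classes filter (single-word expansions only; the real script also handles multiword expansions and probabilities):

```python
def replace_words_with_classes(text, class_of):
    """Replace each word that belongs to a class with its class label,
    mimicking what the replace-words-with-classes step feeds to
    ngram-count. Words without a class pass through unchanged."""
    return " ".join(class_of.get(w, w) for w in text.split())

# Illustrative class membership: paris and london both expand CITY.
classes = {"paris": "CITY", "london": "CITY"}
line = replace_words_with_classes("i flew to paris from london", classes)
```

Counting n-grams over output like "i flew to CITY from CITY" is what makes class labels appear in the resulting model, which is exactly what was missing from class_based_model_2.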
URL: From stolcke at icsi.berkeley.edu Wed Dec 11 08:51:24 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 11 Dec 2013 08:51:24 -0800 Subject: [SRILM User List] class-based model In-Reply-To: <1386769259.94283.YahooMailNeo@web173202.mail.ir2.yahoo.com> References: <1386742826.82557.YahooMailNeo@web173205.mail.ir2.yahoo.com> <52A827FF.1090004@icsi.berkeley.edu> <1386769259.94283.YahooMailNeo@web173202.mail.ir2.yahoo.com> Message-ID: <52A8980C.1020009@icsi.berkeley.edu> On 12/11/2013 5:40 AM, Laatar Rim wrote: > Hello, > Thank you so much. Another question: how can I interpret this result? The same way you interpret a standard LM. The class-based LM just uses a different way to compute the word probabilities. Check the tutorials that are linked to at http://www.speech.sri.com/projects/srilm/manpages/, for example, the lecture by Jurafsky. The interpretation of perplexity (= average branching factor) is the same no matter what type of LM you are using. Andreas -------------- next part -------------- An HTML attachment was scrubbed...
URL: From stolcke at icsi.berkeley.edu Wed Dec 11 09:48:03 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 11 Dec 2013 09:48:03 -0800 Subject: [SRILM User List] class-based model In-Reply-To: <1386783558.12847.YahooMailNeo@web173204.mail.ir2.yahoo.com> References: <1386742826.82557.YahooMailNeo@web173205.mail.ir2.yahoo.com> <52A827FF.1090004@icsi.berkeley.edu> <1386769259.94283.YahooMailNeo@web173202.mail.ir2.yahoo.com> <52A8980C.1020009@icsi.berkeley.edu> <1386783558.12847.YahooMailNeo@web173204.mail.ir2.yahoo.com> Message-ID: <52A8A553.5060206@icsi.berkeley.edu> On 12/11/2013 9:39 AM, Laatar Rim wrote: > for example, how can I interpret this line: > > \3-grams: > -0.4148394 CLASS-00001 CLASS-00001 CLASS-00009 log_10 P(CLASS-00009 | CLASS-00001 CLASS-00001) = -0.4148394. So a word from CLASS-00009 following two words from class CLASS-00001 has probability 10^ -0.4148394, times the probability of the word in class CLASS-00009 (which you can get from the class membership file). Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Thu Dec 12 14:07:08 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Thu, 12 Dec 2013 14:07:08 -0800 Subject: [SRILM User List] Google 1B Word Language Modeling Benchmark Message-ID: <52AA338C.1040908@icsi.berkeley.edu> Ciprian Chelba asked me to forward the following information about a recently launched initiative in large-scale LM benchmarking. More information at https://code.google.com/p/1-billion-word-language-modeling-benchmark/ . Andreas _________________________________________________________________________________________________________ Here is a brief description of the project. "The purpose of the project is to make available a standard training and test setup for language modeling experiments.
The training/held-out data was produced from a download at statmt.org using a combination of Bash shell and Perl scripts distributed here. This also means that your results on this data set are reproducible by the research community at large. Besides the scripts needed to rebuild the training/held-out data, it also makes available log-probability values for each word in each of ten held-out data sets, for each of the following baseline models: * unpruned Katz (1.1B n-grams), * pruned Katz (~15M n-grams), * unpruned Interpolated Kneser-Ney (1.1B n-grams), * pruned Interpolated Kneser-Ney (~15M n-grams) ArXiv paper: http://arxiv.org/abs/1312.3005 Happy benchmarking!" -- -Ciprian -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Thu Dec 12 14:21:17 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Thu, 12 Dec 2013 14:21:17 -0800 Subject: [SRILM User List] class-based model In-Reply-To: <1386852561.98570.YahooMailNeo@web173205.mail.ir2.yahoo.com> References: <1386742826.82557.YahooMailNeo@web173205.mail.ir2.yahoo.com> <52A827FF.1090004@icsi.berkeley.edu> <1386769259.94283.YahooMailNeo@web173202.mail.ir2.yahoo.com> <52A8980C.1020009@icsi.berkeley.edu> <1386783558.12847.YahooMailNeo@web173204.mail.ir2.yahoo.com> <52A8A553.5060206@icsi.berkeley.edu> <1386784962.96731.YahooMailNeo@web173204.mail.ir2.yahoo.com> <1386852561.98570.YahooMailNeo@web173205.mail.ir2.yahoo.com> Message-ID: <52AA36DD.4060409@icsi.berkeley.edu> On 12/12/2013 4:49 AM, Laatar Rim wrote: > Hello, > this line: -0.7027302 ?????????? CLASS-00021 means: the probability that > word ?????????? in class CLASS-00021 is 10 ^ -0.7027302 > > Please tell me if I'm wrong. I assume this is a line from the LM. It is NOT a class-membership probability. The N-gram model can have a mix of word and class labels. A word label simply represents a class consisting only of the word itself.
Therefore, the above line means that class CLASS-00021 has probability 10 ^ -0.7027302 when the previous word is ?????????? . The class membership probabilities are stored in the file that you specify with ngram -classes. Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From rimlaatar at yahoo.fr Tue Dec 17 02:58:09 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Tue, 17 Dec 2013 10:58:09 +0000 (GMT) Subject: [SRILM User List] class based model Message-ID: <1387277889.14584.YahooMailNeo@web173202.mail.ir2.yahoo.com> Dear Andreas, First, sorry to disturb you with my questions, but I still have some doubts about class-based models and I would be very grateful if you could help me. Here are my questions: 1- Does the class-format file (class p word1 word2 ...) support only simple words, or can it also support words such as good-morning, thank-you ...? 2- Can the class model have a mix of word and class definitions? 3- You say that a word label simply represents a class consisting only of the word itself, but I don't have a class that contains only one word; does that mean my model is wrong? 4- To execute this command: replace-words-with-classes classes='/home/hp/Documents/SRILM/Replace_word_with_class_SRILM' '/home/hp/Documents/SRILM/trainingData.txt' > output_text_with_classes_2 : must trainingData.txt contain punctuation marks, or only phrases? Thank you. ---- Regards, Rim LAATAR
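Putting the two pieces above together, the class n-gram probability from the LM and the membership probability from the classes file, the word probability decomposes as p(w | history) = p(class(w) | class history) × p(w | class(w)). A small numeric sketch using the log10 value quoted earlier in the thread (the membership probability 0.21 is illustrative):

```python
# Class-level trigram from the ARPA LM (log10), as in:
#   -0.4148394  CLASS-00001 CLASS-00001 CLASS-00009
log10_p_class = -0.4148394    # p(CLASS-00009 | CLASS-00001 CLASS-00001)

# Membership probability from the -classes file (assumed value):
p_word_in_class = 0.21        # p(word | CLASS-00009)

# p(word | history) = p(class | class history) * p(word | class)
p_word = (10 ** log10_p_class) * p_word_in_class
```

This is the computation ngram performs internally when scoring text with -classes; a word that is its own one-member "class" simply has p(word | class) = 1.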
URL: From stolcke at icsi.berkeley.edu Tue Dec 17 17:22:45 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 17 Dec 2013 17:22:45 -0800 Subject: [SRILM User List] class based model In-Reply-To: <1387277889.14584.YahooMailNeo@web173202.mail.ir2.yahoo.com> References: <1387277889.14584.YahooMailNeo@web173202.mail.ir2.yahoo.com> Message-ID: <52B0F8E5.9060301@icsi.berkeley.edu> On 12/17/2013 2:58 AM, Laatar Rim wrote: > Dear Andreas, > > First, sorry to disturb you with my questions, but I still > have some doubts about class-based models and I would be very grateful > if you could help me. > > Here are my questions: > > 1- Does the class-format file (class p word1 word2 ...) support > only simple words, or can it also support words such as > good-morning, thank-you ...? The expansion of a class can be one or more words, e.g., CITY 0.123 New York > > 2- Can the class model have a mix of word and class definitions? Yes. The LM could have an ngram "the CITY" (see above). > > 3- You say that a word label simply represents a class consisting only > of the word itself, but I don't have a class that contains only one word; > does that mean my model is wrong? What is meant is that a class ngram model with a mix of words and class labels is equivalent to a class ngram model that has only class-based ngrams, where the word labels are replaced by classes that have only that one word as a member. > > 4- To execute this command: replace-words-with-classes > classes='/home/hp/Documents/SRILM/Replace_word_with_class_SRILM' > '/home/hp/Documents/SRILM/trainingData.txt' > > output_text_with_classes_2 : > > must trainingData.txt contain punctuation marks, or only phrases? It depends on whether your ngram model is supposed to include punctuation or not. The software doesn't care whether you have punctuation; it treats period, comma, etc. as word strings just like any other.
It depends on your application (the program that uses the LM) whether punctuation is appropriate or not. Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Wed Dec 18 08:12:18 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 18 Dec 2013 08:12:18 -0800 Subject: [SRILM User List] class based model In-Reply-To: <1387353916.63557.YahooMailNeo@web173202.mail.ir2.yahoo.com> References: <1387277889.14584.YahooMailNeo@web173202.mail.ir2.yahoo.com> <52B0F8E5.9060301@icsi.berkeley.edu> <1387353916.63557.YahooMailNeo@web173202.mail.ir2.yahoo.com> Message-ID: <52B1C962.3020004@icsi.berkeley.edu> On 12/18/2013 12:05 AM, Laatar Rim wrote: > Hi, > > Thanks a lot, > > The expansion of a class can be one or more words, e.g., > > CITY 0.123 New York > > and what about the words of each class? Must they be simple (new, york, > good, morning ...) or can I use words like (New-York, good-morning > ...)? > > [can the class definition file support words like New-York, > good-morning, or must the words be simple?] Yes, you can have hyphens in word strings. Everything that is not whitespace (space, tab, newline) can be part of words. Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: