From partha.talukdar at gmail.com  Sat Oct 16 21:14:35 2004
From: partha.talukdar at gmail.com (Partha Talukdar)
Date: Sun, 17 Oct 2004 00:14:35 -0400
Subject: Smoothing Error
Message-ID: <8903a7f304101621147622e36d@mail.gmail.com>

Hi

I am new to SRILM. While trying to build a language model, I am
getting the following error:

> one of required modified KneserNey count-of-counts is zero
> error in discount estimator for order 1

I used the following command to build the model:

../../tools/SRILM/bin/i686/ngram-count -order 4 -text temp.txt
-kndiscount -kndiscount1 -kndiscount2 -kndiscount3 -kndiscount4
-interpolate -interpolate1 -interpolate2 -interpolate3 -interpolate4
-lm temp.lm -gt1min 0 -gt2min 0 -gt3min 0 -gt4min 0 -debug 1

Output was:

temp.txt: line 50000: 50000 sentences, 6348678 words, 0 OOVs
0 zeroprobs, logprob= 0 ppl= 1 ppl1= 1
modifying 1-gram counts for Kneser-Ney smoothing
Kneser-Ney smoothing 1-grams
n1 = 3
n2 = 0
n3 = 1
n4 = 0
one of required modified KneserNey count-of-counts is zero
error in discount estimator for order 1

Any thoughts on this?

Thanks in advance
Partha

From svmats at yahoo.com  Sun Oct 17 06:27:27 2004
From: svmats at yahoo.com (Mats Svensson)
Date: Sun, 17 Oct 2004 06:27:27 -0700 (PDT)
Subject: SRILM on Athlon 64
Message-ID: <20041017132727.72940.qmail@web61304.mail.yahoo.com>

Dear all!

I have a question: has anyone ever tried to build SRILM on a machine
with an Athlon 64 processor? Were there, or will there be, any problems?
And do you find 64-bit useful (better than an Intel P-IV) for use with
the SRILM toolkit?

Thanks a lot.
Mats

From stolcke at speech.sri.com  Sun Oct 17 09:06:49 2004
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Sun, 17 Oct 2004 09:06:49 PDT
Subject: Smoothing Error
In-Reply-To: Your message of Sun, 17 Oct 2004 00:14:35 -0400.
	<8903a7f304101621147622e36d@mail.gmail.com>
Message-ID: <200410171606.JAA11427@huge>

In message <8903a7f304101621147622e36d at mail.gmail.com> you wrote:
> Hi
>
> I am new to SRILM. While trying to build a language model, I am
> getting the following error:
>
> > one of required modified KneserNey count-of-counts is zero
> > error in discount estimator for order 1
>
> I used the following command to build the model:
>
> ../../tools/SRILM/bin/i686/ngram-count -order 4 -text temp.txt
> -kndiscount -kndiscount1 -kndiscount2 -kndiscount3 -kndiscount4
> -interpolate -interpolate1 -interpolate2 -interpolate3 -interpolate4
> -lm temp.lm -gt1min 0 -gt2min 0 -gt3min 0 -gt4min 0 -debug 1
>
> Output was:
>
> temp.txt: line 50000: 50000 sentences, 6348678 words, 0 OOVs
> 0 zeroprobs, logprob= 0 ppl= 1 ppl1= 1
> modifying 1-gram counts for Kneser-Ney smoothing
> Kneser-Ney smoothing 1-grams
> n1 = 3
> n2 = 0
> n3 = 1
> n4 = 0
> one of required modified KneserNey count-of-counts is zero
> error in discount estimator for order 1

The count-of-count statistics of your data are not suitable for KN
smoothing. They are also very odd: you have 6348678 words, yet only 3
words occurring once, 0 words occurring twice, etc. I suspect your data
was artificially generated or manipulated in some way.

In any case, please try another smoothing method that is not based on
counts-of-counts, such as Witten-Bell.

--Andreas

From stolcke at speech.sri.com  Thu Oct 21 18:18:16 2004
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 21 Oct 2004 18:18:16 PDT
Subject: witten_bell discounting between SRL and CMU are incompatible ?
In-Reply-To: Your message of Thu, 21 Oct 2004 17:52:51 -0700.
	<417859E3.3050309@u.washington.edu>
Message-ID: <200410220118.SAA21529@huge>

In message <417859E3.3050309 at u.washington.edu> you wrote:
>
> I use the following command to evaluate these LMs using ngram:
>
> ngram -unk -lm edge_word_linear3/edge_word_linear3.bin.arpa -ppl
> /u/yangmei/srilm/src/normtext/edge.test.norm.check
> file /u/yangmei/srilm/src/normtext/edge.test.norm.check: 2576 sentences,
> 35920 words, 1372 OOVs
> 0 zeroprobs, logprob= -125966 ppl= 2472.37 ppl1= 4427.05

Mei Yang,

one small problem is that the CMU LM uses the word <UNK> for the
unknown word, whereas SRILM uses <unk> (lowercase). You can fix that
by running

	ngram -map-unk '<UNK>' ...

This replaces all unknown words with <UNK>.

However, that is not the reason for the high perplexities. The
end-of-sentence symbol </s> in your CMU-generated LM has a unigram
probability that is essentially 0 (log = -98.9923). So every time you
back off to a unigram to predict </s>, the perplexity goes through the
roof. I would consider this a bug in the LM construction.

The problem is made worse by the fact that there are no ngrams
containing <UNK> (other than the unigram), so after an OOV word you
always back off to the unigram, and if the next word happens to be the
end-of-sentence, the whole sentence gets probability 0.

I suspect that the CMU perplexity computation either excludes </s> from
the computation, or excludes the word after an OOV. I think both are
inappropriate, but the handling of OOVs in perplexity computation is
not well standardized.

I suggest you run

	ngram -debug 2 -ppl ....

to output all the conditional word probabilities, do something
equivalent with the CMU tools, and compare the probabilities. They
should match exactly. The difference will probably come from (a) which
words are excluded in the overall perplexity, and (b) what the
denominator in the computation is. For SRILM, the denominator is the
sum of all non-OOV words and end-of-sentence tokens, both of which are
reported in the output.
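To make that denominator concrete, here is a minimal C++ sketch of the computation. It is illustrative only and is not SRILM source. It assumes that `logprob` is a total base-10 log probability. It also assumes that zero-probability words are excluded from the denominator just like OOVs, and that `ppl1` uses the same total but leaves the end-of-sentence tokens out of the count. `PplStats`, `ppl` and `ppl1` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for the summary counts that "ngram -ppl" prints.
struct PplStats {
    long sentences;   // number of sentences (one end-of-sentence token each)
    long words;       // running words, including OOVs
    long oovs;        // out-of-vocabulary words (not scored)
    long zeroprobs;   // words with probability 0 (assumed not scored either)
    double logprob;   // total log10 probability of all scored tokens
};

// ppl: denominator = scored words + end-of-sentence tokens.
double ppl(const PplStats &s) {
    double denom = double(s.words - s.oovs - s.zeroprobs + s.sentences);
    assert(denom > 0);
    return std::pow(10.0, -s.logprob / denom);
}

// ppl1: same total logprob, but end-of-sentence tokens are not counted
// in the denominator (assumption based on the two figures reported).
double ppl1(const PplStats &s) {
    double denom = double(s.words - s.oovs - s.zeroprobs);
    assert(denom > 0);
    return std::pow(10.0, -s.logprob / denom);
}
```

Applied to the quoted test-set counts, `ppl(PplStats{2576, 35920, 1372, 0, -125966.0})` comes out at about 2472.4 and `ppl1` at about 4427.1. These match the reported ppl= 2472.37 and ppl1= 4427.05 up to the rounding of the printed logprob.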
--Andreas

From stolcke at speech.sri.com  Wed Oct 27 02:19:33 2004
From: stolcke at speech.sri.com (Stolcke)
Date: Wed, 27 Oct 2004 10:19:33 +0100
Subject: Hello
Message-ID:

From stolcke at speech.sri.com  Mon Nov 15 13:39:13 2004
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 15 Nov 2004 13:39:13 PST
Subject: SRILM and gcc 3.4.3
Message-ID: <200411152139.NAA04076@tonga>

FYI, the patch below is necessary to get SRILM to compile with the
latest version of the gcc compiler. It might work for older gcc 3.4.x
releases, too, but I haven't tried it.

--Andreas

*** /tmp/T0p3XbS_	Mon Nov 15 13:35:18 2004
--- dstruct/src/LHash.cc	Sun Nov 14 20:45:54 2004
***************
*** 9,15 ****
  #ifndef lint
  static char LHash_Copyright[] = "Copyright (c) 1995-1998 SRI International. All Rights Reserved.";
! static char LHash_RcsId[] = "@(#)$Header: /home/srilm/devel/dstruct/src/RCS/LHash.cc,v 1.44 2003/07/01 06:03:35 stolcke Exp $";
  #endif
  #include
--- 9,15 ----
  #ifndef lint
  static char LHash_Copyright[] = "Copyright (c) 1995-1998 SRI International. All Rights Reserved.";
! static char LHash_RcsId[] = "@(#)$Header: /home/srilm/devel/dstruct/src/RCS/LHash.cc,v 1.45 2004/11/15 04:45:47 stolcke Exp $";
  #endif
  #include
***************
*** 22,28 ****
  #undef INSTANTIATE_LHASH
  #define INSTANTIATE_LHASH(KeyT, DataT) \
!     DataT *LHash< KeyT, DataT >::removedData = 0; \
      template class LHash< KeyT, DataT >; \
      template class LHashIter< KeyT, DataT >
--- 22,28 ----
  #undef INSTANTIATE_LHASH
  #define INSTANTIATE_LHASH(KeyT, DataT) \
!     template <> DataT *LHash< KeyT, DataT >::removedData = 0; \
      template class LHash< KeyT, DataT >; \
      template class LHashIter< KeyT, DataT >
*** /tmp/T00K4fDt	Mon Nov 15 13:35:47 2004
--- dstruct/src/SArray.cc	Sun Nov 14 20:45:55 2004
***************
*** 9,15 ****
  #ifndef lint
  static char SArray_Copyright[] = "Copyright (c) 1995-1998 SRI International. All Rights Reserved.";
! static char SArray_RcsId[] = "@(#)$Header: /home/srilm/devel/dstruct/src/RCS/SArray.cc,v 1.35 2003/07/01 06:03:35 stolcke Exp $";
  #endif
  #include
--- 9,15 ----
  #ifndef lint
  static char SArray_Copyright[] = "Copyright (c) 1995-1998 SRI International. All Rights Reserved.";
! static char SArray_RcsId[] = "@(#)$Header: /home/srilm/devel/dstruct/src/RCS/SArray.cc,v 1.36 2004/11/15 04:45:47 stolcke Exp $";
  #endif
  #include
***************
*** 22,28 ****
  #undef INSTANTIATE_SARRAY
  #define INSTANTIATE_SARRAY(KeyT, DataT) \
!     DataT *SArray< KeyT, DataT >::removedData = 0; \
      template class SArray< KeyT, DataT >; \
      template class SArrayIter< KeyT, DataT >
--- 22,28 ----
  #undef INSTANTIATE_SARRAY
  #define INSTANTIATE_SARRAY(KeyT, DataT) \
!     template <> DataT *SArray< KeyT, DataT >::removedData = 0; \
      template class SArray< KeyT, DataT >; \
      template class SArrayIter< KeyT, DataT >
*** /tmp/T0HUxS3_	Mon Nov 15 13:36:12 2004
--- lm/src/ngram-count.cc	Sun Nov 14 20:46:20 2004
***************
*** 6,12 ****
  #ifndef lint
  static char Copyright[] = "Copyright (c) 1995-2002 SRI International. All Rights Reserved.";
! static char RcsId[] = "@(#)$Header: /home/srilm/devel/lm/src/RCS/ngram-count.cc,v 1.48 2003/10/10 01:23:39 stolcke Exp $";
  #endif
  #include
--- 6,12 ----
  #ifndef lint
  static char Copyright[] = "Copyright (c) 1995-2002 SRI International. All Rights Reserved.";
! static char RcsId[] = "@(#)$Header: /home/srilm/devel/lm/src/RCS/ngram-count.cc,v 1.49 2004/11/15 04:46:15 stolcke Exp $";
  #endif
  #include
***************
*** 390,396 ****
       * This stores the discounting parameters for the various orders
       * Note this is only needed when estimating an LM
       */
!     Discount **discounts = new (Discount *)[order];
      assert(discounts != 0);
      for (i = 0; i < order; i ++) {
--- 390,396 ----
       * This stores the discounting parameters for the various orders
       * Note this is only needed when estimating an LM
       */
!     Discount **discounts = new Discount *[order];
      assert(discounts != 0);
      for (i = 0; i < order; i ++) {
*** /tmp/T00yHSdk	Mon Nov 15 13:36:32 2004
--- lm/src/nbest-optimize.cc	Sun Nov 14 20:46:20 2004
***************
*** 5,11 ****
  #ifndef lint
  static char Copyright[] = "Copyright (c) 2000-2004 SRI International. All Rights Reserved.";
! static char RcsId[] = "@(#)$Header: /home/srilm/devel/lm/src/RCS/nbest-optimize.cc,v 1.36 2004/09/09 10:44:31 stolcke Exp $";
  #endif
  #include
--- 5,11 ----
  #ifndef lint
  static char Copyright[] = "Copyright (c) 2000-2004 SRI International. All Rights Reserved.";
! static char RcsId[] = "@(#)$Header: /home/srilm/devel/lm/src/RCS/nbest-optimize.cc,v 1.37 2004/11/15 04:46:15 stolcke Exp $";
  #endif
  #include
***************
*** 1646,1652 ****
      /*
       * Allocate score matrix for this nbest list
       */
!     NBestScore **scores = new (NBestScore *)[numScores];
      assert(scores != 0);
      for (unsigned i = 0; i < numScores; i ++) {
--- 1646,1652 ----
      /*
       * Allocate score matrix for this nbest list
       */
!     NBestScore **scores = new NBestScore *[numScores];
      assert(scores != 0);
      for (unsigned i = 0; i < numScores; i ++) {

From Μπάσσιου  Fri Nov 19 01:31:48 2004
From: Μπάσσιου (Μπάσσιου)
Date: Fri, 19 Nov 2004 11:31:48 +0200
Subject: Deleted interpolation
Message-ID: <20041119113148.9476ab65@cronus.aiia.csd.auth.gr>

Is deleted interpolation according to Jelinek's method supported in
SRILM? Can anyone give me some details?

Thank you
Nikoletta

From stolcke at speech.sri.com  Fri Nov 19 08:41:28 2004
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Fri, 19 Nov 2004 08:41:28 PST
Subject: Deleted interpolation
In-Reply-To: Your message of Fri, 19 Nov 2004 11:31:48 +0200.
	<20041119113148.9476ab65@cronus.aiia.csd.auth.gr>
Message-ID: <200411191641.IAA20360@huge>

In message <20041119113148.9476ab65 at cronus.aiia.csd.auth.gr> you wrote:
> Is deleted interpolation according to Jelinek's method supported in SRILM?
>
> Can anyone give me some details?

Deleted interpolation is not implemented in SRILM, mainly because
hardly anybody uses it these days. Other smoothing methods that don't
require held-out data (like Kneser-Ney) have been shown to be almost
always superior (cf. the Chen & Goodman paper on smoothing methods).

--Andreas

From wellingt at cs.nyu.edu  Fri Dec 31 01:20:34 2004
From: wellingt at cs.nyu.edu (Ben Wellington)
Date: Fri, 31 Dec 2004 04:20:34 -0500
Subject: linking problems
Message-ID: <41D519E2.9040508@cs.nyu.edu>

Greetings.

I am new to the toolkit and I am having a big linking problem. I wrote
a program which, among other things, has the following lines:

    Vocab *vocab = new Vocab;
    assert(vocab != 0);

    ngramLMs[d] = new Ngram(*vocab, order);

I then read in a model file and query it.
However, while linking my program, I get:

/home/wellingt/dev/MOTE/obj/Linux/NGramModel.o(.text+0x347): In function `NGramModel::NGramModel(std::vector >, char const*)':
/usr/include/c++/3.2.3/bits/stl_vector.h:1006: undefined reference to `Vocab::Vocab(unsigned int, unsigned int)'
/home/wellingt/dev/MOTE/obj/Linux/NGramModel.o(.text+0x41c): In function `NGramModel::NGramModel(std::vector >, char const*)':
/home/wellingt/dev/MOTE/src/NGramModel.C:65: undefined reference to `Ngram::Ngram(Vocab&, unsigned int)'
/home/wellingt/dev/MOTE/obj/Linux/NGramModel.o(.text+0x9fd): In function `NGramModel::NGramModel(std::vector >, char const*)':
/home/wellingt/dev/MOTE/src/NGramModel.C:62: undefined reference to `Vocab::Vocab(unsigned int, unsigned int)'
/home/wellingt/dev/MOTE/obj/Linux/NGramModel.o(.text+0xad2):/home/wellingt/dev/MOTE/src/NGramModel.C:65: undefined reference to `Ngram::Ngram(Vocab&, unsigned int)'
collect2: ld returned 1 exit status

I have included the library and the headers.
I made sure everything that was included in the ngram.cc compilation is
also in mine.
But while I have no issues compiling ngram.cc, I always get the above
errors on my file.

The following is in my linking and object file compilation:

-lflm -ldstruct -lmisc -lm -ldl -ltcl
-L/home/wellingt/dev/srilm/lib/i686
-I/home/wellingt/dev/srilm/include
/home/wellingt/dev/srilm/lm/obj/i686/liboolm.a

What am I missing? Any help on this would be appreciated.

Thank you,
Benjamin Wellington
New York University

From stolcke at speech.sri.com  Fri Dec 31 12:46:09 2004
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Fri, 31 Dec 2004 12:46:09 PST
Subject: linking problems
In-Reply-To: Your message of Fri, 31 Dec 2004 04:20:34 -0500.
	<41D519E2.9040508@cs.nyu.edu>
Message-ID: <200412312046.MAA06884@tonga>

Were you able to build the standard "ngram" program in SRILM without
problems? If so, the problems you are having now are most likely due
to a discrepancy in the compiler or linker flags used.
Record the command-line flags used when building and linking SRILM.
Then use the exact same options when compiling and linking your own
program.

--Andreas

In message <41D519E2.9040508 at cs.nyu.edu> you wrote:
> Greetings.
>
> I am new to the toolkit and I am having a big linking problem. I wrote
> a program which, among other things, has the following lines:
>
>     Vocab *vocab = new Vocab;
>     assert(vocab != 0);
>
>     ngramLMs[d] = new Ngram(*vocab, order);
>
> I then read in a model file and query it.
>
> However, while linking my program, I get:
>
>
> /home/wellingt/dev/MOTE/obj/Linux/NGramModel.o(.text+0x347): In function
> `NGramModel::NGramModel(std::vector std::allocator >, char const*)':
> /usr/include/c++/3.2.3/bits/stl_vector.h:1006: undefined reference to
> `Vocab::Vocab(unsigned int, unsigned int)'
> /home/wellingt/dev/MOTE/obj/Linux/NGramModel.o(.text+0x41c): In function
> `NGramModel::NGramModel(std::vector std::allocator >, char const*)':
> /home/wellingt/dev/MOTE/src/NGramModel.C:65: undefined reference to
> `Ngram::Ngram(Vocab&, unsigned int)'
> /home/wellingt/dev/MOTE/obj/Linux/NGramModel.o(.text+0x9fd): In function
> `NGramModel::NGramModel(std::vector std::allocator >, char const*)':
> /home/wellingt/dev/MOTE/src/NGramModel.C:62: undefined reference to
> `Vocab::Vocab(unsigned int, unsigned int)'
> /home/wellingt/dev/MOTE/obj/Linux/NGramModel.o(.text+0xad2):/home/wellingt/dev/MOTE/src/NGramModel.C:65:
> undefined reference to `Ngram::Ngram(Vocab&, unsigned int)'
> collect2: ld returned 1 exit status
>
>
> I have included the library and the headers.
> I made sure everything that was included in the ngram.cc compilation is
> also in mine.
> But while I have no issues compiling ngram.cc, I always get the above
> errors on my file.
>
> The following is in my linking and object file compilation:
>
> -lflm -ldstruct -lmisc -lm -ldl -ltcl
> -L/home/wellingt/dev/srilm/lib/i686
> -I/home/wellingt/dev/srilm/include
> /home/wellingt/dev/srilm/lm/obj/i686/liboolm.a
>
> What am I missing? Any help on this would be appreciated.
>
> Thank you,
> Benjamin Wellington
> New York University