From Antoine.Ghaoui at jinny.ie Mon Apr 9 03:14:02 2007
From: Antoine.Ghaoui at jinny.ie (Antoine Ghaoui)
Date: Mon, 9 Apr 2007 13:14:02 +0300
Subject: FLM
Message-ID: <434213AD-8977-4AC6-9B07-C6B923CEC4DA@jinny.ie>

Hello,

I'm using FLM to test some models.

I'm using the same data and the same vocabulary in both tools, ngram-count and fngram-count. I'm not able to generate the same trigram model. The numbers of bigrams and trigrams in the generated LM files are different.

Using ngram-count, I'm getting:

\data\
ngram 1=315
ngram 2=23800
ngram 3=120408

Using fngram-count, I'm getting:

\data\
ngram 0x0=315
ngram 0x1=23523
ngram 0x2=0
ngram 0x3=86366

Note that ngram-count is used with the default parameters, and the factor file for fngram-count is:

##rule trigram
1
U : 2 U(-1) U(-2) ntextfile.flm.cnt ntextfile.flm.lm 3
U1U2 U2 wbdiscount gtmin 3 interpolate
U1 U1 wbdiscount gtmin 1 interpolate
0 0

What parameters should I use in the factor file in order to get the same LM output?

Thanks,
Antoine

From stolcke at speech.sri.com Mon Apr 9 22:47:45 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 09 Apr 2007 22:47:45 -0700
Subject: FLM
In-Reply-To: <434213AD-8977-4AC6-9B07-C6B923CEC4DA@jinny.ie>
References: <434213AD-8977-4AC6-9B07-C6B923CEC4DA@jinny.ie>
Message-ID: <461B2501.8010300@speech.sri.com>

Antoine Ghaoui wrote:
> Hello,
>
> I'm using FLM to test some models.
>
> I'm using the same data and the same vocabulary in both tools,
> ngram-count and fngram-count.
> I'm not able to generate the same trigram model.
> The numbers of bigrams and trigrams in the generated LM files are different.
>
> Using ngram-count, I'm getting:
> \data\
> ngram 1=315
> ngram 2=23800
> ngram 3=120408
>
> Using fngram-count, I'm getting:
> \data\
> ngram 0x0=315
> ngram 0x1=23523
> ngram 0x2=0
> ngram 0x3=86366
>
> Note that ngram-count is used with the default parameters, and the
> factor file for fngram-count is:
>
> ##rule trigram
> 1
> U : 2 U(-1) U(-2) ntextfile.flm.cnt ntextfile.flm.lm 3
> U1U2 U2 wbdiscount gtmin 3 interpolate
> U1 U1 wbdiscount gtmin 1 interpolate
> 0 0
>
> What parameters should I use in the factor file in order to get the
> same LM output?

For one thing, the default gtmin values in ngram-count are

    unigrams  1
    bigrams   1
    trigrams  2

Andreas

From stolcke at speech.sri.com Thu Apr 12 22:46:09 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 12 Apr 2007 22:46:09 -0700
Subject: SRILM.
In-Reply-To: <481682.37738.qm@web7606.mail.in.yahoo.com>
References: <481682.37738.qm@web7606.mail.in.yahoo.com>
Message-ID: <461F1921.3030409@speech.sri.com>

milu philip wrote:
> Sir,
>
> I am a student of Amrita Vishwa Vidyapeetham doing my M.Tech in the
> field of computational engineering and networking. I am doing a
> project named Language Identification Using Statistical
> Approaches. The techniques used are PRLM and PPRLM. The software being
> used is a phone recogniser and SRILM. SRILM is used for language
> modelling, and the phone recogniser is used to create phones from a
> speech sample.
>
> After the phones are obtained, they are given as input to SRILM for
> language modelling. Once the training language models are obtained, the
> phones of the testing language are given. My question is: when the
> command
>
> ngram -ppl TEST.text -lm TRAINING.lm
>
> of SRILM is given, we get perplexity along with a logprob value.
> Is it the phone sequence of the testing language that is given as
> "TEST.text" in the above command, or can we create a language
> model for the test sequence and then give it as

-ppl expects a text file containing test data, one sentence per line.

>
> ngram -ppl TEST.lm -lm TRAINING.lm
>
> In both cases the logprob values are different.

The second command would produce garbage, as the LM file would be interpreted as test data.

Andreas

>
> Is there any other command in SRILM that can be used to obtain the
> probability that a trained language model produces a particular test
> sequence?
> Please help me out by clarifying this question.
>
> Thanking you,
>
> Milu.

From stolcke at speech.sri.com Thu Apr 12 22:44:32 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 12 Apr 2007 22:44:32 -0700
Subject: htk-words-on-nodes option in lattice-tool
Message-ID: <461F18C0.8040406@speech.sri.com>

jpinto at idiap.ch wrote:
> Hello,
>
> I have a phoneme lattice (obtained from the NOWAY decoder) with phoneme
> tokens on the links (edges). I wish to convert this to HTK format with
> phoneme info on the nodes, and I do the following:
>
> lattice-tool -in-lattice input.lat -read-htk -write-htk -out-lattice
> output.lattice -htk-words-on-nodes
>
> I observe that the output lattice has more nodes and links
> (NODES=448 LINKS=766) than the input lattice (N=65 L=383).
>
> When I don't give the option -htk-words-on-nodes, the numbers of nodes
> and links remain the same.
>
> I don't understand why the number of nodes and links should increase. Am I
> missing something? Any help in this regard would be very helpful.
>

That's because when you move attributes from links to nodes you might have to duplicate nodes to create an equivalent lattice.
In fact, the way SRILM reads HTK lattices is by converting each link to a node, thereby enabling the -htk-words-on-nodes mapping. Unfortunately, the code is not smart enough to avoid the duplication even when it is not really necessary given how the links are originally labeled.

Note: lattice-tool is not meant to be a general HTK lattice format manipulation tool. You would think HTK has better tools for that.

Andreas

From sara_abd_elhamed at yahoo.com Fri Apr 13 09:26:22 2007
From: sara_abd_elhamed at yahoo.com (Sara Abd-ElHamed)
Date: Fri, 13 Apr 2007 09:26:22 -0700 (PDT)
Subject: Question
Message-ID: <197099.73124.qm@web90408.mail.mud.yahoo.com>

I want to know how to run the Factored Language Model (FLM) that is included in SRILM.

Thanks in advance for your help.

From stolcke at speech.sri.com Thu Apr 19 16:09:37 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 19 Apr 2007 13:09:37 -1000
Subject: Problems about srilm
In-Reply-To: <20070419081442.M92426@cyut.edu.tw>
References: <20070419081442.M92426@cyut.edu.tw>
Message-ID: <4627F6B1.70507@speech.sri.com>

??? wrote:
> Hello!
> I am a student from Taiwan.
> I have some questions about difficulties I encountered in using SRILM. The
> problem is described in the attachment. When I built Google n-gram models, I
> also encountered the same problem. Would you please tell me what mistake
> I made? Thank you!
>

It is impossible to read the entire Google 5-gram corpus into memory, which is what you are trying to do. You have to use the count-based LM, and estimate deleted interpolation weights from a small amount of data, so that only a small portion of the ngrams need to be kept in memory.
I'm sorry there is no good documentation of this process at this point (you can piece it together by reading the manual pages for ngram-count and ngram, and by looking at the example in $SRILM/test/tests/ngram-count-lm-limit-vocab/run-test). We will make complete instructions for Google ngram usage available in the future.

Andreas

From stolcke at speech.sri.com Mon Apr 23 22:40:56 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 23 Apr 2007 22:40:56 -0700
Subject: ngram -nbest-files
In-Reply-To: <564887.5274.qm@web92012.mail.cnb.yahoo.com>
References: <564887.5274.qm@web92012.mail.cnb.yahoo.com>
Message-ID: <462D9868.8050106@speech.sri.com>

?? ? wrote:
> Hi, I have some problems rescoring multiple n-best lists. The ngram -ppl
> option can yield the language model probability of each sentence, but
> can't deal with multiple n-best lists in one run. I then tried to use the
> ngram -nbest-files option to rescore multiple n-best lists. But the
> language model scores obtained were quite different from those from the
> -ppl option above. Aren't they both log probabilities (base 10) of a
> sentence? Any help will be greatly appreciated.
> Regards, Wenxiao

You need to make sure the nbest lists are in the proper format, which is different from the format accepted by -ppl. The nbest formats are described in the nbest-format(5) man page.

N-best rescoring should give the same log-10 probabilities as -ppl. If not, please send a minimal example to reproduce the problem.
Andreas

From stolcke at speech.sri.com Mon Apr 23 22:44:45 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 23 Apr 2007 22:44:45 -0700
Subject: Problems about srilm
In-Reply-To: <46287DDA.9040707@cyut.edu.tw>
References: <20070419081442.M92426@cyut.edu.tw> <4627F6B1.70507@speech.sri.com> <46287DDA.9040707@cyut.edu.tw>
Message-ID: <462D994D.4040603@speech.sri.com>

?? wrote:
> Thank you very much for your answer.
> I have another question: if I only want to train a *Google 3-gram
> language model*, what commands should I use?
> I have referred to the pages and tried the commands, but it still
> did not work out.
> Is the reason the same, that memory is not big enough?
>

Even just the Google 3-grams will be way too big to all fit into memory.

> Could you give me an *example* of building a Google 3-gram LM file,
> please?
>

Again, this will require using the -count-lm option with some tricks that are not documented as yet. Please be patient (or read all the manual pages carefully to figure it out yourself).

>
> I figured out maybe there are two methods to resolve the problem:
> 1. Build the Google 3-gram LM file by reading the Google corpus in batches,
> and then build the complete Google 3-gram LM file.
> But I need to know: is there any command to build the Google
> 3-gram LM file by *reading the Google corpus in batches*?
>

This won't work because the smoothing methods for backoff LMs require access to the entire ngram set to compute their discounting estimates.

> 2. Train small language models individually from the Google files and
> then combine the pieces of the Google 3-gram LM files.
> But I need to know: is there any command to *combine pieces of
> Google 3-gram LM files*?
>

Sorry, that won't work, for the same reason as above.
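To make the "entire ngram set" argument concrete: Good-Turing discounting, the default in ngram-count, derives each discount from counts-of-counts n_r, the number of distinct ngrams seen exactly r times in all of the training data. The toy Python sketch below (not SRILM code; it uses the plain ratio (r+1) n_{r+1} / (r n_r) without the cutoffs and renormalization of Katz backoff, and the shard data is made up) shows that the discount for singletons comes out differently on each shard than on the combined counts.

```python
from collections import Counter
from typing import Dict

def counts_of_counts(ngram_counts: Dict[str, int]) -> Counter:
    """n_r = number of distinct ngrams seen exactly r times."""
    return Counter(ngram_counts.values())

def good_turing_discount(r: int, n: Counter) -> float:
    """Plain Good-Turing ratio r*/r = (r + 1) * n_{r+1} / (r * n_r).

    Toy version: no count cutoffs or renormalization; returns 1.0
    (no discount) when n_r or n_{r+1} is zero.
    """
    if n[r] == 0 or n[r + 1] == 0:
        return 1.0
    return (r + 1) * n[r + 1] / (r * n[r])

# Hypothetical bigram counts from two halves of a corpus.
SHARD1 = Counter({"a b": 1, "b c": 1, "e f": 1, "g h": 2})
SHARD2 = Counter({"c d": 1, "d e": 1, "e f": 1, "g h": 1})

if __name__ == "__main__":
    full = SHARD1 + SHARD2  # counts over all of the data
    for name, counts in [("shard1", SHARD1), ("shard2", SHARD2), ("full", full)]:
        d1 = good_turing_discount(1, counts_of_counts(counts))
        print(f"{name}: discount for singletons = {d1:.3f}")
```

Each shard yields its own discount (shard2 cannot even estimate one, because it has no ngram seen twice), so discounting pieces independently and gluing the results together does not reproduce the model estimated from all the counts.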
Andreas

From alumae at gmail.com Tue Apr 24 03:01:12 2007
From: alumae at gmail.com (Tanel)
Date: Tue, 24 Apr 2007 13:01:12 +0300
Subject: lattice-tool and noise probability
Message-ID: <8abdac980704240301yf711706w44052bc16f961643@mail.gmail.com>

Hello,

When rescoring lattices using lattice-tool, is there a possibility (or a workaround) to assign an LM probability to noise words? The noise words should still be skipped when calculating n-gram probabilities for other words.

I understand that currently, noise words get an LM log probability of zero (i.e., probability 1), which may make them too probable to be inserted in place of other candidates.

Regards,
Tanel

From stolcke at speech.sri.com Tue Apr 24 07:37:00 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 24 Apr 2007 07:37:00 -0700
Subject: lattice-tool and noise probability
In-Reply-To: <8abdac980704240301yf711706w44052bc16f961643@mail.gmail.com>
References: <8abdac980704240301yf711706w44052bc16f961643@mail.gmail.com>
Message-ID: <462E160C.7080804@speech.sri.com>

Tanel wrote:
> Hello,
>
> When rescoring lattices using lattice-tool, is there a possibility (or
> a workaround) to assign an LM probability to noise words? The noise
> words should still be skipped when calculating n-gram probabilities
> for other words.
>
> I understand that currently, noise words get an LM log probability of
> zero (i.e., probability 1), which may make them too probable to be
> inserted in place of other candidates.

No, there is no such provision in lattice-tool. It should be really easy to write a Perl script that reads a lattice file and inserts a constant LM score of your choosing for noise words.

Andreas

>
> Regards,
> Tanel

From stolcke at speech.sri.com Tue Apr 24 10:29:37 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 24 Apr 2007 10:29:37 PDT
Subject: Class-based LM using the SRILM toolkit?
In-Reply-To: Your message of Wed, 18 Apr 2007 23:31:30 +0530.
Message-ID: <200704241729.l3OHTbJ29944@huge>

In message you wrote:
> Dear Dr.
> Stolcke,
>
> Thank you for your attention.
>
> Is there no way to construct a class-based LM by pre-defining the
> classes to be used (as opposed to inducing them)? The class-format man
> page does mention how classes may be defined by hand, but this format
> requires the specification of the class expansion probabilities as
> well. Can these probabilities be calculated by a program in the
> toolkit? Correct me if I'm wrong, but these probabilities are given by
> (for a certain word wi and class ci): (number of times wi occurs in
> class ci) / (number of times words in class ci occur).

You (1) define your classes by hand, using dummy probabilities, and (2) run the replace-words-with-classes command with the options outfile=FILE normalize=1 on some training data. This is documented in the training-scripts(5) man page.

> Also, is the file that is generated by the ngram-class -class-counts
> option in the same format as class-format? Can a file in the
> class-format format be used directly by the ngram-count program to
> learn a class-based LM?

The -class-counts output is in the right format to be used as a count input file for ngram-count to estimate a bigram LM for the class labels. However, this will only work for bigram LMs, since ngram-class doesn't use higher-order statistics.

The recommended procedure is to again use the replace-words-with-classes command to insert class labels in your LM training data, and then use ngram-count on the transformed data to estimate the class ngram probabilities.

Andreas

From stolcke at speech.sri.com Wed Apr 25 16:27:24 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Wed, 25 Apr 2007 16:27:24 -0700
Subject: FLM in SRILM
In-Reply-To: <859958.67801.qm@web90404.mail.mud.yahoo.com>
References: <859958.67801.qm@web90404.mail.mud.yahoo.com>
Message-ID: <462FE3DC.5000306@speech.sri.com>

Sara Abd-ElHamed wrote:
> How can I run the FLM that is in SRILM?
>

Sara, your question is much too general. You should first read the available documents in $SRILM/flm/doc/. Then look at the example in $SRILM/tests/fngram-count. If you then have specific questions, you can direct them to the srilm-user mailing list (the SRILM web page tells you how to join).

Andreas

From dianaduraiz at gmail.com Thu Apr 26 10:39:35 2007
From: dianaduraiz at gmail.com (Diana Durán)
Date: Thu, 26 Apr 2007 19:39:35 +0200
Subject: Perplexity

Hello,

I would like to know what the difference is between ppl and ppl1 in the output when executing ngram.

Thank you.

From ioparin at yahoo.co.uk Thu Apr 26 11:30:23 2007
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Thu, 26 Apr 2007 19:30:23 +0100 (BST)
Subject: Perplexity
Message-ID: <627682.9664.qm@web25402.mail.ukl.yahoo.com>

Diana,

Here's an excerpt from the SRILM man page for ngram:

"Perplexity is given with two different normalizations: counting all input tokens (``ppl'') and excluding end-of-sentence tags (``ppl1'')."

--- Diana Durán wrote:
> Hello,
>
> I would like to know what the difference is between
> ppl and ppl1 in the
> output when executing ngram.
>
> Thank you.
>

best regards,
Ilya

From marco.turchi at gmail.com Fri May 4 08:12:14 2007
From: marco.turchi at gmail.com (marco turchi)
Date: Fri, 4 May 2007 16:12:14 +0100
Subject: question
Message-ID: <79a042480705040812h4912178dqeb24acf21bd26f84@mail.gmail.com>

Dear experts,
I have a strange question for you.
If I have two language models, LM1 and LM2, does SRILM have any scripts to merge them into a single language model LM3?

Thanks a lot,
Marco

From ioparin at yahoo.co.uk Fri May 4 09:11:09 2007
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Fri, 4 May 2007 17:11:09 +0100 (BST)
Subject: question
In-Reply-To: <79a042480705040812h4912178dqeb24acf21bd26f84@mail.gmail.com>
Message-ID: <670739.73588.qm@web25410.mail.ukl.yahoo.com>

Dear Marco,

In ngram, use the -mix-lm option to interpolate the models, and then write the resulting LM back out with -write-lm.

http://www.speech.sri.com/projects/srilm/manpages/ngram.html

--- marco turchi wrote:
> Dear experts,
> I have a strange question for you.
> If I have two language models, LM1 and LM2, does
> SRILM have any
> scripts to merge them into a single language model LM3?
>
> Thanks a lot
> Marco
>

best regards,
Ilya

From bplank at science.uva.nl Tue May 15 07:04:01 2007
From: bplank at science.uva.nl (B. Plank)
Date: Tue, 15 May 2007 16:04:01 +0200 (CEST)
Subject: write-vocab
Message-ID: <3169.145.116.12.178.1179237841.squirrel@webmail.science.uva.nl>

Dear SRILM team,

is there a parameter to get the n most frequent words out of an LM (i.e., like restricting the write-vocab of "ngram -order 1" to output only the n most frequent words)? I am sure there is; I just don't see it right now.

Thank you for any help,
Barbara

From stolcke at speech.sri.com Tue May 15 09:28:24 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 15 May 2007 09:28:24 -0700
Subject: write-vocab
In-Reply-To: <3169.145.116.12.178.1179237841.squirrel@webmail.science.uva.nl>
References: <3169.145.116.12.178.1179237841.squirrel@webmail.science.uva.nl>
Message-ID: <4649DFA8.8080800@speech.sri.com>

B. Plank wrote:
> Dear SRILM team,
>
> is there a parameter to get the n most frequent words out of an LM (i.e.,
> like restricting the write-vocab of "ngram -order 1" to output only the
> n most frequent words)?
> I am sure there is; I just don't see it right now.
>
> Thank you for any help,
> Barbara
>

Actually, there is no such tool. The frequency of words is not generally available in the LM, only their unigram probabilities. Since the unigram probabilities are usually a monotonic function of the unigram frequencies, you could write a small script that extracts the words from the unigram section of the LM file and sorts them by their probabilities.

Andreas

From stolcke at speech.sri.com Mon May 21 09:00:35 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 21 May 2007 09:00:35 -0700
Subject: Inquiry about expanding interpolated LM (class model + word model) to word model
References: <000701c79a43$d8150110$7c216b82@speech.sri.com> <200705200318.l4K3ITY04648@huge>
Message-ID: <4651C223.4070103@speech.sri.com>

Xiaodan Zhuang wrote:
> Dear Andreas,
>
> Thanks for your input.
>
> It looks to me that the output class ngram (either just
> -expand-classes or further interpolated with some word ngram) is a
> normal class ngram followed by the probability of a class emitting a
> particular word. Is that just the class ngram followed by the class
> definition file?

Correct.

>
> If I need to convert the class ngram or the interpolated LM into a
> pure word ngram model for use elsewhere, should I replace, for example,
> the lines in the following 2-grams as indicated:
>
> 2-grams:
> pp1(log) class-1 class-2
>     >>> change to all possible word pairs, such as
>         "pp1+log(p11)+log(p29) word-1 word-9" and
>         "pp1+log(p12)+log(p29) word-2 word-9"
>
> pp2(log) class-1 word-5
>     >>> change to "pp2+log(p11) word-1 word-5" and
>         "pp2+log(p12) word-2 word-5"
>
> [[class definition:
> class-1 p11 word-1
> class-1 p12 word-2
> class-2 p29 word-9
> ]]

Yes, except you need to take care to sum probabilities of word ngrams that are generated through multiple distinct class expansions.
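The summation mentioned above can be sketched for a single history as follows. This is a minimal Python illustration under simplifying assumptions, not what ngram -expand-classes actually does internally: the history is fixed, so only one row of a class bigram is expanded (next-token probabilities, where a token is either a class label or a plain word); membership probabilities P(word | class) come from the class definitions; a word may belong to several classes; multi-word class members and higher orders are ignored. All names and numbers are hypothetical.

```python
from typing import Dict

def expand_row(row: Dict[str, float],
               members: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Turn P(token | history) into P(word | history).

    row:     next-token probabilities for one fixed history; a token is
             either a class label (a key of `members`) or a plain word.
    members: class label -> {word: P(word | class)}.
    A word reachable through several tokens gets the SUM over those paths.
    """
    word_probs: Dict[str, float] = {}
    for token, p_token in row.items():
        if token in members:
            # Class token: spread its probability over the class members.
            for word, p_word in members[token].items():
                word_probs[word] = word_probs.get(word, 0.0) + p_token * p_word
        else:
            # Plain word token: contributes directly.
            word_probs[token] = word_probs.get(token, 0.0) + p_token
    return word_probs

MEMBERS = {
    "CLASS1": {"a": 0.6, "b": 0.4},
    "CLASS2": {"x": 0.5, "a": 0.5},  # "a" also belongs to CLASS2
}
ROW = {"CLASS1": 0.3, "CLASS2": 0.5, "a": 0.2}

if __name__ == "__main__":
    probs = expand_row(ROW, MEMBERS)
    for word, p in sorted(probs.items()):
        print(word, round(p, 4))
```

Here "a" is reached three ways, so P(a) = 0.3*0.6 + 0.5*0.5 + 0.2 = 0.63. Since ARPA files store log10 probabilities, one would convert each path with 10 ** logp, sum, and take log10 of the total; simply adding logs is only correct along a single expansion path.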
Also, in SRILM, classes may have multi-word strings as members, further complicating the situation. The good news is that ngram -expand-classes already does all this for you. Beware that expanding large or even moderate-sized class ngrams may not be computationally feasible, depending on the cardinality of your classes.

Andreas

From stolcke at speech.sri.com Tue May 29 20:56:03 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 29 May 2007 20:56:03 PDT
Subject: Class-based LM using the SRILM toolkit?
In-Reply-To: Your message of Mon, 21 May 2007 23:13:33 +0530.
Message-ID: <200705300356.l4U3u3R26372@huge>

>
> Dear Dr. Stolcke,
>
> Thank you once again for your invaluable help.
>
> I have now developed two LMs using your toolkit: a trigram word-based model
> and a class-based model (static models). I now want to interpolate them and
> then apply some form of smoothing to the resulting LM. The ngram program in
> the toolkit has a -mix-lm option which allows linear interpolation; the
> man page for that option says:
>
> "NOTE: Unless -bayes (see below) is specified, -mix-lm triggers a
> static interpolation of the models in memory. In most cases a more
> efficient, dynamic interpolation is sufficient, requested by -bayes 0.
> Also, mixing models of different type (e.g., word-based and class-based)
> will only work correctly with dynamic interpolation."
>
> What is dynamic interpolation? Is it applicable in my case? Can
> mixing/interpolation of these models be performed only with the -dynamic
> option? In that case, how?

Dynamic interpolation means that the probabilities of the interpolated model are computed on-the-fly, at test time. Static interpolation, by contrast, means that a single model is created ahead of testing, containing the interpolated probabilities in the usual backoff format. This is only possible for models of the same type, as explained in the note above.
The -dynamic option has nothing to do with dynamic interpolation of the kind we are discussing here. Dynamic interpolation is enabled by the -bayes option.

>
> Also, what is the -bayes interpolation method about? For the -bayes
> option, the man page says:
> "Interpolate the second and the main model using posterior probabilities for
> local N-gram-contexts of length *length*."
> What are you referring to by "N-gram contexts"? Are only the posterior
> probabilities interpolated here? If possible, please provide me with a link
> to a reference text etc. where I can learn more about this.

For an explanation of Bayesian interpolation please consult the technical report cited at the bottom of the ngram(1) man page. You can get it at

http://www.speech.sri.com/cgi-bin/run-distill?papers/lm95-report.ps.gz

Then check Section 2.3.

Andreas

From svmats at yahoo.com Tue May 29 23:57:12 2007
From: svmats at yahoo.com (Mats Svenson)
Date: Tue, 29 May 2007 23:57:12 -0700 (PDT)
Subject: Perplexity in "ngram"
Message-ID: <130070.98900.qm@web31606.mail.mud.yahoo.com>

Hi,
I have tried to use "ngram" to compute the perplexity of my LMs. However, I am not sure how the SRILM implementation treats OOVs when computing perplexity. Is it that "log P(<unk>|history) != 0", or are OOVs just ignored? If a model with a higher number of OOVs has a lower perplexity than another LM, does it mean that it is "better" in this -ppl implementation?

Second, in some discussions I have heard about a -ppl1 option, but the current version does not seem to have it. How does -ppl1 differ from -ppl?

Third, is there a way to meaningfully compute perplexity for a hidden event LM? Or another way to evaluate hidden event LM quality?

Thanks for your help,
Mats
From ioparin at yahoo.co.uk Fri Jun 1 04:54:43 2007
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Fri, 1 Jun 2007 12:54:43 +0100 (BST)
Subject: [SRILM] lattice rescoring with FLM
Message-ID: <335592.97241.qm@web25411.mail.ukl.yahoo.com>

Hi, everybody,

I'm doing lattice rescoring with FLM (SRILM 1.5.0). However, I somehow screwed it up; maybe somebody could give me some hints on possible inconsistencies?

I have lattices generated with a large bigram model, and I then rescore them with a small domain-specific trigram model (word and FLM). For FLM rescoring I convert all word nodes (HTK format) to the FLM representation (e.g. W=HELLO -> W=W-HELLO:S-HELLO:I-NULL:T-Db--x-) and then rescore with

lattice-tool -in-lattice-list [list] -unk -read-htk -no-nulls -no-htk-nulls
    -htk-words-on-nodes -factored -lm [FLM_specification_file] -write-htk
    -htk-logbase 2.71828 -htk-lmscale 12.0 -htk-wdpenalty -10.0
    -out-lattice-dir [dir]

I get lots of "might produce unnormalized LM" warnings (which might be expected due to the small size of the domain-specific training data?).

So, even if I use an FLM that imitates a simple word trigram LM, the final accuracy on the rescored lattices is 3 times lower than with the conventional LM. When I check the perplexities, they coincide. So there must be something wrong with the way I do FLM lattice rescoring. Did I miss something?

best regards,
Ilya
From stolcke at speech.sri.com Fri Jun 1 12:39:08 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Fri, 01 Jun 2007 12:39:08 -0700
Subject: Perplexity in "ngram"
In-Reply-To: <130070.98900.qm@web31606.mail.mud.yahoo.com>
References: <130070.98900.qm@web31606.mail.mud.yahoo.com>
Message-ID: <466075DC.1090300@speech.sri.com>

Mats Svenson wrote:
> Hi,
> I have tried to use "ngram" to compute the perplexity of my
> LMs. However, I am not sure how the SRILM
> implementation treats OOVs when computing
> perplexity. Is it that "log P(<unk>|history) != 0", or
> are OOVs just ignored?
>

SRILM excludes words with zero probability from the perplexity computation and reports their tally separately. That includes OOV words when the LM doesn't contain an unknown word (<unk>) token.

> If a model with a higher number
> of OOVs has a lower perplexity than another LM, does
> it mean that it is "better" in this -ppl
> implementation?
>

Possibly. You should not compare perplexities of LMs with different vocabularies.

> Second, in some discussions I have heard about a -ppl1
> option, but the current version does not seem to have
> it. How does -ppl1 differ from -ppl?
>

There is no -ppl1 option. -ppl reports a statistic labeled "ppl1", which is explained in the ngram man page.

> Third, is there a way to meaningfully compute
> perplexity for a hidden event LM? Or another way
> to evaluate hidden event LM quality?
>

Hidden event LMs are LMs, so you can compute a word-based perplexity just like for any other LM. If the goal of the HE-LM is to decode hidden events (like sentence boundaries), then you can obviously evaluate that task as well.
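The two normalizations can be sketched in Python. This assumes our reading of the ngram(1) man page and the SRILM FAQ, namely ppl = 10^(-logprob / (words - OOVs - zeroprobs + sentences)) and ppl1 = 10^(-logprob / (words - OOVs - zeroprobs)), where words with zero probability (including OOVs when there is no <unk>) are left out of logprob and tallied separately. The function name and input layout are hypothetical.

```python
from typing import Optional, Sequence, Tuple

def perplexities(word_logprobs: Sequence[Optional[float]],
                 eos_logprobs: Sequence[float]) -> Tuple[float, float, int]:
    """Return (ppl, ppl1, n_excluded) from log10 probabilities.

    word_logprobs: one entry per word token; None marks a word with zero
                   probability (e.g. an OOV when the LM has no <unk>),
                   which is excluded from the average and tallied.
    eos_logprobs:  log10 P(</s> | context), one entry per sentence.
    ppl counts the end-of-sentence tokens in the denominator; ppl1 does not.
    """
    scored = [lp for lp in word_logprobs if lp is not None]
    if not scored:
        raise ValueError("no scored words")
    excluded = len(word_logprobs) - len(scored)
    logprob = sum(scored) + sum(eos_logprobs)
    ppl = 10 ** (-logprob / (len(scored) + len(eos_logprobs)))
    ppl1 = 10 ** (-logprob / len(scored))
    return ppl, ppl1, excluded

if __name__ == "__main__":
    # One sentence of three words, the middle one an OOV.
    print(perplexities([-2.0, None, -3.0], [-1.0]))
```

The sketch also shows why perplexities of LMs with different vocabularies are not comparable: a word that one model treats as OOV simply drops out of its average, while the other model has to pay for it.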
Andreas

From ioparin at yahoo.co.uk Mon Jun 4 04:05:03 2007
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Mon, 4 Jun 2007 12:05:03 +0100 (BST)
Subject: [SRILM]
Message-ID: <823133.46992.qm@web25411.mail.ukl.yahoo.com>

As for my last question (regarding the lattice-tool -factored option), there were memory-overflow problems with the expansion of some lattices that I didn't notice at first and that screwed up the results. Sorry for the false alarm.

best regards,
Ilya

From yuan_wenxiao at yahoo.com.cn Wed Jun 6 07:29:47 2007
From: yuan_wenxiao at yahoo.com.cn (=?gb2312?q?=CE=C4=F3=E3=20=D4=B7?=)
Date: Wed, 6 Jun 2007 22:29:47 +0800 (CST)
Subject: Word Insertion Penalty
Message-ID: <434038.72632.qm@web92006.mail.cnb.yahoo.com>

Dear All,

I am wondering whether I can set a word insertion penalty when rescoring an N-best list using the SRILM tools, and if so, how?

Regards,
Wenxiao

From sara_abd_elhamed at yahoo.com Fri Jun 8 11:26:36 2007
From: sara_abd_elhamed at yahoo.com (Sara Abd-ElHamed)
Date: Fri, 8 Jun 2007 11:26:36 -0700 (PDT)
Subject: Guestion in FLM
Message-ID: <776540.25151.qm@web90407.mail.mud.yahoo.com>

I have a problem running FLM in SRILM. I know there are two commands for it, "fngram-count" and "fngram", but I don't know the format of the input files. Please help if anyone can.

Thanks in advance.
From ioparin at yahoo.co.uk Fri Jun 8 14:03:09 2007
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Fri, 8 Jun 2007 22:03:09 +0100 (BST)
Subject: Guestion in FLM
In-Reply-To: <776540.25151.qm@web90407.mail.mud.yahoo.com>
Message-ID: <15520.62771.qm@web25409.mail.ukl.yahoo.com>

Hi,

Those functions are very well documented in the technical report you should find in your SRILM distribution in flm/doc/arabic-final.pdf. If you somehow don't have it, just write and I'll send you the link.

--- Sara Abd-ElHamed wrote:
> I have a problem running FLM in SRILM. I know there are two
> commands for it, "fngram-count" and "fngram", but I don't know
> the format of the input files. Please help if anyone can.
>
> Thanks in advance.

best regards,
Ilya

From stolcke at speech.sri.com Fri Jun 8 16:12:53 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Fri, 08 Jun 2007 16:12:53 PDT
Subject: Kullback Leibler
In-Reply-To: Your message of Fri, 08 Jun 2007 23:46:33 +0200. <1594.145.116.14.61.1181339193.squirrel@webmail.science.uva.nl>
Message-ID: <200706082312.l58NCrX8012764@dylan.speech.sri.com>

In message <1594.145.116.14.61.1181339193.squirrel at webmail.science.uva.nl> you wrote:
> Dear Andreas Stolcke,
>
> just a very small question out of curiosity. Is there already some tool
> included in the SRILM toolkit to calculate the Kullback-Leibler divergence
> between two LMs?

No. It would be cool to have such a tool. For certain models (e.g., ngram models) you could come up with an exact computation, similar to what the pruning algorithm uses. For the general case you could sample from one of the LMs and compute an empirical cross-entropy.
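Both routes can be illustrated on a deliberately tiny case. This Python sketch assumes each "LM" is just a unigram distribution stored in a dict (a hypothetical stand-in; for real ngram models one would generate whole sentences from one model and score them with both). kl_exact sums over the vocabulary; kl_sampled draws words from P and averages log P(w) - log Q(w), i.e. the empirical cross-entropy of Q minus the empirical entropy of P.

```python
import math
import random
from typing import Dict

def kl_exact(p: Dict[str, float], q: Dict[str, float]) -> float:
    """D(P || Q) in nats; assumes q[w] > 0 wherever p[w] > 0."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

def kl_sampled(p: Dict[str, float], q: Dict[str, float],
               n_samples: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate: mean of log P(w) - log Q(w) over w ~ P."""
    rng = random.Random(seed)
    words = list(p)
    samples = rng.choices(words, weights=[p[w] for w in words], k=n_samples)
    return sum(math.log(p[w]) - math.log(q[w]) for w in samples) / n_samples

# Two toy unigram "models" over the same two-word vocabulary.
P = {"a": 0.5, "b": 0.5}
Q = {"a": 0.25, "b": 0.75}

if __name__ == "__main__":
    print(f"exact   D(P||Q) = {kl_exact(P, Q):.4f} nats")
    print(f"sampled D(P||Q) = {kl_sampled(P, Q):.4f} nats")
```

For this pair the exact value is 0.5 * ln(4/3), about 0.1438 nats, and the sampled estimate should land close to it; note that KL divergence is asymmetric, so D(P||Q) and D(Q||P) generally differ.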
Feel free to implement something ...

Andreas

From miss_egypt2008 at yahoo.com Sat Jun 9 09:27:48 2007
From: miss_egypt2008 at yahoo.com (dodo rafik)
Date: Sat, 9 Jun 2007 09:27:48 -0700 (PDT)
Subject: help
Message-ID: <503203.96630.qm@web51604.mail.re2.yahoo.com>

Hello,

I have a question about using SRILM. I am using the toolkit for tagging text, but this tagging is based on a Factored Language Model (FLM), and I don't know how to do that with the SRILM toolkit, so please give me the information needed to do that. Also, there are two files, with the extensions .count and .count.lm, and I need to know the data types of the contents of each file. Please answer me as soon as possible.

Thanks in advance,
bye

From miss_egypt2008 at yahoo.com Sat Jun 9 10:29:27 2007
From: miss_egypt2008 at yahoo.com (dodo rafik)
Date: Sat, 9 Jun 2007 10:29:27 -0700 (PDT)
Subject: help
Message-ID: <20070609172927.15523.qmail@web51601.mail.re2.yahoo.com>

Hello,

I have a question about using SRILM. I am using the toolkit for tagging text, but this tagging is based on a Factored Language Model (FLM), and I don't know how to do that with the SRILM toolkit, so please give me the information needed to do that. Also, there are two files, with the extensions .count and .count.lm, and I need to know the data types of the contents of each file. Please answer me as soon as possible.

Thanks in advance,
bye
From stolcke at speech.sri.com Sat Jun 9 11:29:56 2007 From: stolcke at speech.sri.com (Andreas Stolcke) Date: Sat, 09 Jun 2007 11:29:56 -0700 Subject: FLM Tutorial available Message-ID: <466AF1A4.3070707@speech.sri.com> Kevin Duh kindly made available a very nice factored LM tutorial, which should answer many of the questions posted on srilm-user recently. I am including his original email below. I also put links to this and other tutorials and overview publications on our web server at http://www.speech.sri.com/projects/srilm/manpages/ . If you know of other useful material that can help a novice get started, please let me know. Happy modeling, Andreas -------- Original Message -------- Subject: Re: Guestion in FLM Date: Fri, 08 Jun 2007 23:09:15 -0800 From: Kevin Duh To: Sara Abd-ElHamed CC: SRILM mailing list References: <15520.62771.qm at web25409.mail.ukl.yahoo.com> We have written a Factored LM tutorial to help people get started with using FLMs. It is available here: http://ssli.ee.washington.edu/people/duh/papers/flm-manual.pdf This tutorial includes: 1. Intro to FLM and generalized backoff 2. Specification and syntax for using FLM in SRILM 3. A step-by-step walk-through for first-time users Some of the material is taken from the "arabic-final" report mentioned by Ilya Oparin, but updated and re-formatted in a clearer way. If you have questions or suggestions, please feel free to email. Thanks, Kevin Duh ----------------------------------------------- Signals, Speech, and Language Interpretation Lab University of Washington, Seattle http://ssli.ee.washington.edu/people/duh ---------------------------------------------- From marco.turchi at gmail.com Mon Jun 11 06:48:09 2007 From: marco.turchi at gmail.com (marco turchi) Date: Mon, 11 Jun 2007 14:48:09 +0100 Subject: Interpolation vs ngram-merge Message-ID: <79a042480706110648o692375cnbd1f991a71db8ba5@mail.gmail.com> Dear experts, I have a question for you.
I have two datasets, and I want to build an LM that covers both datasets. SRILM offers me two different paths: 1) create two different LMs and then interpolate them; 2) count the n-grams for each dataset, merge these counts using ngram-merge, and then build the final LM from the merged counts. What are the differences between these methods? Can you suggest a paper or book where I can understand these differences? Thanks a lot, Marco From ioparin at yahoo.co.uk Mon Jun 11 12:11:36 2007 From: ioparin at yahoo.co.uk (ilya oparin) Date: Mon, 11 Jun 2007 20:11:36 +0100 (BST) Subject: Interpolation vs ngram-merge In-Reply-To: <79a042480706110648o692375cnbd1f991a71db8ba5@mail.gmail.com> Message-ID: <325691.25850.qm@web25408.mail.ukl.yahoo.com> I have experience with training LMs on huge data (hundreds of millions of word forms). If this is the case for you, it can actually be more efficient (or even the only feasible option) to interpolate separately trained LMs than to join the counts and train one model, because of the time and memory costs. Moreover, interpolation lets you give the models different weights and tune those weights according to perplexity on some test data, if the "target speech" for recognition is already known. --- marco turchi wrote: > Dear experts, > I have a question for you. > I have two datasets, and I want to build an LM > that covers both datasets. > SRILM offers me two different paths: > 1) create two different LMs and then interpolate > them; > 2) count the n-grams for each dataset, merge these > counts using > ngram-merge, and then build the final LM. > What are the differences between these methods? > Can you suggest a paper or book where I can > understand these differences? > > Thanks a lot > Marco > best regards, Ilya
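To make the difference between Marco's two routes concrete, here is a toy Python sketch using unsmoothed unigram maximum-likelihood estimates; the function names and data are hypothetical, not SRILM code. Pooling the counts turns out to be the same as interpolating with weights fixed by the corpus sizes (for higher-order ML estimates the implied weights depend on each history's counts in the two corpora), whereas separately trained models let you choose the weight and tune it on held-out data, as Ilya describes. With real discounting the two routes no longer coincide exactly, because smoothing is applied to different counts. In SRILM, two trained models are mixed with the ngram options -mix-lm and -lambda.

```python
from collections import Counter
from fractions import Fraction


def mle(counts):
    """Unsmoothed maximum-likelihood unigram probabilities from raw counts."""
    total = sum(counts.values())
    return {w: Fraction(c, total) for w, c in counts.items()}


def interpolate(p_a, p_b, lam):
    """Linear interpolation lam * p_a + (1 - lam) * p_b over the joint vocabulary."""
    vocab = set(p_a) | set(p_b)
    return {w: lam * p_a.get(w, 0) + (1 - lam) * p_b.get(w, 0) for w in vocab}


def merged(counts_a, counts_b):
    """Pool the raw counts first (the count-merging route), then estimate."""
    return mle(counts_a + counts_b)


corpus_a = Counter("the cat sat on the mat".split())  # 6 tokens
corpus_b = Counter("the dog ran".split())             # 3 tokens

pooled = merged(corpus_a, corpus_b)
by_size = interpolate(mle(corpus_a), mle(corpus_b), Fraction(6, 9))
equal = interpolate(mle(corpus_a), mle(corpus_b), Fraction(1, 2))

print(pooled == by_size)  # True: weight proportional to corpus size
print(pooled == equal)    # False: any other weight gives a different model
```

Exact `Fraction` arithmetic is used so the equality check is not disturbed by floating-point rounding.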
From sara_abd_elhamed at yahoo.com Mon Jun 11 18:45:58 2007 From: sara_abd_elhamed at yahoo.com (Sara Abd-ElHamed) Date: Mon, 11 Jun 2007 18:45:58 -0700 (PDT) Subject: Q in Factor file Message-ID: <820653.31594.qm@web90412.mail.mud.yahoo.com> Hi, I have another question about FLM. The factor file refers to two files: 1. The count file (.count.gz) 2. The language model file (.lm.gz) I want to know where these files come from. Are those files the output of ngram and ngram-count? Thanks in advance. From kevinduh at u.washington.edu Mon Jun 11 20:53:06 2007 From: kevinduh at u.washington.edu (Kevin Duh) Date: Mon, 11 Jun 2007 19:53:06 -0800 Subject: Q in Factor file In-Reply-To: <820653.31594.qm@web90412.mail.mud.yahoo.com> References: <820653.31594.qm@web90412.mail.mud.yahoo.com> Message-ID: <466E18A2.4080908@u.washington.edu> Hi Sara, The count and LM files are the outputs of fngram-count and the inputs to fngram. The filenames are specified in the factor file. Hope that helps, Kevin Sara Abd-ElHamed wrote: > Hi, > I have another question about FLM. > The factor file refers to two files: > > 1. The count file (.count.gz) > 2. The language model file (.lm.gz) > > I want to know where these files come from. > Are those files the output of ngram and ngram-count? > Thanks in advance.
From svp at zuzino.net.ru Mon Jun 11 23:20:38 2007 From: svp at zuzino.net.ru (Sergey Protasov) Date: Tue, 12 Jun 2007 10:20:38 +0400 Subject: add new words to current classes Message-ID: <150c31280706112320v4c0f0af0nd669850c67cce5d@mail.gmail.com> Dear experts, I have a small corpus with a 10K-word dictionary that is split into 200 classes. And I have a big corpus with a 30K-word dictionary (20K new words). I want to assign the 20K new words to the 200 existing classes. How can I do it (using SRILM)? I don't want to move any of the old 10K words from class to class. Thanks in advance! From stolcke at speech.sri.com Tue Jun 12 10:20:13 2007 From: stolcke at speech.sri.com (Andreas Stolcke) Date: Tue, 12 Jun 2007 10:20:13 -0700 Subject: Word Insertion Penalty In-Reply-To: <434038.72632.qm@web92006.mail.cnb.yahoo.com> References: <434038.72632.qm@web92006.mail.cnb.yahoo.com> Message-ID: <466ED5CD.4040107@speech.sri.com> ?? ? wrote: > Dear All, > > I am wondering if I could set a word insertion penalty when rescoring an > N-best list using the SRILM tools, and if yes, how? Word insertion penalty and other score weight parameters (like the LM weight) are typically optimized on a held-out test set, by minimizing the empirical word error. The program nbest-optimize serves exactly this purpose. Andreas From stolcke at speech.sri.com Tue Jun 12 10:49:30 2007 From: stolcke at speech.sri.com (Andreas Stolcke) Date: Tue, 12 Jun 2007 10:49:30 -0700 Subject: add new words to current classes In-Reply-To: <150c31280706112320v4c0f0af0nd669850c67cce5d@mail.gmail.com> References: <150c31280706112320v4c0f0af0nd669850c67cce5d@mail.gmail.com> Message-ID: <466EDCAA.2090202@speech.sri.com> Sergey Protasov wrote: > Dear experts, > > I have a small corpus with a 10K-word dictionary that is split into 200 > classes. > > And I have a big corpus with a 30K-word dictionary (20K new words). > > I want to assign the 20K new words to the 200 existing classes. > > How can I do it?
(using SRILM) > > I don't want to move any of the old 10K words from class to class. I agree this would be a useful function to have, but unfortunately it is not currently implemented. It should be fairly straightforward to do based on the existing code. You basically need to load an existing class definition, then create singleton classes for the new words, and start incremental merging with the number of classes limited to the original set. If you care about this problem you should try to modify ngram-class.cc and share the results with the rest of us! I'd be happy to give some guidance and review changes if you are willing to do the work. Andreas From svp at zuzino.net.ru Tue Jun 12 23:42:13 2007 From: svp at zuzino.net.ru (Sergey Protasov) Date: Wed, 13 Jun 2007 10:42:13 +0400 Subject: add new words to current classes In-Reply-To: <466EDCAA.2090202@speech.sri.com> References: <150c31280706112320v4c0f0af0nd669850c67cce5d@mail.gmail.com> <466EDCAA.2090202@speech.sri.com> Message-ID: <150c31280706122342k7d4433b9kc5493a585b58423@mail.gmail.com> Thank you, Andreas, for your answer. Unfortunately I don't have good C++ skills at the moment, but I can try to develop a Perl script for this idea. > I agree this would be a useful function to have, but unfortunately it is > not currently implemented. > It should be fairly straightforward to do based on the existing code. > > You basically need to load an existing class definition, then create > singleton classes for the > new words, and start incremental merging with the number of classes > limited to the original set. > > If you care about this problem you should try to modify ngram-class.cc > and share the results with > the rest of us! I'd be happy to give some guidance and review changes if > you are willing to do the work.
> > Andreas From sara_abd_elhamed at yahoo.com Wed Jun 13 13:29:22 2007 From: sara_abd_elhamed at yahoo.com (Sara Abd-ElHamed) Date: Wed, 13 Jun 2007 13:29:22 -0700 (PDT) Subject: Question Message-ID: <12474.38130.qm@web90413.mail.mud.yahoo.com> Hi, When I tried to run the FLM it gave me an error. Error: couldn't form int for number of factored LMs in when reading FLM spec file What is the cause of this error? I tried the examples in the papers you gave me and they also give me the same error. For example, I tried the unigram example: ## word unigram W : 0 word_1gram.count.gz word_1gram.lm.gz 1 0b0 0b0 kndiscount gtmin 1 Sorry for the disturbance.
From ioparin at yahoo.co.uk Wed Jun 13 23:08:43 2007 From: ioparin at yahoo.co.uk (ilya oparin) Date: Thu, 14 Jun 2007 07:08:43 +0100 (BST) Subject: Fwd: Question In-Reply-To: <649062.28869.qm@web90404.mail.mud.yahoo.com> Message-ID: <831489.57332.qm@web25401.mail.ukl.yahoo.com> Hi, you should put the number of FLMs in the FLM specification file (in your case 1) right after the comment line, e.g. ## word unigram 1 W : 0 word_1gram.count.gz word_1gram.lm.gz 1 ... --- Sara Abd-ElHamed wrote: > Hi, > When I tried to run the FLM it gave me an error. > Error: couldn't form int for number of factored > LMs in when reading FLM spec file > What is the cause of this error? > I tried the examples in the papers you gave me > and they also give me the same error. > For example, I tried the unigram example: > > ## word unigram > W : 0 word_1gram.count.gz word_1gram.lm.gz 1 > 0b0 0b0 kndiscount gtmin 1 > > Sorry for the disturbance. best regards, Ilya From miss_egypt2008 at yahoo.com Fri Jun 15 05:35:01 2007 From: miss_egypt2008 at yahoo.com (dodo rafik) Date: Fri, 15 Jun 2007 05:35:01 -0700 (PDT) Subject: FLM Message-ID: <576668.66895.qm@web51605.mail.re2.yahoo.com> hi, I tried to run the FLM but I have a problem: even after I put the number of FLMs in the FLM specification file, it didn't give errors but it didn't work and didn't give any result. Sorry for the disturbance.
From ioparin at yahoo.co.uk Fri Jun 15 07:30:44 2007 From: ioparin at yahoo.co.uk (ilya oparin) Date: Fri, 15 Jun 2007 15:30:44 +0100 (BST) Subject: FLM In-Reply-To: <576668.66895.qm@web51605.mail.re2.yahoo.com> Message-ID: <462791.81485.qm@web25409.mail.ukl.yahoo.com> Then you are probably doing something wrong. It should work. Please check your FLM specification file and the fngram-count -text input file, which should contain your training text in FLM format (!). P.S. You know, there is little point in writing "it didn't give errors but it didn't work and didn't give any result". You must describe the problem precisely if you want precise answers. --- dodo rafik wrote: > hi, > > I tried to run the FLM but I have a problem: > even after I put the number of FLMs in the FLM specification file, it didn't > give errors but it didn't work and didn't give any > result. > > Sorry for the disturbance. best regards, Ilya
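Ilya's point that the -text input must already be in FLM format can be illustrated with a small Python sketch. It turns (word, tag) pairs into the factored-token layout used in this thread (W-the:P-article W-brown:P-adjective ...): each token is a set of Tag-value factors joined by ':', and tokens are separated by spaces. The function name `to_flm_line` is hypothetical, and rejecting ':' and whitespace inside values is a conservative assumption for this sketch rather than a documented SRILM rule; the FLM documentation describes the exact format.

```python
def to_flm_line(tokens, factor_tags):
    """Format one sentence as a line of factored tokens for fngram-count -text.

    tokens: sequence of tuples with one value per factor, e.g. ("the", "article").
    factor_tags: the factor tags in the same order, e.g. ("W", "P").
    """
    out = []
    for values in tokens:
        if len(values) != len(factor_tags):
            raise ValueError(f"expected {len(factor_tags)} factors, got {values!r}")
        for value in values:
            # Assumption: ':' separates factors and whitespace separates tokens,
            # so neither may appear inside a factor value.
            if ":" in value or any(ch.isspace() for ch in value):
                raise ValueError(f"unsupported character in factor value {value!r}")
        out.append(":".join(f"{tag}-{value}" for tag, value in zip(factor_tags, values)))
    return " ".join(out)


if __name__ == "__main__":
    sentence = [("the", "article"), ("brown", "adjective"), ("dog", "noun"),
                ("ate", "verb"), ("a", "article"), ("bone", "noun")]
    print(to_flm_line(sentence, ("W", "P")))
    # W-the:P-article W-brown:P-adjective W-dog:P-noun W-ate:P-verb W-a:P-article W-bone:P-noun
```

Writing one such line per sentence gives a training file whose factor tags (here W and P) must match the tags used in the factor file.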
From miss_egypt2008 at yahoo.com Fri Jun 15 09:57:29 2007 From: miss_egypt2008 at yahoo.com (dodo rafik) Date: Fri, 15 Jun 2007 09:57:29 -0700 (PDT) Subject: FLM Message-ID: <6536.69211.qm@web51611.mail.re2.yahoo.com> Sorry again for the disturbance. My FLM specification file contains the following: ## word unigram 1 W : 0 word_1gram.count.gz word_1gram.lm.gz 1 0b0 0b0 kndiscount gtmin 1 The text file contains the following: W-the:P-article W-brown:P-adjective W-dog:P-noun W-ate:P-verb W-a:P-article W-bone:P-noun and the command I use is: "fngram-count.exe -factor-file my.flm -text train.txt", where my.flm is the FLM specification file and train.txt is the text file. Can you please tell me what needs to be added to create the LM file and the count file? I hope you can answer me as soon as possible. Thanks in advance
From ioparin at yahoo.co.uk Fri Jun 15 12:25:11 2007 From: ioparin at yahoo.co.uk (ilya oparin) Date: Fri, 15 Jun 2007 20:25:11 +0100 (BST) Subject: FLM In-Reply-To: <20070615170345.19056.qmail@web51602.mail.re2.yahoo.com> Message-ID: <29831.33330.qm@web25404.mail.ukl.yahoo.com> FLM "manual": /flm/doc/arabic-final.pdf, page 55. It just needs a bit of careful reading: -lm and -write-counts --- dodo rafik wrote: > Sorry again for the disturbance. > > My FLM specification file contains the following: > ## word unigram > 1 > W : 0 word_1gram.count.gz word_1gram.lm.gz 1 > 0b0 0b0 kndiscount gtmin 1 > > The text file contains the following: > W-the:P-article W-brown:P-adjective W-dog:P-noun > W-ate:P-verb > W-a:P-article W-bone:P-noun > > and the command I use is: > > "fngram-count.exe -factor-file my.flm -text > train.txt", > > where my.flm is the FLM specification file and > train.txt is the text file. > > Can you please tell me what needs to be added > to create the LM file and the count file? > > I hope you can answer me as soon as possible. > Thanks in advance best regards, Ilya From miss_egypt2008 at yahoo.com Fri Jun 15 14:12:02 2007 From: miss_egypt2008 at yahoo.com (dodo rafik) Date: Fri, 15 Jun 2007 14:12:02 -0700 (PDT) Subject: please Message-ID: <221170.78405.qm@web51611.mail.re2.yahoo.com> Thanks a lot for your reply. I already did what you said before, but the output was only the count file. What about the LM file? Sorry again for the disturbance.
From zeman at ufal.ms.mff.cuni.cz Sat Jun 16 00:39:02 2007 From: zeman at ufal.ms.mff.cuni.cz (Daniel Zeman) Date: Sat, 16 Jun 2007 09:39:02 +0200 Subject: FLM In-Reply-To: <228364.3654.qm@web51609.mail.re2.yahoo.com> References: <228364.3654.qm@web51609.mail.re2.yahoo.com> Message-ID: <46739396.90301@ufal.mff.cuni.cz> Dear dodo, don't apologize for the disturbance. Rather, please don't post every question three times or so. Dan dodo rafik wrote: > Thanks a lot for your reply. > I already did what you said before, but the output was only the count file. > What about the LM file? > Sorry again for the disturbance. From sahar_magdy_mansor at yahoo.com Sat Jun 16 10:37:35 2007 From: sahar_magdy_mansor at yahoo.com (sahar magdy) Date: Sat, 16 Jun 2007 10:37:35 -0700 (PDT) Subject: help Message-ID: <336200.28335.qm@web35715.mail.mud.yahoo.com> hi, I downloaded SRILM version 1.5.4, but when I decompress the package it gives me the following errors. I also tried to download it more than once and it gives me the same result: ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open mips-elf (bin\sgi --> mips-elf) ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf (bin\sun4 --> sparc-elf) ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file !
C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf (bin\sun4 --> sparc-elf) ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf (bin\sun4_solaris --> sparc-elf) ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open i386-solaris_g (lib\i386-solaris_m --> i386-solaris_g) ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open i386-solaris_c (lib\i386-solaris-p4_c --> i386-solaris_c) ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file --------------------------------- Park yourself in front of a world of choices in alternative vehicles. Visit the Yahoo! Auto Green Center. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at speech.sri.com Sat Jun 16 10:51:26 2007 From: stolcke at speech.sri.com (Andreas Stolcke) Date: Sat, 16 Jun 2007 10:51:26 PDT Subject: help In-Reply-To: Your message of Sat, 16 Jun 2007 10:37:35 -0700. <336200.28335.qm@web35715.mail.mud.yahoo.com> Message-ID: <200706161751.l5GHpRN20224@huge> You can ignore these errors. they don't affect your building of SRILM on a Windows system. --Andreas In message <336200.28335.qm at web35715.mail.mud.yahoo.com>you wrote: > --0-1019792502-1182015455=:28335 > Content-Type: text/plain; charset=iso-8859-1 > Content-Transfer-Encoding: 8bit > > hi , > I downloaded SRILM version 1.5.4 but when i decompress the package it gives > me the following errors i also tried to downlaod it more than one time and i > t gives me tha same result > > > ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open mips-elf > (bin\sgi --> mips-elf) > ! 
C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file > ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf > (bin\sun4 --> sparc-elf) > ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file > ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf > (bin\sun4_solaris --> sparc-elf) > ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file > ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open i386-solaris_g (lib\i386-solaris_m --> i386-solaris_g) > ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file > ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open i386-solaris_c (lib\i386-solaris-p4_c --> i386-solaris_c) > ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
From sahar_magdy_mansor at yahoo.com Sat Jun 16 14:54:09 2007 From: sahar_magdy_mansor at yahoo.com (sahar magdy) Date: Sat, 16 Jun 2007 14:54:09 -0700 (PDT) Subject: help Message-ID: <541903.49958.qm@web35702.mail.mud.yahoo.com> hi, when I run SRILM version 1.3, I have a problem: the fngram-count output is only the count file, and it is an empty file; when I open it, it gives me an error (the archive is either in unknown format or damaged). Then I downloaded SRILM version 1.5.4, but when I decompress the package it gives me the following errors. I also tried to download it more than once and it gives me the same result: ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open mips-elf (bin\sgi --> mips-elf) ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf (bin\sun4 --> sparc-elf) ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open sparc-elf (bin\sun4_solaris --> sparc-elf) ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open i386-solaris_g (lib\i386-solaris_m --> i386-solaris_g) ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open i386-solaris_c (lib\i386-solaris-p4_c --> i386-solaris_c) ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link points to missing file
From stolcke at speech.sri.com Sun Jun 17 13:58:11 2007 From: stolcke at speech.sri.com (Andreas Stolcke) Date: Sun, 17 Jun 2007 13:58:11 -0700 Subject: [Fwd: SRILM] Message-ID: <4675A063.3070604@speech.sri.com> Can anyone give J. some pointers on how to build class-based LMs? Andreas -------------- next part -------------- An embedded message was scrubbed... From: "J.Sashank" Subject: SRILM Date: Sat, 16 Jun 2007 16:20:18 +0530 (IST) Size: 2961 From ioparin at yahoo.co.uk Sun Jun 17 22:52:50 2007 From: ioparin at yahoo.co.uk (ilya oparin) Date: Mon, 18 Jun 2007 06:52:50 +0100 (BST) Subject: [Fwd: SRILM] In-Reply-To: <4675A063.3070604@speech.sri.com> Message-ID: <963905.98762.qm@web25404.mail.ukl.yahoo.com> Hi, J. You can use ngram-class (see manpages) to generate classes from text automatically. It generates two files: a class count file (a standard N-gram count file with class labels as units) and a class definition file (giving the class assignment of each word). Use those to train LMs with ngram-count as usual; you just need to add the -classes option to refer to the class-definition file. If you want to use classes of your own, it's a bit more tricky, since you have to take care to form the class-definition file correctly. regards, Ilya --- Andreas Stolcke wrote: > > Can anyone give J. some pointers on how to build > class-based LMs? > > Andreas > > > Date: Sat, 16 Jun 2007 16:20:18 +0530 (IST) > Subject: SRILM > From: "J.Sashank" > To: stolcke at speech.sri.com > > Sir, > I am an undergraduate student at IIT > Bombay. I am working on a > research project that involves a trigram model. I want > to use a > class-based trigram model, but I cannot find its > usage in the SRILM > package. Can you please tell me how to use the > package for > this model? > > Thanking You, > > J.Sashank > Junior Undergraduate > Computer Science and Engineering > IIT Bombay
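The class-definition file maps words to class labels, and the recommended training procedure in this thread runs the training text through SRILM's replace-words-with-classes script before ngram-count. As a rough Python illustration of that filtering step (not a substitute for the SRILM script), the sketch below assumes each definition line has the form CLASS [probability] word, with exactly one word per expansion; the function names and the example classes are hypothetical.

```python
import re

# Assumption: an optional expansion probability is a plain decimal number.
_PROB = re.compile(r"^\d*\.?\d+(?:[eE][-+]?\d+)?$")


def read_class_defs(lines):
    """Map each word to its class from lines like 'CITY 0.5 boston' or 'DAY monday'.

    Only single-word expansions are handled. A bare numeric word with no
    probability (e.g. 'NUM 2') is mistaken for a probability and rejected.
    """
    word2class = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        cls, rest = fields[0], fields[1:]
        if rest and _PROB.match(rest[0]):
            rest = rest[1:]  # drop the optional probability
        if len(rest) != 1:
            raise ValueError(f"only single-word expansions are supported: {line!r}")
        word2class[rest[0]] = cls
    return word2class


def replace_words_with_classes(sentence, word2class):
    """Replace each word that belongs to a class by its class label."""
    return " ".join(word2class.get(w, w) for w in sentence.split())


if __name__ == "__main__":
    defs = ["CITY 0.5 boston", "CITY 0.5 denver", "DAY monday"]
    classes = read_class_defs(defs)
    print(replace_words_with_classes("fly to boston on monday", classes))
    # fly to CITY on DAY
```

An n-gram model trained on text rewritten this way predicts class labels in place of the member words; at test time the class definitions supply the word-given-class probabilities.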
From stolcke at speech.sri.com Sun Jun 17 23:24:19 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Sun, 17 Jun 2007 23:24:19 -0700
Subject: help
In-Reply-To: <541903.49958.qm@web35702.mail.mud.yahoo.com>
References: <541903.49958.qm@web35702.mail.mud.yahoo.com>
Message-ID: <46762513.7000002@speech.sri.com>

sahar magdy wrote:
> Hi,
> When I run SRILM version 1.3, I have a problem: the only output of
> fngram-count is the count file, and that file is empty. When I opened it,
> it gave me an error (the archive is either in unknown format or damaged).
> I then downloaded SRILM version 1.5.4, but when I decompress the package
> it gives me the following errors. I also tried downloading it more than
> once and got the same result.
>
I don't know what tool you're using to unpack the compressed tar file.
But I know for a fact that if you use GNU tar (as part of the Cygwin
utilities) it will work. I suggest you use that.

Andreas

>
> ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open
> mips-elf (bin\sgi --> mips-elf)
> ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link
> points to missing file
> ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open
> sparc-elf (bin\sun4 --> sparc-elf)
> ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link
> points to missing file
> ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open
> sparc-elf (bin\sun4_solaris --> sparc-elf)
> ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link
> points to missing file
> ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open
> i386-solaris_g (lib\i386-solaris_m --> i386-solaris_g)
> ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link
> points to missing file
> ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Cannot open
> i386-solaris_c (lib\i386-solaris-p4_c --> i386-solaris_c)
> ! C:\Documents and Settings\sahar\Desktop\srilm.tgz: Symbolic link
> points to missing file

From stolcke at speech.sri.com Sun Jun 17 23:29:40 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Sun, 17 Jun 2007 23:29:40 PDT
Subject: [Fwd: SRILM]
In-Reply-To: Your message of Mon, 18 Jun 2007 06:52:50 +0100. <963905.98762.qm@web25404.mail.ukl.yahoo.com>
Message-ID: <200706180629.l5I6TeJ11816@huge>

In message <963905.98762.qm at web25404.mail.ukl.yahoo.com> you wrote:
> Hi, J.
>
> You can use ngram-class (see the man pages) to generate classes from
> text automatically. Two files are generated: a class count file (a
> standard N-gram file with class labels as units) and a class definition
> file (giving the class assignment of each word). Use those to train LMs
> with ngram-count as usual; you just need to add the -classes option to
> refer to the class-definition file.
> If you want to use classes of your own, it's a bit more tricky, since
> you have to make sure the class-definition file is formed correctly.

I would add that once you have your class definitions (by hand, or as
the result of ngram-class), the recommended procedure is to filter your
training data through

	replace-words-with-classes classes=CLASS-DEFINITIONS-FILE

and then train the ngram model on the output (see the training-scripts(1)
man page). In testing you use that model together with the ngram -classes
option, as Ilya said.

Andreas

>
> regards,
> Ilya
>
> --- Andreas Stolcke wrote:
>
> >
> > Can anyone give J. some pointers on how to build
> > class-based LMs?
> >
> > Andreas
> >
> > Date: Sat, 16 Jun 2007 16:20:18 +0530 (IST)
> > Subject: SRILM
> > From: "J.Sashank"
> > To: stolcke at speech.sri.com
> >
> > Sir,
> > I am an undergraduate student at IIT Bombay.
> > I am working on a research project that involves a trigram model.
> > I want to use a class-based trigram model, but I cannot find how to
> > use it in the SRILM package. Can you please tell me how to use the
> > package for this model?
> >
> > Thanking you,
> >
> > J.Sashank
> > Junior Undergraduate
> > Computer Science and Engineering
> > IIT Bombay

From svp at zuzino.net.ru Wed Jun 20 12:33:40 2007
From: svp at zuzino.net.ru (Sergey Protasov)
Date: Wed, 20 Jun 2007 23:33:40 +0400
Subject: using negative data
Message-ID: <150c31280706201233q3d4ebb60haa64c5675928da62@mail.gmail.com>

Dear Experts,

Suppose I have a big corpus of incorrect sentences:

*The event that he smiled at me gives me hope
*But my presents to win his heart have failed
*Absence to comply may result in dismissal
*The party is who we should invite
*I really like the fashion you do your hair
...

Can I (and if so, how can I) improve a language model using these
negative examples? Can I improve an n-gram model this way using SRILM?