From ghaffar1 at cs.sfu.ca  Tue Apr  4 15:10:49 2006
From: ghaffar1 at cs.sfu.ca (GholamReza Haffari)
Date: Tue, 04 Apr 2006 15:10:49 -0700
Subject: pls help (urgent)
Message-ID: <200604042210.k34MAnO2008779@rm-rstar.sfu.ca>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20060404/7d3550ac/attachment.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lm.cc
Type: application/octet-stream
Size: 1095 bytes
Desc: not available
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20060404/7d3550ac/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: makefile
Type: application/octet-stream
Size: 193 bytes
Desc: not available
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20060404/7d3550ac/attachment-0001.obj>

From patryale at iro.umontreal.ca  Tue Apr  4 20:15:21 2006
From: patryale at iro.umontreal.ca (Alexandre Patry)
Date: Tue, 04 Apr 2006 23:15:21 -0400
Subject: pls help (urgent)
In-Reply-To: <200604042210.k34MAnO2008779@rm-rstar.sfu.ca>
References: <200604042210.k34MAnO2008779@rm-rstar.sfu.ca>
Message-ID: <1144206921.10174.7.camel@localhost.localdomain>

Hi,

You have specified the path to the libraries, but you did not tell the
compiler to link against them (-L only adds a directory to the library
search path; each library still has to be named with -l).  In your
makefile, if you change the line:

libs=-L$(libdir)

to:

libs=-L$(libdir) -loolm -lmisc -ldstruct

it should work.

Good luck,

Alexandre

On Tuesday, 04 April 2006 at 15:10 -0700, GholamReza Haffari wrote:
> Hi there,
> 
> Currently I am trying to get SRILM working, but I have a problem: when I
> use "make" to compile and build the attached sample file, it gives me some
> error messages. It seems to me that the libs of the SRI toolkit may not be
> installed properly on my machine. Currently the lib directory contains the
> following files and directory:
> 
> i386-solaris_m	
> i686\libdstruct.a	
> i686\liblattice.a	
> i686\libmisc.a	
> i686\liboolm.a
> 
> Is everything fine? Would you please help me to find out where the problem
> is?
> thanks,
> -Reza
> 
> PS. for more reference I have copied the error messages here:
> 
> /tmp/cc7U0zaO.o(.text+0xc3): In function `main':
> : undefined reference to `File::File(char const*, char const*, int)'
> /tmp/cc7U0zaO.o(.text+0xdf): In function `main':
> : undefined reference to `File::File(char const*, char const*, int)'
> /tmp/cc7U0zaO.o(.text+0xf5): In function `main':
> : undefined reference to `File::getline()'
> /tmp/cc7U0zaO.o(.text+0x11d): In function `main':
> : undefined reference to `Vocab::parseWords(char*, char const**, unsigned 


From ghaffar1 at cs.sfu.ca  Wed Apr  5 10:58:07 2006
From: ghaffar1 at cs.sfu.ca (GholamReza Haffari)
Date: Wed, 05 Apr 2006 10:58:07 -0700
Subject: sample program
Message-ID: <200604051758.k35Hw71E028663@rm-rstar.sfu.ca>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20060405/c7e96fc9/attachment.ksh>

From Antoine.Ghaoui at jinny.ie  Mon Apr 24 03:04:16 2006
From: Antoine.Ghaoui at jinny.ie (Antoine Ghaoui)
Date: Mon, 24 Apr 2006 13:04:16 +0300
Subject: Info on FLM format
Message-ID: <063d01c66786$72f0a550$16c864c1@Italy1>

Hello,

Can you please tell me where I can find the file formats for FLMs and how to use SRILM to build an FLM?

Thanks

Antoine

From amittai at mit.edu  Tue Apr 25 16:57:58 2006
From: amittai at mit.edu (amittai e axelrod)
Date: Wed, 26 Apr 2006 00:57:58 +0100
Subject: Info on FLM format
In-Reply-To: <063d01c66786$72f0a550$16c864c1@Italy1>
References: <063d01c66786$72f0a550$16c864c1@Italy1>
Message-ID: <5734eadd0604251657g16b60a58pee9b391637ffd0d6@mail.gmail.com>

On 4/24/06, Antoine Ghaoui <Antoine.Ghaoui at jinny.ie> wrote:
> can you please tell me where I can find the formats of the files for the FLM
> and how to use SRILM to implement FLM?

Hi--

A good place to start is Chapter 5 of the report from the 2002
Johns Hopkins summer workshop where the FLM tools were
implemented. The report is here:
www.clsp.jhu.edu/ws2002/groups/arabic/arabic-final.pdf

All versions of SRILM after v1.4 have these FLM tools
included; just look in the flm/ directory. You may want
to start with the "fngram" and "fngram-count" programs.

~amittai


From ioparin at yahoo.co.uk  Thu May 11 08:12:33 2006
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Thu, 11 May 2006 16:12:33 +0100 (BST)
Subject: GT coeffs in -make-big-lm
Message-ID: <20060511151233.11799.qmail@web86908.mail.ukl.yahoo.com>

Hi!
   
  When I trained a very large model (corpus size approx. 600 million tokens), I came across a behavior that looks a bit odd. Since the LM is going to be huge, I'm using the make-big-lm script to compute 4 partial LMs in a distributed way and then merge them into the final one.
  After I started the 4 make-big-lm tasks, the GT coefficients for the first one were written to the home directory (it takes some time to realize that something may be wrong, since this output is not mentioned in the manual), and the other running tasks simply used those files, presuming the GT pre-computation had been done in advance. This should not seriously damage a large model, but it's good to be as precise as possible. So I had to delete the GT files manually after each successive (and therefore no longer simultaneous) make-big-lm run, presuming that the n-gram merge would correctly renormalize the probabilities. Is this correct, or should I rather compute the GT coefficients from the whole .ngram file, save them in the home directory, and use them for each partial make-big-lm run?


best regards,
Ilya
		

From stolcke at speech.sri.com  Thu May 11 19:55:52 2006
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 11 May 2006 19:55:52 PDT
Subject: GT coeffs in -make-big-lm 
In-Reply-To: Your message of Thu, 11 May 2006 16:12:33 +0100.
             <20060511151233.11799.qmail@web86908.mail.ukl.yahoo.com> 
Message-ID: <200605120255.k4C2tqW12176@huge>


> Hi!
>    
>   When I trained a very large model (corpus size approx. 600 million
> tokens), I came across a behavior that looks a bit odd. Since the LM is
> going to be huge, I'm using the make-big-lm script to compute 4 partial
> LMs in a distributed way and then merge them into the final one.
>   After I started the 4 make-big-lm tasks, the GT coefficients for the
> first one were written to the home directory (it takes some time to
> realize that something may be wrong, since this output is not mentioned
> in the manual), and the other running tasks simply used those files,
> presuming the GT pre-computation had been done in advance. This should
> not seriously damage a large model, but it's good to be as precise as
> possible. So I had to delete the GT files manually after each successive
> (and therefore no longer simultaneous) make-big-lm run, presuming that
> the n-gram merge would correctly renormalize the probabilities. Is this
> correct, or should I rather compute the GT coefficients from the whole
> .ngram file, save them in the home directory, and use them for each
> partial make-big-lm run?

It is true that make-big-lm saves the statistics needed for count smoothing
in files, so that they are not recomputed if you rerun the script
(since this step is potentially expensive).  I'm sorry this is not
documented well.

However, the filenames are keyed to the value of the "-name" option,
so if you want to do several runs in the same directory, just specify
a separate -name parameter in each case.

--Andreas 


From ioparin at yahoo.co.uk  Sun May 21 13:34:22 2006
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Sun, 21 May 2006 21:34:22 +0100 (BST)
Subject: [SRILM]: FLM
Message-ID: <20060521203422.79256.qmail@web86914.mail.ukl.yahoo.com>

Hello,

I've recently been playing with factored language models for the Czech language. The FLM module works perfectly with small subcorpora. However, when I try to train the model even on my held-out data (60 million tokens), training takes a huge amount of time (it has been running for two days now). Memory problems can be expected as well. So there is almost no point in trying to train an LM on my training data (550 million tokens).
The question is: does anybody have experience training FLMs on huge corpora, e.g. by parallelizing tasks? There is no direct way as there is for normal models (ngram-merge and make-big-lm), but are there any indirect ones?

thanks in advance,
ilya


From bertoldi at itc.it  Mon May 22 06:44:24 2006
From: bertoldi at itc.it (Nicola Bertoldi)
Date: Mon, 22 May 2006 15:44:24 +0200
Subject: [SRILM]: lattice-tool: problems while reading word-mesh
Message-ID: <4471C038.8000504@itc.it>

Hello,
I recently started using lattice-tool
and ran into my first problems reading word meshes.

In particular, if I run these two commands
(i.e. first create a word mesh and then read it)
lattice-tool -read-htk -in-lattice input.slf -write-mesh output.cn
lattice-tool -read-mesh -in-lattice output.cn

I got this error message:
lattice-tool: 
/hardmnt/voxgate/ssi/HermesTools/srilm/include/LHash.cc:251: Boolean 
LHash<KeyT, DataT>::locate(KeyT, unsigned int&) const [with KeyT = 
NodeIndex, DataT = LatticeNode]: Assertion `!Map_noKeyP(key)' failed.
Abort


Can anyone help me?

best regards
and thanks in advance,
Nicola


From stolcke at speech.sri.com  Mon May 22 11:02:01 2006
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 22 May 2006 11:02:01 -0700
Subject: [SRILM]: lattice-tool: problems while reading word-mesh
In-Reply-To: <4471C038.8000504@itc.it>
References: <4471C038.8000504@itc.it>
Message-ID: <4471FC99.5040805@speech.sri.com>

Nicola Bertoldi wrote:

> Hello,
> I recently started using lattice-tool
> and ran into my first problems reading word meshes.
>
> In particular, if I run these two commands
> (i.e. first create a word mesh and then read it)
> lattice-tool -read-htk -in-lattice input.slf -write-mesh output.cn
> lattice-tool -read-mesh -in-lattice output.cn
>
> I got this error message:
> lattice-tool: 
> /hardmnt/voxgate/ssi/HermesTools/srilm/include/LHash.cc:251: Boolean 
> LHash<KeyT, DataT>::locate(KeyT, unsigned int&) const [with KeyT = 
> NodeIndex, DataT = LatticeNode]: Assertion `!Map_noKeyP(key)' failed.
> Abort
>
>
> Can anyone help me?
>
This looks like a known bug in SRILM 1.4.6.  Please try getting the 1.5.0
beta version, which should fix it.

--Andreas


From bertoldi at itc.it  Tue May 30 08:24:25 2006
From: bertoldi at itc.it (Nicola Bertoldi)
Date: Tue, 30 May 2006 17:24:25 +0200
Subject: Lattice-Tool: problems with pruning
Message-ID: <447C63A9.4010602@itc.it>

While pruning a lattice with respect to posterior probabilities
with this command:

lattice-tool -in-lattice lattice -read-htk -out-lattice - -write-htk -posterior-prune 1.0e-1

I got this message:

Lattice::computeForwardBackward: warning: called with unreachable nodes


If I decrease the pruning threshold, this message disappears.

Can anyone help me?

best regards
Nicola


From ioparin at yahoo.co.uk  Wed May 31 03:41:51 2006
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Wed, 31 May 2006 11:41:51 +0100 (BST)
Subject: [SRILM]: -debug 2 info
Message-ID: <20060531104152.63656.qmail@web86903.mail.ukl.yahoo.com>

Hi!

When I compute the perplexity of my POS-based class model (a word can belong to many classes; I create the class-definition file myself from POS-tagged data) with "-debug 2", I get output that I cannot fully understand. For testing purposes I measure the perplexity on the same data I trained the class model on (i.e. there should not be any OOVs). However, in the debug output, for every N-gram there is a string of the format
P(w| w...) = [OOV][n-gram][n-gram]...[OOV][n-gram][n-gram]...
As far as I understand, the [n-gram]s refer to the different ways of assigning words to classes. But why do those [OOV]s appear (and they appear at regular intervals between the strings of [n-gram]s for each word)?

I have only one guess: since the [OOV]s are missing only for the last (</s>| ...) n-gram, they may correspond to a check of whether a word is present in some implicit stop-word vocabulary or something similar.

It would be great if anybody could comment on that.


best regards,
Ilya
		

From ioparin at yahoo.co.uk  Sun Jun 11 06:05:10 2006
From: ioparin at yahoo.co.uk (ilya oparin)
Date: Sun, 11 Jun 2006 14:05:10 +0100 (BST)
Subject: GT coefficients
Message-ID: <20060611130510.81544.qmail@web25401.mail.ukl.yahoo.com>

Hello!

If I compute the GT coefficients in advance and then feed the GT files (generated by make-gt-discounts) to ngram-count or make-big-lm, I get warnings like

file.gt1: line 9: warning: discount coefficient 1 = 0.0
file.gt1: line 9: warning: discount coefficient 2 = 0.0
...

and so on for all the GT parameters. The files themselves are fine and do not contain any zeroes. The line number in the warning corresponds to the last line of the GT file.
The model I get this way differs from the one I get when I just use ngram-count without loading GT coefficients, with the same gtmin and gtmax values (it has far fewer bigrams and trigrams).
Could anybody tell me why this happens?


best regards,
Ilya

From stolcke at speech.sri.com  Thu Jun 29 18:27:32 2006
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 29 Jun 2006 18:27:32 PDT
Subject: SRILM bug-fix
Message-ID: <200606300127.k5U1RWg0005724@choro.speech.sri.com>


Recent versions of SRILM have a bug in the option handling of 
ngram, hidden-ngram, and lattice-tool concerning interpolated LMs
with more than 6 components. 

The bug is triggered by the use of -mix-lm[789] in conjunction with the -bayes
option.  This will be fixed in the next release, but that might
take a while, so I'm including a patch below.

This bug was found by Richard Zens of RWTH Aachen.

--Andreas

*** /tmp/T005lmct	Thu Jun 29 06:02:25 2006
--- lm/src/ngram.cc	Thu Jun 29 05:59:48 2006
***************
*** 738,744 ****
  				mixLambda6);
  	}
  	if (mixFile7) {
! 	    useLM = makeMixLM(mixFile6, *vocab, classVocab, order, useLM,
  				mixLambda7,
  				mixLambda + mixLambda1 + mixLambda2 +
  				mixLambda3 + mixLambda4 + mixLambda5 +
--- 738,744 ----
  				mixLambda6);
  	}
  	if (mixFile7) {
! 	    useLM = makeMixLM(mixFile7, *vocab, classVocab, order, useLM,
  				mixLambda7,
  				mixLambda + mixLambda1 + mixLambda2 +
  				mixLambda3 + mixLambda4 + mixLambda5 +
***************
*** 745,751 ****
  				mixLambda6 + mixLambda7);
  	}
  	if (mixFile8) {
! 	    useLM = makeMixLM(mixFile6, *vocab, classVocab, order, useLM,
  				mixLambda8,
  				mixLambda + mixLambda1 + mixLambda2 +
  				mixLambda3 + mixLambda4 + mixLambda5 +
--- 745,751 ----
  				mixLambda6 + mixLambda7);
  	}
  	if (mixFile8) {
! 	    useLM = makeMixLM(mixFile8, *vocab, classVocab, order, useLM,
  				mixLambda8,
  				mixLambda + mixLambda1 + mixLambda2 +
  				mixLambda3 + mixLambda4 + mixLambda5 +
***************
*** 752,758 ****
  				mixLambda6 + mixLambda7 + mixLambda8);
  	}
  	if (mixFile9) {
! 	    useLM = makeMixLM(mixFile6, *vocab, classVocab, order, useLM,
  				mixLambda9, 1.0);
  	}
      }
--- 752,758 ----
  				mixLambda6 + mixLambda7 + mixLambda8);
  	}
  	if (mixFile9) {
! 	    useLM = makeMixLM(mixFile9, *vocab, classVocab, order, useLM,
  				mixLambda9, 1.0);
  	}
      }
*** /tmp/T005lmct	Thu Jun 29 06:02:25 2006
--- lm/src/hidden-ngram.cc	Thu Jun 29 06:01:12 2006
***************
*** 1178,1184 ****
  				mixLambda6);
  	}
  	if (mixFile7) {
! 	    useLM = makeMixLM(mixFile6, *vocab, classVocab, order, useLM,
  				mixLambda7,
  				mixLambda + mixLambda1 + mixLambda2 +
  				mixLambda3 + mixLambda4 + mixLambda5 +
--- 1178,1184 ----
  				mixLambda6);
  	}
  	if (mixFile7) {
! 	    useLM = makeMixLM(mixFile7, *vocab, classVocab, order, useLM,
  				mixLambda7,
  				mixLambda + mixLambda1 + mixLambda2 +
  				mixLambda3 + mixLambda4 + mixLambda5 +
***************
*** 1185,1191 ****
  				mixLambda6 + mixLambda7);
  	}
  	if (mixFile8) {
! 	    useLM = makeMixLM(mixFile6, *vocab, classVocab, order, useLM,
  				mixLambda8,
  				mixLambda + mixLambda1 + mixLambda2 +
  				mixLambda3 + mixLambda4 + mixLambda5 +
--- 1185,1191 ----
  				mixLambda6 + mixLambda7);
  	}
  	if (mixFile8) {
! 	    useLM = makeMixLM(mixFile8, *vocab, classVocab, order, useLM,
  				mixLambda8,
  				mixLambda + mixLambda1 + mixLambda2 +
  				mixLambda3 + mixLambda4 + mixLambda5 +
***************
*** 1192,1198 ****
  				mixLambda6 + mixLambda7 + mixLambda8);
  	}
  	if (mixFile9) {
! 	    useLM = makeMixLM(mixFile6, *vocab, classVocab, order, useLM,
  				mixLambda9, 1.0);
  	}
      }
--- 1192,1198 ----
  				mixLambda6 + mixLambda7 + mixLambda8);
  	}
  	if (mixFile9) {
! 	    useLM = makeMixLM(mixFile9, *vocab, classVocab, order, useLM,
  				mixLambda9, 1.0);
  	}
      }
*** /tmp/T005lmct	Thu Jun 29 06:02:25 2006
--- lattice/src/lattice-tool.cc	Thu Jun 29 06:01:53 2006
***************
*** 1128,1134 ****
  				mixLambda6);
  	}
  	if (mixFile7) {
! 	    useLM = makeMixLM(mixFile6, *vocab, classVocab, order, useLM,
  				mixLambda7,
  				mixLambda + mixLambda1 + mixLambda2 +
  				mixLambda3 + mixLambda4 + mixLambda5 +
--- 1128,1134 ----
  				mixLambda6);
  	}
  	if (mixFile7) {
! 	    useLM = makeMixLM(mixFile7, *vocab, classVocab, order, useLM,
  				mixLambda7,
  				mixLambda + mixLambda1 + mixLambda2 +
  				mixLambda3 + mixLambda4 + mixLambda5 +
***************
*** 1135,1141 ****
  				mixLambda6 + mixLambda7);
  	}
  	if (mixFile8) {
! 	    useLM = makeMixLM(mixFile6, *vocab, classVocab, order, useLM,
  				mixLambda8,
  				mixLambda + mixLambda1 + mixLambda2 +
  				mixLambda3 + mixLambda4 + mixLambda5 +
--- 1135,1141 ----
  				mixLambda6 + mixLambda7);
  	}
  	if (mixFile8) {
! 	    useLM = makeMixLM(mixFile8, *vocab, classVocab, order, useLM,
  				mixLambda8,
  				mixLambda + mixLambda1 + mixLambda2 +
  				mixLambda3 + mixLambda4 + mixLambda5 +
***************
*** 1142,1148 ****
  				mixLambda6 + mixLambda7 + mixLambda8);
  	}
  	if (mixFile9) {
! 	    useLM = makeMixLM(mixFile6, *vocab, classVocab, order, useLM,
  				mixLambda9, 1.0);
  	}
      }
--- 1142,1148 ----
  				mixLambda6 + mixLambda7 + mixLambda8);
  	}
  	if (mixFile9) {
! 	    useLM = makeMixLM(mixFile9, *vocab, classVocab, order, useLM,
  				mixLambda9, 1.0);
  	}
      }
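
The patch above corrects a copy-paste slip: the -mix-lm7, -mix-lm8, and
-mix-lm9 branches all passed mixFile6 to makeMixLM. The calls also follow
a pattern worth spelling out: each component is added with its own weight
and the cumulative sum of the weights so far. What follows is a minimal
C++ sketch of that arithmetic, under the assumption (suggested by the call
arguments, not confirmed by the patch itself) that makeMixLM mixes the new
model into the running mixture with relative weight lambda1/lambda2. The
helper names relativeWeights and nestedMixture are hypothetical and not
part of SRILM; the loop over a vector only illustrates how the repeated
per-component blocks could be avoided.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an SRILM function): for flat interpolation
// weights lambda[0..K-1], return the relative weight each pairwise mixing
// step would use, lambda[k] / (lambda[0] + ... + lambda[k]).
// A zero weight yields 0 so that 0/0 is never evaluated.
std::vector<double> relativeWeights(const std::vector<double> &lambda)
{
    std::vector<double> rel(lambda.size(), 0.0);
    double cumulative = 0.0;
    for (std::size_t k = 0; k < lambda.size(); ++k) {
        cumulative += lambda[k];
        rel[k] = (lambda[k] == 0.0) ? 0.0 : lambda[k] / cumulative;
    }
    return rel;
}

// Hypothetical helper: apply the pairwise steps to one probability per
// component. Step k blends component k into the mixture built so far.
// When the weights sum to 1, the result equals sum_k lambda[k] * prob[k].
double nestedMixture(const std::vector<double> &prob,
                     const std::vector<double> &lambda)
{
    assert(prob.size() == lambda.size());
    const std::vector<double> rel = relativeWeights(lambda);
    double mix = 0.0;
    for (std::size_t k = 0; k < prob.size(); ++k) {
        mix = rel[k] * prob[k] + (1.0 - rel[k]) * mix;
    }
    return mix;
}
```

For example, with weights {0.5, 0.3, 0.2} and component probabilities
{0.1, 0.2, 0.4}, nestedMixture returns 0.19 (up to rounding), the same as
the flat sum 0.5*0.1 + 0.3*0.2 + 0.2*0.4. This is also why the last call in
the patch can pass 1.0 as the cumulative weight.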