From cristinaguerreroflores at gmail.com Tue Oct 1 05:09:51 2013
From: cristinaguerreroflores at gmail.com (Cristina Guerrero)
Date: Tue, 1 Oct 2013 14:09:51 +0200
Subject: [SRILM User List] Confusion network combination
Message-ID: 

Hello,

I am looking for information on how to accomplish confusion network combination with the SRILM toolkit. I want to use the lattices generated by different speech recognizers over the same speech segment. I haven't found a detailed description of the steps to follow, so here is what I'm doing right now:

1- Extract lattices from the various recognizers (with HTK in my case).
2- Take one of these lattices as a starting point and convert it into a confusion network MESH0 (lattice-tool -read-htk -in-lattice LATTICE0 -write-mesh MESH0). *I know -posterior-prune can be applied to the lattice before building the mesh for better results, according to the "Finding consensus.." paper.
3- Then, take the next lattice (LATTICE1) and merge it with the previously generated mesh (lattice-tool -in-lattice LATTICE1 -init-mesh MESH0 -write-mesh MESH1).
4- Repeat the merging step (3), using the previous mesh to initialize the next lattice.

I'd really appreciate it if anyone could tell me whether the described procedure is the correct one, or provide me with more information about it.

Thanks a lot in advance,
Cristina
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ammansik at cis.hut.fi Tue Oct 1 06:49:05 2013
From: ammansik at cis.hut.fi (=?ISO-8859-1?Q?Andr=E9_Mansikkaniemi?=)
Date: Tue, 01 Oct 2013 16:49:05 +0300
Subject: [SRILM User List] Lattice-tool and -word-posteriors-for-sentences
Message-ID: <524AD2D1.4080600@cis.hut.fi>

Hi,

I've been trying to use lattice-tool and the '-word-posteriors-for-sentences' option to calculate posterior probabilities for words in an ASR output hypothesis. So I have a lattice and a hypothesis file to begin with, and run the following commands to generate the posterior probabilities.
lattice-tool -read-htk -in-lattice test.lat -write-mesh test.mesh
lattice-tool -read-mesh -in-lattice test.mesh -word-posteriors-for-sentences test.hyp > test.posteriors

Is this the correct way to do it, since I always end up getting 0 posterior probabilities for all words in the sentence? I tried to replace test.hyp with output generated from lattice-tool's own -viterbi-decode, but the result is still the same.

BR,
André

From stolcke at icsi.berkeley.edu Tue Oct 1 10:05:55 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Tue, 01 Oct 2013 10:05:55 -0700
Subject: [SRILM User List] Confusion network combination
In-Reply-To: 
References: 
Message-ID: <524B00F3.9080006@icsi.berkeley.edu>

On 10/1/2013 5:09 AM, Cristina Guerrero wrote:
> Hello,
> I am looking for information to accomplish confusion network combination with the SRILM toolkit. I want to use the lattices generated by different speech recognizers over the same speech segment. I haven't found a detailed description of the steps to follow, so here is what I'm doing right now:
> 1- Extract lattices from the various recognizers (With HTK in my case)
> 2- Take one of these lattices as a starting point and convert it into a confusion network MESH0 (lattice-tool -read-htk -in-lattice LATTICE0 -write-mesh MESH0). *I know -posterior-prune can be applied to the lattice before building the mesh for better results according to the "Finding consensus.." paper.
> 3- Then, take the next lattice (LATTICE1) and merge it with the previously generated mesh (lattice-tool -in-lattice LATTICE1 -init-mesh MESH0 -write-mesh MESH1).
> 4- And repeat the merging step (3) using the previous mesh to initialize the next lattice.

What you're doing works, but it is a roundabout way to perform confusion network combination, and you don't have control over the weighting of the posterior probabilities in each input CN.
A more straightforward approach is to dump the various CNs for each utterance, then combine them in one step using the command

    nbest-lattice -use-mesh -lattice-files FILE

where FILE contains a list of CN files and associated weights (see man page for details).

Andreas

From stolcke at icsi.berkeley.edu Tue Oct 1 10:11:26 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Tue, 01 Oct 2013 10:11:26 -0700
Subject: [SRILM User List] Lattice-tool and -word-posteriors-for-sentences
In-Reply-To: <524AD2D1.4080600@cis.hut.fi>
References: <524AD2D1.4080600@cis.hut.fi>
Message-ID: <524B023E.7000307@icsi.berkeley.edu>

On 10/1/2013 6:49 AM, André Mansikkaniemi wrote:
> Hi,
>
> Been trying to use the lattice-tool and the '-word-posteriors-for-sentences' option to calculate posterior probabilities for words in an ASR output hypothesis.
>
> So I have a lattice and hypothesis file to begin with, and run the following commands to generate the posterior probabilities.
>
> lattice-tool -read-htk -in-lattice test.lat -write-mesh test.mesh
> lattice-tool -read-mesh -in-lattice test.mesh -word-posteriors-for-sentences test.hyp > test.posteriors

Try

    lattice-tool -read-htk -in-lattice test.lat -word-posteriors-for-sentences test.hyp > test.posteriors

The -word-posteriors-for-sentences option triggers CN construction from the input lattice, and then aligns each line in test.hyp to that CN.

Andreas

From stolcke at icsi.berkeley.edu Tue Oct 1 10:24:41 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Tue, 01 Oct 2013 10:24:41 -0700
Subject: [SRILM User List] Count-lm reference request
In-Reply-To: <8D08C80B82D9757-1094-339A7@webmail-d268.sysops.aol.com>
References: <8D08C80B82D9757-1094-339A7@webmail-d268.sysops.aol.com>
Message-ID: <524B0559.1080200@icsi.berkeley.edu>

On 9/30/2013 10:46 PM, E wrote:
> Hello,
>
> I'm trying to understand the meaning of the "google.count.lm0" file as given in the FAQ section on creating an LM from the Web1T corpus.
> From what I read in Sec. 11.4.1, "Deleted Interpolation Smoothing", in Spoken Language Processing by Huang et al. (equation 11.22), the bigram case is
>
> P(w_i | w_{i-1}) = \lambda * P_{MLE}(w_i | w_{i-1}) + (1 - \lambda) * P(w_i)
>
> They call the \lambda's the mixture weights. I wonder if they are conceptually the same as the ones used in google.countlm. If so, why are they arranged in a 15x5 matrix? Where can I read more about the same?

I don't have access to the book chapter you cite, but from the equation it looks like a single fixed interpolation weight is used. In the SRILM count-lm implementation you have separate lambdas assigned to different groups of context ngrams, as a function of the frequency of those contexts. This is what is called "Jelinek-Mercer" smoothing in http://acl.ldc.upenn.edu/P/P96/P96-1041.pdf , where the bucketing of the contexts is done based on frequency (as suggested in the paper). The specifics are spelled out in the ngram(1) man page. The relevant bits are:

    mixweights M
     w01 w02 ... w0N
     w11 w12 ... w1N
     ...
     wM1 wM2 ... wMN

    countmodulus m

M specifies the number of mixture weight bins (minus 1). m is the width of a mixture weight bin. Thus, wij is the mixture weight used to interpolate a j-th order maximum-likelihood estimate with lower-order estimates, given that the (j-1)-gram context has been seen with a frequency between i*m and (i+1)*m-1 times. (For contexts with frequency greater than M*m, the i=M weights are used.)

Andreas
-------------- next part --------------
An HTML attachment was scrubbed...
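The bucketing scheme described above can be sketched in a few lines of Python. This is an illustration only, not SRILM code; it collapses the weight matrix to a single ngram order (SRILM keeps one weight per (bin, order) pair), and all weights and counts below are invented examples:

```python
# Frequency-bucketed Jelinek-Mercer interpolation, for one ngram order.
# Illustration only, not SRILM's implementation; numbers are made up.

def bucket(context_count, m, M):
    """Weight bin for a context seen context_count times: bin i covers
    counts i*m .. (i+1)*m - 1, and counts >= M*m fall into bin M."""
    return min(context_count // m, M)

def interpolate(p_mle, p_lower, context_count, weights, m):
    """P(w|h) = lam * P_MLE(w|h) + (1 - lam) * P_lower(w), where lam is
    picked from `weights` (indexed 0..M) by the context frequency bin."""
    M = len(weights) - 1
    lam = weights[bucket(context_count, m, M)]
    return lam * p_mle + (1 - lam) * p_lower

weights = [0.1, 0.5, 0.9]   # hypothetical tuned lambdas, so M = 2
m = 40                      # bin width, as in 'countmodulus 40'
p = interpolate(p_mle=0.25, p_lower=0.01, context_count=85,
                weights=weights, m=m)
print(round(p, 6))          # context seen 85 times -> bin 2, lam = 0.9
```

Frequent contexts thus trust the maximum-likelihood estimate more (larger lambda), while rare contexts fall back to the lower-order distribution.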
URL: 

From stolcke at icsi.berkeley.edu Tue Oct 1 21:20:15 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Tue, 01 Oct 2013 21:20:15 -0700
Subject: [SRILM User List] 1-count Higher order ngrams not excluded by gtmin
In-Reply-To: 
References: 
Message-ID: <524B9EFF.9050307@icsi.berkeley.edu>

On 9/28/2013 12:21 AM, Mohammed Mediani wrote:
> Dear Andreas,
> I noticed that when I train a 6-gram KN LM, I get some 1-count ngrams which are not prefixes of any higher-order ngrams in the 4- and 3-gram models. Are those another exception besides the one stated in Warning 4 (http://www.speech.sri.com/projects/srilm/manpages/ngram-discount.7.html)?

SRILM always includes unigrams for all words in the LM vocabulary. This is to make up for some limitations of the ARPA format: it does not allow a separate definition of what the LM vocabulary is, so the vocabulary is implicitly defined by the unigram list. Also, there is no way to specify a backoff to "zero-grams" (the uniform distribution), so unigram probabilities for all words (whether observed in the training set or not) are given explicitly.

Andreas

From otheremailid at aol.com Wed Oct 2 01:16:03 2013
From: otheremailid at aol.com (E)
Date: Wed, 2 Oct 2013 04:16:03 -0400 (EDT)
Subject: [SRILM User List] Count-lm reference request
In-Reply-To: <524B0559.1080200@icsi.berkeley.edu>
References: <8D08C80B82D9757-1094-339A7@webmail-d268.sysops.aol.com> <524B0559.1080200@icsi.berkeley.edu>
Message-ID: <8D08D5EC52F045F-1094-3E82E@webmail-d268.sysops.aol.com>

Thanks for the pointers! Three questions:

1. The same number of bins is used for all n-grams even though the number of ngrams for each N may differ. In Web1T:
Number of unigrams: 13,588,391
Number of fivegrams: 1,176,470,663
Would it make any improvement if fivegrams were binned more finely than unigrams?

2. For a particular ngram in the test data, the algorithm will decide which bin of Wij's to use based on how many times that n-gram occurred in the training data. Is this right?

3.
What does it mean when some weights are zero after tuning them. I used just 10 sentences (5 repeated) in tune.txt and got google.countlm as at the bottom. For ex. w01, w02 are non-zero but w03 is zero. Does this mean that in the development set, there were no trigrams that corresponded to counts in bin 0? order 5 mixweights 15 0.5 0.5 0 0 0 0.5 0.5 0 0 0 0.5 0.5 0 0 0 0.5 0.5 0.5 0.5 0.198641 0.5 0.5 0 0 0 0.5 0.5 0.5 0 0.5 0.5 0.5 0.5 0.5 0 0.5 0.5 0.5 0 0.5 0.5 0.5 0.5 0.5 0 0.5 0.5 0 0 0.5 0.5 0.5 0.054722 0 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0 0.5 1 1.97997e-05 0.0844577 0.030065 3.44131e-06 countmodulus 40 vocabsize 13588391 totalcount 4294967295 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Wed Oct 2 08:55:51 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 02 Oct 2013 08:55:51 -0700 Subject: [SRILM User List] Count-lm reference request In-Reply-To: <8D08D5EC52F045F-1094-3E82E@webmail-d268.sysops.aol.com> References: <8D08C80B82D9757-1094-339A7@webmail-d268.sysops.aol.com> <524B0559.1080200@icsi.berkeley.edu> <8D08D5EC52F045F-1094-3E82E@webmail-d268.sysops.aol.com> Message-ID: <524C4207.8070209@icsi.berkeley.edu> On 10/2/2013 1:16 AM, E wrote: > Thanks for the pointers! Three questions - > > 1. The same number of bins are used for all n-grams even though number > of ngrams for each N may differ. In web1T, > Number of unigrams: 13,588,391 > Number of fivegrams: 1,176,470,663 > Would it make any improvement if fivegrams were binned more number of > times than unigrams? That's good idea, but I haven't tried it, so I cannot say how much it would help. It might also help to just have more bins for lower-order ngrams since there are more samples of them (more data, hence more parameters can be estimated). > > 2. 
For a particular ngram in the test data, the algorithm will decide which bin of Wij's to use based on how many times that n-gram occurred in the training data. Is this right?

Right.

> 3. What does it mean when some weights are zero after tuning them? I used just 10 sentences (5 repeated) in tune.txt and got google.countlm as at the bottom.
>
> For example, w01 and w02 are non-zero but w03 is zero. Does this mean that in the development set, there were no trigrams that corresponded to counts in bin 0?

Correct.

Andreas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From otheremailid at aol.com Wed Oct 9 02:05:55 2013
From: otheremailid at aol.com (E)
Date: Wed, 9 Oct 2013 05:05:55 -0400 (EDT)
Subject: [SRILM User List] ngram-count hangs and other problems
Message-ID: <8D092E5E574EFD0-1114-1F1@webmail-m231.sysops.aol.com>

Hello,

Please find my files here: http://goo.gl/WVMEcw (to keep the file size small I've only shared unigram counts). When I run the following command

    ngram-count -order 1 -vocab wordList -read ngramCounts -lm ug.lm

I get the output below:

    warning: no singleton counts
    GT discounting disabled
    BOW numerator for context "" is -126.947 < 0

I understand that the "singleton" warning is because there are no ngrams that occur only once. Still, the "ug.lm" file is generated. Two issues:

1. If I use the following command, suggested elsewhere on the mailing list, to fix the "BOW numerator ..." warning, I get more warnings and the original warning is still present.

    ngram -lm ug.lm -renorm -write-lm ug_norm.lm

2. If, to fix the "singleton" warning, I use Witten-Bell smoothing (as advised in another thread here), ngram-count hangs indefinitely.

    ngram-count -order 1 -vocab wordList -read ngramCounts -lm ug.lm -wbdiscount1

How do I debug this issue?
-------------- next part --------------
An HTML attachment was scrubbed...
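For background on the -wbdiscount option mentioned above: Witten-Bell smoothing reserves probability mass for unseen events in proportion to the number of distinct observed types. Here is a simplified unigram sketch of that idea (not SRILM's implementation; the counts and vocabulary are invented):

```python
# Simplified Witten-Bell unigram smoothing (an illustration of the idea
# behind -wbdiscount, not SRILM's implementation). A fraction N/(N+T) of
# the probability mass goes to the observed counts and T/(N+T) is backed
# off to the uniform "zerogram" distribution, where N is the total token
# count and T the number of distinct observed words.

def witten_bell_unigram(counts, vocab):
    N = sum(counts.values())   # total observed tokens
    T = len(counts)            # distinct observed words
    V = len(vocab)             # vocabulary size (may include unseen words)
    uniform = 1.0 / V          # uniform backoff distribution
    return {w: (counts.get(w, 0) + T * uniform) / (N + T) for w in vocab}

counts = {"the": 5, "cat": 2, "sat": 1}   # made-up counts
vocab = ["the", "cat", "sat", "mat"]      # "mat" is unseen but in the vocab
probs = witten_bell_unigram(counts, vocab)
print(round(sum(probs.values()), 10))     # a proper distribution: 1.0
```

Unlike Good-Turing discounting, this needs no singleton counts, which is why it is the usual suggestion for count sets like Web1T where low-count ngrams have been pruned away.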
URL: 

From otheremailid at aol.com Wed Oct 9 07:31:43 2013
From: otheremailid at aol.com (E)
Date: Wed, 9 Oct 2013 10:31:43 -0400 (EDT)
Subject: [SRILM User List] ngram-count hangs and other problems
In-Reply-To: <8D092E5E574EFD0-1114-1F1@webmail-m231.sysops.aol.com>
References: <8D092E5E574EFD0-1114-1F1@webmail-m231.sysops.aol.com>
Message-ID: <8D0931368982F22-1114-246A@webmail-m231.sysops.aol.com>

Perhaps the ngramCount file I used exceeds some limit on the count of a particular ngram, because some words with very large counts have positive log probabilities in the "ug.lm" file. BTW, I used the bin/i686/ngram-count executable. I used Web1T to obtain these counts. Is there a workaround, like assigning artificial counts (= an upper limit) to the troublesome ngrams?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stolcke at icsi.berkeley.edu Wed Oct 9 09:43:37 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Wed, 09 Oct 2013 09:43:37 -0700
Subject: [SRILM User List] ngram-count hangs and other problems
In-Reply-To: <8D0931368982F22-1114-246A@webmail-m231.sysops.aol.com>
References: <8D092E5E574EFD0-1114-1F1@webmail-m231.sysops.aol.com> <8D0931368982F22-1114-246A@webmail-m231.sysops.aol.com>
Message-ID: <525587B9.8000607@icsi.berkeley.edu>

On 10/9/2013 7:31 AM, E wrote:
> Perhaps the ngramCount file I used crosses some limit on count of a particular ngram. Because some very large count words have positive log probability in the "ug.lm" file. BTW I used bin/i686/ngram-count executable.
> I used Web1T to obtain these counts. Is there a workaround, like assigning artificial counts (= upperlimit) to the troublesome ngrams?

My suspicion is that you're exceeding memory limits with this data. Possibly you are also exceeding the range of 32-bit integers with some large unigram counts.

1) Make sure you're building 64-bit executables.
If "file bin/i686/ngram-count" says that it's a 32-bit executable, do a "make clean" and rebuild with "make MACHINE_TYPE=i686-m64 ...".

2) To find out what the memory demand of your job is, try scaling back the data size (say, take 1/100 or 1/10 of it), and monitor the memory usage with "top" or a similar utility. Then extrapolate (linearly) to the full data size.

3) If you find your computer doesn't have enough memory, try the memory-saving techniques discussed at http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html under "Large data and memory issues".

Good luck!

Andreas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rimlaatar at yahoo.fr Thu Oct 10 01:45:24 2013
From: rimlaatar at yahoo.fr (Laatar Rim)
Date: Thu, 10 Oct 2013 09:45:24 +0100 (BST)
Subject: [SRILM User List] installation srilm
Message-ID: <1381394724.72897.YahooMailNeo@web133001.mail.ir2.yahoo.com>

Hello,

I tried to install SRILM on my Ubuntu 12.04 i686 machine. I followed these steps:
1. I downloaded the file.
2. I decompressed it.
3. I edited the file Makefile.machine.i686:

    # Tcl support (standard in Linux)
    TCL_INCLUDE = /usr/include/tcl8.5
    TCL_LIBRARY = -ltcl8.5

and

    GCC_FLAGS = -mtune=pentium3 -Wall -Wno-unused-variable -Wno-uninitialized
    CC = /usr/bin/gcc $(GCC_FLAGS)
    CXX = /usr/bin/gcc/g++ $(GCC_FLAGS)

but when I execute the command "make World" it shows me the following error:

    hp at ubuntu:~/SRILM/srilm$ make World
    make: /home/hp/SRILM/srilm/bin/i686/sbin/machine-type : commande introuvable [command not found]
    Makefile:13: /home/hp/SRILM/srilm/bin/i686/common/Makefile.common.variables: Aucun fichier ou dossier de ce type [No such file or directory]
    make: *** Pas de règle pour fabriquer la cible « /home/hp/SRILM/srilm/bin/i686/common/Makefile.common.variables ». Arrêt. [No rule to make target ... Stop.]
    hp at ubuntu:~/SRILM/srilm$

Please help me!

----
Cordialement,
Rim LAATAR
Ingénieur
Informatique, de l'École Nationale d'Ingénieurs de Sfax (ENIS)
Étudiante en mastère de recherche, Système d'Information & Nouvelles Technologies à la FSEGS -- Option TALN
Site web: Rim LAATAR BEN SAID
Tel: (+216) 99 64 74 98
----
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From otheremailid at aol.com Thu Oct 10 05:37:34 2013
From: otheremailid at aol.com (E)
Date: Thu, 10 Oct 2013 08:37:34 -0400 (EDT)
Subject: [SRILM User List] ngram-count hangs and other problems
Message-ID: <8D093CCA0EFA02B-1114-B92D@webmail-m231.sysops.aol.com>

Thanks!

> 1) Make sure you're building 64-bit executables. If "file bin/i686/ngram-count" says that it's a 32-bit executable, do a "make clean" and rebuild with "make MACHINE_TYPE=i686-m64 ...".

This worked. I had to use "make OPTION=_l" though. Now there is no problem of ngrams with positive log probability. But when I run the command below

    bin/i686_l/ngram-count -order 1 -vocab wordList -read ngramCounts -lm ug.lm -wbdiscount1

the memory usage is not much (~5 MB) but the CPU usage is in the high 90s. I tried your suggestion to scale down the data: with just 100 unigrams the *.lm file was created within minutes, and for the complete data, using -wbdiscount took about 2 hours.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stolcke at icsi.berkeley.edu Thu Oct 10 07:40:56 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Thu, 10 Oct 2013 07:40:56 -0700
Subject: [SRILM User List] installation srilm
In-Reply-To: <1381394724.72897.YahooMailNeo@web133001.mail.ir2.yahoo.com>
References: <1381394724.72897.YahooMailNeo@web133001.mail.ir2.yahoo.com>
Message-ID: <5256BC78.2050000@icsi.berkeley.edu>

On 10/10/2013 1:45 AM, Laatar Rim wrote:
> Hello,
> I tried to install SRILM on my Ubuntu 12.04 i686 machine, I followed the following steps:
> 1. I downloaded the file
> 2. I decompressed it
> 3.
I edited the file Makefile.machine.i686:
>
> # Tcl support (standard in Linux)
> TCL_INCLUDE = /usr/include/tcl8.5

This needs to be -I/usr/include/tcl8.5 . Make sure there is a tcl.h file in that directory.

> TCL_LIBRARY = -ltcl8.5
>
> and
>
> GCC_FLAGS = -mtune=pentium3 -Wall -Wno-unused-variable -Wno-uninitialized
> CC = /usr/bin/gcc $(GCC_FLAGS)
> CXX = /usr/bin/gcc/g++ $(GCC_FLAGS)
>
> but when I execute the command make World it shows me the following error:
>
> hp at ubuntu:~/SRILM/srilm$ make World
> make: /home/hp/SRILM/srilm/bin/i686/sbin/machine-type : commande introuvable

The SRILM variable needs to point to the top of the directory tree, not the bin/i686 directory. The machine-type script lives in $SRILM/sbin .

Andreas

> Makefile:13: /home/hp/SRILM/srilm/bin/i686/common/Makefile.common.variables: Aucun fichier ou dossier de ce type
> make: *** Pas de règle pour fabriquer la cible « /home/hp/SRILM/srilm/bin/i686/common/Makefile.common.variables ». Arrêt.
> hp at ubuntu:~/SRILM/srilm$

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stolcke at icsi.berkeley.edu Thu Oct 10 07:51:14 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Thu, 10 Oct 2013 07:51:14 -0700
Subject: [SRILM User List] ngram-count hangs and other problems
In-Reply-To: <8D093CCA0EFA02B-1114-B92D@webmail-m231.sysops.aol.com>
References: <8D093CCA0EFA02B-1114-B92D@webmail-m231.sysops.aol.com>
Message-ID: <5256BEE2.5080504@icsi.berkeley.edu>

On 10/10/2013 5:37 AM, E wrote:
>
> Thanks!
>
> > 1) Make sure you're building 64-bit executables. If "file bin/i686/ngram-count" says that it's a 32-bit executable, do a "make clean" and rebuild with "make MACHINE_TYPE=i686-m64 ...".
>
> This worked. I had to use "make OPTION=_l" though. Now there is no problem of ngrams with positive log probability.
FYI, OPTION=_l triggers the use of 64-bit integer counts stored in a lookup table so that each instance takes only 32 bits (assuming the counts are used sparsely). This is a way to support large counts on 32-bit machines, but it doesn't really make sense on 64-bit machines.

Andreas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From xulikui123321 at 163.com Mon Oct 14 18:22:08 2013
From: xulikui123321 at 163.com (=?GBK?B?0Ow=?=)
Date: Tue, 15 Oct 2013 09:22:08 +0800 (CST)
Subject: [SRILM User List] ngam stdin/stdout
Message-ID: <748c760b.1f52.141b9b4b0e6.Coremail.xulikui123321@163.com>

When computing perplexity with ngram, the usage is: ngram -lm language.lm -order 4 -ppl test.txt. But now I want to compute perplexity with ngram from stdin; what command should I use?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From venkataraman.anand at gmail.com Mon Oct 14 18:29:52 2013
From: venkataraman.anand at gmail.com (Anand Venkataraman)
Date: Mon, 14 Oct 2013 18:29:52 -0700
Subject: [SRILM User List] ngam stdin/stdout
In-Reply-To: <748c760b.1f52.141b9b4b0e6.Coremail.xulikui123321@163.com>
References: <748c760b.1f52.141b9b4b0e6.Coremail.xulikui123321@163.com>
Message-ID: 

Pipe into the command and use - (hyphen) for the arg to -ppl

&
--
Sent from my Google Nexus

On Oct 14, 2013 6:26 PM, "?" wrote:
> when compute perplexity with ngram, the usage is ngram -lm language.lm -order 4 -ppl test.txt.
> but now I want compute perplexity with ngram from stdin, what's the command should i use?
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From venkataraman.anand at gmail.com Mon Oct 14 21:26:16 2013 From: venkataraman.anand at gmail.com (Anand Venkataraman) Date: Mon, 14 Oct 2013 21:26:16 -0700 Subject: [SRILM User List] ngam stdin/stdout In-Reply-To: <32a5560f.74ec.141ba375c87.Coremail.xulikui123321@163.com> References: <748c760b.1f52.141b9b4b0e6.Coremail.xulikui123321@163.com> <32a5560f.74ec.141ba375c87.Coremail.xulikui123321@163.com> Message-ID: bash$ echo $SENTENCE | ngram ... -ppl - However, if you're planning to do this on a per-sentence basis, it's inefficient. You should ideally compute it on whole files using ngram -debug 1 and post-process the output to extract ppls for individual sentences. That way you can get away with not having to invoke ngram/load the lm multiple times. & On Mon, Oct 14, 2013 at 8:44 PM, ? wrote: > thank you very much, the problem is solved! if I want to computer > perplexity of sentence, not a file, like ngram -lm language.lm -order 4 > -ppl " I MISS YOU" ,what's the command should i use( i want to > compute perplexity with ngram on hadoop)? > > > > > > > At 2013-10-15 09:29:52,"Anand Venkataraman" > wrote: > > Pipe into the command and use - (hyphen) for the arg to -ppl > > & > -- > Sent from my Google Nexus > On Oct 14, 2013 6:26 PM, "?" wrote: > >> when compute perplexity with ngram, the usage is ngram -lm language.lm >> -order 4 -ppl test.txt. >> but now I want compute perplexity with ngram from stdin, what's the >> command should i use? >> >> >> >> _______________________________________________ >> SRILM-User site list >> SRILM-User at speech.sri.com >> http://www.speech.sri.com/mailman/listinfo/srilm-user >> > > > -------------- next part -------------- An HTML attachment was scrubbed... 
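The post-processing step suggested above can be sketched as follows. The sample text mimics the per-sentence blocks that `ngram -debug 1 -ppl` prints (the sentence, a counts line, then a logprob/ppl line, with a whole-file summary at the end); the sentences and numbers here are invented for illustration:

```python
import re

# Sketch of post-processing `ngram -debug 1 -ppl` output to recover one
# perplexity per sentence. The sample mimics SRILM's ppl output format;
# the sentences and numbers are invented.

sample_output = """\
this is a test
1 sentences, 4 words, 0 OOVs
0 zeroprobs, logprob= -8.1 ppl= 107.5 ppl1= 321.2

another test sentence
1 sentences, 3 words, 0 OOVs
0 zeroprobs, logprob= -6.2 ppl= 98.3 ppl1= 201.4

file test.txt: 2 sentences, 7 words, 0 OOVs
0 zeroprobs, logprob= -14.3 ppl= 103.1 ppl1= 254.9
"""

def sentence_ppls(ppl_output):
    """Return per-sentence ppl values, skipping the trailing whole-file
    summary (the block introduced by a line starting with 'file ')."""
    ppls = []
    in_file_summary = False
    for line in ppl_output.splitlines():
        if line.startswith("file "):
            in_file_summary = True
        m = re.search(r"\bppl=\s*(\S+)", line)
        if m and not in_file_summary:
            ppls.append(float(m.group(1)))
    return ppls

print(sentence_ppls(sample_output))  # [107.5, 98.3]
```

With this approach the LM is loaded once for the whole file, and each output block is matched back to its input sentence by position (or by -escape markers, as Andreas notes below in this thread's archive).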
URL: 

From stolcke at icsi.berkeley.edu Mon Oct 14 21:42:44 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Mon, 14 Oct 2013 21:42:44 -0700
Subject: [SRILM User List] ngam stdin/stdout
In-Reply-To: 
References: <748c760b.1f52.141b9b4b0e6.Coremail.xulikui123321@163.com> <32a5560f.74ec.141ba375c87.Coremail.xulikui123321@163.com>
Message-ID: <525CC7C4.4070809@icsi.berkeley.edu>

On 10/14/2013 9:26 PM, Anand Venkataraman wrote:
> bash$ echo $SENTENCE | ngram ... -ppl -
>
> However, if you're planning to do this on a per-sentence basis, it's inefficient.
>
> You should ideally compute it on whole files using ngram -debug 1 and post-process the output to extract ppls for individual sentences. That way you can get away with not having to invoke ngram/load the lm multiple times.

FYI, the ngram -escape option was created to embed useful metainformation in the input that is passed through to the output. This allows you to post-process the output and associate the ppl information with subdivisions of the input stream, if needed.

Andreas

> &
>
> On Mon, Oct 14, 2013 at 8:44 PM, ? wrote:
>> thank you very much, the problem is solved! if I want to compute the perplexity of a sentence, not a file, like ngram -lm language.lm -order 4 -ppl "I MISS YOU", what command should I use? (I want to compute perplexity with ngram on hadoop)
>>
>> At 2013-10-15 09:29:52, "Anand Venkataraman" wrote:
>>> Pipe into the command and use - (hyphen) for the arg to -ppl
>>>
>>> &
>>> --
>>> Sent from my Google Nexus
>>> On Oct 14, 2013 6:26 PM, "?" wrote:
>>>> when compute perplexity with ngram, the usage is ngram -lm language.lm -order 4 -ppl test.txt.
>>>> but now I want compute perplexity with ngram from stdin, what's the command should i use?
>>>> _______________________________________________
>>>> SRILM-User site list
>>>> SRILM-User at speech.sri.com
>>>> http://www.speech.sri.com/mailman/listinfo/srilm-user
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From xulikui123321 at 163.com Tue Oct 15 03:59:39 2013
From: xulikui123321 at 163.com (=?GBK?B?0Ow=?=)
Date: Tue, 15 Oct 2013 18:59:39 +0800 (CST)
Subject: [SRILM User List] use ngram computer perplexity of per line string on hadoop
Message-ID: <5a51f5b2.114d9.141bbc56c35.Coremail.xulikui123321@163.com>

I have solved the problem of computing the perplexity of each line with ngram, as follows (Perl script):

    foreach my $line (@lines) {
        $str = `echo $line | ngram -lm news.lm -ppl - -debug 1`;
        print $str . "\n";
    }

But if the language model is too big, I have to load the language model for every line, which is a waste of time. Is there some method so that I only have to load the language model once, like replacing the LM file (news.lm) with a file pointer?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From christophe.servan at gmail.com Tue Oct 15 05:56:48 2013
From: christophe.servan at gmail.com (Christophe Servan)
Date: Tue, 15 Oct 2013 14:56:48 +0200
Subject: [SRILM User List] use ngram computer perplexity of per line string on hadoop
In-Reply-To: <5a51f5b2.114d9.141bbc56c35.Coremail.xulikui123321@163.com>
References: <5a51f5b2.114d9.141bbc56c35.Coremail.xulikui123321@163.com>
Message-ID: 

Hi,

Long ago I used the ngram program as a server. It is related to the switch -server-port. This may be a solution.

Best,

Christophe

2013/10/15 ?
> I have solved the problem of computing the perplexity of each line with ngram, as follows (Perl script):
> foreach my $line (@lines) {
>     $str = `echo $line | ngram -lm news.lm -ppl - -debug 1`;
>     print $str . "\n";
> }
> But if the language model is too big, I have to load the language model for every line, which is a waste of time. Is there some method so that I only have to load the language model once, like replacing the LM file (news.lm) with a file pointer?
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From cristinaguerreroflores at gmail.com Wed Oct 16 06:35:57 2013
From: cristinaguerreroflores at gmail.com (Cristina Guerrero)
Date: Wed, 16 Oct 2013 15:35:57 +0200
Subject: [SRILM User List] Hypothesis from a mesh
Message-ID: 

I'm working with confusion networks/sausages. Observing the posteriors in each 'align', it seems that the command

    lattice-tool -read-mesh -in-lattice MY.MESH -viterbi-decode

extracts the hypothesis with the lowest WER (i.e., the one with the highest posteriors per alignment). Is this correct? From my observation, using "-posterior-decode" doesn't extract the best hypothesis out of a mesh.

Cristina
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stolcke at icsi.berkeley.edu Wed Oct 16 09:28:22 2013
From: stolcke at icsi.berkeley.edu (Andreas Stolcke)
Date: Wed, 16 Oct 2013 09:28:22 -0700
Subject: [SRILM User List] Hypothesis from a mesh
In-Reply-To: 
References: 
Message-ID: <525EBEA6.8030906@icsi.berkeley.edu>

On 10/16/2013 6:35 AM, Cristina Guerrero wrote:
> I'm working with confusion networks/sausages.
> Observing the posteriors in each 'align', it seems that the command:
> lattice-tool -read-mesh -in-lattice MY.MESH -viterbi-decode
> extracts the hypothesis with the lowest WER (i.e., the one with the highest posteriors per alignment).
Is this correct? From my observation, using "-posterior-decode" doesn't extract the best hypothesis out of a mesh.

Posterior decoding will extract the words with the highest posterior probability estimates for each alignment position. Of course these are ESTIMATES, and even if you had accurate posterior probabilities, the actual words spoken could be different. That's the nature of a probabilistic classifier!

Andreas

From prashant.mathur at xrce.xerox.com Fri Oct 18 08:38:50 2013
From: prashant.mathur at xrce.xerox.com (MATHUR, Prashant)
Date: Fri, 18 Oct 2013 15:38:50 +0000
Subject: [SRILM User List] linear interpolation of LM
Message-ID: <6507F4CC05459348A4F0D2F41256C11E2B7DBB3F@engins.xrce.xeroxlabs.com>

Hi,

I wanted to know how to do linear interpolation of several models given their weights. Also, can I interpolate more than 10 models at once?

I have tried several things so far; nothing seems to work for me.

    $ngram -lm small.lm.1 -lambda 0.7 -mix-lm big.lm.1 -unk

It doesn't throw any output or error.

When I try the -write option

    $ngram -lm small.lm.1 -lambda 0.7 -mix-lm big.lm.1 -unk -write-lm mixed.lm.1
    write() method not implemented
    error writing mixed.lm.1

but when I try

    $ngram -lm small.lm.1 -lambda 0.7 -mix-lm2 big.lm.1 -mix-lambda2 0.3 -unk -write-lm mixed.lm.1

then the mixed.lm.1 file is the same as small.lm.1.

My SRILM version is 1.5.3. I read that there are many ways of interpolation, such as count-based and log-linear interpolation. I tried the options -count-lm (throws a format error) and -loglinear-mix (didn't do anything).

I am out of options. Please help!

Thanks,
--
Prashant
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From stolcke at icsi.berkeley.edu Fri Oct 18 16:40:24 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Fri, 18 Oct 2013 16:40:24 -0700 Subject: [SRILM User List] linear interpolation of LM In-Reply-To: <6507F4CC05459348A4F0D2F41256C11E2B7DBB3F@engins.xrce.xeroxlabs.com> References: <6507F4CC05459348A4F0D2F41256C11E2B7DBB3F@engins.xrce.xeroxlabs.com> Message-ID: <5261C6E8.7090700@icsi.berkeley.edu> On 10/18/2013 8:38 AM, MATHUR, Prashant wrote: > Hi, > > I wanted to know how do I do linear interpolation of several models > given their weights? > Also, can I interpolated more than 10 models at once? > > I tried several hit/trials so far. Nothing seems to work for me. > > $ngram -lm small.lm.1 -lambda 0.7 -mix-lm big.lm.1 -unk > It doesn't throw any output or error. > > when I try -write option > $ngram -lm small.lm.1 -lambda 0.7 -mix-lm big.lm.1 -unk -write-lm > mixed.lm.1 > write() method not implemented > error writing mixed.lm.1 The above command should work, unless you are not giving the complete command line in your example (if you add the -bayes option then you will see the "not implemented" error). > > but when I try > $ngram -lm small.lm.1 -lambda 0.7 -mix-lm2 big.lm.1 -mix-lambda2 0.3 > -unk -write-lm mixed.lm.1 > Then the mixed.lm.1 file is the same as small.lm.1 > > My SRILM version is 1.5.3. > I read that there are many ways of interpolation such as count-based > and log-linear interpolation. > I tried the options -count-lm (throws a format error), -loglinear-mix > (didn't do anything) > > I am out of options. Please help! You are using a very old version of SRILM. Please get the latest stable version (1.7.0). If you want to try the current beta version (1.7.1) you will find a new option (ngram -read-mix-lms) that allows you to specify the mixture component LMs in a separate file, and also allows an arbitrary number of components. Andreas -------------- next part -------------- An HTML attachment was scrubbed... 
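Conceptually, the static interpolation that `ngram -mix-lm` performs is a per-ngram weighted sum of the component models' probabilities. Here is a sketch of just that arithmetic (an illustration only; actually writing out a merged backoff model, as SRILM does with -write-lm, additionally requires recomputing backoff weights, and the probabilities below are invented):

```python
# Linear interpolation of language model probabilities: the mixture
# probability of a word in context is a weighted sum of the component
# model probabilities, with weights summing to 1. Illustration only;
# the component probabilities below are made up.

def mix_prob(probs, lambdas):
    """probs: P_i(w | h) from each component LM; lambdas: their weights."""
    assert abs(sum(lambdas) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(lam * p for lam, p in zip(lambdas, probs))

# Hypothetical probabilities of the same word under two LMs, combined
# with lambda = 0.7 as in the command-line example above.
p = mix_prob([0.02, 0.005], [0.7, 0.3])
print(round(p, 6))  # 0.7*0.02 + 0.3*0.005 = 0.0155
```

The same formula extends to any number of components, which is what the -read-mix-lms option mentioned below makes convenient for more than a handful of models.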
URL: From rimlaatar at yahoo.fr Fri Oct 25 06:17:59 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Fri, 25 Oct 2013 14:17:59 +0100 (BST) Subject: [SRILM User List] installation srilm Message-ID: <1382707079.14682.YahooMailNeo@web133002.mail.ir2.yahoo.com> when i run: ??? make World > make.output 2>&1 the result : ?make: /sbin/machine-type : commande introuvable Makefile:13: /common/Makefile.common.variables: Aucun fichier ou dossier de ce type make: *** Pas de r?gle pour fabriquer la cible ? /common/Makefile.common.variables ?. Arr?t. can you help me plz !!! ---- Cordialement Rim LAATAR? Ing?nieur? Informatique, de l'?cole Nationale d?Ing?nieurs de Sfax(ENIS) ?tudiante en mast?re de recherche, Syst?me d'Information & Nouvelles Technologies ? laFSEGS?--Option TALN Site web:Rim LAATAR BEN SAID Tel: (+216) 99 64 74 98? ---- -------------- next part -------------- An HTML attachment was scrubbed... URL: From christophe.servan at gmail.com Fri Oct 25 06:24:52 2013 From: christophe.servan at gmail.com (Christophe Servan) Date: Fri, 25 Oct 2013 15:24:52 +0200 Subject: [SRILM User List] installation srilm In-Reply-To: <1382707079.14682.YahooMailNeo@web133002.mail.ir2.yahoo.com> References: <1382707079.14682.YahooMailNeo@web133002.mail.ir2.yahoo.com> Message-ID: Hi, you have to set the SRILM environment variable before launching the compilation process. Best, Christophe Le 25 octobre 2013 15:17, Laatar Rim a ?crit : > when i run: > > make World > make.output 2>&1 > > the result : > make: /sbin/machine-type : commande introuvable > Makefile:13: /common/Makefile.common.variables: Aucun fichier ou dossier > de ce type > make: *** Pas de r?gle pour fabriquer la cible ? > /common/Makefile.common.variables ?. Arr?t. > > can you help me plz !!! > ---- > Cordialement > > *Rim LAATAR * > Ing?nieur Informatique, de l'?cole Nationale d?Ing?nieurs de Sfax (ENIS > ) > ?tudiante en mast?re de recherche, Syst?me d'Information & Nouvelles > Technologies ? 
la FSEGS --Option TALN > Site web: Rim LAATAR BEN SAID > Tel: (+216) 99 64 74 98 > ---- > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christophe.servan at gmail.com Fri Oct 25 06:31:25 2013 From: christophe.servan at gmail.com (Christophe Servan) Date: Fri, 25 Oct 2013 15:31:25 +0200 Subject: [SRILM User List] installation srilm In-Reply-To: <1382707742.7358.YahooMailNeo@web133006.mail.ir2.yahoo.com> References: <1382707079.14682.YahooMailNeo@web133002.mail.ir2.yahoo.com> <1382707742.7358.YahooMailNeo@web133006.mail.ir2.yahoo.com> Message-ID: You have to set your SRILM variable like this: export SRILM=/usr/local/srilm-1.4.5 You don't have to add it to your path. Best, Christophe 2013/10/25 Laatar Rim > yes I executed these two commands: > export SRILM=/usr/local/srilm-1.4.5/bin/i686/ > export PATH=$PATH:$SRILM > > > ---- > Cordialement > > *Rim LAATAR * > Ingénieur Informatique, de l'École Nationale d'Ingénieurs de Sfax (ENIS) > Étudiante en mastère de recherche, Système d'Information & Nouvelles > Technologies à la FSEGS --Option TALN > Site web: Rim LAATAR BEN SAID > Tel: (+216) 99 64 74 98 > ---- > > > Le Vendredi 25 octobre 2013 13h24, Christophe Servan < > christophe.servan at gmail.com> a écrit : > Hi, > you have to set the SRILM environment variable before launching the > compilation process. > > Best, > > Christophe > > > Le 25 octobre 2013 15:17, Laatar Rim a écrit : > > when I run: > > make World > make.output 2>&1 > > the result is: > make: /sbin/machine-type : commande introuvable > Makefile:13: /common/Makefile.common.variables: Aucun fichier ou dossier > de ce type > make: *** Pas de règle pour fabriquer la cible « > /common/Makefile.common.variables ». Arrêt. > > can you help me please?
> ---- > Cordialement > > *Rim LAATAR * > Ingénieur Informatique, de l'École Nationale d'Ingénieurs de Sfax (ENIS) > Étudiante en mastère de recherche, Système d'Information & Nouvelles > Technologies à la FSEGS --Option TALN > Site web: Rim LAATAR BEN SAID > Tel: (+216) 99 64 74 98 > ---- > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rimlaatar at yahoo.fr Fri Oct 25 07:10:50 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Fri, 25 Oct 2013 15:10:50 +0100 (BST) Subject: [SRILM User List] role of make World Message-ID: <1382710250.46434.YahooMailNeo@web133006.mail.ir2.yahoo.com> Hi, what is the role of make World, and how can I know that SRILM is properly installed on my machine? Thanks ---- Cordialement Rim LAATAR Ingénieur Informatique, de l'École Nationale d'Ingénieurs de Sfax (ENIS) Étudiante en mastère de recherche, Système d'Information & Nouvelles Technologies à la FSEGS --Option TALN Site web: Rim LAATAR BEN SAID Tel: (+216) 99 64 74 98 ---- -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Fri Oct 25 09:22:35 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Fri, 25 Oct 2013 09:22:35 -0700 Subject: [SRILM User List] role of make World In-Reply-To: <1382710250.46434.YahooMailNeo@web133006.mail.ir2.yahoo.com> References: <1382710250.46434.YahooMailNeo@web133006.mail.ir2.yahoo.com> Message-ID: <526A9ACB.5080105@icsi.berkeley.edu> On 10/25/2013 7:10 AM, Laatar Rim wrote: > Hi, > what is the role of make World, and how can I know that SRILM is > properly installed on my machine? > Thanks make World builds the SRILM binary libraries and executables and some scripts from source files, and installs them in $SRILM/lib and $SRILM/bin.
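A minimal sketch of the environment setup that recurs throughout these installation threads; the install path is a hypothetical example, and MACHINE_TYPE is only approximated here with `uname -m` ($SRILM/sbin/machine-type is the authoritative way to obtain it):

```shell
# Hypothetical install location -- adjust to where you unpacked SRILM.
# SRILM must point at the top of the source tree, NOT at .../bin/i686.
export SRILM=/usr/local/srilm
MACHINE_TYPE=$(uname -m)        # rough stand-in for $SRILM/sbin/machine-type
export PATH="$PATH:$SRILM/bin/$MACHINE_TYPE"
echo "SRILM=$SRILM"
echo "binaries expected in $SRILM/bin/$MACHINE_TYPE"
# Build and run the test suite from the top directory:
#   cd "$SRILM" && make SRILM="$SRILM" World && make SRILM="$SRILM" test
```

Passing SRILM=... explicitly on the make command line, as Andreas suggests later in this digest, sidesteps any problems with the exported variable not reaching make.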
To verify that it works first see if $SRILM/bin/$MACHINE_TYPE is populated with executable files. ($MACHINE_TYPE is a string identifying your platform, like i686 for Intel-based Linux). make test will run a suite of tests of the SRILM tools and tell you if any unexpected results are found. Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From vkepuska at fit.edu Mon Oct 28 09:53:26 2013 From: vkepuska at fit.edu (Veton Kepuska) Date: Mon, 28 Oct 2013 16:53:26 +0000 Subject: [SRILM User List] My makefile fails in cygwin? Message-ID: <1D4DDF8036F9CD4ABBCF12DECB691B126E320961@EX10-BE1.fit.edu> Here is the error message: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> make release-libraries make[1]: Entering directory '/cygdrive/u/public_html/ece5527/srilm' for subdir in misc dstruct lm flm lattice utils; do \ (cd $subdir/src; make SRILM=/cygdrive/u/public_html/ece5527/srilm MACHIN E_TYPE=cygwin OPTION= MAKE_PIC= release-libraries) || exit 1; \ done make[2]: Entering directory '/cygdrive/u/public_html/ece5527/srilm/misc/src' g++ -Wall -Wno-unused-variable -Wno-uninitialized -DINSTANTIATE_TEMPLATES -I. -I../../include -c -g -O2 -o ../obj/cygwin/File.o File.cc In file included from File.cc:27:0: srilm_iconv.h:15:25: fatal error: iconv.h: No such file or directory # include_next ^ compilation terminated. /cygdrive/u/public_html/ece5527/srilm/common/Makefile.common.targets:93: recipe for target '../obj/cygwin/File.o' failed make[2]: *** [../obj/cygwin/File.o] Error 1 make[2]: Leaving directory '/cygdrive/u/public_html/ece5527/srilm/misc/src' Makefile:105: recipe for target 'release-libraries' failed make[1]: *** [release-libraries] Error 1 make[1]: Leaving directory '/cygdrive/u/public_html/ece5527/srilm' Makefile:54: recipe for target 'World' failed make: *** [World] Error 2 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Can someone help? 
Thanks --Veton [Google Groups] SmartPhoneE Visit this group The learning and knowledge that we have, is, at the most, but little compared with that of which we are ignorant. - Plato "Those that would give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." - Benjamin Franklin, A Historical Review of Pennsylvania, 1759 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Dr. Veton K?puska, Associate Professor ECE Department Florida Institute of Technology Olin Engineering Building 150 West University Blvd. Melbourne, FL 32901-6975 Tel. (321) 674-7183 Mob. (321) 759-3157 E-mail: vkepuska at fit.edu ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The information transmitted (including attachments) is covered by the Electronic Communications Privacy Act, 18 U.S.C. 2510-2521, is intended only for the person(s) or entity/entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient(s) is prohibited. If you received this in error, please contact the sender and delete the material from any computer. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 2676 bytes Desc: image001.gif URL: From stolcke at icsi.berkeley.edu Mon Oct 28 11:57:06 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 29 Oct 2013 02:57:06 +0800 Subject: [SRILM User List] My makefile fails in cygwin? In-Reply-To: <1D4DDF8036F9CD4ABBCF12DECB691B126E320961@EX10-BE1.fit.edu> References: <1D4DDF8036F9CD4ABBCF12DECB691B126E320961@EX10-BE1.fit.edu> Message-ID: <526EB382.90203@icsi.berkeley.edu> Your cygwin installation is missing the iconv package, it seems. 
Fire up the cygwin setup.exe , and when you get to the screen where you can modify what's to be installed, search for "iconv" and add it. Andreas On 10/29/2013 12:53 AM, Veton Kepuska wrote: > > Here is the error message: > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > make release-libraries > > make[1]: Entering directory '/cygdrive/u/public_html/ece5527/srilm' > > for subdir in misc dstruct lm flm lattice utils; do \ > > (cd $subdir/src; make > SRILM=/cygdrive/u/public_html/ece5527/srilm MACHIN > > E_TYPE=cygwin OPTION= MAKE_PIC= release-libraries) || exit 1; \ > > done > > make[2]: Entering directory > '/cygdrive/u/public_html/ece5527/srilm/misc/src' > > g++ -Wall -Wno-unused-variable -Wno-uninitialized > -DINSTANTIATE_TEMPLATES -I. > > -I../../include -c -g -O2 -o ../obj/cygwin/File.o File.cc > > In file included from File.cc:27:0: > > srilm_iconv.h:15:25: fatal error: iconv.h: No such file or directory > > # include_next > > ^ > > compilation terminated. > > /cygdrive/u/public_html/ece5527/srilm/common/Makefile.common.targets:93: > recipe > > for target '../obj/cygwin/File.o' failed > > make[2]: *** [../obj/cygwin/File.o] Error 1 > > make[2]: Leaving directory > '/cygdrive/u/public_html/ece5527/srilm/misc/src' > > Makefile:105: recipe for target 'release-libraries' failed > > make[1]: *** [release-libraries] Error 1 > > make[1]: Leaving directory '/cygdrive/u/public_html/ece5527/srilm' > > Makefile:54: recipe for target 'World' failed > > make: *** [World] Error 2 > > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< > > Can someone help? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vkepuska at fit.edu Mon Oct 28 13:53:47 2013 From: vkepuska at fit.edu (Veton Kepuska) Date: Mon, 28 Oct 2013 20:53:47 +0000 Subject: [SRILM User List] My makefile fails in cygwin? 
In-Reply-To: <526EB382.90203@icsi.berkeley.edu> References: <1D4DDF8036F9CD4ABBCF12DECB691B126E320961@EX10-BE1.fit.edu> <526EB382.90203@icsi.berkeley.edu> Message-ID: <1D4DDF8036F9CD4ABBCF12DECB691B126E321B01@EX10-BE1.fit.edu> Andreas, Thank you very much for your information. I did that but I am getting this error message which hinders my installation even thought I did include (serveral times) the stddef package in Cygwin. >>>>>>>>>>>>>>>>>>>>> make[1]: Entering directory '/cygdrive/u/public_html/ece5527/srilm' for subdir in misc dstruct lm flm lattice utils; do \ (cd $subdir/src; make SRILM=/cygdrive/u/public_html/ece5527/srilm MACHIN E_TYPE=cygwin OPTION= MAKE_PIC= release-libraries) || exit 1; \ done make[2]: Entering directory '/cygdrive/u/public_html/ece5527/srilm/misc/src' g++ -Wall -Wno-unused-variable -Wno-uninitialized -DINSTANTIATE_TEMPLATES -I. -I../../include -c -g -O2 -o ../obj/cygwin/File.o File.cc In file included from /usr/include/sys/reent.h:14:0, from /usr/include/string.h:11, from File.cc:12: /usr/include/sys/_types.h:72:20: fatal error: stddef.h: No such file or director y #include ^ compilation terminated. /cygdrive/u/public_html/ece5527/srilm/common/Makefile.common.targets:93: recipe for target '../obj/cygwin/File.o' failed make[2]: *** [../obj/cygwin/File.o] Error 1 make[2]: Leaving directory '/cygdrive/u/public_html/ece5527/srilm/misc/src' Makefile:106: recipe for target 'release-libraries' failed make[1]: *** [release-libraries] Error 1 make[1]: Leaving directory '/cygdrive/u/public_html/ece5527/srilm' Makefile:55: recipe for target 'World' failed make: *** [World] Error 2 <<<<<<<<<<<<<<<<<<<< Thanks --Veton From: Andreas Stolcke [mailto:stolcke at icsi.berkeley.edu] Sent: Monday, October 28, 2013 2:57 PM To: Veton Kepuska; 'srilm-user at speech.sri.com' Subject: Re: [SRILM User List] My makefile fails in cygwin? Your cygwin installation is missing the iconv package, it seems. 
Fire up the cygwin setup.exe , and when you get to the screen where you can modify what's to be installed, search for "iconv" and add it. Andreas On 10/29/2013 12:53 AM, Veton Kepuska wrote: Here is the error message: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> make release-libraries make[1]: Entering directory '/cygdrive/u/public_html/ece5527/srilm' for subdir in misc dstruct lm flm lattice utils; do \ (cd $subdir/src; make SRILM=/cygdrive/u/public_html/ece5527/srilm MACHIN E_TYPE=cygwin OPTION= MAKE_PIC= release-libraries) || exit 1; \ done make[2]: Entering directory '/cygdrive/u/public_html/ece5527/srilm/misc/src' g++ -Wall -Wno-unused-variable -Wno-uninitialized -DINSTANTIATE_TEMPLATES -I. -I../../include -c -g -O2 -o ../obj/cygwin/File.o File.cc In file included from File.cc:27:0: srilm_iconv.h:15:25: fatal error: iconv.h: No such file or directory # include_next ^ compilation terminated. /cygdrive/u/public_html/ece5527/srilm/common/Makefile.common.targets:93: recipe for target '../obj/cygwin/File.o' failed make[2]: *** [../obj/cygwin/File.o] Error 1 make[2]: Leaving directory '/cygdrive/u/public_html/ece5527/srilm/misc/src' Makefile:105: recipe for target 'release-libraries' failed make[1]: *** [release-libraries] Error 1 make[1]: Leaving directory '/cygdrive/u/public_html/ece5527/srilm' Makefile:54: recipe for target 'World' failed make: *** [World] Error 2 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Can someone help? -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Mon Oct 28 14:29:50 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 29 Oct 2013 05:29:50 +0800 Subject: [SRILM User List] My makefile fails in cygwin? 
In-Reply-To: <1D4DDF8036F9CD4ABBCF12DECB691B126E321B01@EX10-BE1.fit.edu> References: <1D4DDF8036F9CD4ABBCF12DECB691B126E320961@EX10-BE1.fit.edu> <526EB382.90203@icsi.berkeley.edu> <1D4DDF8036F9CD4ABBCF12DECB691B126E321B01@EX10-BE1.fit.edu> Message-ID: <526ED74E.2000702@icsi.berkeley.edu> On 10/29/2013 4:53 AM, Veton Kepuska wrote: > > Andreas, > > Thank you very much for your information. I did that, but I am getting > this error message which hinders my installation even though I did > include (several times) the stddef package in Cygwin. > > >>>>>>>>>>>>>>>>>>>>> > > make[1]: Entering directory '/cygdrive/u/public_html/ece5527/srilm' > > for subdir in misc dstruct lm flm lattice utils; do \ > > (cd $subdir/src; make SRILM=/cygdrive/u/public_html/ece5527/srilm MACHIN > > E_TYPE=cygwin OPTION= MAKE_PIC= release-libraries) || exit 1; \ > > done > > make[2]: Entering directory > '/cygdrive/u/public_html/ece5527/srilm/misc/src' > > g++ -Wall -Wno-unused-variable -Wno-uninitialized > -DINSTANTIATE_TEMPLATES -I. > > -I../../include -c -g -O2 -o ../obj/cygwin/File.o File.cc > > In file included from /usr/include/sys/reent.h:14:0, > > from /usr/include/string.h:11, > > from File.cc:12: > > /usr/include/sys/_types.h:72:20: fatal error: stddef.h: No such file > or directory > > #include <stddef.h> > Very odd. I also have the #include <stddef.h> in sys/_types.h, but no such file exists on my system. But I don't get this error, so the conditionals in this header don't fire on my system. I'm using gcc 4.7.3. I don't really understand how these system header files are supposed to interact. You could try creating a dummy stddef.h in /usr/include . Andreas -------------- next part -------------- An HTML attachment was scrubbed...
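Since both failures in this thread are missing-header errors, here is a small diagnostic sketch that reports which headers are present before rebuilding; the script name and header list are illustrative, not part of SRILM:

```shell
# check_headers.sh (hypothetical helper) -- report which of the headers
# named in the build errors exist in an include directory.
incdir="${1:-/usr/include}"
for h in iconv.h stddef.h; do
    if [ -e "$incdir/$h" ]; then
        echo "found   $h"
    else
        echo "MISSING $h"
    fi
done
```

Any header reported MISSING points at a Cygwin package that still needs to be installed via setup.exe (the iconv headers, for example, come with Cygwin's libiconv development package).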
URL: From rimlaatar at yahoo.fr Tue Oct 29 03:13:41 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Tue, 29 Oct 2013 10:13:41 +0000 (GMT) Subject: [SRILM User List] problem with /sbin/machine-type Message-ID: <1383041621.54905.YahooMailNeo@web133005.mail.ir2.yahoo.com> To install SRILM I followed these steps: 1 - download the package 2 - in srilm/common I changed: # Tcl support (standard in Linux) TCL_INCLUDE = -I/usr/include/tcl8.5 TCL_LIBRARY = -ltcl8.5 # Use the GNU C compiler. GCC_FLAGS = -march=i686 -Wreturn-type -Wimplicit CC = /usr/bin/gcc $(GCC_FLAGS) CXX = /usr/bin/g++ -Wno-deprecated $(GCC_FLAGS) -DINSTANTIATE_TEMPLATES 3 - run /home/hp/SRILM/srilm/sbin/machine-type set environment variables: export SRILM=/home/hp/SRILM/srilm export PATH=$PATH:$SRILM but I still have these errors: hp at ubuntu:~/SRILM/srilm$ make test make: /sbin/machine-type : commande introuvable Makefile:13: /common/Makefile.common.variables: Aucun fichier ou dossier de ce type make: *** Pas de règle pour fabriquer la cible « /common/Makefile.common.variables ». Arrêt. Please help me! ---- Cordialement Rim LAATAR Ingénieur Informatique, de l'École Nationale d'Ingénieurs de Sfax (ENIS) Étudiante en mastère de recherche, Système d'Information & Nouvelles Technologies à la FSEGS --Option TALN Site web: Rim LAATAR BEN SAID Tel: (+216) 99 64 74 98 ---- -------------- next part -------------- An HTML attachment was scrubbed... URL: From fsegs.fatmamallek at gmail.com Tue Oct 29 12:31:05 2013 From: fsegs.fatmamallek at gmail.com (fatma mallek) Date: Tue, 29 Oct 2013 20:31:05 +0100 Subject: [SRILM User List] problem with n-gram command Message-ID: Hi, I'm using SRILM with Cygwin and I can't generate the n-gram counts. $ ngram-count -text corpus1.txt -order 3 -write corpus2.txt -bash: ngram-count : commande introuvable Can someone help me please?
Best regards, Fatma -- * ----------------------------------------------------------------------------------------------------------------------- * Fatma MALLEK https://sites.google.com/site/fatmamallek87/home * * Email: fsegs.fatmamallek at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From okuru13 at ku.edu.tr Tue Oct 29 12:35:22 2013 From: okuru13 at ku.edu.tr (Onur Kuru) Date: Tue, 29 Oct 2013 21:35:22 +0200 Subject: [SRILM User List] problem with n-gram command In-Reply-To: References: Message-ID: <090CCEC4-4213-41F8-9847-A2046D97D85C@my.ku.edu.tr> I think it should have been: ngram-count -text corpus1.txt -write-order 3 On Oct 29, 2013, at 9:31 PM, fatma mallek wrote: > Hi , > > i'm using SRILM with Cygwin and i can't generate the n-gram count. > > $ ngram-count -text corpus1.txt -order 3 -write corpus2.txt > -bash: ngram-count : commande introuvable > > Can someone help me please? > > Best regards, > Fatma > -- > ----------------------------------------------------------------------------------------------------------------------- > Fatma MALLEK > https://sites.google.com/site/fatmamallek87/home > > Email: fsegs.fatmamallek at gmail.com > > > _______________________________________________ > SRILM-User site list > SRILM-User at speech.sri.com > http://www.speech.sri.com/mailman/listinfo/srilm-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From venkataraman.anand at gmail.com Tue Oct 29 12:55:18 2013 From: venkataraman.anand at gmail.com (Anand Venkataraman) Date: Tue, 29 Oct 2013 12:55:18 -0700 Subject: [SRILM User List] problem with n-gram command In-Reply-To: References: Message-ID: On Tue, Oct 29, 2013 at 12:31 PM, fatma mallek wrote: > commande introuvable This is a bash message that it couldn't find the ngram-count. Please check and fix your $PATH environment variable, or invoke ngram-count with its full path. 
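"commande introuvable" is the French-locale bash message for "command not found", i.e. ngram-count is simply not on $PATH. A sketch of the check Anand suggests (the bin directory shown is an example, not a fixed SRILM path):

```shell
# Is ngram-count reachable via $PATH?  (command -v is standard POSIX shell.)
if command -v ngram-count >/dev/null 2>&1; then
    echo "ngram-count found at: $(command -v ngram-count)"
else
    echo "ngram-count is not on PATH"
    echo "either fix PATH, e.g.  export PATH=\$PATH:\$SRILM/bin/cygwin"
    echo "or invoke it by full path, e.g.  \$SRILM/bin/cygwin/ngram-count -text corpus1.txt ..."
fi
```

Note that .bashrc edits only take effect in new shells; after editing it, open a fresh terminal or run `source ~/.bashrc` before retrying.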
& -------------- next part -------------- An HTML attachment was scrubbed... URL: From fsegs.fatmamallek at gmail.com Tue Oct 29 13:13:53 2013 From: fsegs.fatmamallek at gmail.com (fatma mallek) Date: Tue, 29 Oct 2013 21:13:53 +0100 Subject: [SRILM User List] problem with n-gram command In-Reply-To: References: Message-ID: http://www.cs.brandeis.edu/~cs114/CS114_docs/SRILM_Tutorial_20080512.pdf this is the link of the document. 2013/10/29 fatma mallek > thanks Anand, > > i actually edited in this file "?c:\cygwin\home\yourname\.bashrc? > *export SRILM=/srilm* > *export MACHINE_TYPE=cygwin* > *export PATH=$PATH:$pwd:$SRILM/bin/cygwin* > *export MANPATH=$MANPATH:$SRILM/man* > > exactly like the doc joined explain > I maked the same things step by step! but i don't know where is the > problem! > > > > 2013/10/29 Anand Venkataraman > >> >> On Tue, Oct 29, 2013 at 12:31 PM, fatma mallek < >> fsegs.fatmamallek at gmail.com> wrote: >> >>> commande introuvable >> >> >> This is a bash message that it couldn't find the ngram-count. Please >> check and fix your $PATH environment variable, or invoke ngram-count with >> its full path. >> >> & >> > > > > -- > * > ----------------------------------------------------------------------------------------------------------------------- > * > Fatma MALLEK > https://sites.google.com/site/fatmamallek87/home > * * > Email: fsegs.fatmamallek at gmail.com > > > -- * ----------------------------------------------------------------------------------------------------------------------- * Fatma MALLEK https://sites.google.com/site/fatmamallek87/home * * Email: fsegs.fatmamallek at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stolcke at icsi.berkeley.edu Tue Oct 29 16:01:28 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 30 Oct 2013 07:01:28 +0800 Subject: [SRILM User List] problem with /sbin/machine-type In-Reply-To: <1383041621.54905.YahooMailNeo@web133005.mail.ir2.yahoo.com> References: <1383041621.54905.YahooMailNeo@web133005.mail.ir2.yahoo.com> Message-ID: <52703E48.5000803@icsi.berkeley.edu> On 10/29/2013 6:13 PM, Laatar Rim wrote: > > to install SRILM I followed the steps: > 1 - downolad package > 2 - srilm / common I CAHNGE: > # Tcl support (standard in Linux) > TCL_INCLUDE = -I/usr/include/tcl8.5 > > TCL_LIBRARY = -ltcl8.5 > > # Use the GNU C compiler. > GCC_FLAGS = -march=i686 -Wreturn-type -Wimplicit > CC = /usr/bin/gcc $(GCC_FLAGS) > CXX = /usr/bin/g++ -Wno-deprecated $(GCC_FLAGS) -DINSTANTIATE_TEMPLATES > > 3 - run /home/hp/SRILM/srilm/sbin/machine-type > set environment variables: > export SRILM=/home/hp/SRILM/srilm > export PATH=$PATH:$SRILM > but I still have these errors: hp at ubuntu:~/SRILM/srilm$ make test > make: /sbin/machine-type : commande introuvable > Makefile:13: /common/Makefile.common.variables: Aucun fichier ou > dossier de ce type > make: *** Pas de r?gle pour fabriquer la cible ? > /common/Makefile.common.variables ?. Arr?t. Try running make SRILM=/home/hp/SRILM/srilm World If that goes well (no error message) make SRILM=/home/hp/SRILM/srilm test Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From rimlaatar at yahoo.fr Wed Oct 30 06:14:59 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Wed, 30 Oct 2013 13:14:59 +0000 (GMT) Subject: [SRILM User List] problem with installig srilm Message-ID: <1383138899.70167.YahooMailNeo@web133001.mail.ir2.yahoo.com> hi, i run? make SRILM=/home/hp/SRILM/srilm? World the result is : cd ..; /home/hp/SRILM/srilm/sbin/make-standard-directories make ../obj/i686/STAMP ../bin/i686/STAMP make[3]: entrant dans le r?pertoire ? 
/home/hp/SRILM/srilm/lattice/src ? make[3]: ? ../obj/i686/STAMP ? est ? jour. make[3]: ? ../bin/i686/STAMP ? est ? jour. make[3]: quittant le r?pertoire ? /home/hp/SRILM/srilm/lattice/src ? make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/lattice/src ? make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/utils/src ? cd ..; /home/hp/SRILM/srilm/sbin/make-standard-directories make ../obj/i686/STAMP ../bin/i686/STAMP make[3]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/utils/src ? make[3]: ? ../obj/i686/STAMP ? est ? jour. make[3]: ? ../bin/i686/STAMP ? est ? jour. make[3]: quittant le r?pertoire ? /home/hp/SRILM/srilm/utils/src ? make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/utils/src ? make[1]: quittant le r?pertoire ? /home/hp/SRILM/srilm ? make release-headers make[1]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm ? for subdir in misc dstruct lm flm lattice utils; do \ ??? ??? (cd $subdir/src; make SRILM=/home/hp/SRILM/srilm MACHINE_TYPE=i686 OPTION= MAKE_PIC= release-headers) || exit 1; \ ??? done make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/misc/src ? make[2]: Rien ? faire pour ? release-headers ?. make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/misc/src ? make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/dstruct/src ? make[2]: Rien ? faire pour ? release-headers ?. make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/dstruct/src ? make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/lm/src ? make[2]: Rien ? faire pour ? release-headers ?. make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/lm/src ? make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/flm/src ? make[2]: Rien ? faire pour ? release-headers ?. make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/flm/src ? make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/lattice/src ? make[2]: Rien ? faire pour ? release-headers ?. make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/lattice/src ? 
make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/utils/src ? make[2]: Rien ? faire pour ? release-headers ?. make[2]: quittant le r?pertoire ? /home/hp/SRILM/srilm/utils/src ? make[1]: quittant le r?pertoire ? /home/hp/SRILM/srilm ? make depend make[1]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm ? for subdir in misc dstruct lm flm lattice utils; do \ ??? ??? (cd $subdir/src; make SRILM=/home/hp/SRILM/srilm MACHINE_TYPE=i686 OPTION= MAKE_PIC= depend) || exit 1; \ ??? done make[2]: entrant dans le r?pertoire ? /home/hp/SRILM/srilm/misc/src ? rm -f Dependencies.i686 /usr/bin/gcc -march=i686 -Wreturn-type -Wimplicit -D_FILE_OFFSET_BITS=64?? -I/usr/include/tcl8.5?? -I. -I../../include -MM? ./option.c ./zio.c ./fcheck.c ./fake-rand48.c ./version.c ./ztest.c | sed -e "s&^\([^ ]\)&../obj/i686"'$(OBJ_OPTION)'"/\1&g" -e "s&\.o&.o&g" >> Dependencies.i686 /usr/bin/g++ -Wno-deprecated -march=i686 -Wreturn-type -Wimplicit -DINSTANTIATE_TEMPLATES -D_FILE_OFFSET_BITS=64?? -I/usr/include/tcl8.5?? -I. -I../../include -MM? ./Debug.cc ./File.cc ./MStringTokUtil.cc ./tls.cc ./tserror.cc ./tclmain.cc ./testFile.cc | sed -e "s&^\([^ ]\)&../obj/i686"'$(OBJ_OPTION)'"/\1&g" -e "s&\.o&.o&g" >> Dependencies.i686 cc1plus: attention : command line option ?-Wimplicit? is valid for C/ObjC but not for C++ [enabled by default] cc1plus: attention : command line option ?-Wimplicit? is valid for C/ObjC but not for C++ [enabled by default] cc1plus: attention : command line option ?-Wimplicit? is valid for C/ObjC but not for C++ [enabled by default] cc1plus: attention : command line option ?-Wimplicit? is valid for C/ObjC but not for C++ [enabled by default] cc1plus: attention : command line option ?-Wimplicit? is valid for C/ObjC but not for C++ [enabled by default] cc1plus: attention : command line option ?-Wimplicit? is valid for C/ObjC but not for C++ [enabled by default] cc1plus: attention : command line option ?-Wimplicit? 
is valid for C/ObjC but not for C++ [enabled by default]
/home/hp/SRILM/srilm/sbin/generate-program-dependencies ../bin/i686 ../obj/i686 "" ztest testFile | sed -e "s&\.o&.o&g" >> Dependencies.i686
make[2]: quittant le répertoire « /home/hp/SRILM/srilm/misc/src »
make[2]: entrant dans le répertoire « /home/hp/SRILM/srilm/dstruct/src »
rm -f Dependencies.i686
/usr/bin/gcc -march=i686 -Wreturn-type -Wimplicit -D_FILE_OFFSET_BITS=64 -I/usr/include/tcl8.5 -I. -I../../include -MM ./qsort.c ./maxalloc.c | sed -e "s&^\([^ ]\)&../obj/i686"'$(OBJ_OPTION)'"/\1&g" -e "s&\.o&.o&g" >> Dependencies.i686
/usr/bin/g++ -Wno-deprecated -march=i686 -Wreturn-type -Wimplicit -DINSTANTIATE_TEMPLATES -D_FILE_OFFSET_BITS=64 -I/usr/include/tcl8.5 -I. -I../../include -MM ./MemStats.cc ./LHashTrie.cc ./SArrayTrie.cc ./BlockMalloc.cc [and the remaining dstruct .cc files] | sed -e "s&^\([^ ]\)&../obj/i686"'$(OBJ_OPTION)'"/\1&g" -e "s&\.o&.o&g" >> Dependencies.i686
cc1plus: attention : command line option « -Wimplicit » is valid for C/ObjC but not for C++ [enabled by default]
[the same cc1plus warning is printed once per C++ source file, in this and in every directory below; the repeats are omitted here]
/home/hp/SRILM/srilm/sbin/generate-program-dependencies ../bin/i686 ../obj/i686 "" maxalloc testArray testMap benchHash testHash testSizes testCachedMem testBlockMalloc testMap2 testTrie | sed -e "s&\.o&.o&g" >> Dependencies.i686
make[2]: quittant le répertoire « /home/hp/SRILM/srilm/dstruct/src »
[the same dependency-generation step, with the same repeated warning, then runs in lm/src, flm/src, lattice/src, and utils/src]
make[1]: quittant le répertoire « /home/hp/SRILM/srilm »
make release-libraries
make[1]: entrant dans le répertoire « /home/hp/SRILM/srilm »
for subdir in misc dstruct lm flm lattice utils; do \
    (cd $subdir/src; make SRILM=/home/hp/SRILM/srilm MACHINE_TYPE=i686 OPTION= MAKE_PIC= release-libraries) || exit 1; \
done
make[2]: entrant dans le répertoire « /home/hp/SRILM/srilm/misc/src »
make[2]: Rien à faire pour « release-libraries ».
[make reports « Rien à faire » (nothing to be done) for release-libraries in each of the six subdirectories, then repeats the same loop for the release-programs and release-scripts targets with the same result; the tail of the log is:]
make[2]: entrant dans le répertoire «
/home/hp/SRILM/srilm/utils/src » make[2]: Rien à faire pour « release-scripts ». make[2]: quittant le répertoire « /home/hp/SRILM/srilm/utils/src » make[1]: quittant le répertoire « /home/hp/SRILM/srilm » Can someone tell me what the problem is? ---- Best regards, Rim LAATAR Computer engineer, École Nationale d'Ingénieurs de Sfax (ENIS) Master's research student, Information Systems & New Technologies, FSEGS, Option TALN (natural language processing) Website: Rim LAATAR BEN SAID Tel: (+216) 99 64 74 98 ---- -------------- next part -------------- An HTML attachment was scrubbed... URL: From sergey.zablotskiy at uni-ulm.de Thu Oct 31 03:21:36 2013 From: sergey.zablotskiy at uni-ulm.de (Sergey Zablotskiy) Date: Thu, 31 Oct 2013 11:21:36 +0100 Subject: [SRILM User List] make-big-lm with kn-/wb-discount Message-ID: <52722F30.7050205@uni-ulm.de> Hi everybody, is there any workaround for combining modified Kneser-Ney smoothing for lower-order n-grams with Witten-Bell smoothing for higher-order n-grams using the make-big-lm training script? I am getting the following error message: make-big-lm: must use one of GT, KN, or WB discounting for all orders while executing: >> make-big-lm -read ${count_file} -vocab ${vocab} -unk -order 4 \ -kndiscount1 -kndiscount2 -kndiscount3 -wbdiscount4 \ -interpolate -lm name.lm I cannot use -kndiscount for the 4-grams because some counts-of-counts are zero in my case. Thank you very much in advance, Regards Sergey. -- M.Sc.
Sergey Zablotskiy Institute of Communications Engineering University of Ulm Albert-Einstein-Allee 43, Room 43.1.225 89081 Ulm, Germany Phone: +49 731 50-26275 Fax: +49 731 50-26259 http://www.uni-ulm.de/in/nt/staff/research-assistants-external/zablotskiy.html From stolcke at icsi.berkeley.edu Thu Oct 31 16:02:03 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Fri, 01 Nov 2013 07:02:03 +0800 Subject: [SRILM User List] make-big-lm with kn-/wb-discount In-Reply-To: <52722F30.7050205@uni-ulm.de> References: <52722F30.7050205@uni-ulm.de> Message-ID: <5272E16B.9040405@icsi.berkeley.edu> On 10/31/2013 6:21 PM, Sergey Zablotskiy wrote: > Hi Everybody, > > is there any workaround to combine modified Kneser-Ney smoothing for > lower-order n-grams along with Witten-Bell smoothing for higher-order > n-grams using the make-big-lm training script? > > I am getting the following error message: > make-big-lm: must use one of GT, KN, or WB discounting for all orders > > while executing: > >> make-big-lm -read ${count_file} -vocab ${vocab} -unk -order 4 \ > -kndiscount1 -kndiscount2 -kndiscount3 -wbdiscount4 \ > -interpolate -lm name.lm > > I cannot use -kndiscount for the 4-grams because some counts-of-counts > are zero in my case. > 1) It does not make sense to combine KN discounting for lower-order ngrams with some other method, since the KN method of discounting the lower-order ngrams is designed precisely to complement the discounting of the highest-order ngrams. 2) make-big-lm invokes a helper script called make-kn-discounts to compute the discounting factors from the counts-of-counts. It tries to fill in missing (zero) counts-of-counts based on an empirical regularity in the counts-of-counts (the details are in Section 4 of this paper). If that mechanism doesn't work for some reason, we should try to fix it. Andreas -------------- next part -------------- An HTML attachment was scrubbed...
URL: From Joris.Pelemans at esat.kuleuven.be Fri Nov 1 17:00:26 2013 From: Joris.Pelemans at esat.kuleuven.be (Joris Pelemans) Date: Sat, 02 Nov 2013 01:00:26 +0100 Subject: [SRILM User List] Adding n-grams to an existing LM Message-ID: <5274409A.7020003@esat.kuleuven.be> Hello, I have an existing 5-gram LM with KN discounting and I would like to add new words to it. To estimate reasonable n-gram probabilities for a new word, I am now using (a fraction of) the probabilities of a synonym of the word. I am simply replacing every occurrence of the synonym with the new word, copying the logprob (or slightly altering it in the case of a fraction) and alpha, and adding the new line to the LM. Obviously the resulting LM is no longer normalized. I thought I would be able to fix this relatively easily with: ngram -lm src.arpa -order 5 -renorm -write-lm dest.arpa but I get a lot of errors of the type "BOW numerator for context is ... < 0" and "BOW denominator for context is ... <= 0". What do these errors mean? Can I ignore them, or is there a better way to renormalize my new LMs? Thanks in advance, Joris From stolcke at icsi.berkeley.edu Fri Nov 1 18:07:00 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Sat, 02 Nov 2013 09:07:00 +0800 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <5274409A.7020003@esat.kuleuven.be> References: <5274409A.7020003@esat.kuleuven.be> Message-ID: <52745034.1050402@icsi.berkeley.edu> On 11/2/2013 8:00 AM, Joris Pelemans wrote: > Hello, > > I have an existing 5-gram LM with KN discounting and I would like to > add new words to it. To estimate reasonable n-gram probabilities for a > new word, I am now using (a fraction of) the probabilities of a > synonym of the word. I am simply replacing every occurrence of the > synonym with the new word, copying the logprob (or slightly altering > it in case of a fraction) and alpha and adding the new line to the LM.
> Obviously the resulting LM is no longer normalized. I thought I > would be able to fix this relatively easily with: > > ngram -lm src.arpa -order 5 -renorm -write-lm dest.arpa > > but I get a lot of errors of the type "BOW numerator for context is > ... < 0" and "BOW denominator for context is ... <= 0". The BOW for a given context is computed as 1 - sum of all higher-order probabilities (in a given context), divided by 1 - sum of all backoff probabilities for those same ngrams. So, if you're adding ngrams to a context, those sums can exceed 1, and you end up with negative numerators and/or denominators. The ngram -renorm option only recomputes the backoff weights to achieve normalization; it does not modify the explicitly given ngram probabilities. > > What do these errors mean, can I ignore them or is there a better way > to renormalize my new LMs? I think you should split the existing ngram probabilities among all the synonyms, when the synonym occurs in the final position of the ngram. That would not add anything to the sums of probabilities involved in the BOW computation. For example, if you have p(c | a b) = x and c and d are synonyms, you set p(c | a b ) = x/2 p(d | a b) = x/2 If, however, the synonyms occur in the context portion of the ngram, you can just copy the parameter (as you have been doing). p( e | a c) = p(e | a d) Then, use -renorm to recompute the BOWs. Andreas From Joris.Pelemans at esat.kuleuven.be Sat Nov 2 06:16:16 2013 From: Joris.Pelemans at esat.kuleuven.be (Joris Pelemans) Date: Sat, 02 Nov 2013 14:16:16 +0100 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <52745034.1050402@icsi.berkeley.edu> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> Message-ID: <5274FB20.9020908@esat.kuleuven.be> On 11/02/13 02:07, Andreas Stolcke wrote: > On 11/2/2013 8:00 AM, Joris Pelemans wrote: >> but I get a lot of errors of the type "BOW numerator for context is >> ...
< 0" and "BOW denominator for context is ... <= 0. > > The BOW for a given context is is computed as 1 - sum of all > higher-order probabilities (in a given context), divided by 1 - sum of > all backoff probabilities for those same ngrams. So, if you're adding > ngrams to a context, those sums can exceed 1, and you end up with > negative numerators and/or denominators. I can see how that happens for the numerators, but aren't the backoff weights recomputed and thus this not prevent the denominators from ending up negative? What if I remove all the backoff weights and then renormalize? I'm just asking out of interest, I got rid of all the denominator complaints (see below). >> What do these errors mean, can I ignore them or is there a better way >> to renormalize my new LMs? > > I think you should split the existing ngram probabilities among all > the synonyms, when the synonym occurs in the final position of the > ngram. That would not add anything to the sums of probabilities > involved in the BOW computation. That did take care of most of the errors. Only a handful of numerator complaints left, but I guess that might be due to bad scripting on my behalf. I find it strange though that the complaints I get, concern n-grams that aren't in the LM at all. The following is the first complaint that I get: BOW numerator for context "negentig Hills" is -0.0120325 < 0 But if I grep the LM (before and after renormalization) for "negentig Hills" it gives me nothing? If there are no 3-grams with this context, how can 1 - (sum of all higher-order probabilities with this context) be negative? > For example, if have p(c | a b) = x and d and c synonyms, you set > > p(c | a b ) = x/2 > p(d | a b) = x/2 OK, that makes sense. 
And just to be complete (in case others might want to know), if I want to map d onto c with a certainty of, say, 0.1, then I just do: p(c | a b ) = 0.9*x p(d | a b) = 0.1*x > If, however, the synonyms occur in the context portion of the ngram, > you can just copy the parameter (as you have been doing). > > p( e | a c) = p(e | a d) And this stays the same for the 0.1 example. Thanks already! Joris From Joris.Pelemans at esat.kuleuven.be Sat Nov 2 07:46:31 2013 From: Joris.Pelemans at esat.kuleuven.be (Joris Pelemans) Date: Sat, 02 Nov 2013 15:46:31 +0100 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <52745034.1050402@icsi.berkeley.edu> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> Message-ID: <52751047.1040901@esat.kuleuven.be> On 11/02/13 02:07, Andreas Stolcke wrote: > On 11/2/2013 8:00 AM, Joris Pelemans wrote: >> >> What do these errors mean, can I ignore them or is there a better way >> to renormalize my new LMs? > > I think you should split the existing ngram probabilities among all > the synonyms, when the synonym occurs in the final position of the > ngram. That would not add anything to the sums of probabilities > involved in the BOW computation. > > For example, if you have p(c | a b) = x and c and d are synonyms, you set > > p(c | a b ) = x/2 > p(d | a b) = x/2 Another question with regard to this problem. Say, I don't know a good synonym for d, but I still want to include it by mapping it onto <unk> (what else, right?), obviously with only a very small fraction of the probability, since it's a class. The above technique would lead to gigantic LMs, since <unk> is all over the place. Is there a smart way in the SRILM toolkit that lets you specify that some words should be modeled as <unk>?
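The probability-splitting scheme discussed in this thread can be sketched numerically (a toy illustration with made-up numbers in plain Python, not SRILM internals): splitting an existing n-gram probability between a word and its synonym leaves the context's total explicit probability mass, and hence the BOW numerator, unchanged.

```python
# Toy sketch (made-up numbers): explicit probs p(w | a b) for one context "a b".
x = 0.2
probs = {"c": x, "e": 0.5}          # before adding the synonym d
mass_before = sum(probs.values())

# Split p(c | a b) = x between c and its synonym d, e.g. 0.9/0.1;
# an even x/2, x/2 split works the same way.
probs["c"] = 0.9 * x
probs["d"] = 0.1 * x
mass_after = sum(probs.values())

# The explicit probability mass in the context is unchanged, so the
# BOW numerator 1 - sum(probs) cannot go negative because of the split.
assert abs(mass_before - mass_after) < 1e-12
```

Simply copying p(d | a b) = 0.1*x in without reducing p(c | a b) would grow the sum instead, which is exactly what triggers the negative-numerator messages.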
Regards, Joris From stolcke at icsi.berkeley.edu Sat Nov 2 18:32:10 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Sat, 02 Nov 2013 18:32:10 -0700 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <5274FB20.9020908@esat.kuleuven.be> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> <5274FB20.9020908@esat.kuleuven.be> Message-ID: <5275A79A.3070309@icsi.berkeley.edu> On 11/2/2013 6:16 AM, Joris Pelemans wrote: > On 11/02/13 02:07, Andreas Stolcke wrote: >> On 11/2/2013 8:00 AM, Joris Pelemans wrote: >>> but I get a lot of errors of the type "BOW numerator for context is >>> ... < 0" and "BOW denominator for context is ... <= 0. >> >> The BOW for a given context is computed as 1 - sum of all >> higher-order probabilities (in a given context), divided by 1 - sum >> of all backoff probabilities for those same ngrams. So, if you're >> adding ngrams to a context, those sums can exceed 1, and you end up >> with negative numerators and/or denominators. > I can see how that happens for the numerators, but aren't the backoff > weights recomputed, and wouldn't that prevent the denominators from > ending up negative? What if I remove all the backoff weights and then > renormalize? I'm just asking out of interest; I got rid of all the > denominator complaints (see below). The same reasoning applies to the denominator, since it is obtained by summing over ngrams of one order lower. If you're adding trigrams and bigrams, say, then the denominator for bigram BOWs will be affected by the added bigrams. >>> What do these errors mean, can I ignore them or is there a better >>> way to renormalize my new LMs? >> >> I think you should split the existing ngram probabilities among all >> the synonyms, when the synonym occurs in the final position of the >> ngram. That would not add anything to the sums of probabilities >> involved in the BOW computation. > That did take care of most of the errors.
Only a handful of numerator > complaints left, but I guess that might be due to bad scripting on my > part. I find it strange though that the complaints I get concern > n-grams that aren't in the LM at all. The following is the first > complaint that I get: > > BOW numerator for context "negentig Hills" is -0.0120325 < 0 > > But if I grep the LM (before and after renormalization) for "negentig > Hills" it gives me nothing? If there are no 3-grams with this context, > how can 1 - (sum of all higher-order probabilities with this context) > be negative? The ngrams in these messages are printed in reverse order. That's because the contexts are stored in a trie that's indexed most-recent-word-first. Andreas > >> For example, if you have p(c | a b) = x and d and c are synonyms, you set >> >> p(c | a b ) = x/2 >> p(d | a b) = x/2 > OK, that makes sense. And just to be complete (in case others might > want to know), if I want to map d onto c with a certainty of say 0.1, > then I just do: > > p(c | a b ) = 0.9*x > p(d | a b) = 0.1*x > >> If, however, the synonyms occur in the context portion of the ngram, >> you can just copy the parameter (as you have been doing). >> >> p( e | a c) = p(e | a d) > > And this stays the same for the 0.1 example? > > Thanks already! > > Joris From stolcke at icsi.berkeley.edu Sat Nov 2 18:35:07 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Sat, 02 Nov 2013 18:35:07 -0700 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <52751047.1040901@esat.kuleuven.be> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> <52751047.1040901@esat.kuleuven.be> Message-ID: <5275A84B.8060401@icsi.berkeley.edu> On 11/2/2013 7:46 AM, Joris Pelemans wrote: > On 11/02/13 02:07, Andreas Stolcke wrote: >> >> For example, if you have p(c | a b) = x and d and c are synonyms, you set >> >> p(c | a b ) = x/2 >> p(d | a b) = x/2 > > Another question with regards to this problem.
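The backoff-weight computation discussed in this thread can be sketched numerically. The function and data layout below are illustrative (SRILM computes this on its internal trie), but the arithmetic follows the formula quoted in the thread:

```python
def bow(higher_order_probs, backoff_probs):
    """BOW for a context = (1 - sum of higher-order probabilities in
    that context) / (1 - sum of the backed-off probabilities for those
    same n-grams). If added n-grams push either sum past 1, the
    numerator or denominator goes negative -- the condition behind the
    "BOW numerator ... < 0" warnings."""
    num = 1.0 - sum(higher_order_probs)
    den = 1.0 - sum(backoff_probs)
    if num < 0:
        raise ValueError("BOW numerator < 0")
    if den <= 0:
        raise ValueError("BOW denominator <= 0")
    return num / den

# A healthy context: some probability mass is left over for backoff.
w = bow([0.5, 0.3], [0.4, 0.2])

# Adding extra n-gram mass (0.25) without rescaling pushes the sum
# past 1 and triggers the numerator complaint.
try:
    bow([0.5, 0.3, 0.25], [0.4, 0.2, 0.1])
except ValueError as err:
    msg = str(err)
```

This also shows why splitting existing probability mass among synonyms, rather than adding new mass, keeps both sums (and hence the BOWs) valid.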
Say, I don't know a > good synonym for d, but I still want to include it by mapping it onto > <unk> (what else, right?), obviously by a very small fraction of the > probability, since it's a class. The above technique would lead > to gigantic LMs, since <unk> is all over the place. Is there a smart > way in the SRILM toolkit that lets you specify that some words should > be modeled as <unk>? I'm not sure I understand what you mean. <unk> is a special word that all words not in the vocabulary are mapped to at test time. So the way you 'model' a word by <unk> is to not include it in the vocabulary of your LM. Andreas From Joris.Pelemans at esat.kuleuven.be Sun Nov 3 01:43:55 2013 From: Joris.Pelemans at esat.kuleuven.be (Joris Pelemans) Date: Sun, 03 Nov 2013 10:43:55 +0100 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <5275A84B.8060401@icsi.berkeley.edu> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> <52751047.1040901@esat.kuleuven.be> <5275A84B.8060401@icsi.berkeley.edu> Message-ID: <52761ADB.50906@esat.kuleuven.be> On 11/03/13 02:35, Andreas Stolcke wrote: > On 11/2/2013 7:46 AM, Joris Pelemans wrote: >> On 11/02/13 02:07, Andreas Stolcke wrote: >>> >>> For example, if you have p(c | a b) = x and d and c are synonyms, you set >>> >>> p(c | a b ) = x/2 >>> p(d | a b) = x/2 >> >> Another question with regards to this problem. Say, I don't know a >> good synonym for d, but I still want to include it by mapping it onto >> <unk> (what else, right?), obviously by a very small fraction of the >> probability, since it's a class. The above technique would lead >> to gigantic LMs, since <unk> is all over the place. Is there a smart >> way in the SRILM toolkit that lets you specify that some words should >> be modeled as <unk>?
I am investigating different techniques to introduce new words to the vocabulary. Say I have a vocabulary of 100,000 words and I want to introduce 1 new word X (for the sake of simplicity). I could do one of 3 options: 1. use the contexts in which X appears in some training data (but sometimes X may not appear (enough)) 2. estimate the probability of X by taking a fraction of the prob mass of a synonym of X (which I described earlier) 3. estimate the probability of X by taking a fraction of the prob mass of the <unk> class (if e.g. no good synonym is at hand) I could then compare the perplexities of these 3 LMs with a vocabulary of size 100,001 words to see which technique is best for a given word/situation. Joris -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Sun Nov 3 16:01:40 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Sun, 03 Nov 2013 16:01:40 -0800 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <52761ADB.50906@esat.kuleuven.be> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> <52751047.1040901@esat.kuleuven.be> <5275A84B.8060401@icsi.berkeley.edu> <52761ADB.50906@esat.kuleuven.be> Message-ID: <5276E3E4.7010801@icsi.berkeley.edu> On 11/3/2013 1:43 AM, Joris Pelemans wrote: > On 11/03/13 02:35, Andreas Stolcke wrote: >> On 11/2/2013 7:46 AM, Joris Pelemans wrote: >>> On 11/02/13 02:07, Andreas Stolcke wrote: >>>> >>>> For example, if you have p(c | a b) = x and d and c are synonyms, you set >>>> >>>> p(c | a b ) = x/2 >>>> p(d | a b) = x/2 >>> >>> Another question with regards to this problem. Say, I don't know a >>> good synonym for d, but I still want to include it by mapping it >>> onto <unk> (what else, right?), obviously by a very small fraction >>> of the probability, since it's a class. The above technique >>> would lead to gigantic LMs, since <unk> is all over the place.
Is >>> there a smart way in the SRILM toolkit that lets you specify that >>> some words should be modeled as <unk>? >> >> I'm not sure I understand what you mean. <unk> is a special word >> that all words not in the vocabulary are mapped to at test time. So >> the way you 'model' a word by <unk> is to not include it in the >> vocabulary of your LM. > I am investigating different techniques to introduce new words to the > vocabulary. Say I have a vocabulary of 100,000 words and I want to > introduce 1 new word X (for the sake of simplicity). I could do one of > 3 options: > > 1. use the contexts in which X appears in some training data (but > sometimes X may not appear (enough)) > 2. estimate the probability of X by taking a fraction of the prob > mass of a synonym of X (which I described earlier) > 3. estimate the probability of X by taking a fraction of the prob > mass of the <unk> class (if e.g. no good synonym is at hand) > > I could then compare the perplexities of these 3 LMs with a vocabulary > of size 100,001 words to see which technique is best for a given > word/situation. > And option 3 is effectively already implemented by the way unseen words are mapped to <unk>. If you want to compute perplexity in a fair way you would take the LM containing <unk> and for every occurrence of X you add log p(X | <unk>) (the share of unk-probability mass you want to give to X). That way you don't need to add any ngrams to the LM. What this effectively does is simulate a class-based Ngram model where <unk> is a class and X is one of its members. Andreas -------------- next part -------------- An HTML attachment was scrubbed...
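The "fair" perplexity computation described in this exchange, adding log p(X | unk) for every occurrence of the new word X on top of the per-word log probabilities the LM assigns, can be sketched as follows. The numbers and the helper function are illustrative; in practice the per-word log10 probabilities would come from the LM's scoring output:

```python
import math

def fair_ppl(logprobs, num_x_occurrences, log_p_x_given_unk):
    """logprobs: per-word log10 probabilities from the LM, where each
    occurrence of the new word X was scored as the unknown-word class.
    For each such occurrence, add log10 p(X | unk) -- the share of
    unk-probability mass given to X -- then convert to perplexity."""
    total = sum(logprobs) + num_x_occurrences * log_p_x_given_unk
    n = len(logprobs)
    return 10 ** (-total / n)

# 4-word test set; one of the words was an occurrence of X, and we
# give X a 0.1 share of the unk mass.
ppl = fair_ppl([-1.0, -2.0, -1.5, -0.5],
               num_x_occurrences=1,
               log_p_x_given_unk=math.log10(0.1))
```

No n-grams are added to the LM; the correction only adjusts the total log probability, exactly as if X were a member of an unknown-word class.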
URL: From Joris.Pelemans at esat.kuleuven.be Mon Nov 4 01:01:26 2013 From: Joris.Pelemans at esat.kuleuven.be (Joris Pelemans) Date: Mon, 04 Nov 2013 10:01:26 +0100 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <5276E3E4.7010801@icsi.berkeley.edu> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> <52751047.1040901@esat.kuleuven.be> <5275A84B.8060401@icsi.berkeley.edu> <52761ADB.50906@esat.kuleuven.be> <5276E3E4.7010801@icsi.berkeley.edu> Message-ID: <52776266.3020409@esat.kuleuven.be> On 11/04/13 01:01, Andreas Stolcke wrote: > On 11/3/2013 1:43 AM, Joris Pelemans wrote: >> I am investigating different techniques to introduce new words to the >> vocabulary. Say I have a vocabulary of 100,000 words and I want to >> introduce 1 new word X (for the sake of simplicity). I could do one >> of 3 options: >> >> 1. use the contexts in which X appears in some training data (but >> sometimes X may not appear (enough)) >> 2. estimate the probability of X by taking a fraction of the prob >> mass of a synonym of X (which I described earlier) >> 3. estimate the probability of X by taking a fraction of the prob >> mass of the <unk> class (if e.g. no good synonym is at hand) >> >> I could then compare the perplexities of these 3 LMs with a >> vocabulary of size 100,001 words to see which technique is best for a >> given word/situation. >> > And option 3 is effectively already implemented by the way unseen > words are mapped to <unk>. If you want to compute perplexity in a > fair way you would take the LM containing <unk> and for every > occurrence of X you add log p(X | <unk>) (the share of > unk-probability mass you want to give to X). That way you don't need > to add any ngrams to the LM. What this effectively does is simulate a > class-based Ngram model where <unk> is a class and X is one of its members. Yes, this is exactly what I meant when I asked for a "smart way in the SRILM toolkit", so I assume this is included.
I looked up how to use class-based models and I think I found what I need to do. Is the following the correct way to calculate perplexity for these models? ngram -lm class_lm.arpa -ppl test.txt -order n -classes expansions.class where expansions.class contains lines like this: <unk> p(X | <unk>) X <unk> p(Y | <unk>) Y <unk> 1-p(X | <unk>)-p(Y | <unk>) not_mapped I assume the last line is necessary since the man page for "classes-format" says "All expansion probabilities for a given class should sum to one, although this is not necessarily enforced by the software and would lead to improper models." Joris -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Mon Nov 4 09:16:25 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Mon, 04 Nov 2013 09:16:25 -0800 Subject: [SRILM User List] Adding n-grams to an existing LM In-Reply-To: <52776266.3020409@esat.kuleuven.be> References: <5274409A.7020003@esat.kuleuven.be> <52745034.1050402@icsi.berkeley.edu> <52751047.1040901@esat.kuleuven.be> <5275A84B.8060401@icsi.berkeley.edu> <52761ADB.50906@esat.kuleuven.be> <5276E3E4.7010801@icsi.berkeley.edu> <52776266.3020409@esat.kuleuven.be> Message-ID: <5277D669.1070500@icsi.berkeley.edu> On 11/4/2013 1:01 AM, Joris Pelemans wrote: > On 11/04/13 01:01, Andreas Stolcke wrote: >> On 11/3/2013 1:43 AM, Joris Pelemans wrote: >>> I am investigating different techniques to introduce new words to >>> the vocabulary. Say I have a vocabulary of 100,000 words and I want >>> to introduce 1 new word X (for the sake of simplicity). I could do >>> one of 3 options: >>> >>> 1. use the contexts in which X appears in some training data (but >>> sometimes X may not appear (enough)) >>> 2. estimate the probability of X by taking a fraction of the prob >>> mass of a synonym of X (which I described earlier) >>> 3. estimate the probability of X by taking a fraction of the prob >>> mass of the <unk> class (if e.g.
no good synonym is at hand) >>> >>> I could then compare the perplexities of these 3 LMs with a >>> vocabulary of size 100,001 words to see which technique is best for >>> a given word/situation. >>> >> And option 3 is effectively already implemented by the way unseen >> words are mapped to <unk>. If you want to compute perplexity in a >> fair way you would take the LM containing <unk> and for every >> occurrence of X you add log p(X | <unk>) (the share of >> unk-probability mass you want to give to X). That way you don't need >> to add any ngrams to the LM. What this effectively does is simulate >> a class-based Ngram model where <unk> is a class and X is one of its >> members. > Yes, this is exactly what I meant when I asked for a "smart way in the > SRILM toolkit", so I assume this is included. I looked up how to use > class-based models and I think I found what I need to do. Is the > following the correct way to calculate perplexity for these models? > > ngram -lm class_lm.arpa -ppl test.txt -order n -classes expansions.class > > where expansions.class contains lines like this: > > <unk> p(X | <unk>) X > <unk> p(Y | <unk>) Y > <unk> 1-p(X | <unk>)-p(Y | <unk>) not_mapped Yes, except you have to use a new class symbol, like UNKWORD, and replace the "not_mapped" with the standard <unk>. Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From rimlaatar at yahoo.fr Tue Nov 26 02:28:31 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Tue, 26 Nov 2013 10:28:31 +0000 (GMT) Subject: [SRILM User List] commands used to build a ML type N-Class Message-ID: <1385461711.76816.YahooMailNeo@web173202.mail.ir2.yahoo.com> Hello, what are the commands used to build an ML-type N-class model? Thanks a lot. ---- Regards, Rim LAATAR, Computer Science Engineer, graduate of the National Engineering School of Sfax (ENIS), research master's student in Information Systems & New Technologies at FSEGS, NLP track. Website: Rim LAATAR BEN SAID Tel: (+216) 99 64 74 98
---- -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Wed Nov 27 00:37:11 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 27 Nov 2013 00:37:11 -0800 Subject: [SRILM User List] commands used to build a ML type N-Class In-Reply-To: <1385539922.30780.YahooMailNeo@web173203.mail.ir2.yahoo.com> References: <1385461711.76816.YahooMailNeo@web173202.mail.ir2.yahoo.com> <5294DE68.108@icsi.berkeley.edu> <1385536581.45235.YahooMailNeo@web173205.mail.ir2.yahoo.com> <1385539922.30780.YahooMailNeo@web173203.mail.ir2.yahoo.com> Message-ID: <5295AF37.3080807@icsi.berkeley.edu> On 11/27/2013 12:12 AM, Laatar Rim wrote: > Hi, > I am trying to train a class-based LM. I was hoping there is a step-by-step guide for doing this! See the thread at http://www.speech.sri.com/pipermail/srilm-user/2011q3/001078.html on this mailing list (and the link to the tutorial page that is given there). Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Wed Nov 27 10:04:02 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 27 Nov 2013 10:04:02 -0800 Subject: [SRILM User List] commands used to build a ML type N-Class In-Reply-To: <1385541844.41439.YahooMailNeo@web173202.mail.ir2.yahoo.com> References: <1385461711.76816.YahooMailNeo@web173202.mail.ir2.yahoo.com> <5294DE68.108@icsi.berkeley.edu> <1385536581.45235.YahooMailNeo@web173205.mail.ir2.yahoo.com> <1385539922.30780.YahooMailNeo@web173203.mail.ir2.yahoo.com> <5295AF37.3080807@icsi.berkeley.edu> <1385541844.41439.YahooMailNeo@web173202.mail.ir2.yahoo.com> Message-ID: <52963412.1050401@icsi.berkeley.edu> On 11/27/2013 12:44 AM, Laatar Rim wrote: > I have already seen the link, and I also tested these commands: > ngram-class -vocab vocab_file \ > -text input_file \ > -numclasses num \ > -class-counts output.class-counts \ > -classes output.classes > My question is: to build an N-class language model, do we only need numclasses (the number of classes), and what is replace-words-with-classes for? > Thanks in advance for your clarifications. replace-words-with-classes replaces word labels with class labels, so that you can train a class-level ngram model. It is described in the training-scripts(1) man page. Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From rimlaatar at yahoo.fr Mon Dec 2 23:41:37 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Tue, 3 Dec 2013 07:41:37 +0000 (GMT) Subject: [SRILM User List] class based language model s Message-ID: <1386056497.13953.YahooMailNeo@web173204.mail.ir2.yahoo.com> Hello, To build a class-based language model with SRILM, do I use the same commands as for an n-gram LM and just replace the corpus of words with a corpus of classes? ---- Regards, Rim LAATAR ---- -------------- next part -------------- An HTML attachment was scrubbed...
URL: From stolcke at icsi.berkeley.edu Tue Dec 3 00:12:29 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 03 Dec 2013 00:12:29 -0800 Subject: [SRILM User List] class based language model s In-Reply-To: <1386056497.13953.YahooMailNeo@web173204.mail.ir2.yahoo.com> References: <1386056497.13953.YahooMailNeo@web173204.mail.ir2.yahoo.com> Message-ID: <529D926D.20502@icsi.berkeley.edu> On 12/2/2013 11:41 PM, Laatar Rim wrote: > Hello, > To build a class based language model with srilm , I use the same > commands specific to LM type n-gram and just replace the corpus of > words with a corpus of classes ?? Yes. replace-words-with-classes automates the replacement of word strings by class labels. Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Tue Dec 3 14:48:57 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 03 Dec 2013 14:48:57 -0800 Subject: [SRILM User List] class based language model s In-Reply-To: <1386103837.59641.YahooMailNeo@web173205.mail.ir2.yahoo.com> References: <1386056497.13953.YahooMailNeo@web173204.mail.ir2.yahoo.com> <529D926D.20502@icsi.berkeley.edu> <1386063351.55568.YahooMailNeo@web173203.mail.ir2.yahoo.com> <529E205C.40302@icsi.berkeley.edu> <1386103837.59641.YahooMailNeo@web173205.mail.ir2.yahoo.com> Message-ID: <529E5FD9.9010705@icsi.berkeley.edu> On 12/3/2013 12:50 PM, Laatar Rim wrote: > in the class format: > class [p] word1 word2 ... > how can I calculate p? Use replace-words-with-classes with the outfile= option. This is explained in a previous post.
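The expansion probability p asked about above is a class-membership probability. A natural estimate, which the word-to-class replacement step can collect as a byproduct, is the relative frequency of each expansion within its class. This sketch is illustrative only and is not SRILM code:

```python
from collections import Counter

def class_expansion_probs(tagged_tokens):
    """tagged_tokens: list of (class_label, word) occurrences observed
    while replacing words with classes in the training text.
    Returns {class: {word: p}} with p = count(class, word) / count(class),
    so the expansion probabilities of each class sum to one, as the
    classes-format man page requires."""
    class_totals = Counter(c for c, _ in tagged_tokens)
    pair_counts = Counter(tagged_tokens)
    probs = {}
    for (c, w), n in pair_counts.items():
        probs.setdefault(c, {})[w] = n / class_totals[c]
    return probs

p = class_expansion_probs([("CITY", "paris"), ("CITY", "london"),
                           ("CITY", "paris"), ("DAY", "monday")])
```

Here p["CITY"]["paris"] = 2/3 and p["CITY"]["london"] = 1/3; written out in classes-format, those become the `p` column of the CITY lines.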
Andreas > > On Tuesday, 3 December 2013 at 18:18, Andreas Stolcke wrote: > On 12/3/2013 1:35 AM, Laatar Rim wrote: >> hello, >> >> on the internet I found this: >> to build and use a simple class language model: >> Induce classes: >> ngram-class -vocab vocab_file \ >> -text input_file \ >> -numclasses num \ >> -class-counts output.class-counts \ >> -classes output.classes >> in this example we only need the number of classes; how can I use a corpus of classes? > The steps for building a class-based LM are: > > 1. prepare a class definition file in the format described in the > classes-format(5) manual page. This can be done by hand or from other > knowledge sources, or automatically using word clustering algorithms > (see ngram-class(1)). > > 2. condition the training data or counts to replace words with class > labels, > using the "replace-words-with-classes" filter (see training-scripts(1) > man page). > > 3. run ngram-count on the result of step 2. > > > Andreas > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rimlaatar at yahoo.fr Tue Dec 10 22:20:26 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Wed, 11 Dec 2013 06:20:26 +0000 (GMT) Subject: [SRILM User List] class-based model Message-ID: <1386742826.82557.YahooMailNeo@web173205.mail.ir2.yahoo.com> Dear Andreas, First, I'm so sorry for disturbing you. For many days I have been trying to train a class-based model, so I use (1), (2) and (3) to create the class model. (1) ngram-class -vocab '/home/hp/Documents/SRILM/tata.txt' -text '/home/hp/Documents/SRILM/trainingData.txt' -numclasses 37 -class-counts output.class-counts -classes '/home/hp/Documents/SRILM/Replace_word_with_class_SRILM' (2) replace-words-with-classes
classes='/home/hp/Documents/SRILM/Replace_word_with_class_SRILM' '/home/hp/Documents/SRILM/trainingData.txt' > output_text_with_classes_2 (3) ngram-count -tolower -text '/home/hp/Documents/SRILM/trainingData.txt' -lm class_based_model_2 * tata.txt : a list of all the words in the vocabulary * trainingData.txt : my training data * Replace_word_with_class: my class definition file (format like this: class p word1 word2 ...), for example: [Quantite 0.21 ...] [Promotion 0.245 ...] (the Arabic member words of these classes were lost in the archive encoding) The result: class_based_model_2 looks like this (the Arabic word fields are likewise garbled): -2.44486 ... -0.1822249 -4.447026 ... -0.0797594 -4.447026 ... -0.282028 -3.075958 ... -0.3957056 -4.748056 ... -0.1852052 -4.748056 ... -0.07981876 -4.748056 ... -0.1853914 -4.447026 ... -0.1845916 I want to know if these commands and the result are correct, and why my class_based_model_2 contains only words and not classes? Please help me! Thanks a lot. ---- Regards, Rim LAATAR
URL: From stolcke at icsi.berkeley.edu Wed Dec 11 00:53:19 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 11 Dec 2013 00:53:19 -0800 Subject: [SRILM User List] class-based model In-Reply-To: <1386742826.82557.YahooMailNeo@web173205.mail.ir2.yahoo.com> References: <1386742826.82557.YahooMailNeo@web173205.mail.ir2.yahoo.com> Message-ID: <52A827FF.1090004@icsi.berkeley.edu> On 12/10/2013 10:20 PM, Laatar Rim wrote: > Dear Andreas, > First i'm so sorry for disturbing you, > > from many day I want to train a class-based model ,So I use (1), (2) > and (3) to create the class model. > > (1) ngram-class -vocab '/home/hp/Documents/SRILM/tata.txt' -text > '/home/hp/Documents/SRILM/trainingData.txt' -numclasses 37 > -class-counts output.class-counts -classes > '/home/hp/Documents/SRILM/Replace_word_with_class_SRILM' > > > (2) replace-words-with-classes > classes='/home/hp/Documents/SRILM/Replace_word_with_class_SRILM' > '/home/hp/Documents/SRILM/trainingData.txt' > output_text_with_classes_2 > > (3) ngram-count -tolower -text > '/home/hp/Documents/SRILM/trainingData.txt' -lm class_based_model_2 For step 3 you need to use ngram-count -text output_text_with_classes_2 -lm class_based_model_2 To evaluate the LM you would then use ngram -lm class_based_model_2 -classes '/home/hp/Documents/SRILM/Replace_word_with_class_SRILM' -ppl ... (or other options that use the LM) Andreas -------------- next part -------------- An HTML attachment was scrubbed... 
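Andreas's correction above is the key point: step 3 must consume the class-tagged text from step 2, not the original trainingData.txt, otherwise the LM never sees any class labels. The effect of step 2 can be illustrated with a minimal stand-in for the replace-words-with-classes filter (single-word expansions only; the real script also handles multiword expansions and probabilities):

```python
def replace_words_with_classes(text, class_of):
    """Replace each word that belongs to a class with its class label,
    mimicking what the replace-words-with-classes step feeds to
    ngram-count. Words without a class pass through unchanged."""
    return " ".join(class_of.get(w, w) for w in text.split())

# Illustrative class membership: paris and london both expand CITY.
classes = {"paris": "CITY", "london": "CITY"}
line = replace_words_with_classes("i flew to paris from london", classes)
```

Counting n-grams over output like "i flew to CITY from CITY" is what makes class labels appear in the resulting model, which is exactly what was missing from class_based_model_2.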
URL: From stolcke at icsi.berkeley.edu Wed Dec 11 08:51:24 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 11 Dec 2013 08:51:24 -0800 Subject: [SRILM User List] class-based model In-Reply-To: <1386769259.94283.YahooMailNeo@web173202.mail.ir2.yahoo.com> References: <1386742826.82557.YahooMailNeo@web173205.mail.ir2.yahoo.com> <52A827FF.1090004@icsi.berkeley.edu> <1386769259.94283.YahooMailNeo@web173202.mail.ir2.yahoo.com> Message-ID: <52A8980C.1020009@icsi.berkeley.edu> On 12/11/2013 5:40 AM, Laatar Rim wrote: > Hello, > Thank you so much. Another question: how can I interpret this result? The same way you interpret a standard LM. The class-based LM just uses a different way to compute the word probabilities. Check the tutorials that are linked to at http://www.speech.sri.com/projects/srilm/manpages/, for example, the lecture by Jurafsky. The interpretation of perplexity (= average branching factor) is the same no matter what type of LM you are using. Andreas -------------- next part -------------- An HTML attachment was scrubbed...
URL: From stolcke at icsi.berkeley.edu Wed Dec 11 09:48:03 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 11 Dec 2013 09:48:03 -0800 Subject: [SRILM User List] class-based model In-Reply-To: <1386783558.12847.YahooMailNeo@web173204.mail.ir2.yahoo.com> References: <1386742826.82557.YahooMailNeo@web173205.mail.ir2.yahoo.com> <52A827FF.1090004@icsi.berkeley.edu> <1386769259.94283.YahooMailNeo@web173202.mail.ir2.yahoo.com> <52A8980C.1020009@icsi.berkeley.edu> <1386783558.12847.YahooMailNeo@web173204.mail.ir2.yahoo.com> Message-ID: <52A8A553.5060206@icsi.berkeley.edu> On 12/11/2013 9:39 AM, Laatar Rim wrote: > for example, how can I interpret this line: > > \3-grams: > -0.4148394 CLASS-00001 CLASS-00001 CLASS-00009 log_10 P(CLASS-00009 | CLASS-00001 CLASS-00001) = -0.4148394. So a word from CLASS-00009 following two words from class CLASS-00001 has probability 10^ -0.4148394, times the probability of the word in class CLASS-00009 (which you can get from the class membership file). Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Thu Dec 12 14:07:08 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Thu, 12 Dec 2013 14:07:08 -0800 Subject: [SRILM User List] Google 1B Word Language Modeling Benchmark Message-ID: <52AA338C.1040908@icsi.berkeley.edu> Ciprian Chelba asked me to forward the following information about a recently launched initiative in large-scale LM benchmarking. More information at https://code.google.com/p/1-billion-word-language-modeling-benchmark/ . Andreas _________________________________________________________________________________________________________ Here is a brief description of the project. "The purpose of the project is to make available a standard training and test setup for language modeling experiments.
The training/held-out data was produced from a download at statmt.org using a combination of Bash shell and Perl scripts distributed here. This also means that your results on this data set are reproducible by the research community at large. Besides the scripts needed to rebuild the training/held-out data, it also makes available log-probability values for each word in each of ten held-out data sets, for each of the following baseline models: * unpruned Katz (1.1B n-grams), * pruned Katz (~15M n-grams), * unpruned Interpolated Kneser-Ney (1.1B n-grams), * pruned Interpolated Kneser-Ney (~15M n-grams) ArXiv paper: http://arxiv.org/abs/1312.3005 Happy benchmarking!" -- -Ciprian -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Thu Dec 12 14:21:17 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Thu, 12 Dec 2013 14:21:17 -0800 Subject: [SRILM User List] class-based model In-Reply-To: <1386852561.98570.YahooMailNeo@web173205.mail.ir2.yahoo.com> References: <1386742826.82557.YahooMailNeo@web173205.mail.ir2.yahoo.com> <52A827FF.1090004@icsi.berkeley.edu> <1386769259.94283.YahooMailNeo@web173202.mail.ir2.yahoo.com> <52A8980C.1020009@icsi.berkeley.edu> <1386783558.12847.YahooMailNeo@web173204.mail.ir2.yahoo.com> <52A8A553.5060206@icsi.berkeley.edu> <1386784962.96731.YahooMailNeo@web173204.mail.ir2.yahoo.com> <1386852561.98570.YahooMailNeo@web173205.mail.ir2.yahoo.com> Message-ID: <52AA36DD.4060409@icsi.berkeley.edu> On 12/12/2013 4:49 AM, Laatar Rim wrote: > Hello, > this line: -0.7027302 ?????????? CLASS-00021 means: the probability that > word ?????????? in class CLASS-00021 is 10 ^ -0.7027302 > > Please tell me if I'm wrong. I assume this is a line from the LM. It is NOT a class-membership probability. The N-gram model can have a mix of word and class labels. A word label simply represents a class consisting only of the word itself.
Therefore, the above line means that class CLASS-00021 has probability 10 ^ -0.7027302 when the previous word is ?????????? . The class membership probabilities are stored in the file that you specify with ngram -classes. Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From rimlaatar at yahoo.fr Tue Dec 17 02:58:09 2013 From: rimlaatar at yahoo.fr (Laatar Rim) Date: Tue, 17 Dec 2013 10:58:09 +0000 (GMT) Subject: [SRILM User List] class based model Message-ID: <1387277889.14584.YahooMailNeo@web173202.mail.ir2.yahoo.com> Dear Andreas, First, sorry to disturb you with my questions, but I still have some doubts about class-based models and I would be very grateful if you could help me. Here are my questions: 1- Does the class-format file (class p word1 word2 ...) support only simple words, or can it also support words such as good-morning, thank-you ...? 2- Can the class model have a mix of word and class definitions? 3- You say that a word label simply represents a class consisting only of the word itself, but I don't have a class that contains only one word; does that mean my model is wrong? 4- To execute this command: replace-words-with-classes classes='/home/hp/Documents/SRILM/Replace_word_with_class_SRILM' '/home/hp/Documents/SRILM/trainingData.txt' > output_text_with_classes_2 : must trainingData.txt contain punctuation marks, or only phrases? Thank you. ---- Regards, Rim LAATAR
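Putting the two pieces above together, the class n-gram probability from the LM and the membership probability from the classes file, the word probability decomposes as p(w | history) = p(class(w) | class history) × p(w | class(w)). A small numeric sketch using the log10 value quoted earlier in the thread (the membership probability 0.21 is illustrative):

```python
# Class-level trigram from the ARPA LM (log10), as in:
#   -0.4148394  CLASS-00001 CLASS-00001 CLASS-00009
log10_p_class = -0.4148394    # p(CLASS-00009 | CLASS-00001 CLASS-00001)

# Membership probability from the -classes file (assumed value):
p_word_in_class = 0.21        # p(word | CLASS-00009)

# p(word | history) = p(class | class history) * p(word | class)
p_word = (10 ** log10_p_class) * p_word_in_class
```

This is the computation ngram performs internally when scoring text with -classes; a word that is its own one-member "class" simply has p(word | class) = 1.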
URL: From stolcke at icsi.berkeley.edu Tue Dec 17 17:22:45 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Tue, 17 Dec 2013 17:22:45 -0800 Subject: [SRILM User List] class based model In-Reply-To: <1387277889.14584.YahooMailNeo@web173202.mail.ir2.yahoo.com> References: <1387277889.14584.YahooMailNeo@web173202.mail.ir2.yahoo.com> Message-ID: <52B0F8E5.9060301@icsi.berkeley.edu> On 12/17/2013 2:58 AM, Laatar Rim wrote: > Dear Andreas, > > First, sorry to disturb you with my questions, but I still > have some doubts about class-based models and I would be very grateful > if you could help me. > > Here are my questions: > > 1- Does the class-format file (class p word1 word2 ...) support > only simple words, or can it also support words such as > good-morning, thank-you ...? The expansion of a class can be one or more words, e.g., CITY 0.123 New York > > 2- Can the class model have a mix of word and class definitions? Yes. The LM could have an ngram "the CITY" (see above). > > 3- You say that a word label simply represents a class consisting only > of the word itself, but I don't have a class that contains only one word; > does that mean my model is wrong? What is meant is that a class ngram model with a mix of words and class labels is equivalent to a class ngram model that has only class-based ngrams, where the word labels are replaced by classes that have only that one word as a member. > > 4- To execute this command: replace-words-with-classes > classes='/home/hp/Documents/SRILM/Replace_word_with_class_SRILM' > '/home/hp/Documents/SRILM/trainingData.txt' > > output_text_with_classes_2 : > > must trainingData.txt contain punctuation marks, or only phrases? It depends on whether your ngram model is supposed to include punctuation or not. The software doesn't care whether you have punctuation; it treats period, comma, etc. as word strings just like any other.
It depends on your application (the program that uses the LM) whether punctuation is appropriate or not. Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From stolcke at icsi.berkeley.edu Wed Dec 18 08:12:18 2013 From: stolcke at icsi.berkeley.edu (Andreas Stolcke) Date: Wed, 18 Dec 2013 08:12:18 -0800 Subject: [SRILM User List] class based model In-Reply-To: <1387353916.63557.YahooMailNeo@web173202.mail.ir2.yahoo.com> References: <1387277889.14584.YahooMailNeo@web173202.mail.ir2.yahoo.com> <52B0F8E5.9060301@icsi.berkeley.edu> <1387353916.63557.YahooMailNeo@web173202.mail.ir2.yahoo.com> Message-ID: <52B1C962.3020004@icsi.berkeley.edu> On 12/18/2013 12:05 AM, Laatar Rim wrote: > Hi, > > Thanks a lot, > > The expansion of a class can be one or more words, e.g., > > CITY 0.123 New York > > and what about the words of each class? Must they be simple (new, york, > good, morning ...) or can I use words like (New-York, good-morning > ...)? > > [can the class definition file support words like New-York, > good-morning, or must the words be simple?] Yes, you can have hyphens in word strings. Everything that is not whitespace (space, tab, newline) can be part of words. Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: