From stolcke at speech.sri.com  Tue Apr  1 11:17:30 2003
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 01 Apr 2003 11:17:30 PST
Subject: Question about Lattice Tool 
In-Reply-To: Your message of Mon, 31 Mar 2003 18:50:31 +0200.
             <3E8871D7.89295212@itc.it> 
Message-ID: <200304011917.LAA13991@huge>


In message <3E8871D7.89295212 at itc.it> you wrote:
> Dear Dr. Andreas Stolcke and users of SRILM,
> 
> I have implemented our own Lattice Tool, based mainly on the SRILM toolkit
> (Lattice Tool); our lattice definition includes time information.
> My problem now is how to measure the word error rate of a generated lattice
> file against the corresponding actual utterance.  That is, I need to compute
> the lowest word error rate achievable from the lattice.
> Could you please point me to previous work on this problem?
> 
> In the SRILM Lattice Tool there is a function, namely:
> unsigned latticeWER(const VocabIndex *words,
>                     unsigned &sub, unsigned &ins, unsigned &del)
> {
>     SubVocab ignoreWords(vocab);
>     return latticeWER(words, sub, ins, del, ignoreWords);
> };
> which seems to solve my problem.  Unfortunately, I could not understand how
> this function works.
> I am looking forward to hearing from you.
> Thank you in advance.
> 
> Vu Hai Quan.

Vu,

The lattice word error rate is computed by dynamic programming, very similar
to the standard word error computation on strings.  You keep a table 
that has the minimum error from the beginning of the string and lattice to
each pair of string position and lattice node.  For lack of time I cannot
describe the algorithm in more detail, but maybe someone on the list can
help with that.  It's really quite straightforward if you are familiar
with string alignment.  If you're not, then do a google search on
"string alignment algorithm" and you will find a dozen or so references
that should help you get going.
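
To make the dynamic programming concrete, here is a minimal Python sketch of the idea. It is not the SRILM implementation. It assumes the lattice is acyclic, that node ids are numbered in topological order, that the start node has no incoming edges, and that every edge carries exactly one word. The function name lattice_min_errors and the edge-list representation are hypothetical, chosen only for illustration. It returns the total error count only, without the separate sub/ins/del counts or the ignore-words handling that latticeWER provides.

```python
from collections import defaultdict


def lattice_min_errors(ref, edges, start, end):
    """Minimum number of word errors between `ref` and any start->end path.

    ref   -- list of reference words
    edges -- list of (from_node, to_node, word) triples (hypothetical format);
             node ids are assumed to be in topological order
    """
    inf = float("inf")
    nodes = sorted({start, end}
                   | {u for u, _, _ in edges}
                   | {v for _, v, _ in edges})
    incoming = defaultdict(list)
    for u, v, word in edges:
        incoming[v].append((u, word))

    n = len(ref)
    # cost[node][i] = fewest errors aligning ref[:i] with a path start->node
    cost = {node: [inf] * (n + 1) for node in nodes}
    # At the start node no lattice word has been consumed yet, so the
    # first i reference words can only be deletions.
    cost[start] = list(range(n + 1))

    for node in nodes:
        if node == start:
            continue
        row = cost[node]
        for u, word in incoming[node]:
            prev = cost[u]
            # Insertion: the lattice word matches no reference word.
            row[0] = min(row[0], prev[0] + 1)
            for i in range(1, n + 1):
                match_or_sub = prev[i - 1] + (word != ref[i - 1])
                insertion = prev[i] + 1
                row[i] = min(row[i], match_or_sub, insertion)
        # Deletion: a reference word is skipped while staying at this node.
        for i in range(1, n + 1):
            row[i] = min(row[i], row[i - 1] + 1)

    return cost[end][n]
```

For example, with the reference "a b c" and a lattice whose edges are (0,1,"a"), (0,1,"x"), (1,2,"b"), (1,2,"d"), (2,3,"e"), the best path is "a b e" and the function returns 1.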

Incidentally, even if your lattices have time information, it should be
irrelevant to the lattice error computation, so you could convert
them to PFSG format and still use the SRILM tool.

--Andreas 

PS.  Please only send email related to the mailing list itself to
majordomo at speech.sri.com.  Once you've signed up, send all email to
srilm-user at speech.sri.com.


From stolcke at speech.sri.com  Sun Apr  6 14:25:19 2003
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Sun, 06 Apr 2003 14:25:19 PDT
Subject: can SRILM be used for machine translation?
Message-ID: <200304062125.OAA21692@huge>


Dear Chintan,

I have no experience with MT applications using SRILM or any other LM
toolkit.  I am forwarding your email to the srilm-user list in hopes that
someone there can help.  However, I wouldn't expect anyone to have 
the time to give you detailed guidance -- maybe some useful pointers.

By the way, you cannot send email to srilm-user at speech.sri.com without 
subscribing to the list first. Send a message to majordomo at speech.sri.com
with "help" in the message body for more information.

--Andreas 

------- Forwarded Message

From: "chintan shah" <shah_chintu at hotmail.com>
To: srilm-user at speech.sri.com
Subject: can SRILM be used for machine translation?
Date: Sun, 06 Apr 2003 08:45:15 +0000
Mime-Version: 1.0
Content-Type: text/plain; format=flowed
Message-ID: <F124CcUB5AyjgLjjq5000004abe at hotmail.com>

Respected Sir,
My name is Chintan and I am a final-year undergraduate student. I would like 
to know whether the SRI language modeling toolkit can be used for machine 
translation, and how to manage a corpus for that. We do not have any budget, 
so if you could provide us with resources it would be a great pleasure. I do 
have the textbook by Daniel Jurafsky and James Martin. So, sir, if you could 
tell us which chapters to read first, it would be a great favour.

Yours Thankfully,
    Chintan.

------- End of Forwarded Message


From melis at cs.utwente.nl  Thu Apr 10 01:56:31 2003
From: melis at cs.utwente.nl (Paul Melis)
Date: Thu, 10 Apr 2003 10:56:31 +0200
Subject: Vocabularies when interpolation
Message-ID: <20030410105631.B11073@luistervink.cs.utwente.nl>

Hello Andreas,

When performing interpolation with 

ngram -lm .. -mix-lm .. -lambda ...

the vocabularies of the LMs being mixed get merged, if I understand it
correctly (from doing some test runs). Is there a way to force the resulting
output LM to have a predefined vocabulary (e.g. the vocab of one of the LMs
being mixed)?

Regards,
Paul


-- 
melis at cs.utwente.nl


From stolcke at speech.sri.com  Thu Apr 10 09:15:07 2003
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 10 Apr 2003 09:15:07 PDT
Subject: Vocabularies when interpolation 
In-Reply-To: Your message of Thu, 10 Apr 2003 10:56:31 +0200.
             <20030410105631.B11073@luistervink.cs.utwente.nl> 
Message-ID: <200304101615.JAA27187@huge>


In message <20030410105631.B11073 at luistervink.cs.utwente.nl>you wrote:
> Hello Andreas,
> 
> When performing interpolation with 
> 
> ngram -lm .. -mix-lm .. -lambda ...
> 
> the vocabularies of the LM's being mixed get merged if I understand it
> correctly (from doing some test runs). Is there a way to force the resulting
> output LM to have a predefined vocabulary (e.g. the vocab of one of the LM's
> being mixed)? 

No, but you can limit the LM vocabulary either before or after the 
merging.  The proper way to do this is to specify the same vocabulary
when building the various LM components. If that is not possible
(e.g., you got the LMs from someone else) you can modify the 
LM vocabulary post-training using the "change-lm-vocab" script.
Check the "lm-scripts" man page.
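
As a rough illustration of why the vocabularies end up merged, here is a minimal Python sketch of linear interpolation restricted to unigram probabilities. It is only a toy model under stated assumptions, not what ngram does internally: real backoff LMs also carry higher-order ngrams and backoff weights, and a word missing from one model is simply treated here as having probability 0 in that model. The function name interpolate_unigrams and the dict representation are hypothetical. The weight lam plays the role of -lambda, i.e. the weight of the main model.

```python
def interpolate_unigrams(main_lm, mix_lm, lam=0.5):
    """Linearly interpolate two unigram distributions.

    main_lm, mix_lm -- dicts mapping word -> probability (toy stand-ins
                       for the -lm and -mix-lm models)
    lam             -- weight of the main model
    """
    # Every word known to either model receives some probability,
    # so the result is defined over the union of the two vocabularies.
    vocab = set(main_lm) | set(mix_lm)
    return {
        w: lam * main_lm.get(w, 0.0) + (1.0 - lam) * mix_lm.get(w, 0.0)
        for w in vocab
    }
```

Simply deleting unwanted words from such a mixture would leave probabilities that no longer sum to one, which is one reason to fix the vocabulary when building the component LMs or to use a dedicated script afterwards.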

--Andreas 


From ejoy at peoplemail.com.cn  Thu Apr 17 06:23:32 2003
From: ejoy at peoplemail.com.cn (Zhang Le)
Date: Thu, 17 Apr 2003 21:23:32 +0800
Subject: srilm works on FreeBSD
Message-ID: <20030417132332.GA406@>

Hi all,
     I just managed to get SRILM 1.3.3 working on FreeBSD.
     I changed the following lines in bin/machine-type to detect FreeBSD.

 set MACHINE_TYPE = cygwin
+else if (`uname -s` =~ FreeBSD*) then
+set MACHINE_TYPE = freebsd
 else if (`uname -s` == Darwin) then
 set MACHINE_TYPE = macosx
  
  I also added a common/Makefile.machine.freebsd, modified from the Cygwin
  configuration file (see attachment).

  "gmake World" now works fine under FreeBSD.

     Here is the output of uname -a:
     FreeBSD  4.8-RELEASE FreeBSD 4.8-RELEASE #0: Sat Apr 12 22:18:07
     CST 2003     zl@:/usr/src/sys/compile/MYKERNEL  i386

     I also tested it on FreeBSD 5.0-RELEASE.
--
                                     Sincerely yours,
                                            Zhang Le
-------------- next part --------------
#
#    File:   Makefile.i686
#    Author: The SRI DECIPHER (TM) System
#    Date:   Fri Feb 19 22:45:31 PST 1999
#
#    Description:
#	Machine dependent compilation options and variable definitions
#	for CYGWIN/i686 platform
#
#    Copyright (c) 1999-2002 SRI International.  All Rights Reserved.
#
#    $Header: /home/srilm/devel/common/RCS/Makefile.machine.cygwin,v 1.4 2003/02/27 18:25:11 stolcke Exp $
#

   # Use the GNU C compiler.
   GCC_FLAGS = -Wreturn-type -Wimplicit
   CC = gcc $(GCC_FLAGS)
   CXX = g++ -Wno-deprecated $(GCC_FLAGS) -DINSTANTIATE_TEMPLATES

   # Optional compilation flags.
   OPTIMIZE_FLAGS = -g -O2
   DEBUG_FLAGS = -g -DDEBUG
   PROFILE_FLAGS = -g -pg -O2

   # Optional linking flags.
   EXPORT_LDFLAGS = -s

   # Shared compilation flags.
   CFLAGS = $(ADDITIONAL_CFLAGS) $(INCLUDES)
   CXXFLAGS = $(ADDITIONAL_CXXFLAGS) $(INCLUDES)

   # Shared linking flags.
   LDFLAGS = $(ADDITIONAL_LDFLAGS) -L$(SRILM_LIBDIR)

   # Other useful compilation flags.
   ADDITIONAL_CFLAGS =
   ADDITIONAL_CXXFLAGS =

   # Other useful include directories.
   ADDITIONAL_INCLUDES = 

   # Other useful linking flags.
   ADDITIONAL_LDFLAGS = 

   # Other useful libraries.
   ADDITIONAL_LIBRARIES = -lm 

   # run-time linker path flag
   RLD_FLAG = -R

   # Tcl support (part of cygwin)
   TCL_INCLUDE = -I/usr/local/include/tcl8.3
   TCL_LIBRARY = -L/usr/local/lib -ltcl83

   # No ranlib
   RANLIB = :

   # Generate dependencies from source files.
   GEN_DEP = $(CC) $(CFLAGS) -MM

   GEN_DEP.cc = $(CXX) $(CXXFLAGS) -MM

   # Run lint.
   LINT = lint
   LINT_FLAGS = -DDEBUG $(CFLAGS)

   # Location of gawk binary
   GAWK = /usr/bin/gawk


From julyjune03 at yahoo.com  Fri Apr 18 21:54:23 2003
From: julyjune03 at yahoo.com (June July)
Date: Fri, 18 Apr 2003 21:54:23 -0700 (PDT)
Subject: help with ngram-count
Message-ID: <20030419045423.33794.qmail@web41604.mail.yahoo.com>

I encountered the following warning from ngram-count:

    BOW denominator for context "D SMALL" is 0 <= 0,numerator is 0.0909091

The command I invoked is:

    zcat EN.count.1.gz EN.count.2.gz EN.count.3.gz | \
        perl -pe 's/<UNK>/<unk>/g' | \
        ./bin/ngram-count -memuse -read - -vocab ML.vocab -order 3 \
            -cdiscount3 0 -cdiscount2 0 -cdiscount1 0 -unk -lm - | \
        ./bin/add-dummy-bows - | \
        perl -pe 's/<unk>/<UNK>/g' | gzip >! EN.arpabo.3.gz

Could someone help me get rid of that warning message?

Thanks,
June



From stolcke at speech.sri.com  Sat Apr 19 10:13:13 2003
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Sat, 19 Apr 2003 10:13:13 PDT
Subject: help with ngram-count 
In-Reply-To: Your message of Fri, 18 Apr 2003 21:54:23 -0700.
             <20030419045423.33794.qmail@web41604.mail.yahoo.com> 
Message-ID: <200304191713.KAA24463@huge>


For ngram backoff you distribute the probability mass left over by the
ngrams of order k in proportion to the probabilities given by ngrams of
order k-1.

What the warning message is saying is that the (k-1)-grams don't assign any
probability to the words that don't already have k-grams.  This can happen
especially when you disable smoothing, as you did.

The problem should go away if you include all trigrams from your training 
data.  The default minimum count for trigrams is 2, so you need to use
-gt3min 1 in addition to the options you have.
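
To show where the zero denominator comes from, here is a minimal Python sketch of the textbook backoff-weight formula for a single context. It is an illustration under assumptions, not SRILM's code: the function name backoff_weight, the dict arguments and the tolerance value are all hypothetical.

```python
def backoff_weight(higher_probs, lower_probs, tol=1e-12):
    """Backoff weight (BOW) for one context.

    higher_probs -- word -> p(word | context), only for words that have an
                    explicit k-gram in this context
    lower_probs  -- word -> p(word | shortened context), for all words
    """
    # Numerator: probability mass the explicit k-grams leave over.
    numerator = 1.0 - sum(higher_probs.values())
    # Denominator: lower-order mass on words WITHOUT an explicit k-gram,
    # i.e. the words the leftover mass is supposed to be spread over.
    denominator = 1.0 - sum(lower_probs[w] for w in higher_probs)
    if denominator <= tol:
        # Mass is left over, but the lower-order model gives no probability
        # to any word that could receive it.
        raise ValueError(
            "BOW denominator is %g, numerator is %g" % (denominator, numerator)
        )
    return numerator / denominator
```

With smoothing disabled and some trigrams dropped by the minimum-count cutoff, the numerator can stay positive while every word the bigram model knows about in that context already has a trigram, which gives exactly the "denominator is 0" situation.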

--Andreas

In message <20030419045423.33794.qmail at web41604.mail.yahoo.com> you wrote:
> I encountered the following problem reported from ngram-count:
> BOW denominator for context "D SMALL" is 0 <= 0,numerator is 0.0909091
> The switches I invoked is:
> zcat EN.count.1.gz EN.count.2.gz EN.count.3.gz |
>   perl -pe 's/<UNK>/<unk>/g' |
>   ./bin/ngram-count -memuse -read - -vocab ML.vocab -order 3
>     -cdiscount3 0  -cdiscount2 0 -cdiscount1 0  -unk -lm -  |
>   ./bin/add-dummy-bows - | perl -pe 's/<unk>/<UNK>/g' |
>   gzip >! EN.arpabo.3.gz
> Could someone help me to get rid of that warning msg?   Thanks, June


From dpico at dsic.upv.es  Mon Jun 16 09:14:42 2003
From: dpico at dsic.upv.es (David Picó Vila)
Date: Mon, 16 Jun 2003 18:14:42 +0200
Subject: Problems compiling srilm
Message-ID: <3EEDECF2.7060906@dsic.upv.es>

Hello!

I hope this is the right forum to ask this and that I am not disturbing 
anyone. I am trying to install SRILM on a SuSE 8.2 Linux platform and I 
cannot get ngram and ngram-count compiled! Apparently, all the versions 
of the compilers, etc. are correct, and so are the system variables, etc., 
but gcc seems to complain about some incompatible compiler options.

Has anyone on the list already had this problem and knows how to solve it?
Thank you very much in advance for your help!

David

--
David Picó Vila
Departament de Sistemes Informàtics i Computació
Universitat Politècnica de València
València, Spain
Email: dpico at dsic.upv.es
Tel: +34 963877007 ext. 73528


From julyjune03 at yahoo.com  Mon Jun 23 10:45:44 2003
From: julyjune03 at yahoo.com (June July)
Date: Mon, 23 Jun 2003 10:45:44 -0700 (PDT)
Subject: class based SRI LM 
Message-ID: <20030623174544.79857.qmail@web41601.mail.yahoo.com>

Hi,
 
   I tried to build class based LMs in the following way:
 
   step-1:  ngram-class -text test.in -numclasses 100 -class-counts text.cnt -classes text.cls  -save 100
 
   step-2:  ngram-count -read  text.cnt -memuse -kndiscount -kndiscount1 -kndiscount2 -lm text.srilm.gz
 
    I found that the class count output "text.cnt" from step-1 contains only bigram counts.  Thus the final class LM text.srilm.gz is also a bigram one. 
 
   Could anyone tell me whether I am using the toolkit correctly?  How do I build a trigram class-based LM?  Also, is there any published paper or document that I can consult for detailed information? 
 
   Many thanks,
 
-June



From yangl at ecn.purdue.edu  Mon Jun 23 13:59:05 2003
From: yangl at ecn.purdue.edu (Yang Liu)
Date: Mon, 23 Jun 2003 15:59:05 -0500 (EST)
Subject: class based SRI LM 
Message-ID: <200306232059.h5NKx5u2018424@sohmm.ecn.purdue.edu>

Hi June,
After you get the automatically induced classes (the class definitions in the 
file text.cls), you can map all the words in your training set to classes using: 
replace-words-with-classes classes=text.cls training_set > training_set_classes
Then you can build a class-based LM of any order from that.
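
To illustrate the idea (this is only a sketch, not what the replace-words-with-classes script actually does internally), here is a small Python example. It assumes the class definitions have already been loaded into a plain word-to-class dict; the function names and the dict are hypothetical. Once the text is a sequence of class labels, ngrams of any order can be counted from it.

```python
def map_words_to_classes(sentence, word_to_class):
    """Replace each word by its class label; unknown words are kept as-is."""
    return [word_to_class.get(w, w) for w in sentence.split()]


def class_ngrams(tokens, order):
    """All ngrams of the given order over a list of class labels."""
    return [tuple(tokens[i:i + order])
            for i in range(len(tokens) - order + 1)]


# Hypothetical class assignments, standing in for the content of text.cls.
word_to_class = {"monday": "CLASS-1", "tuesday": "CLASS-1", "see": "CLASS-2"}
tokens = map_words_to_classes("see you monday", word_to_class)
trigrams = class_ngrams(tokens, 3)
```

Because the mapped text is ordinary text over class labels, the order of the resulting model is limited only by the -order given when counting, not by the bigram statistics that ngram-class itself writes out.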

Hope this helps.
-- Yang


>Hi,
> 
>   I tried to build class based LMs in the following way:
> 
>   step-1:  ngram-class -text test.in -numclasses 100 -class-counts text.cnt 
>            -classes text.cls  -save 100
> 
>   step-2:  ngram-count -read  text.cnt -memuse -kndiscount -kndiscount1 
>            -kndiscount2 -lm text.srilm.gz
> 
>    I found that the class count output "text.cnt" from step-1 is only 
>    bigram-counts.  Thus the final class-LM text.srilm.gz is also a bigram one. 
> 
>   Could anyone tell me if I am using the toolkit correctly?  How to build a 
>   trigram class-based LM?  Also are there any published paper/document that
>   I can look up for detail information? 
> 
>   Many thanks,
> 
>-June


From julyjune03 at yahoo.com  Mon Jun 23 14:02:22 2003
From: julyjune03 at yahoo.com (June July)
Date: Mon, 23 Jun 2003 14:02:22 -0700 (PDT)
Subject: class based SRI LM 
In-Reply-To: <200306232059.h5NKx5u2018424@sohmm.ecn.purdue.edu>
Message-ID: <20030623210222.31446.qmail@web41609.mail.yahoo.com>

Thanks a lot! 



