From marco.turchi at gmail.com  Sat Jan  6 12:24:15 2007
From: marco.turchi at gmail.com (marco turchi)
Date: Sat, 6 Jan 2007 20:24:15 +0000
Subject: compilation problems
Message-ID: <79a042480701061224k1309d5c0ie43a49e376ece31f@mail.gmail.com>

Dear all,
I'm a new user, and I'm trying to compile and install SRILM, but I'm having some
problems.
I set the SRILM home variable inside the Makefile and ran make World, but
I get this set of errors:

/enm/local/bin/gcc -mtune=i686 -Wreturn-type -Wimplicit
-D_FILE_OFFSET_BITS=64   -I. -I/usr/local/Moses/srilm//include   -c -g -O3
-o ../obj/i686/option.o option.c
cc1: invalid option `tune=i686'
make[2]: *** [../obj/i686/option.o] Error 1
I get the same error for other files: qsort.c matherr.c FDiscount.cc
Lattice.cc ngram.cc fngram-count.cc lattice-tool.cc

I tried changing the -mtune value without success, so I removed the flag.
With this brute-force workaround I can compile almost all the files,
but I get other errors:


/enm/local/bin/g++ -Wreturn-type -Wimplicit -DINSTANTIATE_TEMPLATES
-D_FILE_OFFSET_BITS=64    -I. -I/usr/local/Moses/srilm//include   -c -g -O3
-o ../obj/i686/DFNgram.o DFNgram.cc
Trellis.h:203: sorry, not implemented: use of `enumeral_type' in template type
   unification

make[2]: *** [../obj/i686/DFNgram.o] Error 1
make[2]: Leaving directory `/usr/local/Moses/srilm/lm/src'
make[2]: Entering directory `/usr/local/Moses/srilm/flm/src'

make[2]: *** No rule to make target
`/usr/local/Moses/srilm//lib/i686/liboolm.a', needed by
`../bin/i686/lattice-tool'.  Stop.

Can you help me?
Also, where can I find the archive of earlier messages on this mailing list?
Thanks a lot
Marco Turchi

From sanyaade at hotmail.com  Sat Jan  6 19:55:04 2007
From: sanyaade at hotmail.com (sanyaade)
Date: Sun, 7 Jan 2007 03:55:04 -0000
Subject: compilation problems
References: <79a042480701061224k1309d5c0ie43a49e376ece31f@mail.gmail.com>
Message-ID: <BAY124-DAV26750712DE4126D34EDF1CBBD0@phx.gbl>

What platform are you on: Windows, Linux, Unix, etc.?

If you are on Windows, it has to be in your root directory: c:\srilm (with Cygwin installed).

On Linux, put it in your home directory (/home/srilm) or under root (/srilm),
then: 1) set the SRILM home variable inside the Makefile, and 2) run make World.

Hope this helps!

God bless!!!

Best regards,
Sanyaade


From marco.turchi at gmail.com  Sun Jan  7 06:06:22 2007
From: marco.turchi at gmail.com (marco turchi)
Date: Sun, 7 Jan 2007 14:06:22 +0000
Subject: compilation problems
In-Reply-To: <BAY124-DAV26750712DE4126D34EDF1CBBD0@phx.gbl>
References: <79a042480701061224k1309d5c0ie43a49e376ece31f@mail.gmail.com>
	 <BAY124-DAV26750712DE4126D34EDF1CBBD0@phx.gbl>
Message-ID: <79a042480701070606p6f65f00ar4de95b163b3d985d@mail.gmail.com>

Dear Sanyaade,
I'm working under Linux.
I moved srilm to my home directory, but I get the same errors. :-(

Thanks
Marco


From marco.turchi at gmail.com  Sun Jan  7 11:44:54 2007
From: marco.turchi at gmail.com (marco turchi)
Date: Sun, 7 Jan 2007 19:44:54 +0000
Subject: compilation problems
In-Reply-To: <d8b45312205d5bd5b7742398355d7c37@sonic.net>
References: <79a042480701061224k1309d5c0ie43a49e376ece31f@mail.gmail.com>
	 <BAY124-DAV26750712DE4126D34EDF1CBBD0@phx.gbl>
	 <79a042480701070606p6f65f00ar4de95b163b3d985d@mail.gmail.com>
	 <d8b45312205d5bd5b7742398355d7c37@sonic.net>
Message-ID: <79a042480701071144s367f7ac6q17716624ba3ce78b@mail.gmail.com>

Hi Russell,
you are right: I have gcc 3.2.3. Not good news. :-)
Where can I find the archive of earlier messages on this mailing list?

Thanks
Marco


On 1/7/07, Russell Sheptak <rus at sonic.net> wrote:
>
> Check which version of GCC you're using.  I suspect it is version 3.x
> (a simple gcc --version should give you the info you need).   I think
> you need gcc 4 to successfully compile it on linux.
>
> rus

From stolcke at speech.sri.com  Sun Jan  7 11:51:40 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Sun, 07 Jan 2007 11:51:40 PST
Subject: compilation problems 
In-Reply-To: Your message of Sun, 07 Jan 2007 19:44:54 +0000.
             <79a042480701071144s367f7ac6q17716624ba3ce78b@mail.gmail.com> 
Message-ID: <200701071951.LAA01733@tonga>


In message <79a042480701071144s367f7ac6q17716624ba3ce78b at mail.gmail.com> you wrote:
> 
> Hi Russel,
> you are right I've gcc 3.2.3. It is not a good new. :-)

What you saw is definitely a problem I have seen with old versions
of gcc. Try 3.4.3 or newer.
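For anyone hitting the same errors, the version check can be scripted. The Python sketch below is a hypothetical helper, not part of SRILM: it assumes the version number is the first dotted number on the first line of gcc --version output, and it uses the 3.4.3 threshold suggested here as a rule of thumb.

```python
import re
import shutil
import subprocess

# Threshold suggested on the list; treat it as a rule of thumb.
MIN_GCC = (3, 4, 3)

def parse_gcc_version(version_text):
    """Extract (major, minor, patch) from the first line of `gcc --version`."""
    lines = version_text.splitlines()
    first_line = lines[0] if lines else ""
    # Assumption: the first dotted number on that line is the compiler version.
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", first_line)
    if match is None:
        raise ValueError("no version number found in: " + repr(first_line))
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))

def is_new_enough(version, minimum=MIN_GCC):
    # Tuples compare element by element, so (3, 2, 3) < (3, 4, 3).
    return version >= minimum

if __name__ == "__main__":
    gcc = shutil.which("gcc")
    if gcc is None:
        print("gcc not found on PATH")
    else:
        out = subprocess.run([gcc, "--version"], capture_output=True, text=True).stdout
        try:
            ver = parse_gcc_version(out)
            status = "OK" if is_new_enough(ver) else "probably too old for SRILM"
            print("gcc", ".".join(map(str, ver)), status)
        except ValueError as err:
            print(err)
```

With the compiler from this thread, "gcc (GCC) 3.2.3 ..." parses to (3, 2, 3) and is flagged as too old.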

> Where can I find all the other messages of this mailing list?

Send a message with the body

	help

to majordomo at speech.sri.com to get instructions on how to retrieve
archives of old messages (as well as other documentation).

Andreas 


From john at johnfry.org  Sun Jan  7 19:58:06 2007
From: john at johnfry.org (John Fry)
Date: Sun, 07 Jan 2007 19:58:06 -0800
Subject: compilation problems
In-Reply-To: <200701071951.LAA01733@tonga> (Andreas Stolcke's message of "Sun,
	07 Jan 2007 11:51:40 PST")
References: <200701071951.LAA01733@tonga>
Message-ID: <87wt3y6ugh.fsf@lld.sjsu.edu>

Andreas Stolcke <stolcke at speech.sri.com> writes:

> Send a message with the body
>
> 	help
>
> to majordomo at speech.sri.com to get instructions on how to retrieve
> archives of old messages (as well as other documentation).

Hi Andreas,

Before I start complaining, let me say that SRILM is a fantastic,
world-class system, and we're all *extremely* grateful to you for
opening it up to us and continuing to support it.

That said, I must point out that using majordomo, a perl script from
1992, to retrieve old messages is completely unworkable.  If you don't
believe me, try it yourself.

Maybe one of these days you can persuade a summer intern to archive
the srilm-user mailing list on the web, where it will be searchable?

Best,

John


From stolcke at speech.sri.com  Mon Jan  8 11:06:12 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 08 Jan 2007 11:06:12 -0800
Subject: compilation problems
In-Reply-To: <87wt3y6ugh.fsf@lld.sjsu.edu>
References: <200701071951.LAA01733@tonga> <87wt3y6ugh.fsf@lld.sjsu.edu>
Message-ID: <45A29624.8000209@speech.sri.com>

John Fry wrote:
> Andreas Stolcke <stolcke at speech.sri.com> writes:
>
>   
>> Send a message with the body
>>
>> 	help
>>
>> to majordomo at speech.sri.com to get instructions on how to retrieve
>> archives of old messages (as well as other documentation).
>>     
>
> Hi Andreas,
>
> Before I start complaining, let me say that SRILM is a fantastic,
> world-class system, and we're all *extremely* grateful to you for
> opening it up to us and continuing to support it.
>   
Thanks, that's nice to hear.
> That said, I must point out that using majordomo, a perl script from
> 1992, to retrieve old messages is completely unworkable.  If you don't
> believe me, try it yourself.
>
> Maybe one of these days you can persuade a summer intern to archive
> the srilm-user mailing list on the web, where it will be searchable?
>   
Believe me, converting from majordomo to mailman has been on our to-do 
list for a while now.
Any day now ...

Andreas


From yozhik at computer.org  Tue Jan 16 15:38:55 2007
From: yozhik at computer.org (Tom Murray)
Date: Tue, 16 Jan 2007 15:38:55 -0800
Subject: Bug in lattice-tool?
In-Reply-To: <39abe3570701161526s290a0374w97e7d6326516cb62@mail.gmail.com>
References: <39abe3570701161526s290a0374w97e7d6326516cb62@mail.gmail.com>
Message-ID: <39abe3570701161538j470e9312j7a9e9a5965fa8fa1@mail.gmail.com>

Hi,

I was seeing weird behavior in lattice-tool when mixing an external LM into a
lattice for n-best decoding.

Tracking things down, I found that if I zeroed out the external LM scores as
they were added into the lattice during expansion, the resulting hyp scores
were always zero; that is, the scores from the lattice were discarded. I
observed this for both HTK and PFSG lattices.

Attached is a patch (against version 1.5.1) which I believe fixes the problem.
What I found is that, as old transitions were replaced during expansion
(Lattice::expandAddTransition() in LatticeExpand.cc), the old weights were
discarded. This caused the problem because the initial transitions loaded
from the lattice files were replaced during expansion.

Cheers,

tm


P.S. I also made some functional changes; let me know if anyone is
interested in them: (1) allowing scaling of the external LM as it's used to
reweight the lattice, and (2) outputting the (weighted) acoustic and LM scores to
the n-best list as they were actually evaluated during decoding. Currently
only the original lattice scores are output for HTK lattices, and
zeros are output for PFSG lattices, because they don't fill the internal HTK
structures used for score output.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: LatticeExpand.patch
Type: application/octet-stream
Size: 630 bytes
Desc: not available
URL: <http://www.speech.sri.com/pipermail/srilm-user/attachments/20070116/5dbd3434/attachment.obj>

From yozhik at computer.org  Thu Jan 18 09:55:01 2007
From: yozhik at computer.org (Tom Murray)
Date: Thu, 18 Jan 2007 09:55:01 -0800
Subject: Fwd: Bug in lattice-tool?
In-Reply-To: <200701180657.WAA24896@tonga>
References: <39abe3570701171423p4bb5d962qf6dbed50cca8aeda@mail.gmail.com>
	 <200701180657.WAA24896@tonga>
Message-ID: <39abe3570701180955g5b08279aj4b2c2eb6132259b1@mail.gmail.com>

Thanks, Andreas. I'm forwarding this to the list because I think it
may be quite useful to a number of people.

---------- Forwarded message ----------
From: Andreas Stolcke <stolcke at speech.sri.com>
Date: Jan 17, 2007 10:57 PM
Subject: Re: Bug in lattice-tool?
To: Tom Murray <yozhik at computer.org>


Tom,

what you are trying to do can be done with lattice-tool as it is,
but it requires two passes.  That's how we rescore lattices ourselves.

step 1: expand lattice with new LM, write new lattices
step 2: read rescored lattices, choosing scaling factors and decoding
        1-best or n-best.

You are trying to combine these steps into one, and it fails because
the LM rescoring function overrides the combined scores.
This behavior is by design and some other functions depend on it,
but it needs to be better documented.

BTW, I don't think your patch will necessarily do the right thing.
It simply adds the new LM score to the old combined score, instead
of replacing the old LM score in the combination of scores.
There are ways to fix this, but it would require more extensive code
changes.
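To make the difference between "adding" and "replacing" concrete, here is a toy Python sketch. It assumes a simplified log-linear arc score (acoustic scale times AM score plus LM scale times LM score, with no word penalty); ArcScores and the function names are hypothetical and do not correspond to lattice-tool internals.

```python
from dataclasses import dataclass

@dataclass
class ArcScores:
    """Hypothetical per-arc log scores, kept as separate components."""
    acoustic: float
    lm: float

def combine(scores, ac_scale, lm_scale):
    # Simplified log-linear combination (no word penalty, no pronunciation score).
    return ac_scale * scores.acoustic + lm_scale * scores.lm

def patched_score(old_combined, new_lm, lm_scale):
    # Adding the new LM score on top of the old combined score:
    # the old LM contribution is still in there, so the LM is counted twice.
    return old_combined + lm_scale * new_lm

def replaced_score(scores, new_lm, ac_scale, lm_scale):
    # Rescoring proper: swap the old LM component for the new one.
    return combine(ArcScores(scores.acoustic, new_lm), ac_scale, lm_scale)

arc = ArcScores(acoustic=-250.0, lm=-6.0)
old = combine(arc, ac_scale=1.0, lm_scale=10.0)
print(patched_score(old, new_lm=-4.0, lm_scale=10.0))
print(replaced_score(arc, new_lm=-4.0, ac_scale=1.0, lm_scale=10.0))

# Because the components stay separate, trying other scales (step 2) is cheap.
for lm_scale in (8.0, 12.0):
    print(lm_scale, replaced_score(arc, new_lm=-4.0, ac_scale=1.0, lm_scale=lm_scale))
```

The two results differ by exactly lm_scale times the old LM score, which is the extra term the patch leaves in.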

I would recommend the 2-step approach.  It also has the advantage
that you can rerun step 2 (n-best decoding) multiple times to try different
scaling factors.

One more thing:  since your LM does not contain multiwords you need
to split the multiwords prior to LM expansion. Simply add the -split-multiwords
option in step 1.
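Conceptually, splitting turns each multiword token into the sequence of ordinary words the LM knows. The Python sketch below shows this at the token level only; it assumes "_" as the multiword separator (my understanding is that this is SRILM's usual default, configurable with -multi-char), and the function names are hypothetical. lattice-tool itself splits lattice nodes, not strings.

```python
def split_multiword(token, separator="_"):
    """Split a multiword token such as 'want_to' into its component words."""
    # Assumption: "_" joins the parts of a multiword.
    parts = [p for p in token.split(separator) if p]
    # Keep tokens that are not multiwords (or consist only of separators) as-is.
    return parts or [token]

def split_hypothesis(tokens, separator="_"):
    """Expand every multiword in a word sequence."""
    words = []
    for tok in tokens:
        words.extend(split_multiword(tok, separator))
    return words

print(split_hypothesis(["<s>", "i", "want_to", "go", "</s>"]))
```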

Andreas

In message <39abe3570701171423p4bb5d962qf6dbed50cca8aeda at mail.gmail.com> you wrote:
> Hi, Andreas--
>
> What we want to do with lattice-tool is this: generate an n-best list
> from a lattice using an external LM, where the path scores are a
> weighted sum of the AM and LM scores in the lattice and the scores of
> the external LM.
>
> Attached is a tarred directory with an HTK lattice, an LM, and a test
> script test-lattice.sh. Also included is the output of v1.5.1
> lattice-tool, compared with my patched version which adds the
> transition log weights as I described.
>
> The script runs lattice-tool three times, first with default
> -htk-lmscale and -htk-acscale, and then with the lmscale and the
> acscale zeroed out. You can see that the n-best list is the same for
> all three for the v1.5.1 output. For mine it differs.
>
> To give a little more detail of where I think the bug is, according to
> my understanding of what's going on:
>
> When you load the HTK file, you create a node for each HTK edge, and
> then connect this new node from the start node and to the end node.
> The weight of the connection from the start to the new node is the
> weighted sum (according to lmscale, acscale, etc.) of the various
> scores from the HTK edge.
>
> Now, during expansion, old nodes and transitions are replaced by new
> ones, with the old nodes deleted. I printed out all the node indices,
> and the initial nodes corresponding to the HTK edges are deleted
> during this stage. I became convinced of this when I added a line to
> zero out the probs from the external LM, and all the hyp scores during
> n-best output had score = 0.
>
> Please let me know if I'm misunderstanding something. Thanks for your help,
>
> tm


From stolcke at speech.sri.com  Tue Feb  6 17:43:50 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 06 Feb 2007 17:43:50 PST
Subject: Google language model 
In-Reply-To: Your message of Tue, 06 Feb 2007 15:01:54 -0500.
             <200702062003.l16K33Jk028807@linus.mitre.org> 
Message-ID: <200702070143.l171hp108262@huge>


In message <200702062003.l16K33Jk028807 at linus.mitre.org> you wrote:
> Hi Andreas,
> 
> I have been using SRILM for some time now and am interested in using it
> in conjunction with the Google language model.
> 
> From looking at the documentation and code, I can see that it reads the
> format, but I don't see a strategy for keeping portions of the model in
> memory and others on disk, for example.  Obviously one would need to do
> something like this to hold the entire model.  However, I've also used
> and tweaked enough of the code to know you're a serious hacker, and that
> I might have missed something.
> 
> One thought I had was to point ngram-count to the Google LM, then use a
> word list to filter only the n-grams that I need SRILM to estimate
> probabilities for.  Beyond that, I'm stumped.
> 
> So, can you offer any feedback?  What are some strategies you recommend
> for using the Google LM?  

The Google LM (with nontrivial data size) is really meant to be used 
in conjunction with the -limit-vocab option, which restricts loading 
of parameters to a subset of the vocabulary (i.e., the subset used in your
test or tuning data).

An example of this appears in
$SRILM/test/tests/ngram-count-lm-limit-vocab/run-test.

BTW, there is no "Google LM" per se in SRILM.  You use the "CountLM" class,
and designate the counts to be read in Google format.
See the -count-lm option as described in ngram(1) man page.
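The idea behind -limit-vocab (and behind the word-list filtering suggested in the question) can be illustrated with a conceptual Python sketch; this is not SRILM's code. It keeps only n-grams whose words all occur in the test or tuning vocabulary, streaming the counts so the full collection never sits in memory. The "n-gram<TAB>count" line layout is an assumption about the Google count files, and the function names are hypothetical; adjust the always-kept sentence markers to match your data.

```python
def load_vocab(lines):
    """Read a one-word-per-line vocabulary into a set."""
    return {line.strip() for line in lines if line.strip()}

def filter_counts(count_lines, vocab, always_keep=("<s>", "</s>")):
    """Yield (words, count) for n-grams made only of vocabulary words.

    Assumption: each line looks like "w1 w2 ... wn<TAB>count".
    """
    keep = set(vocab) | set(always_keep)
    for line in count_lines:
        ngram, _, count = line.rstrip("\n").rpartition("\t")
        if not ngram:
            continue  # malformed line without a tab separator
        words = ngram.split(" ")
        if all(w in keep for w in words):
            yield words, int(count)

vocab = load_vocab(["the\n", "cat\n", "sat\n"])
for words, count in filter_counts(["the cat\t10\n", "the dog\t7\n", "<s> the\t3\n"], vocab):
    print(words, count)
```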

Hope this clarifies things.

Andreas 


From marco.turchi at gmail.com  Wed Feb  7 04:21:56 2007
From: marco.turchi at gmail.com (marco turchi)
Date: Wed, 7 Feb 2007 12:21:56 +0000
Subject: compilation problems
In-Reply-To: <45A29624.8000209@speech.sri.com>
References: <200701071951.LAA01733@tonga> <87wt3y6ugh.fsf@lld.sjsu.edu>
	 <45A29624.8000209@speech.sri.com>
Message-ID: <79a042480702070421t2c9ec21cu9617b1c34029eb32@mail.gmail.com>

Dear Andreas,
I wrote to this mailing list a month ago about some compilation
problems. You suggested that I install a newer gcc version. I did, and
I was able to compile srilm.
Now I have realized that some executable files are empty:
ngram
ngram-count
ngram-merge ...

The object files for them do exist.

I have recompiled srilm and found these errors:

/usr/bin/g++4 -mtune=pentium3 -Wreturn-type -Wimplicit
-DINSTANTIATE_TEMPLATES -D_FILE_OFFSET_BITS=64    -I. -I../../include
 -c -g -O3 -o ../obj/i686/tclmain.o tclmain.cc
tclmain.cc:8:17: error: tcl.h: No such file or directory
make[2]: *** [../obj/i686/tclmain.o] Error 1


../../lib/i686/libdstruct.a ../../lib/i686/libmisc.a -ltcl -lm 2>&1 | c++filt
g++4: ../../lib/i686/libmisc.a: No such file or directory
/usr/local/Moses/srilm/sbin/decipher-install 0555 ../bin/i686/ngram
../../bin/i686
ERROR:  File to be installed (../bin/i686/ngram) does not exist.
ERROR:  File to be installed (../bin/i686/ngram) is not a plain file.
Usage:  decipher-install <mode> <file1> ... <fileN> <directory>
        mode:                 file permission mode, in octal
        file1 ... fileN:      files to be installed
        directory:            where the files should be installed

files =  ../bin/i686/ngram
directory =  ../../bin/i686
mode =  0555

make[2]: [../../bin/i686/ngram] Error 1 (ignored)
touch ../../bin/i686/ngram
and so on for the other files...

Can you please help me?

Thanks
Marco


From patryale at iro.umontreal.ca  Wed Feb  7 05:32:18 2007
From: patryale at iro.umontreal.ca (Alexandre Patry)
Date: Wed, 07 Feb 2007 08:32:18 -0500
Subject: compilation problems
In-Reply-To: <79a042480702070421t2c9ec21cu9617b1c34029eb32@mail.gmail.com>
References: <200701071951.LAA01733@tonga> <87wt3y6ugh.fsf@lld.sjsu.edu>
 <45A29624.8000209@speech.sri.com>
 <79a042480702070421t2c9ec21cu9617b1c34029eb32@mail.gmail.com>
Message-ID: <1170855139.6266.4.camel@localhost.localdomain>

Hi,

the compiler cannot find the Tcl header file (tclmain.cc:8:17:
error: tcl.h: No such file or directory).

Did you set the TCL_INCLUDE and TCL_LIBRARY variables in the
common/Makefile.machine.ARCH file?

Mine look like this (in common/Makefile.machine.i686):

8<----------------------------------
# Tcl support (standard in Linux)
TCL_INCLUDE = -I/usr/include/tcl8.4
TCL_LIBRARY =  -L/usr/lib/tcl8.4 -ltcl
8<----------------------------------
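If you are not sure which directory to put in TCL_INCLUDE, a small Python sketch like the one below can look for tcl.h. The candidate directories are guesses at common install locations, not an authoritative list; adjust them for your system.

```python
import os

# Guessed install locations for tcl.h; extend for your distribution.
CANDIDATE_DIRS = [
    "/usr/include",
    "/usr/include/tcl8.4",
    "/usr/include/tcl",
    "/usr/local/include",
]

def find_header(header, candidates=CANDIDATE_DIRS):
    """Return the first directory in `candidates` containing `header`, else None."""
    for directory in candidates:
        if os.path.isfile(os.path.join(directory, header)):
            return directory
    return None

if __name__ == "__main__":
    found = find_header("tcl.h")
    if found:
        print("TCL_INCLUDE = -I" + found)
    else:
        print("tcl.h not found; install the Tcl development package or extend CANDIDATE_DIRS")
```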

Hope this helps,

Alexandre


From stolcke at speech.sri.com  Mon Feb 12 09:41:23 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 12 Feb 2007 09:41:23 -0800
Subject: Perplexity
In-Reply-To: <20070212100115.81215.qmail@web36804.mail.mud.yahoo.com>
References: <20070212100115.81215.qmail@web36804.mail.mud.yahoo.com>
Message-ID: <45D0A6C3.3020800@speech.sri.com>

Martha Yifiru wrote:
> Hi,
>
> I want to compare morph-based language model with
> word-based one. To do this I have to do some
> manipulation on the calculation of perplexity for
> morph-based language model so as to have fair
> comparison. I was thinking that the source code for
> perplexity calculation is in ngram.cc but it does not
> seem that the actual perplexity calculation is in
> ngram.cc.
>
> Can anyone help me?
>
>   
The source code for perplexity computation is in lm/src/TextStats.cc .
However, there is no need to modify the code.
When you have different token counts (words versus morphs) the
perplexities are no longer comparable, but the log probabilities are.
You can get the log probability from the perplexity output, e.g.:

file ../ngram-count-gt/eval97.text: 5290 sentences, 38238 words, 681 OOVs
0 zeroprobs, logprob= -86334.6 ppl= 103.502 ppl1= 198.958
                      ^^^^^^^^
Assume the "words" in this example are actually morphs, and the actual number
of words (including sentence boundaries) is smaller, say 25000.  Then the
word perplexity is

    10^(-(-86334.6 / 25000)) = 2840.43
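The arithmetic can be checked with a few lines of Python. The ppl/ppl1 formulas below are my understanding of SRILM's convention (OOVs and zero-probability words are excluded; ppl adds one end-of-sentence token per sentence, ppl1 does not); they reproduce the figures in the sample output above. The function names are hypothetical.

```python
def srilm_ppl(logprob, words, oovs, zeroprobs, sentences):
    """Recompute SRILM-style ppl and ppl1 from the summary line (log10 probabilities)."""
    scored = words - oovs - zeroprobs          # tokens that actually received a probability
    ppl = 10 ** (-logprob / (scored + sentences))  # includes one </s> per sentence
    ppl1 = 10 ** (-logprob / scored)               # excludes sentence ends
    return ppl, ppl1

def perplexity_per_unit(logprob, n_units):
    # The total log probability is comparable across tokenizations;
    # normalize it by the number of units (e.g. words) you want to compare on.
    return 10 ** (-logprob / n_units)

ppl, ppl1 = srilm_ppl(-86334.6, words=38238, oovs=681, zeroprobs=0, sentences=5290)
print(ppl, ppl1)
print(perplexity_per_unit(-86334.6, 25000))
```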

--Andreas


From Antoine.Ghaoui at jinny.ie  Thu Feb 15 00:09:39 2007
From: Antoine.Ghaoui at jinny.ie (Antoine Ghaoui)
Date: Thu, 15 Feb 2007 10:09:39 +0200
Subject: Language Model output problem using FLM
Message-ID: <EADC72A6-A7DD-401D-8740-0B53EB5DB6E2@jinny.ie>

Hello,

I'm trying to use fngram-count to generate a morphology-based language model.
To get familiar with the tool, I'm starting with a plain word trigram model.

The factor file is:

## word trigram
1
W : 2 W(-1) W(-2) ntextfile_99.flm.cnt ntextfile_99.flm.lm 3
W1W2    W2      kndiscount gtmin 1 interpolate
W1      W1      kndiscount gtmin 1 interpolate
0       0       kndiscount gtmin 1

The command line used is:
fngram-count -factor-file flm_spc.1 -text ntextfile_99.flm -lm ntextfile_99.flm.lm -vocab ntextfile.vocab.flm

The generated LM file looks a little strange. Part of it is shown below:
\data\
ngram 0x0=18119
ngram 0x1=2855740
ngram 0x2=0
ngram 0x3=6490198


\0x0-grams:
-2.313375       </s>
-99     <s>
.
.
\0x1-grams:
-0.9892201      <s> W-LTN       -1.629908
.
.
\0x2-grams:

\0x3-grams:
-0.9725394      <s> <s> W-LTN   -1.654503
.
.
\end\

Can you please help with this? Is it normal to have ngram 0x2=0? How
can I get the old format?

Thanks for your help

Antoine


From amittai at mit.edu  Thu Feb 15 07:49:46 2007
From: amittai at mit.edu (amittai e axelrod)
Date: Thu, 15 Feb 2007 15:49:46 +0000
Subject: Language Model output problem using FLM
In-Reply-To: <EADC72A6-A7DD-401D-8740-0B53EB5DB6E2@jinny.ie>
References: <EADC72A6-A7DD-401D-8740-0B53EB5DB6E2@jinny.ie>
Message-ID: <5734eadd0702150749r25ed8de5s6353bffc06845e2a@mail.gmail.com>

On 2/15/07, Antoine Ghaoui <Antoine.Ghaoui at jinny.ie> wrote:
> ## word trigram
> 1
> W : 2 W(-1) W(-2) ntextfile_99.flm.cnt ntextfile_99.flm.lm 3
> W1W2    W2      kndiscount gtmin 1 interpolate
> W1      W1      kndiscount gtmin 1 interpolate
> 0       0       kndiscount gtmin 1
> Can you please help on this? Is it normal to have ngram 0x2=0?

Yes (for a regular trigram LM in FLM format). The short answer is that
this indicates that you have no histories that consist simply of W2.

> How can I get the old format?

You can't. This is the standard FLM file format, but it's really
equivalent to the LM format; it's just labelled a bit differently.

Because an FLM allows you to select arbitrary combinations of factors
to use as the ngram history, the header of the FLM file lists a count
for every possible combination of factors that could serve as a
history. Since your FLM specification narrows down which factor
combinations are valid histories, some (or many) of the entries in the
FLM header will have a count of zero.

For example, the header of an FLM over a trigram with 3 factors per
word might look something like this:
<<<
\data\
ngram 0x0=61628
ngram 0x1=1267167
ngram 0x2=278079
ngram 0x4=1136820
ngram 0x8=2021099
ngram 0x10=0
ngram 0x3=1352676
ngram 0x5=1267167
ngram 0x6=1339994
ngram 0x9=0
ngram 0xA=2824147
ngram 0xC=4578754
ngram 0x11=0
ngram 0x12=0
ngram 0x14=0
ngram 0x18=0
ngram 0x7=1352676
ngram 0xB=0
ngram 0xD=0
ngram 0xE=4702913
ngram 0x13=0
ngram 0x15=4497090
ngram 0x16=4534847
ngram 0x19=0
ngram 0x1A=2824147
ngram 0x1C=4578754
ngram 0xF=0
ngram 0x17=4542579
ngram 0x1B=0
ngram 0x1D=0
ngram 0x1E=425916
ngram 0x1F=325041
>>>

...and this is also normal. Where a normal trigram LM would have
"1-grams", "2-grams", etc., an FLM just numbers all the nodes in the
possible backoff graph and uses each node's hex label in the header,
rather than writing out which factor combination it represents. If you
want to figure out which factor combination each hex label stands for,
I think the numbering scheme is explained in comments in the FLM code.
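A note on the ordering: in the 32-entry example header (labels 0x0 to 0x1F, i.e. five parent bits), the labels appear to be grouped by how many bits are set, that is, how many parents the history keeps, and then listed in increasing numeric order. The four labels in the trigram header from the question fit the same pattern. The Python sketch below reproduces that order; the rule is inferred from the headers in this thread, not taken from the FLM source, and the function name is hypothetical.

```python
def header_order(num_parents):
    """Hex node labels ordered as in the example headers in this thread:
    first by number of set bits (parents kept), then by numeric value."""
    nodes = range(2 ** num_parents)
    ordered = sorted(nodes, key=lambda mask: (bin(mask).count("1"), mask))
    return ["0x%X" % mask for mask in ordered]


if __name__ == "__main__":
    print(" ".join(header_order(2)))  # 0x0 0x1 0x2 0x3
    print(" ".join(header_order(5)))  # 32 labels, from 0x0 to 0x1F
```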

In the case of a trigram model, though, there's only one combination
of factors that's not used as a history and thus has zero entries
(namely that of W2 alone), and therefore that's the one labelled 0x2
:)
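A label can be read as a bitmask over the parents in the order they appear on the factor-file line. The Python sketch below assumes bit 0 stands for the first parent, W(-1), and bit 1 for the second, W(-2), which is consistent with 0x2 being the "W2 alone" history; the helper name is hypothetical.

```python
def parents_in_node(label, parents):
    """Return the parent factors kept in backoff node `label` (e.g. "0x3").
    Assumes bit i of the label corresponds to parents[i]."""
    mask = int(label, 16)  # int() accepts the "0x" prefix with base 16
    return [p for i, p in enumerate(parents) if (mask >> i) & 1]


if __name__ == "__main__":
    parents = ["W(-1)", "W(-2)"]  # from the line "W : 2 W(-1) W(-2) ..."
    for label in ("0x0", "0x1", "0x2", "0x3"):
        kept = parents_in_node(label, parents)
        print(label, " ".join(kept) if kept else "(no history)")
```

With the backoff path W1W2 -> W1 -> 0 given in the factor file, only nodes 0x3, 0x1 and 0x0 are ever used, which is why 0x2 has no entries.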

~amittai


From liangy at mail.rockefeller.edu  Tue Feb 27 14:35:34 2007
From: liangy at mail.rockefeller.edu (Yupu Liang)
Date: Tue, 27 Feb 2007 17:35:34 -0500
Subject: help on installing srilm on redhat
Message-ID: <AC9C76D1-6C66-4C8A-8E49-304D05CACBD6@rockefeller.edu>

Hi, I am new to the toolkit and want to install it on Red Hat.

The command I ran is "make MACHINE_TYPE=i686 World",

and I got the following error:

g++: ../../lib/i686/libmisc.a: No such file or directory

I tried reading the Makefile to find out where libmisc.a gets
generated, but had no luck.

Could somebody help out?

Thanks a lot,
Yupu


From hanisaf at gmail.com  Wed Mar  7 16:17:47 2007
From: hanisaf at gmail.com (Hani Safadi)
Date: Wed, 7 Mar 2007 19:17:47 -0500
Subject: cahce based models
Message-ID: <990817d50703071617i46a7ed3t92813a9287edc0ea@mail.gmail.com>

Hi,
I would like more information on the cache-based models
implemented in SRILM, and on how to use them.
The paper only mentions them briefly, and there is no information in the man pages.
Thanks
-- 
Looking forward to hearing from you.
Best wishes,
Hani Safadi


From j.ganitkevitch at googlemail.com  Wed Mar  7 17:14:21 2007
From: j.ganitkevitch at googlemail.com (Juri Ganitkevitch)
Date: Thu, 8 Mar 2007 02:14:21 +0100
Subject: cahce based models
In-Reply-To: <990817d50703071617i46a7ed3t92813a9287edc0ea@mail.gmail.com>
References: <990817d50703071617i46a7ed3t92813a9287edc0ea@mail.gmail.com>
Message-ID: <3BE78265-2376-4D96-8AB4-547D82E15E92@gmail.com>

Hi Hani,

If I'm interpreting your question correctly, the LM subclass CacheLM
provides a simple cache component implementation.

A word's probability is boosted if the very same word occurred within a
window of the last N words (more occurrences yield a higher probability).
You can get ngram to interpolate whatever model you're using with a cache
component by using -cache. The source code for this one is very
straightforward if you're interested in the details.
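As a rough illustration of the idea described above (not a transcription of CacheLM.cc), the Python sketch below keeps a window of the last N words, scores a word by its relative frequency in that window, and linearly interpolates that score with a base model probability. The class and function names and the fixed weight `lam` are hypothetical.

```python
from collections import Counter, deque


class UnigramCache:
    """Relative frequency of each word within the last `length` words."""

    def __init__(self, length):
        if length < 1:
            raise ValueError("cache length must be at least 1")
        self.window = deque(maxlen=length)
        self.counts = Counter()

    def prob(self, word):
        if not self.window:
            return 0.0
        return self.counts[word] / len(self.window)

    def update(self, word):
        if len(self.window) == self.window.maxlen:
            self.counts[self.window[0]] -= 1  # oldest word leaves the window
        self.window.append(word)              # deque drops the oldest entry itself
        self.counts[word] += 1


def interpolate(p_base, p_cache, lam):
    """Linear interpolation of a base LM probability with the cache score."""
    return lam * p_cache + (1.0 - lam) * p_base


if __name__ == "__main__":
    cache = UnigramCache(length=3)
    for w in ["a", "b", "a"]:
        cache.update(w)
    print(cache.prob("a"))  # "a" is 2 of the last 3 words
    print(interpolate(0.01, cache.prob("a"), lam=0.1))
```

More occurrences of a word in the window raise its cache score, which is the "boosting" described above; the base model keeps unseen words from getting zero probability.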

If you're looking for the original papers, Kuhn and De Mori published
on this in 1990 (to my knowledge, at least).

Hope this helps.

Cheers from Aachen,

Juri



From stolcke at speech.sri.com  Wed Mar  7 17:27:24 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Wed, 07 Mar 2007 17:27:24 PST
Subject: cahce based models 
In-Reply-To: Your message of Thu, 08 Mar 2007 02:14:21 +0100.
             <3BE78265-2376-4D96-8AB4-547D82E15E92@gmail.com> 
Message-ID: <200703080127.l281ROA13913@huge>


In message <3BE78265-2376-4D96-8AB4-547D82E15E92 at gmail.com>you wrote:
> Hi Hani,
> 
> If I'm interpreting your question correctly, the LM subclass CacheLM
> provides a simple cache component implementation.
> 
> A word's probability is boosted if the very same word occurred within a
> window of the last N words (more occurrences yield a higher probability).
> You can get ngram to interpolate whatever model you're using with a cache
> component by using -cache. The source code for this one is very
> straightforward if you're interested in the details.
> 
> If you're looking for the original papers, Kuhn and De Mori published
> on this in 1990 (to my knowledge, at least).
> 
> Hope this helps.
> 
> Cheers from Aachen,
> 
> Juri

Thanks for this dead-on response!

At the risk of stating the obvious, the code for CacheLM is in
$SRILM/lm/src/CacheLM.cc, and it is quite short and easy to follow.

Best,

Andreas 


From hanisaf at gmail.com  Wed Mar  7 20:28:32 2007
From: hanisaf at gmail.com (Hani Safadi)
Date: Wed, 7 Mar 2007 23:28:32 -0500
Subject: cahce based models
In-Reply-To: <200703080127.l281ROA13913@huge>
References: <3BE78265-2376-4D96-8AB4-547D82E15E92@gmail.com>
	 <200703080127.l281ROA13913@huge>
Message-ID: <990817d50703072028o5a68049j5352af582ce007e9@mail.gmail.com>

Hi there,
Thank you for your answers.
I would like to compare several language models, including the cache
model defined in the Kuhn and De Mori paper.
I was using the CMU SLM toolkit, and recently moved to SRILM because
of the richness of the implemented algorithms. The only obstacle I
have found is the sparse documentation of the project.

I infer from your answers that to use a cache model, I can either:
1- use the CacheLM subclass from a programming language, or
2- use the -cache option of the ngram command.
I still prefer to master the existing commands before using any API,
so suppose I want to use ngram -cache 10
and I would like to define word classes.
The PDF paper says that "Word classes may be defined manually". I
would like to know how to do that, and how to pass the classes file to
ngram.

Finally, a comment for the maintainers of this wonderful
project: why not provide a tutorial on using SRILM? This could help
many newcomers, given that the documentation is not complete.

Thanks
Looking forward to hearing from you
regards
Hani


-- 
Looking forward to hearing from you.
Best wishes,
Hani Safadi


From j.ganitkevitch at googlemail.com  Thu Mar  8 01:11:32 2007
From: j.ganitkevitch at googlemail.com (Juri Ganitkevitch)
Date: Thu, 8 Mar 2007 10:11:32 +0100
Subject: cahce based models
In-Reply-To: <990817d50703072028o5a68049j5352af582ce007e9@mail.gmail.com>
References: <3BE78265-2376-4D96-8AB4-547D82E15E92@gmail.com> <200703080127.l281ROA13913@huge> <990817d50703072028o5a68049j5352af582ce007e9@mail.gmail.com>
Message-ID: <4F7EF298-AAC4-44E7-AFB3-D84C7E81549F@gmail.com>

Hi Hani,

> I infer from your answers that to use a cache model, I can either:
> 1- use the CacheLM subclass from a programming language, or
> 2- use the -cache option of the ngram command.
Actually, -cache uses the implementation in the CacheLM class.
If you want to extend the functionality, I figure your best bet would be
to extend either the CacheLM or the LM class (I don't think any language
other than C/C++ would be a good choice here, as you'd get horrible
performance from invoking wrappers for every word).
You would then need to plug your class into ngram (and possibly into
ngram-count as well, if you have things to count or train). This is
actually quite simple; the easiest way to see the necessary steps is to
search for "cache" in ngram.cc. You'll find essentially two parts: one
where the command-line parameters are defined and mapped to variables,
and a second where the model is instantiated and mixed into the model
currently in use.

> I still prefer to master the existing commands before using any API,
> so suppose I want to use ngram -cache 10
To my knowledge, a value of 100 is a good starting point (this will
vary with texts and languages, of course).

> and I would like to define word classes.
> The PDF paper says that "Word classes may be defined manually". I
> would like to know how to do that, and how to pass the classes file to
> ngram.
Given the current code, I figure you'll need to implement your own
cache model, as this one does not incorporate any kind of word class
support. Either you map words to classes (and operate on those) in
your model, or you write an LM wrapper (a bit like the classes
provided for combining LMs) that feeds the cache model with classes
rather than words. Sadly, I don't know whether such an approach is
implemented in SRILM.

Documentation is a bit sparse, true. As long as you don't want to
write code inside SRILM, the man pages and -help options give you a
reasonable overview.

For coding, I have found it helpful to follow the course of main()
in either ngram or ngram-count to figure out how it works. The code is
clean, and the naming gives you good insight into what's going on.

Take care,
		Juri


From joel.pinto at idiap.ch  Thu Mar  8 06:33:52 2007
From: joel.pinto at idiap.ch (Joel Pinto)
Date: Thu, 08 Mar 2007 15:33:52 +0100
Subject: ngram manipulation
Message-ID: <45F01ED0.2030305@idiap.ch>

Hello SRILM users,

I have a question on using the SRILM toolkit for LM manipulation.

The language model in ARPA format gives conditional probabilities,
e.g. p(wd3|wd1,wd2).
Can I compute the joint probability p(wd1,wd2,wd3) using any utility?

I have a large LM (ngram 1=50002, ngram 2=29077135, ngram 3=40083381).


Any help would be greatly appreciated.
Thanks,
joel.


ARPA format:
p(wd3|wd1,wd2) = if (trigram wd1,wd2,wd3 exists) p_3(wd1,wd2,wd3)
                 else if (bigram wd1,wd2 exists) bo_wt_2(wd1,wd2)*p(wd3|wd2)
                 else                            p(wd3|wd2)

p(wd2|wd1) = if (bigram wd1,wd2 exists) p_2(wd1,wd2)
             else                       bo_wt_1(wd1)*p_1(wd2)
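The two backoff rules above can be written as one recursive lookup, and the joint probability then follows from the chain rule, log p(wd1,wd2,wd3) = log p(wd1) + log p(wd2|wd1) + log p(wd3|wd1,wd2). Below is a minimal Python sketch over a tiny, made-up model held in dictionaries. It works with log10 values, as an ARPA file stores them, so backoff weights are added rather than multiplied; all numbers and names are hypothetical.

```python
import math

ORDER = 3
# Hypothetical toy model: log10 probabilities and log10 backoff weights,
# keyed by word tuples as they would appear in an ARPA file.
LOGPROB = {
    ("a",): -1.0, ("b",): -1.2, ("c",): -0.8,
    ("a", "b"): -0.5,
    ("a", "b", "c"): -0.3,
}
BACKOFF = {("a",): -0.2, ("b",): -0.1, ("c",): -0.4, ("a", "b"): -0.25}


def cond_logprob(word, context):
    """log10 p(word | context) with ARPA-style backoff."""
    context = tuple(context)
    if context + (word,) in LOGPROB:
        return LOGPROB[context + (word,)]
    if not context:
        raise KeyError("%r is not in the vocabulary" % word)
    # Missing n-gram: add the context's backoff weight (log10 1 = 0 if the
    # context itself is absent) and retry with a shorter context.
    return BACKOFF.get(context, 0.0) + cond_logprob(word, context[1:])


def joint_logprob(words):
    """log10 p(w1, ..., wn) by the chain rule, with no <s> context."""
    total = 0.0
    for i, word in enumerate(words):
        total += cond_logprob(word, words[max(0, i - ORDER + 1):i])
    return total


if __name__ == "__main__":
    print(joint_logprob(["a", "b", "c"]))  # sum of -1.0, -0.5 and -0.3
    print(math.pow(10, joint_logprob(["b", "a", "c"])))  # uses backoff twice
```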


From stolcke at speech.sri.com  Thu Mar  8 08:21:42 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 08 Mar 2007 08:21:42 PST
Subject: ngram manipulation 
In-Reply-To: Your message of Thu, 08 Mar 2007 15:33:52 +0100.
             <45F01ED0.2030305@idiap.ch> 
Message-ID: <200703081621.l28GLh101953@huge>


There is a hack to do it.
Remove from your LM any ngrams involving the <s> or </s> token (without
changing the other probabilities and backoff weights).
Then feed your ngrams, one per line, to "ngram -debug 1 -ppl". The "sentence"
log probabilities will now correspond to joint ngram probabilities,
since the initial word will back off to a unigram probability, and
the final </s> will count as an OOV and not contribute to the total
log probability.
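The LM edit in the first step can be scripted. The Python sketch below is one way to do it, assuming a plain (uncompressed) ARPA file on stdin: it drops every n-gram entry containing <s> or </s>, copies all other entry lines unchanged (so the remaining probabilities and backoff weights are untouched), and rewrites the "ngram N=count" header lines to match. The script and file names are hypothetical.

```python
import re
import sys

SENTENCE_TOKENS = {"<s>", "</s>"}
SECTION = re.compile(r"\\(\d+)-grams:$")


def strip_sentence_tokens(lines):
    """Return ARPA lines without the n-grams that mention <s> or </s>."""
    kept = {}      # n-gram order -> entry lines that survive
    order = None   # order of the section being read; None outside sections
    for line in lines:
        line = line.rstrip("\r\n")
        match = SECTION.match(line.strip())
        if match:
            order = int(match.group(1))
            kept[order] = []
        elif line.strip() == "\\end\\":
            order = None
        elif order is not None and line.strip():
            words = line.split()[1:1 + order]  # skip the log10 probability
            if not SENTENCE_TOKENS.intersection(words):
                kept[order].append(line)
    out = ["\\data\\"]
    out += ["ngram %d=%d" % (n, len(kept[n])) for n in sorted(kept)]
    for n in sorted(kept):
        out += ["", "\\%d-grams:" % n] + kept[n]
    out += ["", "\\end\\"]
    return out


if __name__ == "__main__":
    for out_line in strip_sentence_tokens(sys.stdin):
        print(out_line)
```

For example, `python strip_sent.py < big.lm > nosent.lm`, then put each trigram on its own line in trigrams.txt and run `ngram -lm nosent.lm -debug 1 -ppl trigrams.txt`. Note that the script holds all surviving entries in memory, which for an LM with tens of millions of n-grams can take a lot of RAM.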

It would be easy to add an option somewhere to make this more convenient,
without the need to hack the LM itself.

--Andreas


From jkurlandski at hotmail.com  Sat Mar 10 11:36:44 2007
From: jkurlandski at hotmail.com (Kurlandski Jerry)
Date: Sat, 10 Mar 2007 14:36:44 -0500
Subject: problems running tests
Message-ID: <BAY128-F385C31DBD1584EBE697AF3BF7F0@phx.gbl>

Hello,

I'm a newcomer to SRI LM and am having problems running the tests. Between a 
third and half the tests do not match the reference output. One example is 
the first test, adapt-marginals. Here is the stderr output:

../ngram-count-gt/swbd.3bo.gz: line 8: ngram line has 1 fields (3 expected)
format error in lm file
../ngram-count-gt/eval97.text: line 5293: 5290 sentences, 38238 words, 0 OOVs
0 zeroprobs, logprob= 0 ppl= 1 ppl1= 1
using WittenBell for 1-grams
warning: distributing 0.0720362 left-over probability mass over all 3379 words
writing 3380 1-grams
../ngram-count-gt/swbd.3bo.gz: line 8: ngram line has 1 fields (3 expected)
format error in lm file


The vocab-aliases test has very similar error output:

reading 33110 1-grams
../ngram-count-gt/swbd.3bo.gz: line 8: ngram line has 1 fields (3 expected)
format error in lm file


And ngram-prune's output is:

swbd.3bo.gz: line 7: ngram line has 1 fields (3 expected)
format error in lm file
pruned.gz: No such file or directory


I am running SRI LM version 1.5.1 with the latest version of Cygwin on a 
Windows 2000 platform. Any help would be appreciated.

Thanks.


Further details:

I wondered whether the issue might have to do with gunzip, so I typed the
following at the command line and got this output:

$ gunzip -f swbd.3bo.gz
gunzip: swbd.3bo.gz: invalid compressed data--format violated


I tried unzipping with WinZip and got the following message:
Invalid compressed data--unable to inflate.

Still, WinZip did give me an apparently unzipped version of the file, so I
ran just the adapt-marginals test against the unzipped file. However, I got
the same output as described above.


From stolcke at speech.sri.com  Sat Mar 10 12:27:00 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Sat, 10 Mar 2007 12:27:00 PST
Subject: problems running tests 
In-Reply-To: Your message of Sat, 10 Mar 2007 14:36:44 -0500.
             <BAY128-F385C31DBD1584EBE697AF3BF7F0@phx.gbl> 
Message-ID: <200703102027.l2AKR0g16858@huge>


In message <BAY128-F385C31DBD1584EBE697AF3BF7F0 at phx.gbl>you wrote:
> Hello,
> 
> I'm a newcomer to SRI LM and am having problems running the tests. Between a 
> third and half the tests do not match the reference output. One example is 
> the first test, adapt-marginals. Here is the stderr output:
> 
> ../ngram-count-gt/swbd.3bo.gz: line 8: ngram line has 1 fields (3 expected)
> format error in lm file
> ../ngram-count-gt/eval97.text: line 5293: 5290 sentences, 38238 words, 0 OOVs
> 0 zeroprobs, logprob= 0 ppl= 1 ppl1= 1
> using WittenBell for 1-grams
> warning: distributing 0.0720362 left-over probability mass over all 3379 words
> writing 3380 1-grams
> ../ngram-count-gt/swbd.3bo.gz: line 8: ngram line has 1 fields (3 expected)
> format error in lm file
> 
> 
> The vocab-aliases test has very similar error output:
> 
> reading 33110 1-grams
> ../ngram-count-gt/swbd.3bo.gz: line 8: ngram line has 1 fields (3 expected)
> format error in lm file

This indicates either that

1) there is some problem with your Cygwin installation, or
2) the files were somehow corrupted in unpacking.

If you have access to a Unix or Linux system, you could unpack the tar.gz
file there and make sure the swbd.3bo.gz file can be uncompressed.
I suspect it has something to do with the way Windows distinguishes
"text" from "binary" files.

Andreas 

PS. If you built SRILM for the "win32" platform, compressed files won't
be supported, and you should run the go.unzip script in the test directory
before attempting to run the tests. However, this assumes you have a working
gunzip in your Cygwin installation.
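To check whether swbd.3bo.gz and the other compressed test files survived unpacking, one can simply try to decompress them, much as `gunzip -t` does. The Python sketch below reads each file in binary mode, so no newline translation can interfere with the check itself; the script name is hypothetical.

```python
import gzip
import sys
import zlib


def check_gzip(blob):
    """Return (True, uncompressed size) if `blob` is an intact gzip stream,
    otherwise (False, error message)."""
    try:
        data = gzip.decompress(blob)
    except (OSError, EOFError, zlib.error) as err:
        return False, str(err)
    return True, len(data)


if __name__ == "__main__":
    for path in sys.argv[1:]:          # e.g. ../ngram-count-gt/swbd.3bo.gz
        with open(path, "rb") as f:    # binary mode: no newline translation
            ok, info = check_gzip(f.read())
        print(path, "OK" if ok else "CORRUPT", info)
```

For example, `python checkgz.py ../ngram-count-gt/swbd.3bo.gz`. A file reported as CORRUPT here would be consistent with the unpacking problem suspected above.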


From bplank at science.uva.nl  Mon Mar 12 10:05:49 2007
From: bplank at science.uva.nl (B. Plank)
Date: Mon, 12 Mar 2007 18:05:49 +0100 (CET)
Subject: tolower option
Message-ID: <4224.146.50.144.82.1173719149.squirrel@webmail.science.uva.nl>

Dear SRILM mailing list,

I am wondering: when I try to train a language model with ngram-count and
the -tolower option, I get the following error:

assertion "i < maxWordLength" failed: file "Vocab.cc", line 97

The input corpus (-text) is a UTF-8 file. Might this cause the problem?

I am grateful for any suggestion.

Barbara


From Antoine.Ghaoui at jinny.ie  Tue Mar 13 08:52:49 2007
From: Antoine.Ghaoui at jinny.ie (Antoine Ghaoui)
Date: Tue, 13 Mar 2007 17:52:49 +0200
Subject: Error in discount estimator
Message-ID: <BE1B4EB9-B210-4EF0-8319-A57383F7D008@jinny.ie>

Hello,

When using fngram-count to generate a language model, I'm getting the
following error:

warning: one of required modified KneserNey count-of-count is zero
error in discount estimator


Can someone help?

The factor file is:


## root trigram
1
R : 2 R(-1) R(-2) ntextfile_99.flm.cnt ntextfile_99.flm.lm 3
R1R2    R2      kndiscount gtmin 1 interpolate
R1      R1      kndiscount gtmin 1 interpolate
0       0       kndiscount gtmin 1

Thanks

Antoine


From stolcke at speech.sri.com  Wed Mar 14 10:32:08 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Wed, 14 Mar 2007 10:32:08 -0700
Subject: tolower option
In-Reply-To: <4224.146.50.144.82.1173719149.squirrel@webmail.science.uva.nl>
References: <4224.146.50.144.82.1173719149.squirrel@webmail.science.uva.nl>
Message-ID: <45F83198.9090704@speech.sri.com>

B. Plank wrote:
> Dear SRILM mailing list,
>
> I am wondering: when I try to train a language model with ngram-count and
> the -tolower option, I get the following error:
>
> assertion "i < maxWordLength" failed: file "Vocab.cc", line 97
>
> The input corpus (-text) is a UTF-8 file. Might this cause the problem?
>
> I am grateful for any suggestion.
>
>   
-tolower is simply implemented with the C library tolower() function,
which is controlled by the OS's locale settings.
I am not sure whether tolower() works correctly for UTF-8, and if it does,
you probably have to set LC_CTYPE to something appropriate. In other
words, this is all beyond the scope of what the SRILM code itself handles.

I would write a little test program that calls tolower() on some test
data to make sure it does what you want.
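Short of writing that C test program, the hazard can be illustrated in Python. The sketch below assumes a single-byte ISO-8859-1 (Latin-1) locale and mimics what a byte-at-a-time tolower() would do there; it does not call the C library, and whether your system behaves this way depends on LC_CTYPE. Under that assumption, lowercasing the bytes of a UTF-8 word can yield invalid UTF-8, while decoding first and lowercasing characters does not.

```python
def latin1_tolower(byte):
    """tolower() for one byte under an assumed ISO-8859-1 locale."""
    if 0x41 <= byte <= 0x5A or (0xC0 <= byte <= 0xDE and byte != 0xD7):
        return byte + 0x20
    return byte


def bytewise_tolower(data):
    """Apply the single-byte mapping to every byte, as a byte-level loop would."""
    return bytes(latin1_tolower(b) for b in data)


def is_valid_utf8(data):
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    word = "ÉCOLE".encode("utf-8")          # b'\xc3\x89COLE'
    lowered = bytewise_tolower(word)         # lead byte 0xC3 becomes 0xE3
    print(lowered, is_valid_utf8(lowered))   # b'\xe3\x89cole' False
    print(word.decode("utf-8").lower())      # école
```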

Andreas


From stolcke at speech.sri.com  Thu Mar 15 11:26:10 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 15 Mar 2007 11:26:10 -0700
Subject: Error in discount estimator
In-Reply-To: <BE1B4EB9-B210-4EF0-8319-A57383F7D008@jinny.ie>
References: <BE1B4EB9-B210-4EF0-8319-A57383F7D008@jinny.ie>
Message-ID: <45F98FC2.8090207@speech.sri.com>

Antoine Ghaoui wrote:
> Hello,
>
> When using fngram-count to generate a language model, I'm getting the
> following error:
>
> warning: one of required modified KneserNey count-of-count is zero
> error in discount estimator
>
>
> Can someone help?
>
> The factor file is:

This is a problem with the frequency distribution of factors in your data.
You probably have no singleton ngrams for some factor-ngram. Try using a
discounting method like wbdiscount instead of kndiscount.
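The warning refers to the count-of-counts n1 to n4, the number of distinct n-grams seen exactly 1, 2, 3 and 4 times. Assuming kndiscount uses the standard modified Kneser-Ney discounts of Chen and Goodman (the sketch is written from those formulas, not from the SRILM source), the discounts divide by n1, n2 and n3, so a zero count-of-count leaves the estimate undefined or degenerate. Witten-Bell discounting does not use count-of-counts, which is presumably why wbdiscount avoids the error. A Python sketch with hypothetical function names:

```python
from collections import Counter


def count_of_counts(ngram_counts, max_r=4):
    """[n1, ..., n_max_r]: how many distinct n-grams occur exactly r times."""
    freq = Counter(c for c in ngram_counts.values() if 1 <= c <= max_r)
    return [freq[r] for r in range(1, max_r + 1)]


def modified_kn_discounts(n1, n2, n3, n4):
    """Discounts D1, D2, D3+ (Chen & Goodman); requires every n_r > 0."""
    if min(n1, n2, n3, n4) == 0:
        raise ValueError("a required count-of-count is zero")
    y = n1 / (n1 + 2 * n2)
    return (1 - 2 * y * n2 / n1,
            2 - 3 * y * n3 / n2,
            3 - 4 * y * n4 / n3)


if __name__ == "__main__":
    counts = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 2, ("d", "e"): 5}
    coc = count_of_counts(counts)
    print(coc)  # [2, 1, 0, 0]: no n-gram seen exactly 3 or 4 times
    try:
        modified_kn_discounts(*coc)
    except ValueError as err:
        print("error:", err)
```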

Andreas

>
>
> ## root trigram
> 1
> R : 2 R(-1) R(-2) ntextfile_99.flm.cnt ntextfile_99.flm.lm 3
> R1R2    R2      kndiscount gtmin 1 interpolate
> R1      R1      kndiscount gtmin 1 interpolate
> 0       0       kndiscount gtmin 1
>
> Thanks
>
> Antoine
>


From stolcke at speech.sri.com  Tue Mar 20 21:27:00 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 20 Mar 2007 20:27:00 -0800
Subject: SRILM beginning and end tokens? 
In-Reply-To: Your message of Tue, 20 Mar 2007 19:33:26 -0400.
             <20070320233327.E8AD478B51@epoch.cs> 
Message-ID: <200703210427.l2L4R0r24813@speech.sri.com>


In message <20070320233327.E8AD478B51 at epoch.cs>you wrote:
> Dear Andreas,
> 
> I am very grateful to benefit from your work by using this toolkit.  It's
> great!  
> 
> I noticed it adds <s> and </s> tokens if they aren't there.  However, I'm
> modelling with trigrams, and it seems to add only one begin/end pair per
> sentence.  Is there an option I missed, or do I need to insert them myself?

For </s>, there is never a reason to add more than one such token, since
the last ngram probability that goes into the sentence probability is

	p( </s> | ... ) 

For <s>, you also need no more than one token, since the backoff will
establish that 

	p( w1 | ... <s> ) = p(w1 | <s>)

I know that some other implementations add additional higher-order ngrams 
by filling in multiple copies of <s>, but I believe that is not well motivated.
It could also lead to unnatural count-of-count statistics for KN and GT
smoothing.
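Listing the n-gram events that one sentence contributes makes this concrete. In the Python sketch below, pad=1 corresponds to the single <s> described above and pad=2 imitates the multiple-<s> padding used by some other implementations; the function is a hypothetical illustration, not SRILM code.

```python
def ngram_events(words, order, pad=1):
    """N-gram events (context words + predicted word) for one sentence,
    with `pad` copies of <s> in front and a single </s> at the end."""
    tokens = ["<s>"] * pad + list(words) + ["</s>"]
    events = []
    for i in range(pad, len(tokens)):  # <s> itself is never predicted
        events.append(tuple(tokens[max(0, i - order + 1):i + 1]))
    return events


if __name__ == "__main__":
    for pad in (1, 2):
        print(pad, ngram_events(["a", "b"], order=3, pad=pad))
```

With one <s>, the first word is predicted only by the bigram (<s>, a), i.e. p(w1 | <s>). With two, the extra trigram type (<s>, <s>, a) appears; it never occurs inside a sentence, and such types are what could distort the count-of-count statistics.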

Andreas 

> 
> Thank you!
> -Amber
> 
> 
> \   L. Amber Wilcox-O'Hearn * http://www.cs.toronto.edu/~amber/   /
> -\  Graduate student * Computational Linguistics Research Group  /-
> --\   Department of Computer Science * University of Toronto    /--


From stolcke at speech.sri.com  Mon Mar 26 10:38:12 2007
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 26 Mar 2007 10:38:12 -0700
Subject: Question about using SRI with Large Data
In-Reply-To: <845615.46861.qm@web38109.mail.mud.yahoo.com>
References: <845615.46861.qm@web38109.mail.mud.yahoo.com>
Message-ID: <46080504.50500@speech.sri.com>

Ibrahim Zaghloul wrote:
> Dear Eng. Andreas
>
> I am trying to use SRI LM with a counts file that is 5 GB, but every
> approach I tried failed. I got these counts by using the vocab option to
> limit the counts. I generated 8 sorted count files, since my data was
> divided into 8 parts, and then used ngram-merge to merge them. The result
> was the 5 GB file above.
> I tried the ordinary command:
> ngram-count -read ngram-file -lm output-lm-file
> but the result was a long error ending with Assertion 'body !=0' failed
> I also tried this command:
> make-big-lm -read ngrams-file -lm lm-file
> but got the same error.
> I also tried using the -gtNmin option, but again received the same
> error.
Please check $SRILM/doc/FAQ for a list of measures to try. If none of
them work, then you simply have too much data and too little memory, and
need to get a larger machine. Note that you should ALWAYS succeed by
raising the minimum counts sufficiently. The exact values will depend on
your data and the amount of memory you have.
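To choose minimum counts, it can help to measure how many n-grams of each order would survive a given set of thresholds before retrying ngram-count. The Python sketch below streams a counts file in the usual "w1 ... wn<TAB>count" layout, assumes integer counts, and treats mincounts[n] like the -gtNmin options (e.g. -gt3min): n-grams with a count below the minimum are dropped. It only counts surviving entries, so treat the result as a rough guide to memory use; the script and file names are hypothetical.

```python
import sys
from collections import Counter


def surviving_ngrams(lines, mincounts):
    """Number of n-grams per order with count >= mincounts.get(order, 1)."""
    kept = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        order, count = len(fields) - 1, int(fields[-1])
        if count >= mincounts.get(order, 1):
            kept[order] += 1
    return dict(kept)


if __name__ == "__main__":
    # e.g. python survivors.py merged.counts 2 3  (bigram min 2, trigram min 3)
    path, min2, min3 = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
    with open(path, encoding="utf-8", errors="replace") as f:
        print(surviving_ngrams(f, {2: min2, 3: min3}))
```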
>
> When I tried to use make-google-ngrams, the result was the error:
> "/sri/bin/make-google-ngrams gzip=0 cna.ngrams
> sort: invalid option -- 2"
>
make-google-ngrams is not the right tool for this problem.

Andreas