SRILM to Sphinx lm.DMP

Yannick Estève yannick.esteve at lium.univ-lemans.fr
Wed Apr 16 06:28:59 PDT 2008


In fact, I believe you also need the "add-dummy-bows" script, which is
part of SRILM.
lm3g2dmp expects a back-off weight for every lower-order n-gram. This
script adds a weight of 0 wherever a back-off weight is missing.
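
To make the effect concrete, here is a minimal sketch of the idea in awk. It is not the SRILM add-dummy-bows script itself, only an illustration under two assumptions of mine: the input is a standard ARPA file whose header lists "ngram N=count" lines, and an n-gram line without a back-off weight has exactly N+1 whitespace-separated fields. The function name add_dummy_bows_sketch is hypothetical.

```shell
# Illustrative stand-in for the idea behind add-dummy-bows (NOT the SRILM script).
# Reads an ARPA LM on stdin and writes it to stdout, appending a back-off
# weight of 0 to every lower-order n-gram line that has none.
add_dummy_bows_sketch() {
  awk '
    # Header lines like "ngram N=count": remember the highest order.
    /^ngram [0-9]+=/ {
      split($2, kv, "=")
      if (kv[1] + 0 > maxorder) maxorder = kv[1] + 0
      print; next
    }
    # Section markers like "\N-grams:": remember the current order.
    /^\\[0-9]+-grams:/ {
      order = substr($1, 2) + 0
      print; next
    }
    # Other backslash lines (\data\, \end\) and blank lines pass through.
    /^\\/ || NF == 0 { print; next }
    # N-gram line = logprob + N words (+ optional back-off weight).
    order > 0 && order < maxorder && NF == order + 1 { print $0 "\t0"; next }
    { print }
  '
}
# Possible use (file names are placeholders):
#   gunzip -c lm.sorted.arpa.gz | add_dummy_bows_sketch | gzip -c > lm.sphinx.arpa.gz
```

In practice the SRILM script should be preferred; the sketch only shows which lines change.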

So you have to do something like this:

#srilm tools
gunzip -c lm.arpa.gz | gawk -f sort-lm | gzip -c > lm.sorted.arpa.gz
gunzip -c lm.sorted.arpa.gz | gawk -f add-dummy-bows | gzip -c > lm.sphinx.arpa.gz

and then:
#cmu sphinx tools
lm3g2dmp lm.sphinx.arpa.gz .
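
Before running lm3g2dmp it may be worth checking that no lower-order n-gram is still missing its back-off weight. The following is a rough sketch, not part of SRILM or Sphinx; the helper name count_missing_bows is hypothetical, and it assumes the same ARPA layout as above (N+1 fields on an n-gram line means no back-off weight).

```shell
# Hypothetical helper: count lower-order n-gram lines without a back-off weight.
# Reads an ARPA LM on stdin and prints the count (0 means nothing is missing).
count_missing_bows() {
  awk '
    # Track the highest order announced in the header.
    /^ngram [0-9]+=/ {
      split($2, kv, "=")
      if (kv[1] + 0 > maxorder) maxorder = kv[1] + 0
      next
    }
    # Track which n-gram section we are in.
    /^\\[0-9]+-grams:/ { order = substr($1, 2) + 0; next }
    # Skip other markers and blank lines.
    /^\\/ || NF == 0 { next }
    # Lower-order line with only logprob + words: weight is missing.
    order > 0 && order < maxorder && NF == order + 1 { missing++ }
    END { print missing + 0 }
  '
}
# Possible use (file name is a placeholder):
#   gunzip -c lm.sphinx.arpa.gz | count_missing_bows
```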



Note that only 3-gram LMs work with Sphinx. LIUM distributes an
open-source tool that rescores Sphinx 3 word lattices with a 4-gram LM.
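
Since the order matters, a quick check of the ARPA header can tell you whether an LM is beyond what lm3g2dmp will take. This is only a sketch: the helper name arpa_max_order is hypothetical, and it assumes the header uses the usual "ngram N=count" lines and that the limit means the order must not exceed 3.

```shell
# Hypothetical helper: print the highest n-gram order listed in an ARPA header.
# Reads the LM on stdin; stops reading once the 1-gram section starts.
arpa_max_order() {
  awk '
    /^ngram [0-9]+=/ {
      split($2, kv, "=")
      if (kv[1] + 0 > max) max = kv[1] + 0
    }
    # The header is over once the first section begins; END still runs.
    /^\\1-grams:/ { exit }
    END { print max + 0 }
  '
}
# Possible use (file name is a placeholder):
#   [ "$(gunzip -c lm.arpa.gz | arpa_max_order)" -le 3 ] || echo "LM order is above 3" >&2
```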

This is available here:
http://www-lium.univ-lemans.fr/tools/index.php?option=com_content&task=blogcategory&id=21&Itemid=47



Best regards,
-Yannick

On 16 Apr 2008, at 12:40, Sopheap SENG wrote:

> Hello,
>
> On the Sphinx website (http://cmusphinx.sourceforge.net/html/cmusphinx.php)
> there is a tool called lm3g2dmp that converts a 3-gram LM to the
> binary DMP format used by the Sphinx 3 decoder.
>
> ngram-count doesn't output n-grams in the right order for Sphinx's
> lm3g2dmp utility. You will need to re-sort the LM somehow.
>
> sort-lm could do that, but I used a script written by
> fuegen at ira.uka.de to convert the LM before passing it to lm3g2dmp.
>
> If you cannot find this script on the net, please e-mail me.
>
> Best,
>
> Sopheap
>
>
>
> On Tue, Apr 15, 2008 at 9:24 PM, Andreas Stolcke <stolcke at speech.sri.com> wrote:
> Christian Schrumpf wrote:
> > Dear Mr. Stolcke,
> > how can I convert an n-gram LM produced with the ngram-count program
> > of SRILM to an LM I can use in Sphinx 3?
> > Thank you in advance.
>
> I understand Sphinx LMs require the N-grams to be sorted in a  
> certain way.
> The "sort-lm" command described in the lm-scripts man page was made  
> for this reason.
>
> If you google "srilm sphinx" you will find several mentions of  
> apparently successful use of SRILM in combination with Sphinx.
>
> Andreas
>
>
>
>
>
> -- 
> ---------------------------------------------
> Sopheap SENG
>
> Laboratoire d'Informatique de Grenoble (LIG)
> Equipe GETALP Bureau C118
> 220, avenue de la Chimie
> Campus Scientifique, BP53
> 38041 GRENOBLE Cedex 9, FRANCE
> Tel: (33)-4-76-63-55-81
> Fax: (33)-4-76-63-55-52
> Email: sopheap.seng at imag.f
> URL : http://www-geod.imag.fr
> ---------------------------------------------
> Lecturer
> Institut de Technologie du Cambodge
> BP 86, Bd de Pochentong
> Phnom Penh - Cambodge
> Tel: (855)-23-88-03-70/98-24-45
> Fax: (855)-23-88-03-69
> Email: sopheap.seng at itc.edu.kh
> URL : http://www.itc.edu.kh
> ---------------------------------------------
