disambig with "open vocabulary" LM

Andreas Stolcke stolcke at speech.sri.com
Tue Jan 28 09:34:06 PST 2003


In message <3E36AB98.3070405 at ira.uka.de> you wrote:
> Hi,
> I would like to use the disambig program with an open-vocabulary LM 
> (built with ngram-count and -unk option).
> I get the following error message: "warning: non-zero probability for 
> <unk> in closed-vocabulary LM" (the LM read by disambig is not 
> recognized as an open-vocabulary LM).
> What is the matter? Are we supposed to use only closed-vocabulary LMs 
> with disambig?
> Can anyone help?
> Thanks,
> 
> Amélie
> 
> PS: is there anywhere I can find an archive of the mailing-list?
> 

Amélie,

This is an omission in disambig: it never tells the vocabulary object that
<unk> is to be treated as a regular word, so the LM is read as if it were
closed-vocabulary.  Please try the following patch:

===================================================================
RCS file: RCS/disambig.cc,v
retrieving revision 1.34
diff -c -r1.34 disambig.cc
*** /tmp/T00M2saV	Tue Jan 28 09:30:49 2003
--- disambig.cc	Tue Jan 28 09:23:02 2003
***************
*** 709,714 ****
--- 709,715 ----
  
      vocab.toLower = tolower1? true : false;
      hiddenVocab.toLower = tolower2 ? true : false;
+     hiddenVocab.unkIsWord = keepUnk ? true : false;
  
      if (mapFile) {
  	File file(mapFile, "r");

===================================================================

A similar patch belongs in hidden-ngram.cc:

===================================================================
RCS file: RCS/hidden-ngram.cc,v
retrieving revision 1.37
diff -c -r1.37 hidden-ngram.cc
*** /tmp/T0aSC8P_	Tue Jan 28 09:32:03 2003
--- hidden-ngram.cc	Tue Jan 28 09:24:59 2003
***************
*** 1007,1012 ****
--- 1007,1013 ----
       */
      Vocab vocab;
      vocab.toLower = toLower? true : false;
+     vocab.unkIsWord = keepUnk ? true : false;
  
      SubVocab hiddenVocab(vocab);
      SubVocab *classVocab = 0;

===================================================================
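
To illustrate why a single flag makes the warning go away, here is a toy
sketch of the check involved.  It is not the SRILM code: ToyVocab, ToyLM and
warnsAboutUnk are hypothetical names made up for this example, and only the
unkIsWord flag and the <unk> token are taken from the patches above.  The
assumption is that the LM reader complains whenever <unk> has probability
mass but the vocabulary has not been told that <unk> is a word.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a vocabulary object (not SRILM's Vocab class).
struct ToyVocab {
    bool unkIsWord = false;   // the flag the patches above set
};

// Hypothetical toy LM: unigram probabilities keyed by word.
using ToyLM = std::unordered_map<std::string, double>;

// Returns true when the toy reader would warn about
// "non-zero probability for <unk> in closed-vocabulary LM":
// <unk> has probability mass, yet the vocabulary treats it as a non-word.
bool warnsAboutUnk(const ToyVocab &vocab, const ToyLM &lm)
{
    auto it = lm.find("<unk>");
    bool unkHasMass = (it != lm.end()) && (it->second > 0.0);
    return unkHasMass && !vocab.unkIsWord;
}
```

With an LM that assigns some probability to <unk>, warnsAboutUnk() is true
for a default ToyVocab and false once unkIsWord is set.  Note that both
patches set the flag only when keepUnk is true.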

As to the mailing list archives:  send a message to majordomo at speech.sri.com
with "help" in the body.  You will receive instructions on how to retrieve
the archives of this mailing list. (Unfortunately there is no web interface.)

--Andreas



