Re: disambig with "open vocabulary" LM

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Tue, 28 Jan 2003 09:34:06 PST

In message <3E36AB98.3070405 at ADDRESS HIDDEN> you wrote:
> Hi,
> I would like to use the disambig program with an open-vocabulary LM
> (built with ngram-count and -unk option).
> I get the following error message: "warning: non-zero probability for
> <unk> in closed-vocabulary LM" (the LM read by disambig is not
> recognized as an open-vocabulary LM).
> What is the matter? Are we supposed to use only closed-vocabulary LM
> with disambig?
> Can anyone help?
> Thanks,
>
> Amélie
>
> PS: is there anywhere I can find an archive of the mailing-list?
>

Amélie,

This is an omission in disambig: it never tells the vocabulary object that
<unk> is to be treated as a regular word, so the LM is not recognized as
open-vocabulary.  Please try the following patch:

===================================================================
RCS file: RCS/disambig.cc,v
retrieving revision 1.34
diff -c -r1.34 disambig.cc
*** /tmp/T00M2saV Tue Jan 28 09:30:49 2003
--- disambig.cc Tue Jan 28 09:23:02 2003
***************
*** 709,714 ****
--- 709,715 ----
  
      vocab.toLower = tolower1? true : false;
      hiddenVocab.toLower = tolower2 ? true : false;
+     hiddenVocab.unkIsWord = keepUnk ? true : false;
  
      if (mapFile) {
   File file(mapFile, "r");

===================================================================

A similar patch belongs in hidden-ngram.cc:

===================================================================
RCS file: RCS/hidden-ngram.cc,v
retrieving revision 1.37
diff -c -r1.37 hidden-ngram.cc
*** /tmp/T0aSC8P_ Tue Jan 28 09:32:03 2003
--- hidden-ngram.cc Tue Jan 28 09:24:59 2003
***************
*** 1007,1012 ****
--- 1007,1013 ----
       */
      Vocab vocab;
      vocab.toLower = toLower? true : false;
+     vocab.unkIsWord = keepUnk ? true : false;
  
      SubVocab hiddenVocab(vocab);
      SubVocab *classVocab = 0;

===================================================================
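
To illustrate what the flag in the two patches changes, here is a minimal
toy model of the check behind the warning quoted above.  It is only a sketch
under stated assumptions: ToyVocab, warnsAboutUnk and demoUnkFlag are
hypothetical names invented for this example and are not SRILM code; only
the member name unkIsWord and the warning condition are taken from this
thread.

```cpp
#include <cassert>
#include <limits>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the vocabulary object; NOT SRILM's Vocab class.
struct ToyVocab {
    bool unkIsWord = false;  // same name as the member set by the patches
};

// Assumed model of the check: return true when the "non-zero probability
// for <unk> in closed-vocabulary LM" warning would apply.  The map holds
// log10 unigram probabilities; negative infinity stands for probability 0.
bool warnsAboutUnk(const ToyVocab& vocab,
                   const std::unordered_map<std::string, double>& unigramLogProbs)
{
    const auto it = unigramLogProbs.find("<unk>");
    if (it == unigramLogProbs.end()) {
        return false;  // the LM has no <unk> entry at all
    }
    const bool nonZero = it->second > -std::numeric_limits<double>::infinity();
    return nonZero && !vocab.unkIsWord;
}

// Usage: the same toy LM triggers the warning before the flag is set
// and no longer triggers it afterwards.
void demoUnkFlag()
{
    const std::unordered_map<std::string, double> lm = {
        {"<unk>", -2.5}, {"the", -1.0}};
    ToyVocab vocab;
    assert(warnsAboutUnk(vocab, lm));   // closed vocabulary: warning applies
    vocab.unkIsWord = true;             // what the patches do when keepUnk is set
    assert(!warnsAboutUnk(vocab, lm));  // <unk> is a regular word: no warning
}
```

In this toy model the LM contents are identical in both cases; only the
vocabulary flag decides whether <unk> counts as a regular word.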

As to the mailing list archives:  send a message to majordomo at ADDRESS HIDDEN
with "help" in the body.  You will receive instructions on how to retrieve
the archives of this mailing list. (Unfortunately there is no web interface.)

--Andreas
