In message <Pine.LNX.4.44.0303111624410.915-100000 at ADDRESS HIDDEN-muenchen.de> you wrote:
> Hi Andreas,
>
> I installed the new version 1.3.3 of the SRI LM toolkit on a Linux
> machine (Linux 2.4.19, GNU libc 2.2.5, gcc version 2.95.3). I have
> problems reading data from STDIN in ngram:
>
> With version 1.3.2 and older, this worked:
> cat 300classes | ngram -order 2 -ppl DEVTEST.sri -unk -lm 300Klassen.LM -classes -
> file DEVTEST.sri: 515 sentences, 13964 words, 0 OOVs
> 0 zeroprobs, logprob= -30572.9 ppl= 129.28 ppl1= 154.67
>
>
> This produces a warning with Version 1.3.3:
> cat 300classes | ngram -order 2 -ppl DEVTEST.sri -unk -lm 300Klassen.LM -classes -
> warning: '-' used multiple times for input
> file DEVTEST.sri: 515 sentences, 13964 words, 0 OOVs
> 0 zeroprobs, logprob= -78894.3 ppl= 281112 ppl1= 446516
>
> But this works perfectly well with Version 1.3.3:
> ngram -order 2 -ppl DEVTEST.sri -unk -lm 300Klassen.LM -classes 300classes
> file DEVTEST.sri: 515 sentences, 13964 words, 0 OOVs
> 0 zeroprobs, logprob= -30572.9 ppl= 129.28 ppl1= 154.67
>
> Is this problem due to my configuration?
>
> Regards,
> Karl
>
Karl,

what you see is an unfortunate byproduct of the new -limit-vocab
facility. It requires the class definition file to be read multiple
times to work correctly (at least in the current implementation).
When that file is '-', the first read consumes stdin, so the later read
finds no class definitions, which is why your perplexity changed.
However, the simple patch included below avoids the problem when the
-limit-vocab option is not being used (as in your case).
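A more general way to let several readers share '-' is to read stdin only once and replay the cached text to every consumer. The following is a minimal C++ sketch under that assumption; `InputOpener`, `readAll` and `exampleReplay` are hypothetical names invented for illustration, not part of SRILM or its File class, and the example simulates piped stdin with a string stream.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <iterator>
#include <memory>
#include <sstream>
#include <string>

// Read everything remaining in a stream into a string.
std::string readAll(std::istream &in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper, NOT part of SRILM: opens an input named on the
// command line, where "-" means standard input. A pipe can be consumed
// only once, so the first open of "-" copies stdin into memory, and every
// open of "-" (including the first) returns a fresh stream over that copy.
class InputOpener {
public:
    explicit InputOpener(std::istream &stdinStream = std::cin)
        : stdin_(stdinStream) {}

    std::unique_ptr<std::istream> open(const std::string &name) {
        if (name != "-") {
            // Regular files can simply be reopened as often as needed.
            return std::make_unique<std::ifstream>(name);
        }
        if (!cached_) {
            stdinText_ = readAll(stdin_);  // consume stdin exactly once
            cached_ = true;
        }
        return std::make_unique<std::istringstream>(stdinText_);
    }

private:
    std::istream &stdin_;
    std::string stdinText_;
    bool cached_ = false;
};

// Usage example: a stand-in for piped stdin, opened twice through the helper.
void exampleReplay() {
    std::istringstream fakeStdin("CLASS1 word1\nCLASS2 word2\n");
    InputOpener opener(fakeStdin);
    std::string first = readAll(*opener.open("-"));
    std::string second = readAll(*opener.open("-"));
    assert(first == second);  // both reads see the class definitions
}
```

Reading a raw stream twice, by contrast, returns everything the first time and nothing the second, which mirrors what happens to the class definitions when '-' is read again.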
Note that another scenario where the classes file is read multiple times
is when you are mixing several models. The message

    warning: '-' used multiple times for input

at least warns you that something is trying to read stdin multiple times.
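Detecting this situation only requires counting how often '-' is opened. The sketch below is a hypothetical C++ illustration (`InputTracker` and `exampleWarning` are invented names; this is not how SRILM's File class is actually implemented). It assumes a warning is printed on every repeated open of stdin, as would happen when several mixed models each read the same classes file.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical sketch, NOT SRILM's File class: counts opens of "-" and
// warns on every repeat, because a second read of a pipe sees no data.
class InputTracker {
public:
    explicit InputTracker(std::ostream &out = std::cerr) : out_(out) {}

    // Returns true if this open is a repeated read of stdin.
    bool noteOpen(const std::string &name) {
        if (name != "-") {
            return false;  // regular files may be reread safely
        }
        if (++stdinOpens_ > 1) {
            out_ << "warning: '-' used multiple times for input\n";
            return true;
        }
        return false;
    }

private:
    std::ostream &out_;
    int stdinOpens_ = 0;
};

// Usage example: the second request for "-" triggers the warning.
void exampleWarning() {
    std::ostringstream messages;
    InputTracker tracker(messages);
    assert(!tracker.noteOpen("-"));           // first read of stdin: fine
    assert(!tracker.noteOpen("300classes"));  // regular files never warn
    assert(tracker.noteOpen("-"));            // second read: warning
    assert(messages.str() == "warning: '-' used multiple times for input\n");
}
```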
--Andreas
*** /tmp/T00o6hhK Tue Mar 11 23:40:05 2003
--- lm/src/ngram.cc Tue Mar 11 23:37:22 2003
***************
*** 369,377 ****
--- 369,379 ----
  * the class names (the first column of the class definitions)
  * into the vocabulary.
  */
+ if (limitVocab) {
  File file(classesFile, "r");
  classVocab->read(file);
  }
+ }
  ngramLM =
  decipherHack ? new DecipherNgram(*vocab, order, !decipherNoBackoff) :