make-ngram-pfsg: bad results with new gawk version

Andreas Stolcke stolcke at speech.sri.com
Fri Mar 5 07:52:32 PST 2004


Thanks for tracking this down.  I'll add a note somewhere that it is best to
set LC_NUMERIC=C or LC_ALL=C so that the gawk scripts do their arithmetic
properly.
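
As a rough cross-check of the numbers quoted below, here is a small Python
sketch (not the gawk code itself). It assumes that rint() rounds half away
from zero, and that the misbehaving gawk effectively stops reading a number
at the "." because the locale expects a different decimal separator. Both
are assumptions, and the parse_* helper names are made up for illustration.

```python
LOGSCALE = 23027  # value of logscale printed in the quoted debug output


def rint(x):
    """Round to nearest integer, half away from zero (assumed behaviour)."""
    return int(x - 0.5) if x < 0 else int(x + 0.5)


def scale_log(x):
    """Same shape as the scale_log() quoted in the thread."""
    return rint(x * LOGSCALE)


def parse_c_locale(text):
    """Read the number with "." as the decimal separator (LC_NUMERIC=C)."""
    return float(text)


def parse_stopping_at_dot(text):
    """Hypothetical model of the bad case: parsing stops at the "."."""
    return float(text.split(".", 1)[0])


for text in ("-0.314718", "-2.596963"):
    good = scale_log(parse_c_locale(text))
    bad = scale_log(parse_stopping_at_dot(text))
    print(text, good, bad)
```

Under these assumptions the sketch prints -7247 and 0 for the first
probability and -59800 and -46054 for the second, which matches the
"old awk" and "new awk" values in the quoted output.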

--Andreas

In message <40487FE8.3020708 at ei.tum.de> you wrote:
> Hi Andreas,
> 
> Andreas Stolcke wrote:
> > This is quite odd.
> 
> I think so, too :)
> 
> > make-ngram-pfsg doesn't perform much arithmetic on the log probabilities
> > in the LM.  It only scales and rounds them.
> >
> > Can you apply the scale_log() function in make-ngram-pfsg to your LM
> > probabilities and backoff weights, and extract the cases where the output
> > differs?
> 
> old awk:
> 	add_trans BO  -> </s> -0.314718
> 	scale_log(prob) = -7247
> 	add_trans <s> -> BO  -2.596963
> 	scale_log(prob) = -59800
> 
> new awk:
> 	logscale = 23027
> 	add_trans BO  -> </s> -0.314718
> 	scale_log(prob) = 0
> 	add_trans <s> -> BO  -2.596963
> 	scale_log(prob) = -46054
> 
> Note that I printed the logscale which seems to be correct.
> ...
> I think I found the problem:
> 
> The float log-probs (x) seem to be truncated to integers before they
> are multiplied by logscale:
> 
> function scale_log(x) {
> 	return rint(x * logscale);
> }
> 
> This seems to be related to the locale settings:
> http://mail.gnu.org/archive/html/bug-gnu-utils/2002-07/msg00196.html
> 
> If I set LC_ALL="C" in my shell, the new gawk also works as expected. So
> the bad behaviour seems to occur only with gawk 3.1.3 AND LC_ALL=""...
> 
> 
> Regards.
> Matthias
> 
> 
> > --Andreas
> > 
> > In message <40475599.9070700 at ei.tum.de> you wrote:
> > 
> >>Hello again,
> >>
> >>forgot to say that I tested this with srilm 1.3.3 and 1.3.1.
> >>
> >>Matthias
> >>
> >>Matthias Thomae wrote:
> >>
> >>>Hello Andreas,
> >>>
> >>>make-ngram-pfsg gives me different results with different versions of 
> >>>gawk. The header and the links are the same, but the weights differ 
> >>>substantially.
> >>>
> >>>I see the old behaviour with gawk 3.1.0 (on Debian) and 3.1.1 (on SUSE),
> >>>and the differing one with 3.1.3-1 and 3.1.3-2 (on Debian). The newly
> >>>created PFSGs increase the ASR error rate somewhat...
> >>>
> >>>Any clues?
> >>>
> >>>Regards.
> >>>Matthias
> >>
> > 
> > 
> 



