make-ngram-pfsg: bad results with new gawk version

Matthias Thomae thomae at ei.tum.de
Fri Mar 5 05:26:00 PST 2004


Hi Andreas,

Andreas Stolcke wrote:
> This is quite odd.

I think so, too :)

> make-ngram-pfsg doesn't perform much arithmetic on the log probabilities
> in the LM.  It only scales and rounds them.
>
> Can you apply the scale_log() function in make-ngram-pfsg to your LM
> probabilities and backoff weights, and extract the cases where the output
> differs?

old awk:
	add_trans BO  -> </s> -0.314718
	scale_log(prob) = -7247
	add_trans <s> -> BO  -2.596963
	scale_log(prob) = -59800

new awk:
	logscale = 23027
	add_trans BO  -> </s> -0.314718
	scale_log(prob) = 0
	add_trans <s> -> BO  -2.596963
	scale_log(prob) = -46054

Note that I also printed logscale, which seems to be correct.
...
I think I found the problem:

The float log-probs (x) seem to be truncated to integers before they are 
multiplied by logscale: -0.314718 is treated as 0 and -2.596963 as -2 
(-2 * 23027 = -46054):

function scale_log(x) {
	return rint(x * logscale);
}
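A quick way to check the truncation theory is to port scale_log() and feed it both the real log-probs and their integer parts. Below is a minimal Python sketch; the logscale constant (ln(10) * 10000.5) and the rint() rounding rule are assumptions chosen to agree with the 23027 printed above, not code copied from make-ngram-pfsg.

```python
import math

# Assumption: logscale = ln(10) * 10000.5, which agrees with the value
# 23027 printed above; the exact constant in make-ngram-pfsg may differ.
LOGSCALE = math.log(10) * 10000.5


def rint(x: float) -> int:
    # Hypothetical port of an awk rint() helper: round to the nearest
    # integer, with halves rounded away from zero.
    return int(x - 0.5) if x < 0 else int(x + 0.5)


def scale_log(x: float) -> int:
    # Same arithmetic as the awk scale_log() quoted above.
    return rint(x * LOGSCALE)


# Log-probs as they appear in the LM (matches the old awk output).
print(scale_log(-0.314718), scale_log(-2.596963))
# Only the integer parts of the same values (matches the new awk output).
print(scale_log(-0.0), scale_log(-2.0))
```

The first line prints -7247 -59800 and the second 0 -46054, i.e. exactly the old and new results listed earlier, which is consistent with only the integer part of each log-prob surviving in the new setup.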

This seems to be related to the locale settings
http://mail.gnu.org/archive/html/bug-gnu-utils/2002-07/msg00196.html

If I set LC_ALL="C" in my shell, gawk 3.1.3 also gives the expected 
results. So the bad behaviour seems to occur with gawk 3.1.3 AND LC_ALL=""...
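The locale link suggests the numbers are read using the locale's decimal-point character. The post does not say which locale was active, so the sketch below assumes one whose decimal separator is a comma (as in many European locales) and models gawk's string-to-number conversion with a hypothetical parse_number() helper that reads the longest numeric prefix, roughly the way C's strtod() does. It illustrates the suspected mechanism only; it is not gawk's actual code.

```python
import re


def parse_number(text: str, radix: str) -> float:
    """Read the longest numeric prefix of `text`, roughly like strtod().

    Hypothetical model: `radix` stands in for the locale's decimal-point
    character, "." in the C locale and "," in an assumed comma locale.
    Anything after the prefix is ignored, as strtod() stops at the first
    character it cannot use.
    """
    match = re.match(r"[+-]?\d+(?:" + re.escape(radix) + r"\d*)?", text)
    if match is None:
        return 0.0
    return float(match.group(0).replace(radix, "."))


for field in ("-0.314718", "-2.596963"):
    # "." radix: the whole field is read; "," radix: reading stops at ".".
    print(field, parse_number(field, "."), parse_number(field, ","))
```

With "." as the radix both fields are read in full; with "," reading stops at the dot and leaves -0 and -2, the integer parts that the new awk output corresponds to.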


Regards.
Matthias


> --Andreas
> 
> In message <40475599.9070700 at ei.tum.de> you wrote:
> 
>>Hello again,
>>
>>I forgot to say that I tested this with SRILM 1.3.3 and 1.3.1.
>>
>>Matthias
>>
>>Matthias Thomae wrote:
>>
>>>Hello Andreas,
>>>
>>>make-ngram-pfsg gives me different results with different versions of 
>>>gawk. The header and the links are the same, but the weights differ 
>>>substantially.
>>>
>>>I see the old behaviour with gawk 3.1.0 (on Debian) and 3.1.1 (on SuSE), 
>>>and the differing one with 3.1.3-1 and 3.1.3-2 (on Debian). The newly 
>>>created PFSGs somewhat increase the ASR error rate...
>>>
>>>Any clues?
>>>
>>>Regards.
>>>Matthias
>>
> 
> 




More information about the SRILM-User mailing list