Hi,
I have the following problem.
The n-gram counts are computed from a raw text corpus using
'ngram-count' and 'ngram-merge'.
I am experimenting with different vocabularies and with bigram and trigram models.
For each experiment I rerun 'ngram-count -vocab -order' and build the
language model with 'make-big-lm -trust-totals'.
When I tested the language models on my test set, I noticed some errors: some
bigrams that are present in the bigram model are missing from the trigram
model. When I omit the -trust-totals option, the results are OK.
Why should I not trust the totals in my case? Are the counts of
different orders produced by 'ngram-count' and 'ngram-merge' inconsistent?
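One way to look into the last question is to compare each bigram count with the sum of the trigram counts that extend it. Below is a minimal sketch of such a check, assuming the merged counts are in a plain-text file with one n-gram per line (words separated by spaces, then a tab, then the count). The function names parse_counts and inconsistent_prefixes are hypothetical and not part of SRILM.

```python
from collections import defaultdict


def parse_counts(lines):
    """Parse 'w1 w2 ... wn<TAB>count' lines into {ngram_tuple: count}."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Assumption: the count is the last tab-separated field.
        words, _, count = line.rpartition("\t")
        counts[tuple(words.split())] = int(count)
    return counts


def inconsistent_prefixes(counts, order):
    """Compare each (order-1)-gram count with the summed counts of its extensions.

    Returns {prefix: (lower_order_count, summed_higher_order_count)} for every
    prefix where the two differ; lower_order_count is None if the prefix has
    no entry of its own.
    """
    totals = defaultdict(int)
    for ngram, c in counts.items():
        if len(ngram) == order:
            totals[ngram[:-1]] += c

    mismatches = {}
    for prefix, total in totals.items():
        lower = counts.get(prefix)
        if lower != total:
            mismatches[prefix] = (lower, total)
    return mismatches


if __name__ == "__main__":
    sample = ["a b\t3", "a b c\t2", "a b d\t1", "b c\t2", "b c e\t1"]
    for prefix, (lower, total) in inconsistent_prefixes(parse_counts(sample), 3).items():
        print(" ".join(prefix), lower, total)
```

A mismatch reported here only shows that the lower-order count differs from the total of its extensions in the file; whether that explains the missing bigrams is something this check cannot confirm by itself.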
Regards,
Mirjam.
--
Mirjam Sepesy Maucec, PhD
Faculty of Electrical Engineering and Computer Science, Smetanova 17, 2000 Maribor
Email: mirjam.sepesy at ADDRESS HIDDEN
Phone: ++386 (0)2 220-7225