Hi,
I'm using disambig for part-of-speech tagging. I create a language model
over sequences of tags with ngram-count, and provide P(word|tag) in the
map file.
What I would like to do is start with this model, estimated from a tagged
corpus, and improve it with the Baum-Welch (forward-backward) algorithm
on an untagged corpus. After each iteration I should get a new language
model for the tags and a new map file. After each iteration I would also
like to test the model on some held-out data, so I know when to stop.
How can I implement that in SRILM?
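To make the requested loop concrete, here is a minimal Python sketch of the same procedure. It is not SRILM code, and it relies on several assumptions. It uses a bigram tag model (an HMM) rather than a general n-gram, it has no end-of-sentence tag, and every held-out word must also occur in the untagged training corpus. The names `pi`, `A`, `B`, `forward_backward`, `baum_welch_step` and `train` are all hypothetical. `A` stands in for the tag language model, and `B` stands in for the P(word|tag) entries of the map file.

```python
import math
from collections import defaultdict

# Hypothetical bigram-HMM representation (not an SRILM format):
#   pi[t]      P(first tag = t)
#   A[t1][t2]  P(t2 | t1), the role of the tag language model
#   B[t][w]    P(w | t), the role of the disambig map file


def forward_backward(seq, tags, pi, A, B):
    """Scaled forward-backward on one sentence; returns (gamma, xi, log P(seq))."""
    T = len(seq)
    alpha, scale = [], []
    for t, w in enumerate(seq):
        if t == 0:
            row = {j: pi[j] * B[j].get(w, 0.0) for j in tags}
        else:
            prev = alpha[-1]
            row = {j: sum(prev[i] * A[i][j] for i in tags) * B[j].get(w, 0.0)
                   for j in tags}
        c = sum(row.values())
        if c == 0.0:
            raise ValueError(f"word {w!r} has zero probability under the model")
        alpha.append({j: v / c for j, v in row.items()})
        scale.append(c)
    beta = [None] * T
    beta[T - 1] = {i: 1.0 for i in tags}
    for t in range(T - 2, -1, -1):
        w = seq[t + 1]
        beta[t] = {i: sum(A[i][j] * B[j].get(w, 0.0) * beta[t + 1][j] for j in tags)
                   / scale[t + 1] for i in tags}
    # With this scaling, alpha * beta is already the posterior P(tag_t = i | seq).
    gamma = [{i: alpha[t][i] * beta[t][i] for i in tags} for t in range(T)]
    xi = [{(i, j): alpha[t][i] * A[i][j] * B[j].get(seq[t + 1], 0.0)
           * beta[t + 1][j] / scale[t + 1] for i in tags for j in tags}
          for t in range(T - 1)]
    return gamma, xi, sum(math.log(c) for c in scale)


def baum_welch_step(corpus, tags, pi, A, B):
    """One EM iteration over untagged sentences; returns new (pi, A, B)."""
    pi_c, A_c, B_c = defaultdict(float), defaultdict(float), defaultdict(float)
    for seq in corpus:
        gamma, xi, _ = forward_backward(seq, tags, pi, A, B)
        for i in tags:
            pi_c[i] += gamma[0][i]
        for t, w in enumerate(seq):
            for i in tags:
                B_c[(i, w)] += gamma[t][i]
        for x in xi:
            for key, v in x.items():
                A_c[key] += v
    new_pi = {i: pi_c[i] / len(corpus) for i in tags}
    new_A, new_B = {}, {}
    for i in tags:
        row = sum(A_c[(i, j)] for j in tags)
        # Keep the old row if a tag received no expected transitions.
        new_A[i] = {j: A_c[(i, j)] / row for j in tags} if row > 0 else dict(A[i])
        tot = sum(v for (k, _), v in B_c.items() if k == i)
        new_B[i] = ({w: v / tot for (k, w), v in B_c.items() if k == i}
                    if tot > 0 else dict(B[i]))
    return new_pi, new_A, new_B


def train(untagged, heldout, tags, pi, A, B, max_iter=20):
    """Iterate Baum-Welch; stop when held-out log-likelihood stops improving."""
    def heldout_ll(model):
        return sum(forward_backward(s, tags, *model)[2] for s in heldout)

    best = (pi, A, B)
    best_ll = heldout_ll(best)
    model = best
    for _ in range(max_iter):
        model = baum_welch_step(untagged, tags, *model)
        ll = heldout_ll(model)
        if ll <= best_ll:
            break
        best, best_ll = model, ll
    return best, best_ll
```

Start by estimating `pi`, `A` and `B` from the tagged corpus. Then call `train(untagged, heldout, tags, pi, A, B)`, which returns the iterate with the best held-out log-likelihood. The sketch does not write that iterate back out as an SRILM tag LM and map file, because that step depends on the n-gram order and file conventions in use.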
Thanks,
Roy.