Re: converting ngram format model to AT&T FSM format

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Tue, 13 Sep 2005 18:50:30 PDT

In message <20050909032901.81882.qmail at ADDRESS HIDDEN> you wrote:
> Hi,
> I'm trying to convert an n-gram model (e.g., a.lm) into AT&T FSM format.
> I first used make-ngram-pfsg (e.g., make-ngram-pfsg a.lm > a.pfsg), then
> pfsg-to-fsm (e.g., pfsg-to-fsm a.pfsg > a.fsm). I have some questions
> regarding the interpretation of the transition probabilities and labels:
> 1. Words are represented as themselves in the n-gram format, but in the FSM
> format model, the transitions seem to have an index. Which word is
> represented by which index? Can it be extracted from the order of the
> unigrams in the ngram format file? Does 0 represent epsilon?

Use

pfsg-to-fsm symbolfile=FILE

to dump the index-to-word mapping to FILE.  FILE can then be used with the
FSM tool options -i and -o (this is explained in the pfsg-scripts man page).
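
If you want to look up labels yourself rather than via the FSM tools, here is a
rough Python sketch. It is not part of SRILM. It assumes the symbol file holds
one whitespace-separated "word index" pair per line, which is worth checking
against your own FILE; the function name and the sample contents are made up
for illustration.

```python
def load_symbols(lines):
    """Parse 'word index' lines into an index -> word dictionary.

    Assumes one whitespace-separated 'word index' pair per line
    (an assumption about the symbol-file layout, not taken from SRILM docs).
    """
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        word, index = fields
        table[int(index)] = word
    return table


# Hypothetical symbol-file contents, for illustration only.
sample = ["hello 1", "world 2"]
symbols = load_symbols(sample)
print(symbols[2])
```

With a real file you would pass an open file object instead of the sample list,
e.g. load_symbols(open("FILE")).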

> 2. Are the transition probabilities -10000.5*logprobs?

They are, because that's what make-ngram-pfsg outputs, and pfsg-to-fsm doesn't
change the scaling apart from flipping the sign.  But you can undo this scaling
by using the

pfsg-to-fsm scale=S

option and setting S=1/-10000.5.   Note this will give you back log-base-10,
not log-base-e.
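
The same arithmetic, as a small Python sketch for checking individual weights
by hand. It only assumes what is stated above, namely that an FSM weight is
-10000.5 times the log probability; the function and constant names are
invented for this example.

```python
PFSG_SCALE = 10000.5  # scaling factor mentioned in this message


def fsm_weight_to_logprob(weight, scale=PFSG_SCALE):
    """Undo the scaling and the sign flip: logprob = weight / -scale."""
    return weight / -scale


# A weight of 20001 corresponds to a log probability of -2.
print(fsm_weight_to_logprob(20001.0))
```

This is equivalent to multiplying by S=1/-10000.5, just done per weight instead
of inside pfsg-to-fsm.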

> 3. What do the state potentials represent?

They are the costs of ending a path in a given state.
I don't think they're used in the encoding of PFSGs.

> Also, is there a better way of doing these?

Probably, but not in SRILM ;-)

--Andreas
