From nbassiou at aiia.csd.auth.gr  Tue Jul  1 00:53:47 2008
From: nbassiou at aiia.csd.auth.gr (Nikoletta Bassiou)
Date: Tue, 1 Jul 2008 10:53:47 +0300
Subject: Class n-grams
Message-ID: <001a01c8db4f$9b2c9a30$1904cf9b@aiia.csd.auth.gr>

I would like to build a class trigram using ngram-class, but according to the documentation only class bigrams are implemented.
If this is true, do you know any other way I can build a class trigram? Is there a provision for extending ngram-class to higher-order n-grams (n>3)?

Nikoletta

From stolcke at speech.sri.com  Tue Jul  1 09:25:22 2008
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 01 Jul 2008 09:25:22 -0700
Subject: Class n-grams
In-Reply-To: <001a01c8db4f$9b2c9a30$1904cf9b@aiia.csd.auth.gr>
References: <001a01c8db4f$9b2c9a30$1904cf9b@aiia.csd.auth.gr>
Message-ID: <486A5A72.7000500@speech.sri.com>

Nikoletta Bassiou wrote:
> I would like to build a class trigram using ngram-class, but according
> to the documentation only class bigrams are implemented.
> If this is true, do you know any other way I can build a class
> trigram? Is there a provision for extending ngram-class to higher-order
> n-grams (n>3)?
>  
> Nikoletta
The bigram restriction only applies to the statistics used to learn the 
word classes.  Once you have the classes you can apply them to your text 
and build an ngram of any order.
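One way to "apply the classes to your text" is to replace each word by its class label and then train an ordinary higher-order model on the class-level text. The Python sketch below does the substitution. It assumes a class-definition file with one expansion per line in the form `CLASS [prob] word` (our assumption about the file ngram-class writes); multi-word expansions are skipped, and words without a class are left unchanged.

```python
from typing import Dict, Iterable, Iterator


def load_classes(lines: Iterable[str]) -> Dict[str, str]:
    """Map each word to its class from class-definition lines.

    Assumed line format: CLASS [prob] word  (one word per expansion).
    Multi-word expansions are ignored in this sketch.
    """
    word2class: Dict[str, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        cls, rest = fields[0], fields[1:]
        if len(rest) > 1:
            try:
                float(rest[0])      # optional probability field
                rest = rest[1:]
            except ValueError:
                pass
        if len(rest) == 1:
            word2class[rest[0]] = cls
    return word2class


def replace_with_classes(sentences: Iterable[str],
                         word2class: Dict[str, str]) -> Iterator[str]:
    """Replace every word that has a class by its class label."""
    for sentence in sentences:
        yield " ".join(word2class.get(w, w) for w in sentence.split())


if __name__ == "__main__":
    defs = ["CLS1 0.5 monday", "CLS1 0.5 tuesday", "CLS2 paris"]
    classes = load_classes(defs)
    print(list(replace_with_classes(["see you monday in paris"], classes)))
```

The class-level text could then be fed to something like `ngram-count -order 3 -text class-text.txt -lm class3.lm` (file names hypothetical) to get a class trigram.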

Andreas


From stolcke at speech.sri.com  Thu Jul  3 22:17:59 2008
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 03 Jul 2008 22:17:59 -0700
Subject: Class n-grams
In-Reply-To: <20080702111233.9779bb00@cronus.aiia.csd.auth.gr>
References: <20080702111233.9779bb00@cronus.aiia.csd.auth.gr>
Message-ID: <486DB287.7080805@speech.sri.com>

Basiou Nikoletta wrote:
> Dear Andreas,
>  
> thanks a lot for your answer. Actually, I want to build the classes
> from trigram statistics/counts. Is there any provision for such an
> implementation in the near future, or are there restrictions due to
> higher memory and processing requirements?
It would take a lot longer and is currently not implemented.
I vaguely recall a paper by Hermann Ney and colleagues many years ago
showing that inducing classes based on higher-order statistics doesn't
buy that much (i.e., it is sufficient to learn the classes using bigram
statistics, and then use them in higher-order class-based models).

Andreas

>  
> Looking forward to your answer,
> Nikoletta


From marco.turchi at gmail.com  Tue Jul 29 18:03:40 2008
From: marco.turchi at gmail.com (marco turchi)
Date: Wed, 30 Jul 2008 02:03:40 +0100
Subject: strange symbols
Message-ID: <79a042480807291803w44eb15c8ic8d4c4ef4a0e8182@mail.gmail.com>

Dear all,
I'm using SRILM on some data crawled from the Web. The LM contains some
strange symbols like these:
\1-grams:
-6.774207       ^A      0
-6.774207       ^C
-6.774207       ^D
-6.774207       ^E      0
-6.774207       ^F      0
-6.774207       ^G      0
-6.774207       ^H      0
-6.774207       ^K      0
-6.774207       ^N      0
-6.774207       ^O
-6.774207       ^P
-6.774207       ^T      0
-6.774207       ^X
-6.774207       ^Y      0
-6.774207       ^\
-6.774207       ^]
-6.774207       ^^      0
-6.774207       ^_

These symbols are not simply a ^ followed by a letter; they seem to be
something different, like a character that has been truncated or
something similar.
Do you have an idea what they are and how to remove them?

Thanks a lot,
Marco

From stolcke at speech.sri.com  Tue Jul 29 23:27:13 2008
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 29 Jul 2008 23:27:13 PDT
Subject: strange symbols 
In-Reply-To: Your message of Wed, 30 Jul 2008 02:03:40 +0100.
             <79a042480807291803w44eb15c8ic8d4c4ef4a0e8182@mail.gmail.com> 
Message-ID: <200807300627.m6U6RDb17302@huge>


They look like ASCII control characters (character values < 0x20).
You need to do a better job filtering your training data.
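A minimal Python sketch of such filtering (not an SRILM tool): it replaces ASCII control characters with spaces before the text reaches ngram-count. Also treating DEL (0x7f) as junk is our addition beyond the reply. Tabs and newlines are control characters too, so they also become spaces, which is harmless for whitespace-separated training text.

```python
import re
from typing import Iterable, Iterator

# ASCII control characters (0x00-0x1f) plus DEL (0x7f).
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def clean_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield training lines with control characters replaced by spaces.

    Runs of whitespace are collapsed to single spaces and lines that
    become empty are dropped.
    """
    for line in lines:
        text = " ".join(CONTROL_CHARS.sub(" ", line).split())
        if text:
            yield text
```

For example, `"hello\x01 world"` becomes `"hello world"`, and a line that contains only control characters is dropped.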

--Andreas



From stolcke at speech.sri.com  Thu Aug 14 13:16:18 2008
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 14 Aug 2008 13:16:18 -0700
Subject: a naive question need your help
In-Reply-To: <fc5871f0808140421r6787cf65sc3f3828924aba73f@mail.gmail.com>
References: <fc5871f0808140421r6787cf65sc3f3828924aba73f@mail.gmail.com>
Message-ID: <48A49292.6020603@speech.sri.com>

jian zhu wrote:
> Hi Professor Stolcke,
>     I am a computer programmer from China. Thanks a lot for your great
> work on language models, and for unselfishly sharing the excellent SRILM
> toolkit!
>
>     I have a naive question and need your help.
>     I want to use the "disambig" tool for part-of-speech tagging, but I
> have some trouble with it.
>     I use the tool as follows:
>     disambig -text file -map wtfile -lm ttfile
>
>     file      ---   word text
>     wtfile   ---   P(word|tag2) emit file
>     ttfile    ---    P(tag2|tag1) transit file
>
>     ttfile can be trained using the "ngram-count" tool, but I don't know
>     how to get wtfile using SRILM.
>
>     Its format is as follows:
>     -map file
>    Specifies the file containing the V1-to-V2 mapping information.
> Each line of file contains the mapping for a single word in V1:
> 	w1	w21 [p21] w22 [p22] ...
>
>      where w1 is a word from V1, which has possible mappings w21, w22,
> ... from V2. Optionally, each of these can be followed by a numeric
> string for the probability p21, which defaults to 1. The number is
> used as the conditional probability P(w1|w21), but the program does
> not depend on these numbers being properly normalized.
>
>     Thank you very much!
>     Looking forward to your help.
>   
There is no ready-made tool for estimating and formatting the map
probabilities.  It is such a simple format that you should be able to
write a Perl script or similar to estimate these probabilities from
data.  Note that for taggers it is usually more convenient to construct
the map file with probabilities p(w21 | w1) and use the -scale option.
To estimate p(POS | word) you can count occurrences in a tagged training
corpus, possibly with some smoothing to allow for unseen combinations
(for unseen words and open-class POS tags).  In the absence of
training data you can try a uniform POS distribution.
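As an illustration of the counting approach, here is a minimal Python sketch that writes map lines with p(tag | word). Assumptions: the tagged corpus uses `word/TAG` tokens (split on the last `/`), a tab separates the word from its mappings (mirroring the map-format excerpt quoted above), and optional add-alpha smoothing spreads mass over every tag seen in the corpus. Words that never occur in the corpus are not handled here.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def count_word_tags(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count (word, tag) pairs in lines of word/TAG tokens."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        for token in line.split():
            word, sep, tag = token.rpartition("/")   # split on the last '/'
            if sep and word and tag:
                counts[word][tag] += 1
    return counts


def map_lines(counts: Dict[str, Counter], alpha: float = 0.0) -> List[str]:
    """Format map lines 'word<TAB>tag p tag p ...' with p = P(tag | word).

    alpha > 0 applies add-alpha smoothing over all tags seen in the corpus,
    so every word gets a non-zero probability for every tag.
    """
    tagset = sorted({t for c in counts.values() for t in c})
    out = []
    for word in sorted(counts):
        c = counts[word]
        total = sum(c.values()) + alpha * len(tagset)
        tags = tagset if alpha > 0 else sorted(c)
        pairs = [f"{t} {(c[t] + alpha) / total:.6g}" for t in tags]
        out.append(word + "\t" + " ".join(pairs))
    return out


if __name__ == "__main__":
    corpus = ["the/DT dog/NN runs/VBZ", "the/DT runs/NNS"]
    for line in map_lines(count_word_tags(corpus)):
        print(line)
```

On the two-sentence toy corpus this yields `runs<TAB>NNS 0.5 VBZ 0.5`; with `alpha=1.0`, `dog` also receives non-zero probabilities for DT, NNS and VBZ.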

I know that people have built POS taggers with SRILM.  I suggest that 
you direct further questions to the srilm-user mailing list.

Andreas

> Best Regards
> jianzhu
> 2008-08-14
>   


From liuchangliang at hccl.ioa.ac.cn  Tue Aug 26 19:31:45 2008
From: liuchangliang at hccl.ioa.ac.cn (liuchangliang)
Date: Wed, 27 Aug 2008 10:31:45 +0800
Subject: A question about  Lattice::LatticeWER( )
Message-ID: <001301c907ed$0c746cd0$255d4670$@ioa.ac.cn>

Hi,

I use lattice-tool to compute the lattice WER. In the result, the
insertion error is always very high.

In the source code of the function Lattice::LatticeWER(), there is this comment:

     * NOTE: since we process nodes in topological order this
     * will allow chains of multiple insertions.

I don't understand what this sentence means. Does it mean that the
insertion error in the result is not reliable?

Thanks,

chliu


From stolcke at speech.sri.com  Tue Aug 26 20:34:28 2008
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Tue, 26 Aug 2008 20:34:28 -0700
Subject: A question about  Lattice::LatticeWER( )
In-Reply-To: <001301c907ed$0c746cd0$255d4670$@ioa.ac.cn>
References: <001301c907ed$0c746cd0$255d4670$@ioa.ac.cn>
Message-ID: <48B4CB44.40803@speech.sri.com>

liuchangliang wrote:
>
> Hi:
>
> I use lattice-tool to compute the lattice WER. In the result, the 
> insertion error is always very high.
>
> In the source code of function Lattice::LatticeWER( ), there is a 
> sentence:
>
> * NOTE: since we process nodes in topological order this
>
> * will allow chains of multiple insertions.
>
> I don't know what this sentence means. Does that mean the insertion
> error of the result is not reliable?
>
No. It is a comment on the workings of the algorithm that aligns a word
string to the lattice: processing nodes in topological order is required
for the correct computation of insertions.
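To see why the order matters, here is a generic Python sketch of aligning a word string to a lattice by dynamic programming. It is not SRILM's Lattice::LatticeWER code. Assumptions: words sit on nodes, nodes are numbered 0..n-1 in topological order, and the start node carries no word. The insertion step reads the cost of a predecessor that may itself have been reached by an insertion, so that predecessor must be finished first.

```python
from typing import Dict, List, Optional, Sequence


def lattice_word_errors(words: Sequence[Optional[str]],
                        succs: Dict[int, List[int]],
                        start: int, end: int,
                        ref: Sequence[str]) -> float:
    """Fewest word errors between `ref` and any start->end path.

    Assumes nodes 0..n-1 are numbered in topological order, words[v] is the
    word on node v (None for a null node), and `start` is a null node.
    Returns inf if `end` is unreachable.
    """
    inf = float("inf")
    m = len(ref)
    preds: List[List[int]] = [[] for _ in words]
    for u, targets in succs.items():
        for v in targets:
            preds[v].append(u)
    # cost[v][j]: fewest errors on a path start..v aligned with ref[:j]
    cost = [[inf] * (m + 1) for _ in words]
    cost[start][0] = 0
    for v in range(len(words)):            # topological order
        row, w = cost[v], words[v]
        for u in preds[v]:
            prev = cost[u]                 # already final, since u < v
            for j in range(m + 1):
                if w is None:              # null node: nothing to align
                    row[j] = min(row[j], prev[j])
                    continue
                row[j] = min(row[j], prev[j] + 1)          # insertion of w
                if j > 0:                                  # match or substitution
                    row[j] = min(row[j], prev[j - 1] + (w != ref[j - 1]))
        for j in range(1, m + 1):          # deletions of reference words at v
            row[j] = min(row[j], row[j - 1] + 1)
    return cost[end][m]
```

With nodes `[None, "a", "b", "c", None]` and edges 0-1, 1-2, 1-3, 2-3, 3-4, an empty reference costs 2 (the chain of insertions "a c"), while the reference "a c" costs 0.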

Andreas


From stolcke at speech.sri.com  Fri Sep  5 07:48:34 2008
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Fri, 05 Sep 2008 07:48:34 -0700
Subject: -read-google
In-Reply-To: <01ab01c90f53$415c25b0$ed1610ac@selena>
References: <01ab01c90f53$415c25b0$ed1610ac@selena>
Message-ID: <48C146C2.1090405@speech.sri.com>

Mirjam Sepesy Maučec wrote:
> Hi,
>  
> I have my counts in Google directory structure (by make-google-ngrams).
> I would like to use make-big-lm (because ngram-count runs out of memory),
> but the script expects the switch -read (not -read-google)?
Mirjam,

I believe this mailing list is meant for users of the CMU-Cambridge SLM 
toolkit, but your question is obviously about SRILM.
Please join the srilm-user mailing list and ask your SRILM questions 
there (see http://www.speech.sri.com/projects/srilm/#srilm-user for 
instructions).

Regarding your question:  make-big-lm does not support the -read-google
option because its approach is incompatible with the Google directory
structure.
However, you could enumerate all the count files under the Google
directory, prepend "-read" to each, and give that long string of
arguments to make-big-lm.

    make-big-lm `find /path/to/google-ngrams/data -name \*.gz \! -name \*_cs.gz | xargs -n 1 echo "-read"` other-options ....

assuming your OS allows command lines this long.
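If shell quoting gets in the way, the same argument list can be built in Python and passed to `subprocess.run` as a list. This is a sketch only: it mirrors the `find` filter above, and the operating system's limit on total argument length still applies.

```python
import fnmatch
import os
from typing import List


def is_count_file(name: str) -> bool:
    """Same filter as the find command: *.gz but not *_cs.gz."""
    return fnmatch.fnmatch(name, "*.gz") and not fnmatch.fnmatch(name, "*_cs.gz")


def google_read_args(root: str) -> List[str]:
    """Return ['-read', file1, '-read', file2, ...] for all count files under root."""
    args: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                  # deterministic traversal order
        for name in sorted(filenames):
            if is_count_file(name):
                args += ["-read", os.path.join(dirpath, name)]
    return args


if __name__ == "__main__":
    # Print the command instead of running it; append your other options.
    print(["make-big-lm", *google_read_args("/path/to/google-ngrams/data")])
```

To run it, pass the list to `subprocess.run(cmd, check=True)` after adding the remaining make-big-lm options.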

Andreas

>  
> Thanks,
> Mirjam


From debond at gmx.net  Mon Sep  8 05:39:43 2008
From: debond at gmx.net (Christine de Bond)
Date: Mon, 08 Sep 2008 14:39:43 +0200
Subject: No subject
Message-ID: <20080908123943.146280@gmx.net>

Hello,

I tried out:

ngram-count   -write-vocab vocab.txt   -text input.txt

and in the resulting file there is an entry " -pau- " which is not in my input.txt.
Does anybody know where this pau comes from and what it means?

Best regards,
Christine


From stolcke at speech.sri.com  Mon Sep  8 09:50:11 2008
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Mon, 08 Sep 2008 09:50:11 -0700
Subject: 
In-Reply-To: <20080908123943.146280@gmx.net>
References: <20080908123943.146280@gmx.net>
Message-ID: <48C557C3.8060203@speech.sri.com>

Christine de Bond wrote:
> Hello,
>
> I tried out:
>
> ngram-count   -write-vocab vocab.txt   -text input.txt
>
> and in the resulting file there is an entry " -pau- " which is not in my input.txt.
> Does anybody know where this pau comes from and what it means?
>   
It's a predefined vocabulary item used to represent nonspeech (e.g., in
lattices).
This word does not take up any probability mass, so it doesn't interfere
with LM building.

Andreas

> Best regards,
> Christine
>   


From mirjam.sepesy at uni-mb.si  Thu Sep 11 03:27:12 2008
From: mirjam.sepesy at uni-mb.si (Mirjam Sepesy Maucec)
Date: Thu, 11 Sep 2008 12:27:12 +0200
Subject: Fw: GT coefficients
Message-ID: <0a4201c913f8$f4221020$ed1610ac@selena>

Hi,

I found an old question with no answer in the SRILM mailing list archive; I attach it below.
I have the same problem:
when I convert the decimal comma (,) into a dot (.) in the discount files, the warnings disappear.
The discount files were produced by the make-big-lm script.

Best,

Mirjam

----- Original Message ----- 
From: ilya oparin 
To: srilm-list 
Sent: Sunday, June 11, 2006 3:05 PM
Subject: GT coefficients


Hello!

If I compute GT coefficients in advance and then feed the GT files (generated by make-gt-discounts) to ngram-count or make-big-lm, I get warnings of this kind:

file.gt1: line 9: warning: discount coefficient 1 = 0.0
file.gt1: line 9: warning: discount coefficient 2 = 0.0
...

and so on for all the GT parameters. The files themselves are fine and do not contain any zeroes. The line number in the warnings corresponds to the last line of the GT file.
The model I get this way differs from the one I get when I just use ngram-count without loading GT coefficients (its bigram and trigram sections are much smaller), with the same gtmin and gtmax values.
Could anybody tell me why this happens?


best regards,
Ilya

From stolcke at speech.sri.com  Thu Sep 11 03:57:43 2008
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 11 Sep 2008 03:57:43 -0700
Subject: Fw: GT coefficients
In-Reply-To: <0a4201c913f8$f4221020$ed1610ac@selena>
References: <0a4201c913f8$f4221020$ed1610ac@selena>
Message-ID: <48C8F9A7.9020903@speech.sri.com>

Mirjam Sepesy Maucec wrote:
> Hi,
>  
> I found an old question with no answer in the SRILM mailing list
> archive; I attach it below.
> I have the same problem:
> when I convert the decimal comma (,) into a dot (.) in the discount
> files, the warnings disappear.
> The discount files were produced by the make-big-lm script.
If decimal numbers in the discount files appear with commas instead of
decimal points, that's almost certainly a locale setting issue.  The
CHANGES file has the following entry:

        * Matthias Thomae <thomae at ei.tum.de> found that make-ngram-pfsg
        (and probably other gawk scripts) may not work correctly with recent
        versions of gawk unless the environment is set to LC_NUMERIC=C.

Note that the gt files are computed by gawk scripts.

What I can do is set  LC_NUMERIC=C in make-big-lm to avoid the problem 
in most common cases.
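Until then, a hedged Python sketch of the two workarounds discussed in this thread: rewriting decimal commas in files that were already written, and running the scripts with LC_NUMERIC=C. One caveat is our addition: LC_ALL overrides LC_NUMERIC, so the sketch removes it from the child environment. The regex would also rewrite a digit-comma-digit thousands separator such as "1,000", so check the files first.

```python
import os
import re
from typing import Dict

# A comma between two digits, e.g. "0,4321" written under a locale whose
# decimal separator is a comma.
DECIMAL_COMMA = re.compile(r"(?<=\d),(?=\d)")


def fix_decimal_commas(text: str) -> str:
    """Turn decimal commas into decimal points in already-written files."""
    return DECIMAL_COMMA.sub(".", text)


def c_numeric_env() -> Dict[str, str]:
    """Copy of the environment with LC_NUMERIC=C for running gawk-based scripts.

    LC_ALL overrides LC_NUMERIC, so it is removed here as well.
    """
    env = dict(os.environ)
    env.pop("LC_ALL", None)
    env["LC_NUMERIC"] = "C"
    return env
```

The environment can be passed as `subprocess.run([...], env=c_numeric_env())`; from a shell, prefixing the command with `LC_NUMERIC=C` has the same effect as long as LC_ALL is unset.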

Andreas

 


From stolcke at speech.sri.com  Thu Sep 25 12:57:36 2008
From: stolcke at speech.sri.com (Andreas Stolcke)
Date: Thu, 25 Sep 2008 12:57:36 -0700
Subject: Please advise
In-Reply-To: <648456300809220236m5695fe43me23b14d83d0d086d@mail.gmail.com>
References: <648456300809220236m5695fe43me23b14d83d0d086d@mail.gmail.com>
Message-ID: <48DBED30.1010709@speech.sri.com>

Nisha Yadav wrote:
> Hi,
>
> I am a new user of the SRILM toolkit and have been using it to
> generate some language models. I would be grateful for your advice
> regarding the following.
>
> 1) When assigning backoff probabilities, <s> is assigned a very small
> probability, i.e. 1E-99, but </s> is assigned a non-zero probability of
> 0.181. That is, in the output LM file I can see the following
> entries for <s> and </s>:
>
> -0.7421436    </s>   
> -99                <s>    -0.3938685
>
> Can you please explain why SRILM is doing this?
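That's because an LM never needs to predict the beginning-of-sentence
token, only the end-of-sentence token.  The -99 is just a dummy entry to
satisfy the LM format (the <s> entry itself is still needed to carry the
backoff weight, here -0.3938685, used when <s> is the context).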
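The numbers in the entries quoted above are log10 values, which is where the 1E-99 and 0.181 in the question come from. A small Python sketch for reading such lines. It assumes whitespace-separated fields: log10 probability, `order` words, then an optional log10 backoff weight.

```python
from typing import Optional, Tuple


def parse_arpa_ngram(line: str, order: int) -> Tuple[float, Tuple[str, ...], Optional[float]]:
    """Split an ARPA n-gram line into (log10 prob, words, optional log10 backoff)."""
    fields = line.split()
    logprob = float(fields[0])
    words = tuple(fields[1:1 + order])
    backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
    return logprob, words, backoff


def prob(logprob: float) -> float:
    """ARPA files store log10 probabilities."""
    return 10.0 ** logprob
```

For the lines above, `prob(-0.7421436)` is roughly 0.181 and `prob(-99)` is 1E-99.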
>
> 2) For perplexity calculation, the -ppl option outputs 2 values, ppl and
> ppl1. Which of these two should be used to compare the performance of
> models built from 2-grams, 3-grams, and so on?
Please consult the FAQ first for questions about SRILM.  You will find the
answer in
http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html .
If you cannot find the answer there, send email to srilm-user at speech.sri.com
(you need to join the mailing list first).
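As we understand SRILM's definitions (this is our summary, not a quotation from the FAQ), both values come from the same total log10 probability. ppl divides it by the number of scored words plus end-of-sentence tokens; ppl1 divides it by the scored words only. OOVs and zero-probability words are excluded in both cases. A small Python sketch:

```python
from typing import Tuple


def perplexities(logprob: float, sentences: int, words: int,
                 oovs: int = 0, zeroprobs: int = 0) -> Tuple[float, float]:
    """Return (ppl, ppl1) from the totals printed by ngram -ppl.

    logprob is the total log10 probability; OOVs and zero-probability
    words are excluded from the token count.
    """
    scored = words - oovs - zeroprobs
    ppl = 10.0 ** (-logprob / (scored + sentences))  # also counts </s> tokens
    ppl1 = 10.0 ** (-logprob / scored)                # words only
    return ppl, ppl1
```

For one fixed test set with identical OOV and zeroprob counts, both denominators are the same for every model. Both measures then grow with -logprob, so they rank models identically.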
>
> 3) How much significance can be attached to these values when the
> difference between them is relatively small or lies in the first digit
> after the decimal point? That is, if the perplexity values (ppl) of the
> language models for 1-gram, 2-gram, 3-gram, etc. are
>
> for n = 1, 68.17368,
> for n = 2, 26.52578,
> for n = 3, 26.61326,
> for n = 4, 25.89838,
> for n = 5, 25.89838,
>
> can we say that model performance is better with n = 4 than with
> n = 3 and n = 2, based on these values? Please note that our corpus is
> not very large, approximately 8000 tokens.
> Thanks in advance,
It looks like n=4 is better, but obviously not by much.  Whether the
difference matters depends on your application (MT, ASR, etc.).

Andreas


From dmitry.kan at gmail.com  Sat Sep 27 06:34:24 2008
From: dmitry.kan at gmail.com (Dmitry Kan)
Date: Sat, 27 Sep 2008 16:34:24 +0300
Subject: Visualization
Message-ID: <9a4d1d60809270634s4c1fa548s2439571bdd95a448@mail.gmail.com>

Hello list,

I was just wondering whether there are any visualization tools available
for producing a diagram (with statistical information, for example) of a
language model once it has been built.

-- 
Regards,
Dmitry Kan


From gelbart at icsi.berkeley.edu  Mon Sep 29 11:39:50 2008
From: gelbart at icsi.berkeley.edu (David Gelbart)
Date: Mon, 29 Sep 2008 11:39:50 -0700 (PDT)
Subject: Language model visualization
In-Reply-To: <9a4d1d60809270634s4c1fa548s2439571bdd95a448@mail.gmail.com>
References: <9a4d1d60809270634s4c1fa548s2439571bdd95a448@mail.gmail.com>
Message-ID: <Pine.LNX.4.63.0809291137260.4904@lamb.ICSI.Berkeley.EDU>

Hi Dmitry,

Here is an example of language model visualization that might be of 
interest:

http://www.chrisharrison.net/projects/trigramviz/index.html

This is a similar tree-style visualization; it is not statistical, but
it might give you some ideas:

http://services.alphaworks.ibm.com/manyeyes/page/Word_Tree.html
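For a quick statistical overview without a separate tool, the `ngram N=count` lines in the `\data\` header of an ARPA-format LM can be turned into a text bar chart. This is a minimal Python sketch and is not related to the projects linked above. Compressed models would need `gzip.open`.

```python
import re
from typing import Dict, Iterable, List

NGRAM_LINE = re.compile(r"^ngram\s+(\d+)\s*=\s*(\d+)$")


def arpa_ngram_counts(lines: Iterable[str]) -> Dict[int, int]:
    """Read the 'ngram N=count' lines of an ARPA file's \\data\\ header."""
    counts: Dict[int, int] = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
        elif in_header:
            match = NGRAM_LINE.match(line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line:                     # e.g. "\1-grams:" ends the header
                break
    return counts


def text_histogram(counts: Dict[int, int], width: int = 40) -> List[str]:
    """One '#' bar per n-gram order, scaled to the largest count."""
    top = max(counts.values(), default=0)
    rows = []
    for order in sorted(counts):
        n = counts[order]
        bar = "#" * (round(width * n / top) if top else 0)
        rows.append(f"{order}-grams {n:>10} {bar}")
    return rows
```

Typical use: open the model file, pass the file object to `arpa_ngram_counts`, and print the rows returned by `text_histogram`.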

On Sat, 27 Sep 2008, Dmitry Kan wrote:

> Hello list,
>
> I was just wondering whether there are any visualization tools available
> for producing a diagram (with statistical information, for example) of a
> language model once it has been built.
>
>