...corpus.
DF frequencies in Switchboard were estimated from a hand-labeled subset of 60 conversation sides, containing 40,500 words. The coverage figure takes into account the further limits on modeled repetitions and utterance-medial deletions described below.

...segments.
A preliminary version of annotated Switchboard data was made available to the 1995 Johns Hopkins Language Modeling Workshop; the LDC will release a final version.
...models.
Both baseline and DF models were trained on the same data, which corresponds to only a portion of the full training corpus. Therefore, the perplexity figures are higher here than in some of the comparisons below.
...data.
Because the amount of training data and the type of segmentation differ, these perplexities are not directly comparable to those of the previous two studies.

Andreas Stolcke
Fri Jun 28 19:31:43 PDT 1996