- ...corpus.
-
DF frequencies in Switchboard were estimated from a hand-labeled
subset of 60 conversation sides, containing 40,500 words.
The coverage figure takes into account the further limits on
modeled repetitions and utterance-medial deletions described below.
- ...segments.
- A preliminary version of annotated Switchboard data was made available to
the 1995 Johns Hopkins Language Modeling Workshop; the LDC will release a
final version.
- ...models.
-
Both baseline and DF models were trained on the same data,
which corresponds to only a portion of the full training corpus.
Therefore, the perplexity figures are higher here than in some of the
comparisons below.
- ...data.
-
Due to differences in the amount of training data and the type of segmentation,
the perplexities are not directly comparable to those of the previous two studies.