Five of the sentences read by each speaker are also read by six other speakers (for comparability).The remaining three sentences read by each speaker were unique to that speaker (for coverage). You can access its documentation in the usual way, using This gives us a sense of what a speech processing system would have to do in producing or recognizing speech in this particular dialect (New England).The inclusion of speaker demographics brings in many more independent variables, that may help to account for variation in the data, and which facilitate later uses of the corpus for purposes that were not envisaged when the corpus was created, such as sociolinguistics.A third property is that there is a sharp division between the original linguistic event captured as an audio recording, and the annotations of that event.Finally, notice that even though TIMIT is a speech corpus, its transcriptions and associated data are just text, and can be processed using programs just like any other text corpus.Therefore, many of the computational methods described in this book are applicable.
For each of eight dialect regions, 50 male and female speakers having a range of ages and educational backgrounds each read ten carefully chosen sentences.
Moreover, notice that all of the data types included in the TIMIT corpus fall into the two basic categories of lexicon and text, which we will discuss below.
Even the speaker demographics data is just another instance of the lexicon data type.
These are organized into a tree structure, shown schematically in 1.2.
At the top level there is a split between training and testing sets, which gives away its intended use for developing and evaluating statistical models.
As in other chapters, there will be many examples drawn from practical experience managing linguistic data, including data that has been collected in the course of linguistic fieldwork, laboratory work, and web crawling.