
What linguists always wanted to know about German and did not know how to estimate

By Erhard Hinrichs and Sandra Kübler


This paper profiles significant differences in syntactic distribution and in word class frequencies between two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogues, and the TüBa-D/Z, a treebank of articles published in the German daily newspaper die tageszeitung (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.

Topics: ddc:400
Year: 2006

