
Genres in formation? An exploratory study of web pages using cluster analysis

By M. Santini


The Web is a new, large and heterogeneous community where interaction among users and the possibilities offered by technology may modify existing genres or create new ones. In fact, most genres borrowed from the paper world have undergone adjustments when moving to the Web (for instance, online newspapers and online manuals). There is also a family of genres that have been created specifically for the Web, e.g. home pages, splash screens, newsletters, hotlists. Besides these, are there other emerging genres on the Web for which a genre label has not yet been coined? Is it possible to capture genres in formation in an automated way? An experiment using cluster analysis has been set up to provide initial answers to these questions. Results show that the main clusters have a quite well-defined shape and show a number of regularities. Interestingly, Web pages appear to have been clustered according to their rhetorical/discoursal types (informational, instructional, argumentative, etc.), rather than genre classes (e.g. sermons and editorials, both argumentative, belong to the same cluster). The perception of rhetorical/discoursal types in Web pages has been confirmed by a small-scale Web user study.

Topics: G000 Computing and Mathematical Sciences
Year: 2005

