A survey of machine learning approaches to analysis of large corpora

Atwell, ES; Hu, X

Authors: ES Atwell, X Hu
Publication date: 1 January 2003
Publisher: UCREL, Lancaster University

Abstract

Corpus-based Machine Learning of linguistic annotations has been a key topic for all areas of Natural Language Processing. This paper presents a survey along three dimensions of classification. First, we outline different linguistic levels of analysis: Tokenisation, Part-of-Speech tagging, Parsing, Semantic analysis and Discourse annotation. Secondly, we introduce alternative approaches to Machine Learning applicable to linguistic annotation of corpora: N-gram and Markov models, Neural Networks, Transformation-Based Learning, Decision Tree learning, and Vector-based classification. Thirdly, we examine a range of Machine Learning systems for the most challenging level of linguistic annotation, discourse analysis; these illustrate the various Machine Learning approaches. Our overall aim is to provide an ontology or framework for further development of our research.

Available Versions

White Rose Research Online

oai:eprints.whiterose.ac.uk:82...

Last time updated on 02/02/2021

White Rose Research Online

oai:eprints.whiterose.ac.uk:82...

Last time updated on 18/02/2015