Supervisor:

By Oulin Yang and Dr. Scott Sanner

Abstract

Natural Language Processing (NLP) has been an intensively studied area of Artificial Intelligence for decades. Many powerful techniques are now available that boost a computer's ability to process human language, especially in text form. These techniques are commonly applied in machine translation, search engines, querying, information retrieval systems, etc. However, in some application areas, there is still great potential to advance NLP through the use of machine learning techniques. This report addresses a topic that has emerged in recent years: extracting times and events from news articles. Following the TimeML annotation specification, I developed a Java artifact that is trained and evaluated on tagged news articles from the TimeBank corpus. Tagging is done either through a 'Dictionary Lookup' baseline algorithm or through variants of the Naive Bayes algorithm. A comparison between these is presented in the experiment section. The results show that, by applying tokenization, part-of-speech tagging, self-prediction using TimeML tagging, and other techniques, one variant of the Naive Bayes algorithm gives significantly better performance in this classification task.

Year: 2011
OAI identifier: oai:CiteSeerX.psu:10.1.1.187.1189
Provided by: CiteSeerX