
An Analysis of Sentence Boundary Detection Systems for English and Portuguese Documents

By Carlos N. Silla Jr and Celso A.A. Kaestner

Abstract

In this paper we present a study comparing the performance of different systems from the literature that automatically segment English documents into sentences. We also describe the difficulties encountered in adapting these systems to Portuguese documents and the results obtained after the adaptation. We analyzed two systems that use a machine learning approach, MxTerminator and Satz, and a customized system based on fixed rules expressed as regular expressions. The results achieved by the Satz system were surprisingly positive for Portuguese documents.
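As a rough illustration of what a fixed-rule, regular-expression approach to sentence boundary detection can look like, here is a minimal Python sketch. It is not the authors' system: the abbreviation list, the boundary pattern, and the function name split_sentences are all assumptions made for this example.

```python
import re

# Hypothetical abbreviation list, for illustration only; the rules used in
# the paper are not given in this record.
ABBREVIATIONS = {"dr", "mr", "mrs", "prof", "sr", "sra", "etc"}

# Candidate boundary: '.', '!' or '?' followed by whitespace and a capital
# letter (ASCII, or roughly the accented Latin-1 capitals used in Portuguese).
CANDIDATE = re.compile(r"[.!?](?=\s+[A-ZÀ-Ý])")

def split_sentences(text):
    """Split text at candidate boundaries, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        # Word immediately before the punctuation mark.
        words = text[start:match.start()].split()
        last_word = words[-1].lower() if words else ""
        # A period after a listed abbreviation is not treated as a boundary.
        if match.group() == "." and last_word in ABBREVIATIONS:
            continue
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    # Whatever remains after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

For example, split_sentences("Dr. Silva chegou. Ele falou com o Sr. Costa! Tudo bem?") yields three sentences, keeping "Dr." and "Sr." attached to the names that follow them. A rule set like this has to be rewritten for each language (the abbreviation list in particular), which is the kind of adaptation effort the abstract refers to.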

Topics: QA76
Publisher: Springer
Year: 2004
OAI identifier: oai:kar.kent.ac.uk:24119


