Using compression to identify acronyms in text

Bainbridge, David; Witten, Ian H.; Yeates, Stuart

Publication date: 1 January 2000

Abstract

Text mining is about looking for patterns in natural language text, and may be defined as the process of analyzing text to extract information from it for particular purposes. In previous work, we claimed that compression is a key technology for text mining, and backed this up with a study that showed how particular kinds of lexical tokens (names, dates, locations, etc.) can be identified and located in running text, using compression models to provide the leverage necessary to distinguish different token types (Witten et al., 1999).

Comment: 10 pages. A short form published in DCC200
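The abstract's central idea is that compression models can tell token types apart. A small sketch can illustrate this: train one character model per token type and assign a token to the model that encodes it in the fewest bits. This is not the authors' implementation. The names `PPMLikeModel` and `classify` are hypothetical. The model is a simplified order-2 PPM-style predictor (escape method C, without exclusions). The training lists are made-up examples.

```python
import math
from collections import Counter, defaultdict


class PPMLikeModel:
    """Simplified PPM-style character model (hypothetical; for illustration only).

    Predicts each character from the preceding `order` characters, escaping to
    shorter contexts when a character has not been seen there (escape method C,
    without exclusions), and finally falling back to a uniform distribution.
    """

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size
        # context string -> Counter of characters that followed it in training
        self.counts = defaultdict(Counter)

    def train(self, samples):
        for text in samples:
            # Pad with NUL characters so the first characters have full-length contexts.
            padded = "\0" * self.order + text
            for i in range(self.order, len(padded)):
                # Record the character under every context length from `order` down to 0.
                for k in range(self.order + 1):
                    self.counts[padded[i - k:i]][padded[i]] += 1

    def code_length(self, text):
        """Estimated bits (negative log2 probability) to encode `text` under this model."""
        padded = "\0" * self.order + text
        bits = 0.0
        for i in range(self.order, len(padded)):
            ch = padded[i]
            for k in range(self.order, -1, -1):
                seen = self.counts.get(padded[i - k:i])
                if not seen:
                    continue  # context never seen in training: drop to a shorter one at no cost
                total, distinct = sum(seen.values()), len(seen)
                if ch in seen:
                    bits -= math.log2(seen[ch] / (total + distinct))
                    break
                # Escape: pay for signalling "novel character in this context".
                bits -= math.log2(distinct / (total + distinct))
            else:
                # Not predicted by any context: fall back to a uniform code.
                bits += math.log2(self.alphabet_size)
        return bits


def classify(token, models):
    """Return the label whose model encodes `token` in the fewest bits."""
    return min(models, key=lambda label: models[label].code_length(token))


# Hypothetical training examples, chosen only to illustrate the technique.
ACRONYMS = ["NASA", "UNESCO", "HTML", "PPM", "DCC", "FBI", "NATO", "ASCII", "CPU", "RAM"]
WORDS = ["compression", "text", "mining", "patterns", "language",
         "model", "token", "running", "names", "dates"]

models = {"acronym": PPMLikeModel(), "word": PPMLikeModel()}
models["acronym"].train(ACRONYMS)
models["word"].train(WORDS)

for token in ["IEEE", "compressed"]:
    print(token, classify(token, models), round(models["acronym"].code_length(token), 1),
          round(models["word"].code_length(token), 1))
```

Code length stands in for compressed size. A token that resembles the acronym training data is cheap to encode under the acronym model and expensive under the word model, and the reverse holds for ordinary words. A practical system would need far more training text, and deciding whether a token in running text is an acronym may also depend on its surrounding context, which this sketch ignores.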

