Extracting information from short messages

Authors: C. Cardie, I.-S. Kang, N. Stratica, R. Gaizauskas, R.L. Cooper
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

Much currently transmitted information takes the form of e-mails or SMS text messages, so extracting information from such short messages is increasingly important. The words in a message can be partitioned into the syntactic structure, terms from the domain of discourse, and the data being transmitted. This paper describes a lightweight Information Extraction component which uses pattern matching to separate the three aspects: the structure is supplied as a template; domain terms are the metadata of a data source (or their synonyms); and data is extracted as those words matching placeholders in the templates.
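
The three-way split described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's component: the template string, the `DOMAIN_TERMS` vocabulary, and the function names are all hypothetical, and a plain regular expression stands in for whatever pattern matcher the authors use.

```python
import re

# Hypothetical domain vocabulary: synonyms mapped to metadata (column) names.
DOMAIN_TERMS = {
    "temp": "temperature",
    "temperature": "temperature",
    "humidity": "humidity",
}

# Hypothetical template: literal words give the structure,
# {name} tokens are placeholders for domain terms and data.
TEMPLATE = "the {term} in {place} is {value}"


def compile_template(template):
    """Turn a template into a regex with one named group per placeholder."""
    tokens = []
    for word in template.split():
        placeholder = re.fullmatch(r"\{(\w+)\}", word)
        if placeholder:
            tokens.append("(?P<%s>\\S+)" % placeholder.group(1))
        else:
            tokens.append(re.escape(word))
    return re.compile(r"\s+".join(tokens), re.IGNORECASE)


def extract(message, template=TEMPLATE, domain_terms=DOMAIN_TERMS):
    """Return the extracted fields, or None if the message does not fit."""
    match = compile_template(template).fullmatch(message.strip())
    if match is None:
        return None  # structure does not match the template
    fields = match.groupdict()
    term = domain_terms.get(fields["term"].lower())
    if term is None:
        return None  # word in the term slot is not a known domain term
    fields["term"] = term  # normalise synonyms to the metadata name
    return fields


print(extract("The temp in Glasgow is 12"))
```

In this sketch each placeholder matches a single whitespace-delimited word, so multi-word data values would need a richer pattern.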

Crossref

Enlighten

A HMM POS Tagger for Micro-blogging Type Texts

Authors: A. Ritter, K. Gimpel, L. Barbosa, L. Derczynski, O. Etzioni, R. Cooper, T. Finin
Publication venue: Springer Verlag
Publication date: 01/01/2014

The high volume of communication via micro-blogging type messages has created an increased demand for text processing tools customised for the unstructured text genre. The performance of available text processing tools developed on structured texts has been shown to deteriorate significantly when they are used on unstructured, micro-blogging type texts. In this paper, we present the results of testing an HMM-based POS (Part-Of-Speech) tagging model customised for unstructured texts. We also evaluated the tagger against published CRF-based state-of-the-art POS tagging models customised for Tweet messages, using three publicly available Tweet corpora. Finally, we ran cross-validation tests with both taggers by training them on one Tweet corpus and testing them on another.
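
For readers unfamiliar with HMM-based tagging, the following is a generic textbook sketch of a bigram HMM tagger with add-one smoothing and Viterbi decoding. It is not the model evaluated in the paper: the three training sentences, the tag names, and all function names are made up for illustration, and the smoothing scheme is an assumption.

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical training set of (word, tag) pairs.
TRAIN = [
    [("i", "PRON"), ("love", "VERB"), ("this", "DET"), ("phone", "NOUN")],
    [("u", "PRON"), ("love", "VERB"), ("it", "PRON")],
    [("this", "DET"), ("phone", "NOUN"), ("rocks", "VERB")],
]


def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            word = word.lower()
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence for `words`."""
    trans, emit, tags, vocab = model
    if not words:
        return []
    words = [w.lower() for w in words]
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unseen words
    # Each column maps tag -> (best log score, best previous tag).
    columns = [{
        t: (log_prob(trans["<s>"], t, n_tags)
            + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for t in tags:
            e = log_prob(emit[t], word, n_words)
            column[t] = max(
                (columns[-1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        columns.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


model = train_hmm(TRAIN)
print(viterbi(["i", "love", "this", "phone"], model))
```

A tagger aimed at micro-blog text would additionally need handling for hashtags, mentions, URLs and non-standard spellings, which this sketch leaves out.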

Crossref

AUT Scholarly Commons