Extracting Statistical Data Frames from Text

Author: Giovanni Marchisio
Jisheng Liang
Krzysztof Koperski
Thien Nguyen
Publication date: 01/01/2005

We present a framework that bridges the gap between natural language processing (NLP) and text mining. Central to it is a new approach to text parameterization that captures many interesting attributes of text that standard indices, such as the term-document matrix, usually ignore. Because it stores NLP tags, the new index supports a higher degree of knowledge discovery and pattern finding from text. The index is relatively compact, enabling dynamic search of arbitrary relationships and events in large document collections. We can export search results in formats and data structures that are transparent to statistical analysis tools like S-PLUS®. In a number of experiments, we demonstrate how this framework can turn mountains of unstructured information into informative statistical graphs.
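To make the abstract's pipeline concrete, here is a minimal sketch, not the authors' index. It assumes an NLP stage has already produced tagged subject-action-object triples (hand-written below). The names `Relation`, `INDEX`, `search` and `to_frame_csv` are hypothetical. The sketch shows a query over relationships constrained by entity tags, followed by an export of the aggregated hits as CSV. CSV is a plain tabular format that statistical packages such as S-PLUS can read as a data frame.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical index entry: one tagged subject-action-object triple."""
    doc_id: str
    subject: str
    subject_tag: str  # entity type assumed to come from an NLP tagger
    action: str
    obj: str
    obj_tag: str


# Toy index; in a real system the tags would be produced by an NLP pipeline.
INDEX = [
    Relation("d1", "Acme", "ORG", "acquire", "Widgets Inc", "ORG"),
    Relation("d2", "Acme", "ORG", "acquire", "Gadget Co", "ORG"),
    Relation("d3", "Globex", "ORG", "acquire", "Acme", "ORG"),
    Relation("d3", "Jane Doe", "PERSON", "visit", "Paris", "LOCATION"),
]


def search(index, action=None, subject_tag=None, obj_tag=None):
    """Return entries matching every constraint that is not None."""
    return [
        r for r in index
        if (action is None or r.action == action)
        and (subject_tag is None or r.subject_tag == subject_tag)
        and (obj_tag is None or r.obj_tag == obj_tag)
    ]


def to_frame_csv(hits):
    """Count hits per (subject, action) and emit CSV text with a header
    row, suitable for loading as a data frame in a statistics tool."""
    counts = Counter((r.subject, r.action) for r in hits)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["subject", "action", "count"])
    for (subject, action), n in sorted(counts.items()):
        writer.writerow([subject, action, n])
    return buf.getvalue()


# Which organizations acquire other organizations, and how often?
hits = search(INDEX, action="acquire", subject_tag="ORG", obj_tag="ORG")
print(to_frame_csv(hits))
```

Running it prints a three-column table (`Acme,acquire,2` and `Globex,acquire,1` under the header). The choice to keep entity tags in the index, rather than only term counts, is what lets the query constrain relationships by type. A term-document matrix alone could not express that constraint.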

CiteSeerX

Extracting statistical data frames from text

Author: Giovanni Marchisio
Jisheng Liang
Krzysztof Koperski
Thien Nguyen
Publication venue: Association for Computing Machinery (ACM)

Crossref