    Extracting Statistical Data Frames from Text

    We present a framework that bridges the gap between natural language processing (NLP) and text mining. Central to this is a new approach to text parameterization that captures many interesting attributes of text usually ignored by standard indices, like the term-document matrix. By storing NLP tags, the new index supports a higher degree of knowledge discovery and pattern finding from text. The index is relatively compact, enabling dynamic search of arbitrary relationships and events in large document collections. We can export search results in formats and data structures that are transparent to statistical analysis tools like S-PLUS®. In a number of experiments, we demonstrate how this framework can turn mountains of unstructured information into informative statistical graphs.