761,805 research outputs found
Adaptive text mining: Inferring structure from sequences
Text mining is about inferring structure from sequences representing natural language text, and may be defined as the process of analyzing text to extract information that is useful for particular purposes. Although hand-crafted heuristics are a common practical approach for extracting information from text, a general, and generalizable, approach requires adaptive techniques. This paper studies the way in which the adaptive techniques used in text compression can be applied to text mining. It develops several examples: extraction of hierarchical phrase structures from text, identification of keyphrases in documents, locating proper names and quantities of interest in a piece of text, text categorization, word segmentation, acronym extraction, and structure recognition. We conclude that compression forms a sound unifying principle that allows many text mining problems to be tacked adaptively
Extraction of Semantic Relations from Wikipedia Text Corpus
This paper proposes the algorithm for automatic extraction of semantic relations using the rule-based approach. The authors suggest identifying certain verbs (predicates) between a subject and an object of expressions to obtain a sequence of semantic relations in the designed text corpus of Wikipedia articles. The synsets from WordNet are applied to extract semantic relations between concepts and their synonyms from the text corpus
FrameNet CNL: a Knowledge Representation and Information Extraction Language
The paper presents a FrameNet-based information extraction and knowledge
representation framework, called FrameNet-CNL. The framework is used on natural
language documents and represents the extracted knowledge in a tailor-made
Frame-ontology from which unambiguous FrameNet-CNL paraphrase text can be
generated automatically in multiple languages. This approach brings together
the fields of information extraction and CNL, because a source text can be
considered belonging to FrameNet-CNL, if information extraction parser produces
the correct knowledge representation as a result. We describe a
state-of-the-art information extraction parser used by a national news agency
and speculate that FrameNet-CNL eventually could shape the natural language
subset used for writing the newswire articles.Comment: CNL-2014 camera-ready version. The final publication is available at
link.springer.co
- …
