Text Classification Using Association Rules, Dependency Pruning and Hyperonymization
We present new methods for pruning and enhancing itemsets for text
classification via association rule mining. Pruning methods are based on
dependency syntax, and enhancing methods are based on replacing words by their
hyperonyms of various orders. We discuss the impact of these methods, compared
to pruning based on the tf-idf rank of words.
Comment: 16 pages, 2 figures, presented at DMNLP 201
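The tf-idf rank pruning that the abstract uses as its point of comparison can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `tfidf_prune`, the cutoff `k`, the toy documents, and the particular tf-idf weighting (relative term frequency times the natural log of inverse document frequency) are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def tfidf_prune(docs, k):
    """Keep, for each tokenized document, the k words with the highest tf-idf score.

    Hypothetical helper: the weighting scheme and cutoff are assumptions,
    not taken from the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    pruned = []
    for doc in docs:
        tf = Counter(doc)
        scores = {w: (tf[w] / len(doc)) * math.log(n_docs / df[w]) for w in tf}
        # Sort by descending score; break ties alphabetically for determinism.
        top = sorted(scores, key=lambda w: (-scores[w], w))[:k]
        pruned.append(set(top))
    return pruned


# Toy corpus (invented for illustration only).
docs = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["the", "cat", "chased", "the", "dog"],
]
itemsets = tfidf_prune(docs, k=2)
```

Each pruned set could then serve as a transaction for an association rule miner. Words that occur in every document, such as "the" here, receive a score of zero and are the first to be dropped.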
Game analytics - maximizing the value of player data
During the Information Age, technological advances in computers,
satellites, data transfer, optics, and digital storage have led to the collection of an
immense mass of data on everything from business to astronomy, counting on the
power of digital computing to sort through the amalgam of information and generate meaning from the data. Initially, in the 1970s and 1980s,
data were stored in disparate structures and very rapidly became overwhelming. The
initial chaos led to the creation of structured databases and database management
systems to assist with the management of large corpora of data and, notably, with the
effective and efficient retrieval of information from databases. The rise of the database management system increased the already rapid pace of information
gathering.
Peer-reviewed
First Elements on Knowledge Discovery guided by Domain Knowledge (KDDK)
In this paper, we present research directions pursued by the Orpailleur team at Loria, showing how knowledge discovery and knowledge processing may be combined. The knowledge discovery in databases (KDD) process consists in processing a huge volume of data to extract significant and reusable knowledge units. From a knowledge representation perspective, the KDD process may take advantage of domain knowledge embedded in ontologies relative to the domain of the data, leading to the notion of "knowledge discovery guided by domain knowledge", or KDDK. The KDDK process is based on the classification process (and its multiple forms), e.g. for modeling, representing, reasoning, and discovering. Some applications are detailed, showing how KDDK can be instantiated in an application domain. Finally, an architecture of an integrated KDDK system is proposed and discussed.
Feature Extraction and Duplicate Detection for Text Mining: A Survey
Text mining, also known as Intelligent Text Analysis, is an important research area. The high dimensionality of the data makes it very difficult to focus on the most appropriate information. Feature extraction is one of the important data reduction techniques for discovering the most important features. Processing massive amounts of data stored in an unstructured form is a challenging task, and several pre-processing methods and algorithms are needed to extract useful features from huge amounts of data. The survey covers text summarization, classification, and clustering methods for discovering useful features, as well as the discovery of query facets: multiple groups of words or phrases that explain and summarize the content covered by a query, thereby reducing the time taken by the user. When dealing with a collection of text documents, it is also very important to filter out duplicate data. Once duplicates are deleted, replacing the removed duplicates is recommended. Hence we also review the literature on duplicate detection and data fusion (removing and replacing duplicates). The survey presents existing text mining techniques to extract relevant features, detect duplicates, and replace duplicate data, so as to deliver fine-grained knowledge to the user.
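The duplicate-filtering step mentioned in the abstract could look something like the sketch below. The abstract does not say which detection technique the survey favours, so this example assumes one common approach: Jaccard similarity over word bigrams. The function names, the `threshold` of 0.8, and the sample texts are hypothetical.

```python
def shingles(text, n=2):
    """Return the set of word n-grams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(texts, threshold=0.8, n=2):
    """Keep the first occurrence of each text; drop later texts that are too similar.

    Assumed technique: pairwise Jaccard similarity on word n-grams.
    """
    kept, kept_shingles = [], []
    for text in texts:
        s = shingles(text, n)
        # Keep the text only if it is below the threshold against every kept text.
        if all(jaccard(s, t) < threshold for t in kept_shingles):
            kept.append(text)
            kept_shingles.append(s)
    return kept


# Sample texts (invented for illustration only).
texts = [
    "Text mining extracts useful features from documents",
    "text mining extracts useful features from documents",
    "Duplicate detection removes repeated records",
]
unique = drop_near_duplicates(texts)
```

The pairwise comparison is quadratic in the number of documents, which is acceptable for a small collection. A larger corpus would typically need an approximate scheme such as MinHash, which is not shown here.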