615 research outputs found
Categorization of interestingness measures for knowledge extraction
Finding interesting association rules is an important and active research
field in data mining. The algorithms of the Apriori family are based on two
rule extraction measures, support and confidence. Although these two measures
have the virtue of being algorithmically fast, they generate a prohibitive
number of rules most of which are redundant and irrelevant. It is therefore
necessary to use further measures which filter uninteresting rules. Many
synthesis studies were then realized on the interestingness measures according
to several points of view. Different reported studies have been carried out to
identify "good" properties of rule extraction measures and these properties
have been assessed on 61 measures. The purpose of this paper is twofold. First
to extend the number of the measures and properties to be studied, in addition
to the formalization of the properties proposed in the literature. Second, in
the light of this formal study, to categorize the studied measures. This paper
leads then to identify categories of measures in order to help the users to
efficiently select an appropriate measure by choosing one or more measure(s)
during the knowledge extraction process. The properties evaluation on the 61
measures has enabled us to identify 7 classes of measures, classes that we
obtained using two different clustering techniques.Comment: 34 pages, 4 figure
Text Analytics for Android Project
Most advanced text analytics and text mining tasks include text classification, text clustering, building ontology, concept/entity extraction, summarization, deriving patterns within the structured data, production of granular taxonomies, sentiment and emotion analysis, document summarization, entity relation modelling, interpretation of the output. Already existing text analytics and text mining cannot develop text material alternatives (perform a multivariant design), perform multiple criteria analysis,
automatically select the most effective variant according to different aspects (citation index of papers (Scopus, ScienceDirect, Google Scholar) and authors (Scopus, ScienceDirect, Google Scholar), Top 25 papers, impact factor of journals, supporting phrases, document name and contents, density of keywords), calculate utility degree and market value. However, the Text Analytics for Android Project can perform the aforementioned functions. To the best of the knowledge herein, these functions have not been previously implemented; thus this is the first attempt to do so. The Text Analytics for Android Project is briefly described in this article
Data mining in soft computing framework: a survey
The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included
Text mining with exploitation of user\u27s background knowledge : discovering novel association rules from text
The goal of text mining is to find interesting and non-trivial patterns or knowledge from unstructured documents. Both objective and subjective measures have been proposed in the literature to evaluate the interestingness of discovered patterns. However, objective measures alone are insufficient because such measures do not consider knowledge and interests of the users. Subjective measures require explicit input of user expectations which is difficult or even impossible to obtain in text mining environments.
This study proposes a user-oriented text-mining framework and applies it to the problem of discovering novel association rules from documents. The developed system, uMining, consists of two major components: a background knowledge developer and a novel association rules miner. The background knowledge developer learns a user\u27s background knowledge by extracting keywords from documents already known to the user (background documents) and developing a concept hierarchy to organize popular keywords. The novel association rule miner discovers association rules among noun phrases extracted from relevant documents (target documents) and compares the rules with the background knowledge to predict the rule novelty to the particular user (useroriented novelty).
The user-oriented novelty measure is defined as the semantic distance between the antecedent and the consequent of a rule in the background knowledge. It consists of two components: occurrence distance and connection distance. The former considers the co-occurrences of two keywords in the background documents: the more the shorter the distance. The latter considers the common connections of with others in the concept hierarchy. It is defined as the length of the connecting the two keywords in the concept hierarchy: the longer the path, distance.
The user-oriented novelty measure is evaluated from two perspectives: novelty prediction accuracy and usefulness indication power. The results show that the useroriented novelty measure outperforms the WordNet novelty measure and the compared objective measures in term of predicting novel rules and identifying useful rules
6 Seconds of Sound and Vision: Creativity in Micro-Videos
The notion of creativity, as opposed to related concepts such as beauty or
interestingness, has not been studied from the perspective of automatic
analysis of multimedia content. Meanwhile, short online videos shared on social
media platforms, or micro-videos, have arisen as a new medium for creative
expression. In this paper we study creative micro-videos in an effort to
understand the features that make a video creative, and to address the problem
of automatic detection of creative content. Defining creative videos as those
that are novel and have aesthetic value, we conduct a crowdsourcing experiment
to create a dataset of over 3,800 micro-videos labelled as creative and
non-creative. We propose a set of computational features that we map to the
components of our definition of creativity, and conduct an analysis to
determine which of these features correlate most with creative video. Finally,
we evaluate a supervised approach to automatically detect creative video, with
promising results, showing that it is necessary to model both aesthetic value
and novelty to achieve optimal classification accuracy.Comment: 8 pages, 1 figures, conference IEEE CVPR 201
Text Clustering and Classification Techniques- A Review
Text classification is the task of automatically sorting a set of documents into categories from a predefined set. Text Classification is a data mining technique used to predict group membership for data instances within a given dataset. It is used for classifying data into different classes by considering some constrains. Instead of traditional feature selection techniques used for text document classification. A Naive Bayesian model is easy to build, with no complicated iterative parameter estimation which makes it particularly useful for very large datasets. Automated Text categorization and class prediction is important for text categorization to reduce the feature size and to speed up the learning process of classifiers
- …