8,857 research outputs found
Attribute oriented induction with star schema
This paper will propose a novel star schema attribute induction as a new
attribute induction paradigm and as improving from current attribute oriented
induction. A novel star schema attribute induction will be examined with
current attribute oriented induction based on characteristic rule and using non
rule based concept hierarchy by implementing both of approaches. In novel star
schema attribute induction some improvements have been implemented like
elimination threshold number as maximum tuples control for generalization
result, there is no ANY as the most general concept, replacement the role
concept hierarchy with concept tree, simplification for the generalization
strategy steps and elimination attribute oriented induction algorithm. Novel
star schema attribute induction is more powerful than the current attribute
oriented induction since can produce small number final generalization tuples
and there is no ANY in the results.Comment: 23 Pages, IJDM
Post-processing of association rules.
In this paper, we situate and motivate the need for a post-processing phase to the association rule mining algorithm when plugged into the knowledge discovery in databases process. Major research effort has already been devoted to optimising the initially proposed mining algorithms. When it comes to effectively extrapolating the most interesting knowledge nuggets from the standard output of these algorithms, one is faced with an extreme challenge, since it is not uncommon to be confronted with a vast amount of association rules after running the algorithms. The sheer multitude of generated rules often clouds the perception of the interpreters. Rightful assessment of the usefulness of the generated output introduces the need to effectively deal with different forms of data redundancy and data being plainly uninteresting. In order to do so, we will give a tentative overview of some of the main post-processing tasks, taking into account the efforts that have already been reported in the literature.
Open issues in semantic query optimization in relational DBMS
After two decades of research into Semantic Query Optimization (SQO) there is clear agreement as to the efficacy of SQO. However, although there are some experimental implementations there are still no commercial implementations. We
first present a thorough analysis of research into SQO. We identify three problems which inhibit the effective use of SQO in Relational Database Management Systems(RDBMS). We then propose solutions to these problems and describe first steps towards the implementation of an effective semantic query optimizer for relational databases
Data Mining in a Multidimensional Environment
Data Mining and Data Warehousing are two hot topics in the database research area. Until recently, conventional data mining algorithms were primarily developed for a relational environment. But a data warehouse database is based on a multidimensional model. In our paper we apply this basis for a seamless integration of data mining in the multidimensional model for the example of discovering association rules. Furthermore, we propose this method as a userguided technique because of the clear structure both of model and data. We present both the theoretical basis and efficient algorithms for data mining in the multidimensional data model. Our approach uses directly the requirements of dimensions, classifications and sparsity of the cube. Additionally we give heuristics for optimizing the search for rules
Context Based Visual Content Verification
In this paper the intermediary visual content verification method based on
multi-level co-occurrences is studied. The co-occurrence statistics are in
general used to determine relational properties between objects based on
information collected from data. As such these measures are heavily subject to
relative number of occurrences and give only limited amount of accuracy when
predicting objects in real world. In order to improve the accuracy of this
method in the verification task, we include the context information such as
location, type of environment etc. In order to train our model we provide new
annotated dataset the Advanced Attribute VOC (AAVOC) that contains additional
properties of the image. We show that the usage of context greatly improve the
accuracy of verification with up to 16% improvement.Comment: 6 pages, 6 Figures, Published in Proceedings of the Information and
Digital Technology Conference, 201
Subgroup Discovery: Real-World Applications
Subgroup discovery is a data mining technique which extracts interesting rules with respect
to a target variable. An important characteristic of this task is the combination of predictive
and descriptive induction. In this paper, an overview about subgroup discovery is performed.
In addition, di erent real-world applications solved through evolutionary algorithms where the
suitability and potential of this type of algorithms for the development of subgroup discovery
algorithms are presented
Monitoring land use changes using geo-information : possibilities, methods and adapted techniques
Monitoring land use with geographical databases is widely used in decision-making. This report presents the possibilities, methods and adapted techniques using geo-information in monitoring land use changes. The municipality of Soest was chosen as study area and three national land use databases, viz. Top10Vector, CBS land use statistics and LGN, were used. The restrictions of geo-information for monitoring land use changes are indicated. New methods and adapted techniques improve the monitoring result considerably. Providers of geo-information, however, should coordinate on update frequencies, semantic content and spatial resolution to allow better possibilities of monitoring land use by combining data sets
OLEMAR: An Online Environment for Mining Association Rules in Multidimensional Data
Data warehouses and OLAP (online analytical processing) provide tools to explore and navigate through data cubes in order to extract interesting information under different perspectives and levels of granularity. Nevertheless, OLAP techniques do not allow the identification of relationships, groupings, or exceptions that could hold in a data cube. To that end, we propose to enrich OLAP techniques with data mining facilities to benefit from the capabilities they offer. In this chapter, we propose an online environment for mining association rules in data cubes. Our environment called OLEMAR (online environment for mining association rules), is designed to extract associations from multidimensional data. It allows the extraction of inter-dimensional association rules from data cubes according to a sum-based aggregate measure, a more general indicator than aggregate values provided by the traditional COUNT measure. In our approach, OLAP users are able to drive a mining process guided by a meta-rule, which meets their analysis objectives. In addition, the environment is based on a formalization, which exploits aggregate measures to revisit the definition of the support and the confidence of discovered rules. This formalization also helps evaluate the interestingness of association rules according to two additional quality measures: lift and loevinger. Furthermore, in order to focus on the discovered associations and validate them, we provide a visual representation based on the graphic semiology principles. Such a representation consists in a graphic encoding of frequent patterns and association rules in the same multidimensional space as the one associated with the mined data cube. We have developed our approach as a component in a general online analysis platform called Miningcubes according to an Apriori-like algorithm, which helps extract inter-dimensional association rules directly from materialized multidimensional structures of data. In order to illustrate the effectiveness and the efficiency of our proposal, we analyze a real-life case study about breast cancer data and conduct performance experimentation of the mining process
- âŠ