A shortest-path based clustering algorithm for joint human-machine analysis of complex datasets
Clustering is a technique for analyzing datasets obtained from empirical studies in several disciplines, with a major application in biomedical research. Essentially, clustering algorithms are executed by machines with the aim of finding groups of related points in a dataset. However, the resulting grouping depends on both the metric for point-to-point similarity and the rules for point-to-group association. Indeed, inappropriate metrics and rules can lead to undesirable clustering artifacts. This is especially relevant for datasets in which groups with heterogeneous structures co-exist. In this work, we propose an algorithm that achieves clustering by exploring the paths between points. This allows both evaluating the properties of a path (such as gaps and density variations) and expressing a preference for certain paths. Moreover, our algorithm supports the integration of existing knowledge about admissible and non-admissible clusters by training a path classifier. We demonstrate the accuracy of the proposed method on challenging datasets, including points from synthetic shapes in publicly available benchmarks and microscopy data.
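The idea of judging a path by its gaps can be illustrated with a minimal sketch. This is not the authors' algorithm, which also scores density variations and trains a path classifier; it assumes the simplest possible path property, the largest single hop along a path, and puts two points in the same group whenever some path between them has no hop longer than a threshold `max_gap`. The names `minimax_distances`, `cluster_by_max_gap`, and `max_gap` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def minimax_distances(points: List[Point]) -> List[List[float]]:
    """Bottleneck distance: over all paths from i to j, the smallest possible largest hop."""
    n = len(points)
    # Start from direct Euclidean distances (a one-hop path between every pair).
    d = [[math.dist(p, q) for q in points] for p in points]
    # Floyd-Warshall with (min, max) instead of (min, +):
    # a path through k is only as good as its worst hop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(d[i][k], d[k][j])
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


def cluster_by_max_gap(points: List[Point], max_gap: float) -> List[int]:
    """Two points share a label iff some path joins them with no hop longer than max_gap."""
    d = minimax_distances(points)
    labels = [-1] * len(points)
    next_label = 0
    for i in range(len(points)):
        if labels[i] == -1:
            # Bottleneck distances form an ultrametric, so "d <= max_gap" is transitive
            # and every point reachable from i gets the same fresh label.
            for j in range(len(points)):
                if d[i][j] <= max_gap:
                    labels[j] = next_label
            next_label += 1
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    print(cluster_by_max_gap(pts, max_gap=1.5))  # prints [0, 0, 0, 1, 1]
```

The bottleneck distance is computed in O(n³), which is only suitable for small examples. Cutting it at `max_gap` is equivalent to single-linkage clustering at that height; the paper's approach goes further by evaluating richer path properties than a single gap.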
Optimization of Market Basket Analysis Using Centroid-Based Clustering Algorithm and FP-Growth Algorithm
The proliferation of food and beverage businesses requires owners to be creative in offering their flagship products to every consumer, both new and regular customers. A large number of menu choices makes ordering slow because consumers are unsure which item is the best choice, so the seller needs to provide the right recommendations for orders to be placed faster. Market basket (shopping cart) analysis is commonly used to find items that are sold together. The FP-Growth association method generates association rules faster, but on large datasets the association process tends to produce rules involving many items, which lowers the accuracy of the association rules. Therefore, in this study the dataset was first grouped using centroid-based clustering algorithms, namely k-means, k-medoids, and fuzzy c-means. The research comprised dataset collection, dataset preparation, clustering modeling, evaluation of the clustering models using the Davies-Bouldin index (DBI) and the silhouette index, association modeling, and evaluation of the association models using the lift ratio. The clustering models with the best DBI and silhouette index values were obtained at k=3 for k-means, k=2 for k-medoids, and k=7 for fuzzy c-means. The largest number of association rules was generated from the dataset grouped with fuzzy c-means, but the highest average lift ratio came from the rules generated from the dataset grouped with k-means. The association model combining k-means and FP-Growth yielded 32 unique association rules, in which the four most frequent items were cireng chili oil, regal milk coffee, banana cheese, and Vietnam drip.
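The rule-evaluation step can be made concrete with a minimal sketch of support, confidence, and lift ratio, where lift(A → B) = confidence(A → B) / support(B); a lift above 1 means A and B co-occur more often than they would if independent. The baskets below are invented for illustration (only the item names come from the abstract), and the sketch does not reproduce the study's clustering or FP-Growth pipeline.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(transactions: List[Itemset], itemset: Itemset) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions: List[Itemset], antecedent: Itemset, consequent: Itemset) -> float:
    """Estimated P(consequent | antecedent)."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)


def lift(transactions: List[Itemset], antecedent: Itemset, consequent: Itemset) -> float:
    """Confidence normalized by how common the consequent is on its own."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Invented example baskets; only the item names are taken from the abstract.
baskets = [
    frozenset({"banana cheese", "vietnam drip"}),
    frozenset({"banana cheese", "vietnam drip", "cireng chili oil"}),
    frozenset({"regal milk coffee"}),
    frozenset({"cireng chili oil", "regal milk coffee"}),
]

if __name__ == "__main__":
    a, b = frozenset({"banana cheese"}), frozenset({"vietnam drip"})
    print(support(baskets, a | b), confidence(baskets, a, b), lift(baskets, a, b))
    # prints 0.5 1.0 2.0
```

In the study's setup, these measures would be computed separately on each cluster's transactions, so a rule's lift reflects co-purchases within one customer group rather than across the whole dataset.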
Combining Clustering techniques and Formal Concept Analysis to characterize Interestingness Measures
Formal Concept Analysis (FCA) is a data analysis method that enables the discovery of hidden knowledge in data. One kind of hidden knowledge extracted from data is association rules. Various quality measures have been reported in the literature for extracting only relevant association rules. Given a dataset, choosing a good quality measure remains a challenging task for a user. Given an evaluation matrix of quality measures according to their semantic properties, this paper describes how FCA can highlight quality measures with similar behavior in order to help users make their choice. The aim of this article is to discover clusters of Interestingness Measures (IM) that can validate those found by the hierarchical and partitioning clustering methods, agglomerative hierarchical clustering (AHC) and k-means. Then, based on a theoretical study of sixty-one interestingness measures according to nineteen properties, proposed in a recent study, FCA describes several groups of measures. Comment: 13 pages, 2 figures
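How FCA groups measures that share properties can be sketched with a brute-force enumeration of formal concepts: each concept is a maximal pair (set of objects, set of attributes) in which every object has every attribute. The toy context below, with measures `m1` to `m3` and properties `p1` to `p3`, is hypothetical and much smaller than the 61 × 19 matrix in the paper; the exponential enumeration is for illustration only, not the authors' procedure.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Context = Dict[str, FrozenSet[str]]
Concept = Tuple[FrozenSet[str], FrozenSet[str]]


def extent(context: Context, attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Objects that have every attribute in `attrs`."""
    return frozenset(obj for obj, has in context.items() if attrs <= has)


def intent(context: Context, objs: FrozenSet[str], all_attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Attributes shared by every object in `objs` (all attributes if `objs` is empty)."""
    shared = all_attrs
    for obj in objs:
        shared = shared & context[obj]
    return shared


def formal_concepts(context: Context) -> Set[Concept]:
    """Close every attribute subset; exponential in the number of attributes, toy sizes only."""
    all_attrs = frozenset().union(*context.values())
    concepts: Set[Concept] = set()
    for r in range(len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), r):
            ext = extent(context, frozenset(combo))
            concepts.add((ext, intent(context, ext, all_attrs)))
    return concepts


# Hypothetical context: which measure satisfies which property.
toy = {
    "m1": frozenset({"p1", "p2"}),
    "m2": frozenset({"p1", "p2"}),
    "m3": frozenset({"p2", "p3"}),
}

if __name__ == "__main__":
    for ext, inn in sorted(formal_concepts(toy), key=lambda c: -len(c[0])):
        print(sorted(ext), sorted(inn))
```

Measures `m1` and `m2` end up together in the concept with intent {p1, p2}, which is the kind of "similar behavior" group the paper compares against the AHC and k-means clusters.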
Research on technical analysis of basketball match based on data mining
The aim of this paper is to preprocess basketball technical actions, classify these actions with data mining technology, and mine association rules among them. The main work is as follows. Common data mining approaches are discussed, such as preprocessing, classification, clustering, and rule mining techniques. Both the ID3 decision tree classification algorithm and the Apriori association rule algorithm are studied in detail. The paper examines basketball technical actions on both a small and a large scale, applying J48 decision tree classification and the Apriori association rule mining algorithm to basketball data. These research results should provide useful guidance to teams.
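The split criterion at the heart of ID3 can be sketched in a few lines: at each node, ID3 picks the attribute whose split most reduces the entropy of the class label (the information gain). The shot records and the attributes `zone`, `defended`, and `scored` are hypothetical, not taken from the paper's dataset.

```python
import math
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows: List[Row], attr: str, target: str) -> float:
    """Entropy of `target` minus its weighted entropy after splitting on `attr`."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after


def best_split(rows: List[Row], attrs: List[str], target: str) -> str:
    """ID3's greedy choice: the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))


# Hypothetical shot records.
shots = [
    {"zone": "paint", "defended": "no", "scored": "yes"},
    {"zone": "paint", "defended": "yes", "scored": "yes"},
    {"zone": "three", "defended": "no", "scored": "no"},
    {"zone": "three", "defended": "yes", "scored": "no"},
]

if __name__ == "__main__":
    print(best_split(shots, ["zone", "defended"], "scored"))  # prints zone
```

Here `zone` separates made and missed shots perfectly (gain of 1 bit) while `defended` tells nothing (gain of 0). J48, Weka's implementation of C4.5, refines this criterion with the gain ratio, which penalizes attributes with many distinct values.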
Visual grouping of association rules by clustering conditional probabilities for categorical data
We demonstrate the use of a visual data-mining tool for non-technical domain experts within organizations to facilitate the extraction of meaningful information and knowledge from in-house databases. The tool is mainly based on the basic notion of grouping association rules. Association rules are useful for discovering items that are frequently found together. However, in many applications, rules with lower frequencies are often interesting to the user. Grouping association rules is one way to overcome the rare item problem. However, some groups of association rules are too large to be easily understood. In this chapter, we propose a method for clustering categorical data based on the conditional probabilities of association rules for datasets with large numbers of attributes. We argue that the proposed method gives non-technical users a better understanding of the patterns discovered in the dataset.
Mining Food Transactional Data to Produce Association Rules as a Basis of Business Actions
Abstract: The food industry sells a wide range of product variants. The company wants to take advantage of its data by deriving business actions from high volumes of transactional data. In this case, data mining technology needs to be applied to extract valuable information from the transactional data and assess customers' product preferences as part of a business strategy. Information about customers' food-buying behavior is important, and it can be obtained by mapping the transaction data into patterns of customer tastes. The association method, using the Apriori algorithm, is used to map customers' choices. The challenge lies in the data itself: high volumes of data have to be prepared before they are passed to the mining process. Data reduction is applied to handle the large numbers of instances and attributes. This research focused on how the data were handled up to the point where the association rules were developed. To achieve this objective, three validation levels were implemented to verify the reliability of the association rules, as measured by their confidence percentage. Furthermore, other data mining techniques, such as clustering and time series patterns, are used to examine the validity of the association rules that were built. It can be concluded that, after three validation levels on the reduced high-volume transactional data, the established association rules are strong, with confidence equal to or higher than 70%, and their validity can be seen from the time series pattern of each group of goods; these rules are then used as the basis of business actions.
Keywords: Data Reduction, Association Rules, Apriori, Confidence, Clustering, Time Series Patterns
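The mining step with a 70% confidence threshold can be sketched as a plain, level-wise Apriori search followed by rule generation. The orders and the `min_support` value of 0.5 are hypothetical, and the study's data-reduction and three-level validation steps are not modelled here.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float]


def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Level-wise search for all itemsets whose support is at least min_support."""
    n = len(transactions)

    def sup(s: Itemset) -> float:
        return sum(1 for t in transactions if s <= t) / n

    level = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in level if sup(s) >= min_support}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = sup(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))}
        level = {c for c in candidates if sup(c) >= min_support}
    return frequent


def association_rules(frequent: Dict[Itemset, float], min_confidence: float) -> List[Rule]:
    """Rules A -> B from each frequent itemset, kept when confidence >= min_confidence."""
    rules: List[Rule] = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[ante]  # every subset of a frequent set is frequent
                if conf >= min_confidence:
                    rules.append((ante, itemset - ante, conf))
    return rules


# Hypothetical food orders.
orders = [
    frozenset({"rice", "fried chicken", "iced tea"}),
    frozenset({"rice", "fried chicken"}),
    frozenset({"rice", "iced tea"}),
    frozenset({"fried chicken", "iced tea"}),
    frozenset({"rice", "fried chicken", "iced tea"}),
]

if __name__ == "__main__":
    for ante, cons, conf in association_rules(apriori(orders, 0.5), 0.7):
        print(sorted(ante), "->", sorted(cons), round(conf, 2))
```

With these orders, every pair of items appears in 60% of orders and every single item in 80%, so each of the six pairwise rules has a confidence of 0.75 and passes the 70% threshold, while raising the threshold to 0.8 removes them all.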
Specific Usage of Visual Data Analysis Techniques
Visualization techniques are very important tools for data mining processes. They are widely applied in many areas, especially in supporting decision-making processes. We use visualization tools for rule generation, classification, and clustering. The paper presents applications of data visualization techniques and tools for the generation of association rules, classification, and clustering.