
    A shortest-path based clustering algorithm for joint human-machine analysis of complex datasets

    Clustering is a technique for the analysis of datasets obtained from empirical studies in several disciplines, with a major application in biomedical research. Essentially, clustering algorithms are executed by machines to find groups of related points in a dataset. However, the resulting grouping depends on both the metric for point-to-point similarity and the rules for point-to-group association. Indeed, inappropriate metrics and rules can lead to undesirable clustering artifacts. This is especially relevant for datasets in which groups with heterogeneous structures co-exist. In this work, we propose an algorithm that achieves clustering by exploring the paths between points. This allows us both to evaluate properties of a path (such as gaps and density variations) and to express a preference for certain paths. Moreover, our algorithm supports the integration of existing knowledge about admissible and non-admissible clusters by training a path classifier. We demonstrate the accuracy of the proposed method on challenging datasets, including points from synthetic shapes in publicly available benchmarks and microscopy data.
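
    To make the path-based idea concrete, the sketch below uses only one path property, the largest gap along a path, and puts two points in the same group when some path between them never jumps farther than a threshold. This is the classic minimax (bottleneck) path criterion, not the authors' algorithm: it ignores density variations and the trained path classifier, and the function name and the `max_gap` parameter are hypothetical.

```python
import math
from itertools import combinations


def gap_limited_clusters(points, max_gap):
    """Label points so that two points share a label iff some path between
    them through the data never jumps farther than max_gap.

    Equivalently, their minimax path distance (the smallest achievable
    "largest gap" over all paths) is <= max_gap. We compute this as the
    connected components of the graph whose edges join points at most
    max_gap apart, using union-find.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Hypothetical brute-force edge set: every pair of points is checked.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= max_gap:
            parent[find(i)] = find(j)

    # Relabel roots as cluster ids 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]
```

    For example, `gap_limited_clusters([(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)], 1.5)` separates the first three points from the last two. A long chain of points spaced 1 apart stays a single cluster under the same threshold even though its end points are far apart, which is the behaviour that distinguishes path-based grouping from centroid-based grouping.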

    OPTIMIZATION OF MARKET BASKET ANALYSIS USING CENTROID-BASED CLUSTERING ALGORITHM AND FP-GROWTH ALGORITHM

    The growth of the food and beverage business requires owners to be creative in offering their flagship products to every consumer, both new and returning. A large number of menu choices makes ordering slow because consumers are unsure which item is the best choice, so the seller needs to provide the right recommendations for orders to be placed faster. Market basket analysis is commonly used to find items that are sold together. FP-Growth is a fast algorithm for generating association rules, but on large datasets the association process tends to include many items, which lowers the accuracy of the resulting rules. Therefore, in this study the dataset was first grouped using centroid-based clustering algorithms, namely k-means, k-medoids, and fuzzy c-means. The research consisted of dataset collection, dataset preparation, clustering modeling, evaluation of the clustering models using the Davies-Bouldin index (DBI) and the silhouette index, association modeling, and evaluation of the association models using the lift ratio. The best DBI and silhouette index values were obtained at k=3 for k-means, k=2 for k-medoids, and k=7 for fuzzy c-means. The largest number of association rules was generated from the dataset grouped with fuzzy c-means, but the highest average lift ratio was obtained for the rules generated from the dataset grouped with k-means. Combining k-means with FP-Growth yielded 32 unique association rules, in which the four most frequent items were cireng chili oil, regal milk coffee, banana cheese, and vietnam drip.
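
    The lift ratio used to evaluate the association models has a standard definition, sketched below on a few made-up baskets. The helper names `support` and `lift` and the toy data are assumptions for illustration; the FP-Growth mining step itself is not reproduced. In the study's pipeline the rules come from each cluster separately, so the same computation would be applied to each cluster's transactions.

```python
from fractions import Fraction


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return Fraction(hits, len(transactions))


def lift(transactions, antecedent, consequent):
    """Lift ratio of the rule antecedent -> consequent:
    support(A and B together) / (support(A) * support(B)).

    Lift > 1 means A and B are bought together more often than they would
    be if purchases were independent; lift < 1 means less often.
    """
    base = support(transactions, antecedent) * support(transactions, consequent)
    if base == 0:
        raise ValueError("lift is undefined when an item set never occurs")
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Made-up baskets, not data from the study.
baskets = [{"coffee", "cake"}, {"coffee", "cake"}, {"coffee"}, {"tea"}]
print(lift(baskets, {"coffee"}, {"cake"}))  # 4/3
```

    Exact `Fraction` arithmetic keeps small examples readable; with real transaction volumes, floats would be the usual choice.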

    Combining Clustering techniques and Formal Concept Analysis to characterize Interestingness Measures

    Formal Concept Analysis (FCA) is a data analysis method for discovering hidden knowledge in data. One kind of hidden knowledge extracted from data is association rules. Various quality measures have been proposed in the literature to extract only the relevant association rules, but for a given dataset, choosing a good quality measure remains a challenging task for the user. Given an evaluation matrix of quality measures against semantic properties, this paper describes how FCA can highlight quality measures with similar behavior in order to help the user make this choice. The aim of this article is to discover clusters of interestingness measures (IM) that can validate the clusters found by the hierarchical and partitioning clustering methods AHC and k-means. Based on a recent theoretical study of sixty-one interestingness measures against nineteen properties, FCA identifies several groups of measures.
    Comment: 13 pages, 2 figures
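
    FCA groups objects (here, interestingness measures) that share the same attributes (here, properties) into formal concepts, each an (extent, intent) pair. The naive sketch below enumerates all concepts of a small binary context by closing the object intents under intersection; the measure and property names are placeholders, and practical tools use faster algorithms such as NextClosure.

```python
def formal_concepts(context, attributes):
    """Enumerate all formal concepts (extent, intent) of a binary context.

    context maps each object to the set of attributes it has. The intents
    of a context are exactly the intersections of object intents, plus the
    full attribute set, so we close that family under intersection.
    """
    intents = {frozenset(attributes)}
    for obj_attrs in context.values():
        row = frozenset(obj_attrs)
        # Intersect the new row with every intent found so far.
        intents |= {row & existing for existing in intents}

    concepts = []
    for intent in intents:
        # The extent is every object that has all attributes of the intent.
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Order from the most specific concept to the most general one.
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


# Placeholder measures m1..m3 described by placeholder properties p1..p3.
toy_context = {"m1": {"p1", "p2"}, "m2": {"p2", "p3"}, "m3": {"p2"}}
for extent, intent in formal_concepts(toy_context, {"p1", "p2", "p3"}):
    print(sorted(extent), sorted(intent))
```

    Measures that land in the same small extent behave alike on the chosen properties, which is the kind of grouping the paper compares against AHC and k-means clusters.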

    Research on technical analysis of basketball match based on data mining

    The aim of this paper is to preprocess technical basketball actions, classify these actions with data mining techniques, and mine association rules among them. The main contributions are as follows. Common data mining approaches are discussed, including preprocessing, classification, clustering, and rule mining. Both the ID3 decision tree classification algorithm and the Apriori association rule algorithm are studied in detail. The paper analyzes technical basketball actions at both small and large scales, applying J48 decision tree classification and Apriori association rule mining, and the results are expected to provide useful guidance to teams.
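
    ID3 grows a decision tree by repeatedly splitting on the attribute with the highest information gain. The sketch below shows only that selection criterion on made-up shot records; the attribute names and labels are hypothetical. Note that J48, Weka's implementation of C4.5, uses the gain ratio rather than raw information gain.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction from splitting rows on one categorical attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3's greedy choice: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Made-up records of technical actions, not data from the paper.
shots = [
    {"zone": "paint", "defender": "open"},
    {"zone": "paint", "defender": "contested"},
    {"zone": "perimeter", "defender": "open"},
    {"zone": "perimeter", "defender": "contested"},
]
outcomes = ["made", "made", "missed", "missed"]
print(best_attribute(shots, outcomes, ["zone", "defender"]))  # zone
```

    Here `zone` separates the outcomes perfectly (gain of 1 bit) while `defender` carries no information (gain of 0), so ID3 would split on `zone` first and recurse on each branch.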

    Visual grouping of association rules by clustering conditional probabilities for categorical data

    We demonstrate the use of a visual data-mining tool that helps non-technical domain experts within organizations extract meaningful information and knowledge from in-house databases. The tool is based mainly on grouping association rules. Association rules are useful for discovering items that are frequently found together. However, in many applications, rules with lower frequencies are often interesting to the user, and grouping association rules is one way to overcome this rare-item problem. Some groups of association rules, though, are too large to be easily understood. In this chapter, we propose a method for clustering categorical data based on the conditional probabilities of association rules for datasets with many attributes. We argue that the proposed method gives non-technical users a better understanding of the patterns discovered in the dataset.
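
    The abstract does not give the clustering procedure in detail, so the sketch below only illustrates the underlying quantity: treating each attribute=value pair of a categorical record as an item, P(b | a) is the confidence of the rule a -> b. The grouping rule applied afterwards (link two items when each predicts the other with probability at least `threshold`) is a hypothetical stand-in, not the chapter's method, and the records are made up.

```python
from collections import Counter
from itertools import permutations


def conditional_probabilities(records):
    """Return P(b | a) for every ordered pair of distinct items a, b.

    Each record is a dict of categorical attributes; every
    (attribute, value) pair is treated as one item, so P(b | a) equals
    the confidence of the association rule a -> b.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for record in records:
        items = list(record.items())
        item_counts.update(items)
        pair_counts.update(permutations(items, 2))
    return {
        (a, b): pair_counts[(a, b)] / item_counts[a]
        for a in item_counts
        for b in item_counts
        if a != b
    }


def group_items(records, threshold):
    """Hypothetical grouping rule: link items a and b when both P(b | a)
    and P(a | b) reach threshold, then return the linked groups."""
    probs = conditional_probabilities(records)
    items = sorted({item for record in records for item in record.items()})
    group_of = {item: {item} for item in items}
    for a in items:
        for b in items:
            if a < b and probs[(a, b)] >= threshold and probs[(b, a)] >= threshold:
                # Merge the two groups and point every member at the union.
                merged = group_of[a] | group_of[b]
                for item in merged:
                    group_of[item] = merged
    return sorted({frozenset(g) for g in group_of.values()}, key=sorted)


toy_records = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "L"},
    {"color": "blue", "size": "S"},
]
print(group_items(toy_records, 0.6))
```

    With a threshold of 0.6, only `color=red` and `size=S` end up together, because each predicts the other often enough (1.0 and 2/3); lowering the threshold to 0.5 would also pair `color=blue` with `size=L`.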

    MINING FOOD TRANSACTIONAL DATA TO PRODUCE ASSOCIATION RULES AS A BASIS OF BUSINESS ACTIONS

    The food industry sells a wide range of product variants, and companies want to use their high volumes of transactional data to inform business actions. Data mining is needed to extract valuable information from this transactional data and assess customers' product preferences as part of business strategy. Information about customers' food-buying behavior is important and can be obtained by mapping the transaction data into patterns of customers' tastes. The association method, using the Apriori algorithm, is used to map customers' choices. The main challenge lies in the data itself: high volumes of data must be prepared before they are passed to the mining process, and data reduction is applied to handle the large numbers of instances and attributes. This research focuses on how the data were handled up to the point where the association rules were developed. To this end, three validation levels were implemented to verify the reliability of the association rules, as measured by their confidence percentage. Furthermore, other data mining techniques, such as clustering and time series pattern analysis, are used to examine the validity of the resulting association rules. It can be concluded that applying three validation levels to the reduced transactional data yields strong association rules with confidence of 70% or higher, and that the validity of these rules is visible in the time series pattern of each group of goods; these rules are then used as the basis of business actions.
    Keywords: Data Reduction, Association Rules, Apriori, Confidence, Clustering, Time Series Patterns
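
    The abstract keeps only rules with confidence of 70% or higher. The sketch below is a minimal level-wise Apriori with that confidence filter; the function names, the minimum support count `min_count`, and the toy baskets are assumptions for illustration, and the study's data reduction and three validation levels are not modelled.

```python
from fractions import Fraction
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Level-wise Apriori: return {itemset: count} for every itemset that
    appears in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while level:
        counts = {s: count(s) for s in level}
        level = {s for s, c in counts.items() if c >= min_count}
        frequent.update({s: counts[s] for s in level})
        k += 1
        # Join: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        level = {c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    return frequent


def association_rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules A -> B whose
    confidence count(A and B) / count(A) is at least min_confidence."""
    for itemset, joint in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = Fraction(joint, frequent[antecedent])
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, confidence


# Made-up baskets, not data from the study.
baskets = [{"bread", "milk"}, {"bread", "milk"}, {"bread", "milk", "eggs"},
           {"bread"}, {"milk", "eggs"}]
for a, c, conf in association_rules(frequent_itemsets(baskets, 2), Fraction(7, 10)):
    print(sorted(a), "->", sorted(c), conf)
```

    Passing the 70% threshold as `Fraction(7, 10)` keeps the comparison exact, so a rule whose confidence is exactly 0.7 is not lost to floating-point rounding.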

    Specific Usage of Visual Data Analysis Techniques

    Visualization techniques are important tools for data mining and are widely applied in many areas, especially to support decision making. We use visualization tools for rule generation, classification, and clustering. The paper presents the application of data visualization techniques and tools to the generation of association rules, classification, and clustering.