
    Image Mining for Flower Classification by Genetic Association Rule Mining Using GLCM features

    Image mining is concerned with knowledge discovery in image databases. It extends data mining algorithms to the image processing domain and plays a vital role in extracting useful information from images. In a computer-aided plant identification and classification system, image mining can play a crucial role in flower classification. Low-level content features such as color and texture are used for flower image classification. Each flower image is segmented using a histogram-threshold-based method. The data set contains different flower species with similar appearance (small inter-class variations) and flowers of the same class with varying appearance (large intra-class variations). The flower images also vary in pose, have cluttered backgrounds, and were taken under varying lighting and climatic conditions. The images were collected from the World Wide Web in addition to photographs taken in natural scenes. The proposed method is based on textural features derived from the gray-level co-occurrence matrix (GLCM). This paper introduces multi-dimensional genetic association rule mining for effective flower classification. The image data mining approach has four major steps: preprocessing, feature extraction, preparation of the transactional database, and multi-dimensional genetic association rule mining and classification. The purpose of our experiments is to explore the feasibility of the data mining approach. The results show promise for image mining based on multi-dimensional genetic association rule mining. Data mining techniques are known to be better suited to databases larger than the one used for these preliminary tests. A computer-aided method using association rules could assist people and improve the accuracy of flower identification. In particular, a computer-aided method based on association rules becomes more accurate with a larger dataset. Experimental results show that this new method can quickly and effectively mine potential association rules.
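    The feature-extraction and transactional-database steps can be illustrated with a small sketch. It assumes the segmented image has already been quantized to a few gray levels, computes a normalized GLCM for one pixel offset, derives three common texture statistics (contrast, energy, homogeneity), and discretizes them into items that could form one row of a transactional database. The function names, the choice of statistics, and the cut points are assumptions for illustration; the genetic rule-mining step is not shown.

```python
from collections import Counter

def glcm(image, dx=1, dy=0, levels=8):
    """Normalized gray-level co-occurrence matrix for one pixel offset (dx, dy).

    image: 2-D list of integer gray levels in [0, levels).
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("image is too small for this offset")
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]

def glcm_features(p):
    """Three common Haralick-style texture statistics of a normalized GLCM."""
    n = len(p)
    idx = [(i, j) for i in range(n) for j in range(n)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in idx)
    energy = sum(p[i][j] ** 2 for i, j in idx)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i, j in idx)
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}

def to_transaction(features, bins):
    """Discretize each feature into a labeled item such as 'energy=high'.

    bins: feature name -> (low_cut, high_cut); these cut points are hypothetical.
    """
    items = []
    for name, value in sorted(features.items()):
        low, high = bins[name]
        level = "low" if value < low else "high" if value >= high else "mid"
        items.append(f"{name}={level}")
    return items
```

    In a classification setting, a class item such as `class=rose` (hypothetical) could be appended to each transaction so that mined rules take the form "texture items -> class".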

    Probabilistic Sparse Subspace Clustering Using Delayed Association

    Discovering and clustering subspaces in high-dimensional data is a fundamental machine learning problem with a wide range of applications in data mining, computer vision, and pattern recognition. Earlier methods divided the problem into two separate stages: finding the similarity matrix and finding clusters. Like some recent works, we integrate these two steps in a joint optimization approach. We make the following contributions: (i) We estimate the reliability of the cluster assignment for each point before assigning it to a subspace. We divide the data points into two groups, "certain" and "uncertain", and delay the assignment of the latter group until their subspace association certainty improves. (ii) We demonstrate that delayed association is better suited for clustering subspaces that have ambiguities, i.e., when subspaces intersect or the data are contaminated with outliers or noise. (iii) We demonstrate experimentally that such delayed probabilistic association leads to a more accurate self-representation and more accurate final clusters. The proposed method has higher accuracy both for points that lie exclusively in one subspace and for those on the intersection of subspaces. (iv) We show that delayed association greatly reduces computational cost, since it allows for incremental spectral clustering.
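    The certain/uncertain split in contribution (i) can be sketched as follows, assuming each point already has a probability of membership in each subspace. The fixed confidence threshold and the function name are hypothetical; the paper's joint optimization that produces and later updates these probabilities is not shown.

```python
def split_by_certainty(memberships, threshold=0.8):
    """Split points into 'certain' and 'uncertain' groups.

    memberships: one list per point of probabilities over subspaces.
    A point is certain when its most likely subspace has probability >= threshold,
    and its assignment is fixed now. Uncertain points are deferred to a later round,
    after the model has been refit on the certain points.
    """
    certain, uncertain = {}, []
    for point, probs in enumerate(memberships):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            certain[point] = best
        else:
            uncertain.append(point)
    return certain, uncertain
```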

    A Methodology for Identifying Core Technologies Based on Technological Cross-Impact: Association Rule Mining and ANP Approach

    There have been attempts to examine technological structure and linkage as technological impact. Cross-impact analysis (CIA) has mainly been employed with a cross-impact index to identify core technologies. The cross-impact index, however, cannot successfully capture the overall relationship based on the impacts among technologies. Furthermore, calculating every cross-impact index, especially from patents, is time-consuming without a dedicated computer program. To address these limitations, this study suggests a new approach to identifying core technologies from their technological cross-impact interrelationships. Specifically, the approach applies a data mining technique and a multi-criteria decision making (MCDM) method to the co-classification information of registered patents. First, a technological cross-impact matrix is constructed from confidence values obtained by applying association rule mining (ARM) to the co-classification information of patents. Then the Analytic Network Process (ANP), one of the MCDM methods, is applied to the constructed matrix to identify core technologies from the perspective of overall cross-impacts. A case study of telecommunication technology illustrates how the proposed approach is executed and used. The suggested approach is expected to help technology planners formulate strategy and policy for technological innovation.
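    The first step, building a cross-impact matrix from association-rule confidence values, can be sketched as below. Each patent is represented by the set of technology classes it is co-classified under, and entry (a, b) is the confidence of the rule a -> b. The patent data and class codes in any example are made up for illustration, the zero diagonal is an assumption, and the subsequent ANP step is not shown.

```python
def cross_impact_matrix(patents, techs):
    """Confidence-based cross-impact matrix from patent co-classification data.

    patents: list of sets, each holding the technology classes of one patent.
    Entry [a][b] is conf(a -> b) = |patents with a and b| / |patents with a|.
    The diagonal is left at 0.0 (an assumption: self-impact is ignored).
    """
    matrix = {}
    for a in techs:
        with_a = [p for p in patents if a in p]
        row = {}
        for b in techs:
            if a == b or not with_a:
                row[b] = 0.0
            else:
                row[b] = sum(1 for p in with_a if b in p) / len(with_a)
        matrix[a] = row
    return matrix
```

    Note that the matrix is generally asymmetric: conf(a -> b) and conf(b -> a) share a numerator but divide by the frequencies of different classes, which is what lets it express the direction of impact.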

    Methods of Association Mining by Variable-to-Set Affinity Testing

    Statistical data mining refers to methods for identifying and validating interesting patterns in an overabundance of data. Data mining tasks whose objective involves pairwise relationships between variables are known as association mining. In general, the features sought by association mining methods are sets of variables, often small subsets of a larger collection, that are more strongly associated internally than externally. Methods vary in both the measure of association studied and the algorithm by which associated sets are identified. This dissertation provides a generalized framework for association mining called Variable-to-Set Affinity Testing (VSAT). Unlike conventional techniques for clustering or community detection, which usually maximize a score computed from a dissimilarity or adjacency matrix, the VSAT approach is an adaptive procedure grounded in the principles of statistical hypothesis testing. The framework adapts to a broad class of measures of variable relationships and is equipped with theoretical guarantees of error control. This dissertation also presents in detail two new association mining methods built in the VSAT framework. The first, Differential Correlation Mining (DCM), identifies variable sets that have higher average pairwise correlation in one sample condition than in another. Such artifacts are of scientific interest in many fields, including statistical genetics and neuroscience, and DCM is applied to high-dimensional data sets from these two fields. The second method, Coherent Set Mining (CSM), is a novel approach to association mining in binary data. Dichotomous observations are assumed to derive from a latent variable of interest via thresholding. CSM identifies variable sets that are strongly associated in the latent measure, despite distortions in the association structure of the observed data caused by the thresholding process. CSM is applied to problems in text mining, statistical genetics, and product recommendation. (Doctor of Philosophy)
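    The statistic that DCM targets, the difference in average pairwise correlation of a variable set between two sample conditions, can be computed as in the sketch below. It only scores a given set under assumed function names; it does not reproduce the VSAT search or its hypothesis tests.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / (sxx * syy) ** 0.5

def avg_pairwise_corr(data, variables):
    """data: variable -> list of observations (same length for every variable)."""
    return mean(pearson(data[u], data[v]) for u, v in combinations(variables, 2))

def differential_correlation(data1, data2, variables):
    """Positive when the set is more internally correlated in condition 1."""
    return avg_pairwise_corr(data1, variables) - avg_pairwise_corr(data2, variables)
```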

    A Universal Similarity Model for Transactional Data Clustering

    Data mining methods are used to extract hidden knowledge from large databases. Data partitioning methods group related data values, placing similar values in the same cluster. The K-means and Partitioning Around Medoids (PAM) clustering algorithms are used to cluster numerical data, with distance measures estimating transaction similarity. Data partitioning solutions are identified using cluster ensemble models. The ensemble information matrix presents only cluster-data point relations, so ensemble-based clustering techniques produce the final data partition from incomplete information. The link-based approach improves the conventional matrix by discovering unknown entries through cluster similarity in the ensemble, using a link-based algorithm for the underlying similarity assessment. Pairwise similarity and binary cluster-association matrices summarize the underlying ensemble information. A weighted bipartite graph is formulated from the refined matrix, and a graph partitioning technique is applied to it. Particle Swarm Optimization (PSO) clustering, an optimization-based clustering scheme, is integrated with the cluster ensemble model. The system supports clustering of binary, categorical, and continuous data. The attribute connectivity analysis is optimized for all attributes, and the refined cluster-association matrix (RM) is updated with all attribute relationships.
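    The ensemble matrices mentioned above can be sketched as follows. Given several base clusterings of the same points, the code builds the binary cluster-association matrix, the pairwise co-association (similarity) matrix, and a refined matrix whose zero entries are filled with a link-based similarity between clusters. The link measure used here (summing, over third clusters, the weaker of two Jaccard overlaps) and the `decay` factor that keeps refined entries below 1 are assumptions standing in for the paper's link-based algorithm; the bipartite-graph partitioning and PSO steps are omitted.

```python
from itertools import combinations

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def ensemble_matrices(labelings, decay=0.9):
    """Build BM, pairwise co-association, and a link-refined RM.

    labelings: list of base clusterings; labelings[t][i] is point i's label in clustering t.
    """
    n, m = len(labelings[0]), len(labelings)
    clusters = [(t, lab) for t, labs in enumerate(labelings) for lab in sorted(set(labs))]
    members = {c: {i for i in range(n) if labelings[c[0]][i] == c[1]} for c in clusters}

    # Binary cluster-association matrix: 1 if point i belongs to cluster c.
    bm = [[int(i in members[c]) for c in clusters] for i in range(n)]
    # Pairwise similarity: fraction of base clusterings that co-cluster i and j.
    co = [[sum(labs[i] == labs[j] for labs in labelings) / m for j in range(n)]
          for i in range(n)]

    # Hypothetical link measure between two clusters of the same clustering:
    # sum of the weaker Jaccard link through every third cluster z.
    def link(x, y):
        return sum(min(jaccard(members[x], members[z]), jaccard(members[y], members[z]))
                   for z in clusters if z not in (x, y))

    sim = {}
    for x, y in combinations(clusters, 2):
        if x[0] == y[0]:
            sim[(x, y)] = sim[(y, x)] = link(x, y)
    top = max(sim.values(), default=0.0) or 1.0

    # Refined matrix: keep BM's 1s; fill each 0 with the scaled link similarity
    # between that cluster and the cluster holding point i in the same clustering.
    rm = [[1.0 if bm[i][k] else decay * sim[((c[0], labelings[c[0]][i]), c)] / top
           for k, c in enumerate(clusters)] for i in range(n)]
    return clusters, bm, co, rm
```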

    A Novel Biclustering Approach to Association Rule Mining for Predicting HIV-1–Human Protein Interactions

    Identification of potential viral-host protein interactions is a vital and useful approach towards the development of new drugs targeting those interactions. Computational tools are increasingly being used to predict viral-host interactions. Recently, a database containing records of experimentally validated interactions between a set of HIV-1 proteins and a set of human proteins has been published. The problem of predicting new interactions based on this database is usually posed as a classification problem. However, posing it as a classification problem suffers from the lack of biologically validated negative interactions. It would therefore be beneficial to use the existing database to predict new viral-host interactions without the need for negative samples. Motivated by this, in this article the HIV-1–human protein interaction database has been analyzed using association rule mining. The main objective is to identify a set of association rules both among the HIV-1 proteins and among the human proteins, and to use these rules to predict new interactions. To this end, a novel association rule mining technique based on biclustering has been proposed for discovering frequent closed itemsets, and from them association rules, in the adjacency matrix of the HIV-1–human interaction network. Novel HIV-1–human interactions have been predicted based on the discovered association rules and tested for biological significance. To validate the predicted interactions, gene ontology-based and pathway-based studies have been performed. These studies show that the human proteins predicted to interact with a particular viral protein share many common biological activities. Moreover, a literature survey has been used to identify some predicted interactions that have already been validated experimentally but are not present in the database. Comparison with other prediction methods is also discussed.
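    Mining frequent closed itemsets from an interaction adjacency matrix is closely related to biclustering: each closed itemset, together with the rows that contain it, forms a maximal all-ones submatrix. The brute-force sketch below shows this correspondence and derives simple one-consequent rules. It is not the paper's biclustering algorithm, the function names are hypothetical, and it is only practical for small matrices.

```python
from itertools import combinations

def closed_biclusters(adj, min_support=2):
    """Frequent closed itemsets of a 0/1 matrix, found as maximal all-ones biclusters.

    adj: row -> set of columns holding a 1 (e.g. human protein -> interacting viral proteins).
    Returns (itemset, supporting rows) pairs.
    """
    items = sorted(set().union(*adj.values()))
    found = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = set(combo)
            rows = {r for r, cols in adj.items() if itemset <= cols}
            if len(rows) < max(min_support, 1):
                continue
            # Closed: no extra column is shared by every supporting row.
            if set.intersection(*(adj[r] for r in rows)) == itemset:
                found.append((frozenset(itemset), frozenset(rows)))
    return found

def rules_from_closed(found, adj, min_conf=0.5):
    """Rules (X - {i}) -> i from each closed itemset X with at least two items."""
    rules = []
    for itemset, rows in found:
        for item in itemset:
            antecedent = itemset - {item}
            if not antecedent:
                continue
            ante_support = sum(1 for cols in adj.values() if antecedent <= cols)
            conf = len(rows) / ante_support
            if conf >= min_conf:
                rules.append((antecedent, item, conf))
    return rules
```

    Transposing the adjacency (viral protein -> interacting human proteins) gives rules among the human proteins instead, matching the two rule families described in the abstract.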

    A Framework for High-Accuracy Privacy-Preserving Mining

    To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of data records have been proposed recently. In this paper, we present a generalized matrix-theoretic model of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, we demonstrate that (a) the prior techniques differ only in their settings for the model parameters, and (b) through appropriate choice of parameter settings, we can derive new perturbation techniques that provide highly accurate mining results even under strict privacy guarantees. We also propose a novel perturbation mechanism wherein the model parameters are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at a very marginal cost in accuracy. While our model is valid for random-perturbation-based privacy-preserving mining in general, we specifically evaluate its utility here with regard to frequent-itemset mining on a variety of real datasets. The experimental results indicate that our mechanisms incur substantially lower identity and support errors compared with the prior techniques.
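    A minimal sketch of the matrix-theoretic view of random perturbation on one categorical attribute with n values: each value is kept with probability γx and switched to each other value with probability x, where x = 1/(γ + n - 1), so the perturbation matrix is A = x((γ - 1)I + J) with J the all-ones matrix. Because (AX)_v = x((γ - 1)X_v + N) and the perturbed counts still sum to N, the original counts can be recovered in closed form as X_v = (Y_v/x - N)/(γ - 1), for γ > 1. The choice of this "gamma-diagonal" matrix and the parameter name `gamma` are assumptions for illustration; the paper's randomized-parameter mechanism and its frequent-itemset evaluation are not shown.

```python
import random

def perturb(value, domain, gamma, rng):
    """Keep `value` with probability gamma*x; otherwise switch to each other value with probability x.

    x = 1 / (gamma + n - 1), so each column of the perturbation matrix sums to 1.
    """
    n = len(domain)
    x = 1.0 / (gamma + n - 1)
    if rng.random() < gamma * x:
        return value
    # The remaining mass (n - 1) * x is split evenly over the other n - 1 values.
    return rng.choice([d for d in domain if d != value])

def reconstruct_counts(perturbed_counts, gamma):
    """Invert A = x((gamma - 1)I + J) in closed form; requires gamma > 1."""
    n = len(perturbed_counts)
    total = sum(perturbed_counts)
    x = 1.0 / (gamma + n - 1)
    return [(y / x - total) / (gamma - 1) for y in perturbed_counts]
```

    Larger γ keeps more records unchanged, which improves reconstruction accuracy but weakens privacy; the estimates are unbiased but noisy for small N.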

    A Hybrid Web Recommendation System based on the Improved Association Rule Mining Algorithm

    Given the growing interest in web recommendation systems, which are used to deliver customized data to their users, we started working on such a system. Recommendation systems are generally divided into two major categories: collaborative and content-based. Collaborative recommendation systems try to find users who share the same tastes as a given user and recommend websites according to that user's preferences, whereas content-based recommendation systems try to recommend websites similar to those the user has liked. In recent research, we found that an efficient technique based on an association rule mining algorithm had been proposed to solve the problem of web page recommendation. Its major problem is that all web pages are given equal importance, whereas in practice the importance of a page changes according to how frequently it is visited and how much time users spend on it. Also, newly added web pages, or pages not yet visited by users, are not included in the recommendation set. To overcome this problem, we used the web usage log in adaptive association-rule-based web mining, where the association rules were applied to personalization. This algorithm was based purely on the Apriori data mining algorithm to generate the association rules. However, this method also suffers from some unavoidable drawbacks. In this paper, we present and investigate a new approach based on a weighted association rule mining algorithm and text mining. This improved algorithm adds semantic knowledge to the results, is more efficient, and hence gives better quality and performance compared with existing approaches. Comment: 9 pages, 7 figures, 2 tables
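    The weighting idea described above can be sketched as follows: each page gets a weight from its visit frequency and the time users spend on it, and the weighted support of a page set is its ordinary support scaled by the mean weight of its pages. The 50/50 blend, the mean-weight formulation, and the function names are assumptions; the paper's exact weighting and its text-mining component are not reproduced.

```python
def page_weights(visits, durations):
    """Weight each page by visit frequency and time spent, each scaled to [0, 1].

    visits: page -> number of visits; durations: page -> total seconds spent.
    The equal 50/50 blend is a hypothetical choice.
    """
    max_v = max(visits.values())
    max_d = max(durations.values())
    return {p: 0.5 * visits[p] / max_v + 0.5 * durations.get(p, 0) / max_d for p in visits}

def weighted_support(itemset, sessions, weights):
    """Ordinary support of the page set, scaled by the mean weight of its pages."""
    support = sum(1 for s in sessions if itemset <= s) / len(sessions)
    return support * sum(weights[p] for p in itemset) / len(itemset)
```

    Under this scheme, a rarely visited page that users read for a long time can still reach the minimum weighted support, which plain Apriori support would not reflect.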