20 research outputs found

    Clustering web images using association rules, interestingness measures, and hypergraph partitions

    Association rules implementation for affinity analysis between elements composing multimedia objects

    Multimedia objects are a constantly growing resource on the World Wide Web, which has created a need for methods and tools that extract new knowledge from the information analyzed. Association rules are a data mining technique whose purpose is to find correlations between the elements of a data collection, supporting decision making through the identification and analysis of those correlations. Algorithms used for this include Apriori, Frequent Pattern Growth, the QFP algorithm, CBA, CMAR, and CPAR, among others. Multimedia applications today, on the other hand, require the processing of unstructured data supplied by multimedia objects, which are made up of text, images, audio, and video. For the storage, processing, and management of multimedia objects, solutions have been developed that allow efficient search for the data of interest to the end user, considering that the semantics of a multimedia object must be expressed by all the elements that compose it. This article analyzes the state of the art in applying association rules to the processing of multimedia objects; the analysis of the consulted literature also raises questions about the possibility of developing an association rules method for analyzing these objects. Universidad de la Costa, Universidad Pontificia Bolivariana
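
    As a rough illustration of what "searching for correlations between elements" means in association rule mining, the sketch below computes support and confidence for pairwise rules by brute force. It is not taken from the article: the function names, the thresholds, and the toy tag data are assumptions made for illustration, and real miners such as Apriori prune the search space instead of enumerating every pair.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pairwise_rules(transactions, min_support, min_confidence):
    """Brute-force rules lhs -> rhs over item pairs (illustrative only)."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                found.append((lhs, rhs, pair_support, confidence))
    return found


# Hypothetical data: each transaction holds the tags of one multimedia object.
transactions = [
    frozenset({"beach", "sea", "sky"}),
    frozenset({"beach", "sea"}),
    frozenset({"city", "sky"}),
    frozenset({"beach", "sky"}),
]

rules = pairwise_rules(transactions, min_support=0.5, min_confidence=0.9)
```

    With these toy thresholds the only surviving rule is "sea" -> "beach": both tags appear together in half of the objects, and every object tagged "sea" is also tagged "beach".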

    Structural advances for pattern discovery in multi-relational databases

    With ever-growing storage needs and a drift towards very large relational storage settings, multi-relational data mining has become a prominent and pertinent field for discovering unique and interesting relational patterns. As a consequence, a whole suite of multi-relational data mining techniques is being developed. These techniques may either be extensions of existing single-table mining techniques or be developed from scratch. For the traditionalists, single-table mining algorithms can be made to work in multi-relational settings through inelegant and time-consuming joins of all target relations. However, complex relational patterns cannot be expressed in a single-table format and thus cannot be discovered. This work presents a new multi-relational frequent pattern mining algorithm termed Multi-Relational Frequent Pattern Growth (MRFP Growth). MRFP Growth is capable of mining multiple relations, linked with referential integrity, for frequent patterns that satisfy a user-specified support threshold. Empirical results on MRFP Growth performance and its comparison with state-of-the-art multi-relational data mining algorithms such as WARMR and Decentralized Apriori are discussed at length. MRFP Growth scores over the latter two techniques in number of patterns generated and in speed. The realm of multi-relational clustering is also explored in this thesis. A multi-Relational Item Clustering approach based on Hypergraphs (RICH) is proposed. Experimentally, RICH combined with MRFP Growth proves to be a competitive approach for clustering multi-relational data. The performance and quality of the clusters generated by RICH are compared with other clustering algorithms. Finally, the thesis demonstrates the applied utility of the above-mentioned algorithms in an application framework for auto-annotation of images in an image database. The system is called CoMMA, which stands for Combining Multi-relational Multimedia for Associations.

    Summarization Techniques for Pattern Collections in Data Mining

    Discovering patterns from data is an important task in data mining. There exist techniques to find large collections of many kinds of patterns from data very efficiently. A collection of patterns can be regarded as a summary of the data. A major difficulty with patterns is that pattern collections summarizing the data well are often very large. In this dissertation we describe methods for summarizing pattern collections in order to make them more understandable as well. More specifically, we focus on the following themes: 1) Quality value simplifications. 2) Pattern orderings. 3) Pattern chains and antichains. 4) Change profiles. 5) Inverse pattern discovery. Comment: PhD Thesis, Department of Computer Science, University of Helsinki

    Online summarization of dynamic graphs using subjective interestingness for sequential data

    Algorithms and the Foundations of Software technology

    Unsupervised Detection of Emergent Patterns in Large Image Collections

    With the advent of modern image acquisition and sharing technologies, billions of images are added to the Internet every day. This huge repository contains useful information, but it is very hard to analyze. If labeled information is available for this data, then supervised learning techniques can be used to extract useful information. Visual pattern mining approaches provide a way to discover visual structures and patterns in an image collection without the need of any supervision. The Internet contains images of various objects, scenes, patterns, and shapes. The majority of approaches for visual pattern discovery, on the other hand, find patterns that are related to object or scene categories. Emergent pattern mining techniques provide a way to extract generic, complex and hidden structures in images. This thesis describes research, experiments, and analysis conducted to explore various approaches to mine emergent patterns from image collections in an unsupervised way. These approaches are based on itemset mining and graph theoretic strategies. The itemset mining strategy uses frequent itemset mining and rare itemset mining techniques to discover patterns. The mining is performed on a transactional dataset which is obtained from the BoW representation of images. The graph-based approach represents visual word co-occurrences obtained from images in a co-occurrence graph. Emergent patterns form dense clusters in this graph that are extracted using normalized cuts. The patterns that are discovered using itemset mining approaches are: stripes and parallel lines; dots and checks; bright dots; single lines; intersections; and frames. The graph-based approach revealed various interesting patterns, including some patterns that are related to object categories.
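
    The two data structures named in this abstract, a transactional dataset derived from bag-of-words (BoW) histograms and a visual-word co-occurrence graph, can be sketched roughly as follows. This is a minimal illustration under assumptions, not the thesis's procedure: the `min_count` binarization threshold, the function names, and the toy histograms are hypothetical, and the normalized-cuts clustering step is omitted.

```python
from collections import Counter
from itertools import combinations


def to_transaction(histogram, min_count=1):
    """Visual-word ids whose count in one image reaches min_count (assumed rule)."""
    return frozenset(w for w, c in enumerate(histogram) if c >= min_count)


def cooccurrence_edges(transactions):
    """Edge weight (u, v) = number of images that contain both visual words."""
    edges = Counter()
    for t in transactions:
        for u, v in combinations(sorted(t), 2):
            edges[(u, v)] += 1
    return edges


# Hypothetical BoW histograms: one row per image, one column per visual word.
histograms = [
    [3, 0, 1, 0],
    [1, 2, 0, 0],
    [2, 0, 4, 1],
]

transactions = [to_transaction(h) for h in histograms]
edges = cooccurrence_edges(transactions)
```

    The transactions can be fed to any frequent or rare itemset miner, while the weighted edges describe the graph in which dense clusters would then be sought.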

    Acta Cybernetica : Volume 15. Number 2.

    Association rules method for affinity analysis between text-type objects (original title: Método de reglas de asociación para el análisis de afinidad entre objetos de tipo texto)

    Master's in Engineering. Data mining is considered a tool for extracting knowledge from large volumes of information. One of the analyses performed in data mining is association rule mining, whose purpose is to look for co-occurrences among the records of a data set. Its main application is market basket analysis, where criteria for decision making are established based on the buying behavior of customers. Some of the algorithms are Apriori, Frequent Pattern Growth, the QFP algorithm, CBA, CMAR, and CPAR. These algorithms were designed to analyze structured databases; at present, various applications require the processing of unstructured data, as is the case with text-type objects. The purpose of this research is to develop a method that establishes the relationship between the elements that make up a text-type object, in order to acquire relevant information from the analysis of massive data sources of the same type.

    AVATAR - Machine Learning Pipeline Evaluation Using Surrogate Model

    © 2020, The Author(s). The evaluation of machine learning (ML) pipelines is essential during automatic ML pipeline composition and optimisation. Previous methods, such as the Bayesian-based and genetic-based optimisation implemented in Auto-WEKA, Auto-sklearn and TPOT, evaluate pipelines by executing them. The pipeline composition and optimisation of these methods therefore requires a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To further explore this research challenge, we have conducted experiments showing that many of the generated pipelines are invalid, and that it is unnecessary to execute them to find out whether they are good pipelines. To address this issue, we propose a novel method to evaluate the validity of ML pipelines using a surrogate model (AVATAR). AVATAR accelerates automatic ML pipeline composition and optimisation by quickly discarding invalid pipelines. Our experiments show that AVATAR is more efficient in evaluating complex pipelines than traditional evaluation approaches that require their execution.
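
    The idea of rejecting invalid pipelines without executing them can be illustrated with a small sketch. The abstract does not describe how the AVATAR surrogate model is built, so everything below is an assumption: the `Component` fields, the property names, and the two example components are hypothetical and only show the general principle of propagating data properties through a pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical description of one pipeline step."""
    name: str
    rejects: frozenset = frozenset()  # data properties the step cannot handle
    removes: frozenset = frozenset()  # data properties the step eliminates
    adds: frozenset = frozenset()     # data properties the step introduces


def is_valid(pipeline, data_properties):
    """Check validity by propagating properties, without running any step."""
    props = set(data_properties)
    for step in pipeline:
        if props & step.rejects:
            return False  # the step would fail on the current data
        props = (props - step.removes) | step.adds
    return True


# Hypothetical components and dataset properties.
imputer = Component("imputer", removes=frozenset({"missing_values"}))
classifier = Component("classifier", rejects=frozenset({"missing_values"}))
dataset = {"missing_values", "numeric"}

direct = is_valid([classifier], dataset)
with_imputation = is_valid([imputer, classifier], dataset)
```

    Here the classifier alone is flagged as invalid because the data still contains missing values, while the pipeline that imputes first passes the check; neither result requires training a model.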