11 research outputs found

    Fouille d'items et d'itemsets représentatifs avec des méthodes de décomposition de matrices binaires et de sélection d'instances

    This thesis focuses on mining representative items and itemsets using Binary Matrix Factorization (BMF) and instance selection. Chapter 1 considers the BMF problem by surveying the literature on matrix decomposition techniques and the state-of-the-art algorithms. It then establishes a connection between the BMF problem and the Unconstrained Binary Quadratic Programming (UBQP) problem, so that the algorithms and heuristics available in the literature for UBQP can be applied to BMF. Chapter 2 proposes a new, efficient heuristic that improves BMF solutions row by row (or column by column) by flipping one bit at a time. Using the established link with UBQP, we compare the proposed technique, called 1-opt-BMF, with the UBQP heuristic, called 1-opt-UBQP, and with the standard approach, called 1-opt-Standard. We then show, theoretically and experimentally, the efficiency of 1-opt-BMF on a wide range of publicly available datasets. Chapter 3 addresses the problem of finding representative itemsets via BMF and 1-opt-BMF. We first consider the theoretical relation between the frequent itemset mining problem and BMF; with that relation established, we propose a new technique called Decomposition Itemset Miner (DIM), and a set of experiments shows the efficiency of DIM and the quality of its results. Finally, Chapter 4 considers the problem of finding representative objects (instances) in big, high-dimensional datasets. These objects provide a global, top-level view of the data and are very important in the data analysis process. We first study the available methods for finding representative objects and discuss the pros and cons of each. We then formally define the Instance Selection Problem (ISP), present three variants of it, and examine their complexities before providing their solutions. In the experimental section, we show that although the ISP solutions can outperform other methods in some cases, ISP should in general be considered a complementary technique for finding representative objects.
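
As a rough illustration of the one-bit-flip idea described in the abstract above, the following minimal sketch runs a naive local search over the two binary factors. It is not the thesis's 1-opt-BMF: the Boolean (OR-of-ANDs) product, the error measure, and the names `boolean_product`, `reconstruction_error`, and `one_flip_local_search` are all assumptions made for this example, and the error is recomputed from scratch after every flip rather than updated incrementally.

```python
import numpy as np

def boolean_product(A, B):
    # Boolean matrix product: entry (i, j) is 1 if A[i, l] = B[l, j] = 1 for some l.
    return (A @ B > 0).astype(int)

def reconstruction_error(X, A, B):
    # Number of entries where the Boolean product differs from X.
    return int(np.sum(X != boolean_product(A, B)))

def one_flip_local_search(X, A, B, max_passes=10):
    # Hypothetical helper: flip one bit at a time, keep the flip only if it
    # strictly lowers the reconstruction error, stop when a pass changes nothing.
    A = np.array(A, dtype=int)
    B = np.array(B, dtype=int)
    best = reconstruction_error(X, A, B)
    for _ in range(max_passes):
        improved = False
        for M in (A, B):
            for idx in np.ndindex(M.shape):
                M[idx] ^= 1                      # tentative flip
                err = reconstruction_error(X, A, B)
                if err < best:
                    best = err
                    improved = True
                else:
                    M[idx] ^= 1                  # undo the flip
        if not improved:
            break
    return A, B, best
```

Because a flip is kept only when it strictly reduces the error, the returned error can never exceed that of the starting factors; the search may still stop in a local optimum.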

    Contributions to the mathematical modeling of estimation of distribution algorithms and pseudo-boolean functions

    134 p. Combinatorial optimization problems seek to maximize or minimize an objective function defined over a discrete space. Since most of these problems cannot be solved by exhaustive search, they are usually tackled approximately with heuristic algorithms. However, no algorithm outperforms all others on every instance of every problem. The ideal goal is therefore, given an instance of a problem, to know which algorithm solves it most efficiently. The two main lines of research toward this goal are the study of problem definitions and the instances each problem can generate, and the study of the designs and characteristics of the algorithms. This thesis addresses both lines. On the one hand, we study pseudo-Boolean functions and several specific binary problems. On the other hand, we present a mathematical model for studying Estimation of Distribution Algorithms designed to solve permutation-based problems. The main motivation has been to keep advancing this field in order to better understand the relationships between Combinatorial Optimization Problems and optimization algorithms.

    Dynamic Programming Driven Memetic Search for the Steiner Tree Problem with Revenues, Budget, and Hop Constraints

    Algoritmos exatos e heurísticos para o problema da diversidade máxima

    In this work, we propose Branch-and-Cut algorithms and Lagrangian heuristics for the Maximum Diversity Problem. We use a standard linear formulation strengthened by valid inequalities for the problem. These inequalities are dynamically dualized in the heuristics and used as cuts in the exact methods. To separate them efficiently, we propose some heuristic procedures. In addition, we perform computational tests and compare our exact and heuristic methods with those found in the literature for the problem.
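
For orientation, one commonly used linearized 0-1 formulation of the Maximum Diversity Problem is sketched below. The abstract does not state which formulation the work uses, so this is only an assumed example; the symbols $n$ (number of elements), $m$ (number of elements to select), $d_{ij}$ (pairwise diversity), $x_i$ (selection variable), and $y_{ij}$ (pair variable) are notation introduced here.

```latex
\begin{aligned}
\max \quad & \sum_{1 \le i < j \le n} d_{ij}\, y_{ij} \\
\text{s.t.} \quad & \sum_{i=1}^{n} x_i = m, \\
& y_{ij} \le x_i, \quad y_{ij} \le x_j, \quad y_{ij} \ge x_i + x_j - 1 && 1 \le i < j \le n, \\
& x_i \in \{0, 1\}, \quad y_{ij} \in \{0, 1\} && 1 \le i < j \le n.
\end{aligned}
```

The three linking constraints force $y_{ij} = x_i x_j$, so the objective equals the total diversity of the selected pairs.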

    Preventing premature convergence and proving the optimality in evolutionary algorithms

    http://ea2013.inria.fr//proceedings.pdf International audience. Evolutionary Algorithms (EAs) usually explore the search space efficiently, but they often get trapped in local minima and do not prove the optimality of the solution. Interval-based techniques, on the other hand, yield a numerical proof of optimality of the solution. However, they may fail to converge within a reasonable time because of their exponential complexity and their inability to quickly compute a good approximation of the global minimum. The contribution of this paper is a hybrid algorithm called Charibde, in which a particular EA, Differential Evolution, cooperates with a Branch and Bound algorithm endowed with interval propagation techniques. It prevents premature convergence toward local optima and outperforms existing deterministic and stochastic approaches. We demonstrate its efficiency on a benchmark of highly multimodal problems, for which we provide previously unknown global minima and certification of optimality.
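
To make the evolutionary half of such a hybrid concrete, here is a minimal sketch of a plain Differential Evolution loop (the textbook rand/1/bin variant). It is not Charibde: there is no interval Branch and Bound and no proof of optimality, and the function name, parameter names, and default values are assumptions chosen for this illustration.

```python
import numpy as np

def differential_evolution(f, bounds, pop_size=20, F=0.7, CR=0.9,
                           generations=200, seed=0):
    # bounds: list of (low, high) pairs, one per dimension.
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    dim = lo.size
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fitness = np.array([f(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            # Mutation, clipped back into the search box.
            mutant = np.clip(a + F * (b - c), lo, hi)
            # Binomial crossover; at least one coordinate comes from the mutant.
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            # Greedy selection: the trial replaces its parent only if not worse.
            f_trial = f(trial)
            if f_trial <= fitness[i]:
                pop[i], fitness[i] = trial, f_trial
    best = int(np.argmin(fitness))
    return pop[best], float(fitness[best])
```

A call such as `differential_evolution(lambda x: float(np.sum(x**2)), [(-5, 5), (-5, 5)])` returns the best point found and its value; nothing in this loop certifies that the point is a global minimum, which is the gap the interval-based component of a hybrid is meant to close.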

    Scalable Graph Algorithms using Practically Efficient Data Reductions

    From image co-segmentation to discrete optimization in computer vision - the exploration on graphical model, statistical physics, energy minimization, and integer programming

    This dissertation aims to explore the ideas and frameworks for solving the discrete optimization problem in computer vision. Much of the work is inspired by the study of the image co-segmentation problem. It is through the research on this topic that the author has become very familiar with the graphical model and energy minimization point of view in handling computer vision problems - that is, how to combine the local information with the neighborhood interaction information in the graphical system for the inference; and also the author has come to the realization that many problems in and beyond computer vision can be solved in that way. At the beginning of this dissertation, we first give a comprehensive background review on graphical model, energy minimization, integer programming, as well as all their connections with the fundamental statistical physics. We aim to review the various aspects of the concepts, models, algorithms, etc., in a systematic way and from a different perspective. For instance, we review the correspondences between the commonly used unary/binary energy objective terms in computer vision with those of the fundamental Ising model in statistical physics; and also we summarize several widely used discrete energy minimization algorithms in computer vision under a unified framework in statistical physics; in addition we stress the close connections between the graphical model energy minimization and the integer programming problems, and especially we point out the central role of Mixed-Integer Quadratic Programming in discrete optimization in and beyond computer vision. Moreover, we explore the relationship between integer programming and energy minimization experimentally. We test integer programming methods on randomly generated energy formulations (as those would appear in computer vision problems), and similarly energy minimization methods on the integer programming problem of Graph K-coloring. 
Therefore we can easily compare the optimization performance of various methods (no matter whether they are designed for energy minimization or integer programming) on one platform. We come to the conclusion that sharing the methods across the fields (energy minimization in computer vision and integer programming in applied mathematics) is very helpful and beneficial. Based on the statistical physics inspired energy minimization framework we obtained, we formulate the task of density based clustering into this formulation. Energy is defined in terms of inhomogeneity in local point density. A sequence of energy minima is found to recursively partition the points, and thus we find a hierarchical embedding of clusters that are increasingly homogeneous in density. Energy is expressed as the sum of a unary (data) term and a binary (smoothness) term. The only parameter required to be specified by the user is a homogeneity criterion - the degree of acceptable fluctuation in density within a cluster. Thus, we do not have to specify, for example, the number of clusters present. Disjoint clusters with the same density are identified separately. Experimental results show that our method is able to handle clusters of different shapes, sizes and densities. We present the performance of our approach using the energy optimization algorithms ICM, LBP, Graph-cut, and Mean field theory algorithm. We also show that the family of commonly used spectral, graph clustering algorithms (such as Normalized-cut) is a special case of our formulation, using only the binary energy term while ignoring the unary term. After all the discussions above on the general framework for solving the discrete optimization problem in computer vision, the dissertation then focuses on the study of image co-segmentation, which was in fact carried out before the above topics. Image co-segmentation is the task of automatically discovering, locating and segmenting some unknown common object in a set of images. 
It has become a popular research topic in computer vision during recent years. The unsupervised nature is an important characteristic of the problem; i.e., the common object is a priori unknown. Moreover, the common object may be subject to viewpoint change, lighting condition change, occlusion, and deformation across the images; all these conditions make the co-segmentation task very challenging. In this part of the study we focus on the research of image co-segmentation and propose various approaches for addressing this problem. Most existing co-segmentation methods focus on co-segmenting the images with a very dominant common object, where the background interference is very limited. Such images are not realistic for the co-segmentation task, since in practice we may always encounter images with very rich and complex content where the common object is not dominant and appears simultaneously along with a large number of other objects. In this work we aim to address the image co-segmentation problem on this kind of image that cannot be handled properly with many previous methods. Two distinct approaches have been proposed in this work for image co-segmentation; the key difference lies in the method of common object discovery. The first approach is a "topology" based approach (also called a "point-region" approach) while the second one is a "sparse optimization" based approach. Specifically, in the first approach we combine the image key point features with the segment features together to discover the common object, while relying on the local topology consistency of both key point and segment layout for the robust recognition. The obtained initial foreground (the common object) in each image is refined through graphical model energy minimization based on a global appearance model extracted from the entire image dataset. 
The second approach is inspired by sparse optimization techniques; in this approach we use a sparse approximation scheme to find the optimal correspondence of the segments in two images as the initial estimation of the common object, based on some linear additive features extracted from the segments. In both proposed approaches, we emphasize the exploration of inter-image information in all steps of the algorithms; therefore, the common object need not be dominant or salient in each individual image, as long as it is "common" across the image set. Extensive experiments have been conducted in this study to validate the performance of the proposed approaches. We carry out experiments on the widely used benchmark datasets for image co-segmentation, including the iCoseg dataset, the multi-view co-segmentation dataset, the Oxford flower dataset and so forth. Besides the above datasets, in order to better evaluate the performance on rich and complex images with a non-dominant common object, we also propose a new dataset in this work called richCoseg. Experiments are also conducted on this new dataset, and qualitative and quantitative comparisons with recent methods are provided. Finally, this dissertation also discusses very briefly some other vision problems the author has studied in previously published works.
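
The abstract above describes an energy made of a unary (data) term plus a binary (smoothness) term and names ICM among the minimizers it uses. The sketch below shows that structure on a toy graph, assuming a Potts-style smoothness penalty with a single weight; the functions `energy` and `icm`, the penalty form, and the data layout are assumptions for illustration and are not taken from the dissertation.

```python
import numpy as np

def energy(labels, unary, edges, weight):
    # Unary (data) term: cost of the label chosen at each node.
    data = sum(unary[i, labels[i]] for i in range(len(labels)))
    # Binary (smoothness) term: fixed penalty for each edge whose ends disagree.
    smooth = sum(weight for i, j in edges if labels[i] != labels[j])
    return float(data + smooth)

def icm(unary, edges, weight, max_sweeps=20):
    # Iterated Conditional Modes: repeatedly set each node to the label that
    # is cheapest given the current labels of its neighbours.
    n, k = unary.shape
    labels = unary.argmin(axis=1)          # start from the best unary label
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            costs = [unary[i, l] + weight * sum(labels[j] != l for j in neighbors[i])
                     for l in range(k)]
            new = int(np.argmin(costs))
            if new != labels[i]:
                labels[i] = new
                changed = True
        if not changed:
            break
    return labels
```

On a five-node chain where only the middle node weakly prefers a different label, the smoothness term outweighs that preference and ICM relabels the node to match its neighbours. ICM only guarantees that the energy does not increase, so it can stop in a local minimum.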

    Advances and Novel Approaches in Discrete Optimization

    Discrete optimization is an important area of Applied Mathematics with a broad spectrum of applications in many fields. This book results from a Special Issue in the journal Mathematics entitled ‘Advances and Novel Approaches in Discrete Optimization’. It contains 17 articles covering a broad spectrum of subjects which have been selected from 43 submitted papers after a thorough refereeing process. Among other topics, it includes seven articles dealing with scheduling problems, e.g., online scheduling, batching, dual and inverse scheduling problems, or uncertain scheduling problems. Other subjects are graphs and applications, evacuation planning, the max-cut problem, capacitated lot-sizing, and packing algorithms