642 research outputs found

    Optimal Trees

    Not Available

    Efficient mining of discriminative molecular fragments

    Frequent pattern discovery in structured data is receiving increasing attention in many application areas of science. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context, high-performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimate is available for task workloads, which show a power-law distribution over a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute's HIV-screening dataset.

    Capacitated Trees, Capacitated Routing, and Associated Polyhedra

    We study the polyhedral structure of two related core combinatorial problems: the subtree cardinality-constrained minimal spanning tree problem and the identical customer vehicle routing problem. For each of these problems, and for a forest relaxation of the minimal spanning tree problem, we introduce a number of new valid inequalities and specify conditions under which these inequalities are facets of the associated integer polyhedra. The inequalities are defined by one of several underlying support graphs: (i) a multistar, a "star" with a clique replacing the central vertex; (ii) a clique cluster, a collection of cliques intersecting at a single vertex, or more generally at a "central" clique; and (iii) a ladybug, consisting of a multistar as a head and a clique as a body. We also consider packing (generalized subtour elimination) constraints, as well as several variants of our basic inequalities, such as partial multistars, whose satellite vertices need not be connected to all of the central vertices. Our development highlights the relationship between the capacitated tree and capacitated forest polytopes and a so-called path-partitioning polytope, and shows how to use monotone polytopes and a set of simple exchange arguments to prove that valid inequalities are facets.

    The capacitated minimum spanning tree problem

    In this thesis we focus on the Capacitated Minimum Spanning Tree (CMST), an extension of the minimum spanning tree (MST) that considers a central or root vertex which receives and sends commodities (information, goods, etc.) to a group of terminals. Such commodities flow through links whose capacities limit the total flow they can accommodate. These capacity constraints on the links are of interest because in many applications the capacity limits are inherent. The CMST finds applications in the same areas as the MST: telecommunications network design, facility location planning, and vehicle routing. The CMST arises in telecommunications network design when the presence of a central server is compulsory and the flow of information is limited by the capacity of either the server or the connection lines. Its study is also especially interesting in the context of the vehicle routing problem, because of the usefulness of spanning trees in constructive methods. Simply by adding capacity constraints to the MST problem, we move from a polynomially solvable problem to an NP-hard one. In the first chapter we describe and define the problem, introduce some notation, and present a review of the existing literature. This review includes formulations and exact methods as well as the most relevant heuristic approaches. In the second chapter, two basic formulations and the most commonly used valid inequalities are presented. In the third chapter we present two new formulations for the CMST, which are based on the identification of subroots (vertices directly connected to the root). One way of characterizing CMST solutions is by identifying the subroots and the vertices assigned to them. Both formulations use binary decision variables y to identify the subroots. Additional decision variables x are used to represent the elements (arcs) of the tree. 
In the second formulation the set of x variables is extended to indicate the depth of the arcs in the tree. For each formulation we present families of valid inequalities and address the separation problem in each case. A solution algorithm is also proposed. In the fourth chapter we present a biased random-key genetic algorithm (BRKGA) for the CMST. BRKGA is a population-based metaheuristic that has been used for combinatorial optimization. Decoders, solution representation, and exploration strategies are presented and discussed. A final algorithm to obtain upper bounds for the CMST is proposed. Numerical results for the BRKGA and for two cutting plane algorithms based on the new formulations are presented in the fifth chapter, where they are also discussed and analyzed. The conclusions of this thesis are presented in the last chapter, which also includes the opportunity areas suitable for future research.
In this thesis we focus on the Capacitated Minimum Spanning Tree problem (CMST), an extension of the minimum spanning tree problem (MST). The CMST considers a root vertex that acts as a central server and sends and receives goods (information, objects, etc.) to and from a set of vertices called terminals. Goods can flow between the server and the terminals only through links of limited capacity. These restrictions on the links make the problem relevant, since in many applications the capacity constraints are of vital importance. The most important application areas of the CMST include telecommunications network design, vehicle routing, and location problems. 
In telecommunications network design, the CMST arises when there is a central server whose transmission and sending capacity is limited by the characteristics of the server's ports or of the transmission lines. In vehicle routing, the CMST is relevant because of the influence that trees can have on the process of constructing solutions. Simply by adding the capacity constraints, the problem goes from being solvable exactly in polynomial time with a greedy algorithm to one that is very hard to solve exactly. The first chapter describes and defines the problem, introduces notation, and reviews the existing literature. This review covers formulations, exact methods, and the most important heuristic methods. The next chapter presents two existing binary formulations, together with the valid inequalities most commonly used to solve the CMST. For each of the proposed formulations, a cutting plane algorithm is described. Two new formulations for the CMST are presented in the third chapter. These formulations are based on identifying a special type of vertex called a subroot. Subroots are the vertices directly connected to the root. One way to characterize CMST solutions is to identify the subroot nodes and the nodes that depend on them. Both formulations use variables to identify the subroots and additional variables to identify the arcs that belong to the tree. In addition, the variables in the second formulation help identify the depth, relative to the root, at which those arcs lie. For each formulation, valid inequalities are presented and procedures for solving their separation problem are proposed. 
The fourth chapter presents a genetic algorithm called BRKGA for solving the CMST. BRKGA is based on populations generated from sequences of random numbers, which subsequently evolve. Different decoders, a local search method, search spaces, and exploration strategies are presented and analyzed. The chapter closes with a final algorithm that yields upper bounds for the CMST. Computational results for the BRKGA and for the two cutting plane algorithms based on the proposed formulations are given in the fifth chapter, where they are also analyzed and discussed. The thesis ends with the conclusions drawn from the research, together with the opportunity areas in which future research is possible.
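
The subroot view of a CMST solution described in this abstract can be illustrated with a small feasibility check. This is only a sketch under assumptions that the abstract does not state: a single uniform capacity, a demand per terminal, and a tree stored as a parent map. The names `subroot_loads`, `is_capacity_feasible`, `parent`, `demand`, and `capacity` are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def subroot_loads(parent, demand, root=0):
    """Return {subroot: total demand of the subtree hanging from it}.

    `parent` maps every non-root vertex to its parent and is assumed
    to describe a valid tree rooted at `root` (no cycles).
    """
    def subroot_of(v):
        # Walk towards the root until reaching a vertex attached to it.
        while parent[v] != root:
            v = parent[v]
        return v

    loads = defaultdict(int)
    for v in parent:
        loads[subroot_of(v)] += demand[v]
    return dict(loads)


def is_capacity_feasible(parent, demand, capacity, root=0):
    """Check that the arc joining each subroot to the root carries at most `capacity`."""
    loads = subroot_loads(parent, demand, root)
    return all(load <= capacity for load in loads.values())


# Hypothetical instance: vertices 1 and 4 are subroots of root 0.
parent = {1: 0, 2: 1, 3: 1, 4: 0, 5: 4}
demand = {v: 1 for v in parent}  # unit demands (assumption)
print(subroot_loads(parent, demand))
print(is_capacity_feasible(parent, demand, capacity=3))
```

Because every unit of flow from a subtree must cross the arc between its subroot and the root, checking the load per subroot is enough in this uniform-capacity setting.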

    Dynamic load balancing for the distributed mining of molecular structures

    In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute's HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
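
The general idea of receiver-initiated load balancing, in which an idle worker asks a busy one for work, can be sketched as a single-process simulation. The abstract does not describe the paper's algorithm in detail, so everything here is an assumption made for illustration: the donor selection rule (largest queue), the transfer size (half of the donor's queue), unit-cost tasks, and the name `simulate`.

```python
from collections import deque


def simulate(initial_tasks):
    """Simulate workers that pull work from the busiest peer when idle.

    Returns (tasks processed per worker, number of transfers).
    """
    queues = [deque(tasks) for tasks in initial_tasks]
    processed = [0] * len(queues)
    transfers = 0
    while any(queues):
        for w, queue in enumerate(queues):
            if not queue:
                # Receiver-initiated step: the idle worker picks a donor.
                donor = max(range(len(queues)), key=lambda i: len(queues[i]))
                if len(queues[donor]) > 1:
                    # Move half of the donor's pending tasks (assumed policy).
                    for _ in range(len(queues[donor]) // 2):
                        queue.append(queues[donor].pop())
                    transfers += 1
            if queue:
                queue.popleft()  # process one unit-cost task
                processed[w] += 1
    return processed, transfers


# All work starts on worker 0; workers 1 and 2 begin idle.
processed, transfers = simulate([list(range(8)), [], []])
print(processed, transfers)
```

In a real distributed setting the request and the transfer would be messages between peers, and task costs would be irregular and unknown in advance, which is the situation the abstract describes.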

    Structured Sparsity: Discrete and Convex approaches

    Compressive sensing (CS) exploits sparsity to recover sparse or compressible signals from dimensionality reducing, non-adaptive sensing mechanisms. Sparsity is also used to enhance interpretability in machine learning and statistics applications: while the ambient dimension is vast in modern data analysis problems, the relevant information therein typically resides in a much lower dimensional space. However, many solutions proposed nowadays do not leverage the true underlying structure. Recent results in CS extend the simple sparsity idea to more sophisticated structured sparsity models, which describe the interdependency between the nonzero components of a signal, increasing the interpretability of the results and leading to better recovery performance. In order to better understand the impact of structured sparsity, in this chapter we analyze the connections between the discrete models and their convex relaxations, highlighting their relative advantages. We start with the general group sparse model and then elaborate on two important special cases: the dispersive and the hierarchical models. For each, we present the models in their discrete nature, discuss how to solve the ensuing discrete problems and then describe convex relaxations. We also consider more general structures as defined by set functions and present their convex proxies. Further, we discuss efficient optimization solutions for structured sparsity problems and illustrate structured sparsity in action via three applications. Comment: 30 pages, 18 figures
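
A common convex proxy for group sparsity is the sum of the Euclidean norms of the groups, whose proximal operator for non-overlapping groups is block soft-thresholding. The sketch below illustrates that standard operator only. It assumes disjoint groups, whereas the chapter may treat more general group structures, and the names `group_soft_threshold`, `groups`, and `lam` are hypothetical.

```python
import math


def group_soft_threshold(x, groups, lam):
    """Shrink each group of coefficients towards zero by `lam` in l2 norm.

    `groups` is a list of disjoint index lists. A group whose norm is
    at most `lam` is set to zero as a whole.
    """
    out = list(x)
    for group in groups:
        norm = math.sqrt(sum(x[i] ** 2 for i in group))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in group:
            out[i] = scale * x[i]
    return out


x = [3.0, 4.0, 0.1, -0.2]
groups = [[0, 1], [2, 3]]  # two disjoint groups (assumption)
shrunk = group_soft_threshold(x, groups, lam=1.0)
print(shrunk)
```

The first group has norm 5 and is scaled by 0.8, while the second has norm below the threshold and is zeroed entirely, which is how this operator selects or discards whole groups.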