
    Lo-Hi: Practical ML Drug Discovery Benchmark

    Finding new drugs is getting harder and harder. One of the hopes of drug discovery is to use machine learning models to predict molecular properties. That is why models for molecular property prediction are being developed and tested on benchmarks such as MoleculeNet. However, existing benchmarks are unrealistic and are too different from applying the models in practice. We have created a new practical Lo-Hi benchmark consisting of two tasks: Lead Optimization (Lo) and Hit Identification (Hi), corresponding to the real drug discovery process. For the Hi task, we designed a novel molecular splitting algorithm that solves the Balanced Vertex Minimum k-Cut problem. We tested state-of-the-art and classic ML models, revealing which works better under practical settings. We analyzed modern benchmarks and showed that they are unrealistic and overoptimistic.
    Review: https://openreview.net/forum?id=H2Yb28qGLV
    Lo-Hi benchmark: https://github.com/SteshinSS/lohi_neurips2023
    Lo-Hi splitter library: https://github.com/SteshinSS/lohi_splitter
    Comment: 29 pages, Advances in Neural Information Processing Systems, 202

    Casting Light on the Hidden Bilevel Combinatorial Structure of the Capacitated Vertex Separator Problem

    Given an undirected graph, we study the capacitated vertex separator problem that asks to find a subset of vertices of minimum cardinality, the removal of which induces a graph having a bounded number of pairwise disconnected shores (subsets of vertices) of limited cardinality. The problem is of great importance in the analysis and protection of communication or social networks against possible viral attacks and for matrix decomposition algorithms. In this article, we provide a new bilevel interpretation of the problem and model it as a two-player Stackelberg game in which the leader interdicts the vertices (i.e., decides on the subset of vertices to remove), and the follower solves a combinatorial optimization problem on the resulting graph. This approach allows us to develop a computational framework based on an integer programming formulation in the natural space of the variables. Thanks to this bilevel interpretation, we derive three different families of strengthening inequalities and show that they can be separated in polynomial time. We also show how to extend these results to a min-max version of the problem. Our extensive computational study conducted on available benchmark instances from the literature reveals that our new exact method is competitive against the state-of-the-art algorithms for the capacitated vertex separator problem and is able to improve the best-known results for several difficult classes of instances. The ideas exploited in our framework can also be extended to other vertex/edge deletion/insertion problems or graph partitioning problems by modeling them as two-player Stackelberg games and solving them through bilevel optimization.
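    To make the problem definition above concrete, the following is a minimal sketch of a feasibility check for a candidate separator. It is not the authors' bilevel method. It assumes that a shore may be a union of connected components, so the check packs component sizes into at most k shores of capacity cap. The names `components`, `can_pack`, `is_feasible_separator`, `k`, and `cap` are illustrative and do not come from the paper.

```python
def components(adj, removed):
    """Connected components of the graph once the vertices in `removed` are deleted."""
    seen = set(removed)
    comps = []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def can_pack(sizes, k, cap):
    """Exact backtracking test: do the sizes fit into at most k shores of capacity cap?"""
    sizes = sorted(sizes, reverse=True)
    loads = [0] * k

    def place(i):
        if i == len(sizes):
            return True
        tried = set()  # skip shores with an identical load (symmetric choices)
        for j in range(k):
            if loads[j] + sizes[i] <= cap and loads[j] not in tried:
                tried.add(loads[j])
                loads[j] += sizes[i]
                if place(i + 1):
                    return True
                loads[j] -= sizes[i]
        return False

    return place(0)


def is_feasible_separator(adj, removed, k, cap):
    """True if deleting `removed` leaves components that fit into k shores of size <= cap."""
    sizes = [len(c) for c in components(adj, set(removed))]
    return can_pack(sizes, k, cap)


# Path graph 0-1-2-3-4 as an adjacency dictionary (example data, assumed).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(is_feasible_separator(path, {2}, k=2, cap=2))
```

    On this path, removing vertex 2 leaves two components of size 2, which fit into two shores of capacity 2. Removing vertex 1 leaves a component of size 3, which does not.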

    Models and algorithms for decomposition problems

    This thesis deals with decomposition both as a solution method and as a problem in itself. A decomposition approach can be very effective for mathematical problems with a specific structure in which the associated coefficient matrix is sparse and block-diagonalizable. However, this kind of structure may not be evident from the most natural formulation of the problem. The coefficient matrix may therefore be preprocessed by solving a structure detection problem in order to understand whether a decomposition method can be applied successfully. Accordingly, this thesis deals with the k-Vertex Cut problem, that is, the problem of finding the minimum subset of nodes whose removal disconnects a graph into at least k components, which models relevant applications in matrix decomposition for solving systems of equations by parallel computing. The capacitated k-Vertex Separator problem, instead, asks to find a subset of vertices of minimum cardinality whose deletion disconnects a given graph into at most k shores, where the size of each shore must not exceed a given capacity value. This problem is also of great importance for matrix decomposition algorithms. This thesis also addresses the Chance-Constrained Mathematical Program, which represents a significant example in which decomposition techniques can be successfully applied. This is a class of stochastic optimization problems in which the feasible region depends on the realization of a random variable and the solution must optimize a given objective function while belonging to the feasible region with a probability that must be above a given value. In this thesis, a decomposition approach for this problem is introduced. The thesis also addresses the Fractional Knapsack Problem with Penalties, a variant of the knapsack problem in which items can be split at the expense of a penalty depending on the fractional quantity.
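    The verbal description of the chance-constrained program above can be written generically as follows. The notation is an assumption made for illustration and is not taken from the thesis: x is the decision vector, f the objective (minimization is assumed), ξ the random variable, X(ξ) the realization-dependent feasible region, and 1 − ε the required probability level.

```latex
\begin{aligned}
\min_{x} \quad & f(x) \\
\text{s.t.} \quad & \mathbb{P}_{\xi}\bigl( x \in X(\xi) \bigr) \;\ge\; 1 - \varepsilon
\end{aligned}
```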

    Causal failures and cost-effective edge augmentation in networks

    Node failures can severely damage the connectivity of a network. In traditional models, the failures of nodes affect their neighbors and may further trigger the failures of their neighbors, and so on. However, it is also possible that node failures would indirectly cause the failure of nodes that are not adjacent to the failed one. In a power grid, generators share the load. Failure of one generator induces extra load on other generators in the network, which could further trigger their failures. We call such failures causal failures. In this dissertation, we consider the impact of causal failures on multiple aspects of one network. More specifically, the content is as follows.
    • In Chapter 1, we introduce basic concepts of networks and graphs and classical models of failures, and formally define causal failures in a given network.
    • Chapter 2 addresses the network’s robustness and aims to find the maximum number of causal failures while maintaining a connected component with a size of at least a given integer. More specifically, we look into the number of causal node failures we can tolerate while still having most of the system connected, with α used to parametrize this requirement.
    • Chapter 3 deals with vulnerability, wherein we aim to find the minimum number of causal failures such that there are at least k connected components remaining. We look for the set of causal failures that will result in the network being disconnected into k or more components.
    • In Chapter 4, we consider causal node failures occurring in a cascading manner. Cascading causal node failures affect communication between nodes, which depends on the paths that connect them. Therefore, in the context of the cascading causal failure model, we study the impact of cascading causal failures on the distance between a pair of nodes in the network. More precisely, given a network G, a set of causal failures (containing possible cascading failures), a pair of nodes s and t, and a constant α ≥ 1, we would like to determine the maximum number of causal failures that can be applied (meaning that the nodes in the causal failures are removed), such that in the resulting network G′, dG′(s, t) ≤ α × dG(s, t), where dG(s, t) and dG′(s, t) are the distances between nodes s and t in the networks G and G′, respectively.
    • In Chapter 5, we consider causal edge failures in flow networks and investigate the impact of causal edge failures on flow transmission. We formulate an optimization problem to find the maximum number of causal edge failures after which the flow network can still deliver d units from source node s to terminal node t.
    • In Chapter 6, we consider edge-weighted network augmentation when facing causal failures. We look for a set of edges with minimum weight such that the network maintains an α-giant component when applying each causality individually.
    We show that the optimization problems in these chapters are NP-hard and provide the corresponding mixed integer linear programming models. Moreover, we design polynomial-time heuristic algorithms to solve them approximately. In each chapter, we run experiments on multiple synthetic and real networks to compare the performance of the mixed integer linear programming models and the heuristic algorithms. The results demonstrate the efficacy and efficiency of the heuristic algorithms compared to the mixed integer linear programming models.
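    The distance condition from Chapter 4, dG′(s, t) ≤ α × dG(s, t), can be checked for one candidate set of removed nodes with two breadth-first searches. The sketch below only tests the condition on an unweighted graph. It does not model how causal failures cascade, and it is not the dissertation's MILP or heuristic. The function names and the example cycle are assumptions.

```python
from collections import deque
import math


def bfs_distance(adj, s, t, removed=frozenset()):
    """Hop distance from s to t, ignoring removed nodes (math.inf if unreachable)."""
    if s in removed or t in removed:
        return math.inf
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            return dist[v]
        for w in adj[v]:
            if w not in removed and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return math.inf


def within_stretch(adj, failed, s, t, alpha):
    """True if removing `failed` keeps d_G'(s, t) <= alpha * d_G(s, t)."""
    before = bfs_distance(adj, s, t)
    after = bfs_distance(adj, s, t, frozenset(failed))
    return after <= alpha * before


# Six-node cycle 0-1-2-3-4-5-0 (example data, assumed).
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(within_stretch(cycle, {1}, s=0, t=2, alpha=2.0))
```

    In the cycle, removing node 1 lengthens the route from 0 to 2 from 2 hops to 4. The failure is therefore tolerated for α = 2 but not for α = 1.5.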

    The vertex k-cut problem

    Given an undirected graph G=(V,E), a vertex k-cut of G is a subset of V whose removal disconnects the graph into at least k components. Given a graph G and an integer k≥2, the vertex k-cut problem consists in finding a vertex k-cut of G of minimum cardinality. We first prove that the problem is NP-hard for any fixed k≥3. We then present a compact formulation, and an extended formulation from which we derive a column generation and a branching scheme. Extensive computational results prove the effectiveness of the proposed methods.
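    As an illustration of the definition above, a minimum vertex k-cut of a very small graph can be found by exhaustive enumeration. This sketch is exponential in the number of vertices and has nothing to do with the compact or extended formulations of the paper. The names `count_components` and `min_vertex_k_cut` are hypothetical.

```python
from itertools import combinations


def count_components(adj, removed):
    """Number of connected components left after deleting the vertices in `removed`."""
    seen = set(removed)
    count = 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        stack = [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count


def min_vertex_k_cut(adj, k):
    """Smallest vertex set whose removal leaves at least k components, or None if none exists."""
    nodes = list(adj)
    for size in range(len(nodes) + 1):  # try cuts in order of increasing cardinality
        for cut in combinations(nodes, size):
            if count_components(adj, set(cut)) >= k:
                return set(cut)
    return None


# Path graph 0-1-2-3-4 (example data, assumed).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(min_vertex_k_cut(path, 3))
```

    For the path with k = 3, no single vertex suffices, and the first pair found that leaves three components is {1, 3}. A complete graph has no vertex k-cut for k ≥ 2, so the function returns None there.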
