7 research outputs found

    Interpretable network propagation with application to expanding the repertoire of human proteins that interact with SARS-CoV-2

    BACKGROUND: Network propagation has been widely used for nearly 20 years to predict gene functions and phenotypes. Despite the popularity of this approach, little attention has been paid to the question of provenance tracing in this context, e.g., determining how much any experimental observation in the input contributes to the score of every prediction. RESULTS: We design a network propagation framework with 2 novel components and apply it to predict human proteins that directly or indirectly interact with SARS-CoV-2 proteins. First, we trace the provenance of each prediction to its experimentally validated sources, which in our case are human proteins experimentally determined to interact with viral proteins. Second, we design a technique that helps to reduce the manual adjustment of parameters by users. We find that for every top-ranking prediction, the highest contribution to its score arises from a direct neighbor in a human protein-protein interaction network. We further analyze these results to develop functional insights on SARS-CoV-2 that expand on known biology such as the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents. CONCLUSIONS: We examine how our provenance-tracing method can be generalized to a broad class of network-based algorithms. We provide a useful resource for the SARS-CoV-2 community that implicates many previously undocumented proteins with putative functional relationships to viral infection. This resource includes potential drugs that can be opportunistically repositioned to target these proteins. We also discuss how our overall framework can be extended to other, newly emerging viruses. DBI-1759858 - National Science Foundation; Boston University. Published version
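
    One way to see why provenance tracing is possible: when propagation is linear in its input, the score of each prediction decomposes exactly into per-source contributions. The sketch below assumes a random walk with restart on an unweighted, undirected graph. This is a common propagation scheme, but the abstract does not say which scheme the authors use. The function names, the restart parameter `alpha`, and the toy graph are illustrative assumptions, not the paper's implementation or data.

```python
def propagate(adj, seed, alpha=0.5, iters=200):
    """Random walk with restart (assumed scheme): p = alpha*W*p + (1-alpha)*seed.

    adj maps each node to a list of neighbours (undirected, no isolated nodes).
    seed maps source nodes to their initial weight.
    """
    degree = {u: len(adj[u]) for u in adj}
    score = {u: seed.get(u, 0.0) for u in adj}
    for _ in range(iters):
        nxt = {}
        for u in adj:
            # Each neighbour v spreads its score evenly over its own edges.
            inflow = sum(score[v] / degree[v] for v in adj[u])
            nxt[u] = alpha * inflow + (1 - alpha) * seed.get(u, 0.0)
        score = nxt
    return score


def trace_provenance(adj, sources, alpha=0.5):
    """Propagate from each source alone; by linearity these sum to the total."""
    return {s: propagate(adj, {s: 1.0}, alpha) for s in sources}


# Hypothetical toy network: path a - b - c - d plus the edge b - d.
adj = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b", "d"],
    "d": ["b", "c"],
}
sources = ["a", "d"]  # stand-ins for experimentally validated proteins

total = propagate(adj, {s: 1.0 for s in sources})
contrib = trace_provenance(adj, sources)

for node in adj:
    parts = {s: round(contrib[s][node], 4) for s in sources}
    print(node, round(total[node], 4), parts)
```

    For each node, the printed per-source parts add up to its total score, so the largest part identifies the source that contributes most to that prediction.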

    Faster Algorithms for Edge Connectivity via Random 2-Out Contractions

    We provide a simple new randomized contraction approach to the global minimum cut problem for simple undirected graphs. The contractions exploit 2-out edge sampling from each vertex rather than the standard uniform edge sampling. We demonstrate the power of our new approach by obtaining better algorithms for sequential, distributed, and parallel models of computation. Our end results include the following randomized algorithms for computing edge connectivity with high probability:
    -- Two sequential algorithms with complexities $O(m \log n)$ and $O(m+n \log^3 n)$. These improve on a long line of developments including a celebrated $O(m \log^3 n)$ algorithm of Karger [STOC'96] and the state of the art $O(m \log^2 n (\log\log n)^2)$ algorithm of Henzinger et al. [SODA'17]. Moreover, our $O(m+n \log^3 n)$ algorithm is optimal whenever $m = \Omega(n \log^3 n)$. Within our new time bounds, whp, we can also construct the cactus representation of all minimal cuts.
    -- An $\tilde{O}(n^{0.8} D^{0.2} + n^{0.9})$ round distributed algorithm, where $D$ denotes the graph diameter. This improves substantially on a recent breakthrough of Daga et al. [STOC'19], which achieved a round complexity of $\tilde{O}(n^{1-1/353}D^{1/353} + n^{1-1/706})$, hence providing the first sublinear distributed algorithm for exactly computing the edge connectivity.
    -- The first $O(1)$ round algorithm for the massively parallel computation setting with linear memory per machine.
    Comment: algorithms and data structures, graph algorithms, edge connectivity, out-contractions, randomized algorithms, distributed algorithms, massively parallel computation
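
    The 2-out sampling step mentioned in the abstract can be illustrated as follows. This is a minimal sketch of the contraction step only, not the paper's full minimum-cut algorithm. It assumes each vertex picks its two incident edges independently and uniformly at random with replacement; the function name `two_out_contract`, the union-find helper, and the example graph are hypothetical.

```python
import random


def two_out_contract(adj, rng):
    """Contract the edges chosen by 2-out sampling in a simple undirected graph.

    adj maps each vertex to a list of neighbours. Returns (label, edges):
    label maps each vertex to its supernode, and edges lists the original
    edges that still join two different supernodes.
    """
    parent = {v: v for v in adj}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for v, nbrs in adj.items():
        if not nbrs:
            continue
        for _ in range(2):  # two samples per vertex (assumed: with replacement)
            u = rng.choice(nbrs)
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv  # contract the sampled edge {u, v}

    label = {v: find(v) for v in adj}
    edges = [
        (label[v], label[u])
        for v, nbrs in adj.items()
        for u in nbrs
        if v < u and label[v] != label[u]
    ]
    return label, edges


# Hypothetical example: two 4-cliques joined by the single edge (3, 4).
input_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
               (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
               (3, 4)]
adj = {v: [] for v in range(8)}
for a, b in input_edges:
    adj[a].append(b)
    adj[b].append(a)

label, edges = two_out_contract(adj, random.Random(0))
print("supernodes:", len(set(label.values())), "remaining edges:", len(edges))
```

    Because every vertex with at least one neighbour is merged with a neighbour, each supernode contains at least two vertices, so one round at least halves the vertex count. Whether a particular minimum cut survives depends on the random choices, which is why the abstract's guarantees are stated with high probability.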

    A Dynamic Shortest Paths Toolbox: Low-Congestion Vertex Sparsifiers and their Applications

    We present a general toolbox, based on new vertex sparsifiers, for designing data structures to maintain shortest paths in dynamic graphs. In an $m$-edge graph undergoing edge insertions and deletions, our data structures give the first algorithms for maintaining (a) $m^{o(1)}$-approximate all-pairs shortest paths (APSP) with \emph{worst-case} update time $m^{o(1)}$ and query time $\tilde{O}(1)$, and (b) a tree $T$ that has diameter no larger than a subpolynomial factor times the diameter of the underlying graph, where each update is handled in amortized subpolynomial time. In graphs undergoing only edge deletions, we develop a simpler and more efficient data structure to maintain a $(1+\epsilon)$-approximate single-source shortest paths (SSSP) tree $T$ in a graph undergoing edge deletions in amortized time $m^{o(1)}$ per update. Our data structures are deterministic. The trees we can maintain are not subgraphs of $G$, but embed with small edge congestion into $G$. This is in stark contrast to previous approaches and is useful for algorithms that internally use trees to route flow. To illustrate the power of our new toolbox, we show that our SSSP data structure gives simple deterministic implementations of flow-routing MWU methods in several contexts, where previously only randomized methods had been known. To obtain our toolbox, we give the first algorithm that, given a graph $G$ undergoing edge insertions and deletions and a dynamic terminal set $A$, maintains a vertex sparsifier $H$ that approximately preserves distances between terminals in $A$, consists of at most $|A|m^{o(1)}$ vertices and edges, and can be updated in worst-case time $m^{o(1)}$. Crucially, our vertex sparsifier construction allows us to maintain a low edge-congestion embedding of $H$ into $G$, which is needed for our applications.