    Counting spanning trees in a small-world Farey graph

    The problem of spanning trees is closely related to various interesting problems in the area of statistical physics, but determining the number of spanning trees in general networks is computationally intractable. In this paper, we perform a study on the enumeration of spanning trees in a specific small-world network with an exponential distribution of vertex degrees, which is called a Farey graph since it is associated with the famous Farey sequence. According to the particular network structure, we provide some recursive relations governing the Laplacian characteristic polynomials of a Farey graph and its subgraphs. Then, making use of these relations obtained here, we derive the exact number of spanning trees in the Farey graph, as well as an approximate numerical solution for the asymptotic growth constant characterizing the network. Finally, we compare our results with those of different types of networks previously investigated. Comment: Definitive version accepted for publication in Physica
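
As a rough illustration of what "counting spanning trees" means for small instances, the sketch below builds a Farey graph and counts its spanning trees by brute force with Kirchhoff's matrix-tree theorem. This is not the recursive method of the paper. The construction rule is an assumption, since the abstract does not spell it out: start from a single edge and, at each iteration, add one new vertex joined to both endpoints of every edge created in the previous iteration. The function names `farey_edges` and `count_spanning_trees` are hypothetical.

```python
from fractions import Fraction


def farey_edges(t):
    """Return (vertex_count, edge_list) after t iterations.

    Assumed construction: start from one edge; at each iteration attach a new
    vertex to both endpoints of every edge created in the previous iteration.
    """
    edges = [(0, 1)]
    fresh = [(0, 1)]  # edges created in the previous iteration
    n = 2
    for _ in range(t):
        created = []
        for u, v in fresh:
            w = n  # label of the new vertex
            n += 1
            created += [(u, w), (w, v)]
        edges += created
        fresh = created
    return n, edges


def count_spanning_trees(n, edges):
    """Matrix-tree theorem: determinant of the Laplacian with one row and column removed."""
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    m = [row[:-1] for row in lap[:-1]]  # drop the last row and column
    size = n - 1
    det = Fraction(1)
    # Exact Gaussian elimination over the rationals.
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            if factor:
                for c in range(col, size):
                    m[r][c] -= factor * m[col][c]
    return int(det)


for t in range(4):
    n, edges = farey_edges(t)
    print(t, n, len(edges), count_spanning_trees(n, edges))
```

Under the assumed construction, the graph after one iteration is a triangle, which has 3 spanning trees. Exact rational arithmetic keeps the determinant free of rounding error, but the cubic-time elimination only suits small graphs, which is where recursive relations such as those the paper describes become useful.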

    Parallel Graph Connectivity in Log Diameter Rounds

    We study the graph connectivity problem in the MPC model. On an undirected graph with $n$ nodes and $m$ edges, $O(\log n)$-round connectivity algorithms have been known for over 35 years. However, no algorithms with better complexity bounds were known. In this work, we give fully scalable, faster algorithms for the connectivity problem by parameterizing the time complexity as a function of the diameter of the graph. Our main result is an $O(\log D \log\log_{m/n} n)$ time connectivity algorithm for diameter-$D$ graphs, using $\Theta(m)$ total memory. If our algorithm can use more memory, it can terminate in fewer rounds, and there is no lower bound on the memory per processor. We extend our results to related graph problems such as spanning forest, finding a DFS sequence, exact/approximate minimum spanning forest, and bottleneck spanning forest. We also show that achieving similar bounds for reachability in directed graphs would imply faster boolean matrix multiplication algorithms. We introduce several new algorithmic ideas. We describe a general technique called double exponential speed problem size reduction, which roughly means that if we can use total memory $N$ to reduce a problem from size $n$ to $n/k$, for $k=(N/n)^{\Theta(1)}$, in one phase, then we can solve the problem in $O(\log\log_{N/n} n)$ phases. In order to achieve this fast reduction for graph connectivity, we use a multistep algorithm. One key step is a carefully constructed truncated broadcasting scheme where each node broadcasts neighbor sets to its neighbors in a way that limits the size of the resulting neighbor sets. Another key step is random leader contraction, where we choose a smaller set of leaders than many previous works do.
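
The phase bound for double exponential speed problem size reduction can be sanity-checked with a toy simulation. The sketch below only tracks the problem size, not any actual MPC algorithm. It assumes the hidden constant in the exponent is a fixed value `c`, so each phase divides the current size by `(N / size) ** c`. The name `phases_to_finish` and the parameter `c` are hypothetical.

```python
def phases_to_finish(n, total_memory, c=1.0):
    """Count phases until the problem size drops to at most 1.

    Each phase shrinks the size by k = (total_memory / size) ** c, an assumed
    concrete form of the k = (N/n)^Theta(1) reduction from the abstract.
    """
    size = float(n)
    phases = 0
    while size > 1:
        k = (total_memory / size) ** c
        if k <= 1:
            raise ValueError("total memory must exceed the current problem size")
        size /= k
        phases += 1
    return phases


# With N = 2n the memory-to-size ratio squares every phase (c = 1):
# 2, 4, 16, 256, 65536, ...
print(phases_to_finish(2**16, 2**17))
```

Because the ratio of memory to problem size is raised to the power `1 + c` in every phase, it grows doubly exponentially. In this example five phases are enough, whereas halving the size each phase would take sixteen.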

    Finding conserved patterns in biological sequences, networks and genomes

    Biological patterns are widely used for identifying biologically interesting regions within macromolecules, classifying biological objects, predicting functions and studying evolution. Good pattern finding algorithms will help biologists to formulate and validate hypotheses in an attempt to obtain important insights into the complex mechanisms of living things. In this dissertation, we aim to improve and develop algorithms for five biological pattern finding problems. For the multiple sequence alignment problem, we propose an alternative formulation in which a final alignment is obtained by preserving pairwise alignments specified by edges of a given tree. In contrast with traditional NP-hard formulations, our preserving alignment formulation can be solved in polynomial time without using a heuristic, while having very good accuracy. For the path matching problem, we take advantage of the linearity of the query path to reduce the problem to finding a longest weighted path in a directed acyclic graph. We can find k paths with top scores in a network from the query path in polynomial time. As many biological pathways are not linear, our graph matching approach allows a non-linear graph query to be given. Our graph matching formulation overcomes the common weakness of previous approaches that there is no guarantee on the quality of the results. For the gene cluster finding problem, we investigate a formulation based on constraining the overall size of a cluster and develop statistical significance estimates that allow direct comparisons of clusters of different sizes. We explore both a restricted version which requires that orthologous genes are strictly ordered within each cluster, and the unrestricted problem that allows paralogous genes within a genome and clusters that may not appear in every genome. We solve the first problem in polynomial time and develop practical exact algorithms for the second one.
    In the gene cluster querying problem, based on a querying strategy, we propose an efficient approach for investigating clustering of related genes across multiple genomes for a given gene cluster. By analyzing gene clustering in 400 bacterial genomes, we show that our algorithm is efficient enough to study gene clusters across hundreds of genomes.
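
The abstract above reduces path matching to finding a longest weighted path in a directed acyclic graph. A minimal sketch of that generic subproblem follows, using dynamic programming over a topological order. It returns only the single best path rather than the top k, and it does not model how the dissertation builds the graph from a query path and a network. The function name and the edge-weight dictionary format are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def longest_weighted_path(weights):
    """Return (score, path) for the highest-scoring path in a DAG.

    `weights` maps each directed edge (u, v) to its weight. A path may start
    at any node, so every node begins with a score of 0.
    """
    if not weights:
        return 0.0, []
    preds = {}
    for u, v in weights:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)

    best, back = {}, {}
    # static_order() yields each node only after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        best[node], back[node] = 0.0, None
        for p in preds[node]:
            cand = best[p] + weights[(p, node)]
            if cand > best[node]:
                best[node], back[node] = cand, p

    end = max(best, key=best.get)
    score = best[end]
    path = []
    while end is not None:  # follow back-pointers to recover the path
        path.append(end)
        end = back[end]
    return score, path[::-1]


example = {("a", "b"): 2, ("b", "c"): 3, ("a", "c"): 4, ("c", "d"): 1}
print(longest_weighted_path(example))
```

Each edge is relaxed exactly once, so the running time is linear in the size of the graph. This is what makes the reduction attractive compared with longest-path search in general graphs, which is NP-hard.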