    Efficiently Computing Directed Minimum Spanning Trees

    Computing a directed minimum spanning tree, called an arborescence, is a fundamental algorithmic problem, although not as common as its undirected counterpart. In 1967, Edmonds discussed an elegant solution. It was refined to run in $O(\min(n^2, m\log n))$ by Tarjan, which is optimal for very dense and very sparse graphs. Gabow et al. gave a version of Edmonds' algorithm that runs in $O(n\log n + m)$, thus asymptotically beating the Tarjan variant in the regime between sparse and dense. Despite the theoretical attention the problem has received, there exists, to the best of our knowledge, no empirical evaluation of either of these algorithms. In fact, the version by Gabow et al. has never been implemented and, aside from coding competitions, all readily available Tarjan implementations run in $O(n^2)$. In this paper, we provide the first implementation of the version by Gabow et al. as well as five variants of Tarjan's version with different underlying data structures. We evaluate these algorithms and existing solvers on a large set of real-world and random graphs.
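
For orientation, the contraction idea behind Edmonds' algorithm can be sketched in a few lines. This is a minimal illustration of the naive $O(nm)$ cycle-contraction variant, not the Tarjan or Gabow et al. versions evaluated in the paper; the function name `min_arborescence_cost` and the edge-list input format are assumptions made for this example.

```python
def min_arborescence_cost(n, root, edges):
    """Naive O(n*m) Chu-Liu/Edmonds sketch (hypothetical helper).

    n: number of vertices labelled 0..n-1
    edges: list of (u, v, w) directed edges u -> v with weight w
    Returns the total weight of a minimum arborescence rooted at `root`,
    or None if some vertex is unreachable from the root.
    """
    INF = float("inf")
    total = 0
    while True:
        # Step 1: pick the cheapest incoming edge for every vertex.
        min_in = [INF] * n
        pre = [-1] * n
        for u, v, w in edges:
            if u != v and w < min_in[v]:
                min_in[v] = w
                pre[v] = u
        for v in range(n):
            if v != root and min_in[v] == INF:
                return None  # v has no incoming edge
        min_in[root] = 0

        # Step 2: add the chosen weights and look for cycles among the chosen edges.
        comp = [-1] * n      # contracted-component id of each vertex
        visited = [-1] * n   # which walk last touched each vertex
        count = 0
        for v in range(n):
            total += min_in[v]
            x = v
            while visited[x] != v and comp[x] == -1 and x != root:
                visited[x] = v
                x = pre[x]
            if x != root and comp[x] == -1:
                # The walk returned to x, so x lies on a cycle: label it.
                y = pre[x]
                while y != x:
                    comp[y] = count
                    y = pre[y]
                comp[x] = count
                count += 1
        if count == 0:
            return total  # chosen edges already form an arborescence

        # Step 3: contract each cycle to one vertex and reduce edge weights.
        for v in range(n):
            if comp[v] == -1:
                comp[v] = count
                count += 1
        edges = [(comp[u], comp[v], w - min_in[v])
                 for u, v, w in edges if comp[u] != comp[v]]
        root = comp[root]
        n = count


example = [(0, 1, 1), (0, 2, 5), (1, 2, 1), (2, 3, 1), (3, 1, 0)]
print(min_arborescence_cost(4, 0, example))
```

In the example, the cheapest incoming edges first form the cycle 1 -> 2 -> 3 -> 1, which is contracted before the edge from the root is added.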

    A clique-difference encoding scheme for labelled k-path graphs

    We present in this paper a codeword for labelled k-path graphs. Structural properties of this codeword are investigated, leading to the solution of two important problems: determining the exact number of labelled k-path graphs with n vertices and locating a Hamiltonian path in a given k-path graph in O(n) time. The corresponding encoding scheme is also presented, providing linear-time algorithms for encoding and decoding.

    Spectral redemption: clustering sparse networks

    Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here we introduce a new class of spectral algorithms based on a non-backtracking walk on the directed edges of the graph. The spectrum of this operator is much better-behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all the way down to the theoretical limit. We also show the spectrum of the non-backtracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering. Comment: 11 pages, 6 figures. Clarified to what extent our claims are rigorous, and to what extent they are conjectures; also added an interpretation of the eigenvectors of the 2n-dimensional version of the non-backtracking matri
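
To make the non-backtracking walk concrete, the sketch below builds the operator for a small undirected graph. It assumes the usual convention that the entry for directed edges (u -> v) and (w -> x) is 1 exactly when v = w and x != u, and that the input is a simple graph without self-loops or repeated edges. The function name `non_backtracking_matrix` is hypothetical, and this is only an illustration of the definition, not the authors' clustering code.

```python
def non_backtracking_matrix(undirected_edges):
    """Build the non-backtracking matrix B as a list of lists.

    Each undirected edge {u, v} yields two directed edges u->v and v->u.
    B[i][j] = 1 when directed edge i = (u->v) can be followed by
    directed edge j = (w->x), i.e. v == w, without returning to u.
    """
    directed = []
    for u, v in undirected_edges:
        directed.append((u, v))
        directed.append((v, u))
    size = len(directed)
    B = [[0] * size for _ in range(size)]
    for i, (u, v) in enumerate(directed):
        for j, (w, x) in enumerate(directed):
            if v == w and x != u:  # continue from v, but do not backtrack to u
                B[i][j] = 1
    return directed, B


# Triangle 0-1-2: every directed edge has exactly one non-backtracking successor.
directed, B = non_backtracking_matrix([(0, 1), (1, 2), (2, 0)])
for edge, row in zip(directed, B):
    print(edge, row)
```

For a graph with m undirected edges the matrix is 2m x 2m, so a dense list of lists is only practical for small examples; a sparse representation would be the natural choice for real networks.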

    Infective flooding in low-duty-cycle networks, properties and bounds

    Flooding information is an important function in many networking applications. In some networks, such as wireless sensor networks or some ad-hoc networks, it is so essential as to dominate the performance of the entire system. Exploiting some recent results based on the distributed computation of the eigenvector centrality of nodes in the network graph and classical dynamic diffusion models on graphs, this paper derives a novel theoretical framework for efficient resource allocation to flood information in mesh networks with low duty-cycling, without the need to build a distribution tree or any other distribution overlay. Furthermore, the method requires only local computations based on each node's neighborhood. The model provides lower and upper stochastic bounds, holding with high probability, on the flooding delay averaged over all possible sources. We show that the lower bound is very close to the theoretical optimum. A simulation-based implementation allows the study of specific topologies and graph models as well as scheduling heuristics and packet losses. Simulation experiments show that simple protocols based on our resource allocation strategy can easily achieve results that are very close to the theoretical minimum obtained by building optimized overlays on the network.

    Algorithms and Data Structures for Coding, Indexing, and Mining of Sequential Data

    In recent years, the production of sequential data has been rapidly increasing. This requires solving challenging problems about how to represent information, how to retrieve information, and how to extract knowledge from sequential data. These questions belong to the areas of coding, indexing, and mining, respectively. In this thesis, we investigate problems from those three areas. Coding refers to the way in which information is represented. Coding aims at generating optimal codes, that is, codes having a minimum expected length. Codes can be generated for different purposes, from data compression to error detection/correction. The Lempel-Ziv 77 parsing produces an asymptotically optimal code in terms of compression. We study algorithms to efficiently decompress strings from the Lempel-Ziv 77 parsing, using memory proportional to the size of the parsing itself. We provide the first implementation of an algorithm by Bille et al., the only work we are aware of on this problem. We present a practical evaluation of this approach and several optimizations which improve the performance on all datasets we tested. Through the Ulam-Rényi game, it is possible to provide optimal adaptive error-correcting codes. The game consists of discovering an unknown $m$-bit number by asking membership questions, the answers to which can be erroneous. Questions are formulated knowing the answers to all previous ones. We want to find an optimal strategy, i.e., a strategy that can identify any $m$-bit number using the theoretical minimum number of questions. We studied the case where questions are a union of up to a fixed number of intervals, and up to three answers can be erroneous. We first show that for any sufficiently large $m$, there exists a strategy to identify an initially unknown $m$-bit number which uses at most four intervals per question. We further refine our main tool to turn the above asymptotic result into a complete characterization of those instances of the Ulam-Rényi game that admit optimal strategies. Indexing refers to the way in which information is retrieved. An index for texts permits finding all occurrences of any substring without traversing the whole text. Many applications require searching for approximate substrings. One of these is the problem of jumbled pattern matching, where two strings match if one is a permutation of the other. We study combinatorial aspects of prefix normal words, a class of binary words introduced in this context. These words can be used as indices for the Indexed Binary Jumbled Pattern Matching problem. We present a new recursive generation algorithm for prefix normal words that is competitive with the previous one but allows listing all prefix normal words sharing the same prefix. This sheds new light that may help solve the problem of counting the number of prefix normal words of a given length. We then introduce infinite prefix normal words, and we show that one of the operations used by the algorithm, when repeatedly applied to extend a word, produces an infinite prefix normal word. This motivates the search for other operations that produce infinite prefix normal words. We found that one of these operations establishes a connection between prefix normal words and Sturmian words. We also explored the relationship between prefix normal words and Abelian complexity, as well as between prefix normal words and lexicographic order. Mining refers to the way in which information is converted into knowledge. The process of knowledge discovery covers several processing steps, including knowledge extraction. We analyze the problem of mining assertions for an embedded system from its simulation traces. This problem can be modeled as a pattern discovery problem on colored strings. We present two problems of pattern discovery on colored strings: patterns for one color only, or for all colors at the same time. We present two suffix tree-based algorithms. The first algorithm solves both the one color problem and the all colors problem. We then introduce modifications which improve the performance of the algorithm both on synthetic and on real data. We implemented and evaluated the proposed approaches, highlighting time trade-offs that can be obtained. A different way of knowledge extraction is based on the information-theoretic perspective of Pearl's model of causality. It has been postulated that the true causality direction between two phenomena A and B is related to the problem of finding the minimum entropy joint distribution between A and B. This problem is known to be NP-hard, and greedy algorithms have recently been proposed. We provide a novel analysis of one of the proposed heuristics showing that this algorithm guarantees an additive approximation of 1 bit. We then provide a general criterion for guaranteeing an additive approximation factor of 1. This criterion may be of independent interest in other contexts where couplings are used.
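
The jumbled-match definition used in the abstract above (two strings match if one is a permutation of the other) can be illustrated with a simple sliding-window scan. This is a minimal, non-indexed sketch for illustration only; it is not the prefix-normal-word index studied in the thesis, and the function name `jumbled_occurrences` is an assumption of this example.

```python
from collections import Counter


def jumbled_occurrences(text, pattern):
    """Return start positions where a window of `text` is a permutation of `pattern`.

    Keeps a character-count multiset for the current window and updates it
    as the window slides one position to the right.
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    need = Counter(pattern)
    window = Counter(text[:m])
    hits = [0] if window == need else []
    for i in range(m, len(text)):
        window[text[i]] += 1       # character entering the window
        left = text[i - m]
        window[left] -= 1          # character leaving the window
        if window[left] == 0:
            del window[left]       # drop zero counts so equality stays exact
        if window == need:
            hits.append(i - m + 1)
    return hits


print(jumbled_occurrences("abcab", "ba"))
```

For binary strings, a window matches exactly when it has the right length and the right number of 1s, which is the observation that makes prefix normal words usable as an index.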

    The Entropy of Lies: Playing Twenty Questions with a Liar

    Reconstruction of Kauffman networks applying trees

    According to Kauffman’s theory [S. Kauffman, The Origins of Order, Self-Organization and Selection in Evolution, Oxford University Press, New York, 1993], enzymes in living organisms form a dynamic network, which governs their activity. For each enzyme the network contains a collection of enzymes affecting the enzyme, and a Boolean function prescribing the next activity of the enzyme as a function of the present activity of the affecting enzymes. Kauffman’s original pure random structure of the connections was criticized by Barabasi and Albert [A.-L. Barabasi, R. Albert, Emergence of scaling in random networks, Science 286 (1999) 509–512]. Their model was unified with Kauffman’s network by Aldana and Cluzel [M. Aldana, P. Cluzel, A natural class of robust networks, Proc. Natl. Acad. Sci. USA 100 (2003) 8710–8714]. Kauffman postulated that the dynamic character of the network determines the fitness of the organism. If the network is either convergent or chaotic, the chance of survival is lessened. If, however, the network is stable and critical, the organism will proliferate. To promote stability, Kauffman originally proposed a special type of Boolean function with a property he called canalyzing. This property was extended by Shmulevich et al. [I. Shmulevich, H. Lähdesmäki, E.R. Dougherty, J. Astola, W. Zhang, The role of certain Post classes in Boolean network models of genetic networks, Proc. Natl. Acad. Sci. USA 100 (2003) 10734–10739] using Post classes. Following their ideas, we propose decision tree functions for enzymatic interactions. The model is fitted to microarray data of Cogburn et al. [L.A. Cogburn, W. Wang, W. Carre, L. Rejtő, T.E. Porter, S.E. Aggrey, J. Simon, System-wide chicken DNA microarrays, gene expression profiling, and discovery of functional genes, Poult. Sci. Assoc. 82 (2003) 939–951; L.A. Cogburn, X. Wang, W. Carre, L. Rejtő, S.E. Aggrey, M.J. Duclos, J. Simon, T.E. Porter, Functional genomics in chickens: development of integrated-systems microarrays for transcriptional profiling and discovery of regulatory pathways, Comp. Funct. Genom. 5 (2004) 253–261]. In microarray measurements the activity of clones is measured. The problem here is the reconstruction of the structure of enzymatic interactions of the living organism using microarray data. The task resembles summing up the whole story of a film from unordered and perhaps incomplete collections of its pieces. Two basic ingredients will be used in tackling the problem. In our earlier works [L. Rejtő, G. Tusnády, Evolution of random Boolean NK-models in Tierra environment, in: I. Berkes, E. Csaki, M. Csörgő (Eds.), Limit Theorems in Probability and Statistics, Budapest, vol. II, 2002, pp. 499–526] we used an evolutionary strategy called Tierra, which was proposed by Ray [T.S. Ray, Evolution, complexity, entropy and artificial reality, Physica D 75 (1994) 239–263] for investigating complex systems. Here we apply this method together with the tree structure of clones found in our earlier statistical analysis of microarray measurements [L. Rejtő, G. Tusnády, Clustering methods in microarrays, Period. Math. Hungar. 50 (2005) 199–221].

    Distributing multipartite entanglement over noisy quantum networks

    A quantum internet aims at harnessing networked quantum technologies, namely by distributing bipartite entanglement between distant nodes. However, multipartite entanglement between the nodes may empower the quantum internet for additional or better applications for communications, sensing, and computation. In this work, we present an algorithm for generating multipartite entanglement between different nodes of a quantum network with noisy quantum repeaters and imperfect quantum memories, where the links are entangled pairs. Our algorithm is optimal for GHZ states with 3 qubits, maximising simultaneously the final state fidelity and the rate of entanglement distribution. Furthermore, we determine the conditions yielding this simultaneous optimality for GHZ states with a higher number of qubits, and for other types of multipartite entanglement. Our algorithm is general also in the sense that it can optimize simultaneously arbitrary parameters. This work opens the way to optimally generate multipartite quantum correlations over noisy quantum networks, an important resource for distributed quantum technologies.