20,056 research outputs found

    An output-sensitive algorithm for the minimization of 2-dimensional String Covers

    String covers are a powerful tool for analyzing the quasi-periodicity of 1-dimensional data and find applications in automata theory, computational biology, coding and the analysis of transactional data. A \emph{cover} of a string $T$ is a string $C$ for which every letter of $T$ lies within some occurrence of $C$. String covers have been generalized in many ways, leading to \emph{k-covers}, \emph{$\lambda$-covers} and \emph{approximate covers}, and were studied in different contexts such as \emph{indeterminate strings}. In this paper we generalize string covers to the context of 2-dimensional data, such as images. We show how they can be used for the extraction of textures from images and identification of primitive cells in lattice data. This has interesting applications in image compression, procedural terrain generation and crystallography.
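
    The 1-dimensional definition above can be illustrated with a minimal sketch. This is a naive check written only for illustration, not the output-sensitive 2-dimensional algorithm of the paper; the function names `is_cover` and `shortest_cover` are hypothetical.

```python
def is_cover(c: str, t: str) -> bool:
    """Return True if every letter of t lies within some occurrence of c."""
    if not c or len(c) > len(t):
        return False
    covered_up_to = 0  # positions [0, covered_up_to) are already covered
    start = t.find(c)
    while start != -1:
        if start > covered_up_to:
            return False  # a gap: some letter lies in no occurrence of c
        covered_up_to = start + len(c)
        start = t.find(c, start + 1)  # overlapping occurrences are allowed
    return covered_up_to == len(t)


def shortest_cover(t: str) -> str:
    """Naive search: a cover must be a prefix of t, so try prefixes by length."""
    for length in range(1, len(t) + 1):
        if is_cover(t[:length], t):
            return t[:length]
    return t  # only reached for the empty string


print(shortest_cover("ababaaba"))
```

    For the string "ababaaba", the occurrences of "aba" start at positions 0, 2 and 5 and together cover every letter, while the shorter prefixes "a" and "ab" leave gaps.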

    Scalable Exact Parent Sets Identification in Bayesian Networks Learning with Apache Spark

    In machine learning, the parent set identification problem is to find a set of random variables that best explains a selected variable, given the data and a predefined scoring function. This problem is a critical component of structure learning of Bayesian networks and Markov blanket discovery, and thus has many practical applications, ranging from fraud detection to clinical decision support. In this paper, we introduce a new distributed memory approach to the exact parent sets assignment problem. To achieve scalability, we derive theoretical bounds to constrain the search space when the MDL scoring function is used, and we reorganize the underlying dynamic programming such that the computational density is increased and fine-grain synchronization is eliminated. We then design an efficient realization of our approach in the Apache Spark platform. Through experimental results, we demonstrate that the method maintains strong scalability on a 500-core standalone Spark cluster, and that it can be used to efficiently process data sets with 70 variables, far beyond the reach of currently available solutions.

    An Efficient Algorithm for Enumerating Chordless Cycles and Chordless Paths

    A chordless cycle (induced cycle) $C$ of a graph is a cycle without any chord, meaning that there is no edge outside the cycle connecting two vertices of the cycle. A chordless path is defined similarly. In this paper, we consider the problems of enumerating chordless cycles/paths of a given graph $G=(V,E)$, and propose algorithms taking $O(|E|)$ time for each chordless cycle/path. These problems have not been studied in depth in theoretical computer science, and no output-polynomial time algorithm had been proposed. Our experiments showed that the computation time of our algorithms is constant per chordless cycle/path for non-dense random graphs and real-world graphs. They also show that the number of chordless cycles is much smaller than the number of cycles. We applied the algorithm to prediction of NMR (Nuclear Magnetic Resonance) spectra, and increased the accuracy of the prediction.
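
    The definition of a chordless cycle can be made concrete with a short sketch. It only verifies whether a given vertex sequence is a chordless cycle; it is not the enumeration algorithm of the paper. The adjacency-set representation and the name `is_chordless_cycle` are assumptions made for illustration.

```python
def is_chordless_cycle(adj: dict, cycle: list) -> bool:
    """Check that `cycle` (vertices in order) is an induced cycle of the graph.

    `adj` maps each vertex to the set of its neighbours (undirected graph).
    """
    k = len(cycle)
    if k < 3 or len(set(cycle)) != k:
        return False  # too short, or a vertex is repeated
    for i in range(k):
        for j in range(i + 1, k):
            # Positions i and j are neighbours on the cycle, wrapping at the end.
            consecutive = (j - i == 1) or (i == 0 and j == k - 1)
            has_edge = cycle[j] in adj[cycle[i]]
            # Consecutive vertices need an edge; any other edge is a chord.
            if consecutive != has_edge:
                return False
    return True


# A square 1-2-3-4 with the diagonal 1-3 acting as a chord.
square_with_diagonal = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
print(is_chordless_cycle(square_with_diagonal, [1, 2, 3, 4]))
print(is_chordless_cycle(square_with_diagonal, [1, 2, 3]))
```

    In this example the 4-cycle is rejected because of the diagonal, while each of the two triangles it creates is chordless.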

    Large induced subgraphs via triangulations and CMSO

    We obtain an algorithmic meta-theorem for the following optimization problem. Let $\phi$ be a Counting Monadic Second Order Logic (CMSO) formula and $t$ be an integer. For a given graph $G$, the task is to maximize $|X|$ subject to the following: there is a set of vertices $F$ of $G$, containing $X$, such that the subgraph $G[F]$ induced by $F$ is of treewidth at most $t$, and the structure $(G[F],X)$ models $\phi$. Some special cases of this optimization problem are the following generic examples. Each of these cases contains various problems as a special subcase: 1) "Maximum induced subgraph with at most $l$ copies of cycles of length 0 modulo $m$", where for fixed nonnegative integers $m$ and $l$, the task is to find a maximum induced subgraph of a given graph with at most $l$ vertex-disjoint cycles of length 0 modulo $m$. 2) "Minimum $\Gamma$-deletion", where for a fixed finite set of graphs $\Gamma$ containing a planar graph, the task is to find a maximum induced subgraph of a given graph containing no graph from $\Gamma$ as a minor. 3) "Independent $\Pi$-packing", where for a fixed finite set of connected graphs $\Pi$, the task is to find an induced subgraph $G[F]$ of a given graph $G$ with the maximum number of connected components, such that each connected component of $G[F]$ is isomorphic to some graph from $\Pi$. We give an algorithm solving the optimization problem on an $n$-vertex graph $G$ in time $O(\#\mathrm{pmc} \cdot n^{t+4} \cdot f(t,\phi))$, where $\#\mathrm{pmc}$ is the number of all potential maximal cliques in $G$ and $f$ is a function depending on $t$ and $\phi$ only. We also show how a similar running time can be obtained for the weighted version of the problem. Pipelined with known bounds on the number of potential maximal cliques, we deduce that our optimization problem can be solved in time $O(1.7347^n)$ for arbitrary graphs, and in polynomial time for graph classes with a polynomial number of minimal separators.

    BioDiVinE: A Framework for Parallel Analysis of Biological Models

    In this paper, a novel tool, BioDiVinE, for parallel analysis of biological models is presented. The tool allows analysis of biological models specified in terms of a set of chemical reactions. Chemical reactions are transformed into a system of multi-affine differential equations. BioDiVinE employs techniques for finite discrete abstraction of the continuous state space. At that level, parallel analysis algorithms based on model checking are provided. In the paper, the key tool features are described and their application is demonstrated by means of a case study.
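
    As a rough illustration of how a set of chemical reactions can give rise to multi-affine differential equations, the sketch below assumes mass-action kinetics with each species appearing at most once among the reactants of a reaction. That assumption, the data layout and the name `mass_action_rhs` are illustrative choices, not BioDiVinE's input format or API.

```python
from math import prod


def mass_action_rhs(reactions, state):
    """Evaluate d[species]/dt for every species under mass-action kinetics.

    `reactions` is a list of (rate_constant, reactants, products) tuples and
    `state` maps species names to concentrations. When no species is repeated
    among the reactants of a reaction, each rate is a product of distinct
    variables, so the right-hand side is multi-affine (degree at most one in
    each variable).
    """
    derivs = {species: 0.0 for species in state}
    for rate_constant, reactants, products in reactions:
        rate = rate_constant * prod(state[s] for s in reactants)
        for s in reactants:
            derivs[s] -= rate  # reactants are consumed
        for s in products:
            derivs[s] += rate  # products are produced
    return derivs


# Hypothetical example: A + B -> C with rate constant 2.0.
reactions = [(2.0, ("A", "B"), ("C",))]
state = {"A": 1.0, "B": 3.0, "C": 0.0}
print(mass_action_rhs(reactions, state))
```

    Here the single reaction yields dA/dt = dB/dt = -2AB and dC/dt = 2AB, which is affine in A for fixed B and affine in B for fixed A.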