4,008 research outputs found

    Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

    Many studies have sought efficient solutions for subgraph similarity search over certain (deterministic) graphs because of its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. In reality, however, graphs are often noisy and uncertain due to factors such as errors in data extraction, inconsistencies in data integration, and privacy-preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Unlike previous works, which assume that the edges in an uncertain graph are independent of each other, we study uncertain graphs in which the occurrences of edges are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete; we therefore employ a filter-and-verify framework to speed up the search. In the filtering phase, we develop tight lower and upper bounds of the subgraph similarity probability based on a probabilistic matrix index, PMI. PMI is composed of discriminative subgraph features associated with tight lower and upper bounds of the subgraph isomorphism probability. Based on PMI, we can filter out a large number of probabilistic graphs and maximize the pruning capability. During the verification phase, we develop an efficient sampling algorithm to validate the remaining candidates. The efficiency of our proposed solutions has been verified through extensive experiments. Comment: VLDB201
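The filter-and-verify idea described in this abstract can be illustrated with a minimal sketch. It is not the paper's PMI index or its sampling algorithm: the function name `filter_and_verify`, the per-graph `(lower, upper)` bounds, the `threshold`, and the `estimate_probability` callback are all hypothetical names introduced here. The sketch only assumes that each candidate graph comes with a lower and an upper bound on its match probability, and that an expensive estimator is called only when the bounds cannot decide.

```python
def filter_and_verify(candidates, threshold, estimate_probability):
    """Generic filter-and-verify loop (illustrative sketch, hypothetical names).

    candidates: dict mapping graph id -> (lower_bound, upper_bound) on the
        probability that the graph matches the query.
    threshold: minimum probability required for a graph to be an answer.
    estimate_probability: callable(graph_id) -> estimated probability; stands
        in for an expensive verification step such as sampling.
    Returns (accepted, pruned, verified) lists of graph ids.
    """
    accepted, pruned, verified = [], [], []
    for graph_id, (lower, upper) in candidates.items():
        if upper < threshold:
            # Even the optimistic bound is too small: prune without verifying.
            pruned.append(graph_id)
        elif lower >= threshold:
            # Even the pessimistic bound suffices: accept without verifying.
            accepted.append(graph_id)
        else:
            # The bounds straddle the threshold: run the expensive estimator.
            verified.append(graph_id)
            if estimate_probability(graph_id) >= threshold:
                accepted.append(graph_id)
    return accepted, pruned, verified


if __name__ == "__main__":
    bounds = {"g1": (0.1, 0.3), "g2": (0.7, 0.9), "g3": (0.4, 0.8)}
    print(filter_and_verify(bounds, 0.6, lambda graph_id: 0.65))
```

The tighter the bounds, the fewer candidates reach the estimator, which is why the abstract stresses tight lower and upper bounds.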

    A note on quantum algorithms and the minimal degree of epsilon-error polynomials for symmetric functions

    The degrees of polynomials representing or approximating Boolean functions are a prominent tool in various branches of complexity theory. Sherstov recently characterized the minimal degree $\deg_{\epsilon}(f)$ among all polynomials (over the reals) that approximate a symmetric function $f:\{0,1\}^n \to \{0,1\}$ up to worst-case error $\epsilon$: $\deg_{\epsilon}(f) = \tilde{\Theta}(\deg_{1/3}(f) + \sqrt{n\log(1/\epsilon)})$. In this note we show how a tighter version (without the log-factors hidden in the $\tilde{\Theta}$-notation) can be derived quite easily using the close connection between polynomials and quantum algorithms. Comment: 7 pages LaTeX. 2nd version: corrected a few small inaccuracie

    An Overview of Schema Theory

    The purpose of this paper is to give an introduction to the field of Schema Theory written by a mathematician and for mathematicians. In particular, we endeavor to highlight areas of the field which might be of interest to a mathematician, to point out some related open problems, and to suggest some large-scale projects. Schema theory seeks to give a theoretical justification for the efficacy of the field of genetic algorithms, so readers who have studied genetic algorithms stand to gain the most from this paper. However, nothing beyond basic probability theory is assumed of the reader, and for this reason we write in a fairly informal style. Because the mathematics behind the theorems in schema theory is relatively elementary, we focus more on the motivation and philosophy. Many of these results have been proven elsewhere, so this paper is designed to serve a primarily expository role. We attempt to cast known results in a new light, which makes the suggested future directions natural. This involves devoting a substantial amount of time to the history of the field. We hope that this exposition will entice some mathematicians to do research in this area, that it will serve as a road map for researchers new to the field, and that it will help explain how schema theory developed. Furthermore, we hope that the results collected in this document will serve as a useful reference. Finally, as far as the author knows, the questions raised in the final section are new. Comment: 27 pages. Originally written in 2009 and hosted on my website, I've decided to put it on the arXiv as a more permanent home. The paper is primarily expository, so I don't really know where to submit it, but perhaps one day I will find an appropriate journa

    Lower and Upper Bounds for String Matching in Graphs (Ala- ja ylärajoja merkkijonon etsinnälle verkosta)

    String Matching in Labelled Graphs (SMLG) is a generalisation of the classic problem of finding a match for a string in a text. In SMLG, we are given a pattern string and a graph with node labels, and we want to find a path whose node labels match the pattern string. This problem has been studied since 1992, and it was initially intended to model the problem of finding a link in a hypertext. Recently, the problem has received attention due to its applications in bioinformatics, but all of the solutions, old and new, have failed to run in truly sub-quadratic time. In this work, based on four published papers, we study SMLG from different angles, first proving conditional lower bounds, and then proposing efficient algorithms for special classes of graphs. In the first paper, we unveil the reason behind the hardness of SMLG, showing a quadratic conditional lower bound based on the Orthogonal Vectors Hypothesis and the Strong Exponential Time Hypothesis. The techniques that we employ come from fine-grained complexity, and involve finding linear-time reductions from the Orthogonal Vectors problem to different variations of SMLG. In the second paper, we strengthen our findings by showing that an indexing data structure built in polynomial time is not enough to provide subquadratic-time queries for SMLG. We devise a general framework for obtaining indexing lower bounds out of regular lower bounds, and we prove the indexing lower bound for SMLG as an application of this technique. In the third paper, we surpass the limitations of our lower bounds by identifying a class of graphs, called founder block graphs, which support linear-time queries after subquadratic indexing. This class of graphs effectively represents collections of strings called multiple sequence alignments, if gap characters are not present. In the fourth paper, we significantly improve our previous results on efficiently indexable graphs. We propose elastic founder graphs, a superset of founder block graphs, that are able to represent multiple sequence alignments with gaps. Moreover, we propose algorithms for constructing elastic founder graphs, indexing them, and performing queries in linear time.
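To make the SMLG problem statement concrete, here is a minimal sketch of the straightforward baseline, not any of the algorithms from the thesis. It assumes that each node carries a single-character label and that a matching path may revisit nodes; the function name `smlg_match` and the dictionary-based graph encoding are hypothetical choices made for illustration. The running time is proportional to the pattern length times the number of edges, which is the quadratic behaviour the abstract refers to.

```python
def smlg_match(labels, edges, pattern):
    """Decide whether some path in the graph spells out `pattern`.

    labels: dict mapping node -> single-character label (an assumption).
    edges: dict mapping node -> iterable of successor nodes.
    pattern: the string to match against consecutive node labels.
    """
    if not pattern:
        return True
    # Nodes at which a match of the first pattern character can end.
    current = {v for v, c in labels.items() if c == pattern[0]}
    for ch in pattern[1:]:
        # Extend every partial match by one edge whose target has label ch.
        current = {
            v
            for u in current
            for v in edges.get(u, ())
            if labels[v] == ch
        }
        if not current:
            return False
    return bool(current)


if __name__ == "__main__":
    labels = {0: "a", 1: "b", 2: "c", 3: "b"}
    edges = {0: [1, 3], 1: [2], 3: [0]}
    print(smlg_match(labels, edges, "abc"))  # path 0 -> 1 -> 2
    print(smlg_match(labels, edges, "abb"))  # no such path
```

Each pattern character triggers one pass over the outgoing edges of the current node set, so no index is built in advance; the founder block graphs mentioned in the abstract are precisely a class where indexing avoids this per-query edge scan.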

    On the Comparison Complexity of the String Prefix-Matching Problem

    In this paper we study the exact comparison complexity of the string prefix-matching problem in the deterministic sequential comparison model with equality tests. We derive almost tight lower and upper bounds on the number of symbol comparisons required in the worst case by on-line prefix-matching algorithms for any fixed pattern and variable text. Unlike previous results on the comparison complexity of string-matching and prefix-matching algorithms, our bounds are almost tight for any particular pattern. We also consider the special case where the pattern and the text are the same string. This problem, which we call the string self-prefix problem, is similar to the pattern preprocessing step of the Knuth-Morris-Pratt string-matching algorithm that is used in several comparison-efficient string-matching and prefix-matching algorithms, including in our new algorithm. We obtain roughly tight lower and upper bounds on the number of symbol comparisons required in the worst case by on-line self-prefix algorithms. Our algorithms can be implemented in linear time and space in the standard uniform-cost random-access-machine model.
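For readers unfamiliar with the Knuth-Morris-Pratt preprocessing step mentioned in this abstract, the following sketch shows the textbook failure function and an on-line scan built on it. It is not the comparison-optimal algorithm of the paper. The function names are hypothetical, and the formulation used here (for each text position, the length of the longest pattern prefix ending at that position) is an assumption; the paper's exact definition of prefix-matching may differ.

```python
def failure_function(pattern):
    """Textbook KMP preprocessing: fail[i] is the length of the longest
    proper prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def prefix_match_lengths(pattern, text):
    """For each text position, the length of the longest prefix of
    `pattern` that ends there (assumed formulation, see lead-in)."""
    if not pattern:
        return [0] * len(text)
    fail = failure_function(pattern)
    lengths = []
    k = 0
    for ch in text:
        if k == len(pattern):
            # A full match cannot be extended; fall back before comparing.
            k = fail[k - 1]
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        lengths.append(k)
    return lengths


if __name__ == "__main__":
    print(failure_function("abcabd"))
    print(prefix_match_lengths("aba", "ababa"))
```

The scan reads the text one symbol at a time and uses only equality tests between symbols, which matches the on-line comparison model described in the abstract, although it makes no attempt to minimise the number of comparisons.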