Search CORE

3,373 research outputs found

VoG: Summarizing and Understanding Large Graphs

Author: Faloutsos Christos
Kang U
Koutra Danai
Vreeken Jilles
Publication venue
Publication date: 01/01/2014
Field of study

How can we succinctly describe a million-node graph with a few simple sentences? How can we measure the "importance" of a set of discovered subgraphs in a large graph? These are exactly the problems we focus on. Our main ideas are to construct a "vocabulary" of subgraph-types that often occur in real graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the most succinct description of a graph in terms of this vocabulary. We measure success in a well-founded way by means of the Minimum Description Length (MDL) principle: a subgraph is included in the summary if it decreases the total description length of the graph. Our contributions are three-fold: (a) formulation: we provide a principled encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop \method, an efficient method to minimize the description cost, and (c) applicability: we report experimental results on multi-million-edge real graphs, including Flickr and the Notre Dame web graph.Comment: SIAM International Conference on Data Mining (SDM) 201

arXiv.org e-Print Archive

CiteSeerX

Composite repetition-aware data structures

Author: A Blumer
A Lempel
D Arroyuelo
D Belazzougui
DE Willard
J Radoszewski
J Sirén
J Ziv
M Crochemore
M Crochemore
M Raffinot
P Ferragina
S Kreft
T Gagie
V Mäkinen
V Mäkinen
W Rytter
Publication venue
Publication date: 01/01/2015
Field of study

In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend only on one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern, and the space taken by the structure. The key component of our constructions is the run-length encoded BWT (RLBWT), which takes space proportional to the number of BWT runs: rather than augmenting RLBWT with suffix array samples, we combine it with data structures from LZ77 indexes, which take space proportional to the number of LZ77 factors, and with the compact directed acyclic word graph (CDAWG), which takes space proportional to the number of extensions of maximal repeats. The combination of CDAWG and RLBWT enables also a new representation of the suffix tree, whose size depends again on the number of extensions of maximal repeats, and that is powerful enough to support matching statistics and constant-space traversal.Comment: (the name of the third co-author was inadvertently omitted from previous version

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

User-friendly Support for Common Concepts in a Lightweight Verifier

Author: Lapets Andrei
Publication venue: CS Department, Boston University
Publication date: 14/05/2010
Field of study

Machine verification of formal arguments can only increase our confidence in the correctness of those arguments, but the costs of employing machine verification still outweigh the benefits for some common kinds of formal reasoning activities. As a result, usability is becoming increasingly important in the design of formal verification tools. We describe the "aartifact" lightweight verification system, designed for processing formal arguments involving basic, ubiquitous mathematical concepts. The system is a prototype for investigating potential techniques for improving the usability of formal verification systems. It leverages techniques drawn both from existing work and from our own efforts. In addition to a parser for a familiar concrete syntax and a mechanism for automated syntax lookup, the system integrates (1) a basic logical inference algorithm, (2) a database of propositions governing common mathematical concepts, and (3) a data structure that computes congruence closures of expressions involving relations found in this database. Together, these components allow the system to better accommodate the expectations of users interested in verifying formal arguments involving algebraic and logical manipulations of numbers, sets, vectors, and related operators and predicates. We demonstrate the reasonable performance of this system on typical formal arguments and briefly discuss how the system's design contributed to its usability in two case studies

Boston University Institutional Repository (OpenBU)