22,684 research outputs found

    Causality and the semantics of provenance

    Provenance, or information about the sources, derivation, custody or history of data, has been studied recently in a number of contexts, including databases, scientific workflows and the Semantic Web. Many provenance mechanisms have been developed, motivated by informal notions such as influence, dependence, explanation and causality. However, there has been little study of whether these mechanisms formally satisfy appropriate policies or even how to formalize relevant motivating concepts such as causality. We contend that mathematical models of these concepts are needed to justify and compare provenance techniques. In this paper we review a theory of causality based on structural models that has been developed in artificial intelligence, and describe work in progress on a causal semantics for provenance graphs. Comment: Workshop submission

    NetLSD: Hearing the Shape of a Graph

    Comparison among graphs is ubiquitous in graph analytics. However, it is a hard task in terms of the expressiveness of the employed similarity measure and the efficiency of its computation. Ideally, graph comparison should be invariant to the order of nodes and the sizes of compared graphs, adaptive to the scale of graph patterns, and scalable. Unfortunately, these properties have not been addressed together. Graph comparisons still rely on direct approaches, graph kernels, or representation-based methods, which are all inefficient and impractical for large graph collections. In this paper, we propose the Network Laplacian Spectral Descriptor (NetLSD): the first, to our knowledge, permutation- and size-invariant, scale-adaptive, and efficiently computable graph representation method that allows for straightforward comparisons of large graphs. NetLSD extracts a compact signature that inherits the formal properties of the Laplacian spectrum, specifically its heat or wave kernel; thus, it hears the shape of a graph. Our evaluation on a variety of real-world graphs demonstrates that it outperforms previous works in both expressiveness and efficiency. Comment: KDD '18: The 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 19-23, 2018, London, United Kingdom

    Algebraic optimization of recursive queries

    Over the past few years, much attention has been paid to deductive databases. They offer a logic-based interface, and allow formulation of complex recursive queries. However, they do not offer appropriate update facilities, and do not support existing applications. To overcome these problems, an SQL-like interface is required in addition to a logic-based interface. In the PRISMA project we have developed a tightly-coupled distributed database, on a multiprocessor machine, with two user interfaces: SQL and PRISMAlog. Query optimization is localized in one component: the relational query optimizer. Therefore, we have defined an eXtended Relational Algebra that allows recursive query formulation and can also be used for expressing executable schedules, and we have developed algebraic optimization strategies for recursive queries. In this paper we describe an optimization strategy that rewrites regular (in the context of formal grammars) mutually recursive queries into standard Relational Algebra and transitive closure operations. We also describe how to push selections into the resulting transitive closure operations. The reason we focus on algebraic optimization is that, in our opinion, the new generation of advanced database systems will be built starting from existing state-of-the-art relational technology, instead of building a completely new class of systems.
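
    The two ideas named in this abstract, a transitive closure operation and a selection pushed into it, can be illustrated with a minimal sketch. This is not PRISMA's eXtended Relational Algebra or its optimizer; the function names `transitive_closure` and `reachable_from` are hypothetical, and relations are modeled simply as Python sets of pairs.

```python
def transitive_closure(edges):
    """Semi-naive fixpoint: all pairs (a, d) connected by a path of one or more edges."""
    closure = set(edges)
    delta = set(edges)  # pairs derived in the previous round
    while delta:
        # Join only the newly derived pairs with the base relation.
        derived = {(a, d) for (a, b) in delta for (c, d) in edges if b == c}
        delta = derived - closure
        closure |= delta
    return closure


def reachable_from(edges, start):
    """Selection pushed into the closure: explore only paths that begin at `start`."""
    seen = set()
    frontier = {start}
    while frontier:
        # Follow edges out of the current frontier, skipping nodes already found.
        frontier = {d for (c, d) in edges if c in frontier} - seen
        seen |= frontier
    return seen


edges = {(1, 2), (2, 3), (3, 4)}
full = transitive_closure(edges)
# Selecting after computing the full closure and selecting during the
# computation give the same answer; the second touches fewer pairs.
late_selection = {d for (a, d) in full if a == 2}
early_selection = reachable_from(edges, 2)
```

    Under these assumptions, `early_selection` equals `late_selection`, but it never materializes pairs that start at other nodes, which is the general motivation for pushing a selection into a closure.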

    Pore-scale Modeling of Viscous Flow and Induced Forces in Dense Sphere Packings

    We propose a method for effectively upscaling incompressible viscous flow in large random polydispersed sphere packings: the emphasis of this method is on the determination of the forces applied on the solid particles by the fluid. Pore bodies and their connections are defined locally through a regular Delaunay triangulation of the packings. Viscous flow equations are upscaled at the pore level, and approximated with a finite volume numerical scheme. We compare numerical simulations of the proposed method to detailed finite element (FEM) simulations of the Stokes equations for assemblies of 8 to 200 spheres. Good agreement is found both in terms of forces exerted on the solid particles and effective permeability coefficients.

    FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

    We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high dimensional datasets on a single machine, that does not require similarity computations and is tailored for high-performance computing platforms. By leveraging an LSH-style randomized indexing procedure and combining it with several principled techniques, such as reservoir sampling, recent advances in one-pass minwise hashing, and count-based estimations, we reduce the computational and parallelization costs of similarity search, while retaining sound theoretical guarantees. We evaluate FLASH on several real, high-dimensional datasets from different domains, including text, malicious URL, click-through prediction, social networks, etc. Our experiments shed new light on the difficulties associated with datasets having several million dimensions. Current state-of-the-art implementations either fail on the presented scale or are orders of magnitude slower than FLASH. FLASH is capable of computing an approximate k-NN graph, from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than 10 seconds. Computing a full k-NN graph in less than 10 seconds on the webspam dataset, using brute-force ($n^2D$), will require at least 20 teraflops. We provide CPU and GPU implementations of FLASH for replicability of our results.
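
    Reservoir sampling, one of the techniques the abstract lists, keeps a fixed-size uniform sample of a stream in a single pass. The sketch below is the textbook Algorithm R, shown only to illustrate the idea; it is not FLASH's implementation, and the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn uniformly at random from an iterable, in one pass."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# A fixed seed makes the run repeatable; the memory used stays at k items
# no matter how long the stream is.
sample = reservoir_sample(range(1000), 10, random.Random(0))
```

    The appeal in an indexing setting is the bounded memory: a fixed-size container never grows, however many items arrive.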

    Analysis of unbounded operators and random motion

    We study infinite weighted graphs with a view to "limits at infinity," or boundaries at infinity. Examples of such weighted graphs arise in infinite (in practice, that means "very" large) networks of resistors, or in statistical mechanics models for classical or quantum systems. But more generally our analysis includes reproducing kernel Hilbert spaces and associated operators on them. If $X$ is some infinite set of vertices or nodes, in applications the essential ingredient going into the definition is a reproducing kernel Hilbert space; it measures the differences of functions on $X$ evaluated on pairs of points in $X$. And the Hilbert norm-squared in $\mathcal{H}(X)$ will represent a suitable measure of energy. Associated unbounded operators will define a notion of dissipation, which can be a graph Laplacian, or a more abstract unbounded Hermitian operator defined from the reproducing kernel Hilbert space under study. We prove that there are two closed subspaces in the reproducing kernel Hilbert space $\mathcal{H}(X)$ which measure quantitative notions of limits at infinity in $X$: one generalizes finite-energy harmonic functions in $\mathcal{H}(X)$, and the other a deficiency index of a natural operator in $\mathcal{H}(X)$ associated directly with the diffusion. We establish these results in the abstract, and we offer examples and applications. Our results are related to, but different from, potential theoretic notions of "boundaries" in more standard random walk models. Comparisons are made. Comment: 38 pages, 4 tables, 3 figures