Causality and the semantics of provenance
Provenance, or information about the sources, derivation, custody or history
of data, has been studied recently in a number of contexts, including
databases, scientific workflows and the Semantic Web. Many provenance
mechanisms have been developed, motivated by informal notions such as
influence, dependence, explanation and causality. However, there has been
little study of whether these mechanisms formally satisfy appropriate policies
or even how to formalize relevant motivating concepts such as causality. We
contend that mathematical models of these concepts are needed to justify and
compare provenance techniques. In this paper we review a theory of causality
based on structural models that has been developed in artificial intelligence,
and describe work in progress on a causal semantics for provenance graphs.
Comment: Workshop submission
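The structural-model view of causality can be illustrated with a toy example. The sketch below is a simplification, not the paper's semantics. The variables (`u1`, `u2`, `t1`, `t2`, `o_join`, `o_union`) and the helpers `evaluate` and `but_for_cause` are hypothetical. Each variable is defined by a structural equation. An intervention do(X = x) replaces that equation with a constant. A naive counterfactual "but-for" test then asks whether flipping a source tuple also flips a derived output.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy structural model: each endogenous variable has an equation
# that computes its value from variables earlier in the topological order.
Equations = Dict[str, Callable[[Dict[str, bool]], bool]]


def evaluate(equations: Equations, context: Dict[str, bool], order: List[str],
             interventions: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Solve an acyclic structural model; do(X = x) overrides X's equation."""
    interventions = interventions or {}
    values = dict(context)  # exogenous variables fix the actual situation
    for var in order:
        if var in interventions:
            values[var] = interventions[var]
        else:
            values[var] = equations[var](values)
    return values


def but_for_cause(equations: Equations, context: Dict[str, bool], order: List[str],
                  cause: str, effect: str) -> bool:
    """True if intervening to flip `cause` also flips `effect`."""
    actual = evaluate(equations, context, order)
    flipped = evaluate(equations, context, order, {cause: not actual[cause]})
    return actual[effect] != flipped[effect]


# Two source tuples t1, t2 are present; o_join needs both, o_union needs either.
context = {"u1": True, "u2": True}
order = ["t1", "t2", "o_join", "o_union"]
equations: Equations = {
    "t1": lambda v: v["u1"],
    "t2": lambda v: v["u2"],
    "o_join": lambda v: v["t1"] and v["t2"],
    "o_union": lambda v: v["t1"] or v["t2"],
}

print(but_for_cause(equations, context, order, "t1", "o_join"))   # True
print(but_for_cause(equations, context, order, "t1", "o_union"))  # False
```

For the join output, each input tuple is a but-for cause. For the union output with both tuples present, neither tuple is a but-for cause, because the output is overdetermined. Cases like this are why the structural-model literature (for example, the Halpern and Pearl definition of actual causality) refines the naive but-for test.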
NetLSD: Hearing the Shape of a Graph
Comparison among graphs is ubiquitous in graph analytics. However, it is a
hard task in terms of the expressiveness of the employed similarity measure and
the efficiency of its computation. Ideally, graph comparison should be
invariant to the order of nodes and the sizes of compared graphs, adaptive to
the scale of graph patterns, and scalable. Unfortunately, these properties have
not been addressed together. Graph comparisons still rely on direct approaches,
graph kernels, or representation-based methods, which are all inefficient and
impractical for large graph collections.
In this paper, we propose the Network Laplacian Spectral Descriptor (NetLSD):
the first, to our knowledge, permutation- and size-invariant, scale-adaptive,
and efficiently computable graph representation method that allows for
straightforward comparisons of large graphs. NetLSD extracts a compact
signature that inherits the formal properties of the Laplacian spectrum,
specifically its heat or wave kernel; thus, it hears the shape of a graph. Our
evaluation on a variety of real-world graphs demonstrates that it outperforms
previous works in both expressiveness and efficiency.
Comment: KDD '18: The 24th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining, August 19-23, 2018, London, United Kingdom
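The heat-kernel variant of such a spectral signature can be sketched as the heat trace of the normalized Laplacian, h(t) = tr(exp(-tL)) = Σ_j exp(-t λ_j). This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a full eigendecomposition, which is only practical for small graphs. It also uses a hypothetical log-spaced time grid and a simple 1/n size normalization, and it omits the wave kernel. All function names are hypothetical. Because h(t) depends only on the eigenvalues, relabeling the nodes leaves the signature unchanged.

```python
import numpy as np


def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix.
    Isolated nodes are given degree 1 to avoid division by zero (a simplification)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.where(deg > 0, deg, 1.0))
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def heat_trace_signature(adj: np.ndarray, ts=None, normalize: bool = True) -> np.ndarray:
    """h(t) = sum_j exp(-t * lambda_j) sampled on a time grid (hypothetical defaults)."""
    if ts is None:
        ts = np.logspace(-2, 2, 250)
    eigvals = np.linalg.eigvalsh(normalized_laplacian(adj))
    h = np.exp(-np.outer(ts, eigvals)).sum(axis=1)  # one entry per t
    if normalize:
        h = h / len(adj)  # assumed size normalization, not necessarily NetLSD's
    return h


def netlsd_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Compare two graphs by the L2 distance between their signatures."""
    return float(np.linalg.norm(heat_trace_signature(a) - heat_trace_signature(b)))


path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
perm = np.array([2, 0, 3, 1])
relabeled = path[np.ix_(perm, perm)]  # same graph, nodes renumbered
star = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]], dtype=float)

print(netlsd_distance(path, relabeled))  # ~0 up to floating-point error
print(netlsd_distance(path, star) > 0)   # True
```

The path and the star on four nodes have different normalized-Laplacian spectra, so their signatures differ. The relabeled path has exactly the same spectrum as the original.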
Algebraic optimization of recursive queries
Over the past few years, much attention has been paid to deductive databases. They offer a logic-based interface and allow the formulation of complex recursive queries. However, they do not offer appropriate update facilities and do not support existing applications. To overcome these problems, an SQL-like interface is required in addition to a logic-based interface.
In the PRISMA project we have developed a tightly coupled distributed database, on a multiprocessor machine, with two user interfaces: SQL and PRISMAlog. Query optimization is localized in one component: the relational query optimizer. Therefore, we have defined an eXtended Relational Algebra that allows recursive query formulation and can also be used for expressing executable schedules, and we have developed algebraic optimization strategies for recursive queries. In this paper we describe an optimization strategy that rewrites regular (in the context of formal grammars) mutually recursive queries into standard Relational Algebra and transitive closure operations. We also describe how to push selections into the resulting transitive closure operations.
The reason we focus on algebraic optimization is that, in our opinion, the new generation of advanced database systems will be built on existing state-of-the-art relational technology rather than as a completely new class of systems.
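The transitive closure operator and the effect of pushing a selection into it can be illustrated on a binary relation R(src, dst). This is a generic Python sketch with hypothetical function names. It is not the paper's eXtended Relational Algebra and does not reproduce its rewrite rules for mutually recursive queries. The full closure is evaluated semi-naively. The selection σ_{src = a}(TC(R)) is then computed by seeding the fixpoint with `a`, so pairs that start elsewhere are never derived. This only works for a selection on the source column. A selection on the destination would require traversing R in reverse.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def successors(r: Set[Edge]) -> Dict[str, Set[str]]:
    """Index R on its source column."""
    succ: Dict[str, Set[str]] = {}
    for x, y in r:
        succ.setdefault(x, set()).add(y)
    return succ


def transitive_closure(r: Set[Edge]) -> Set[Edge]:
    """Semi-naive evaluation: each round joins only newly derived pairs with R."""
    succ = successors(r)
    closure = set(r)
    delta = set(r)
    while delta:
        new = {(x, z) for (x, y) in delta for z in succ.get(y, ())}
        delta = new - closure
        closure |= delta
    return closure


def closure_from(r: Set[Edge], source: str) -> Set[Edge]:
    """sigma_{src = source}(TC(R)) with the selection pushed into the fixpoint."""
    succ = successors(r)
    reached: Set[str] = set()
    frontier = set(succ.get(source, ()))
    while frontier:
        reached |= frontier
        frontier = {z for y in frontier for z in succ.get(y, ())} - reached
    return {(source, y) for y in reached}


R = {("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")}
print(sorted(closure_from(R, "a")))  # [('a', 'b'), ('a', 'c'), ('a', 'd')]
```

Both routes give the same answer. The pushed-down version touches only the part of R reachable from the selected constant.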
Pore-scale Modeling of Viscous Flow and Induced Forces in Dense Sphere Packings
We propose a method for effectively upscaling incompressible viscous flow in
large random polydispersed sphere packings: the emphasis of this method is on
the determination of the forces applied on the solid particles by the fluid.
Pore bodies and their connections are defined locally through a regular
Delaunay triangulation of the packings. Viscous flow equations are upscaled at
the pore level, and approximated with a finite volume numerical scheme. We
compare numerical simulations of the proposed method to detailed finite element
(FEM) simulations of the Stokes equations for assemblies of 8 to 200 spheres.
Good agreement is found both in terms of forces exerted on the solid particles
and in terms of effective permeability coefficients.
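At the pore level, a finite-volume scheme amounts to a mass balance in each pore. The net flux q_ij = g_ij (p_i - p_j) through the throats connecting pore i to its neighbors must vanish. The minimal sketch below assumes that the pore graph and the throat conductances g_ij are already known. The paper instead derives pores from a regular Delaunay triangulation and computes conductances and particle forces from the local geometry, and none of that is reproduced here. The solver is plain Gauss-Seidel iteration, chosen for brevity, and all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Throat = Tuple[int, int, float]  # (pore i, pore j, hydraulic conductance g_ij)


def solve_pore_pressures(n_pores: int, throats: List[Throat], fixed: Dict[int, float],
                         tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Gauss-Seidel solve of sum_j g_kj (p_k - p_j) = 0 in every non-fixed pore."""
    neighbors: List[List[Tuple[int, float]]] = [[] for _ in range(n_pores)]
    for i, j, g in throats:
        neighbors[i].append((j, g))
        neighbors[j].append((i, g))
    p = [fixed.get(k, 0.0) for k in range(n_pores)]
    for _ in range(max_iter):
        max_change = 0.0
        for k in range(n_pores):
            if k in fixed or not neighbors[k]:
                continue  # boundary pores keep their imposed pressure
            # Mass balance => p_k is the conductance-weighted mean of its neighbors.
            new = sum(g * p[j] for j, g in neighbors[k]) / sum(g for _, g in neighbors[k])
            max_change = max(max_change, abs(new - p[k]))
            p[k] = new
        if max_change < tol:
            break
    return p


def total_flux(throats: List[Throat], p: List[float], inlet: Set[int]) -> float:
    """Total flow leaving the inlet pores, summing q_ij = g_ij (p_i - p_j)."""
    q = 0.0
    for i, j, g in throats:
        if i in inlet and j not in inlet:
            q += g * (p[i] - p[j])
        elif j in inlet and i not in inlet:
            q += g * (p[j] - p[i])
    return q


throats = [(0, 1, 1.0), (1, 3, 1.0), (0, 2, 2.0), (2, 3, 2.0)]
p = solve_pore_pressures(4, throats, fixed={0: 1.0, 3: 0.0})
q = total_flux(throats, p, inlet={0})
print(round(q, 6))  # 1.5
```

The toy network has two parallel paths, each made of two throats in series, with conductances 1, 1 and 2, 2. Its equivalent conductance is therefore 0.5 + 1 = 1.5. Given the sample length L, cross-section A and fluid viscosity μ, Darcy's law K = Q μ L / (A Δp) would turn such a flux into an effective permeability.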
Single-top Wt-channel production matched with parton showers using the POWHEG method
We present results for the next-to-leading order calculation of single-top
Wt-channel production interfaced to Shower Monte Carlo programs, implemented
according to the POWHEG method. A comparison with MC@NLO is carried out.
Results obtained using the PYTHIA shower are also shown and the effect of
typical cuts is briefly discussed.
Comment: 23 pages, 9 figures
FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search
We present FLASH (Fast LSH Algorithm for Similarity search accelerated with
HPC), a similarity search system for ultra-high-dimensional datasets on a
single machine that does not require similarity computations and is tailored
for high-performance computing platforms. By leveraging an LSH-style randomized
indexing procedure and combining it with several principled techniques, such as
reservoir sampling, recent advances in one-pass minwise hashing, and
count-based estimation, we
reduce the computational and parallelization costs of similarity search, while
retaining sound theoretical guarantees.
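The indexing idea can be sketched as follows. Each sparse data point is hashed with K min-hashes per table and stored in L tables whose buckets are capped by reservoir sampling. A query is answered by counting how many tables each stored point shares a bucket with it, so candidates are ranked without computing any similarity. This is a hedged, pure-Python illustration, not FLASH itself. FLASH relies on one-pass minwise hashing (per the abstract) and CPU/GPU parallelism, whereas this sketch uses K·L independent universal hash functions. The class name, parameters and defaults are hypothetical. Within one table, two sets with Jaccard similarity J collide with probability J^K, so the collision count grows with similarity in expectation.

```python
import random
from collections import Counter
from typing import Dict, List, Set, Tuple

P = 2_147_483_647  # Mersenne prime; feature ids are assumed to be ints in [0, P)


class MinHashLSHIndex:
    """Hypothetical LSH index: L tables, K min-hashes per key, reservoir-capped buckets."""

    def __init__(self, k: int = 2, l: int = 16, reservoir: int = 8, seed: int = 0):
        self.k, self.l, self.reservoir = k, l, reservoir
        self.rng = random.Random(seed)
        # One universal hash h(f) = (a*f + b) mod P per min-hash.
        self.params = [[(self.rng.randrange(1, P), self.rng.randrange(P)) for _ in range(k)]
                       for _ in range(l)]
        self.tables: List[Dict[Tuple[int, ...], List[int]]] = [{} for _ in range(l)]
        self.seen: List[Dict[Tuple[int, ...], int]] = [{} for _ in range(l)]

    def _key(self, table: int, features: Set[int]) -> Tuple[int, ...]:
        # Bucket key = tuple of K min-hash values (features must be non-empty).
        return tuple(min((a * f + b) % P for f in features) for a, b in self.params[table])

    def insert(self, item_id: int, features: Set[int]) -> None:
        for t in range(self.l):
            key = self._key(t, features)
            bucket = self.tables[t].setdefault(key, [])
            n = self.seen[t].get(key, 0) + 1  # items hashed to this bucket so far
            self.seen[t][key] = n
            if len(bucket) < self.reservoir:
                bucket.append(item_id)
            else:
                # Reservoir sampling (Algorithm R): keep the new item with prob reservoir/n.
                j = self.rng.randrange(n)
                if j < self.reservoir:
                    bucket[j] = item_id

    def query(self, features: Set[int], top_k: int) -> List[int]:
        # Count-based ranking: number of tables in which each candidate collides.
        counts: Counter = Counter()
        for t in range(self.l):
            counts.update(self.tables[t].get(self._key(t, features), []))
        return [item for item, _ in counts.most_common(top_k)]


index = MinHashLSHIndex(k=2, l=16, reservoir=8, seed=0)
index.insert(0, set(range(1, 51)))
index.insert(1, set(range(1, 46)) | set(range(100, 105)))
index.insert(2, set(range(200, 251)))
print(index.query(set(range(1, 51)), top_k=1))  # [0]
```

An identical point collides in every table, so it ranks first. A point with no shared features never shares a min-hash value, because each h is a bijection on [0, P).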
We evaluate FLASH on several real, high-dimensional datasets from different
domains, including text, malicious URLs, click-through prediction, and social
networks. Our experiments shed new light on the difficulties associated with
datasets having several million dimensions. Current state-of-the-art
implementations either fail at this scale or are orders of magnitude slower
than FLASH. FLASH can compute an approximate k-NN graph, from scratch, over the
full webspam dataset (1.3 billion nonzeros) in less than 10 seconds. Computing
a full k-NN graph on the webspam dataset in less than 10 seconds by brute force
would require at least 20 teraflops. We provide CPU and GPU implementations of
FLASH for replicability of our results.
Analysis of unbounded operators and random motion
We study infinite weighted graphs with a view to "limits at infinity," or
boundaries at infinity. Examples of such weighted graphs arise in infinite (in
practice, that means "very" large) networks of resistors, or in statistical
mechanics models for classical or quantum systems. More generally, our
analysis includes reproducing kernel Hilbert spaces and associated operators on
them. If $V$ is some infinite set of vertices or nodes, in applications the
essential ingredient going into the definition is a reproducing kernel Hilbert
space $\mathcal{H}$; it measures the differences of functions on $V$ evaluated
on pairs of points in $V$. The Hilbert norm-squared in $\mathcal{H}$ represents
a suitable measure of energy. Associated unbounded operators define a notion of
dissipation; the operator can be a graph Laplacian or a more abstract unbounded
Hermitian operator defined from the reproducing kernel Hilbert space under
study. We prove that there are two closed subspaces in the reproducing kernel
Hilbert space $\mathcal{H}$ which measure quantitative notions of limits at
infinity in $V$: one generalizes finite-energy harmonic functions in
$\mathcal{H}$, and the other a deficiency index of a natural operator in
$\mathcal{H}$ associated directly with the diffusion. We establish these
results in the abstract, and we offer examples and applications. Our results
are related to, but different from, potential-theoretic notions of
"boundaries" in more standard random walk models. Comparisons are made.
Comment: 38 pages, 4 tables, 3 figures
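On a finite network with symmetric conductances c_xy, the graph Laplacian (Δu)(x) = Σ_y c_xy (u(x) - u(y)) and the energy E(u) = Σ_{edges} c_xy (u(x) - u(y))² are related by summation by parts: Σ_x u(x)(Δu)(x) = E(u). The sketch below checks this identity with exact rational arithmetic. The helper names are hypothetical, and the energy is summed once per undirected edge, which equals one half of the double sum over ordered pairs. The paper's interest is the infinite case, where this identity can fail. For example, a nonconstant finite-energy harmonic function h has Δh = 0 but E(h) > 0. A finite truncation like this one cannot exhibit that.

```python
from fractions import Fraction
from typing import Dict, Hashable, Tuple

Conductance = Dict[Tuple[Hashable, Hashable], Fraction]  # c_xy, one entry per undirected edge


def laplacian(c: Conductance, u: Dict[Hashable, int]) -> Dict[Hashable, Fraction]:
    """(Delta u)(x) = sum_y c_xy (u(x) - u(y)); every edge endpoint must appear in u."""
    out = {x: Fraction(0) for x in u}
    for (x, y), cxy in c.items():
        out[x] += cxy * (u[x] - u[y])
        out[y] += cxy * (u[y] - u[x])
    return out


def energy(c: Conductance, u: Dict[Hashable, int]) -> Fraction:
    """E(u) = sum over undirected edges of c_xy (u(x) - u(y))^2."""
    return sum((cxy * (u[x] - u[y]) ** 2 for (x, y), cxy in c.items()), Fraction(0))


c = {("a", "b"): Fraction(1), ("b", "c"): Fraction(2), ("c", "d"): Fraction(3)}
u = {"a": 0, "b": 1, "c": 3, "d": 2}
lap = laplacian(c, u)

print(energy(c, u))                                    # 12
print(sum(u[x] * lap[x] for x in u) == energy(c, u))   # True
```

Constant functions have zero energy and zero Laplacian, consistent with the energy measuring only differences of values on pairs of points.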