2,554 research outputs found
Large Graph Analysis in the GMine System
Current applications have produced graphs on the order of hundreds of
thousands of nodes and millions of edges. To take advantage of such graphs, one
must be able to find patterns, outliers and communities. These tasks are better
performed in an interactive environment, where human expertise can guide the
process. For large graphs, though, there are some challenges: the excessive
processing requirements are prohibitive, and drawing hundred-thousand nodes
results in cluttered images hard to comprehend. To cope with these problems, we
propose an innovative framework suited for any kind of tree-like graph visual
design. GMine integrates (a) a representation for graphs organized as
hierarchies of partitions - the concepts of SuperGraph and Graph-Tree; and (b)
a graph summarization methodology - CEPS. Our graph representation deals with
the problem of tracing the connection aspects of a graph hierarchy with sub
linear complexity, allowing one to grasp the neighborhood of a single node or
of a group of nodes in a single click. As a proof of concept, the visual
environment of GMine is instantiated as a system in which large graphs can be
investigated globally and locally
Fast Exact Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling
We propose a new exact method for shortest-path distance queries on
large-scale networks. Our method precomputes distance labels for vertices by
performing a breadth-first search from every vertex. Seemingly too obvious and
too inefficient at first glance, the key ingredient introduced here is pruning
during breadth-first searches. While we can still answer the correct distance
for any pair of vertices from the labels, it surprisingly reduces the search
space and sizes of labels. Moreover, we show that we can perform 32 or 64
breadth-first searches simultaneously exploiting bitwise operations. We
experimentally demonstrate that the combination of these two techniques is
efficient and robust on various kinds of large-scale real-world networks. In
particular, our method can handle social networks and web graphs with hundreds
of millions of edges, which are two orders of magnitude larger than the limits
of previous exact methods, with comparable query time to those of previous
methods.Comment: To appear in SIGMOD 201
Knowledge Base Population using Semantic Label Propagation
A crucial aspect of a knowledge base population system that extracts new
facts from text corpora, is the generation of training data for its relation
extractors. In this paper, we present a method that maximizes the effectiveness
of newly trained relation extractors at a minimal annotation cost. Manual
labeling can be significantly reduced by Distant Supervision, which is a method
to construct training data automatically by aligning a large text corpus with
an existing knowledge base of known facts. For example, all sentences
mentioning both 'Barack Obama' and 'US' may serve as positive training
instances for the relation born_in(subject,object). However, distant
supervision typically results in a highly noisy training set: many training
sentences do not really express the intended relation. We propose to combine
distant supervision with minimal manual supervision in a technique called
feature labeling, to eliminate noise from the large and noisy initial training
set, resulting in a significant increase of precision. We further improve on
this approach by introducing the Semantic Label Propagation method, which uses
the similarity between low-dimensional representations of candidate training
instances, to extend the training set in order to increase recall while
maintaining high precision. Our proposed strategy for generating training data
is studied and evaluated on an established test collection designed for
knowledge base population tasks. The experimental results show that the
Semantic Label Propagation strategy leads to substantial performance gains when
compared to existing approaches, while requiring an almost negligible manual
annotation effort.Comment: Submitted to Knowledge Based Systems, special issue on Knowledge
Bases for Natural Language Processin
VoroCrust: Voronoi Meshing Without Clipping
Polyhedral meshes are increasingly becoming an attractive option with
particular advantages over traditional meshes for certain applications. What
has been missing is a robust polyhedral meshing algorithm that can handle broad
classes of domains exhibiting arbitrarily curved boundaries and sharp features.
In addition, the power of primal-dual mesh pairs, exemplified by
Voronoi-Delaunay meshes, has been recognized as an important ingredient in
numerous formulations. The VoroCrust algorithm is the first provably-correct
algorithm for conforming polyhedral Voronoi meshing for non-convex and
non-manifold domains with guarantees on the quality of both surface and volume
elements. A robust refinement process estimates a suitable sizing field that
enables the careful placement of Voronoi seeds across the surface circumventing
the need for clipping and avoiding its many drawbacks. The algorithm has the
flexibility of filling the interior by either structured or random samples,
while preserving all sharp features in the output mesh. We demonstrate the
capabilities of the algorithm on a variety of models and compare against
state-of-the-art polyhedral meshing methods based on clipped Voronoi cells
establishing the clear advantage of VoroCrust output.Comment: 18 pages (including appendix), 18 figures. Version without compressed
images available on https://www.dropbox.com/s/qc6sot1gaujundy/VoroCrust.pdf.
Supplemental materials available on
https://www.dropbox.com/s/6p72h1e2ivw6kj3/VoroCrust_supplemental_materials.pd
Spatial Queries for Indoor Location-based Services
Indoor Location-based Services (LBS) facilitate people in indoor scenarios such as airports, train stations, shopping malls, and office buildings. Indoor spatial queries are the foundation to support indoor LBSs. However, the existing techniques for indoor spatial queries are limited to support more advanced queries that consider semantic information, temporal variations, and crowd influence. This work studies indoor spatial queries for indoor LBSs. Some typical proposals for indoor spatial queries are compared theoretically and experimentally. Then, it studies three advanced indoor spatial queries, a) Indoor Keyword-aware Routing Query. b) Indoor Temporal-variation aware Routing Query. c) Indoor Crowd-aware Routing Query. A series of techniques are proposed to solve these problems.</p
High-Performance Reachability Query Processing under Index Size Restrictions
In this paper, we propose a scalable and highly efficient index structure for
the reachability problem over graphs. We build on the well-known node interval
labeling scheme where the set of vertices reachable from a particular node is
compactly encoded as a collection of node identifier ranges. We impose an
explicit bound on the size of the index and flexibly assign approximate
reachability ranges to nodes of the graph such that the number of index probes
to answer a query is minimized. The resulting tunable index structure generates
a better range labeling if the space budget is increased, thus providing a
direct control over the trade off between index size and the query processing
performance. By using a fast recursive querying method in conjunction with our
index structure, we show that in practice, reachability queries can be answered
in the order of microseconds on an off-the-shelf computer - even for the case
of massive-scale real world graphs. Our claims are supported by an extensive
set of experimental results using a multitude of benchmark and real-world
web-scale graph datasets.Comment: 30 page
LIPIcs, Volume 274, ESA 2023, Complete Volume
LIPIcs, Volume 274, ESA 2023, Complete Volum
Metabolic Network Alignments and their Applications
The accumulation of high-throughput genomic and proteomic data allows for the reconstruction of the increasingly large and complex metabolic networks. In order to analyze the accumulated data and reconstructed networks, it is critical to identify network patterns and evolutionary relations between metabolic networks. But even finding similar networks becomes computationally challenging. The dissertation addresses these challenges with discrete optimization and the corresponding algorithmic techniques. Based on the property of the gene duplication and function sharing in biological network,we have formulated the network alignment problem which asks the optimal vertex-to-vertex mapping allowing path contraction, vertex deletion, and vertex insertions. We have proposed the first polynomial time algorithm for aligning an acyclic metabolic pattern pathway with an arbitrary metabolic network. We also have proposed a polynomial-time algorithm for patterns with small treewidth and implemented it for series-parallel patterns which are commonly found among metabolic networks. We have developed the metabolic network alignment tool for free public use. We have performed pairwise mapping of all pathways among five organisms and found a set of statistically significant pathway similarities. We also have applied the network alignment to identifying inconsistency, inferring missing enzymes, and finding potential candidates
- …